CBO: A Configuration Roadmap
Christian Antognini Trivadis AG Hotsos Symposium 2005 6-10 March, Dallas (USA)
Who am I >
Senior consultant and trainer at Trivadis AG in Zurich, Switzerland » christian.antognini@trivadis.com » www.trivadis.com
>
>
» Logical and physical database design » Application performance management » Integration with J2EE
Christian Antognini Be in the know.
CBO: A Configuration Roadmap
Focus: get the most out of Oracle
>
10 years experience with Oracle
2
© 2004
Disclaimer >
The CBO changes from release to release
>
It’s difficult to give general advice about many releases
>
Therefore, this presentation focuses on the most important versions at present » Oracle9i Release 2 » Oracle 10g Release 1
>
When information applies to only one version, one of the following icons will be used
CBO: A Configuration Roadmap
3
© 2004
Correctly configuring the CBO (1) >
At least since Oracle9i the CBO works well, i.e. it generates good execution plans for most SQL statements
>
This is only true when
>
» It is correctly configured » The database has been designed to take advantage of all its features
CBO: A Configuration Roadmap
Correctly configuring the CBO is not an easy job
This is no good reason for not doing it however!
4
© 2004
Correctly configuring the CBO (2) >
There is no single configuration that is good for every system » Each application has its own requirements » Each system has its own characteristics
>
Therefore, I’m not able to provide you with the “magic configuration”…
>
I can only show you how I proceed to do such a configuration
CBO: A Configuration Roadmap
5
© 2004
Configuration = INIT.ORA parameters + statistics >
Statistics
>
» Object statistics » System statistics >
>
» » » » »
Basic » » » » »
OPTIMIZER_MODE OPTIMIZER_FEATURES_ENABLE OPTIMIZER_DYNAMIC_SAMPLING OPTIMIZER_INDEX_COST_ADJ OPTIMIZER_INDEX_CACHING
>
» QUERY_REWRITE_ENABLED » QUERY_REWRITE_INTEGRITY » STAR_TRANSFORMATION_ENABLED 6
WORKAREA_SIZE_POLICY PGA_AGGREGATE_TARGET HASH_AREA_SIZE SORT_AREA_SIZE BITMAP_MERGE_AREA_SIZE
Miscellaneous » » » » »
Query transformation
CBO: A Configuration Roadmap
Memory
DB_FILE_MULTIBLOCK_READ_COUNT CURSOR_SHARING SKIP_UNUSABLE_INDEXES PARALLEL_* Many undocumented parameters…
© 2004
Set the right parameter! (1) >
Each INIT.ORA parameter has been introduced for a specific reason!
>
Understanding how it changes the behavior of the CBO is essential
>
Don’t tweak the configuration randomly, instead » Understand the current situation » Define the goal to be achieved » Find out which parameter should be changed to achieve the goal
CBO: A Configuration Roadmap
7
© 2004
Choose the right tool!
CBO: A Configuration Roadmap
8
Š 2004
Set the right parameter! (2)
DE X
_C
AC
HI NG
NA M IC _S AM PL
OP TI M
IZ
ER _M
OD E
QUERY_REWRITE_ENABLED
CBO: A Configuration Roadmap
AD _C OU NT
statist ics
9
M TI P O
ER IZ
_
S RE U AT E F
D LE B NA _E
ET RG TA E_ ICY AT OL EG _P GR IZE _S AG A E SIZE A_ R EA_ KA PG _AR OR RGE W ME AP_ BITM
IN G
system
_R E
HASH_AR EA_SIZE
OP TI M IZ ER _D Y
M UL TI BL OC K
obj ect stat istic s
R_ IN
DJ COST_A _INDEX_
OP TIM IZE
_F ILE _
SO RT _A RE A_ SIZ E
ER OPTIMIZ
DB
© 2004
Configuration roadmap
gather system and object statistics
Once set to their optimal value they should not be modified further; the only exception is when tweaking the index related parameters does not achieve good results!
OPTIMIZER_MODE OPTIMIZER_FEATURES_ENABLE DB_FILE_MULTIBLOCK_READ_COUNT OPTIMIZER_DYNAMIC_SAMPLING WORKAREA_SIZE_POLICY PGA_AGGREGATE_TARGET
auto
manual
HASH_AREA_SIZE SORT_AREA_SIZE BITMAP_MERGE_AREA_SIZE
test the application no
most of the execution plans are good
OPTIMIZER_INDEX_COST_ADJ OPTIMIZER_INDEX_CACHING
yes
CBO: A Configuration Roadmap
gather missing histograms
10
Š 2004
Object statistics >
The object statistics describe the data stored in the database » Table statistics: number of rows, number of blocks below the high water mark and average row length » Column statistics: number of distinct values, number of NULL values and data distribution (a.k.a. histograms) » Index statistics: number of distinct keys, high, number of leaf blocks and clustering factor
>
The CBO can generate good execution plans only if the statistics are good as well, i.e. they must reflect existing data
>
Missing, not up-to-date, or wrong statistics can lead to poor execution plans
>
Object statistics are automatically gathered CBO: A Configuration Roadmap
11
© 2004
Gathering object statistics (1) >
Gather statistics for tables, columns and indexes at the same time » DBMS_STATS.GATHER_SCHEMA_STATS » DBMS_STATS.GATHER_TABLE_STATS
>
Use small estimate percentage (<10%)
» Oracle increases the specified value if necessary ☺ » For very large tables reduce it to 1% or less
>
Activate table monitoring to refresh the statistics when data is changed (by default a threshold of 10% is used) » DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING » STATISTICS_LEVEL = TYPICAL (default)
>
For partitioned tables gather statistics at all levels
>
For large tables gather statistics in parallel CBO: A Configuration Roadmap
12
© 2004
Gathering object statistics (2) >
Histograms are essential, gather them for all columns containing skewed data that are referenced in WHERE clauses » Even for non-indexed columns! » For simplicity use SIZE SKEWONLY, if it takes too much time try SIZE AUTO; if it is still too slow or the chosen number of buckets it not good, manually specify the list of columns » To get the most out of them, bind variables must not be used
>
The following command is a good starting point dbms_stats.gather_schema_stats( ownname => user, estimate_percent => 5, cascade => TRUE, method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', options => 'GATHER STALE' ); CBO: A Configuration Roadmap
13
© 2004
Gathering object statistics (3) >
All procedures that gather optimizer statistics no longer have hardcoded default values, instead, they are stored in the data dictionary
» You can get and set them with the DBMS_STATS procedures GET_PARAM and SET_PARAM SQL> SELECT sname, nvl(to_char(sval1),spare4) value 2 FROM sys.optstat_hist_control$; SNAME -------------------CASCADE ESTIMATE_PERCENT DEGREE METHOD_OPT NO_INVALIDATE GRANULARITY
>
VALUE -----------------------------DBMS_STATS.AUTO_CASCADE DBMS_STATS.AUTO_SAMPLE_SIZE NULL FOR ALL COLUMNS SIZE AUTO DBMS_STATS.AUTO_INVALIDATE AUTO
Check the definition of the scheduler window group MAINTENANCE_WINDOW_GROUP CBO: A Configuration Roadmap
14
© 2004
System statistics (1) >
System statistics are essential for a successful configuration
>
The CBO used to base its estimations on the number of I/O needed to execute a SQL statement (a.k.a. “I/O cost model”)
>
With system statistics a new model, called “CPU cost model”, is enabled
>
They give information about the performance of the system where Oracle runs » Performance of the I/O subsystem » Average I/O size for multi-block read operations » Performance of the CPU
>
To use the “I/O cost model”, when system statistics have been gathered, specify the hint NO_CPU_COSTING CBO: A Configuration Roadmap
15
© 2004
System statistics (2) >
There are two kinds of system statistics
>
Without workload » Not really useful… » They are automatically gathered at startup if they don’t exist dbms_stats.gather_system_stats('noworkload');
>
With workload » They should be gathered when the system is charged with a common workload » The instance is “monitored” for a period of time (interval) dbms_stats.gather_system_stats( gathering_mode => 'interval', interval => 30 Statistics are gathered ); over the next 30 minutes CBO: A Configuration Roadmap
16
© 2004
System statistics (3) SQL> 2 3 4
SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';
PNAME PVAL1 ------------ ---------CPUSPEEDNW 886.233 IOSEEKTIM 10 IOTFRSPEED 4096 CPUSPEED 928 SREADTIM 8.138 MREADTIM 16.799 MBRC 9 MAXTHR 162238464 SLAVETHR 277504
CBO: A Configuration Roadmap
>
Without workload » CPUSPEEDNW: CPU speed in millions of operations per second » IOSEEKTIM: I/O seek time [ms] » IOFRSPEED: I/O transfer speed [B/s]
>
With workload » CPUSPEED: CPU speed in millions of operations per second » SREADTIM: single-block read time [ms] » MREADTIM: multi-block read time [ms] » MBRC: average number of blocks read during a multi-block read operation » MAXTHR: max. throughput [MB/s] » SLAVETHR: max. slave throughput [MB/s]
17
© 2004
OPTIMIZER_MODE >
This is the most important parameter, unfortunately, too often it is not set, i.e. its default value is used » Default value is CHOOSE
- ALL_ROWS is used if object statistics are available, otherwise RULE - In some situations ALL_ROWS has to be used even if no statistics are available (e.g. for partitioned tables)
» Default value is ALL_ROWS >
If fast delivery of the last row is important, ALL_ROWS should be used » Typical example: reporting systems and data warehouses
>
If fast delivery of the first row(s) is important, FIRST_ROWS_n should be used » Where “n” is [ 1 | 10 | 100 | 1000 ] row(s) » Typical example: OLTP systems CBO: A Configuration Roadmap
18
© 2004
OPTIMIZER_FEATURES_ENABLE >
Each database version introduces new features in the optimizer
>
OPTIMIZER_FEATURES_ENABLE can be used to set which “version” of the optimizer should be used » Valid values are database versions (e.g. 8.1.7, 9.0.1, 9.2.0, …) » A complete list of the features which are enabled/disabled by each version is available in the “Oracle Database Reference” manual » The default value is the current database version » Not all new features are enabled/disabled by this parameter, this means that even if you set it to 8.1.7 in, for example, Oracle9i you won’t get the 8.1.7 optimizer!
>
It could be useful to set it when an application is upgraded to a new database version, otherwise leave it at the default value CBO: A Configuration Roadmap
19
© 2004
DB_FILE_MULTIBLOCK_READ_COUNT (1) >
DB_FILE_MULTIBLOCK_READ_COUNT specifies the maximum number of blocks that Oracle reads during a multi-block operation (e.g. full table scan)
>
Three common situations lead to multi-block reads that are smaller than the specified value » Oracle reads headers with single-block reads » Oracle never does an I/O that spans more extents » Oracle never reads a block that is already in the buffer cache
>
Multi-block reads are a performance feature, therefore, the parameter should be set to achieve the best performance » Higher values do not provide better performance in all cases! » It makes no sense to exceed the maximum physical I/O size CBO: A Configuration Roadmap
20
© 2004
DB_FILE_MULTIBLOCK_READ_COUNT (2) >
When system statistics with workload are not used this parameter has a direct impact on the full table scan costs, therefore, too high values lead to excessive full scans » Without system statistics
Blocks I / O Cost FTS ≈ 1.6765 ⋅ DFMBRC 0.6581 » System statistics without workload
Blocks I / O Cost FTS ≈ 1 + A ⋅ ln (DFMBRC) + B ⋅ ln( DFMBRC ) 2 + ... >
When system statistics with workload are used, MBRC in SYS.AUX_STATS$ is used instead of DFMBRC to compute the cost of full table scans CBO: A Configuration Roadmap
21
© 2004
DB_FILE_MULTIBLOCK_READ_COUNT (3) A simple full table scan with different values gives useful information about the impact of this parameter and, therefore, assists in finding the “best” value 100 100 80 Gain in %
>
60 40 20 0 -20 -20 1 1
5 5
9 9
CBO: A Configuration Roadmap
13 21 17 13 17 21 DFMBRC // MBRC DFMBRC MBRC 22
25 25 25
29 29 29
I/O cost cost without without I/O system statistics system I/O cost statistics without I/Ocost coststatistics with def. system I/O without I/O cost with def. def. I/O coststatistics with system system statistics system statistics I/O cost without system statistics I/O cost def. System with 1 system statistics system statistics System 1 def. (max. 55 I/O cost System with 1MB/s) (max. 55 MB/s) system statistics (max. 55 MB/s) System 2 System 1 (max. 55 402MB/s) MB/s) System (max. System (max. 4023MB/s) System (max. 40 MB/s) (max. 58 MB/s) System 3 System (max. 584MB/s) (max. 230 MB/s) © 2004
OPTIMIZER_DYNAMIC_SAMPLING (1) >
The CBO used to base its estimations on statistics found in the data dictionary only
>
With dynamic sampling some statistics can be gathered during the parse phase as well, this means that some queries are executed against the referenced objects to gather additional information
>
The statistics gathered by dynamic sampling are not stored in the data dictionary, they are simply shared at cursor level
>
The value of the parameter OPTIMIZER_DYNAMIC_SAMPLING specifies how and when dynamic sampling is used Âť Range: from 0 to 10 (0=dynamic sampling disabled) CBO: A Configuration Roadmap
23
Š 2004
OPTIMIZER_DYNAMIC_SAMPLING (2) # Blocks Param Hint
Level On which table?
1 2 3 4 5 6 7 8 9 10
All non-analyzed tables, if at least one table » Is part of a join, subquery or non-mergeable view » Has no index » Has more blocks than the number of blocks used for the sampling All non-analyzed tables All tables which fulfill the level-2 criterion plus all tables for which a standard selectivity estimation is used
All tables which fulfill level-3 criteria plus all tables which have single table predicates referencing more than two attributes
CBO: A Configuration Roadmap
24
32
32 64
64
128 256 1024 4096 All © 2004
128 256 512 1024 2048 4096 8192 All
OPTIMIZER_DYNAMIC_SAMPLING (3) >
Default values » If OPTIMIZER_FEATURES_ENABLE ≥ 10.0.0: 2 » If OPTIMIZER_FEATURES_ENABLE = 9.2.0: 1 » If OPTIMIZER_FEATURES_ENABLE ≤ 9.0.1: 0
>
Level 1 and 2 are not very useful, in fact the tables should be analyzed on a regular basis! » An exception is when temporary tables are used, in fact, usually, no statistics are available for them
>
Level 3 and up are useful for improving selectivity estimations of predicates » If the CBO is not able to do correct estimations start by setting it to 4 » It is possible to do such corrections with SQL profiles as well CBO: A Configuration Roadmap
25
© 2004
PGA management >
It is possible to choose between two methods to manage the PGA » Manual: the DBA has full control over the size of the PGA » Automatic: the DBA delegates the management of the PGA to Oracle
>
Except in the following situations it is recommended to use automatic PGA management » Very large PGA needed (>100MB, see Metalink note 147806.1) - Except if you want to set _PGA_MAX_SIZE
» Fine tuning is needed » Shared server (former MTS) is used >
Usually a larger PGA makes merge/hash joins and sort operations faster, therefore, you should devote the “unused” memory that is available on the system to it
>
The PGA size has an influence on the costs computed by the CBO CBO: A Configuration Roadmap
26
© 2004
Manual PGA management >
To enable manual PGA management the INIT.ORA parameter WORKAREA_SIZE_POLICY must be set to MANUAL
>
Then the following INIT.ORA parameters should be set » HASH_AREA_SIZE » SORT_AREA_SIZE » BITMAP_MERGE_AREA_SIZE
>
It is practically impossible to give advice about their value, anyway, here some general rules
» Usually a minimal value of 512KB to 1MB should be used » For small PGAs, to take advantage of hash join, the HASH_AREA_SIZE should be at least 3-4 times the SORT_AREA_SIZE
>
Each parameter specifies the maximum amount of memory that can be used by each server process CBO: A Configuration Roadmap
27
© 2004
Automatic PGA management >
To enable auto PGA management the INIT.ORA parameter WORKAREA_SIZE_POLICY must be set to AUTO
>
Contrary to manual PGA management, the INIT.ORA parameter PGA_AGGREGATE_TARGET specifies the total amount of memory use for all PGAs, i.e. not for the single server processes
>
In some situations the PGA_AGGREGATE_TARGET could also be exceeded » It’s a target, not a maximum value! » This usually happens when a value which is too small is specified
CBO: A Configuration Roadmap
28
© 2004
OPTIMIZER_INDEX_COST_ADJ >
This parameter is used to change the cost of table accesses through index scans
>
Range of values 1..10000 (default 100)
» Values greater than 100 make index scans more expensive » Values lower than 100 make index scans less expensive
>
>
>
Correction for an index range scan
OICA I / O Cost ≈ (BLevel + (LeafBlocks + ClustFactor ) ⋅ Selectivity ) ⋅ 100 Correction for an index unique scan
OICA I / O Cost ≈ (BLevel + 1) ⋅ 100
This parameter flattens costs and makes the clustering factor much less significant, therefore, small values should be carefully specified! With system statistics the default value is usually good CBO: A Configuration Roadmap
29
© 2004
OPTIMIZER_INDEX_CACHING >
This parameter is used to specify the expected amount (in percent) of index blocks cached in the buffer cache during a nested loop join » This parameter has no influence on index range scans!
>
Range of values 0..100 (default 0)
» Values greater than 0 make nested loops less expensive
>
Correction for an index range scan
>
Correction for index unique scan
>
With system statistics the default value is usually good
⎛ OIC ⎞ I / O Cost ≈ (BLevel + LeafBlocks ⋅ Selectivity ) ⋅ ⎜1 − ⎟ ⎝ 100 ⎠ + ClustFactor ⋅ Selectivity ⎛ OIC ⎞ I / O Cost ≈ BLevel ⋅ ⎜1 − ⎟ +1 ⎝ 100 ⎠
CBO: A Configuration Roadmap
30
© 2004
Core messagesâ&#x20AC;Ś >
The CBO works well if it is correctly configured!
>
Correctly configuring the CBO is not an easy job
This is no good reason for not doing it!
> >
from Trivadis At the core it's all about data.
CBO: A Configuration Roadmap
Understanding how each parameter changes the behavior of the CBO is essential
31
Š 2004
Thank you for your attention
Questions?