Antognini - CBO Configuration Roadmap by rocker12

CBO: A Configuration Roadmap

Christian Antognini Trivadis AG Hotsos Symposium 2005 6-10 March, Dallas (USA)

Who am I >

Senior consultant and trainer at Trivadis AG in Zurich, Switzerland » christian.antognini@trivadis.com » www.trivadis.com

» Logical and physical database design » Application performance management » Integration with J2EE

Christian Antognini Be in the know.

CBO: A Configuration Roadmap

Focus: get the most out of Oracle

10 years experience with Oracle

Disclaimer >

The CBO changes from release to release

It’s difficult to give general advice about many releases

Therefore, this presentation focuses on the most important versions at present » Oracle9i Release 2 » Oracle 10g Release 1

When information applies to only one version, one of the following icons will be used

CBO: A Configuration Roadmap

Correctly configuring the CBO (1) >

At least since Oracle9i the CBO works well, i.e. it generates good execution plans for most SQL statements

This is only true when

» It is correctly configured » The database has been designed to take advantage of all its features

CBO: A Configuration Roadmap

Correctly configuring the CBO is not an easy job

This is no good reason for not doing it however!

Correctly configuring the CBO (2) >

There is no single configuration that is good for every system » Each application has its own requirements » Each system has its own characteristics

Therefore, I’m not able to provide you with the “magic configuration”…

I can only show you how I proceed to do such a configuration

CBO: A Configuration Roadmap

Configuration = INIT.ORA parameters + statistics >

Statistics

» Object statistics » System statistics >

» » » » »

Basic » » » » »

OPTIMIZER_MODE OPTIMIZER_FEATURES_ENABLE OPTIMIZER_DYNAMIC_SAMPLING OPTIMIZER_INDEX_COST_ADJ OPTIMIZER_INDEX_CACHING

» QUERY_REWRITE_ENABLED » QUERY_REWRITE_INTEGRITY » STAR_TRANSFORMATION_ENABLED 6

WORKAREA_SIZE_POLICY PGA_AGGREGATE_TARGET HASH_AREA_SIZE SORT_AREA_SIZE BITMAP_MERGE_AREA_SIZE

Miscellaneous » » » » »

Query transformation

CBO: A Configuration Roadmap

Memory

DB_FILE_MULTIBLOCK_READ_COUNT CURSOR_SHARING SKIP_UNUSABLE_INDEXES PARALLEL_* Many undocumented parameters…

Set the right parameter! (1) >

Each INIT.ORA parameter has been introduced for a specific reason!

Understanding how it changes the behavior of the CBO is essential

Don’t tweak the configuration randomly, instead » Understand the current situation » Define the goal to be achieved » Find out which parameter should be changed to achieve the goal

CBO: A Configuration Roadmap

Choose the right tool!

CBO: A Configuration Roadmap

ÂŠ 2004

Set the right parameter! (2)

DE X

HI NG

NA M IC _S AM PL

OP TI M

ER _M

OD E

QUERY_REWRITE_ENABLED

CBO: A Configuration Roadmap

AD _C OU NT

statist ics

M TI P O

ER IZ

S RE U AT E F

D LE B NA _E

ET RG TA E_ ICY AT OL EG _P GR IZE _S AG A E SIZE A_ R EA_ KA PG _AR OR RGE W ME AP_ BITM

IN G

system

_R E

HASH_AR EA_SIZE

OP TI M IZ ER _D Y

M UL TI BL OC K

obj ect stat istic s

R_ IN

DJ COST_A _INDEX_

OP TIM IZE

_F ILE _

SO RT _A RE A_ SIZ E

ER OPTIMIZ

Configuration roadmap

gather system and object statistics

Once set to their optimal value they should not be modified further; the only exception is when tweaking the index related parameters does not achieve good results!

OPTIMIZER_MODE OPTIMIZER_FEATURES_ENABLE DB_FILE_MULTIBLOCK_READ_COUNT OPTIMIZER_DYNAMIC_SAMPLING WORKAREA_SIZE_POLICY PGA_AGGREGATE_TARGET

auto

manual

HASH_AREA_SIZE SORT_AREA_SIZE BITMAP_MERGE_AREA_SIZE

test the application no

most of the execution plans are good

OPTIMIZER_INDEX_COST_ADJ OPTIMIZER_INDEX_CACHING

yes

CBO: A Configuration Roadmap

gather missing histograms

ÂŠ 2004

Object statistics >

The object statistics describe the data stored in the database » Table statistics: number of rows, number of blocks below the high water mark and average row length » Column statistics: number of distinct values, number of NULL values and data distribution (a.k.a. histograms) » Index statistics: number of distinct keys, high, number of leaf blocks and clustering factor

The CBO can generate good execution plans only if the statistics are good as well, i.e. they must reflect existing data

Missing, not up-to-date, or wrong statistics can lead to poor execution plans

Object statistics are automatically gathered CBO: A Configuration Roadmap

Gathering object statistics (1) >

Gather statistics for tables, columns and indexes at the same time » DBMS_STATS.GATHER_SCHEMA_STATS » DBMS_STATS.GATHER_TABLE_STATS

Use small estimate percentage (<10%)

» Oracle increases the specified value if necessary ☺ » For very large tables reduce it to 1% or less

Activate table monitoring to refresh the statistics when data is changed (by default a threshold of 10% is used) » DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING » STATISTICS_LEVEL = TYPICAL (default)

For partitioned tables gather statistics at all levels

For large tables gather statistics in parallel CBO: A Configuration Roadmap

Gathering object statistics (2) >

Histograms are essential, gather them for all columns containing skewed data that are referenced in WHERE clauses » Even for non-indexed columns! » For simplicity use SIZE SKEWONLY, if it takes too much time try SIZE AUTO; if it is still too slow or the chosen number of buckets it not good, manually specify the list of columns » To get the most out of them, bind variables must not be used

The following command is a good starting point dbms_stats.gather_schema_stats( ownname => user, estimate_percent => 5, cascade => TRUE, method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY', options => 'GATHER STALE' ); CBO: A Configuration Roadmap

Gathering object statistics (3) >

All procedures that gather optimizer statistics no longer have hardcoded default values, instead, they are stored in the data dictionary

» You can get and set them with the DBMS_STATS procedures GET_PARAM and SET_PARAM SQL> SELECT sname, nvl(to_char(sval1),spare4) value 2 FROM sys.optstat_hist_control$; SNAME -------------------CASCADE ESTIMATE_PERCENT DEGREE METHOD_OPT NO_INVALIDATE GRANULARITY

VALUE -----------------------------DBMS_STATS.AUTO_CASCADE DBMS_STATS.AUTO_SAMPLE_SIZE NULL FOR ALL COLUMNS SIZE AUTO DBMS_STATS.AUTO_INVALIDATE AUTO

Check the definition of the scheduler window group MAINTENANCE_WINDOW_GROUP CBO: A Configuration Roadmap

System statistics (1) >

System statistics are essential for a successful configuration

The CBO used to base its estimations on the number of I/O needed to execute a SQL statement (a.k.a. “I/O cost model”)

With system statistics a new model, called “CPU cost model”, is enabled

They give information about the performance of the system where Oracle runs » Performance of the I/O subsystem » Average I/O size for multi-block read operations » Performance of the CPU

To use the “I/O cost model”, when system statistics have been gathered, specify the hint NO_CPU_COSTING CBO: A Configuration Roadmap

System statistics (2) >

There are two kinds of system statistics

Without workload » Not really useful… » They are automatically gathered at startup if they don’t exist dbms_stats.gather_system_stats('noworkload');

With workload » They should be gathered when the system is charged with a common workload » The instance is “monitored” for a period of time (interval) dbms_stats.gather_system_stats( gathering_mode => 'interval', interval => 30 Statistics are gathered ); over the next 30 minutes CBO: A Configuration Roadmap

System statistics (3) SQL> 2 3 4

SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';

PNAME PVAL1 ------------ ---------CPUSPEEDNW 886.233 IOSEEKTIM 10 IOTFRSPEED 4096 CPUSPEED 928 SREADTIM 8.138 MREADTIM 16.799 MBRC 9 MAXTHR 162238464 SLAVETHR 277504

CBO: A Configuration Roadmap

Without workload » CPUSPEEDNW: CPU speed in millions of operations per second » IOSEEKTIM: I/O seek time [ms] » IOFRSPEED: I/O transfer speed [B/s]

With workload » CPUSPEED: CPU speed in millions of operations per second » SREADTIM: single-block read time [ms] » MREADTIM: multi-block read time [ms] » MBRC: average number of blocks read during a multi-block read operation » MAXTHR: max. throughput [MB/s] » SLAVETHR: max. slave throughput [MB/s]

OPTIMIZER_MODE >

This is the most important parameter, unfortunately, too often it is not set, i.e. its default value is used » Default value is CHOOSE

- ALL_ROWS is used if object statistics are available, otherwise RULE - In some situations ALL_ROWS has to be used even if no statistics are available (e.g. for partitioned tables)

» Default value is ALL_ROWS >

If fast delivery of the last row is important, ALL_ROWS should be used » Typical example: reporting systems and data warehouses

If fast delivery of the first row(s) is important, FIRST_ROWS_n should be used » Where “n” is [ 1 | 10 | 100 | 1000 ] row(s) » Typical example: OLTP systems CBO: A Configuration Roadmap

OPTIMIZER_FEATURES_ENABLE >

Each database version introduces new features in the optimizer

OPTIMIZER_FEATURES_ENABLE can be used to set which “version” of the optimizer should be used » Valid values are database versions (e.g. 8.1.7, 9.0.1, 9.2.0, …) » A complete list of the features which are enabled/disabled by each version is available in the “Oracle Database Reference” manual » The default value is the current database version » Not all new features are enabled/disabled by this parameter, this means that even if you set it to 8.1.7 in, for example, Oracle9i you won’t get the 8.1.7 optimizer!

It could be useful to set it when an application is upgraded to a new database version, otherwise leave it at the default value CBO: A Configuration Roadmap

DB_FILE_MULTIBLOCK_READ_COUNT (1) >

DB_FILE_MULTIBLOCK_READ_COUNT specifies the maximum number of blocks that Oracle reads during a multi-block operation (e.g. full table scan)

Three common situations lead to multi-block reads that are smaller than the specified value » Oracle reads headers with single-block reads » Oracle never does an I/O that spans more extents » Oracle never reads a block that is already in the buffer cache

Multi-block reads are a performance feature, therefore, the parameter should be set to achieve the best performance » Higher values do not provide better performance in all cases! » It makes no sense to exceed the maximum physical I/O size CBO: A Configuration Roadmap

DB_FILE_MULTIBLOCK_READ_COUNT (2) >

When system statistics with workload are not used this parameter has a direct impact on the full table scan costs, therefore, too high values lead to excessive full scans » Without system statistics

Blocks I / O Cost FTS ≈ 1.6765 ⋅ DFMBRC 0.6581 » System statistics without workload

Blocks I / O Cost FTS ≈ 1 + A ⋅ ln (DFMBRC) + B ⋅ ln( DFMBRC ) 2 + ... >

When system statistics with workload are used, MBRC in SYS.AUX_STATS$ is used instead of DFMBRC to compute the cost of full table scans CBO: A Configuration Roadmap

DB_FILE_MULTIBLOCK_READ_COUNT (3) A simple full table scan with different values gives useful information about the impact of this parameter and, therefore, assists in finding the “best” value 100 100 80 Gain in %

60 40 20 0 -20 -20 1 1

5 5

9 9

CBO: A Configuration Roadmap

13 21 17 13 17 21 DFMBRC // MBRC DFMBRC MBRC 22

25 25 25

29 29 29

I/O cost cost without without I/O system statistics system I/O cost statistics without I/Ocost coststatistics with def. system I/O without I/O cost with def. def. I/O coststatistics with system system statistics system statistics I/O cost without system statistics I/O cost def. System with 1 system statistics system statistics System 1 def. (max. 55 I/O cost System with 1MB/s) (max. 55 MB/s) system statistics (max. 55 MB/s) System 2 System 1 (max. 55 402MB/s) MB/s) System (max. System (max. 4023MB/s) System (max. 40 MB/s) (max. 58 MB/s) System 3 System (max. 584MB/s) (max. 230 MB/s) © 2004

OPTIMIZER_DYNAMIC_SAMPLING (1) >

The CBO used to base its estimations on statistics found in the data dictionary only

With dynamic sampling some statistics can be gathered during the parse phase as well, this means that some queries are executed against the referenced objects to gather additional information

The statistics gathered by dynamic sampling are not stored in the data dictionary, they are simply shared at cursor level

The value of the parameter OPTIMIZER_DYNAMIC_SAMPLING specifies how and when dynamic sampling is used Âť Range: from 0 to 10 (0=dynamic sampling disabled) CBO: A Configuration Roadmap

ÂŠ 2004

OPTIMIZER_DYNAMIC_SAMPLING (2) # Blocks Param Hint

Level On which table?

1 2 3 4 5 6 7 8 9 10

All non-analyzed tables, if at least one table » Is part of a join, subquery or non-mergeable view » Has no index » Has more blocks than the number of blocks used for the sampling All non-analyzed tables All tables which fulfill the level-2 criterion plus all tables for which a standard selectivity estimation is used

All tables which fulfill level-3 criteria plus all tables which have single table predicates referencing more than two attributes

CBO: A Configuration Roadmap

32 64

128 256 512 1024 2048 4096 8192 All

OPTIMIZER_DYNAMIC_SAMPLING (3) >

Default values » If OPTIMIZER_FEATURES_ENABLE ≥ 10.0.0: 2 » If OPTIMIZER_FEATURES_ENABLE = 9.2.0: 1 » If OPTIMIZER_FEATURES_ENABLE ≤ 9.0.1: 0

Level 1 and 2 are not very useful, in fact the tables should be analyzed on a regular basis! » An exception is when temporary tables are used, in fact, usually, no statistics are available for them

Level 3 and up are useful for improving selectivity estimations of predicates » If the CBO is not able to do correct estimations start by setting it to 4 » It is possible to do such corrections with SQL profiles as well CBO: A Configuration Roadmap

PGA management >

It is possible to choose between two methods to manage the PGA » Manual: the DBA has full control over the size of the PGA » Automatic: the DBA delegates the management of the PGA to Oracle

Except in the following situations it is recommended to use automatic PGA management » Very large PGA needed (>100MB, see Metalink note 147806.1) - Except if you want to set _PGA_MAX_SIZE

» Fine tuning is needed » Shared server (former MTS) is used >

Usually a larger PGA makes merge/hash joins and sort operations faster, therefore, you should devote the “unused” memory that is available on the system to it

The PGA size has an influence on the costs computed by the CBO CBO: A Configuration Roadmap

Manual PGA management >

To enable manual PGA management the INIT.ORA parameter WORKAREA_SIZE_POLICY must be set to MANUAL

Then the following INIT.ORA parameters should be set » HASH_AREA_SIZE » SORT_AREA_SIZE » BITMAP_MERGE_AREA_SIZE

It is practically impossible to give advice about their value, anyway, here some general rules

» Usually a minimal value of 512KB to 1MB should be used » For small PGAs, to take advantage of hash join, the HASH_AREA_SIZE should be at least 3-4 times the SORT_AREA_SIZE

Each parameter specifies the maximum amount of memory that can be used by each server process CBO: A Configuration Roadmap

Automatic PGA management >

To enable auto PGA management the INIT.ORA parameter WORKAREA_SIZE_POLICY must be set to AUTO

Contrary to manual PGA management, the INIT.ORA parameter PGA_AGGREGATE_TARGET specifies the total amount of memory use for all PGAs, i.e. not for the single server processes

In some situations the PGA_AGGREGATE_TARGET could also be exceeded » It’s a target, not a maximum value! » This usually happens when a value which is too small is specified

CBO: A Configuration Roadmap

OPTIMIZER_INDEX_COST_ADJ >

This parameter is used to change the cost of table accesses through index scans

Range of values 1..10000 (default 100)

» Values greater than 100 make index scans more expensive » Values lower than 100 make index scans less expensive

Correction for an index range scan

OICA I / O Cost ≈ (BLevel + (LeafBlocks + ClustFactor ) ⋅ Selectivity ) ⋅ 100 Correction for an index unique scan

OICA I / O Cost ≈ (BLevel + 1) ⋅ 100

This parameter flattens costs and makes the clustering factor much less significant, therefore, small values should be carefully specified! With system statistics the default value is usually good CBO: A Configuration Roadmap

OPTIMIZER_INDEX_CACHING >

This parameter is used to specify the expected amount (in percent) of index blocks cached in the buffer cache during a nested loop join » This parameter has no influence on index range scans!

Range of values 0..100 (default 0)

» Values greater than 0 make nested loops less expensive

Correction for an index range scan

Correction for index unique scan

With system statistics the default value is usually good

⎛ OIC ⎞ I / O Cost ≈ (BLevel + LeafBlocks ⋅ Selectivity ) ⋅ ⎜1 − ⎟ ⎝ 100 ⎠ + ClustFactor ⋅ Selectivity ⎛ OIC ⎞ I / O Cost ≈ BLevel ⋅ ⎜1 − ⎟ +1 ⎝ 100 ⎠

CBO: A Configuration Roadmap

Core messagesâ&#x20AC;Ś >

The CBO works well if it is correctly configured!

Correctly configuring the CBO is not an easy job

This is no good reason for not doing it!

> >

from Trivadis At the core it's all about data.

CBO: A Configuration Roadmap

Understanding how each parameter changes the behavior of the CBO is essential

ÂŠ 2004

Thank you for your attention

Questions?