Author: Donald Burleson
The cost-based optimizer (CBO) has shown considerable improvements with each new release of Oracle. One noteworthy enhancement in Oracle9i is the consideration of external influences such as CPU and I/O cost when formulating an execution plan. As Oracle evolves into Oracle10g, we see even more improvements in the CBO's ability to get the optimal execution plan for a query, but in the meantime, every Oracle developer must understand these mechanisms to properly tune SQL.
clustering_factor:
While many characteristics of column data within tables are known to the CBO, the most important are the clustering factor for the column and the selectivity of column values. Oracle provides a column called clustering_factor in the dba_indexes view that indicates how closely the table rows are synchronized with the index. The table rows are synchronized with the index when the clustering_factor is close to the number of data blocks in the table; the rows are not in index order when the clustering_factor approaches the number of rows in the table.
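As a rough sketch of how this comparison might be made, the query below lists each index on a table alongside the table's block and row counts. The owner SCOTT and the table name CUSTOMER are placeholders, and the figures are only meaningful if optimizer statistics have been gathered recently.

```sql
-- Compare clustering_factor with the table's block count and row count.
-- 'SCOTT' and 'CUSTOMER' are placeholder names; substitute your own.
select
   i.index_name,
   i.clustering_factor,
   t.blocks,
   t.num_rows
from
   dba_indexes i,
   dba_tables  t
where
   i.table_owner = t.owner
and
   i.table_name = t.table_name
and
   t.owner = 'SCOTT'
and
   t.table_name = 'CUSTOMER';
```

A clustering_factor near blocks suggests the rows are stored in index order; a value near num_rows suggests they are not.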
Accessing common rows:
For queries that access common rows within a table, unordered tables can incur heavy I/O because the index retrieves a separate data block for each row requested. If we group like rows together (as measured by the clustering_factor in dba_indexes), we can get all of the rows with a single block read because the rows are stored together. You can use hash cluster tables (such as the sorted hash clusters in Oracle 10g), single table clusters, or manual row re-sequencing (CTAS with ORDER BY) to achieve this goal.
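A minimal sketch of manual row re-sequencing with CTAS follows. The target name customer_sorted is hypothetical, and ordering by customer_state is an assumption chosen to match the column of the index whose range scans you want to speed up.

```sql
-- Copy the rows into a new table in the order of the indexed column.
-- customer_sorted is a hypothetical name used for illustration.
create table customer_sorted
as
select
   *
from
   customer
order by
   customer_state;
```

After the copy, the new table would still need to replace the original (for example by renaming), and its indexes, constraints, and grants would have to be re-created and statistics re-gathered.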
Filtering result set using a column value:
Consider the following query, which filters the result set using a column value:
select
   customer_name
from
   customer
where
   customer_state = 'New Mexico';
Index or a Full-Table Scan:
The decision to use an index or a full-table scan is at least partially determined by the percentage of customers in New Mexico. An index scan is faster for this query if the percentage of customers in New Mexico is small and the values are clustered on the data blocks.
The CBO may choose to perform a full-table scan even when only a small number of rows are retrieved, because it also considers the clustering of column values within the table.
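One way to see which access path the CBO actually picks for the query above is to ask for its execution plan. This sketch assumes the customer table exists in your schema and that the dbms_xplan package is available (Oracle9i Release 2 and later).

```sql
-- Ask the optimizer for its plan without running the query.
explain plan for
select
   customer_name
from
   customer
where
   customer_state = 'New Mexico';

-- Display the plan that was just generated.
select * from table(dbms_xplan.display);
```

The Operation column of the output shows whether the plan uses an index range scan or a full table access.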
CBO Decision:
Four factors work together to help the CBO decide whether to use an index or a full-table scan:
- Selectivity of a column value
- db_block_size
- avg_row_len
- Cardinality
An index scan is usually faster if a data column has high selectivity and a low clustering_factor.
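The inputs listed above can be inspected in the data dictionary. The following is a sketch only: SCOTT, CUSTOMER, and CUSTOMER_STATE are placeholder names, and the rows-per-value figure assumes values are evenly distributed (no histogram), which may not hold for your data.

```sql
-- Row count, average row length, and distinct values for one column.
-- 'SCOTT', 'CUSTOMER', and 'CUSTOMER_STATE' are placeholders.
select
   t.num_rows,
   t.avg_row_len,
   c.num_distinct,
   round(t.num_rows / nullif(c.num_distinct, 0)) as est_rows_per_value
from
   dba_tables      t,
   dba_tab_columns c
where
   c.owner = t.owner
and
   c.table_name = t.table_name
and
   t.owner = 'SCOTT'
and
   t.table_name = 'CUSTOMER'
and
   c.column_name = 'CUSTOMER_STATE';

-- The database block size, in bytes.
select
   value
from
   v$parameter
where
   name = 'db_block_size';
```

A small est_rows_per_value relative to num_rows means the column is highly selective.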
Reducing overhead by index range scan:
Many Oracle databases use the same index for the vast majority of queries. If these queries always do an index range scan (e.g., select all orders for a customer), then row re-sequencing can greatly reduce Oracle overhead.
Oracle storage mechanisms:
Oracle provides several storage mechanisms to fetch a customer row and all related orders.
- In Oracle 10g, sorted hash clusters are a great way to sequence rows for very fast SQL.
- Multi-table hash cluster tables cluster the customer rows with the order rows, often on a single data block.
- The dbms_redefinition utility can be used to periodically re-sequence rows into index order.
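As an illustration of the multi-table hash cluster option, the sketch below stores customer and order rows that share a customer_id in the same cluster. All object names, column definitions, and the SIZE and HASHKEYS values are assumptions made for the example; real values depend on your row sizes and the number of distinct customers.

```sql
-- A hash cluster keyed on customer_id (SIZE and HASHKEYS are illustrative).
create cluster cust_order_cluster (customer_id number)
   size 2048
   hashkeys 10000;

-- Both tables are stored in the cluster, keyed on customer_id.
create table customer (
   customer_id     number primary key,
   customer_name   varchar2(100),
   customer_state  varchar2(30)
)
cluster cust_order_cluster (customer_id);

create table orders (
   order_id     number primary key,
   customer_id  number,
   order_date   date
)
cluster cust_order_cluster (customer_id);
```

With this layout, a lookup by customer_id hashes directly to the block that holds the customer row and its orders, provided they fit within the space reserved by SIZE.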
Re-sequencing Table Rows:
To maintain row order, the DBA will periodically re-sequence table rows in cases where a majority of the SQL references a column with a high clustering_factor, a large db_block_size, and a small avg_row_len. This removes the full-table scan, places adjacent rows in the same data block, and can make the query up to thirty times faster.
Additional I/O for index range scans:
As the clustering_factor nears the number of rows in the table (num_rows), the rows fall out of sync with the index. A clustering_factor this high indicates that the rows are out of sequence with the index and that additional I/O may be required for index range scans.
Columns with high selectivity:
Even when a column has high selectivity, a high clustering_factor combined with a small avg_row_len indicates that column values are randomly distributed in the table and that additional I/O will be required to obtain the rows. In that case an index range scan would cause a large amount of unnecessary I/O, making a full-table scan more efficient.
Conclusion:
The CBO's decision to perform a full-table scan vs. an index range scan is influenced by the clustering_factor, db_block_size, and avg_row_len. It is important to understand how the CBO uses these statistics to determine the fastest way to deliver the desired rows.