Summary
Is index columns order important? Use the following example. Setup 2 identical tables with 1.000.000 rows with 3 columns,
with the same index, but with different columns order.
CREATE TABLE HELPDESK.EXAMPLE
(ID NUMBER(9),
GROUP_NAME VARCHAR2(30),
ACCOUNT_NAME VARCHAR2(30))
TABLESPACE HELPDESK;
CREATE SEQUENCE HELPDESK.EXAMPLE_SEQ
START WITH 0
INCREMENT BY 1
MINVALUE 0
NOCACHE
NOCYCLE
NOORDER;
Now run the pl/sql code to fill the table
BEGIN
FOR i IN 1..10 LOOP
FOR k IN 1..100000 LOOP
INSERT INTO HELPDESK.EXAMPLE VALUES (helpdesk.example_seq.NEXTVAL, 'GROUP' || i, 'ACCOUNT' || k);
END LOOP;
COMMIT;
END LOOP;
END;
/
It took
16min in a 16CPUs HP-UX PA-RISC Superdome.
Now query to see the results
helpdesk@OTE> SELECT GROUP_NAME, COUNT(*)
2 FROM HELPDESK.EXAMPLE
3 GROUP BY GROUP_NAME;
GROUP_NAME COUNT(*)
------------------------------ ----------
GROUP1 100000
GROUP10 100000
GROUP2 100000
GROUP3 100000
GROUP4 100000
GROUP5 100000
GROUP6 100000
GROUP7 100000
GROUP8 100000
GROUP9 100000
10 rows selected.
Elapsed: 00:00:03.84
Create the second identical table HELPDESK.EXAMPLE1
CREATE TABLE HELPDESK.EXAMPLE1 TABLESPACE HELPDESK
AS SELECT * FROM HELPDESK.EXAMPLE;
Create the two indexes with different column order and calculate the statistics.
CREATE INDEX HELPDESK.EXAMPLE_N1 ON
HELPDESK.EXAMPLE (GROUP_NAME, ACCOUNT_NAME)
TABLESPACE HELPDESK;
CREATE INDEX HELPDESK.EXAMPLE_N2 ON
HELPDESK.EXAMPLE1 (ACCOUNT_NAME, GROUP_NAME)
TABLESPACE HELPDESK;
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname=>'HELPDESK', tabname=>'EXAMPLE',
estimate_percent=>100, CASCADE=>TRUE, DEGREE=>8);
END;
/
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname=>'HELPDESK', tabname=>'EXAMPLE1',
estimate_percent=>100, CASCADE=>TRUE, DEGREE=>8);
END;
/
After the statistics you can see the CLUSTERING_FACTOR of the indexes!
SELECT INDEX_NAME, BLEVEL, DISTINCT_KEYS, CLUSTERING_FACTOR
FROM DBA_INDEXES WHERE INDEX_NAME LIKE 'EXAMPLE_N%';
INDEX_NAME BLEVEL DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ------------- -----------------
EXAMPLE_N1 2 1000000 193240
EXAMPLE_N2 2 1000000 1000000
Now lets examine the behavior of the two indexes with two sql queries.
1. First query search for GROUP_NAME and ACCOUNT_NAME.
SELECT * FROM HELPDESK.EXAMPLE
2 WHERE GROUP_NAME = 'GROUP5'
3 AND ACCOUNT_NAME = 'ACCOUNT4567';
ID GROUP_NAME ACCOUNT_NAME
---------- ------------------------------ ------------------------------
514566 GROUP5 ACCOUNT4567
Elapsed: 00:00:23.65
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=25)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EXAMPLE' (Cost=5 Card=1
Bytes=25)
2 1 INDEX (RANGE SCAN) OF 'EXAMPLE_N1' (NON-UNIQUE) (Cost=4
Card=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
295 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
> SELECT * FROM HELPDESK.EXAMPLE1
2 WHERE GROUP_NAME = 'GROUP5'
3 AND ACCOUNT_NAME = 'ACCOUNT4567';
ID GROUP_NAME ACCOUNT_NAME
---------- ------------------------------ ------------------------------
514566 GROUP5 ACCOUNT4567
Elapsed: 00:00:23.71
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=25)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EXAMPLE1' (Cost=6 Card=1
Bytes=25)
2 1 INDEX (RANGE SCAN) OF 'EXAMPLE_N2' (NON-UNIQUE) (Cost=4
Card=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
295 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see when searching with
all index columns the Cost is the same. Just 5 consistent gets.
2. Now query search for ACCOUNT_NAME only.
SELECT * FROM HELPDESK.EXAMPLE
2 WHERE ACCOUNT_NAME = 'ACCOUNT4567';
ID GROUP_NAME ACCOUNT_NAME
---------- ------------------------------ ------------------------------
114566 GROUP1 ACCOUNT4567
1014566 GROUP10 ACCOUNT4567
214566 GROUP2 ACCOUNT4567
314566 GROUP3 ACCOUNT4567
414566 GROUP4 ACCOUNT4567
514566 GROUP5 ACCOUNT4567
614566 GROUP6 ACCOUNT4567
714566 GROUP7 ACCOUNT4567
814566 GROUP8 ACCOUNT4567
914566 GROUP9 ACCOUNT4567
10 rows selected.
Elapsed: 00:00:23.76
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=15 Card=10 Bytes=250
)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EXAMPLE' (Cost=15 Card=1
0 Bytes=250)
2 1 INDEX (SKIP SCAN) OF 'EXAMPLE_N1' (NON-UNIQUE) (Cost=13
Card=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
50 consistent gets
0 physical reads
0 redo size
459 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SELECT * FROM HELPDESK.EXAMPLE1
2 WHERE ACCOUNT_NAME = 'ACCOUNT4567';
ID GROUP_NAME ACCOUNT_NAME
---------- ------------------------------ ----------------------------
114566 GROUP1 ACCOUNT4567
1014566 GROUP10 ACCOUNT4567
214566 GROUP2 ACCOUNT4567
314566 GROUP3 ACCOUNT4567
414566 GROUP4 ACCOUNT4567
514566 GROUP5 ACCOUNT4567
614566 GROUP6 ACCOUNT4567
714566 GROUP7 ACCOUNT4567
814566 GROUP8 ACCOUNT4567
914566 GROUP9 ACCOUNT4567
10 rows selected.
Elapsed: 00:00:25.76
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=10 Bytes=250
)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EXAMPLE1' (Cost=14 Card=
10 Bytes=250)
2 1 INDEX (RANGE SCAN) OF 'EXAMPLE_N2' (NON-UNIQUE) (Cost=4
Card=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
459 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
The second index EXAMPLE_N2 has better cost than EXAMPLE_N1, only 14 consistent gets against 50.
So the conclusion is that when you create
concatenated indexes it would be better if you put the
columns with more distinct values first
Tip:For more information about
Predicate Selectivity you can see
Metalink Note: 68992.1
Other links
Change CBO Stats Selectivity for given Index and Column