Display_Cursor

September 20th, 2011

Oracle 10 added the awesome procedure dbms_xplan.display_cursor but unfortunately the documentation of the package is a bit lacking and the options and output can be confusing, so here are few clarifications.

The procedure display_cursor gives the *real* execution plan instead of an estimated execution plan which is a huge relief after “explain plan”, “set autotrace on” and  “dbms_xplan.display” which all output the expected execution plan. Of the the three, “set autotrace on” is the most peculiar as we can actually run the query and yet the output is the expect plan not the actually executed plan. Pheww – what a relief with dbms_xplan.display_cursor.

But, hold on to the seat of your pants, because that’s not all folks. The coolest thing is that display_cursor will output the actual execution statistics for each row source line in the real execution plan. Now that’s cool. But to access these statistics, one has to enable the collection of the statistics.

To use, run a query in SQL*Plus and include the hint

 /*+ gather_plan_statistics */

then,  immediately following the query execution run:

 select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS’));

Then you will see the following columns in the output (minus the color coding)

A few confusing things about the output. Some of the columns are estimated statistics from the optimizer. Some of the columns are actual statistics from executing the query. And some of the columns are not per row statistics but statistics that include the statistics of the child rows.

The columns E-Rows and A-Rows can be used for Tuning by Cardinality Feedback (TCF). TCF refers to the process of comparing “E-Row” and “A-Rows” to determine whether the optimizer is optimizing correctly. The two columns “E-Rows” and “A-Rows” which represent estimated rows and actual rows can be compared to highlight how close or how far off the optimizers predictions where. If the optimizers predictions are far of, it’s an indication, though not a proof, that the query has been inadequately optimized. To get the optimizer to  create an optimal execution path, one can look for ways of bringing the optimizers estimated statistics in line with the actual statistics. See Tuning by Cardinality feedback by Wolfgang Breitling.

In V$sql_plan_statistics  these “should” be equal

  • cardinality ~= output_rows/starts
  • starts*cardinality ~= output_rows

And in the output above

  • E-Rows=cardinality
  • A-Rows=output_rows

Thus in order to compare E-Rows to A-Rows, we have to multiply E-Rows by starts.

Other display_cursor 3rd argument options

There are other options besides specifying “ALLSTATS” in the display_cursor. Here are a list of the options and the fields they show:

By default statistics are total for all executions of the cursor but “LAST” can be appended to the 3rd argument to get the stats for the last execution. NOTE:  the last four columns on memory and temp space always show total or last as indicated above. Here is an example of using “LAST”:

select * from table ( dbms_xplan.display_cursor (null,null,’MEMSTATS LAST‘));

Notice that for memory consumption, some of the stats are bytes and some are kilo-bytes.

The above tables are images. The following table is in text for searching and copy/paste.

Arguments giving only optimizer estimates

Rows Bytes TempSpc  Cost Time
BASIC
 null    *    *  *
TYPICAL  *  *  *  *  *
SERIAL  *  *    *  *
ALL  *  *   *  *
ADVANCED  *  *  *  *

 Arguments that give actual row source executions statistics (the red asterisks are estimates)

(the following two tables show the same arguments, but are broken in two to fit the width)

E-Rows starts A-Rows Buffers Reads Writes A-Time
MEMSTATS  *  *  *  *
ALLSTATS  *  *  *  *  *  *  *
RUNSTATS_LAST  *  LAST  LAST  LAST  LAST  LAST  LAST
RUNSTATS_TOT  *   TOT  TOT  TOT  TOT  TOT  TOT
IOSTATS  *  *  *  *  *  *   *

 

0Mem 1MEM 0/1/M Used-Mem Used-Tmp Max-Tmp
MEMSTATS  *   TOT LAST LAST TOT
ALLSTATS  *  * TOT LAST LAST TOT
RUNSTATS_LAST
RUNSTATS_TOT
IOSTATS

 Arguments for special cases

TQ IN-OUT PQ Distrib pstart pstop Instance
PARTITION X X X
PARALLEL X X
REMOTE X

List of arguments for non-statistical output

predicate info note Query Block Name Column projection info outline bind vars
BASIC
null X X
TYPICAL X X
SERIAL X X
ALL X X X X
ADVANCED X X X X X X
PREDICATES X
NOTE X
ALIAS X
PROJECTION X
OUTLINES X
PEEK_BINDS X

 

Enabling extended rows source execution statistics gathering

There are three ways to gather row source executions stats:

  1.  /*+ gather_plan_statistics */
  2. sql_trace=true;
  3. Statistics_level=all

The first two are cheaper (sets _rowsource_statistics_sampfreq =128 ) but less accurate than the last option which though more accurate can consume signficantly more CPU (_rowsource_statistics_sampfreq =1).

 Manually querying extended row source execution statistics

The data retrieved with display_cursor can be queried directly from

  • v$sql_plan_statistics
  • v$sql_plan_statistics_all

 

desc v$sql_plan_statistics
ADDRESS
HASH_VALUE
SQL_ID
PLAN_HASH_VALUE
CHILD_ADDRESS
CHILD_NUMBER
OPERATION_ID
EXECUTIONS
LAST_STARTS          STARTS
LAST_OUTPUT_ROWS     OUTPUT_ROWS
LAST_CR_BUFFER_GETS  CR_BUFFER_GETS
LAST_CU_BUFFER_GETS  CU_BUFFER_GETS
LAST_DISK_READS      DISK_READS
LAST_DISK_WRITES     DISK_WRITES
LAST_ELAPSED_TIME    ELAPSED_TIME

and

desc v$sql_plan_statistics_all
 ADDRESS
 HASH_VALUE
 SQL_ID      PLAN_HASH_VALUE CHILD_ADDRESS CHILD_NUMBER
 TIMESTAMP
 OPERATION
 OPTIONS
 OBJECT_NODE  OBJECT# OBJECT_OWNER OBJECT_NAME OBJECT_ALIAS OBJECT_TYPE
 OPTIMIZER
 ID PARENT_ID DEPTH POSITION
 SEARCH_COLUMNS
 COST  CARDINALITY  BYTES
 OTHER_TAG
 PARTITION_START PARTITION_STOP PARTITION_ID
 OTHER
 DISTRIBUTION
 CPU_COST  IO_COST  TEMP_SPACE
 ACCESS_PREDICATES
 FILTER_PREDICATES
 PROJECTION
 TIME (seconds estimated by optimizer)
 QBLOCK_NAME
 REMARKS
 OTHER_XML (bind vars and other info)
 EXECUTIONS
 LAST_STARTS           STARTS
 LAST_OUTPUT_ROWS      OUTPUT_ROWS
 LAST_CR_BUFFER_GETS   CR_BUFFER_GETS
 LAST_CU_BUFFER_GETS   CU_BUFFER_GETS
 LAST_DISK_READS       DISK_READS
 LAST_DISK_WRITES      DISK_WRITES
 LAST_ELAPSED_TIME     ELAPSED_TIME (microseconds)
 POLICY
 ESTIMATED_OPTIMAL_SIZE (bytes)
 ESTIMATED_ONEPASS_SIZE (bytes)
 LAST_MEMORY_USED (bytes)
 LAST_EXECUTION (LAST whether work area was optimal, one pass or multi)
 LAST_DEGREE
 TOTAL_EXECUTIONS (number of times work area was active)
 OPTIMAL_EXECUTIONS
 ONEPASS_EXECUTIONS
 MULTIPASSES_EXECUTIONS
 ACTIVE_TIME (centi-seconds, time work area is active)
 MAX_TEMPSEG_SIZE (kb -  mis-documented as bytes, though correctly bytes in v$sql_workarea )
 LAST_TEMPSEG_SIZE (kb - "" , see  http://www.antognini.ch/2009/05/wrong-information-about-temporary-space-usage/ ) )

so you can write a query to get the data directly such as

col operation for a45
SELECT
      LPAD(' ',depth)||P.OPERATION||'_'||P.OPTIONS||' '||P.OBJECT_NAME  operation
    , last_starts * cardinality e_rows_x_starts
    , last_output_rows  a_rows
    , LAST_CR_BUFFER_GETS bgets
    , LAST_DISK_READS     pread
    , LAST_DISK_WRITES    pwrites
    , LAST_ELAPSED_TIME   elapsed
    , LAST_MEMORY_USED
    , LAST_TEMPSEG_SIZE
    , LAST_EXECUTION
  FROM
       V$SQL_PLAN_statistics_all P
  WHERE
        sql_id='&sql_id'
  order by child_number,id
/

which makes the comparison between estimate and actual easier because I can include starts*cardinality to get the e-rows_x_starts which I can compare directly to a-rows, ie output_rows.

Putting it all together

We can take the TCF ideas farther by putting them together in a query such that the output is easier to  read:

col cn format 99
col ratio format 99
col ratio1 format A6
--set pagesize 1000
set linesize 140
break on sql_id on cn
col lio_rw format 999
col "operation" format a60
col a_rows for 999,999,999
col e_rows for 999,999,999
col elapsed for 999,999,999

Def v_sql_id=&SQL_ID

select
       -- sql_id,
       --hv,
       childn                                         cn,
       --ptime, stime,
       case when stime - nvl(ptime ,0) > 0 then
          stime - nvl(ptime ,0)
        else 0 end as elapsed,
       nvl(trunc((lio-nvl(plio,0))/nullif(a_rows,0)),0) lio_ratio,
       --id,
       --parent_id,
       --starts,
       --nvl(ratio,0)                                    TCF_ratio,
       ' '||case when ratio > 0 then
                rpad('-',ratio,'-')
             else
               rpad('+',ratio*-1 ,'+')
       end as                                           TCF_GRAPH,
       starts*cardinality                              e_rows,
                                                       a_rows,
       --nvl(lio,0) lio, nvl(plio,0)                      parent_lio,
                                                         "operation"
from (
  SELECT
      stats.LAST_ELAPSED_TIME                             stime,
      p.elapsed                                  ptime,
      stats.sql_id                                        sql_id
    , stats.HASH_VALUE                                    hv
    , stats.CHILD_NUMBER                                  childn
    , to_char(stats.id,'990')
      ||decode(stats.access_predicates,null,null,'A')
      ||decode(stats.filter_predicates,null,null,'F')     id
    , stats.parent_id
    , stats.CARDINALITY                                    cardinality
    , LPAD(' ',depth)||stats.OPERATION||' '||
      stats.OPTIONS||' '||
      stats.OBJECT_NAME||
      DECODE(stats.PARTITION_START,NULL,' ',':')||
      TRANSLATE(stats.PARTITION_START,'(NRUMBE','(NR')||
      DECODE(stats.PARTITION_STOP,NULL,' ','-')||
      TRANSLATE(stats.PARTITION_STOP,'(NRUMBE','(NR')      "operation",
      stats.last_starts                                     starts,
      stats.last_output_rows                                a_rows,
      (stats.last_cu_buffer_gets+stats.last_cr_buffer_gets) lio,
      p.lio                                                 plio,
      trunc(log(10,nullif
         (stats.last_starts*stats.cardinality/
          nullif(stats.last_output_rows,0),0)))             ratio
  FROM
       v$sql_plan_statistics_all stats
       , (select sum(last_cu_buffer_gets + last_cr_buffer_gets) lio,
                 sum(LAST_ELAPSED_TIME) elapsed,
                 child_number,
                 parent_id,
                 sql_id
         from v$sql_plan_statistics_all
         group by child_number,sql_id, parent_id) p
  WHERE
    stats.sql_id='&v_sql_id'  and
    p.sql_id(+) = stats.sql_id and
    p.child_number(+) = stats.child_number and
    p.parent_id(+)=stats.id
)
order by sql_id, childn , id
/

which gives output like

    Enter value for sql_id: g2w9n4gksyys6
    old  59:     stats.sql_id='&v_sql_id'  and
    new  59:     stats.sql_id='g2w9n4gksyys6'  and

     CN   ELAPSED    LIO_RATIO TCF_GRAPH   E_ROWS       A_ROWS operation
    --- ------------ --------- ------ ------------ ------------ ------------------------------------------------------------
      0            0         0                                1 SELECT STATEMENT
           5,720,456         0                   1            1  HASH GROUP BY
              29,711         0                            1,909   NESTED LOOPS
                   0         0  +++              1        1,909    NESTED LOOPS
           1,969,304         0  +++              1        1,909     NESTED LOOPS
                   0         0  +++              1        2,027      NESTED LOOPS
           7,939,649         0  +++              1        1,656       NESTED LOOPS
             716,054         0  +++              1        1,657        NESTED LOOPS
             270,201         0  ++              39       23,171         HASH JOIN
                  23         0                   5            1          JOIN FILTER CREATE :BF0000
                  31         1                   5            1           TABLE ACCESS BY INDEX ROWID PS_PAY_CALENDAR
                  14         2                   5            1            INDEX RANGE SCAN PS0PAY_CALENDAR
             141,467         0              18,503       23,171          VIEW  VW_SQ_1
           3,032,120         0              18,503       23,171           HASH GROUP BY
             152,564         0             163,420       33,020            JOIN FILTER USE :BF0000
             407,746         0             163,420       33,020             MERGE JOIN
                  55         0                   5            1              SORT JOIN
                  12         2                   5            1               INDEX RANGE SCAN PS0PAY_CALENDAR
              79,435         0              40,000       33,020              SORT JOIN
             119,852         0              40,000       40,000               INDEX FAST FULL SCAN WB_JOB
           2,959,031        13  -           23,171        1,657         TABLE ACCESS BY INDEX ROWID WB_JOB
             944,887         1              23,171       23,174          INDEX RANGE SCAN WB_JOB
             102,650         0               1,657        1,656        VIEW PUSHED PREDICATE  VW_SQ_2
              73,769         0               1,657        1,657         SORT AGGREGATE
              25,617         0               1,657        1,657          FIRST ROW
             225,497         1               1,657        1,657           INDEX RANGE SCAN (MIN/MAX) WB_JOB
             357,872         0               3,312        2,027       TABLE ACCESS BY INDEX ROWID WB_RETROPAY_EARNS
           3,655,774         1               3,312        2,027        INDEX RANGE SCAN WB_RETROPAY_EARNS_IDX1
             199,884         0               2,027        1,909      TABLE ACCESS BY INDEX ROWID PS_RETROPAY_RQST
             317,793         1               2,027        1,909       INDEX RANGE SCAN PS_RETROPAY_RQST
              71,534         0               1,909        1,909     INDEX RANGE SCAN PS#RETROPAYPGM_TBL
              18,396         0               1,909        1,909    TABLE ACCESS BY INDEX ROWID PS_RETROPAYPGM_TBL

The 3 important parts of this query are

  • Elapsed is per row source, not cumulative of it’s children
  • LIO_RATIO
  • TCP_GRAPH

Elapsed time format has a huge drawback in the display_cursor output as each lines elapsed time includes the elapsed time of all the children which makes an execution plan difficult to scan and see where the time is being spent. In the above output the elapsed time represents the elapsed time of each row source line.

LIO_RATIO shows the number of buffers accessed per row returned. Ideally 1 buffer or less is accessed per row returned. When the number of buffers per row becomes large, it’s a good indication that there is a more optimal method to get the rows.  The I/O stats include the stats of the child row source, so the query has to get the I/O from the childern and subtract from the parent, making the query a bit more complex.

TCP_GRAPH graphically shows the ratio of estimated rows to actual rows. The estimated rows used is cardinality* starts, not just cardinality. This value can be compared directly to actual_rows and the difference in order of magnitude is shown. Each ‘+’ represents and order of magnitude larger and each “-” represents an order of magnitude smaller. The more orders of magnitude, either way, the more the optimizers calculations are off and thus like more pointing to a possible plan that is suboptimal.

In the above output there   are 5 lines where the optimizer only expect 1 row and the actual results were over 1000, ie 3 orders of magnitude difference. These are the three lines with “+++”
There is one line with “-” where actual was an order of magnitude smaller. On that same line we see it’s one of the slower lines almost 3 seconds and that the were 13 lio’s per row returned, which is sign of inefficiency.


Uncategorized

  1. Trackbacks

  1. Comments

  2. Jean-Christophe DAUCHY
    September 27th, 2011 at 10:41 | #1

    Hello Kyle,

    Really nice article. It was hard for me to get what is estimated and real.

    When doing parallel processing, How are evaluated the memory and Temp usage figures, are they Max(value per slave) or Sum(value per slave) ?

    I am working on a Parallel query doing a lot of sorting and I try to figure out what would be the optimal PGA settings to improve the sorts.

    Thanks again for this nice colorfull article :)

  3. September 27th, 2011 at 23:35 | #2

    Hi Jean-Christophe,
    thanks for stopping by.
    Good questions on the parallel query statistics. Unfortunately I haven’t spent any time looking at extended row source statistics with parallel query. People that come to mind that might know would be Greg Rahn or Christian Antognini. If you find the info, let me know so I can link it into this post!
    – Kyle

  4. Jean-Christophe DAUCHY
    September 29th, 2011 at 07:41 | #3

    Thanks Kyle, I’ll send them an Email to see if they have investigated those figures in the parallel scenario.. And i’ll post back some information if I can understand them :).

    Christophe.

  5. Kumar Ramalingam
    December 8th, 2011 at 16:14 | #4

    Hello Kyle,
    Really interesting. I have been using this TCF technique for a while for most of my issues. Well, there are a handful that slip through the cracks. One of them is a function call in a select statement that delays returning the result set when compared to the same statement without the function call. I compared both with allstats and the one without function call tells me accurately the time taken to return the results back to the terminal whereas the other one, spends more time, but not recorded in the execution plan. It may be a “feature” as per oracle, just curious if we can nail that timing down someway. Appreciate your response on this.
    Thanks
    Kumar Ramalingam

  6. Bertrand
    December 1st, 2012 at 07:34 | #5

    Good Post ! In case you don’t have access to the sql code, you can add the hint “gather_plan_statistics” thank’s to Kerry Osborne’s script : http://kerryosborne.oracle-guy.com/2010/02/gather_plan_statistics/

  7. kais
    March 31st, 2013 at 07:08 | #6

    Nice article thanks:)

You must be logged in to post a comment.