Oracle CPU time

July 21st, 2011

There are 3 kinds of CPU in the Oracle stats.

  1. Oracle CPU used
  2. System CPU used
  3. Oracle demand for CPU

Starting in 10g Oracle records both the CPU used by the instance as well as the load on the system in v$sysmetric. This is awesome as we can see how busy the system is and how much of the CPU Oracle is responsible for:

col metric_name for a25
col metric_unit for a25
select metric_name, value, metric_unit from v$sysmetric where metric_name like'%CPU%' where group_id=2;
METRIC_NAME                         VALUE METRIC_UNIT
------------------------------ ---------- ------------------------------
CPU Usage Per Sec              251.067016 CentiSeconds Per Second
CPU Usage Per Txn              5025.52477 CentiSeconds Per Txn
Host CPU Utilization (%)       11.6985845 % Busy/(Idle+Busy)
Database CPU Time Ratio        76.3291033 % Cpu/DB_Time

Now the question is how do we convert these to something useful? For me I put it into the equivalent of AAS and compare it to the core count:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                        CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)    sAAS
            from
              ( select value busy from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                 and group_id=2 ) prcnt,
             ( select value cpu_count
                 from v$parameter
                where name='cpu_count' )  parameter;

CLASS                  AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_OS                 .022

An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joing in ASH:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                         CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)     AAS
            from
              ( select value busy from v$sysmetric
                 where metric_name='Host CPU Utilization (%)'
                   and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter
                 where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .001
CPU_ORA_DEMAND          .02
CPU_OS                 .019

So the demand for CPU was higher than the amount consumed. Now the demand for CPU is coming from ASH which is sampled so the accuracy is weak, but in larger sample sets or busier systems it’s pretty darn good. The demand alert us to CPU starvation on a busy  system.

I like to wrap all this up into a query with all the wait classes to see the overall load on Oracle including CPU consumed by Oracle, CPU demanded by Oracle and CPU used at the OS level:

select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_ORA_DEMAND          .03
CPU_OS                 .023
Commit                    0
User I/O                  0
Wait                      0

Ideally I’d want the CPU stats to be subsets of each other so that I could have a graphically stack-able set of statistics

now rolling it all together
with AASSTAT as (
           select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60))
)
select
       ( decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) +
       CPU_ORA_CONSUMED +
        decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) CPU_TOTAL,
       decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS,
       CPU_ORA_CONSUMED CPU_ORA,
       decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT,
       COMMIT,
       READIO,
       WAIT
from (
select
       sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED,
       sum(decode(CLASS,'CPU_ORA_DEMAND'  ,AAS,0)) CPU_ORA_DEMAND,
       sum(decode(CLASS,'CPU_OS'          ,AAS,0)) CPU_OS,
       sum(decode(CLASS,'Commit'          ,AAS,0)) COMMIT,
       sum(decode(CLASS,'User I/O'        ,AAS,0)) READIO,
       sum(decode(CLASS,'Wait'            ,AAS,0)) WAIT
from AASSTAT)
/

    CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ------------ ---------- ---------- ----------
       .02       .002            0          0          0          0

Now let’s run up some load on a machine and database.
Take two databases, run up the CPU demand on both and add some wait contention. The machine has 24 cores so there is a definitely a problem when the CPU_TOTAL goes over 24. I’m running 14 sessions each trying to burn a core on two different databases. The first few lines the test is ramping up

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    14.887       .387     13.753         .747          0          0       .023

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    21.989      7.469     12.909        1.611          0          0       .044

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    26.595     12.125     11.841        2.629          0          0       .025

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    27.045     12.125     11.841        3.079          0          0       .025

Historically CPU used by Oracle was derived from

v$sysstat.name=’CPU used by this session’

but this statistic had problems as the value was only updated every time a call ended. A call could be a 1 hour PL/SQL procedure which would thus report zero cpu usage in the stats until it finished and the CPU would spike off the scale.

ASH had always been the most stable way to gather CPU demand, though Oracle has made improvements in gathering CPU statistics. I believe that the time model gathers CPU every 5 seconds in 10g, and in 11g it’s possible that CPU stats are gathered every second

Here is a visual example of a machine that has server memory contention, massive amounts of paging. There is OS CPU being used, but hardly any CPU being used by Oracle which makes sense as it’s an idle database, but what is revealing is the massive amount of CPU wait by Oracle. Oracle only has a little bit of work to do to take care of an idle database but we can see that most of Oracle’s CPU time is wait for CPU time as when it wants to work, pages have to be read back in,

I have my doubts as to the clarity of the layout of the above graph. A possibly clearly graph would be simpling adding a line representing available CPU and take out the OSCPU bars. In the above graph I’ve charted OSCPU usage as AAS, ie average active sessions, mixing AAS of the database with AAS at the OS level. I think a  possible clear representation would be to show the Core count line, and draw the OSCPU usage shown upside down from the # of core lines, thus the space from the bottom axis to where the OSCPU reaches down would be available CPU.


Uncategorized

  1. Trackbacks

  1. Comments

  2. chris stephens
    July 26th, 2011 at 20:28 | #1

    these queries are very useful. however, the results from my system don’t really make sense.

    CLASS AAS
    —————- ———-
    CPU_ORA_CONSUMED .661
    CPU_ORA_DEMAND 1.75
    CPU_OS 1.67
    Commit .004
    User I/O .007
    Wait .013

    Any idea how Oracle could report it’s CPU usage as higher than the server it is running on?

    11.2.0.2 on RHEL 5 on VMWare.

  3. July 26th, 2011 at 21:33 | #2

    Hey Chris,

    Those stats are quite feasible.
    Oracle’s CPU demand was 1.75 of which only .661 was consumed, so there was 1.089 was waiting.
    The OS CPU was 1.67 of which Oracle consumed .661.
    Now, why the OS didn’t give Oracle more in that period is a good question – were there any memory issues on the system at that point in time?
    I like the last version of the script that lays out the components better, ie OS CPU used, Oracle CPU used and Oracle CPU wait.

    - Kyle

  4. September 8th, 2011 at 17:31 | #3

    I found this to be VERY useful. Thanks!

  5. goran
    February 16th, 2012 at 12:09 | #4

    very good … thanks!

  6. Simon
    February 22nd, 2012 at 12:09 | #5

    Hi Kyle,

    I’m interested in the group_id… I’ve looked in the docs to see why there are two (in my case) and the docs just say group_id = metric_id and metric_id has a description of metric_id… I may be missing something really simple here, but what’s the difference between the 2. In my DB I see the following:

    SQL> /

    GROUP_ID METRIC_ID METRIC_NAME VALUE METRIC_UNIT
    ———- ———- ————————- ———- ————————-
    2 2057 Host CPU Utilization (%) 58.9297201 % Busy/(Idle+Busy)
    2 2075 CPU Usage Per Sec 877.185667 CentiSeconds Per Second
    2 2076 CPU Usage Per Txn 77.3670053 CentiSeconds Per Txn
    2 2108 Database CPU Time Ratio 48.6910089 % Cpu/DB_Time
    3 2057 Host CPU Utilization (%) 70.0992814 % Busy/(Idle+Busy)
    3 2108 Database CPU Time Ratio 50.39545 % Cpu/DB_Time

    6 rows selected.

    SQL> l
    1* select group_id, metric_id, metric_name, value, metric_unit from v$sysmetric where metric_name like’%CPU%’
    SQL>

    So, clearly 2 different values, but what is the difference? Is it a different sampling frequency, I did wonder if that was the answer but then I noticed the 2 groups have a different number of metrics:

    SQL> select GROUP_ID, count(METRIC_ID) from v$sysmetric group by GROUP_ID;

    GROUP_ID COUNT(METRIC_ID)
    ———- —————-
    2 148
    3 42

    I’d appreciate it if you had the time to explain what the difference is?

    Thanks,

    Simon

  7. March 15th, 2012 at 05:37 | #6

    The two group_id values are the time periods.

    group_id 2 is 60 second deltas
    group_id 3 is 15 second deltas

    for v$sysmetric this represents the last 15 or 60 second delta.
    for v$sysmetric_history it will be an hour of 60 second deltas and 3 minutes of 15 second deltas.
    you can see this by looking at INTSIZE_CSEC in v$systemetric_history:

    select distinct INTSIZE_CSEC, GROUP_ID from v$sysmetric_history;

    INTSIZE_CSEC GROUP_ID
    ———— ———-
    1502 3
    5908 2
    6010 2
    5909 2
    1503 3
    1402 3
    6009 2
    6008 2
    There is a little bit of variability but basically GROUP_ID 2 is 60 seconds and GROUP_ID 3 is 15.

    select group_id, (max(END_TIME) – min(BEGIN_TIME))*60*24
    from v$sysmetric_history
    group by group_id

    GROUP_ID (MAX(END_TIME)-MIN(BEGIN_TIME))*60*24
    ———- ————————————-
    2 61.0166667
    3 3.26666667

    Oracle left much to be desired in explaining and documenting this info, most likely because it was designed for consumption by AWR, Alerts and OEM and not direct access.

  8. March 27th, 2012 at 09:00 | #7

    Hi Kyle,

    your article has been of great help, thanks a lot!

    Best regards,
    Nikolay

  9. Caesar Dutta
    April 18th, 2012 at 10:19 | #8

    Dear Sir:

    My query is very simple. I want to know that what percentage of CPU is utilized
    by an Oracle instance (serving queries, waiting to server queries, PMON, RMAN etc etc).

    I want to take the result every 1 hour and plot on a graph.

    Is the column CPU_ORA depicts %age used by total Oracle database system?

    Thanks in advance, Caesae

  10. Caesar Dutta
    April 23rd, 2012 at 06:21 | #9

    Dear Sir:

    My query is very simple. I want to know that what percentage of CPU is utilized
    by an Oracle instance (serving queries, waiting to server queries, PMON, RMAN etc etc).

    I want to take the result every 1 hour and plot on a graph.

    Is the column CPU_ORA depicts %age used by total Oracle database system?

    Thanks in advance, Caesar

  11. Caesar Dutta
    May 3rd, 2012 at 11:38 | #10

    I am running your query. The machine is 2 x Dual Core Intel Itanium, running on RHEL 4 for Itanium.

    The result I am getting is:
    CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT
    ———- ———- ———- ————
    211.454 0 211.454 0

    Please clear my doubts – why is Total CPU so high?

  12. May 7th, 2012 at 16:59 | #11

    Hi Caesar,
    Those numbers do look strange.
    Two things you could do:
    1. run the first query on : http://dboptimizer.com/2012/01/17/looking-at-awr-repositories-for-performance-issues/
    2. export your AWR and send it to me: http://gavinsoorma.com/2009/07/exporting-and-importing-awr-snapshot-data/
    AWR export might be a bit big, so limit it to a day where you see #’s like above. If it’s to big to email (compress it first), I’ll send you an ftp location
    - Kyle

  13. Caesar Dutta
    May 8th, 2012 at 12:21 | #12

    @Kyle Hailey

    I will definitely run the first query from http://dboptimizer.com/2012/01/17/looking-at-awr-repositories-for-performance-issues/ and will send it to you. Currently I am just getting “per second values” every hour using the following SQLs

    — Calculate today’s metrics
    SELECT MAX(NVL(ORA_CPU_USAGE_PER_SEC,0))
    INTO v_ora_cpu_util
    FROM (
    select round(value/100,3) ORA_CPU_USAGE_PER_SEC
    from v$sysmetric_history
    where metric_name=’CPU Usage Per Sec’
    and begin_time >= SYSDATE – INTERVAL ’1′ HOUR
    and end_time = SYSDATE – INTERVAL ’1′ HOUR
    and end_time sysdate – (60/(24*60*60));

  14. Caesar Dutta
    May 10th, 2012 at 06:22 | #13

    How will I send something to you?? The message getting truncated here. Please see my above message.

    Caesar

  15. Simon
    May 18th, 2012 at 16:25 | #14

    @Kyle Hailey
    Have only just seen the reply, thanks Kyle! Simon

  16. Sid
    July 9th, 2012 at 17:01 | #15

    Hi Kyle,

    Can the above queries be useful for gathering/raising alerts where active session usage – CPU crosses a certain percentage.

  17. Sid
    July 20th, 2012 at 04:33 | #16

    Hi Kyle,

    Using Oracle 10gR2. Want to set a proactive alert (through OEM db console, dbms_server_alerts) to trap the average active sessions/session usage > 70%.

  18. July 25th, 2012 at 20:29 | #17

    Hi Sid,
    By “session usage” do you mean connected sessions?
    First let’s start with alerts on Average Active Sessions (AAS). In OEM 10g there was no alert for AAS! Strange but true. There is a way to set alerts on AAS though. One way is to go into “Manage Metrics” link at the bottom of OEM home page or performance page for a database. Choose “edit thresholds”, then go to “Database Time (centiseconds per second)”. This statistics is 100*AAS, so if I want an alert on AAS=10, then I set an alert on “Database Time” to 1000.
    Now if you want to do calculations, such as AAS/(number of connected users) then you will have to create your own user defined metric. See the “User-Defined Metrics” link at the bottom of the database home page or performance page.
    I believe the alerts are better on 11g but I haven’t investigated them.

    - Kyle

  19. Sid
    August 1st, 2012 at 11:55 | #18

    Hi Kyle,
    Thanks for your reply.
    Yes connected sessions.
    Need to trap the Average Active Sessions Usage. Say over 50% usage it will generate a alert in OEM page alongwith a email notification. Now the usage will be a combination of CPU,User I/O, System I/O etc.
    What you mentioned above is Database Time for average active sessions. Or is it the same. ?
    If I want to do it through dbms_server_alerts or using a shell script is it possible.
    If yes then can provide a example on this.

  20. August 1st, 2012 at 17:01 | #19

    Hi Sid,

    Yes DB Time normalized by the elapsed time is Average Active Sessions when both the numerator and denominator have the same unit. The value “Database Time (centiseconds per second)” = (db time in centi-seconds)/ (elapsed seconds). Thus to convert it to AAS becomes
    ( (1 sec)/(100 centi-seconds) ) * ( db time in centi-seconds)/ seconds) = AAS.
    In 10g OEM to set an alert on (AAS/connected_sessions) you would have to set a custom alert. As far as setting custom alerts I don’t have any code examples. Let us know if you find any on other sites and we can reference that code here. It’s also possible that 11g and 12c OEM have options do alerts directly on AAS.

  21. Steve Jamro
    August 11th, 2012 at 18:30 | #20

    CPU wait only reported by Oracle as the difference between CPU cycles burned and time that Oracle process was runnable from the Oracle perspective.
    This information doesn’t come from any OS tools

    >>1-Is CPU wait always a wait on I/O?
    no, could simply be that no more CPU is available

    >>2-will Memory and Virtual memory have anything to do with CPU wait?
    Sure if a process is paged out, then it will probably have show CPU wait time. In this case it was waiting for page ins before it could use CPU

    >>3-if a system has lots of CPU wait will adding more CPU or cores fix this problem?
    Only if the issue is a CPU saturation issue and not something like a memory pressure issue

    >> 4-what is the best OS tool to report CPU wait other than VMSTAT?
    Only Oracle gives this info
    See http://dboptimizer.com/2011/07/21/oracle-cpu-time

    A request: ask these questions on the blog that way everyone can benefit

    Thanks

    Kyle

  22. Steve Jamro
    August 11th, 2012 at 18:32 | #21

    Kyle
    But CPU wait is reported by vmstat in WA column,and also Ganglia shows it,how can we get a breakdown of the causes of these waits if it is IO,network or memory.

  23. August 14th, 2012 at 18:29 | #22

    @Steve Jamro,

    AFAIK, the wait CPU in vmstat is I/O wait. Historically the wait column of vmstat has been mostly bunk, simply meaning someone is waiting for I/O. There is run queue depth, the first column of vmstat output, that gives some indication of wait for CPU. The run queue column varies from # of users queued for CPU to number of users queued on CPU plus # of users on CPU. Either way, it doesn’t report how much time is waited for CPU.

    THe first hit on google for “wait vmstat” turns up AIX vmstat documentation
    http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.prftungd%2Fdoc%2Fprftungd%2Fvmstat_command.htm
    saying “The reported CPU activity is a percentage breakdown of user mode, system mode, idle time, and waits for disk I/O.”
    “The wa column details the percentage of time the CPU was idle with pending local disk I/O”

  24. Stalin
    September 13th, 2012 at 23:05 | #23

    Hi Kyle, Nice post.

    Quick question with some numbers trying understand.

    When i run the below query, i see sometimes, the host_cpu_util% goes to 250. I’m getting the metric by instance as you see in the query, shouldn’t it be always within 100. My cpu_count is 8 per node.

    select instance_number node, trunc(begin_time, ‘HH24′) snapTime,
    round(sum(case metric_name when ‘Host CPU Utilization (%)’ then maxval end)) Host_CPU_util,
    round(sum(case metric_name when ‘CPU Usage Per Sec’ then maxval end)/100) DB_CPU_per_sec, — /100 for per sec conversion
    from dba_hist_sysmetric_summary
    where begin_time > trunc(sysdate)-1
    and begin_time < trunc(sysdate)
    group by instance_number, trunc(begin_time, 'HH24')
    order by instance_number, trunc(begin_time, 'HH24')

  25. Victor
    November 28th, 2012 at 16:26 | #24

    If you have get CPU information from DBA_HIST views what would be the query like?
    Could you help plz?

  26. Victor
    November 28th, 2012 at 16:27 | #25

    If you have get CPU information from DBA_HIST views what would be the query like?
    Could you help plz?

  27. rumata
    February 25th, 2013 at 17:50 | #26

    CPU_ORA_CONSUMED is time, CPU_ORA_DEMAND is count per second. How can we compare this numbers?

    Moreover CPU_ORA_DEMAND is not CPU demand. If we have 5 cores and 15 sessions(all 15 wants to use CPU) then CPU_ORA_DEMAND takes into account only 5 because 10 other is ‘waiting’ not ‘on cpu’.

  28. February 25th, 2013 at 20:24 | #27

    @rumata : yes CPU_ORA_CONSUMED is time and CPU_ORA_DEMAND is a count so a purist would say these aren’t comparable, but CPU_ORA_DEMAND is a proxy for time and approaches time demand the more data we have. CPU_ORA_DEMAND is more appropriately time under the curve where the curve is defined by session count of sessions in a CPU state. As we reduce the sampling time the curve becomes more defined and accurate as in taking the integral where the height is the session count and for our purposes the width is 1 second: http://hyperphysics.phy-astr.gsu.edu/hbase/integ.html so CPU demand becomes 1 second times sessions demanding CPU.
    .
    For the second point, CPU_ORA_DEMAND is the number or sessions who want to run on CPU.
    If you have 15 sessions who want CPU then their state in v$active_session_history will be “ON CPU”. If you have 15 session with state “ON CPU” and only 5 cores allocated to your machine (VM?) then only 5 of them can be actually on the CPU at any point in time.

  29. JB
    February 26th, 2013 at 18:17 | #28

    @Kyle Hailey
    Counting in-memory ASH samples and equating that to database time is not immediately and intuitively obvious to most people, but it works. This is what I call “ASH Math” and the basic statements are simple:

    COUNT(*) FROM V$ASH ~ DB Time (in seconds)
    COUNT(*) *10 FROM DBA_HIST_ASH ~ DB Time (in seconds)

    One semi-intuitive way to think of this is to analogize the ASH sampling mechanism to an old-style movie camera: every second a new “frame” is taken of all active sessions and what they are doing, e.g. session1 is waiting for IO, session2 is using CPU, session3 is waiting for an enqueue, etc. So if we were to play this movie back “as it happened” each “frame” would display for 1 second and then answering a question like “how much time did session3 spend waiting for enqueue” would be accomplished by adding up all the frames in which session3 is waiting for enqueue and counting each as 1 second of time. It’s not a perfect movie of what happened over the time period by any means, but it is our best unbiased estimate of what happened, since our movie-making mechanism is independent of the scenes it records.

    There is a mathematical explanation that should be more convincing but is best done with diagrams. In essence, it shows that adding up in-memory ASH samples over some continuous time period amounts to estimating the integral of the “active sessions function” which is to say estimating DB Time since the active sessions function over time is precisely the derivative of the DB Time function over time (rate of change of DB Time at time t0 = number of active sessions at time t0.)

  30. rumata
    February 26th, 2013 at 18:21 | #29

    My bad. This query seems correct.

    But it’s producing very strange results. At moment we have 4 sessions, 5 runnable processes, CPU load about 25%. Host has 8 cores.

    And sql results:
    CPU_ORA_CONSUMED 0,355
    CPU_ORA_DEMAND 0,83

    Demand is twice higher on absolutely free instance.
    Can anybody explain this results?

  31. JB
    February 26th, 2013 at 20:21 | #30

    @rumata
    You didn’t supply all the output nor state version of the DB, but there are potential “gotchas” with ASH-estimated CPU demand that could explain it.

    Basically, ASH samples “ON CPU” are not directly observed to be on the CPU but rather this state is derived basically as follows:

    1/ is the session in a database call?
    2.1/ is the session waiting on a non-idle wait event?
    2.2/ is the session waiting on an idle wait event?
    (note that 2.1 and 2.2 cannot both be true)

    If 1/ is TRUE and 2.1/ is TRUE then session is sampled into ASH as ‘WAITING’
    If 1/ is TRUE and 2.1/ is FALSE and 2.2/ is FALSE then session is sampled into ASH as ‘ON CPU’
    If 1/ is TRUE and 2.1/ is FALSE and 2.2/ is TRUE then session is not active and not sampled into ASH
    If 1/ is FALSE then session is not active and not sampled into ASH

    So being ‘ON CPU’ in ASH is derived and can inaccurate for a couple of known reasons: 1/ un-instrumented wait events, 2/ mis-classified idle events

    Both of these reasons are essentially bugs but have happened in past, so knowing the RDBMS version and what the workload is doing are useful to diagnosing if there is known issue on specific platform.

  32. Rumata
    February 27th, 2013 at 07:51 | #31

    We using 10.2.0.4.0 with OLTP system.

  33. JB
    February 27th, 2013 at 19:22 | #32

    @Rumata
    Rumata -
    Can you post the full outputs from running the script?

    My reply above is both overly complex and somewhat inaccurate, so I’ll try again.

    Simplification:
    The ASH sampler regards any sampled session that is in a database call and not waiting for a known event (idle or non-idle) to be “runnable” and therefore either on the CPU or waiting for the CPU so it is sampled and session_state is ‘ON CPU’.

    Correction:
    Mis-classified idle events will blow up DB Time estimates from ASH but not for the ‘ON CPU’ sessions but rather those ‘WAITING’ (on the mis-classified event.)

    Possibility:
    Of course the first thing we should question is whether the script is correct or not, and I had not looked it over in detail. However having now done so there are bugs in the script that may or may not explain your output.

    The script compares outputs from key metrics in v$sysmetric with ASH over the last 1-minute which is a very small amount of ASH data to be summarizing. Now the values in v$sysmetric show the last computed 1-minute metric value and these are stamped with a begin_time and end_time, and in general end_time SYSDATE. Kyle’s script simply selects ASH rows for the last minute instead of the ASH rows that are from the exact same minute as the v$sysmetric values, so technically different 1-minute time periods are being compared here. Bug, the script needs to join to ASH rows between begin_time and end_time for v$sysmetric.

    The script would do better to use v$sysmetric_history (60 1-minute metric values) properly joined to ASH, and then AVERAGE the values computed.

  34. JB
    February 27th, 2013 at 19:25 | #33

    I see the blog software swallowed my symbolic “not equals”…correction:

    “…and in general end_time is not equal to SYSDATE.”

  35. February 27th, 2013 at 23:27 | #34

    @JB: good catch JB. Such details are why having a vetted collection of ash script is important. Just added the above sql cpu_consumed_verses_cpuwait.sql to ashmasters at https://github.com/khailey/ashmasters/blob/master/cpu_consumed_verses_cpuwait.sql
    . Haven’t had a chance to add your suggested improvement. Maybe someone wants to take it on

  36. tytchong
    March 4th, 2013 at 18:34 | #35

    Hello Kyle, first of all thank you very much for this blog post it’s very useful.

    I am also interested in understanding how oracle interprets “Host CPU utilization”. I was trying to match up the output of the query against something like mpstat during the same time period but I don’t think it adds up. Is it just the CPU cycles spent in user mode? Or does it take sys mode, io wait, etc into account as well?

  37. karun B
    March 21st, 2013 at 18:37 | #36

    Hello Kyle,
    I executing one of your last query on this page that start with “with AASSTAT ”
    here are my output
    CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT
    ———- ———- ———- ———— ———- ———- ———-
    7.25 7.237 .013 0 0 .001 .003

    when I top on host machine my cpu utilitzation ~14% why this differenc.
    As my query is executing in parallel, does that impact you query output?

  38. March 21st, 2013 at 18:47 | #37

    @karun:
    according to the output, the machine is fairly busy but the Oracle database is not (well not as far as CPU goes)

  39. karun B
    March 22nd, 2013 at 01:40 | #38

    @Kyle Hailey
    thanks for respone, I understand the machine is busy, how do corelate this output 14% cpu usage on the machine.

  40. March 22nd, 2013 at 13:50 | #39

    @karun:
    the sql shows 7.25 average active session on CPU, ie 7.25 processes on CPU. If that represents ~14% of total CPU then
    # of cores = (1/~.14) * 7.25 = ~52
    How many cores does your machine have?

  41. karun B
    March 22nd, 2013 at 23:35 | #40

    there are 24 cores on the machine, I am little confused now

  42. March 22nd, 2013 at 23:43 | #41

    Well I’m wondering where the 14% CPU usage is coming and how it is calculated. If it involves hyperthreading then 24 cores could equal 48 hyperthreaded cores and 48 ~= 52
    But all of this doesn’t really matter to me. What matters to me is that there are on average 7 active sessions on CPU and 24 cores, so the machine has plenty of CPU capacity during this sample

  43. karun B
    March 25th, 2013 at 19:25 | #42

    Could you please explain with a example when you say thin in above article
    “An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
    Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joing in ASH:”

    How to co-relate this statement for multi core system especially for value for “CPU Usage Per Sec”, like with 24 cpu.

    1 AAS = 100% of a Core
    1 ASS = how much % on 24 cores?

    value of “CPU Usage Per Sec”/24

  44. March 26th, 2013 at 05:13 | #43

    1 aas on 24 core machine = 1/24 % = 4% CPU ( when aas state is all CPU )

  45. vijay
    April 20th, 2013 at 00:21 | #44

    Hi,
    I can’t understand:
    when we are getting Host CPU Utilization (%),why are we multiplying it with cpu count to get CPU_OS? If it the metric Host CPU Utilization (%) is representing only for single cpu,then how is that we are simply multiplying with cpu count to get total CPU_OS,as each cpu utilization may be different?

  46. April 20th, 2013 at 00:27 | #45

    @Vijay – multiplying CPU% by number of CPUs results in number of active sessions on CPU.
    If there is one CPU at 100% then it’s the equivalent of 1 session fully (100%) active on 1 CPU. Of course the activity might be coming from 100 sessions all using 1% CPU, but even those 100 sessions are all taking 100% of the CPU at a small enough slice in time. Overall effect is that on average one session is taking up 100% of the CPU, ie on average there is one active session on CPU.
    CPU_OS = average OS sessions active on CPU.
    1 average active session on 1 CPU core (ie OS_CPU )= 100% CPU
    2 average active sessions on 4 CPU cores (OS_CPU) = 50% CPU

  47. April 20th, 2013 at 00:29 | #46

    average active sessions = a magic number (hats off to De la Sol )

  48. vijay
    April 20th, 2013 at 00:43 | #47

    Other question is:
    The CPU_ORA_CONSUMED,CPU_OS are derived from v$sysmetric which has accumulated values from instance startup,while the CPU_DEMAND is derived from ash memory which doesn’t store all the cpu demand,as it has to flush to ash disk(hist) periodically. We may need to dervive cpu_demand with sample time starting from latest instance startup time.
    And many times we are not interested in the values from instance startup,but rather during a certain period.For this purpose,cpu_ora_consumed,cpu_os may be dervied from awr and cpu_demand from ash with conditions on sample time.

  49. April 20th, 2013 at 05:41 | #48

    @Vijay: was the last comment a question or a comment?

  50. vijay
    April 20th, 2013 at 05:53 | #49

    Hi,It’s a comment. I see there is need to change the query when querying ash to include from instance startup time.

  51. vivek
    April 28th, 2013 at 15:24 | #50

    Hi Chris,

    Thanks for the awesome report scripts, i have one query, i want historic CPU usage group by dbusers or osuser, as i have users that fall in different consumer groups.

    Your expertise help will really help me.

    thanks

  52. Ajay
    June 13th, 2013 at 21:37 | #51

    This post is very interesting but I could not understand it completely so need your help in interpreting results
    I am on 11.2.0.2 HP-UX Itanium with 12 CPU’s
    1]
    METRIC_NAME VALUE METRIC_UNIT
    —————————— ———- ————————-
    Host CPU Utilization (%) 68.0972937 % Busy/(Idle+Busy)
    CPU Usage Per Sec 293.686119 CentiSeconds Per Second
    CPU Usage Per Txn 2.42678109 CentiSeconds Per Txn
    Database CPU Time Ratio 9.17658878 % Cpu/DB_Time
    Background CPU Usage Per Sec 59.8417409 CentiSeconds Per Second
    Host CPU Usage Per Sec 540.916584 CentiSeconds Per Second

    2]
    CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT
    ———- ———- ———- ———— ———- ———- ———-
    10.361 2.511 2.937 4.913 .502 21.447 3.248

    How Do I co-relate numbers ?
    CPU_TOTAL = There are on average 10 sessions running on cpu ?
    I did not understand other columns in 2] and how do they corelate to 68% Host CPU utilization
    Pls help.

  53. Ajay
    June 13th, 2013 at 21:43 | #52

    Small correction in previous post. Its 8 CPU instead of 12

    NAME TYPE VALUE
    ———————————— ———– ——————————
    cpu_count integer 8
    parallel_threads_per_cpu integer 2
    resource_manager_cpu_allocation integer 8

    CLASS S
    —————- ——-
    CPU_ORA_CONSUMED 2.7
    CPU_ORA_DEMAND 8.6
    CPU_OS 5.0
    Commit 0.4
    User I/O 18.9
    Wait 2.2

    Basically I want to know. How much cpu is used by db processes, os processes, waiting on IO and how is the system

  54. June 14th, 2013 at 03:52 | #53

    @Ajay :

    From your first post

    CPU_TOTAL 10.361
    CPU_OS 2.511
    CPU_ORA 2.937
    CPU_ORA_WAIT 4.913
    COMMIT .502
    READIO 21.447
    WAIT 3.248

    68% is about 2/3 ( 2/3 = 66.6667 %)
    so 2/3 of 8 CPUs is (2*8)/3 = 5.3
    the CPU reported is 2.5 for CPU_OS and 2.9 for CPU_ORA making a total of 5.4 which is about 5.3

    CPU_OS is CPU used by processes not part of the database you are looking at
    CPU_ORA is the CPU being used by the database you are looking at
    CPU_ORA_WAIT is the CPU that the database you are looking at wants but isn’t getting

    in your stats most of the time is spent waiting on I/O. on average 21 oracle processes are waiting for I/O
    Your system has some CPU contention but the database is much more bottlenecked on I/O wait

  55. Ajay
    June 14th, 2013 at 19:19 | #54

    Thanks for detailed explanation/help along with numbers. Request you few more clarifications
    Database is bottlenecked on I/O. – I agree and understand it.
    What numbers shows that my system has CPU contention ? If my CPU usage is 68% then why rest of 32% is not used before declaring as CPU contention ?
    CPU_ORA_WAIT is 4.913 – why db is not getting cpu even though I have 32% free cpu ?
    How do I interpret WAIT=3.248 and COMMIT=.502 ? I mean what does it indicate ?

  56. June 14th, 2013 at 21:36 | #55

    Your CPU demand is greater than your CPU capabilities. You database + OS is demanding 10.3 CPUs on average by your machine only had 8, so you are CPU starved.
    What the waits are I don’t know. This article is mainly about CPU.
    For wait analysis check out the Top Activity chart in Enterprise Manager
    or use DB Optimizer from Embarcadero. The following URL
    https://sites.google.com/site/embtdbo/profiler
    explains more about waits and load

  57. Ajay
    June 19th, 2013 at 14:13 | #56

    Thanks for your help. I will dig more on cause of wait events.

  58. Ajay
    June 19th, 2013 at 18:02 | #57

    1 quick question about WAIT=3.248 and COMMIT=.502
    I will get into details of wait from db side but in this output, How do I interpret it ? Is it on avg 3.2 sessions are waiting for CPU ? Is there any direct relation of wait number and commit numbers in output with CPU_TOTAL or OS_CPU or any other columns ?
    what should do I infer looking at wait and commit numbers in above output

  59. June 19th, 2013 at 18:19 | #58

    “WAIT” is wait for non-CPU , non-COMMIT, non-I/O
    “WAIT” is other stuff like latches, locks, buffer access etc
    wait for CPU is CPU_ORA_WAIT
    your database load profile from the data you gave its

    CPU_ORA 2.937
    CPU_ORA_WAIT 4.913
    COMMIT .502
    READIO 21.447
    WAIT 3.248

    if we add all those numbers up we get 33 or so, meaning over that period of time there were 33 session active on average.
    Of that 33, there were 21 on average were waiting for I/O, so that’s your BIG bottleneck.
    Your next biggest area is CPU wait , CPU_ORA_WAIT, which is almost 5. You shouldn’t be waiting for CPU so that’s a problem. Your machine doesn’t have enough CPU to handle the load.
    As far as WAIT and COMMIT goes, they are small compared to your other problems. Don’t worry about them unless a particular session you care about is stuck on those mainly. You can see that by going to the “Top Activity” screen in Enterprise manager (or using a tool like DB Optimizer from Embarcadero)


four + 1 =