Oracle CPU time
There are 3 kinds of CPU in the Oracle stats.
- Oracle CPU used
- System CPU used
- Oracle demand for CPU
Starting in 10g Oracle records both the CPU used by the instance as well as the load on the system in v$sysmetric. This is awesome as we can see how busy the system is and how much of the CPU Oracle is responsible for:
col metric_name for a25 col metric_unit for a25 select metric_name, value, metric_unit from v$sysmetric where metric_name like'%CPU%' where group_id=2; METRIC_NAME VALUE METRIC_UNIT ------------------------------ ---------- ------------------------------ CPU Usage Per Sec 251.067016 CentiSeconds Per Second CPU Usage Per Txn 5025.52477 CentiSeconds Per Txn Host CPU Utilization (%) 11.6985845 % Busy/(Idle+Busy) Database CPU Time Ratio 76.3291033 % Cpu/DB_Time
Now the question is how do we convert these to something useful? For me I put it into the equivalent of AAS and compare it to the core count:
select 'CPU_ORA_CONSUMED' CLASS, round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS' CLASS , round((prcnt.busy*parameter.cpu_count)/100,3) sAAS from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter; CLASS AAS ---------------- ---------- CPU_ORA_CONSUMED .002 CPU_OS .022
An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joing in ASH:
select 'CPU_ORA_CONSUMED' CLASS, round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS' CLASS , round((prcnt.busy*parameter.cpu_count)/100,3) AAS from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter union select 'CPU_ORA_DEMAND' CLASS, nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS from v$active_session_history ash where SAMPLE_TIME > sysdate - (60/(24*60*60)); CLASS AAS ---------------- ---------- CPU_ORA_CONSUMED .001 CPU_ORA_DEMAND .02 CPU_OS .019
So the demand for CPU was higher than the amount consumed. Now the demand for CPU is coming from ASH which is sampled so the accuracy is weak, but in larger sample sets or busier systems it’s pretty darn good. The demand alert us to CPU starvation on a busy system.
I like to wrap all this up into a query with all the wait classes to see the overall load on Oracle including CPU consumed by Oracle, CPU demanded by Oracle and CPU used at the OS level:
select decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait') CLASS, sum(round(m.time_waited/m.INTSIZE_CSEC,3)) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' group by decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait') union select 'CPU_ORA_CONSUMED' CLASS, round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS' CLASS , round((prcnt.busy*parameter.cpu_count)/100,3) AAS from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter union select 'CPU_ORA_DEMAND' CLASS, nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS from v$active_session_history ash where SAMPLE_TIME > sysdate - (60/(24*60*60)); CLASS AAS ---------------- ---------- CPU_ORA_CONSUMED .002 CPU_ORA_DEMAND .03 CPU_OS .023 Commit 0 User I/O 0 Wait 0
Ideally I’d want the CPU stats to be subsets of each other so that I could have a graphically stack-able set of statistics
now rolling it all together
with AASSTAT as ( select decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait') CLASS, sum(round(m.time_waited/m.INTSIZE_CSEC,3)) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' group by decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait') union select 'CPU_ORA_CONSUMED' CLASS, round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS' CLASS , round((prcnt.busy*parameter.cpu_count)/100,3) AAS from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter union select 'CPU_ORA_DEMAND' CLASS, nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS from v$active_session_history ash where SAMPLE_TIME > sysdate - (60/(24*60*60)) ) select ( decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) + CPU_ORA_CONSUMED + decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) CPU_TOTAL, decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS, CPU_ORA_CONSUMED CPU_ORA, decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT, COMMIT, READIO, WAIT from ( select sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED, sum(decode(CLASS,'CPU_ORA_DEMAND' ,AAS,0)) CPU_ORA_DEMAND, sum(decode(CLASS,'CPU_OS' ,AAS,0)) CPU_OS, sum(decode(CLASS,'Commit' ,AAS,0)) COMMIT, sum(decode(CLASS,'User I/O' ,AAS,0)) READIO, sum(decode(CLASS,'Wait' ,AAS,0)) WAIT from AASSTAT) / CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT ---------- ---------- ------------ ---------- ---------- ---------- .02 .002 0 0 0 0
Now let’s run up some load on a machine and database.
Take two databases, run up the CPU demand on both and add some wait contention. The machine has 24 cores so there is a definitely a problem when the CPU_TOTAL goes over 24. I’m running 14 sessions each trying to burn a core on two different databases. The first few lines the test is ramping up
SQL> / CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT ---------- ---------- ---------- ------------ ---------- ---------- ---------- 14.887 .387 13.753 .747 0 0 .023 SQL> / CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT ---------- ---------- ---------- ------------ ---------- ---------- ---------- 21.989 7.469 12.909 1.611 0 0 .044 SQL> / CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT ---------- ---------- ---------- ------------ ---------- ---------- ---------- 26.595 12.125 11.841 2.629 0 0 .025 SQL> / CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT ---------- ---------- ---------- ------------ ---------- ---------- ---------- 27.045 12.125 11.841 3.079 0 0 .025
Historically CPU used by Oracle was derived from
v$sysstat.name=’CPU used by this session’
but this statistic had problems as the value was only updated every time a call ended. A call could be a 1 hour PL/SQL procedure which would thus report zero cpu usage in the stats until it finished and the CPU would spike off the scale.
ASH had always been the most stable way to gather CPU demand, though Oracle has made improvements in gathering CPU statistics. I believe that the time model gathers CPU every 5 seconds in 10g, and in 11g it’s possible that CPU stats are gathered every second
Here is a visual example of a machine that has server memory contention, massive amounts of paging. There is OS CPU being used, but hardly any CPU being used by Oracle which makes sense as it’s an idle database, but what is revealing is the massive amount of CPU wait by Oracle. Oracle only has a little bit of work to do to take care of an idle database but we can see that most of Oracle’s CPU time is wait for CPU time as when it wants to work, pages have to be read back in,
I have my doubts as to the clarity of the layout of the above graph. A possibly clearly graph would be simpling adding a line representing available CPU and take out the OSCPU bars. In the above graph I’ve charted OSCPU usage as AAS, ie average active sessions, mixing AAS of the database with AAS at the OS level. I think a possible clear representation would be to show the Core count line, and draw the OSCPU usage shown upside down from the # of core lines, thus the space from the bottom axis to where the OSCPU reaches down would be available CPU.
Trackbacks
Comments
these queries are very useful. however, the results from my system don’t really make sense.
CLASS AAS
—————- ———-
CPU_ORA_CONSUMED .661
CPU_ORA_DEMAND 1.75
CPU_OS 1.67
Commit .004
User I/O .007
Wait .013
Any idea how Oracle could report it’s CPU usage as higher than the server it is running on?
11.2.0.2 on RHEL 5 on VMWare.
Hey Chris,
Those stats are quite feasible.
Oracle’s CPU demand was 1.75 of which only .661 was consumed, so there was 1.089 was waiting.
The OS CPU was 1.67 of which Oracle consumed .661.
Now, why the OS didn’t give Oracle more in that period is a good question – were there any memory issues on the system at that point in time?
I like the last version of the script that lays out the components better, ie OS CPU used, Oracle CPU used and Oracle CPU wait.
– Kyle
I found this to be VERY useful. Thanks!
very good … thanks!
Hi Kyle,
I’m interested in the group_id… I’ve looked in the docs to see why there are two (in my case) and the docs just say group_id = metric_id and metric_id has a description of metric_id… I may be missing something really simple here, but what’s the difference between the 2. In my DB I see the following:
SQL> /
GROUP_ID METRIC_ID METRIC_NAME VALUE METRIC_UNIT
———- ———- ————————- ———- ————————-
2 2057 Host CPU Utilization (%) 58.9297201 % Busy/(Idle+Busy)
2 2075 CPU Usage Per Sec 877.185667 CentiSeconds Per Second
2 2076 CPU Usage Per Txn 77.3670053 CentiSeconds Per Txn
2 2108 Database CPU Time Ratio 48.6910089 % Cpu/DB_Time
3 2057 Host CPU Utilization (%) 70.0992814 % Busy/(Idle+Busy)
3 2108 Database CPU Time Ratio 50.39545 % Cpu/DB_Time
6 rows selected.
SQL> l
1* select group_id, metric_id, metric_name, value, metric_unit from v$sysmetric where metric_name like’%CPU%’
SQL>
So, clearly 2 different values, but what is the difference? Is it a different sampling frequency, I did wonder if that was the answer but then I noticed the 2 groups have a different number of metrics:
SQL> select GROUP_ID, count(METRIC_ID) from v$sysmetric group by GROUP_ID;
GROUP_ID COUNT(METRIC_ID)
———- —————-
2 148
3 42
I’d appreciate it if you had the time to explain what the difference is?
Thanks,
Simon
The two group_id values are the time periods.
group_id 2 is 60 second deltas
group_id 3 is 15 second deltas
for v$sysmetric this represents the last 15 or 60 second delta.
for v$sysmetric_history it will be an hour of 60 second deltas and 3 minutes of 15 second deltas.
you can see this by looking at INTSIZE_CSEC in v$systemetric_history:
select distinct INTSIZE_CSEC, GROUP_ID from v$sysmetric_history;
INTSIZE_CSEC GROUP_ID
———— ———-
1502 3
5908 2
6010 2
5909 2
1503 3
1402 3
6009 2
6008 2
There is a little bit of variability but basically GROUP_ID 2 is 60 seconds and GROUP_ID 3 is 15.
select group_id, (max(END_TIME) – min(BEGIN_TIME))*60*24
from v$sysmetric_history
group by group_id
GROUP_ID (MAX(END_TIME)-MIN(BEGIN_TIME))*60*24
———- ————————————-
2 61.0166667
3 3.26666667
Oracle left much to be desired in explaining and documenting this info, most likely because it was designed for consumption by AWR, Alerts and OEM and not direct access.
Hi Kyle,
your article has been of great help, thanks a lot!
Best regards,
Nikolay
Dear Sir:
My query is very simple. I want to know that what percentage of CPU is utilized
by an Oracle instance (serving queries, waiting to server queries, PMON, RMAN etc etc).
I want to take the result every 1 hour and plot on a graph.
Is the column CPU_ORA depicts %age used by total Oracle database system?
Thanks in advance, Caesae
Dear Sir:
My query is very simple. I want to know that what percentage of CPU is utilized
by an Oracle instance (serving queries, waiting to server queries, PMON, RMAN etc etc).
I want to take the result every 1 hour and plot on a graph.
Is the column CPU_ORA depicts %age used by total Oracle database system?
Thanks in advance, Caesar
I am running your query. The machine is 2 x Dual Core Intel Itanium, running on RHEL 4 for Itanium.
The result I am getting is:
CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT
———- ———- ———- ————
211.454 0 211.454 0
Please clear my doubts – why is Total CPU so high?
Hi Caesar,
Those numbers do look strange.
Two things you could do:
1. run the first query on : http://dboptimizer.com/2012/01/17/looking-at-awr-repositories-for-performance-issues/
2. export your AWR and send it to me: http://gavinsoorma.com/2009/07/exporting-and-importing-awr-snapshot-data/
AWR export might be a bit big, so limit it to a day where you see #’s like above. If it’s to big to email (compress it first), I’ll send you an ftp location
– Kyle
@Kyle Hailey
I will definitely run the first query from http://dboptimizer.com/2012/01/17/looking-at-awr-repositories-for-performance-issues/ and will send it to you. Currently I am just getting “per second values” every hour using the following SQLs
— Calculate today’s metrics
SELECT MAX(NVL(ORA_CPU_USAGE_PER_SEC,0))
INTO v_ora_cpu_util
FROM (
select round(value/100,3) ORA_CPU_USAGE_PER_SEC
from v$sysmetric_history
where metric_name=’CPU Usage Per Sec’
and begin_time >= SYSDATE – INTERVAL ‘1’ HOUR
and end_time = SYSDATE – INTERVAL ‘1’ HOUR
and end_time sysdate – (60/(24*60*60));
How will I send something to you?? The message getting truncated here. Please see my above message.
Caesar
@Kyle Hailey
Have only just seen the reply, thanks Kyle! Simon
Hi Kyle,
Can the above queries be useful for gathering/raising alerts where active session usage – CPU crosses a certain percentage.
Hi Kyle,
Using Oracle 10gR2. Want to set a proactive alert (through OEM db console, dbms_server_alerts) to trap the average active sessions/session usage > 70%.
Hi Sid,
By “session usage” do you mean connected sessions?
First let’s start with alerts on Average Active Sessions (AAS). In OEM 10g there was no alert for AAS! Strange but true. There is a way to set alerts on AAS though. One way is to go into “Manage Metrics” link at the bottom of OEM home page or performance page for a database. Choose “edit thresholds”, then go to “Database Time (centiseconds per second)”. This statistics is 100*AAS, so if I want an alert on AAS=10, then I set an alert on “Database Time” to 1000.
Now if you want to do calculations, such as AAS/(number of connected users) then you will have to create your own user defined metric. See the “User-Defined Metrics” link at the bottom of the database home page or performance page.
I believe the alerts are better on 11g but I haven’t investigated them.
– Kyle
Hi Kyle,
Thanks for your reply.
Yes connected sessions.
Need to trap the Average Active Sessions Usage. Say over 50% usage it will generate a alert in OEM page alongwith a email notification. Now the usage will be a combination of CPU,User I/O, System I/O etc.
What you mentioned above is Database Time for average active sessions. Or is it the same. ?
If I want to do it through dbms_server_alerts or using a shell script is it possible.
If yes then can provide a example on this.
Hi Sid,
Yes DB Time normalized by the elapsed time is Average Active Sessions when both the numerator and denominator have the same unit. The value “Database Time (centiseconds per second)” = (db time in centi-seconds)/ (elapsed seconds). Thus to convert it to AAS becomes
( (1 sec)/(100 centi-seconds) ) * ( db time in centi-seconds)/ seconds) = AAS.
In 10g OEM to set an alert on (AAS/connected_sessions) you would have to set a custom alert. As far as setting custom alerts I don’t have any code examples. Let us know if you find any on other sites and we can reference that code here. It’s also possible that 11g and 12c OEM have options do alerts directly on AAS.
CPU wait only reported by Oracle as the difference between CPU cycles burned and time that Oracle process was runnable from the Oracle perspective.
This information doesn’t come from any OS tools
>>1-Is CPU wait always a wait on I/O?
no, could simply be that no more CPU is available
>>2-will Memory and Virtual memory have anything to do with CPU wait?
Sure if a process is paged out, then it will probably have show CPU wait time. In this case it was waiting for page ins before it could use CPU
>>3-if a system has lots of CPU wait will adding more CPU or cores fix this problem?
Only if the issue is a CPU saturation issue and not something like a memory pressure issue
>> 4-what is the best OS tool to report CPU wait other than VMSTAT?
Only Oracle gives this info
See http://dboptimizer.com/2011/07/21/oracle-cpu-time
A request: ask these questions on the blog that way everyone can benefit
Thanks
Kyle
Kyle
But CPU wait is reported by vmstat in WA column,and also Ganglia shows it,how can we get a breakdown of the causes of these waits if it is IO,network or memory.
@Steve Jamro,
AFAIK, the wait CPU in vmstat is I/O wait. Historically the wait column of vmstat has been mostly bunk, simply meaning someone is waiting for I/O. There is run queue depth, the first column of vmstat output, that gives some indication of wait for CPU. The run queue column varies from # of users queued for CPU to number of users queued on CPU plus # of users on CPU. Either way, it doesn’t report how much time is waited for CPU.
THe first hit on google for “wait vmstat” turns up AIX vmstat documentation
http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.prftungd%2Fdoc%2Fprftungd%2Fvmstat_command.htm
saying “The reported CPU activity is a percentage breakdown of user mode, system mode, idle time, and waits for disk I/O.”
“The wa column details the percentage of time the CPU was idle with pending local disk I/O”
Hi Kyle, Nice post.
Quick question with some numbers trying understand.
When i run the below query, i see sometimes, the host_cpu_util% goes to 250. I’m getting the metric by instance as you see in the query, shouldn’t it be always within 100. My cpu_count is 8 per node.
select instance_number node, trunc(begin_time, ‘HH24’) snapTime,
round(sum(case metric_name when ‘Host CPU Utilization (%)’ then maxval end)) Host_CPU_util,
round(sum(case metric_name when ‘CPU Usage Per Sec’ then maxval end)/100) DB_CPU_per_sec, — /100 for per sec conversion
from dba_hist_sysmetric_summary
where begin_time > trunc(sysdate)-1
and begin_time < trunc(sysdate)
group by instance_number, trunc(begin_time, 'HH24')
order by instance_number, trunc(begin_time, 'HH24')
If you have get CPU information from DBA_HIST views what would be the query like?
Could you help plz?
If you have get CPU information from DBA_HIST views what would be the query like?
Could you help plz?
CPU_ORA_CONSUMED is time, CPU_ORA_DEMAND is count per second. How can we compare this numbers?
Moreover CPU_ORA_DEMAND is not CPU demand. If we have 5 cores and 15 sessions(all 15 wants to use CPU) then CPU_ORA_DEMAND takes into account only 5 because 10 other is ‘waiting’ not ‘on cpu’.
@rumata : yes CPU_ORA_CONSUMED is time and CPU_ORA_DEMAND is a count so a purist would say these aren’t comparable, but CPU_ORA_DEMAND is a proxy for time and approaches time demand the more data we have. CPU_ORA_DEMAND is more appropriately time under the curve where the curve is defined by session count of sessions in a CPU state. As we reduce the sampling time the curve becomes more defined and accurate as in taking the integral where the height is the session count and for our purposes the width is 1 second: http://hyperphysics.phy-astr.gsu.edu/hbase/integ.html so CPU demand becomes 1 second times sessions demanding CPU.
.
For the second point, CPU_ORA_DEMAND is the number or sessions who want to run on CPU.
If you have 15 sessions who want CPU then their state in v$active_session_history will be “ON CPU”. If you have 15 session with state “ON CPU” and only 5 cores allocated to your machine (VM?) then only 5 of them can be actually on the CPU at any point in time.
@Kyle Hailey
Counting in-memory ASH samples and equating that to database time is not immediately and intuitively obvious to most people, but it works. This is what I call “ASH Math” and the basic statements are simple:
COUNT(*) FROM V$ASH ~ DB Time (in seconds)
COUNT(*) *10 FROM DBA_HIST_ASH ~ DB Time (in seconds)
One semi-intuitive way to think of this is to analogize the ASH sampling mechanism to an old-style movie camera: every second a new “frame” is taken of all active sessions and what they are doing, e.g. session1 is waiting for IO, session2 is using CPU, session3 is waiting for an enqueue, etc. So if we were to play this movie back “as it happened” each “frame” would display for 1 second and then answering a question like “how much time did session3 spend waiting for enqueue” would be accomplished by adding up all the frames in which session3 is waiting for enqueue and counting each as 1 second of time. It’s not a perfect movie of what happened over the time period by any means, but it is our best unbiased estimate of what happened, since our movie-making mechanism is independent of the scenes it records.
There is a mathematical explanation that should be more convincing but is best done with diagrams. In essence, it shows that adding up in-memory ASH samples over some continuous time period amounts to estimating the integral of the “active sessions function” which is to say estimating DB Time since the active sessions function over time is precisely the derivative of the DB Time function over time (rate of change of DB Time at time t0 = number of active sessions at time t0.)
My bad. This query seems correct.
But it’s producing very strange results. At moment we have 4 sessions, 5 runnable processes, CPU load about 25%. Host has 8 cores.
And sql results:
CPU_ORA_CONSUMED 0,355
CPU_ORA_DEMAND 0,83
Demand is twice higher on absolutely free instance.
Can anybody explain this results?
@rumata
You didn’t supply all the output nor state version of the DB, but there are potential “gotchas” with ASH-estimated CPU demand that could explain it.
Basically, ASH samples “ON CPU” are not directly observed to be on the CPU but rather this state is derived basically as follows:
1/ is the session in a database call?
2.1/ is the session waiting on a non-idle wait event?
2.2/ is the session waiting on an idle wait event?
(note that 2.1 and 2.2 cannot both be true)
If 1/ is TRUE and 2.1/ is TRUE then session is sampled into ASH as ‘WAITING’
If 1/ is TRUE and 2.1/ is FALSE and 2.2/ is FALSE then session is sampled into ASH as ‘ON CPU’
If 1/ is TRUE and 2.1/ is FALSE and 2.2/ is TRUE then session is not active and not sampled into ASH
If 1/ is FALSE then session is not active and not sampled into ASH
So being ‘ON CPU’ in ASH is derived and can inaccurate for a couple of known reasons: 1/ un-instrumented wait events, 2/ mis-classified idle events
Both of these reasons are essentially bugs but have happened in past, so knowing the RDBMS version and what the workload is doing are useful to diagnosing if there is known issue on specific platform.
We using 10.2.0.4.0 with OLTP system.
@Rumata
Rumata –
Can you post the full outputs from running the script?
My reply above is both overly complex and somewhat inaccurate, so I’ll try again.
Simplification:
The ASH sampler regards any sampled session that is in a database call and not waiting for a known event (idle or non-idle) to be “runnable” and therefore either on the CPU or waiting for the CPU so it is sampled and session_state is ‘ON CPU’.
Correction:
Mis-classified idle events will blow up DB Time estimates from ASH but not for the ‘ON CPU’ sessions but rather those ‘WAITING’ (on the mis-classified event.)
Possibility:
Of course the first thing we should question is whether the script is correct or not, and I had not looked it over in detail. However having now done so there are bugs in the script that may or may not explain your output.
The script compares outputs from key metrics in v$sysmetric with ASH over the last 1-minute which is a very small amount of ASH data to be summarizing. Now the values in v$sysmetric show the last computed 1-minute metric value and these are stamped with a begin_time and end_time, and in general end_time SYSDATE. Kyle’s script simply selects ASH rows for the last minute instead of the ASH rows that are from the exact same minute as the v$sysmetric values, so technically different 1-minute time periods are being compared here. Bug, the script needs to join to ASH rows between begin_time and end_time for v$sysmetric.
The script would do better to use v$sysmetric_history (60 1-minute metric values) properly joined to ASH, and then AVERAGE the values computed.
I see the blog software swallowed my symbolic “not equals”…correction:
“…and in general end_time is not equal to SYSDATE.”
@JB: good catch JB. Such details are why having a vetted collection of ash script is important. Just added the above sql cpu_consumed_verses_cpuwait.sql to ashmasters at https://github.com/khailey/ashmasters/blob/master/cpu_consumed_verses_cpuwait.sql
. Haven’t had a chance to add your suggested improvement. Maybe someone wants to take it on
Hello Kyle, first of all thank you very much for this blog post it’s very useful.
I am also interested in understanding how oracle interprets “Host CPU utilization”. I was trying to match up the output of the query against something like mpstat during the same time period but I don’t think it adds up. Is it just the CPU cycles spent in user mode? Or does it take sys mode, io wait, etc into account as well?
Hello Kyle,
I executing one of your last query on this page that start with “with AASSTAT ”
here are my output
CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT
———- ———- ———- ———— ———- ———- ———-
7.25 7.237 .013 0 0 .001 .003
when I top on host machine my cpu utilitzation ~14% why this differenc.
As my query is executing in parallel, does that impact you query output?
@karun:
according to the output, the machine is fairly busy but the Oracle database is not (well not as far as CPU goes)
@Kyle Hailey
thanks for respone, I understand the machine is busy, how do corelate this output 14% cpu usage on the machine.
@karun:
the sql shows 7.25 average active session on CPU, ie 7.25 processes on CPU. If that represents ~14% of total CPU then
# of cores = (1/~.14) * 7.25 = ~52
How many cores does your machine have?
there are 24 cores on the machine, I am little confused now
Well I’m wondering where the 14% CPU usage is coming and how it is calculated. If it involves hyperthreading then 24 cores could equal 48 hyperthreaded cores and 48 ~= 52
But all of this doesn’t really matter to me. What matters to me is that there are on average 7 active sessions on CPU and 24 cores, so the machine has plenty of CPU capacity during this sample
Could you please explain with a example when you say thin in above article
“An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joing in ASH:”
How to co-relate this statement for multi core system especially for value for “CPU Usage Per Sec”, like with 24 cpu.
1 AAS = 100% of a Core
1 ASS = how much % on 24 cores?
value of “CPU Usage Per Sec”/24
1 aas on 24 core machine = 1/24 % = 4% CPU ( when aas state is all CPU )
Hi,
I can’t understand:
when we are getting Host CPU Utilization (%),why are we multiplying it with cpu count to get CPU_OS? If it the metric Host CPU Utilization (%) is representing only for single cpu,then how is that we are simply multiplying with cpu count to get total CPU_OS,as each cpu utilization may be different?
@Vijay – multiplying CPU% by number of CPUs results in number of active sessions on CPU.
If there is one CPU at 100% then it’s the equivalent of 1 session fully (100%) active on 1 CPU. Of course the activity might be coming from 100 sessions all using 1% CPU, but even those 100 sessions are all taking 100% of the CPU at a small enough slice in time. Overall effect is that on average one session is taking up 100% of the CPU, ie on average there is one active session on CPU.
CPU_OS = average OS sessions active on CPU.
1 average active session on 1 CPU core (ie OS_CPU )= 100% CPU
2 average active sessions on 4 CPU cores (OS_CPU) = 50% CPU
average active sessions = a magic number (hats off to De la Sol )
Other question is:
The CPU_ORA_CONSUMED,CPU_OS are derived from v$sysmetric which has accumulated values from instance startup,while the CPU_DEMAND is derived from ash memory which doesn’t store all the cpu demand,as it has to flush to ash disk(hist) periodically. We may need to dervive cpu_demand with sample time starting from latest instance startup time.
And many times we are not interested in the values from instance startup,but rather during a certain period.For this purpose,cpu_ora_consumed,cpu_os may be dervied from awr and cpu_demand from ash with conditions on sample time.
@Vijay: was the last comment a question or a comment?
Hi,It’s a comment. I see there is need to change the query when querying ash to include from instance startup time.
Hi Chris,
Thanks for the awesome report scripts, i have one query, i want historic CPU usage group by dbusers or osuser, as i have users that fall in different consumer groups.
Your expertise help will really help me.
thanks
This post is very interesting but I could not understand it completely so need your help in interpreting results
I am on 11.2.0.2 HP-UX Itanium with 12 CPU’s
1]
METRIC_NAME VALUE METRIC_UNIT
—————————— ———- ————————-
Host CPU Utilization (%) 68.0972937 % Busy/(Idle+Busy)
CPU Usage Per Sec 293.686119 CentiSeconds Per Second
CPU Usage Per Txn 2.42678109 CentiSeconds Per Txn
Database CPU Time Ratio 9.17658878 % Cpu/DB_Time
Background CPU Usage Per Sec 59.8417409 CentiSeconds Per Second
Host CPU Usage Per Sec 540.916584 CentiSeconds Per Second
2]
CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT
———- ———- ———- ———— ———- ———- ———-
10.361 2.511 2.937 4.913 .502 21.447 3.248
How Do I co-relate numbers ?
CPU_TOTAL = There are on average 10 sessions running on cpu ?
I did not understand other columns in 2] and how do they corelate to 68% Host CPU utilization
Pls help.
Small correction in previous post. Its 8 CPU instead of 12
NAME TYPE VALUE
———————————— ———– ——————————
cpu_count integer 8
parallel_threads_per_cpu integer 2
resource_manager_cpu_allocation integer 8
CLASS S
—————- ——-
CPU_ORA_CONSUMED 2.7
CPU_ORA_DEMAND 8.6
CPU_OS 5.0
Commit 0.4
User I/O 18.9
Wait 2.2
Basically I want to know. How much cpu is used by db processes, os processes, waiting on IO and how is the system
@Ajay :
From your first post
CPU_TOTAL 10.361
CPU_OS 2.511
CPU_ORA 2.937
CPU_ORA_WAIT 4.913
COMMIT .502
READIO 21.447
WAIT 3.248
68% is about 2/3 ( 2/3 = 66.6667 %)
so 2/3 of 8 CPUs is (2*8)/3 = 5.3
the CPU reported is 2.5 for CPU_OS and 2.9 for CPU_ORA making a total of 5.4 which is about 5.3
CPU_OS is CPU used by processes not part of the database you are looking at
CPU_ORA is the CPU being used by the database you are looking at
CPU_ORA_WAIT is the CPU that the database you are looking at wants but isn’t getting
in your stats most of the time is spent waiting on I/O. on average 21 oracle processes are waiting for I/O
Your system has some CPU contention but the database is much more bottlenecked on I/O wait
Thanks for detailed explanation/help along with numbers. Request you few more clarifications
Database is bottlenecked on I/O. – I agree and understand it.
What numbers shows that my system has CPU contention ? If my CPU usage is 68% then why rest of 32% is not used before declaring as CPU contention ?
CPU_ORA_WAIT is 4.913 – why db is not getting cpu even though I have 32% free cpu ?
How do I interpret WAIT=3.248 and COMMIT=.502 ? I mean what does it indicate ?
Your CPU demand is greater than your CPU capabilities. You database + OS is demanding 10.3 CPUs on average by your machine only had 8, so you are CPU starved.
What the waits are I don’t know. This article is mainly about CPU.
For wait analysis check out the Top Activity chart in Enterprise Manager
or use DB Optimizer from Embarcadero. The following URL
https://sites.google.com/site/embtdbo/profiler
explains more about waits and load
Thanks for your help. I will dig more on cause of wait events.
1 quick question about WAIT=3.248 and COMMIT=.502
I will get into details of wait from db side but in this output, How do I interpret it ? Is it on avg 3.2 sessions are waiting for CPU ? Is there any direct relation of wait number and commit numbers in output with CPU_TOTAL or OS_CPU or any other columns ?
what should do I infer looking at wait and commit numbers in above output
“WAIT” is wait for non-CPU , non-COMMIT, non-I/O
“WAIT” is other stuff like latches, locks, buffer access etc
wait for CPU is CPU_ORA_WAIT
your database load profile from the data you gave its
CPU_ORA 2.937
CPU_ORA_WAIT 4.913
COMMIT .502
READIO 21.447
WAIT 3.248
if we add all those numbers up we get 33 or so, meaning over that period of time there were 33 session active on average.
Of that 33, there were 21 on average were waiting for I/O, so that’s your BIG bottleneck.
Your next biggest area is CPU wait , CPU_ORA_WAIT, which is almost 5. You shouldn’t be waiting for CPU so that’s a problem. Your machine doesn’t have enough CPU to handle the load.
As far as WAIT and COMMIT goes, they are small compared to your other problems. Don’t worry about them unless a particular session you care about is stuck on those mainly. You can see that by going to the “Top Activity” screen in Enterprise manager (or using a tool like DB Optimizer from Embarcadero)
Hi Kyle,
This is very interesting.
Is this compatible with Oracle 12.2 ?
For RAC should I be running this per node basis ?
I also found this article https://blogs.oracle.com/alejandrovargas/check-io-scripts and the MOS 422414.1. This mathod only reports seconds.
Regards,
RG