Wait Event and Wait Class Metrics vs v$system_event
This post is a followup to the first metric post on statistics:
http://dboptimizer.com/2011/07/07/mining-awr-statistics-metrics-verses-statistics/
Unfortunately the wait event and wait class interface for metrics is not as friendly as it is for statistics.
For wait event views we have (at system level)
- V$SYSTEM_EVENT – wait events cumulative since startup
- V$EVENTMETRIC – wait event deltas last 60 seconds
- DBA_HIST_SYSTEM_EVENT – wait events by snapshot (hour) for last week, cumulative since startup
For wait class we have
- V$SYSTEM_WAIT_CLASS – cumulative since start up
- V$WAITCLASSMETRIC – last 60 seconds deltas
- V$WAITCLASSMETRIC_HISTORY – 60 seconds deltas for last hour
Remember DBA_HIST_WAITCLASSMETRIC_HISTORY is used for alerts and or baselines not everyday stats.
I use wait events for two things:
- load/bottlenecks on the system
- I/O Latencies
The data in V$ACTIVE_SESSION_HISTORY (ASH) is probably better for seeing load because load is made up of both wait time and cpu time. Cpu information is not in the event/waitclass views but is in ASH along with the waits.
The second part, the latencies, specifically I/O latencies, are only available in the wait event and waitclass views (and the filestat views on a per file basis)
Latencies
Latencies with WAIT CLASS
The waitclass views are mainly good for rollup and rolling up latencies is probably of little use. One possible use is determining the average read I/O for all the various kinds of read I/O and read sizes:
select 10*time_waited/nullif(wait_count,0) avg_io_ms -- convert centi-seconds to milliseconds from v$waitclassmetric m where wait_class_id= 1740759767 -- User I/O / AVG_IO_MS ---------- 8.916
One issue with V$WAITCLASSMETRIC is that the WAIT_CLASS name is not in the view, so we either have to use the WAIT_CLASS_ID (the hash of the name) as above or join to V$SYSTEM_WAIT_CLASS as below
select 10*m.time_waited/nullif(m.wait_count,0) avgms -- convert centisecs to ms from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class='User I/O' / AVG_IO_MS ---------- 8.916
Another issue is that the documentation for 11gR2 says that the TIME_WAITED is microseconds but in my tests it’s actually centisecs
desc V$SYSTEM_WAIT_CLASS Name Type ----------------------------------------- ---------------------------- WAIT_CLASS_ID NUMBER WAIT_CLASS# NUMBER WAIT_CLASS VARCHAR2(64) TOTAL_WAITS NUMBER TIME_WAITED NUMBER - centi-seconds
The view V$SYSTEM_WAIT_CLASS is not of much use since it’s just cumulative values from database startup and the deltas are already calculated in V$WAITCLASSMETRIC, on the other hand V$SYSTEM_WAIT_CLASS gives a list of wait classes and decodes the wait_class_id.
select wait_class_id , wait_class from V$SYSTEM_WAIT_CLASS ; WAIT_CLASS_ID WAIT_CLASS ------------- ---------------------------------------------------------------- 1893977003 Other 4217450380 Application 3290255840 Configuration 4166625743 Administrative 3875070507 Concurrency 3386400367 Commit 2723168908 Idle 2000153315 Network 1740759767 User I/O 4108307767 System I/O
Latencies with Wait Events
For specific I/O latencies there are two choices – v$eventmetric and v$system_event. With v$system_event it requires running multiple queries and taking the deltas but the deltas are are already calculated in v$eventmetric. Here is an example of getting I/O latencies from v$eventmetric
Latencies in the past minute
col name for a25 select m.intsize_csec, n.name , round(m.time_waited,3) time_waited, m.wait_count, round(10*m.time_waited/nullif(m.wait_count,0),3) avgms from v$eventmetric m, v$event_name n where m.event_id=n.event_id and n.name in ( 'db file sequential read', 'db file scattered read', 'direct path read', 'direct path read temp', 'direct path write', 'direct path write temp', 'log file sync', 'log file parallel write' ) / INTSIZE_CSEC NAME TIME_WAITED WAIT_COUNT AVGMS ------------ ------------------------- ----------- ---------- ---------- 6017 log file parallel write 2.538 4 6.345 6017 log file sync 2.329 1 23.287 6017 db file sequential read 0 0 6017 db file scattered read 0 0 6017 direct path read 0 0 6017 direct path read temp 0 0 6017 direct path write 0 0 6017 direct path write temp 0 0
Latencies averaged over each hour
select btime, round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3) avg_ms from ( select to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime, total_waits count_end, time_waited_micro/1000 time_ms_end, Lag (e.time_waited_micro/1000) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg, Lag (e.total_waits) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg from DBA_HIST_SYSTEM_EVENT e, DBA_HIST_SNAPSHOT s where s.snap_id=e.snap_id and e.event_name like '%&1%' order by begin_interval_time ) order by btime / BTIME AVG_MS --------------- ---------- 20-JUL-11 06:00 5.854 20-JUL-11 07:00 4.116 20-JUL-11 08:00 21.158 20-JUL-11 09:02 5.591 20-JUL-11 10:00 4.116 20-JUL-11 11:00 6.248 20-JUL-11 12:00 23.634 20-JUL-11 13:00 22.529 20-JUL-11 14:00 21.62 20-JUL-11 15:00 18.038 20-JUL-11 16:00 23.127
One issue with looking at I/O latencies is determining the I/O sizes. It would be awesome if there was a view with I/O counts, sizes and latencies in one place. ASH does have this information but ASH data is weighted to the longer latencies and sizes and not the average. The average sizes have to be gotten from system statistics. The I/O sizes for ‘db file sequential read’ are single block reads, but the other read events can vary in size. To get a general idea of I/O sizes one could just average across all I/O using the system statistics
Average I/O Size (across all I/O waits)
select sum(decode(metric_name,'Physical Reads Per Sec',value,0))*max(intsize_csec)/100 blocks_read, nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0)*max(intsize_csec)/100 reads, sum(decode(metric_name,'Physical Reads Per Sec',value,0))/ nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0) avg_blocks_read from v$sysmetric where group_id = 2 -- 60 second deltas only (not the 15 second deltas); BLOCKS_READ READS AVG_BLOCKS_READ ----------- ---------- --------------- 4798 4798 1
Load and Bottlenecks
The good thing about wait classes is that they simplify dealing with 1000s of wait events and group them into just a few wait classes. We can get a quick view of load on the system with
select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' ; WAIT_CLASS AAS --------------- ---------- Other 0 Application 0 Configuration 0 Administrative 0 Concurrency 0 Commit 0 Network 0 User I/O .149 System I/O .002
but the big drawback with wait event and/or wait class views is that they lack information on CPU load. CPU load can be found in the system statistics but it’s just easier to do it all in one query using v$active_session_history. Here is a query using ASH to calculate AAS load on the database over the last 60 seconds:
select round(count(*)/secs.var,3) AAS, decode(session_state,'ON CPU','CPU',wait_class) wait_class from v$active_session_history ash, (select 60 var from dual) secs where SAMPLE_TIME > sysdate - (secs.var/(24*60*60)) and SESSION_TYPE = 'FOREGROUND' group by decode(session_state,'ON CPU','CPU',wait_class) , secs.var / AAS WAIT_CLASS ---------- --------------- .016 Concurrency .001 Network 0 Other .083 Configuration .001 Administrative .034 CPU 0 System I/O .001 Commit .054 Application 0 User I/O
Though with v$sysmetric it’s pretty easy to join to v$waitclassmetric
select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' union select 'CPU', round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 ; WAIT_CLASS AAS ---------------------------------------------------------------- ---------- Administrative 0 Application .009 CPU 1.696 Commit 0 Concurrency .001 Configuration 0 Network .002 Other 0 System I/O 0 User I/O 0
and adding v$sysmetric into the query allows me to do something I’ve always wanted which is to include the OS CPU in AAS
select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' union select 'CPU', round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter, ( select 'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas ; WAIT_CLASS AAS ---------------------------------------------------------------- ---------- Administrative 0 Application 0 CPU .009 CPU_OS .024 Commit 0 Concurrency 0 Configuration 0 Network .002 Other 0 System I/O 0 User I/O 0
select 10*time_waited/nullif(wait_count,0) avg_io_ms -- convert centi-seconds to milliseconds from v$waitclassmetric m where wait_class_id= 1740759767 -- User I/O /
One huge loss over using ASH is the loss of the information users waiting for CPU but not running on CPU.
Trackbacks
Comments
Hi Kyle
in your “http://dboptimizer.com/2011/05/13/awr-io-latency-fluctuations/” entry
Sometime there is huge discrepancy between ‘Average Synchronous Single-Block Read Latency” and
waitclass calculation what do think about this ?
Best Regards
single blog latency is probably based on “db sequential reads” where as I/O wait class should be based on all the I/O in that wait class thus the metrics and statistics are imprecise. I don’t use them for I/O latency measurements. I use the actual wait events
http://dboptimizer.com/2013/01/30/oracle-io-latency-monitoring/