AWR: I/O latency fluctuations

One thing I’ve been wishing was more prominent in OEM or other tools is the displays of the fluctuation in I/O times for Oracle. I/O times can happen because of change in load or hardware issues. Changes in I/O latency are not shown clearly in OEM. Luckily the data is easily to query from the AWR repository. Starting in 11g there is a statistics “Average Synchronous Single-Block Read Latency” that can be queried in v$sysmetric and dba_hist_sysmetric_summary as in

select  to_char(begin_time,'DD-MON-YY HH24:MI'), 
 average, maxval from dba_hist_sysmetric_summary where
metric_name='Average Synchronous Single-Block Read Latency'
order by begin_time;

but for 10g Here is a query that gets the hourly average I/O latency for all User I/O and System I/O:

set pagesize 100
col event_name format a30
col avg_ms format 99999.99
       btime, event_name,
       (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms,
       (count_end-count_beg) ct
from (
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
   and s.begin_interval_time > sysdate -2
   and e.wait_class in ( 'User I/O', 'System I/O')
   -- and s.dbid=2
   -- and s.dbid=e.dbid
   -- and s.begin_interval_time > to_date('07-NOV-11 13:00','DD-MON-YY HH24:MI')
   -- and s.begin_interval_time < to_date('07-NOV-11 15:00','DD-MON-YY HH24:MI')

order by e.event_name, begin_interval_time
where (count_end-count_beg) > 0
order by event_name,btime
18-APR-11 18:00       db file scattered read              8.24       4404
18-APR-11 19:00       db file scattered read              7.85       3256
18-APR-11 20:00       db file scattered read              8.22       3511
18-APR-11 21:00       db file scattered read              7.40       5025
18-APR-11 22:00       db file scattered read              6.01     290640
18-APR-11 23:00       db file scattered read              8.99       6775
19-APR-11 00:00       db file scattered read             22.08      30126
19-APR-11 01:00       db file scattered read              8.63     105273
19-APR-11 02:00       db file scattered read              5.62      24064
19-APR-11 03:00       db file scattered read              8.32       3368
19-APR-11 04:00       db file scattered read              9.51       3218
19-APR-11 05:00       db file scattered read              5.64      28113
19-APR-11 06:00       db file scattered read              7.75      11014
19-APR-11 07:00       db file scattered read              8.80      18040
19-APR-11 08:00       db file scattered read              8.00      11443
19-APR-11 09:00       db file scattered read              5.51      11543
19-APR-11 10:00       db file scattered read              5.48      19038

It would be nice to see these I/O times broken down by histograms. Something to look into with the wait histograms.

Another thing to keep in mind is the IOPs of your database. If the IOPs start to max out the underlying disk storage subsystem then the latencies will suffer. Oracle has these statistics as well

select begin_time, metric_name, average, maxval 
from dba_hist_sysmetric_summary 
where metric_name
in ('Physical Read Total IO Requests Per Sec','Physical Write Total IO Requests Per Sec')
order by begin_time;

A typical rule of thumb is around 100 IOPs per spindle.
And if the underlying I/O subsystem is RAID 5 or a variation then shoot for multiply the write IOPs from Oracle by 4 to take into account the extra overhead of RAID 5 (and ilk).


× six = 36