AWR mining

June 21st, 2011

Update: nice query from Jonathan Lewis on AWR:

http://jonathanlewis.wordpress.com/awr-sys-stats/

I found this while looking for how to eliminate reporting stats across database restarts. Apparently, when using snap_id, there will be no snap_id-1 across a restart because the ids jump by more than one – have to check into this.
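One possible way around the missing snap_id-1 problem, as a sketch only (untested): DBA_HIST_SNAPSHOT carries a STARTUP_TIME column, so instead of assuming consecutive snap_ids, pair each snapshot with the previous one via LAG and flag the intervals that span a bounce:

```sql
-- Sketch only: flag snapshot intervals that span an instance restart by
-- comparing STARTUP_TIME with that of the previous snapshot, rather
-- than relying on snap_id-1 existing. The first snapshot of each
-- instance also shows up as 'bounce' since it has no predecessor.
select snap_id,
       lag(snap_id) over (partition by dbid, instance_number
                          order by snap_id)  prev_snap_id,
       case
         when startup_time = lag(startup_time)
                             over (partition by dbid, instance_number
                                   order by snap_id)
         then 'ok'
         else 'bounce'   -- deltas across this interval are invalid
       end chk
  from dba_hist_snapshot
 order by dbid, instance_number, snap_id;
```

Joining on matching startup_time (b.startup_time = e.startup_time, as Alan suggests in the comments) accomplishes the same thing when pairing begin/end snapshots directly.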


Exploring ways to mine data in AWR.
One situation I’m looking at now involves a number of databases that all use the same LUNs on the back-end storage, so two metrics that would be interesting to look at are the I/O from these databases and the I/O latencies, to answer the question: “does read and/or write I/O activity correlate with increased latency on the back-end storage?”

I have multiple databases in one AWR repository.

I want to create the output in a format that I can read with something like Excel, so I make the values comma delimited.
AWR’s sysmetric values are hourly, so I round the times to the nearest hour to make correlation easier, and I translate the DBID into the database name to make reading the data easier. I could do something like:

select
               n.db_name||'_'||n.host_name||'_'||
               decode(metric_name,
                   'Physical Write Total Bytes Per Sec','write','read')||','||
               to_char(trunc(begin_time)+
                ( ROUND ((begin_time - TRUNC (begin_time)) * 24) / 24),
                       'YYYY/MM/DD HH24:MI')||','||
               average
        from dba_hist_sysmetric_summary s,
               (select distinct dbid, db_name, host_name
                from dba_hist_database_instance) n
        where
               ( s.metric_name= 'Physical Write Total Bytes Per Sec'
                  or
                 s.metric_name= 'Physical Read Total Bytes Per Sec'
               ) and n.dbid=s.dbid
               order by begin_time;

For the I/O latencies it takes some more work, as I have to compute the deltas between snapshots, getting the total elapsed time and the count of waits, to compute the average wait over the snapshot period for each I/O wait event. I also have to watch out for database bounces, which I have inelegantly addressed by simply filtering out intervals where the computed average is negative (keeping only values greater than or equal to zero). I could do something like:

select
       btime||','||
       n.db_name||'_'||n.host_name||'_'||
       event_name||','||
       nvl(round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3),0) avg_ms
from (
   select
          e.event_name,
          e.dbid,
          to_char(trunc(cast(begin_interval_time as date ))+
                  (ROUND ((cast(begin_interval_time as  date)- TRUNC (cast(begin_interval_time as date))) * 24) / 24),
                       'YYYY/MM/DD HH24:MI') btime,
          Lag (e.total_waits) OVER( PARTITION BY e.event_name, e.dbid, e.instance_number ORDER BY s.snap_id)
                      count_beg,
          total_waits count_end,
          Lag (e.time_waited_micro/1000) OVER( PARTITION BY e.event_name, e.dbid, e.instance_number ORDER BY s.snap_id)
                                 time_ms_beg,
          time_waited_micro/1000 time_ms_end
   from
          DBA_HIST_SYSTEM_EVENT e,
          DBA_HIST_SNAPSHOT s
   where
            s.snap_id=e.snap_id and
            s.dbid=e.dbid and
            s.instance_number=e.instance_number and
            (e.event_name= 'db file sequential read'
          or e.event_name='db file scattered read' )
) s,
  (select distinct dbid, db_name, host_name from dba_hist_database_instance) n
where
    n.dbid=s.dbid
and
       round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3)  >= 0
order by btime;

Correlating the read/write throughput along with single-block and multiblock read latencies across multiple databases over multiple days can pose challenges. One strategy is to boil the data down to a smaller number of metrics, such as total read throughput, total write throughput, and average single-block I/O latency. For the average single-block I/O latency, I’d want to normalize each database’s latency relative to its read I/O – maybe an average of latencies weighted by the read throughput of each database.
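That weighted average could be sketched like this (untested – `lat` and `thr` are hypothetical views, assumed to return one row per database per hour from the two queries above):

```sql
-- Sketch only: weight each database's average single-block read latency
-- by its read throughput for the same hour, so busy databases dominate
-- the combined number. "lat" and "thr" are hypothetical views with
-- (dbid, btime, value) columns built from the queries above.
select lat.btime,
       sum(lat.value * thr.value) /
       nullif(sum(thr.value), 0)   weighted_avg_latency_ms
  from lat, thr
 where lat.dbid  = thr.dbid
   and lat.btime = thr.btime
 group by lat.btime
 order by lat.btime;
```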
Work in progress …


Comments

  1. Tyler D Muth
    June 21st, 2011 at 02:10 | #1

    L

  2. Tyler D Muth
    June 21st, 2011 at 02:12 | #2

    Funny, I was just writing some related reports. Drop me an email if you would like to collaborate.

    The HTML output from sqlplus is easy to open in excel.

    Tyler

  3. Kyle Hailey
    June 21st, 2011 at 04:22 | #3

    Hey Tyler,

    Thanks for stopping by. I’d definitely be interested in collaborating.

    Been meaning to send you some questions about the AWR formatting – pretty cool (AWR formatter)

    – Kyle

  4. Karl Arao
    June 22nd, 2011 at 03:13 | #4

    Hi Kyle,
    This is my interest too, let me know if I can help ;) I’ve done some research before where I wanted to have the latency on the same row (avg, min, max) with the IO throughput numbers and IO type (single/multiblock) per snap_id, but this requires joining to dba_hist_filestatxs, and if you have a bunch of datafiles that would really make the join slow. So I just took out that portion and added the CPU IO wait; at least that would let me know if I’m IO bound/bottlenecked in a particular period. I’ve got a script called awr_iowl.sql – let me know if that’s helpful.

    Tyler,
    I’ve got some ideas too on the AWR formatting, that tool was very useful on a recent Exadata tuning gig!

    -Karl

  5. Henry Poras
    June 22nd, 2011 at 18:11 | #5

    I compared “physical [read|write] total bytes” in dba_hist_sysstat to the dba_hist_sysmetric_summary and there seems to be a factor of 1000 difference in the MB/sec values. For example, sysmetric_summary shows 17 MB/sec while sysstat shows .017 MB/sec. Anyone else seeing this?

  6. Henry Poras
    June 22nd, 2011 at 18:15 | #6

    Ooops, stupid math error. Never mind.
    @Henry Poras

  7. Kyle Hailey
    June 22nd, 2011 at 21:04 | #7

    One thing to keep in mind: the “metric” tables already have computed values such as rates and deltas which are ready to be used. On the other hand, the standard Oracle statistic tables such as dba_hist_sysstat and dba_hist_system_event have cumulative values since database startup and require deltas to be calculated.
    DBA_HIST_SYSMETRIC_SUMMARY is cool as it already has max, min, avg, std etc:
    SQL> desc DBA_HIST_SYSMETRIC_SUMMARY
    Name
    ————————————
    SNAP_ID
    DBID
    INSTANCE_NUMBER
    BEGIN_TIME
    END_TIME
    INTSIZE
    GROUP_ID
    METRIC_ID
    METRIC_NAME
    METRIC_UNIT
    NUM_INTERVAL
    MINVAL
    MAXVAL
    AVERAGE
    STANDARD_DEVIATION
    SUM_SQUARES
    whereas dba_hist_sysstat is not so much fun:
    SQL> desc dba_hist_sysstat
    Name
    —————————-
    SNAP_ID
    DBID
    INSTANCE_NUMBER
    STAT_ID
    STAT_NAME
    VALUE
    so what would be an easy query on the metrics is harder on the sysstat:

    with stats as (
               select
                      st.stat_name,
                      st.dbid,
                      to_char(trunc(cast(begin_interval_time as date ))+
                              (ROUND ((cast(begin_interval_time as  date)- 
                                TRUNC (cast(begin_interval_time as date))) * 24) / 24),
                                'YYYY/MM/DD HH24:MI') btime,
                  Lag (st.value) OVER( PARTITION BY st.stat_name, st.dbid, st.instance_number ORDER BY sn.snap_id)
                               value_beg,
                  st.value     value_end,
                  (cast(end_interval_time as date) - cast(begin_interval_time as date )) * (24*60*60) delta
           from
                  DBA_HIST_SYSSTAT  st,
                  DBA_HIST_SNAPSHOT sn
           where
                    sn.snap_id=st.snap_id and
                    sn.dbid=st.dbid and
                    sn.instance_number=st.instance_number and
                    (st.stat_name= 'physical read total bytes'
                  or st.stat_name='physical write total bytes' )
               order by begin_interval_time
         ) 
       select
              n.db_name, 
              btime,
              stat_name,
              (value_end-value_beg)/delta rate_per_sec
       from stats st,
           (select distinct dbid, db_name, host_name from dba_hist_database_instance) n
       where n.dbid=st.dbid
        and  (value_end-value_beg) > 0
    /
    

    versus metrics

    select
                   n.db_name||'_'||n.host_name,
                   decode(metric_name,
                       'Physical Write Total Bytes Per Sec','write','read'),
                   to_char(trunc(begin_time)+
                    ( ROUND ((begin_time - TRUNC (begin_time)) * 24) / 24),
                           'YYYY/MM/DD HH24:MI'),
                   average
            from dba_hist_sysmetric_summary           s,
                 (select distinct 
                             dbid, 
                             db_name, 
                             host_name
                    from dba_hist_database_instance) n
            where
                   ( s.metric_name= 'Physical Write Total Bytes Per Sec'
                      or
                     s.metric_name= 'Physical Read Total Bytes Per Sec'
                   ) and n.dbid=s.dbid
                   order by begin_time;
    
  8. Alan Gagne
    July 1st, 2011 at 15:04 | #8

    You might want to look at MOD(b.snap_id,step)=0 to remove the data spikes from database restarts.

  9. Alan Gagne
    July 1st, 2011 at 15:48 | #9

    Alan Gagne :
    You might want to look at MOD(b.snap_id,step)=0 to remove the data spikes from database restarts.

    Sorry feeble mind looking at old scripts.
    I am using startup_time from dba_hist_snapshot on 10.2.
    b.startup_time = e.startup_time

