AWR mining

June 21st, 2011

Update: nice query from Jonathan Lewis on AWR:

http://jonathanlewis.wordpress.com/awr-sys-stats/

I found this while looking for how to eliminate reporting stats across database restarts. Apparently, when using snap_id, there will be no snap_id-1 across a restart because the ids jump by more than one – have to check into this.
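One possible way around the missing snap_id-1 problem, as a sketch only (untested): DBA_HIST_SNAPSHOT carries a STARTUP_TIME column, so instead of assuming consecutive snap_ids, pair each snapshot with the previous one via LAG and flag the intervals that span a bounce:

```sql
-- Sketch only: flag snapshot intervals that span an instance restart by
-- comparing STARTUP_TIME with that of the previous snapshot, rather
-- than relying on snap_id-1 existing. The first snapshot of each
-- instance also shows up as 'bounce' since it has no predecessor.
select snap_id,
       lag(snap_id) over (partition by dbid, instance_number
                          order by snap_id)  prev_snap_id,
       case
         when startup_time = lag(startup_time)
                             over (partition by dbid, instance_number
                                   order by snap_id)
         then 'ok'
         else 'bounce'   -- deltas across this interval are invalid
       end chk
  from dba_hist_snapshot
 order by dbid, instance_number, snap_id;
```

Joining on matching startup_time (b.startup_time = e.startup_time, as Alan suggests in the comments) accomplishes the same thing when pairing begin/end snapshots directly.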


Exploring ways to mine data in AWR.
One situation I’m looking at now involves a number of databases that all use the same LUNs on the back-end storage, so two metrics that would be interesting to look at are the I/O from these databases and the I/O latencies, to answer the question: “does read and/or write I/O activity correlate with increased latency on the back-end storage?”

I have multiple databases in one AWR repository.

I want to create the output in a format that I can read with something like Excel, so I make the values comma delimited.
AWR’s sysmetric values are hourly, so I round the times to the nearest hour to make correlation easier, and I translate the DBID into the database name to make reading the data easier. I could do something like:

select
               n.db_name||'_'||n.host_name||'_'||
               decode(metric_name,
                   'Physical Write Total Bytes Per Sec','write','read')||','||
               to_char(trunc(begin_time)+
                ( ROUND ((begin_time - TRUNC (begin_time)) * 24) / 24),
                       'YYYY/MM/DD HH24:MI')||','||
               average
        from dba_hist_sysmetric_summary s,
               (select distinct dbid, db_name, host_name
                from dba_hist_database_instance) n
        where
               ( s.metric_name= 'Physical Write Total Bytes Per Sec'
                  or
                 s.metric_name= 'Physical Read Total Bytes Per Sec'
               ) and n.dbid=s.dbid
               order by begin_time;

For the I/O latencies it takes some more work, as I have to compute the deltas between snapshots, getting the total elapsed time and the count of waits, to compute the average wait over the snapshot period for each I/O wait event. I also have to watch out for database bounces, which I have inelegantly addressed by simply filtering out intervals where the computed average is negative (keeping only values greater than or equal to zero). I could do something like:

select
       btime||','||
       n.db_name||'_'||n.host_name||'_'||
       event_name||','||
       nvl(round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3),0) avg_ms
from (
   select
          e.event_name,
          e.dbid,
          to_char(trunc(cast(begin_interval_time as date ))+
                  (ROUND ((cast(begin_interval_time as  date)- TRUNC (cast(begin_interval_time as date))) * 24) / 24),
                       'YYYY/MM/DD HH24:MI') btime,
          Lag (e.total_waits) OVER( PARTITION BY e.event_name, e.dbid, e.instance_number ORDER BY s.snap_id)
                      count_beg,
          total_waits count_end,
          Lag (e.time_waited_micro/1000) OVER( PARTITION BY e.event_name, e.dbid, e.instance_number ORDER BY s.snap_id)
                                 time_ms_beg,
          time_waited_micro/1000 time_ms_end
   from
          DBA_HIST_SYSTEM_EVENT e,
          DBA_HIST_SNAPSHOT s
   where
            s.snap_id=e.snap_id and
            s.dbid=e.dbid and
            s.instance_number=e.instance_number and
            (e.event_name= 'db file sequential read'
          or e.event_name='db file scattered read' )
) s,
  (select distinct dbid, db_name, host_name from dba_hist_database_instance) n
where
    n.dbid=s.dbid
and
       round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3)  >= 0
order by btime;

Correlating the read/write throughput along with single-block and multiblock read latencies across multiple databases over multiple days can pose challenges. One strategy is to boil the data down to a smaller number of metrics, such as total read throughput, total write throughput, and average single-block I/O latency. For the average single-block I/O latency, I’d want to normalize each database’s latency relative to its read I/O – maybe an average of latencies weighted by the read throughput of each database.
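That weighted average could be sketched like this (untested – `lat` and `thr` are hypothetical views, assumed to return one row per database per hour from the two queries above):

```sql
-- Sketch only: weight each database's average single-block read latency
-- by its read throughput for the same hour, so busy databases dominate
-- the combined number. "lat" and "thr" are hypothetical views with
-- (dbid, btime, value) columns built from the queries above.
select lat.btime,
       sum(lat.value * thr.value) /
       nullif(sum(thr.value), 0)   weighted_avg_latency_ms
  from lat, thr
 where lat.dbid  = thr.dbid
   and lat.btime = thr.btime
 group by lat.btime
 order by lat.btime;
```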
Work in progress …


Comments

  1. Tyler D Muth
    June 21st, 2011 at 02:10 | #1

    L

  2. Tyler D Muth
    June 21st, 2011 at 02:12 | #2

    Funny, I was just writing some related reports. Drop me an email if you would like to collaborate.

    The HTML output from sqlplus is easy to open in excel.

    Tyler

  3. Kyle Hailey
    June 21st, 2011 at 04:22 | #3

    Hey Tyler,

    Thanks for stopping by. I’d definitely be interested in collaborating.

    Been meaning to send you some questions about the AWR formatting – pretty cool (AWR formatter)

    – Kyle

  4. Karl Arao
    June 22nd, 2011 at 03:13 | #4

    Hi Kyle,
    This is my interest too, let me know if I can help ;) I’ve done some research before where I wanted to have the latency on the same row (avg, min, max) with the IO throughput numbers and IO type (single/multiblock) per snap_id, but this requires joining to dba_hist_filestatxs, and if you have a bunch of datafiles that would really make the join slow. So I just took out that portion and added the CPU IO wait; at least that would let me know if I’m IO bound/bottlenecked in a particular period. I’ve got a script called awr_iowl.sql – let me know if that’s helpful.

    Tyler,
    I’ve got some ideas too on the AWR formatting, that tool was very useful on a recent Exadata tuning gig!

    -Karl

  5. Henry Poras
    June 22nd, 2011 at 18:11 | #5

    I compared “physical [read|write] total bytes” in dba_hist_sysstat to the dba_hist_sysmetric_summary and there seems to be a factor of 1000 difference in the MB/sec values. For example, sysmetric_summary shows 17 MB/sec while sysstat shows .017 MB/sec. Anyone else seeing this?

  6. Henry Poras
    June 22nd, 2011 at 18:15 | #6

    Ooops, stupid math error. Never mind.
    @Henry Poras

  7. Kyle Hailey
    June 22nd, 2011 at 21:04 | #7

    One thing to keep in mind: the “metric” tables already have computed values such as rates and deltas which are ready to be used. On the other hand, the standard Oracle statistic tables such as dba_hist_sysstat and dba_hist_system_event have cumulative values since database startup and require deltas to be calculated.
    DBA_HIST_SYSMETRIC_SUMMARY is cool as it already has max, min, avg, std etc:
    SQL> desc DBA_HIST_SYSMETRIC_SUMMARY
    Name
    ————————————
    SNAP_ID
    DBID
    INSTANCE_NUMBER
    BEGIN_TIME
    END_TIME
    INTSIZE
    GROUP_ID
    METRIC_ID
    METRIC_NAME
    METRIC_UNIT
    NUM_INTERVAL
    MINVAL
    MAXVAL
    AVERAGE
    STANDARD_DEVIATION
    SUM_SQUARES
    whereas dba_hist_sysstat is not so much fun:
    SQL> desc dba_hist_sysstat
    Name
    —————————-
    SNAP_ID
    DBID
    INSTANCE_NUMBER
    STAT_ID
    STAT_NAME
    VALUE
    so what would be an easy query on the metrics is harder on the sysstat:

    with stats as (
               select
                      st.stat_name,
                      st.dbid,
                      to_char(trunc(cast(begin_interval_time as date ))+
                              (ROUND ((cast(begin_interval_time as  date)- 
                                TRUNC (cast(begin_interval_time as date))) * 24) / 24),
                                'YYYY/MM/DD HH24:MI') btime,
                  Lag (st.value) OVER( PARTITION BY st.stat_name, st.dbid, st.instance_number ORDER BY sn.snap_id)
                               value_beg,
                  st.value     value_end,
                  (cast(end_interval_time as date) - cast(begin_interval_time as date )) * (24*60*60) delta
           from
                  DBA_HIST_SYSSTAT  st,
                  DBA_HIST_SNAPSHOT sn
           where
                    sn.snap_id=st.snap_id and
                    sn.dbid=st.dbid and
                    sn.instance_number=st.instance_number and
                    (st.stat_name= 'physical read total bytes'
                  or st.stat_name='physical write total bytes' )
               order by begin_interval_time
         ) 
       select
              n.db_name, 
              btime,
              stat_name,
              (value_end-value_beg)/delta rate_per_sec
       from stats st,
           (select distinct dbid, db_name, host_name from dba_hist_database_instance) n
       where n.dbid=st.dbid
        and  (value_end-value_beg) > 0
    /
    

    versus metrics

    select
                   n.db_name||'_'||n.host_name,
                   decode(metric_name,
                       'Physical Write Total Bytes Per Sec','write','read'),
                   to_char(trunc(begin_time)+
                    ( ROUND ((begin_time - TRUNC (begin_time)) * 24) / 24),
                           'YYYY/MM/DD HH24:MI'),
                   average
            from dba_hist_sysmetric_summary           s,
                 (select distinct 
                             dbid, 
                             db_name, 
                             host_name
                    from dba_hist_database_instance) n
            where
                   ( s.metric_name= 'Physical Write Total Bytes Per Sec'
                      or
                     s.metric_name= 'Physical Read Total Bytes Per Sec'
                   ) and n.dbid=s.dbid
                   order by begin_time;
    
  8. Alan Gagne
    July 1st, 2011 at 15:04 | #8

    You might want to look at MOD(b.snap_id,step)=0 to remove the data spikes from database restarts.

  9. Alan Gagne
    July 1st, 2011 at 15:48 | #9

    Alan Gagne :
    You might want to look at MOD(b.snap_id,step)=0 to remove the data spikes from database restarts.

    Sorry feeble mind looking at old scripts.
    I am using startup_time from dba_hist_snapshot on 10.2.
    b.startup_time = e.startup_time

