AWR mining
Update: nice query from Jonathan Lewis on AWR:
http://jonathanlewis.wordpress.com/awr-sys-stats/
I found this while looking for a way to stop reporting stats across database restarts. Apparently, when using snap_id there will be no snap_id-1 across a restart because the ids jump by more than one – I have to check into this.
Exploring ways to mine data in AWR.
One situation I’m looking at now is a number of databases that all use the same LUNs on the back-end storage, so two metrics that would be interesting to look at are the I/O from these databases and the I/O latencies, to answer the question, “does read and/or write I/O activity correlate to increased latency on the back-end storage?”
I have multiple databases in one AWR repository.
I want to create the output in a format that I can read with something like Excel, so I make the values comma delimited.
AWR’s sysmetric values are hourly, so I round the times to the nearest hour to make correlation easier, and I translate the DBID into the database name to make reading the data easier. I could do something like:
select n.db_name||'_'||n.host_name||'_'||
       decode(metric_name, 'Physical Write Total Bytes Per Sec','write','read')||','||
       to_char(trunc(begin_time) +
               (ROUND((begin_time - TRUNC(begin_time)) * 24) / 24),
               'YYYY/MM/DD HH24:MI')||','||
       average
from   dba_hist_sysmetric_summary s,
       (select distinct dbid, db_name, host_name
        from   dba_hist_database_instance) n
where  ( s.metric_name = 'Physical Write Total Bytes Per Sec'
      or s.metric_name = 'Physical Read Total Bytes Per Sec' )
and    n.dbid = s.dbid
order by begin_time;
For the I/O latencies it takes some more work, as I have to compute the deltas between snapshots, getting the total elapsed time and the count of waits in order to compute the average wait over the snapshot period for each I/O wait event. I also have to watch out for database bounces, which I have inelegantly addressed by simply filtering for average waits greater than or equal to 0. I could do something like:
select btime||','||
       n.db_name||'_'||n.host_name||'_'||
       event_name||','||
       nvl(round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3),0) avg_ms
from (
       select e.event_name, e.dbid,
              to_char(trunc(cast(begin_interval_time as date)) +
                      (ROUND((cast(begin_interval_time as date) -
                              TRUNC(cast(begin_interval_time as date))) * 24) / 24),
                      'YYYY/MM/DD HH24:MI') btime,
              Lag(e.total_waits) OVER (PARTITION BY e.event_name, e.dbid
                                       ORDER BY s.snap_id) count_beg,
              total_waits count_end,
              Lag(e.time_waited_micro/1000) OVER (PARTITION BY e.event_name, e.dbid
                                                  ORDER BY s.snap_id) time_ms_beg,
              time_waited_micro/1000 time_ms_end
       from   DBA_HIST_SYSTEM_EVENT e,
              DBA_HIST_SNAPSHOT s
       where  s.snap_id = e.snap_id
       and    s.dbid    = e.dbid
       and    ( e.event_name = 'db file sequential read'
             or e.event_name = 'db file scattered read' )
       order by begin_interval_time
     ) s,
     (select distinct dbid, db_name, host_name
      from   dba_hist_database_instance) n
where n.dbid = s.dbid
and   round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3) >= 0
order by btime;
Correlating the read/write throughput along with single- and multi-block read latencies across multiple databases over multiple days can pose challenges. One strategy is to boil the data down to a smaller number of metrics, like total read, total write, and average single-block I/O latency. For average single-block I/O, I’d want to normalize the I/O latency relative to the read I/O of each database – maybe I could take an average of the latencies weighted by the throughput of each database.
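A minimal sketch of that weighted average, assuming the output of the two queries above has been loaded into two hypothetical tables, lat (btime, dbid, avg_ms) and thr (btime, dbid, read_bytes_sec) – these table names are illustrations, not part of AWR:

```sql
-- Sketch: per-hour average single-block read latency across all databases,
-- weighted by each database's read throughput in that hour.
-- lat and thr are hypothetical staging tables fed by the queries above.
select l.btime,
       round( sum(l.avg_ms * t.read_bytes_sec) /
              nullif(sum(t.read_bytes_sec), 0), 3 ) weighted_avg_ms
from   lat l, thr t
where  l.dbid  = t.dbid
and    l.btime = t.btime
group by l.btime
order by l.btime;
```

That way a database pushing 100 MB/s pulls the hourly latency figure far more than an idle one, which is the behavior I’d want when hunting a shared-LUN bottleneck.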
Work in progress …
Comments
Funny, I was just writing some related reports. Drop me an email if you would like to collaborate.
The HTML output from sqlplus is easy to open in excel.
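For reference, SQL*Plus generates that HTML output via the MARKUP setting – a minimal sketch (the spool file name is just an example):

```sql
-- Spool query results as an HTML table that Excel opens directly.
SET MARKUP HTML ON SPOOL ON
SPOOL awr_io.html
-- ... run the AWR queries here ...
SPOOL OFF
SET MARKUP HTML OFF SPOOL OFF
```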
Tyler
Hey Tyler,
Thanks for stopping by. I’d definitely be interested in collaborating.
Been meaning to send you some questions about the AWR formatting – pretty cool ( AWR formatter )
– Kyle
Hi Kyle,
This is my interest too, let me know if I can help ;) I’ve done some research before where I wanted to have the latency (avg, min, max) on the same row with the I/O throughput numbers and I/O type (single/multiblock) per snap_id, but this requires joining to dba_hist_filestats, and if you have a bunch of datafiles that would really make the join slow. So I just took out that portion and added the CPU I/O wait – at least that lets me know if I’m I/O bound/bottlenecked in a particular period. I’ve got a script called awr_iowl.sql, let me know if that’s helpful.
Tyler,
I’ve got some ideas too on the AWR formatting, that tool was very useful on a recent Exadata tuning gig!
-Karl
I compared “physical [read|write] total bytes” in dba_hist_sysstat to the dba_hist_sysmetric_summary and there seems to be a factor of 1000 difference in the MB/sec values. For example, sysmetric_summary shows 17 MB/sec while sysstat shows .017 MB/sec. Anyone else seeing this?
Ooops, stupid math error. Never mind.
@Henry Poras
One thing to keep in mind: the “metric” tables have already computed values such as rates and deltas which are ready to be used. On the other hand, the standard Oracle statistic tables such as dba_hist_sysstat and dba_hist_system_event hold the cumulative values since database startup and require deltas to be calculated.
DBA_HIST_SYSMETRIC_SUMMARY is cool as it already has max, min, avg, std etc:
SQL> desc DBA_HIST_SYSMETRIC_SUMMARY
Name
————————————
SNAP_ID
DBID
INSTANCE_NUMBER
BEGIN_TIME
END_TIME
INTSIZE
GROUP_ID
METRIC_ID
METRIC_NAME
METRIC_UNIT
NUM_INTERVAL
MINVAL
MAXVAL
AVERAGE
STANDARD_DEVIATION
SUM_SQUARES
whereas dba_hist_sysstat is not so much fun:
SQL> desc dba_hist_sysstat
Name
—————————-
SNAP_ID
DBID
INSTANCE_NUMBER
STAT_ID
STAT_NAME
VALUE
so what would be an easy query on the metrics becomes harder on sysstat, where the deltas and rates have to be computed by hand.
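To sketch the contrast for one statistic (physical read bytes per second – the metric name and stat name below are the standard Oracle ones, but treat the exact delta arithmetic as an illustration):

```sql
-- Easy on the metric table: the per-second rate is precomputed.
select snap_id, average
from   dba_hist_sysmetric_summary
where  metric_name = 'Physical Read Total Bytes Per Sec';

-- Harder on sysstat: difference the cumulative counter with LAG
-- and divide by the snapshot interval in seconds.
select s.snap_id,
       ( st.value
         - lag(st.value) over (partition by st.dbid, st.instance_number
                               order by s.snap_id) )
       / nullif( (cast(s.end_interval_time   as date)
                - cast(s.begin_interval_time as date)) * 86400, 0 ) bytes_per_sec
from   dba_hist_sysstat  st,
       dba_hist_snapshot s
where  st.snap_id         = s.snap_id
and    st.dbid            = s.dbid
and    st.instance_number = s.instance_number
and    st.stat_name       = 'physical read total bytes'
order by s.snap_id;
```

The LAG version also inherits the restart problem discussed above: the first delta after a bounce is negative or garbage and has to be filtered out.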
You might want to look at MOD(b.snap_id,step)=0 to remove the data spikes from database restarts.
Sorry feeble mind looking at old scripts.
I am using startup_time from dba_hist_snapshot on 10.2.
b.startup_time = e.startup_time
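Folding that guard into a snapshot self-join might look like this sketch (b and e aliasing the begin and end snapshots; the pairing logic is an illustration, not a tested script):

```sql
-- Sketch: pair consecutive snapshots and require the same startup_time,
-- so no delta is ever computed across a database restart.
select e.snap_id, e.startup_time
from   dba_hist_snapshot b,
       dba_hist_snapshot e
where  e.snap_id         = b.snap_id + 1
and    e.dbid            = b.dbid
and    e.instance_number = b.instance_number
and    b.startup_time    = e.startup_time   -- drop pairs that span a bounce
order by e.snap_id;
```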