LGWR redo write times (log file parallel write)
General ***rule of thumb*** I’ve heard and buy into is that decent log file syncs times should be 3ms or under. Well “log file sync” depends on “log file parallel write” times, and “log file parallel write” times depend on the size of the write, thus I looked at some different log write sizes on a Solaris system with redo files on a SAN with plenty of write back cache and 4Gb FC and there are the distributions I saw on this single system
In the following charts
- X axis is # of blocks written per write
- Y axis
- MB per Second on the bottom two graphs
- milliseconds per write on top two graphs
Customer Data
This data was from the 10046 trace files from the lgwr process.Block size is 512bytes.
Stange that the throughput goes to 90MB/s at 4-5 blocks and then drops off precipitously to 10-20MB/s above, when I’d expect I/O to speed up with larger write sizes.
One nagging question for me is why aren’t the log writes faster over 4G FB with write back cache. The fastest log write above 10 blocks (5K) is just under 25MB/s. I’d expect over 100MB/s. Definitely can dd write with oflag=sync at over 100MB/s.
Nice old page from the formidable Steve Adams http://www.ixora.com.au/notes/log_block_size.htm with query to get block size:
select max(lebsz) from sys.x$kccle;
UPDATE: the above data was from a customer system. I just ran a similar test on my lab machine and got results of more along the lines I was expecting. On the my lab machine I have the redo on file system mounted over 1GbE NFS which maxes out around 115MB/s. On a good day I get 108ish MB/s and on a bad day around 50MB/s – it’s a public network (ie company wide). Today was a bad day, but the results are pretty good:
In house data (using NFS storage)
UPDATE
Query to get average log write write times
select dbid, btime, round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms from ( select s.dbid, to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime, total_waits count_end, time_waited_micro/1000 time_ms_end, Lag (e.time_waited_micro/1000) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg, Lag (e.total_waits) OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg from DBA_HIST_SYSTEM_EVENT e, DBA_HIST_SNAPSHOT s where s.snap_id=e.snap_id and e.event_name in ( -- 'log file sync', 'log file parallel write' ) and s.dbid=e.dbid ) order by btime /
Redo Sizes
select dbid, to_char( begin_time ,'YYYY/MM/DD HH24'), round(sum(decode(metric_name,'Redo Generated Per Sec',average,0)) / nullif(sum(decode(metric_name,'Redo Writes Per Sec',average,0)),0)/1024,0) avg_log_write_kb from dba_hist_sysmetric_summary where metric_name in ( 'Redo Writes Per Sec', 'Redo Generated Per Sec' ) /* and DBID=[dbid if share repository] */ group by dbid, to_char( begin_time ,'YYYY/MM/DD HH24') /
Trackbacks
Comments
Hi,
The values for me are coming as negative:
DBID BTIME AVG_MS
———- ————— ———-
3578727330 31-MAY-12 21:00
3578727330 31-MAY-12 21:00 -429.6
3578727330 31-MAY-12 21:00 1.3
3578727330 31-MAY-12 22:00 1.3
3578727330 31-MAY-12 22:00 -66.5
3578727330 31-MAY-12 22:00
3578727330 31-MAY-12 22:00 -66.5
3578727330 31-MAY-12 22:00 .6
3578727330 31-MAY-12 22:00 -9.4
3578727330 31-MAY-12 22:00
3578727330 31-MAY-12 22:00 -9.4
DBID BTIME AVG_MS
———- ————— ———-
3578727330 31-MAY-12 23:00
3578727330 31-MAY-12 23:00 -45.6
3578727330 31-MAY-12 23:00 1.1
3578727330 31-MAY-12 23:00 -45.6
3578727330 31-MAY-12 23:00
3578727330 31-MAY-12 23:00 -9.3
3578727330 31-MAY-12 23:00 .5
3578727330 31-MAY-12 23:00 -9.3
What does that indicate.
Thanks
GP Singh
Hi GP Singh,
Thanks for pointing that out. Could you try the script again? The previous version collected info for both log file parallel write as well as log file sync without grouping them. The current one should just select “log file parallel write”.
– Kyle