LGWR redo write times (log file parallel write)

July 14th, 2011

General ***rule of thumb*** I’ve heard and buy into is that decent log file syncs times should be 3ms or under. Well “log file sync” depends on “log file parallel write” times, and “log file parallel write” times depend on the size of the write, thus I looked at some different log write sizes on a Solaris system with redo files on a SAN with plenty of write back cache and 4Gb FC and there are the distributions I saw on this single system

In the following charts

  • X axis is # of blocks written per write
  • Y axis
    1. MB per Second on the bottom two graphs
    2. milliseconds per write on top two graphs

Customer Data

This data was from the 10046 trace files from the lgwr process.Block size is 512bytes.

Stange that the throughput goes to 90MB/s at 4-5 blocks and then drops off precipitously to 10-20MB/s above, when I’d expect I/O to speed up with larger write sizes.

One nagging question for me is why aren’t the log writes faster over 4G FB with write back cache. The fastest log write above 10 blocks (5K) is just under 25MB/s. I’d expect over 100MB/s. Definitely can dd write with oflag=sync  at over 100MB/s.

Nice old page from the formidable Steve Adams http://www.ixora.com.au/notes/log_block_size.htm with query to get block size:

select max(lebsz) from sys.x$kccle;

UPDATE: the above data was from a customer system. I just ran a similar test  on my lab machine and got results of more along the lines I was expecting. On the my lab machine I have the redo on file system mounted over 1GbE NFS  which maxes out around 115MB/s. On a good day I get 108ish MB/s and on a bad day around 50MB/s – it’s a public network (ie company wide). Today was a bad day, but the results are pretty good:

In house data (using NFS storage)

UPDATE


 

Query to get  average  log write write times

select
       dbid,
       btime,
       round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms
from (
select
       s.dbid,
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s
where
       s.snap_id=e.snap_id
   and e.event_name in (
                  -- 'log file sync',
                  'log file parallel write'
                )
   and  s.dbid=e.dbid
)
order by btime
/

Redo Sizes

select   dbid, to_char( begin_time ,'YYYY/MM/DD HH24'),
         round(sum(decode(metric_name,'Redo Generated Per Sec',average,0)) /
         nullif(sum(decode(metric_name,'Redo Writes Per Sec',average,0)),0)/1024,0)  avg_log_write_kb
from     dba_hist_sysmetric_summary
where    metric_name in  (
                           'Redo Writes Per Sec',
                           'Redo Generated Per Sec'
         ) /* and DBID=[dbid if share repository] */
group  by dbid, to_char( begin_time ,'YYYY/MM/DD HH24')
/

 


Uncategorized

  1. Trackbacks

  1. Comments

  2. G.P Singh
    June 13th, 2012 at 23:53 | #1

    Hi,

    The values for me are coming as negative:

    DBID BTIME AVG_MS
    ———- ————— ———-
    3578727330 31-MAY-12 21:00
    3578727330 31-MAY-12 21:00 -429.6
    3578727330 31-MAY-12 21:00 1.3
    3578727330 31-MAY-12 22:00 1.3
    3578727330 31-MAY-12 22:00 -66.5
    3578727330 31-MAY-12 22:00
    3578727330 31-MAY-12 22:00 -66.5
    3578727330 31-MAY-12 22:00 .6
    3578727330 31-MAY-12 22:00 -9.4
    3578727330 31-MAY-12 22:00
    3578727330 31-MAY-12 22:00 -9.4

    DBID BTIME AVG_MS
    ———- ————— ———-
    3578727330 31-MAY-12 23:00
    3578727330 31-MAY-12 23:00 -45.6
    3578727330 31-MAY-12 23:00 1.1
    3578727330 31-MAY-12 23:00 -45.6
    3578727330 31-MAY-12 23:00
    3578727330 31-MAY-12 23:00 -9.3
    3578727330 31-MAY-12 23:00 .5
    3578727330 31-MAY-12 23:00 -9.3

    What does that indicate.

    Thanks
    GP Singh

  3. June 14th, 2012 at 00:27 | #2

    Hi GP Singh,
    Thanks for pointing that out. Could you try the script again? The previous version collected info for both log file parallel write as well as log file sync without grouping them. The current one should just select “log file parallel write”.
    – Kyle


× eight = 24