Archive for the ‘Uncategorized’ Category

Alleviate load on SANs with Data Virtualization

February 15th, 2013

by Matt Hayward

One of the less obvious advantages of database virtualization is a reduction in the read I/O issued against the underlying physical storage (henceforth called “the SAN”) that ultimately stores the data for virtual databases.

In practice, Delphix prevents around 60% of all non-production database I/O* from ever being issued to the SAN with the Delphix cache.

This is possible because Delphix Server accommodates large amounts of RAM**, which is used as an auxiliary cache above and beyond the database buffer cache that resides on the database server.  When the database needs to read some data that is not present in the database buffer cache, an I/O is issued to the Delphix Server. Delphix then checks its own cache, and only passes the I/O request down to the SAN if the necessary data is not already in the Delphix cache.
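The read path described above can be sketched as a small read-through cache. This is only an illustration: the class name, the `read_from_san` callback, and the LRU eviction policy are assumptions made for the example, since the post does not describe how the Delphix cache decides what to evict.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy LRU read-through cache between a database host and the SAN.

    Hypothetical illustration only; not the Delphix implementation.
    """

    def __init__(self, capacity_blocks, read_from_san):
        self.capacity = capacity_blocks
        self.read_from_san = read_from_san  # callable: block_id -> bytes
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        # Cache hit: serve from RAM, no SAN I/O is issued.
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Cache miss: pass the request down to the SAN and remember the result.
        self.misses += 1
        data = self.read_from_san(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


san_reads = []


def fake_san(block_id):
    """Stand-in for physical storage; records every read that reaches it."""
    san_reads.append(block_id)
    return f"block-{block_id}".encode()


cache = ReadThroughCache(capacity_blocks=2, read_from_san=fake_san)
for block_id in [1, 2, 1, 1, 3, 2]:
    cache.read(block_id)

print("reads that reached the SAN:", san_reads)
print("cache hit ratio:", round(cache.hit_ratio(), 2))
```

Only the misses reach `fake_san`, which is the effect the post attributes to the Delphix cache.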

Delphix cache hits bring dual performance benefits:

  • Virtual database I/O service times for reads are fast: a few hundred microseconds plus network latency. These ~1 millisecond latencies are 5-20 times faster than traditional SAN random read access times.
  • Serving these I/O requests from the Delphix cache reduces the load on the SAN.

Delphix’s minimum system configuration requires 16 GB of RAM; however, most customers configure their Delphix Servers with 64 GB or more. Thanks to these large caches, Delphix Servers consistently have a 60-70% cache hit ratio.

In the last two years, I’ve collected database I/O statistics from 469 production and non-production Oracle databases across 25 companies in a variety of industries and with diverse applications.  Studying these statistics gives the following findings for this particular sample:

  • Non-production databases account for 70% of all databases

Focusing on these non-production databases, which are the first candidates for database virtualization:

  • Reads accounted for 87% of database I/O, writes for only 13%.
  • Grouped by company, the highest read proportion observed was 97%, the lowest 61%.
  • With Delphix’s typical 60-70% cache hit ratio, full virtualization of non-production environments would eliminate between 52.2% and 60.9% of total non-production database I/O from the SAN (87% reads multiplied by a 60-70% hit ratio).
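The 52.2% and 60.9% figures are the read share of I/O multiplied by the cache hit ratio. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
READ_FRACTION = 0.87  # reads as a share of non-production database I/O (from the sample)


def io_eliminated(read_fraction, cache_hit_ratio):
    """Fraction of total database I/O that never reaches the SAN.

    Only reads can be served from cache, so the saving is the product
    of the read share and the cache hit ratio.
    """
    return read_fraction * cache_hit_ratio


for hit_ratio in (0.60, 0.70):
    saved = io_eliminated(READ_FRACTION, hit_ratio)
    print(f"{hit_ratio:.0%} hit ratio -> {saved:.1%} of I/O avoided")
```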

To give those findings a sense of scope, in the two years I’ve been working at Delphix, eliminating 60% of non-production I/O would amount to eliminating 59 petabytes of I/O.

To put that into perspective, Amazon Elastic Block Store charges $0.10 per million I/O requests. Assuming an average I/O request size of 8 kilobytes, it would cost you around $800,000 to do 59 petabytes of I/O in Amazon EBS.

That’s a truckload of I/O – literally: it takes around 120 milliwatt hours of energy to read 256 megabytes of data from disk, so reading 59 petabytes would require around 29 megawatt hours. That is roughly equivalent to the energy in 2.55 tons of oil (or 25 tons of TNT*** if that’s more your style).
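The cost and energy estimates can be reproduced with a few lines of arithmetic. The sketch below assumes binary units (1 PB = 2^50 bytes), which the post does not state, and uses the standard definitions of a tonne of oil equivalent (11.63 MWh) and a ton of TNT (4.184 GJ).

```python
KILOBYTE = 2**10
MEGABYTE = 2**20
PETABYTE = 2**50  # assumption: binary units; the post does not say which it uses

io_bytes = 59 * PETABYTE

# Amazon EBS pricing quoted in the post: $0.10 per million I/O requests, 8 KB each.
requests = io_bytes / (8 * KILOBYTE)
ebs_cost_usd = requests / 1_000_000 * 0.10

# Energy: about 120 milliwatt hours (0.120 Wh) per 256 MB read from disk.
reads_of_256mb = io_bytes / (256 * MEGABYTE)
energy_mwh = reads_of_256mb * 0.120 / 1_000_000  # Wh -> MWh

TOE_MWH = 11.63              # one tonne of oil equivalent, in MWh
TNT_TON_MWH = 4.184 / 3.6    # one ton of TNT is 4.184 GJ; 1 MWh is 3.6 GJ

print(f"EBS cost: ${ebs_cost_usd:,.0f}")
print(f"Energy:   {energy_mwh:.1f} MWh")
print(f"Oil:      {energy_mwh / TOE_MWH:.2f} tonnes of oil equivalent")
print(f"TNT:      {energy_mwh / TNT_TON_MWH:.1f} tons")
```

With decimal units instead of binary ones the results come out roughly 10% lower, which is still in line with the rounded figures in the text.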

By using inexpensive server RAM as a secondary cache for virtual databases, Delphix can dramatically improve database read I/O performance, and eliminate 50-60% of non-production database I/O from the SAN.  This improves performance of virtual databases, and allows other applications to get more out of a shared SAN.

 

Footnotes:

* Delphix eliminates around 60% of I/O associated with the operation of non-production databases.  It also eliminates huge amounts of I/O on both production and non-production systems formerly used to create, copy, and restore full backups – although this is not quantified in this blog post.

** The upper bound on RAM assignable to the Delphix Server is constrained by limitations of the hypervisor or underlying physical host long before it approaches limits inherent in the Delphix OS.

*** http://en.wikipedia.org/wiki/Tonne_of_oil_equivalent, http://en.wikipedia.org/wiki/Tons_of_TNT

Uncategorized

RMOUG Feb 11-13, 2013

February 8th, 2013

I’ll be presenting twice at RMOUG next week on how to manage copies of Oracle databases, covering thin provision cloning as well as database virtualization.

  • Feb 12, 2013 11:15 room 402
  • Feb 12, 2013   1:15 room 407

The first talk covers the technologies in the industry.

The second talk concentrates primarily on Delphix database virtualization but will hit on the other technologies as well.

If you are going, I highly recommend the Guidebook app for the conference. It’s awesome. I wish Oracle had this at Open World. It makes browsing presentations and building a schedule a piece of cake. You can install it by either of these methods:

  • Download ‘Guidebook’ from the Apple App Store or the Android Marketplace
  • Visit http://guidebook.com/getit from your phone’s browser

or

I just took my iPhone and pointed it at the following QR code with a free QR code reader app, and up popped the “Guidebook” install on my iPhone. I installed it, then chose RMOUG under conferences and voilà, I had the RMOUG guide!

The speaker list is awesome this year. Look at all the headliners coming in to talk; some of their talks are listed here.

Oracle ACEs

Michael Abbey        John King
Karl Arao            Christo Kutrovsky
Jordan Braunstein    Michael Messina
Stewart Bryson       Karen Morton
Andy Colvin          Chris Ostrowski
Dominic Delmolino    Kellyn Pot’Vin
Mark Farnham         Jared Still
Michael Fons         Mike Swing
Kyle Hailey          George Trujillo
Jerry Ireland        Dan Vlamis
Jeff Jacobs          Martin Widlake

Oracle ACE Directors

Bradley Brown        Cary Millsap
Sheeri Cabral        Daniel Morgan
Tim Gorman           Arup Nanda
Kent Graziano        Mogens Nørgaard
Frits Hoogland       Kerry Osborne
Dan Hotka            Scott Spendolini
Peter Koletzke
Debra Lilley

Jonathan Lewis tests Delphix

February 7th, 2013

Ask Jonathan about Delphix at

http://jonathanlewis.wordpress.com/2013/02/06/delphix/

Jonathan Lewis has graciously accepted an offer to come out to sunny California next month and spend a few days at Delphix! Jonathan will be putting Delphix through its paces. I’m super excited to have Jonathan test Delphix and to learn what he discovers. We are planning to co-present a webinar on the findings, and Jonathan will be blogging his findings as well.

If you have questions for Jonathan about Delphix, or scenarios you’d like him to test, ask him by commenting on his blog post on Delphix:

Jonathan Lewis’ blog post on Delphix

As a performance architect at Delphix for the past 2 years, I’ve been involved in all sorts of Delphix performance related work, benchmarks and tests. I’ve spent much of that time personally pounding on Delphix and I am convinced that the technology is rock solid, fast and agile. I can’t understand why every Oracle shop doesn’t already have Delphix in place. It’s incredible. Without Delphix, the cloning process is like dragging huge weights around. With Delphix, the cloning process is fast and light, like having enormous power at one’s fingertips.

For more information on Delphix see: http://www.delphix.com/products/how-it-works/

 

 

 

Why does my full table scan take 10x longer today?!

February 5th, 2013

Every so often a DSS query that usually takes 10 minutes ends up taking over an hour (or one that takes an hour never seems to finish).

Why would this happen?

When investigating the DSS query, perhaps with wait event tracing, one finds that the query, which is doing full table scans and should be doing large multi-block reads and waiting for “db file scattered read”, is instead waiting for single block reads, i.e. “db file sequential read”. What the heck is going on?

Sequential reads during a full table scan that should be doing scattered reads are a classic sign of reading rollback (undo), and reading undo can make a full table scan that normally takes minutes take hours.

This happens especially after overnight jobs. If an overnight job fails to finish before the DSS query runs, and that job does massive updates without committing until the end, then the DSS query has to apply undo to roll back those uncommitted changes, block by block, to get a read-consistent view of the tables it is accessing.

How do we quickly identify whether this is our issue?

ASH is good at identifying it. On the other hand, it’s often impractical to whip up an ASH query from scratch, and that’s where ashmasters on Github comes in. This ASH query and others are on Github under ashmasters.

see https://github.com/khailey/ashmasters

For this case specifically see:

https://github.com/khailey/ashmasters/blob/master/ash_io_top_obj_advanced.sql

Here is the output (in a slightly different format than in the Github repository) of a query I used in my Oracle Performance classes:

AAS SQL_ID           %  OBJ              TABLESPACE
----- -------------  ---  ---------------  ----------
  .18 0yas01u2p9ch4    6  ITEM_PRODUCT_IX  SOEINDEX
                       6  ORDER_ITEMS_UK   SOEINDEX
                      88  ITEM_ORDER_IX    SOEINDEX
  .32 6v6gm0fd1rgrz    6  MY_BIG_Table     SOEDATA
                      94  UNDO             UNDOTBS1

i.e. 94% of the second SQL_ID’s I/O was coming from UNDO. These reads are single block reads, and they tremendously slow down the full table scan.
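The breakdown in that output is just a count of I/O samples per object for each SQL_ID. The sketch below shows the aggregation on made-up samples; the sample rows, function names, and the 50% threshold are all hypothetical, and the real query in the ashmasters repository runs against ASH data inside the database rather than in Python.

```python
from collections import Counter, defaultdict

# Hypothetical ASH-style samples: (sql_id, object_name, wait_event).
samples = (
    [("6v6gm0fd1rgrz", "UNDO", "db file sequential read")] * 94
    + [("6v6gm0fd1rgrz", "MY_BIG_Table", "db file scattered read")] * 6
)


def io_share_by_object(samples):
    """Return {sql_id: {object_name: percent of that SQL's I/O samples}}."""
    per_sql = defaultdict(Counter)
    for sql_id, obj, _event in samples:
        per_sql[sql_id][obj] += 1
    shares = {}
    for sql_id, counts in per_sql.items():
        total = sum(counts.values())
        shares[sql_id] = {obj: round(100 * n / total) for obj, n in counts.items()}
    return shares


def undo_bound(shares, threshold_pct=50):
    """SQL_IDs whose I/O samples are dominated by reads from UNDO."""
    return [sql_id for sql_id, objs in shares.items()
            if objs.get("UNDO", 0) >= threshold_pct]


shares = io_share_by_object(samples)
print(shares)
print("queries mostly reading undo:", undo_bound(shares))
```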

 

Oaktable World video: Database Virtualization and Instant Provisioning

February 4th, 2013

Slides available at: Database Virtualization and Instant Cloning

Thanks to Marcin Przepiorowski  for editing videos and Tim Gorman for funding the videos. For a full list of Oaktable World 2013 videos see http://dboptimizer.com/oaktable-world/

A completely new and totally different database virtualization presentation will be given at

  • RMOUG  Feb 12, 2013 11:15 room 402 “technical”  and 1:15  rm 407  “marketing” with technical information
  • NoCOUG Feb 21, 2013
  • HOTSOS Mar 5, 2013

 

What is Database Virtualization?


 Perhaps the single largest storage consolidation opportunity in history

By Kyle Hailey, Delphix http://delphix.com
January, 2013

Brief

How would you like to

  • Double development output
  • Lighten DBA work load
  • Reduce storage

Existing database cloning technologies allow increased development output, fewer bugs in production, and reduced DBA workload.  Database virtualization, built upon these technologies, can greatly increase these gains.  In this paper we’ll examine the history of using database clones to improve application development and the technical advances of thin provisioned clones and ultimately database virtualization that allow massive gains in productivity.

Introduction

Oracle estimates that customers deploy, on average, 12 clones of production databases to non-production environments. These database clones are used to support the software development lifecycle: developing new functionality, and testing new versions of applications through quality assurance (QA) and user acceptance testing (UAT) prior to production. The clones are also used for reporting and ad hoc information queries. Further, Oracle predicts this average will double by the time Oracle 12c is adopted.* Today, most cloning is accomplished by creating full physical copies of production databases. These full physical copies are time consuming to make, require significant DBA time and storage space, and generally lead to project delays.
Development demands preclude organizations from working directly with the production database.  Development of new versions of applications must be performed in a sandbox where schema changes and data additions, subtractions, and manipulations can be performed without affecting business continuity.  After development, QA and UAT testing must be done on a system that matches the development specifications, along with suitable data.  Finally, ad hoc and reporting queries can have unexpected resource consumption which negatively affects performance on production systems.
Development and QA processes can further exacerbate the need for copies. Developers generally work on separate branches of code which can have associated requirements for database schema changes or specific datasets. If developers are sharing a database copy, the job falls to the developers to make sure they approve any changes and that these changes are compatible with what everyone else is working on. This process of approving changes alone can take weeks, and much more time is added debugging when data or schema changes break others’ code. Ideally, developers would operate in a sandbox with their own copy of the production database.
QA teams generally run multiple regression test suites, validating that the newly developed functionality works and that existing functionality hasn’t broken. When working with a single copy of a production database, this puts QA in a bind: they have to run all test suites either simultaneously or serially. When the test suites are run simultaneously, teams run the risk of compromising the results as data are modified by multiple independent tests. Test suites can be run serially, refreshing the database copy after each test, but at a massive hit to productivity. Much like with development, the ideal scenario is a production clone for each test suite.
As an example scenario, a customer with a 1 terabyte database, 100 developers and 20 test suites would need close to 130 production database copies (one database copy per developer and per test suite, and a few extra for branching, merging, ad hoc queries, and reporting). Understandably, very few companies have the resources (DBA time, storage) to provision these, let alone keep them refreshed for the duration of the project.
Given the high demand for clones of production databases, companies and DBAs often struggle to keep up and must make sacrifices in quality or quantity. The compromises reached are generally fewer, shared databases, partial subset databases, or a mixture of both.

Solutions

Development productivity gains, reduction of production bugs, and DBA time savings have been available without extra licenses through little-known functionality in Oracle since version 11.2.0.2. Even greater productivity gains are available with industry leading technologies, supporting additional versions of Oracle and other leading databases. These technologies enable productivity gains by reducing the workload and resources required to provision multiple copies of production databases.
In our previous example, creating 130 copies of a 1TB database is easily possible in the space of a single copy of the production database using thin provision cloning. Thin provision cloning gives enormous disk savings by sharing the majority of source database data blocks. A large portion of database blocks across multiple copies of a database remain the same, so thin provision cloning allows those unchanged blocks to be shared between different clones. This technology ultimately led to database virtualization, which goes beyond thin clone provisioning to dramatically reduce the overhead of managing many cloned databases, providing significant agility to development teams.
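To make the space saving concrete, here is a back-of-the-envelope comparison for the 130-copy example. The per-clone change rate of 0.5% is purely an assumed figure for illustration; the paper does not give one, and real rates depend on the workload.

```python
def full_clone_storage_tb(source_tb, copies):
    """Every full physical clone stores its own copy of every block."""
    return source_tb * copies


def thin_clone_storage_tb(source_tb, copies, changed_fraction):
    """One shared copy of the source plus each clone's privately changed blocks."""
    return source_tb + source_tb * changed_fraction * copies


SOURCE_TB = 1
COPIES = 130
CHANGED_FRACTION = 0.005  # assumption: each clone modifies 0.5% of its blocks

full = full_clone_storage_tb(SOURCE_TB, COPIES)
thin = thin_clone_storage_tb(SOURCE_TB, COPIES, CHANGED_FRACTION)
print(f"full physical clones: {full} TB")
print(f"thin clones:          {thin:.2f} TB")
```

Under that assumption the thin clones need under 2 TB before any compression, against 130 TB for full copies.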
Database virtualization is based on the core technology of thin provision cloning, which provides clones of production databases in less space and time than making full physical copies.  Database virtualization evolves this technology to provide specific management controls, allowing virtual databases to be created, refreshed, rolled back, cloned, branched  and deleted in minutes. Virtual databases can be provisioned from any time frame (down to the second) within the source database’s retention window.
This functionality allows each developer and each QA test suite to have their own full copy of a production database. Further, developers and testers can have access to weeks’ worth of backup databases, in the space of a single backup. These backups can be brought online in minutes, the data reviewed or extracted, and the copy removed in minutes. Database virtualization allows DBAs to stop making compromises: they can provide any number of databases without worrying about the scope of the effort or the space required, and developers and testers can ensure significantly higher quality with more complete data.
To recap, the three industry technologies available for making clones are:

  1. Full physical clone
  2. Thin provisioned clone
  3. Database virtualization

Next we’ll describe how each of these technologies solves the problems presented by creating copies of production databases, and the benefits that each evolutionary step provides.

Technologies

Each of the technologies follows along an evolutionary path – full physical clones, thin provision clones, and database virtualization offer the ability to create multiple copies of production databases, but where they differ is in implementation feasibility and automation.

Full Physical Clone

Full physical clones are the classic way to make copies of production databases for non-production environments. Full copies are just that: an entirely new instance of a database, separate from the production systems. These clones are time consuming, resource intensive, and space consuming. On average, the time to create a full physical clone is about two weeks from initial request to usable database instance. To DBAs the core issue is clear: significant work and time are invested to make exact copies, yet most of the copied data is never modified, so the majority of the data blocks are and will remain identical. Further, the database copies DBAs create are immediately out of date, and there is no easy management solution for maintaining, refreshing, or modifying these clones. Database copies can be created, but significant effort is required from the DBA, development and QA teams to work around the limitations of the system.

Thin Provisioned Cloning

Thin provisioned cloning was the first technology to address the issue of storing large numbers of identical data blocks. Thin provisioning introduces a new layer over a copy of a source database. Each clone has a separate thin layer where the clone maintains its changes to the central copy, which remains unchanged. As each clone has a separate thin layer that only it can see, each has the appearance of being a full physical copy of the source database. Thin provisioning can eliminate much of the space demand of database copies, reducing the associated storage cost of non-production database copies.
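The thin layer over a shared copy is essentially copy-on-write. The toy model below shows the idea with a dictionary of blocks; the class and its methods are hypothetical and only meant to show why each clone looks like a full copy while storing just its own changes.

```python
class ThinClone:
    """Toy copy-on-write clone: a private delta layered over a shared base copy."""

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared by all clones, never modified
        self._delta = {}          # this clone's private changes

    def read(self, block_no):
        # Prefer the clone's own version of a block, else fall back to the base.
        return self._delta.get(block_no, self._base[block_no])

    def write(self, block_no, data):
        # Writes land in the private layer; the shared base stays untouched.
        self._delta[block_no] = data

    def private_blocks(self):
        return len(self._delta)


base = {0: "header", 1: "orders v1", 2: "customers v1"}
dev = ThinClone(base)
qa = ThinClone(base)

dev.write(1, "orders v2 (dev change)")

print(dev.read(1))   # the dev clone sees its own change
print(qa.read(1))    # the QA clone still sees the original block
print(dev.private_blocks(), qa.private_blocks())
```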
There are three categories of thin provisioning technology:

  1. Single point in time
  2. Multiple but limited points in time
  3. Multiple but limited points in time in a rolling window

Single Point in Time

Single point in time thin provision cloning is the simplest thin provisioning technology, but the least flexible. It takes a full database backup at a point in time and allows multiple clones to open this backup. The technical innovation is allowing each clone to write any changes to a private area, so each clone shares the majority of data blocks with the other clones, while the private change area makes it appear to each clone as if it has a full size read/write copy of the database. The downside to this technology is that it does not account for database refreshes: any time a clone requires a newer version of the source database, an entire new copy of the source database has to be made. Further, it is only appropriate for situations in which high performance is not a key requirement, as it is notably slower than its physical counterparts. Finally, there is significant scripting required and limited documentation available, meaning that the onus is on the DBA to manage and own the environment.
Oracle first offered this technology in an obscure feature called CloneDB in Oracle 11.2.0.2; however, it has performance and management overhead even in limited use and is not appropriate for enterprise level development.

Multiple limited clone versions

To address the issue of database refreshes, EMC and Fujitsu offer thin provisioned cloning technology which allows sharing data blocks across multiple versions of the source databases. This technology is based on file systems that can take point-in-time snapshots.  The point-in-time snapshot can be cloned to provide a private read/write version of that file system. As changes come into the file system from the source database, new file system snapshots and clones can be created allowing multiple point in time database views.
Unfortunately, after a limited number of snapshots (generally around ten), the system has to be rebuilt, requiring a complete new copy of the original database. In addition to periodic rebuilds, these systems also incur major performance hits. The performance hits can be so serious with VMware’s Data Director linked clone technology that VMware recommends against using it for Oracle databases.

Continuous data versions

NetApp offers the ability to not only snapshot and then create clones from the snapshots but also drop any blocks from the original snapshot that are no longer needed, allowing a continuous rolling window of snapshots from the source database. Custom retention windows can be set up – new data blocks are constantly added and old data blocks dropped. As an example, if a two week retention window was desired, the system could snapshot the source database once a day and clones could share snapshots anywhere in that two week window. Blocks particular to snapshots falling outside of the two week time window could be dropped, thus allowing the system to run continuously without requiring rebuilds.
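The two week rolling window in that example boils down to a simple retention rule. A minimal sketch, assuming one snapshot per day and a hypothetical `prune_snapshots` helper (this is not any vendor's API):

```python
from datetime import date, timedelta


def prune_snapshots(snapshots, today, retention_days=14):
    """Keep only the snapshots that fall inside the rolling retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [snap for snap in snapshots if snap >= cutoff]


today = date(2013, 1, 31)
# 21 daily snapshots, oldest first: 20 days ago through today.
snapshots = [today - timedelta(days=n) for n in range(20, -1, -1)]

kept = prune_snapshots(snapshots, today)
print(f"{len(snapshots)} snapshots taken, {len(kept)} kept")
print("oldest snapshot still available:", kept[0])
```

Blocks referenced only by the dropped snapshots can then be freed, which is what lets the system run continuously without a rebuild.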
While this offers quite a bit of functionality not possible with other thin provisioned clones, there are a number of serious downsides that prevent most enterprises from deploying it.  

  • Hardware Lock-in: To provision this functionality NetApp requires buying specialized hardware which requires unique administration.  Administrators using this functionality with the NetApp hardware are required to write custom scripts to set up the system.
  • LUN-Level Snapshots: NetApp works on LUNs, taking snapshots and making clones of the full LUN as opposed to the datafiles. As it works at the LUN level, it cannot detect any corruption in the datafiles that would otherwise be found using RMAN APIs to get the database backups.
  • Custom Scripting: Custom scripting is required to make the original database backup and keep the backup updated with changes from the source database.
  • Clone Creation: NetApp doesn’t supply any functionality to actually provision the clone databases, and clones can only be made from snapshots.  
  • Clone Flexibility: As clones can only be made from snapshots, a number of key use cases cannot be accomplished: clones can’t be created from an arbitrary timestamp, can’t be rolled back, and can’t be branched.

Oracle’s ZFS storage appliance has a capability similar to NetApp’s but requires even more scripting and manual administration, and thus has seen little to no uptake.

Database Virtualization

Thin provisioned cloning has been around for almost two decades, yet it has seen very limited uptake due to the need for specialized hardware, expert knowledge, and scripting. These barriers to entry and the limited set of use cases have ensured that thin provisioned cloning remains an underutilized technology. Database virtualization was invented to take the benefits of thin provisioned clones, couple them with simple management, and provide significantly more data agility through on-demand database access.
Database virtualization takes the core technology of thin provisioned cloning and extends it providing the ability to:

  • Automate initial source database backup, snapshots, and redo log collection
  • Automate data retention, clearing out data older than the designated time window
  • Automate provisioning a clone from any SCN or second
  • Provision clones from multiple sources to the same point in time
  • Enable cloning of clones, branching clones, and rolling back clones
  • Efficiently store all the changes from the source database
  • Run continually and automatically
  • Let end users provision their own virtual databases
  • Be easy enough to be run by a non-DBA, non-sysadmin
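One plausible way to provision from any second, given periodic snapshots plus collected redo, is to start from the latest snapshot at or before the requested time and roll forward with redo from there. The sketch below illustrates only that selection step; the function, the use of plain integer timestamps, and the snapshot-plus-redo approach itself are assumptions for illustration, not a description of how any particular product works.

```python
from bisect import bisect_right


def plan_provision(snapshot_times, target_time):
    """Pick the base snapshot and the redo range needed to reach target_time.

    snapshot_times must be sorted ascending. Times are plain integers here
    (for example seconds); a real system would more likely track SCNs.
    """
    idx = bisect_right(snapshot_times, target_time) - 1
    if idx < 0:
        raise ValueError("target is older than the oldest retained snapshot")
    base = snapshot_times[idx]
    return base, (base, target_time)


daily_snapshots = [0, 86_400, 172_800]  # hypothetical: one snapshot per day
base, redo_range = plan_provision(daily_snapshots, 100_000)
print("start from snapshot taken at", base)
print("replay redo over", redo_range)
```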

Database virtualization technology allows virtual databases to be made in minutes, taking up almost no space since the virtual database only creates new control files, redo log files and a new temporary tablespace. All the rest of the data is initially shared. This allows the following advantages:

  1. Databases on demand
  2. Faster development
  3. Higher quality testing
  4. Hardware reduction

Databases on Demand

Virtual databases can be self-provisioned in a matter of minutes, eliminating significant bureaucracy. Provisioning full physical copies can take weeks; virtual databases take minutes because they eliminate both the time to copy the production data and the time spent requesting, discussing, processing and allocating resources. When a developer needs a clone, they typically have to ask their manager, DBA, storage admin, etc. The managerial decision making, administrative tasks and coordination meetings often take weeks. With database virtualization, all of that overhead can be eliminated. The developer can provision their own virtual database in minutes, with almost no storage overhead.

Faster development

As the resource and operational costs of providing database copies are eliminated with database virtualization, teams of developers can go from sharing one full physical production copy to each having their own private copy. With a private copy of the database, a developer can change schema and metadata as fast as they want instead of waiting days or weeks of review time to check in changes to a shared development database.

Higher quality testing

With as many virtual databases as needed, QA teams no longer need to rely on one full copy of the source database on which to run tests. With a single database, QA teams often have to stop, refresh, and ensure they’re not overlapping tests. With database virtualization, QA can run many tests concurrently, and the virtual databases can be refreshed back to the original state in minutes, allowing immediate repeated replay of test suites, captured workloads and patch applications.

Hardware reduction

Database virtualization can dramatically reduce the amount of storage required for database copies. As the majority of the data blocks are identical across copies, database virtualization only requires storing the changed blocks, and even those can be compressed.
Database virtualization not only saves disk space but can also save RAM.  RAM on the virtual database hosts can be minimized because virtual databases share the same data files and can share the same blocks in the file system cache. No longer does each copy require private memory to cache the data.

Database Virtualization Examples

Delphix example

The Delphix Server is a software stack that implements database virtualization using the Delphix file system (DxFS). The Delphix Server automates the process of database virtualization and management, and doesn’t require any specialized hardware. It only requires an x86 box to run the software and access to LUNs with about the same amount of disk space as the database to be virtualized. The source database is backed up onto the Delphix virtual appliance via automated RMAN APIs, and the data is compressed. Delphix automates syncing the local copy with changes in production and freeing data blocks that fall outside the retention window, and it handles the provisioning of virtual databases. A virtual database can be provisioned from any SCN or second in time during the retention window (typically two weeks).

Oracle Example

Oracle is enabling database virtualization in Oracle 12c with the Snapshot Manager Utility (SMU), a separately licensed, paid software utility. The utility runs on the Oracle ZFS storage appliance, where the source database data files are stored.

Summary

Thin provision cloning has been around for nearly two decades but has not been widely adopted due to the high barriers to entry. These barriers, including specialized hardware, recurring system rebuilds, specialized storage administrators, and custom scripting, have led to the de facto solution being physical clones. Lacking a more attractive option, companies have opted to create full or partial physical clones and deal with the ramifications of incomplete datasets, refresh difficulty, and concurrent use. With database virtualization, the hardware and management barriers have finally been eliminated, allowing enterprises to offer significant database agility.

 

Appendix


Here is a list of the technologies that can be used to create thin provision clones:

  • EMC – system rebuild issues after a few snapshots, hardware lock-in, requires advanced  scripting, performance issues
  • NetApp – hardware lock-in, size limitations, requires advanced  scripting
  • Clone DB (Oracle) – single version of source database only, performance issues, requires advanced scripting
  • ZFS Storage Appliance (Oracle)  – hardware lock-in, requires advanced scripting
  • Data Director (VMware) – system rebuild issues, performance issues, x86 databases only, officially not supported for thin provision cloning of Oracle databases
  • Oracle 12c Snapshot Manager Utility (SMU) – hardware lock-in, requires the source database to have its datafiles located on the Oracle ZFS Appliance
  • Delphix – automated solution for both administrator and end user. Delphix works for Oracle 9, 10, 11 on RAC, Standard Edition and Enterprise Edition. Fully automated with time retention windows and end user self service provisioning. Also supports SQL Server databases. With Delphix there are no size restrictions, and clones and snapshots are unlimited. Snapshots can even be taken of snapshots, creating branched versions of source databases.

References


 

  • CloneDB
    • http://www.oracle-base.com/articles/11g/clonedb-11gr2.php
    • http://oracleprof.blogspot.ie/2013/01/how-dnfs-database-clone-works-part-1.html
  • ZFS
    • http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf
  • ZFS Appliance
    • http://www.oracle.com/technetwork/articles/systems-hardware-architecture/cloning-solution-353626.pdf
  • Data Director
    • http://www.virtuallyghetto.com/2012/04/scripts-to-extract-vcloud-director.html
    • http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
    • http://myvirtualcloud.net/?p=1222 (linked clone)
  • EMC
    • https://community.emc.com/servlet/JiveServlet/previewBody/11789-102-1-45992/h8728-snapsure-oracle-dnfs-wp.pdf
  • NetApp
    • http://media.netapp.com/documents/snapmanager-oracle.pdf
    • https://communities.netapp.com/docs/DOC-10323 (FlexClone)
    • http://blog.thestoragearchitect.com/2010/08/02/netapp-the-inflexibility-of-flexvols/
  • Delphix
    • http://delphix.com


* Charles Garry, Oracle keynote at NYOUG in Dec 2012

 

 

Oracle I/O latency monitoring

January 30th, 2013

One thing that I have found sorely missing in the performance pages of Enterprise Manager is latency values for the various types of I/O. The performance page or top activity may show high I/O waits, but it won’t indicate whether the I/O latency is unusually high. So I put together a shell script that shows latency for the main I/O waits:

  • db file sequential read
  • db file scattered read
  • log file parallel write
  • direct path read
  • direct path read temp

Of course it would be nice to add a few others like direct path write, direct path write temp and log file sync, but there is only so much room in the screen width.

The script is called oramon.sh and is available on github at

https://github.com/khailey/oramon/blob/master/oramon.sh

Example:

$  oramon.sh
Usage: oramon.sh [username] [password] [host] [sid] <port=1521> <runtime=3600>

$ ./oramon.sh system sys 172.16.100.81 vsol
RUN_TIME=-1
COLLECT_LIST=
FAST_SAMPLE=iolatency
TARGET=172.16.100.81:vsol
DEBUG=0
Connected, starting collect at Wed Apr 18 18:41:13 UTC 2012
starting stats collecting

   single block       logfile write       multi block      direct read   direct read temp
   ms      IOP/s        ms    IOP/s       ms    IOP/s       ms    IOP/s       ms    IOP/s
   20.76    27.55    32.55      .71     3.50      .00      .00      .01      .00      .00
     .00      .20               .00               .00               .00               .00
   34.93   369.64   116.79     3.55               .00               .00               .00
   31.43   640.33    92.40     8.33               .00               .00               .00
   39.39   692.33   111.69     8.00               .00               .00               .00

The first line of output is the average since the database started up.
The subsequent lines are averages over the interval since the previous line, which is 5 seconds by default.
One should be able to see immediately how much activity there is on the database and the latency for the basic types of database I/O.

Reads
Single block reads are the typical I/O from a database, which would happen, for example, when reading a row in a table with indexes in place.
Multi block reads are common as well and would happen, for example, when summing the values over all rows in a table.
Direct reads are less common but quite normal. They happen almost exclusively for parallel query, though they may be used for other activities, especially in newer versions of Oracle such as 11.2. Direct reads are multiblock reads that bypass the Oracle buffer cache. The size varies from one datablock, such as 8k, to 1MB.
Direct read temp happens when a sort has overflowed memory limits and been written to disk. Direct reads temp are multiblock reads that bypass the Oracle buffer cache. The size varies from one datablock, such as 8k, to 1MB.

Writes
Logfile writes are, in general, the only writes that database users wait for. Actually users only wait when they commit, and the wait is for a signal from the log writer that their particular redo data is on disk, which could have already happened. Typically the user wait time is a bit longer than the log write time, but in general it’s close, i.e. within a few milliseconds. The farther apart the user wait time is from the log write time, the more likely there is a CPU, paging or other concurrency problem on the VDB host slowing down the users’ signalling and wake up time.

oramon.sql : Oracle Latency Query

If for some reason the shell script isn’t able to connect to the database, the same data can be collected by running the SQL queries by hand in SQL*Plus.
The two SQL scripts, oramon_setup.sql and oramon.sql, are available on github at

https://github.com/khailey/oramon

To see latencies over periods shorter than 60s, you have to collect the values of the cumulative counters at time A, collect them again at time B, and take the difference. That is what these two scripts do.

Run oramon_setup.sql *once*
  column seq_ms for 9999.99
   column seq_ct for 9999.99
   column lfpw_ms for 9999.99
   column lfpw_ct for 9999.99
   column scat_ms for 9999.99
   column scat_ct for 9999.99
   column dpr_ms for 9999.99
   column dpr_ct for 9999.99
   column dprt_ms for 9999.99
   column dprt_ct for 9999.99
   column prevdprt_ct new_value prevdprt_ct_var
   column prevdprt_tm new_value prevdprt_tm_var
   column prevdpwt_ct new_value prevdpwt_ct_var
   column prevdpwt_tm new_value prevdpwt_tm_var
   column prevdpr_ct new_value prevdpr_ct_var
   column prevdpr_tm new_value prevdpr_tm_var
   column prevdpw_ct new_value prevdpw_ct_var
   column prevdpw_tm new_value prevdpw_tm_var
   column prevseq_ct new_value prevseq_ct_var
   column prevseq_tm new_value prevseq_tm_var
   column prevscat_ct new_value prevscat_ct_var
   column prevscat_tm new_value prevscat_tm_var
   column prevlfpw_ct new_value prevlfpw_ct_var
   column prevlfpw_tm new_value prevlfpw_tm_var
   column prevsec new_value prevsec_var
   select 0 prevsec from dual;
   select 0 prevseq_tm from dual;
   select 0 prevseq_ct from dual;
   select 0 prevscat_ct from dual;
   select 0 prevscat_tm from dual;
   select 0 prevlfpw_ct from dual;
   select 0 prevlfpw_tm from dual;
   select 0 prevdprt_ct from dual;
   select 0 prevdprt_tm from dual;
   select 0 prevdpwt_ct from dual;
   select 0 prevdpwt_tm from dual;
   select 0 prevdpr_ct from dual;
   select 0 prevdpr_tm from dual;
   select 0 prevdpw_ct from dual;
   select 0 prevdpw_tm from dual;
   column prevdprt_ct noprint
   column prevdprt_tm noprint
   column prevdpwt_ct noprint
   column prevdpwt_tm noprint
   column prevdpr_ct noprint
   column prevdpr_tm noprint
   column prevdpw_ct noprint
   column prevdpw_tm noprint
   column prevseq_ct noprint
   column prevseq_tm noprint
   column prevscat_ct noprint
   column prevscat_tm noprint
   column prevlfpw_ct noprint
   column prevlfpw_tm noprint
   column prevsec noprint

Run the following query to see the current latency for

  • single block read
  • log file parallel write
  • multi-block read
  • direct path read
  • direct path read temp

oramon.sql

select
        round(seqtm/nullif(seqct,0),2) seq_ms,
        round(seqct/nullif(delta,0),2) seq_ct,
        round(lfpwtm/nullif(lfpwct,0),2) lfpw_ms,
        round(lfpwct/nullif(delta,0),2) lfpw_ct,
        round(scattm/nullif(scatct,0),2) scat_ms,
        round(scatct/nullif(delta,0),0) scat_ct,
        round(dprtm/nullif(dprct,0),2) dpr_ms,
        round(dprct/nullif(delta,0),2) dpr_ct,
        round(dprttm/nullif(dprtct,0),2) dprt_ms,
        round(dprtct/nullif(delta,0),2) dprt_ct,
        prevseq_ct, prevscat_ct, prevseq_tm, prevscat_tm, prevsec,prevlfpw_tm,prevlfpw_ct
        , prevdpr_ct, prevdpr_tm , prevdprt_ct, prevdprt_tm , prevdpw_ct, prevdpw_tm
        , prevdpwt_ct, prevdpwt_tm
from
(select
       sum(decode(event,'db file sequential read', round(time_waited_micro/1000) -  &prevseq_tm_var,0)) seqtm,
       sum(decode(event,'db file scattered read',  round(time_waited_micro/1000) - &prevscat_tm_var,0)) scattm,
       sum(decode(event,'log file parallel write',  round(time_waited_micro/1000) - &prevlfpw_tm_var,0)) lfpwtm,
       sum(decode(event,'db file sequential read', round(time_waited_micro/1000) ,0)) prevseq_tm,
       sum(decode(event,'db file scattered read',  round(time_waited_micro/1000) ,0)) prevscat_tm,
       sum(decode(event,'log file parallel write',  round(time_waited_micro/1000) ,0)) prevlfpw_tm,
       sum(decode(event,'db file sequential read', total_waits - &prevseq_ct_var,0)) seqct,
       sum(decode(event,'db file scattered read',  total_waits - &prevscat_ct_var,0)) scatct,
       sum(decode(event,'log file parallel write',  total_waits - &prevlfpw_ct_var,0)) lfpwct,
       sum(decode(event,'db file sequential read', total_waits ,0)) prevseq_ct,
       sum(decode(event,'db file scattered read',  total_waits ,0)) prevscat_ct,
       sum(decode(event,'log file parallel write',  total_waits ,0)) prevlfpw_ct,
       sum(decode(event,'direct path read',  round(time_waited_micro/1000) - &prevdpr_tm_var,0)) dprtm,
       sum(decode(event,'direct path read',  round(time_waited_micro/1000) ,0)) prevdpr_tm,
       sum(decode(event,'direct path read',  total_waits - &prevdpr_ct_var,0)) dprct,
       sum(decode(event,'direct path read',  total_waits ,0)) prevdpr_ct,
       sum(decode(event,'direct path write',  round(time_waited_micro/1000) - &prevdpw_tm_var,0)) dpwtm,
       sum(decode(event,'direct path write',  round(time_waited_micro/1000) ,0)) prevdpw_tm,
       sum(decode(event,'direct path write',  total_waits - &prevdpw_ct_var,0)) dpwct,
       sum(decode(event,'direct path write',  total_waits ,0)) prevdpw_ct,
       sum(decode(event,'direct path write temp',  round(time_waited_micro/1000) - &prevdpwt_tm_var,0)) dpwttm,
       sum(decode(event,'direct path write temp',  round(time_waited_micro/1000) ,0)) prevdpwt_tm,
       sum(decode(event,'direct path write temp',  total_waits - &prevdpwt_ct_var,0)) dpwtct,
       sum(decode(event,'direct path write temp',  total_waits ,0)) prevdpwt_ct,
       sum(decode(event,'direct path read temp',  round(time_waited_micro/1000) - &prevdprt_tm_var,0)) dprttm,
       sum(decode(event,'direct path read temp',  round(time_waited_micro/1000) ,0)) prevdprt_tm,
       sum(decode(event,'direct path read temp',  total_waits - &prevdprt_ct_var,0)) dprtct,
       sum(decode(event,'direct path read temp',  total_waits ,0)) prevdprt_ct,
       to_char(sysdate,'SSSSS')-&prevsec_var delta,
       to_char(sysdate,'SSSSS') prevsec
from
     v$system_event
where
     event in ('db file sequential read',
               'db file scattered read',
               'direct path read temp',
               'direct path write temp',
               'direct path read',
               'direct path write',
               'log file parallel write')
) ;

Output looks like

  SEQ_MS   SEQ_CT  LFPW_MS  LFPW_CT  SCAT_MS  SCAT_CT   DPR_MS   DPR_CT  DPRT_MS  DPRT_CT
-------- -------- -------- -------- -------- -------- -------- -------- -------- --------
  115.71   422.67    76.17    12.00               .00               .00               .00

The first execution of the query shows the I/O since database startup, so it should most likely be ignored.
Subsequent executions show the I/O since the previous execution.

The columns are

  1. SEQ_MS: single block latency
  2. SEQ_CT: single block reads per second
  3. LFPW_MS: log file parallel write latency
  4. LFPW_CT: log file parallel write count per second
  5. SCAT_MS: multi-block latency
  6. SCAT_CT: multi-block reads per second
  7. DPR_MS: direct path read latency
  8. DPR_CT: direct path reads per second
  9. DPRT_MS: direct path read temp latency
  10. DPRT_CT: direct path read temps per second
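
The arithmetic behind these columns can be sketched outside the database. The following R snippet is only an illustration and is not part of oramon: the two samples and the helper name interval_stats are invented for the example, standing in for two reads of total_waits and time_waited_micro from v$system_event taken 5 seconds apart.

```r
# Hypothetical cumulative counters for one wait event, sampled twice.
# The values are made up; "sec" plays the role of to_char(sysdate,'SSSSS').
prev <- list(total_waits = 1000, time_waited_micro = 20000000, sec = 100)
curr <- list(total_waits = 3000, time_waited_micro = 80000000, sec = 105)

# Difference the cumulative counters, then derive latency and rate.
interval_stats <- function(prev, curr) {
  waits <- curr$total_waits - prev$total_waits
  ms    <- (curr$time_waited_micro - prev$time_waited_micro) / 1000
  secs  <- curr$sec - prev$sec
  c(avg_ms  = if (waits > 0) ms / waits else NA,   # like nullif(ct,0) in the query
    per_sec = if (secs > 0) waits / secs else NA)  # like nullif(delta,0)
}

stats <- interval_stats(prev, curr)
stats
```

With these made-up numbers the interval shows 2000 waits taking 60000 ms in total, so the result is an average of 30 ms per wait at 400 waits per second.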

Instead of running the query by hand, the script “oramon.sh”, available at https://github.com/khailey/oramon/blob/master/oramon.sh (see top of page), will collect this info every 5 seconds in a loop and write it to standard output at the UNIX prompt.

NOTE: the following is a simpler query, but the data only updates once a minute.

select
       n.name event,
       m.wait_count  cnt,
       10*m.time_waited ms,
       nvl(round(10*m.time_waited/nullif(m.wait_count,0),3) ,0) avg_ms
  from v$eventmetric m,
       v$event_name n
  where m.event_id=n.event_id
        and (
              wait_class_id= 1740759767 --  User I/O 
                   or
              wait_class_id= 4108307767 --  System I/O  
             )
        and m.wait_count > 0 ;

Uncategorized

NoCOUG 21 Feb 2013 at Oracle: Big Data and NoSQL

January 30th, 2013

This past year I joined the board of the Northern California Oracle Users Group (NoCOUG) and am excited to help the NoCOUG team bring great conferences to the Bay Area.

Our new conference director Ben Prusinski has scheduled a whole slew of NoSQL and Big Data presentations at Conference 105 on February 21. Has NoCOUG abandoned SQL? You can bet your bottom dollar it hasn’t! In fact, it was Oracle’s own idea to have so many NoSQL and Big Data presentations at Conference 105. Dave Rubin, Director of NoSQL Database Development at Oracle will explain why NoSQL Database and Oracle Database are the perfect match. They are not mortal enemies. They complete and complement each other. They shine in different use cases.

But why Big Data? Can’t Oracle Database handle ginormous amounts of data? Sure it can but, once again, it’s all about using the right tool for the job. Hadoop and Oracle are not mortal enemies. They complete and complement each other. They shine in different use cases.

If you’re a die-hard relational purist and refuse to look NoSQL and Big Data in the eye then have no fear for Ben has scheduled plenty of traditional sessions for Oracle developers as well as database administrators.

We’re looking forward to another awesome NoCOUG conference with something for everybody.

Click here to check the agenda and RSVP.

Kindest regards,

The hard-working board and volunteers of NoCOUG

REGISTER HERE: http://www.nocoug.org/rsvp.html

Rooms: Auditorium, Room 102, Room 103

Thursday 8:00 to 9:00
  • Registration and Continental Breakfast
Thursday 9:00 to 9:30
  • President Welcome and General Session
Thursday 9:30 to 10:30
Thursday 11:00 to 12:00
  • Big Data: The Big Story – Jean-Pierre Dijcks, Oracle Corp (Topic: NoSQL)
  • Understanding SQLTXPLAIN (SQLT) main report by nav… – Carlos Sierra, Oracle Corp (Topic: Developer)
Thursday 13:00 to 14:00
  • The Sins of SQL Programming that Send the DB to Po… – Abel Macias, Oracle Corp (Topic: Developer)
Thursday 14:30 to 15:30
  • Reduce Database Latency – Josh Lyford, Whiptail Storage (Topic: DBA)
  • Advanced SQL Injection techniques – Slavik Markovich, McAfee (Topic: Developer)
Thursday 16:00 to 17:00
  • Looney Tuner? No, there IS a method to my madness… – Janis Griffin, Confio Software (Topic: Developer)

Uncategorized

Delphix Live webcast: Database Virtualization

January 16th, 2013

Live webcast: Realize Massive ROI with Database Virtualization
Date: Wednesday Jan 30, 2013 @12pm ET/9am PT
Click Here to Register

If you could create as many copies of production databases as you wanted for development, reporting and QA, how many would you create? 10, 20, 100? It is possible in the space of a single copy of the production database using thin provision cloning. Thin provision cloning gives enormous disk savings by sharing the majority of source database data blocks. Thin provision cloning is one technology of database virtualization. Database virtualization goes beyond thin provision cloning to provide agile corporate data management.

Virtual databases can be created, refreshed, rolled back, rolled forward and deleted in seconds. Virtual databases can be provisioned from any second within the source database’s retention window, which is typically several weeks. Every developer can have their own full copy of production databases, and production databases can have 50 days of backups live online in the space of one backup. Backups can be brought online in seconds, the data reviewed or extracted, and the copy removed in seconds. QA teams can go from one test environment to multiple full copies of the production database, allowing QA tests to run in parallel.

Delphix eliminates widespread IT inefficiencies caused by dragging behind the enormous amounts of infrastructure, process and bureaucracy required to provide database copies. Delphix eliminates the drag and provides power through agile data management software and database virtualization. Join this session to learn how organizations like Deutsche Bank, Procter and Gamble, Facebook, EA, Stubhub and many more are realizing significant returns with Delphix, including:

 

  • Greater agility: 500% greater application project output
  • Lower risk: 50% higher error detection in development
  • Reduced costs: 90% reduction in storage cost for copies and backups 

Join us on Jan 30 @12PM ET to learn more. Click Here to Register

 

Uncategorized

R: slicing and dicing data

January 4th, 2013

Nicer formatting at https://sites.google.com/site/oraclemonitor/r-slicing-and-dicing-data

  1. R data types
  2. Converting columns into vectors
  3. Extracting Rows and converting Rows to numeric vectors
  4. Entering data
  5. Vectorwise maximum/minimum
  6. Column Sums and Row Sums

R can do some awesome data visualizations: http://gallery.r-enthusiasts.com/thumbs.php

Instead of doing one off data visualizations like with Excel, R can automate the process allowing one to visualize many sets of data with the same visualizations.

Installing R is pretty easy http://scs.math.yorku.ca/index.php/R:_Getting_started_with_R

There are lots of blogs out there on getting started with R. The one thing that I didn’t find explained well was slicing and dicing data.

Let’s take some data that I want to visualize. The following data shows network performance, measured as communication latency in milliseconds (avg_ms) and throughput in MB per second (MB/s).

The parameters are the I/O message size in KB (0KB is actually 1 byte) and the number of concurrent threads sending data (threads).

IOsize ,threads ,avg_ms ,    MB/s
     0 ,      1 ,   .02 ,    .010
     0 ,      8 ,   .04 ,    .024
     0 ,     64 ,   .20 ,    .025
     8 ,      1 ,   .03 ,  70.529
     8 ,      8 ,   .04 , 150.389
     8 ,     64 ,   .23 ,  48.604
    32 ,      1 ,   .06 , 149.405
    32 ,      8 ,   .07 , 321.392
    32 ,     64 ,   .18 ,  73.652
   128 ,      1 ,   .03 , 226.457
   128 ,      8 ,   .01 , 557.196
   128 ,     64 ,   .06 , 180.176
  1024 ,      1 ,   .01 , 335.587
  1024 ,      8 ,   .01 , 726.876
  1024 ,     64 ,   .02 , 714.162

If this data is in a file, it can be easily loaded and charted with R.

Find out what directory R is working in:

getwd()

go to a directory with my data and R files:

setwd("C:/Users/Kyle/R")

list files

dir()

load data into a variable

mydata <- read.csv("mydata.csv")

Simple, et voila, the data is loaded. To see the data just type the name of the variable ( the “>” is the R prompt, like “SQL>” in SQL*Plus)

> mydata
   IOsize threads avg_ms    MB.s
1       0       1   0.02   0.010
2       0       8   0.04   0.024
3       0      64   0.20   0.025
4       8       1   0.03  70.529
5       8       8   0.04 150.389
6       8      64   0.23  48.604
7      32       1   0.06 149.405
8      32       8   0.07 321.392
9      32      64   0.18  73.652
10    128       1   0.03 226.457
11    128       8   0.01 557.196
12    128      64   0.06 180.176
13   1024       1   0.01 335.587
14   1024       8   0.01 726.876
15   1024      64   0.02 714.162

Creating a chart is a breeze: just say plot(x,y), where x and y are the values you want to plot.
How do we extract an x and y from mydata?
First pick what to plot. Let’s plot average ms latency (avg_ms) versus MB per sec (MB.s).
Here is how to extract those columns from the data:

x=mydata['avg_ms']
y=mydata['MB.s']

Now plot

> plot(x,y)
Error in stripchart.default(x1, ...) : invalid plotting method

huh … what’s that Error?

If we look at x and/or y, they are actually columns from mydata (still of class data.frame) and plot() wants rows (actually vectors, but we’ll get there).

> x
   avg_ms
1    0.02
2    0.04
3    0.20
4    0.03
5    0.04
6    0.23
7    0.06
8    0.07
9    0.18
10   0.03
11   0.01
12   0.06
13   0.01
14   0.01
15   0.02

To transpose a column into a row we can use “t()”

> t(x)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
avg_ms 0.02 0.04  0.2 0.03 0.04 0.23 0.06 0.07 0.18  0.03  0.01  0.06  0.01  0.01  0.02

Now we can try plotting again:

> plot(t(x),t(y))

and voila

but let’s address the issue of transforming x and y from columns to rows and specifically into vectors.
Let’s look at the original data and then the transformed data

x=mydata['avg_ms']     #  column of data extracted from a data.frame
tx=t(mydata['avg_ms']) #  transpose the column of data into a row

Look at the datatypes of x and t(x) using the class() function

> class(mydata)
[1] "data.frame"
> class(x)
[1] "data.frame"
> class(tx)
[1] "matrix"

The column is considered a “data.frame” and the row is considered a “matrix”.

The method of extracting a column by its column name alone only works for objects of class data.frame.

If the object were a matrix, we would be required to supply both the row and the column, as in matrix["row","column"].

Leaving either the row or the column empty, but keeping the comma in place, acts as a wild card:

matrix[,"column"] – gives all values in that column

matrix["row",] – gives all the values in that row
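
The wild-card indexing can be tried on a small matrix. This is a minimal sketch; the matrix m and its row and column names are made up for the illustration.

```r
# A 2 x 3 matrix with named rows and columns (filled column by column).
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

one_cell <- m["r1", "b"]   # a single element
col_b    <- m[, "b"]       # empty row position: every row of column "b"
row_r2   <- m["r2", ]      # empty column position: every column of row "r2"

one_cell
col_b
row_r2
```

Both col_b and row_r2 come back as plain named vectors rather than matrices, which is the form plot() is happiest with.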

plot() wants a vector (but it forgivingly works with rows of data as we did above).

R data types

What are these datatypes in R?
There is a simple discussion of data types at http://www.statmethods.net/input/datatypes.html

The types are basically as follows (using “value1:value2” gives a sequence running from value1 to value2 in increments of 1)

  • integer
    • > i=1:5
      > class(i)
      [1] "integer"
      > i
      [1] 1 2 3 4 5
  • character
    • > c=letters[1:5]
      > class(c)
      [1] "character"
      > c
      [1] "a" "b" "c" "d" "e"
  • logical (booleans; “:” coerces them to integers)
    • > b=FALSE:TRUE
      > class(b)
      [1] "integer"
      > b
      [1] 0 1
  • vectors
    • > v=c(1,2,3,4,5)
      > class(v)
      [1] "numeric"
      > v
      [1] 1 2 3 4 5
  • matrix
    • > m=matrix(c(1,2,3,4,5))
      > class(m)
      [1] "matrix"
      > m
           [,1]
      [1,]    1
      [2,]    2
      [3,]    3
      [4,]    4
      [5,]    5
  • data.frames – mixes numeric and character
    • > df=matrix(1:5,letters[1:5])      # fails: the second argument is nrow (and a matrix can't mix character and numeric)
      Error in matrix(1:5, letters[1:5]) : non-numeric matrix extent
      >
      > df=data.frame(1:5,letters[1:5])  # dataframe can

      > class(df)
      [1] "data.frame"
      > df
        X1.5 letters.1.5.
      1    1            a
      2    2            b
      3    3            c
      4    4            d
      5    5            e
  • lists – like a vector, but each component can be a different data type, such as character, numeric or matrix
    •  > a = c(1,2,5.3,6,-2,4) # numeric vector
      > # generates 5 x 4 numeric matrix 
      > y=matrix(1:20, nrow=5,ncol=4)
      > # example of a list with 4 components - 
      > # a string, a numeric vector, a matrix, and a scalar 
      > w= list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)
      > w
      $name
      [1] "Fred"
      
      $mynumbers
      [1]  1.0  2.0  5.3  6.0 -2.0  4.0
      
      $mymatrix
           [,1] [,2] [,3] [,4]
      [1,]    1    6   11   16
      [2,]    2    7   12   17
      [3,]    3    8   13   18
      [4,]    4    9   14   19
      [5,]    5   10   15   20
      
      $age
      [1] 5.3
    • extract the various parts of a list with  list[["name"]], as in w[["mymatrix"]]
  • arrays – matrices with more than 2 dimensions
  • factors
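
The last two types in the list have no example, so here is a minimal sketch of each. The values are invented for the illustration.

```r
# An array is a matrix generalised to more than 2 dimensions.
a <- array(1:24, dim = c(2, 3, 4))
class(a)        # "array"
dim(a)          # 2 3 4

# A factor stores categorical data as integer codes plus a set of levels.
f <- factor(c("DBA", "Developer", "DBA"))
class(f)        # "factor"
levels(f)       # the distinct categories
as.integer(f)   # the underlying integer codes
```

Factors matter when loading data because, depending on the R version and the stringsAsFactors setting, read.csv() may turn character columns into factors.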

Useful functions on data types

  • dput(var) – will give the structure of var
  • class(var) – will tell the data type
  • dim(var) – will give the dimensions (assigning to dim(var) sets them)
  • as.matrix(data.frame) – useful for changing a data.frame into a matrix, though be careful because if there are any character values in the data frame then all entries in the matrix will be character
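
The as.matrix() caveat is easy to check. A small sketch, with two made-up data frames named mixed and num:

```r
# One data frame with a character column, one that is purely numeric.
mixed <- data.frame(n = 1:3, s = c("a", "b", "c"))
num   <- data.frame(x = 1:3, y = 4:6)

dim(mixed)                    # rows, then columns

mixed_m <- as.matrix(mixed)   # every entry becomes character
num_m   <- as.matrix(num)     # stays numeric

class(mixed_m[1, 1])
class(num_m[1, 1])
```

If only the numeric columns are needed, selecting them before calling as.matrix() avoids the silent conversion to character.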

Sometimes R transforms data in ways I don’t predict, but the best strategy is just to force R to do what I want more explicitly.

Converting columns into vectors

When originally selecting the columns of the data, we could have selected vectors directly instead of selecting a column and transposing it.
Instead of asking for the column, which gives a one-column data.frame, we can ask for every value in that column
by adding a “,” in front of the column name. The brackets take the equivalent of x and y coordinates, that is, row and column position. By adding a “,” with no value before it, we are giving a wild card as the row identifier and saying: give me the values for all rows in the column “avg_ms”.

x=mydata[,'avg_ms']
> class(x)
[1] "numeric"
> x
 [1] 0.02 0.04 0.20 0.03 0.04 0.23 0.06 0.07 0.18 0.03 0.01 0.06 0.01 0.01 0.02

We can also extract the values by the column position instead of column name. The “avg_ms” is column 3

> x=mydata[,3]
> class(x)
[1] "numeric"
> x
 [1] 0.02 0.04 0.20 0.03 0.04 0.23 0.06 0.07 0.18 0.03 0.01 0.06 0.01 0.01 0.02

A third way to get the vector format is using “[[ ]]” syntax

> x=mydata[[3]]
> class(x)
[1] "numeric"
> x
 [1] 0.02 0.04 0.20 0.03 0.04 0.23 0.06 0.07 0.18 0.03 0.01 0.06 0.01 0.01 0.02

A fourth way is with the data.frame$col syntax

> x=mydata$avg_ms
> class(x)
[1] "numeric"
> x
 [1] 0.02 0.04 0.20 0.03 0.04 0.23 0.06 0.07 0.18 0.03 0.01 0.06 0.01 0.01 0.02

Another way, which we’ll come back to when converting a row to a vector, is to combine the apply() and as.numeric() functions.
The apply() function can also change a column to a vector:

> x=mydata['avg_ms']
> class(x)
[1] "data.frame"
> x
   avg_ms
1    0.02
2    0.04
3    0.20
4    0.03
5    0.04
6    0.23
7    0.06
8    0.07
9    0.18
10   0.03
11   0.01
12   0.06
13   0.01
14   0.01
15   0.02
> x=apply(x,1,as.numeric)
> class(x)
[1] "numeric"
> x
[1] 0.02 0.04 0.20 0.03 0.04 0.23 0.06 0.07 0.18 0.03 0.01 0.06 0.01 0.01 0.02
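
As a quick cross-check, the direct extraction styles can be compared with identical(). This sketch uses a cut-down, made-up data frame d rather than the full mydata.

```r
# Three rows are enough to compare the extraction styles.
d <- data.frame(IOsize = c(0, 8, 32), avg_ms = c(0.02, 0.03, 0.06))

by_name   <- d[, "avg_ms"]   # empty row position plus column name
by_pos    <- d[, 2]          # empty row position plus column number
by_brack  <- d[[2]]          # double brackets
by_dollar <- d$avg_ms        # dollar syntax

same <- identical(by_name, by_pos) &&
        identical(by_pos, by_brack) &&
        identical(by_brack, by_dollar)
same

class(d["avg_ms"])           # single brackets without a comma keep the data.frame
```

All four give the same numeric vector, so the choice between them is mostly a matter of taste.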

These vector extractions work for columns but things are different for rows.

Extracting Rows and converting Rows to numeric vectors

The other side of the coin is extracting a row into vector format. In mydata, the rows don’t have names, so we have to use position. By specifying the row position with no column after the comma, all column values are given for that row.

> row=mydata[3,]
> class(row)
[1] "data.frame"
> row
  IOsize threads avg_ms  MB.s
3      0      64    0.2 0.025

The resulting data is a data frame and not a vector (a numeric vector has class “numeric”).
We can use the “as.numeric” function to convert the data.frame to a numeric vector.
The apply() function will apply the “as.numeric” function to multiple values at once. apply() takes 3 arguments:

  • input variable
  • 1=row,2=col,1:2=both
  • function to apply

see http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/

> ra=apply(row,2,as.numeric)
> class(ra)
[1] "numeric"
> ra
 IOsize threads  avg_ms    MB.s
  0.000  64.000   0.200   0.025

The above applies the change to all columns in the given row of a data.frame.

(apply can also be used, for example, to change all 0 values to NA, R’s missing-value marker. NULL would not work here because a matrix cell cannot hold NULL.

new_matrix = apply(matrix,1:2,function(x)if (x==0)  NA else x)

see http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega)
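
A small sketch of that element-wise use of apply(), assuming the goal is to mark zeros as missing. It uses NA, since NA is what R stores in a matrix cell for a missing value, and compares the result with plain logical indexing. The matrix m is made up.

```r
m <- matrix(c(0, 1, 2, 0), nrow = 2)

# MARGIN = 1:2 calls the function once per cell.
via_apply <- apply(m, 1:2, function(x) if (x == 0) NA else x)

# The same result without apply(), using a logical index.
direct <- m
direct[direct == 0] <- NA

via_apply
direct
```

The indexing form is shorter and usually faster on large matrices, but apply() generalises to functions that cannot be written as a simple comparison.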

For selecting the row out directly as a vector, the as.matrix() function can also be used

> row=as.matrix(mydata)[3,]
> class(row)
[1] "numeric"
> row
 IOsize threads  avg_ms    MB.s
  0.000  64.000   0.200   0.025

yet another way

> row=c(t(mydata[3,]))
> class(row)
[1] "numeric"
> row
[1]  0.000 64.000  0.200  0.025

( see http://stackoverflow.com/questions/2545228/converting-a-dataframe-to-a-vector-by-rows)

or yet

> row=unlist(mydata[3,])
> class(row)
[1] "numeric"
> row
 IOsize threads  avg_ms    MB.s
  0.000  64.000   0.200   0.025
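
The three direct row extractions can be compared side by side. This is a sketch on a made-up two-row data frame d; it only works this cleanly because every column is numeric.

```r
d <- data.frame(IOsize = c(0, 8), threads = c(64, 1), avg_ms = c(0.2, 0.03))

r_unlist <- unlist(d[1, ])       # keeps the column names
r_matrix <- as.matrix(d)[1, ]    # keeps the column names
r_ct     <- c(t(d[1, ]))         # drops the names

names(r_unlist)
names(r_matrix)
names(r_ct)                      # NULL
```

If the data frame contained a character column, all three would return character values instead of numbers.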

Filtering Data

The data in the CSV file actually represents throughput not only at different I/O send sizes but also for different number of concurrent senders. What if I wanted to just plot the throughput by I/O send size for tests with one thread? How would I filter the data?

IOsize=subset(mydata[,'IOsize'],mydata['threads'] == 1 )
MBs=subset(mydata[,'MB.s'],mydata['threads'] == 1 )
plot(IOsize,MBs)
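
The same filter can also be written against the whole data frame, which keeps the matching columns together. A minimal sketch on a four-row, made-up subset of the data:

```r
d <- data.frame(IOsize  = c(0, 0, 8, 8),
                threads = c(1, 8, 1, 8),
                MB.s    = c(0.010, 0.024, 70.529, 150.389))

# Logical indexing: keep the rows where threads is 1, all columns.
one <- d[d$threads == 1, ]

# subset() does the same and can pick columns at the same time.
one2 <- subset(d, threads == 1, select = c(IOsize, MB.s))

one
one2
```

With the filtered data frame in hand, the plot call becomes plot(one$IOsize, one$MB.s).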

 

How about plotting the throughput by I/O size for each of the thread counts tested?
The parameter type="o" draws each series as points joined by lines

#extract data
IOsize=subset(mydata[,'IOsize'],mydata['threads'] == 1 )
MBs_1=subset(mydata[,'MB.s'],mydata['threads'] == 1 )
MBs_8=subset(mydata[,'MB.s'],mydata['threads'] == 8 )
MBs_64=subset(mydata[,'MB.s'],mydata['threads'] == 64 )
# create graph
plot(IOsize,MBs_64,type="o")
# plot other lines
lines(IOsize,MBs_1,lty=2,col="green",type="o")
lines(IOsize,MBs_8,lty=3,col="red",type="o")

# add a legend
legend(1,700,c("1 thread","8 threads","64 threads"), cex=0.8,
   col=c("green","red","black"), lty=c(2,3,1));
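
Repeating subset() once per thread count gets tedious as the number of series grows. One alternative, sketched here on a made-up six-row subset of the data, is split(), which breaks a vector into one piece per group:

```r
d <- data.frame(IOsize  = rep(c(8, 32), each = 3),
                threads = rep(c(1, 8, 64), times = 2),
                MB.s    = c(70.529, 150.389, 48.604,
                            149.405, 321.392, 73.652))

# One throughput vector per thread count, named "1", "8" and "64".
by_threads <- split(d$MB.s, d$threads)
by_threads

# Peak throughput for each thread count.
peak <- sapply(by_threads, max)
peak
```

Each element of by_threads can then be passed to lines() in a loop instead of being extracted by hand.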

 

 

 

Entering data

Instead of entering data via a CSV file it can be entered directly into R

> m=matrix(c(
     0 ,      1 ,  .02 ,    .010 ,
     0 ,      8 ,  .04 ,    .024 ,
     0 ,     64 ,  .20 ,    .025 ,
     8 ,      1 ,  .03 ,  70.529 ,
     8 ,      8 ,  .04 , 150.389 ,
     8 ,     64 ,  .23 ,  48.604 ,
    32 ,      1 ,  .06 , 149.405 ,
    32 ,      8 ,  .07 , 321.392 ,
    32 ,     64 ,  .18 ,  73.652 ,
   128 ,      1 ,  .03 , 226.457 ,
   128 ,      8 ,  .01 , 557.196 ,
   128 ,     64 ,  .06 , 180.176 ,
  1024 ,      1 ,  .01 , 335.587 ,
  1024 ,      8 ,  .01 , 726.876 ,
  1024 ,     64 ,  .02 , 714.162 ),
nrow=4,ncol=15,
dimnames=list(rows=c( 'IOsize' ,'threads' ,'avg_ms' , 'MB/s'
)))
> m
rows      [,1]  [,2]   [,3]   [,4]    [,5]   [,6]    [,7]    [,8]   [,9]   [,10]   [,11]   [,12]    [,13]    [,14]    [,15]
  IOsize  0.00 0.000  0.000  8.000   8.000  8.000  32.000  32.000 32.000 128.000 128.000 128.000 1024.000 1024.000 1024.000
  threads 1.00 8.000 64.000  1.000   8.000 64.000   1.000   8.000 64.000   1.000   8.000  64.000    1.000    8.000   64.000
  avg_ms  0.02 0.040  0.200  0.030   0.040  0.230   0.060   0.070  0.180   0.030   0.010   0.060    0.010    0.010    0.020
  MB/s    0.01 0.024  0.025 70.529 150.389 48.604 149.405 321.392 73.652 226.457 557.196 180.176  335.587  726.876  714.162

> t(m)
        IOsize threads avg_ms    MB/s
   [1,]      0       1   0.02   0.010
   [2,]      0       8   0.04   0.024
   [3,]      0      64   0.20   0.025
   [4,]      8       1   0.03  70.529
   [5,]      8       8   0.04 150.389
   [6,]      8      64   0.23  48.604
   [7,]     32       1   0.06 149.405
   [8,]     32       8   0.07 321.392
   [9,]     32      64   0.18  73.652
  [10,]    128       1   0.03 226.457
  [11,]    128       8   0.01 557.196
  [12,]    128      64   0.06 180.176
  [13,]   1024       1   0.01 335.587
  [14,]   1024       8   0.01 726.876
  [15,]   1024      64   0.02 714.162

The bizarre thing about this is that nrow corresponds to the number of columns in the data as typed, and the matrix comes out transposed. This happens because R fills a matrix column by column by default. Using t() can re-transpose it, but this is all confusing.
To make it more intuitive, add the argument
"byrow=TRUE,"
and add a
"NULL"
for the row name position in the dimnames list

m=matrix(c(
     0 ,      1 ,  .02 ,    .010 ,
     0 ,      8 ,  .04 ,    .024 ,
     0 ,     64 ,  .20 ,    .025 ,
     8 ,      1 ,  .03 ,  70.529 ,
     8 ,      8 ,  .04 , 150.389 ,
     8 ,     64 ,  .23 ,  48.604 ,
    32 ,      1 ,  .06 , 149.405 ,
    32 ,      8 ,  .07 , 321.392 ,
    32 ,     64 ,  .18 ,  73.652 ,
   128 ,      1 ,  .03 , 226.457 ,
   128 ,      8 ,  .01 , 557.196 ,
   128 ,     64 ,  .06 , 180.176 ,
  1024 ,      1 ,  .01 , 335.587 ,
  1024 ,      8 ,  .01 , 726.876 ,
  1024 ,     64 ,  .02 , 714.162 ),
nrow=15,ncol=4,byrow=TRUE,
dimnames=list(NULL,c( 'IOsize' ,'threads' ,'avg_ms' , 'MB/s'
)))
> m
     IOsize threads avg_ms    MB/s
 [1,]      0       1   0.02   0.010
 [2,]      0       8   0.04   0.024
 [3,]      0      64   0.20   0.025
 [4,]      8       1   0.03  70.529
 [5,]      8       8   0.04 150.389
 [6,]      8      64   0.23  48.604
 [7,]     32       1   0.06 149.405
 [8,]     32       8   0.07 321.392
 [9,]     32      64   0.18  73.652
[10,]    128       1   0.03 226.457
[11,]    128       8   0.01 557.196
[12,]    128      64   0.06 180.176
[13,]   1024       1   0.01 335.587
[14,]   1024       8   0.01 726.876
[15,]   1024      64   0.02 714.162
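
The fill order is easier to see on a tiny example. A minimal sketch with the made-up vector 1:6:

```r
v <- 1:6

by_col <- matrix(v, nrow = 2, ncol = 3)                # default: fill column by column
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row, as typed

by_col
by_row

# The transposing trick from the first attempt gives the same result as byrow=TRUE.
via_t <- t(matrix(v, nrow = 3, ncol = 2))
identical(via_t, by_row)
```

The first row of by_col is 1 3 5, while the first row of by_row is 1 2 3, which matches the order in which the values were typed.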

Vectorwise maximum/minimum

Another issue is trying to get the max or min of two or more vectors on a point by point basis.
Using the “min()” function gives a single minimum and not a minimum on a point by point basis.
Use “pmax()” and “pmin()” to get the point by point max and min of two or more vectors.
> lat
[1]  44.370  22.558  37.708  73.070 131.950
> std
[1]  37.7  21.6  67.1 136.1 186.0
> min
[1] 0.0 0.6 0.6 1.0 1.0
> pmax(lat-std,min)
[1] 6.670 0.958 0.600 1.000 1.000
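
The difference between the two families is clearest side by side. A small sketch with two made-up vectors a and b:

```r
a <- c(1, 5, 3)
b <- c(4, 2, 6)

overall <- max(a, b)    # one number: the largest value anywhere
highs   <- pmax(a, b)   # element by element: 4 5 6
lows    <- pmin(a, b)   # element by element: 1 2 3

# A scalar is recycled, which makes pmax() handy for clamping at a floor.
floored <- pmax(a - b, 0)

overall
highs
lows
floored
```

Here a - b is -3 3 -3, so clamping at 0 gives 0 3 0, the same pattern used above to keep lat-std from dropping below the minimum.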

 

Column Sums and Row Sums

To sum up rows or columns use “rowSums()” and “colSums()”

http://stat.ethz.ch/R-manual/R-patched/library/base/html/colSums.html
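
A minimal sketch of both functions on a made-up 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m

rs <- rowSums(m)   # one total per row
cs <- colSums(m)   # one total per column, named after the columns

rs
cs
```

The rows hold 1 3 5 and 2 4 6, so the row sums are 9 and 12 and the column sums are 3, 7 and 11. Both functions also work on an all-numeric data frame.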

For more info

For more info on data types and manipulation see

http://cran.r-project.org/doc/manuals/R-intro.html

Uncategorized

No 3d charts in Excel? try R

January 2nd, 2013

 

I wanted to plot a set of data by 3 dimensions.

I wanted to plot I/O read latency by MB/s throughput by number of concurrent readers. Seemed simple. Well, it turns out there is no good way to do it in Excel. Sure, Excel has 3d charts, but beware: the z axis is treated as a set of rows (categories), not as numeric values. For example

Note that the z axis, “users”, has 3 values marked on the axis: 1, 16 and 64. Notice that 16 is as far from 1 as 64 is from 16, i.e. the distance is not proportional to the value.

There is a free plug-in for Excel called Excel3Dscatterplot, but the data is hard to read, for example

Luckily R saves the day. With R there are a number of easy ways to graph 3d data.

Let’s take a data set that represents network I/O, showing the send size in KB (s_KB), the number of concurrent sender threads (thrds), the min, avg and max latency (mn_ms, avg_ms, max_ms) and the send throughput in MB/s (s_MB/s).

s_KB ,thrds ,mn_ms ,avg_ms , max_ms , s_MB/s
     ,    1 ,  .02 ,   .05 ,   3.06 ,   .010 
     ,    8 ,  .04 ,   .12 ,   5.27 ,   .024 
     ,   64 ,  .20 ,   .82 ,  12.03 ,   .025 
   8 ,    1 ,  .03 ,   .06 ,   8.47 , 70.529 
   8 ,    8 ,  .04 ,   .17 ,  21.64 ,150.389 
   8 ,   64 ,  .23 ,  1.31 ,  20.50 , 48.604 
  32 ,    1 ,  .06 ,   .10 ,   1.82 ,149.405 
  32 ,    8 ,  .07 ,   .32 ,  16.78 ,321.392 
  32 ,   64 ,  .18 ,  5.32 , 380.02 , 73.652 
 128 ,    1 ,  .03 ,   .28 ,   2.01 ,226.457 
 128 ,    8 ,  .01 ,   .80 ,  54.78 ,557.196 
 128 ,   64 ,  .06 , 11.77 ,  77.96 ,180.176
1024 ,    1 ,  .01 ,  1.49 ,   5.76 ,335.587
1024 ,    8 ,  .01 ,  5.35 , 118.48 ,726.876
1024 ,   64 ,  .02 , 40.50 , 221.59 ,714.162

to plot this in R, first start R up (see http://scs.math.yorku.ca/index.php/R:_Getting_started_with_R#Installing_R )

find out where R’s working directory is

   getwd()

go to a directory with my data and r files

   setwd("C:/Users/Kyle/Documents/GitHub/nio")

list files

   dir()

load and plot
   nio <- read.csv("perfibmd1_loopback_short.csv")

   nio
s_KB thrds mn_ms avg_ms max_ms  s_MB.s
1    NA     1  0.02   0.05   3.06   0.010
2    NA     8  0.04   0.12   5.27   0.024
3    NA    64  0.20   0.82  12.03   0.025
4     8     1  0.03   0.06   8.47  70.529
5     8     8  0.04   0.17  21.64 150.389
6     8    64  0.23   1.31  20.50  48.604
7    32     1  0.06   0.10   1.82 149.405
8    32     8  0.07   0.32  16.78 321.392
9    32    64  0.18   5.32 380.02  73.652
10  128     1  0.03   0.28   2.01 226.457
11  128     8  0.01   0.80  54.78 557.196
12  128    64  0.06  11.77  77.96 180.176
13 1024     1  0.01   1.49   5.76 335.587
14 1024     8  0.01   5.35 118.48 726.876
15 1024    64  0.02  40.50 221.59 714.162
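
Two details in the printout above are worth a quick sketch: the empty s_KB field becomes NA, and the header s_MB/s becomes s_MB.s. The example below reads a few of the rows from an inline string instead of the file, with the padding spaces removed. Replacing NA with 0 at the end is an assumption on my part, chosen to match the matrix version of this data, which uses 0 for that I/O size.

```r
csv_text <- "s_KB,thrds,mn_ms,avg_ms,max_ms,s_MB/s
,1,.02,.05,3.06,.010
8,1,.03,.06,8.47,70.529
32,1,.06,.10,1.82,149.405"

nio_small <- read.csv(text = csv_text)

# read.csv() uses check.names = TRUE by default, so "s_MB/s"
# is rewritten into the syntactically valid name "s_MB.s"
print(names(nio_small))

# the empty first field in a numeric column is read as NA
had_na <- is.na(nio_small$s_KB)
print(had_na)

# assumption: treat the missing send size as 0
nio_small$s_KB[had_na] <- 0
print(nio_small)
```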


library(scatterplot3d)
MBs=nio[,'s_MB.s']
threads=nio[,'thrds']
IOsize=nio[,'s_KB']
s3d <-scatterplot3d(IOsize,threads,MBs)

 

That’s a bit simple, but we can add some easy improvements. Here the dots are made solid, with color highlighting and a drop line to the bottom plane

s3d <-scatterplot3d(IOsize,threads,MBs,pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
In this test I’m running five different I/O sizes: 1 byte, 8k, 32k, 128k and 1024k. It would be nice to draw a line for each I/O size through its points in order of increasing thread count. The line-drawing functions work in the 2D coordinates of the plot, so a 3D point cannot be passed to them directly, but the object returned by scatterplot3d() (called s below) has a conversion routine that maps a 3D point to 2D plot coordinates:
s$xyz.convert(x,y,z)
There are 5 sets of points, i.e. one set per I/O size. Each set has 3 points, i.e. one point for each number of concurrent threads: 1, 8 and 64.
The code loops through the 5 sets and draws a line through the 3 points of each:

# use the column names as they appear in the nio data frame
x=nio[,'s_KB']
y=nio[,'thrds']
z=nio[,'s_MB.s']
s <- scatterplot3d(x,y,z,xlab="IOsize", ylab="threads", zlab="MBs")
   for ( i in  1:5  )  {
     j=i*3
     p1 <- s$xyz.convert(x[j-2],y[j-2],z[j-2])
     p2 <- s$xyz.convert(x[j-1],y[j-1],z[j-1])
     p3 <- s$xyz.convert(x[j],y[j],z[j])
     segments(p1$x,p1$y,p2$x,p2$y,lwd=2,col=i)
     segments(p2$x,p2$y,p3$x,p3$y,lwd=2,col=i)
  }
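
The loop above hard-codes 5 sets of 3 points. As a sketch of one way to avoid that, the pairs of rows to connect can be derived from the I/O size column itself. The helper name consecutive_pairs is hypothetical, and the data is a six-row subset of the table above. The plotting part only runs if the scatterplot3d package is installed.

```r
# Subset of the data above: I/O size (KB), threads, send MB/s
IOsize  <- c(8, 8, 8, 32, 32, 32)
threads <- c(1, 8, 64, 1, 8, 64)
MBs     <- c(70.529, 150.389, 48.604, 149.405, 321.392, 73.652)

# Hypothetical helper: within each group, pair every row with the next one
consecutive_pairs <- function(group) {
  idx_by_group <- split(seq_along(group), group)
  do.call(rbind, lapply(idx_by_group, function(idx) {
    if (length(idx) < 2) return(NULL)          # nothing to connect
    cbind(from = idx[-length(idx)], to = idx[-1])
  }))
}

seg <- consecutive_pairs(IOsize)
print(seg)

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s <- scatterplot3d::scatterplot3d(IOsize, threads, MBs,
                                    xlab = "IOsize", ylab = "threads", zlab = "MBs")
  from <- seg[, "from"]
  to   <- seg[, "to"]
  # project both ends of every segment from 3D to 2D plot coordinates
  a <- s$xyz.convert(IOsize[from], threads[from], MBs[from])
  b <- s$xyz.convert(IOsize[to],   threads[to],   MBs[to])
  # one color per I/O size
  segments(a$x, a$y, b$x, b$y, lwd = 2, col = as.integer(factor(IOsize[from])))
}
```

This assumes the rows of each I/O size are already sorted by thread count, as they are in the data above.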

It’s also easy to rotate the data. Here are two larger datasets

example

m=matrix(c(
 0 ,    1 , 0.02 ,  0.06 ,   4.72 ,   0 ,  0.008  ,  0.000  ,99.19 , 0.67 , 0.05 , 0.09 , 0.00 , 0.00 , 0.00 , 0.00 , 0.06 , 0.08 ,    1 ,
   0 ,    2 , 0.03 ,  0.05 ,   2.35 ,   0 ,  0.020  ,  0.000  ,99.49 , 0.39 , 0.07 , 0.05 , 0.00 , 0.00 , 0.00 , 0.00 , 0.04 , 0.07 ,    0 ,
   0 ,    4 , 0.03 ,  0.07 ,  14.27 ,   0 ,  0.026  ,  0.000  ,98.33 , 1.18 , 0.15 , 0.24 , 0.06 , 0.04 , 0.00 , 0.00 , 0.05 , 0.07 ,    0 ,
   0 ,    8 , 0.03 ,  0.13 ,  34.08 ,   0 ,  0.024  ,  0.000  ,93.49 , 4.91 , 0.51 , 0.66 , 0.13 , 0.31 , 0.00 , 0.00 , 0.05 , 0.14 ,    0 ,
   0 ,   16 , 0.04 ,  0.26 ,  31.27 ,   0 ,  0.025  ,  0.000  ,56.29 ,42.31 , 0.16 , 0.57 , 0.31 , 0.30 , 0.06 , 0.00 , 0.11 , 0.16 ,    0 ,
   0 ,   32 , 0.10 ,  0.35 ,   3.57 ,   0 ,  0.029  ,  0.000  ,39.26 ,57.80 , 0.71 , 1.08 , 0.49 , 0.66 , 0.00 , 0.00 , 0.18 , 0.43 ,    0 ,
   0 ,   64 , 0.16 ,  0.70 ,  39.93 ,   0 ,  0.032  ,  0.000  , 1.28 ,91.69 , 3.18 , 1.64 , 1.25 , 0.37 , 0.59 , 0.00 , 0.33 , 0.70 ,    0 ,
   8 ,    1 , 0.02 ,  0.07 ,   8.17 ,   0 , 59.668  ,  0.000  ,93.41 , 6.34 , 0.13 , 0.11 , 0.01 , 0.00 , 0.00 , 0.00 , 0.05 , 0.10 ,    0 ,
   8 ,    2 , 0.03 ,  0.07 ,   9.75 ,   0 ,111.486  ,  0.000  ,98.18 , 1.35 , 0.24 , 0.18 , 0.04 , 0.00 , 0.00 , 0.00 , 0.06 , 0.08 ,    0 ,
   8 ,    4 , 0.03 ,  0.09 ,  39.08 ,   0 ,154.623  ,  0.000  ,97.09 , 1.86 , 0.47 , 0.47 , 0.07 , 0.04 , 0.00 , 0.00 , 0.06 , 0.08 ,    0 ,
   8 ,    8 , 0.05 ,  0.15 ,  18.48 ,   0 ,171.580  ,  0.000  ,78.14 ,19.78 , 0.74 , 1.05 , 0.11 , 0.19 , 0.00 , 0.00 , 0.09 , 0.29 ,    0 ,
   8 ,   16 , 0.05 ,  0.36 ,  37.54 ,   0 ,154.778  ,  0.000  ,51.87 ,44.81 , 0.93 , 0.97 , 0.80 , 0.62 , 0.00 , 0.00 , 0.11 , 0.51 ,    0 ,
   8 ,   32 , 0.06 ,  0.62 ,  43.06 ,   0 ,152.741  ,  0.000  ,15.84 ,66.88 ,13.27 , 1.64 , 0.64 , 1.42 , 0.31 , 0.00 , 0.11 , 0.55 ,    0 ,
   8 ,   64 , 0.24 ,  1.07 ,  24.72 ,   0 ,166.518  ,  0.000  , 8.00 ,58.29 ,25.47 , 5.75 , 1.99 , 0.45 , 0.04 , 0.00 , 0.43 , 3.34 ,    0 ,
  32 ,    1 , 0.05 ,  0.11 ,   8.46 ,   0 ,145.238  ,  0.000  ,40.63 ,58.08 , 1.14 , 0.15 , 0.01 , 0.00 , 0.00 , 0.00 , 0.10 , 0.13 ,    0 ,
  32 ,    2 , 0.04 ,  0.12 ,   5.20 ,   0 ,248.208  ,  0.000  ,25.14 ,73.08 , 1.47 , 0.25 , 0.05 , 0.00 , 0.00 , 0.00 , 0.11 , 0.14 ,    0 ,
  32 ,    4 , 0.06 ,  0.19 ,  22.94 ,   0 ,297.655  ,  0.000  ,11.05 ,85.37 , 2.40 , 0.97 , 0.15 , 0.06 , 0.00 , 0.00 , 0.12 , 0.22 ,    0 ,
  32 ,    8 , 0.05 ,  0.32 ,  21.12 ,   0 ,360.777  ,  0.000  , 5.75 ,87.25 , 3.21 , 3.06 , 0.45 , 0.26 , 0.00 , 0.00 , 0.12 , 0.83 ,    0 ,
  32 ,   16 , 0.09 ,  0.62 ,  22.54 ,   0 ,362.603  ,  0.000  , 3.22 ,81.87 , 6.48 , 6.21 , 1.15 , 1.06 , 0.00 , 0.00 , 0.25 , 1.88 ,    0 ,
  32 ,   32 , 0.08 ,  1.14 ,  40.08 ,   0 ,369.589  ,  0.000  , 1.85 ,71.97 ,14.28 , 8.46 , 1.24 , 1.95 , 0.25 , 0.00 , 0.42 , 4.55 ,    0 ,
  32 ,   64 , 0.38 ,  2.23 ,  33.70 ,   0 ,319.048  ,  0.000  , 0.00 ,36.12 ,39.33 ,18.56 , 2.61 , 3.31 , 0.07 , 0.00 , 0.68 , 4.97 ,    5 ,
 128 ,    1 , 0.02 ,  0.29 ,   8.71 ,   0 ,212.555  ,  0.000  , 0.65 ,98.08 , 1.05 , 0.19 , 0.02 , 0.00 , 0.00 , 0.00 , 0.28 , 0.34 ,    0 ,
 128 ,    2 , 0.01 ,  0.33 ,   8.54 ,   0 ,376.113  ,  0.000  , 6.95 ,87.98 , 4.28 , 0.76 , 0.02 , 0.00 , 0.00 , 0.00 , 0.28 , 0.64 ,    0 ,
 128 ,    4 , 0.01 ,  0.46 ,  35.22 ,   0 ,533.401  ,  0.000  ,31.45 ,62.25 , 5.15 , 1.04 , 0.06 , 0.05 , 0.00 , 0.00 , 0.23 , 0.68 ,    1 ,
 128 ,    8 , 0.01 ,  0.83 ,  89.39 ,   0 ,545.988  ,  0.000  ,23.75 ,59.64 ,10.87 , 4.96 , 0.25 , 0.51 , 0.02 , 0.00 , 0.28 , 0.97 ,    0 ,
 128 ,   16 , 0.01 ,  1.57 ,  54.30 ,   0 ,554.879  ,  0.000  , 8.77 ,54.71 ,19.95 ,14.16 , 1.36 , 0.98 , 0.06 , 0.00 , 0.37 , 2.52 ,    0 ,
 128 ,   32 , 0.04 ,  3.21 ,  86.26 ,   0 ,549.206  ,  0.000  , 2.61 ,41.04 ,26.36 ,23.33 , 3.27 , 3.00 , 0.38 , 0.00 , 0.49 , 5.87 ,    0 ,
 128 ,   64 , 0.07 ,  6.93 , 115.71 ,   0 ,502.197  ,  0.000  , 5.93 ,32.15 ,16.22 ,30.94 , 7.48 , 5.84 , 1.26 , 0.17 , 0.79 ,20.10 ,    0 ,
1024 ,    1 , 0.04 ,  2.57 ,  12.71 ,   0 ,194.751  ,  0.000  ,39.55 ,47.67 , 0.98 ,11.80 , 0.00 , 0.00 , 0.00 , 0.00 , 0.11 , 2.16 ,    0 ,
1024 ,    2 , 0.01 ,  1.97 ,  10.25 ,   0 ,506.611  ,  0.000  ,48.02 ,36.93 , 0.20 ,14.84 , 0.01 , 0.00 , 0.00 , 0.00 , 0.11 , 1.42 ,    0 ,
1024 ,    4 , 0.01 ,  2.73 ,  44.98 ,   0 ,724.303  ,  0.000  ,17.76 ,70.16 , 0.64 ,11.37 , 0.04 , 0.03 , 0.00 , 0.00 , 0.11 , 1.66 ,    1 ,
1024 ,    8 , 0.01 ,  5.27 ,  91.10 ,   0 ,738.061  ,  0.000  ,16.37 ,66.91 , 2.36 ,12.81 , 0.72 , 0.74 , 0.08 , 0.01 , 0.11 , 1.88 ,    0 ,
1024 ,   16 , 0.01 , 10.37 , 157.60 ,   0 ,727.271  ,  0.000  ,13.81 ,58.74 , 8.07 ,14.70 , 2.30 , 2.19 , 0.18 , 0.01 , 0.15 , 4.63 ,    0 ,
1024 ,   32 , 0.01 , 20.61 , 116.30 ,   0 ,726.869  ,  0.000  ,11.06 ,49.24 ,13.88 ,16.86 , 3.71 , 4.41 , 0.62 , 0.22 , 0.26 ,10.11 ,    1 ,
1024 ,   64 , 0.06 , 39.57 , 316.81 ,   0 ,721.422  ,  0.000  , 2.40 ,40.93 ,21.18 ,20.12 , 5.73 , 7.25 , 1.47 , 0.91 , 0.56 ,22.30 ,    0

),
nrow=35,ncol=19,byrow=TRUE, dimnames=list(NULL,c(
's_KB','thrds','mn_ms','avg_ms','max_ms','r_KB','s_MB/s','r_MB/s',' <100u','<500u','<1ms','<5ms','<10ms','<50ms','<100m',' <1s',' p50',' p95','retrans'
)))
MBs=m[,'s_MB/s']
threads=m[,'thrds']
IOsize=m[,'s_KB']
x=IOsize
y=threads
z=MBs

library(scatterplot3d)
s <- scatterplot3d(x,y,z,xlab="IOsize", ylab="threads", zlab="MBs")
nthreads=7
for ( i in  1:5  )  {
  beg=(1+(i-1)*nthreads)
  end=(beg+nthreads-2)
  cat( beg," ",end,"\n")
  for ( j in beg:end ) {
       p1 <- s$xyz.convert(x[j],y[j],z[j])
       p2 <- s$xyz.convert(x[j+1],y[j+1],z[j+1])
       segments(p1$x,p1$y,p2$x,p2$y,lwd=2,col=i)
   }
}

rotating the axis

z=IOsize
x=threads
y=MBs
s <- scatterplot3d(x,y,z,pch=16,highlight.3d=TRUE,type="h",zlab="IOsize", xlab="threads", ylab="MBs")
nthreads=7
for ( i in  1:5  )  {
  beg=(1+(i-1)*nthreads)
  end=(beg+nthreads-2)
  cat( beg," ",end,"\n")
  for ( j in beg:end ) {
       p1 <- s$xyz.convert(x[j],y[j],z[j])
       p2 <- s$xyz.convert(x[j+1],y[j+1],z[j+1])
       segments(p1$x,p1$y,p2$x,p2$y,lwd=2,col=i)
   }
}
There is more to do. I wanted to post this quickly and will add more later. It would be good to put the IOsize axis on a log scale, because the small sizes (1 byte, 8k, 32k and 128k) are all bunched together. The scatterplot3d library has a log axis planned but not implemented: the call accepts a log argument but does not yet use it. In the meantime one can run log10 on the values and update the tick labels.
All that being said, I’m dubious about the utility of 3d charts in most situations. Sure, graphing a 3d function where things change smoothly in all directions can be insightful, but for simple data like the above, 2d is probably clearer.
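
As a rough sketch of the 2d alternative, base R's matplot() can draw one line per I/O size, and base graphics do support log axes. The layout below (one column per send size) is my own arrangement of the MB/s values from the 15-row table earlier in this post. The sizeless rows are left out to keep the example short.

```r
# Send throughput (MB/s): one row per thread count, one column per send size
threads <- c(1, 8, 64)
MBs <- cbind(
  "8"    = c( 70.529, 150.389,  48.604),
  "32"   = c(149.405, 321.392,  73.652),
  "128"  = c(226.457, 557.196, 180.176),
  "1024" = c(335.587, 726.876, 714.162)
)

# log = "x" spaces the thread counts 1, 8 and 64 evenly
matplot(threads, MBs, type = "b", log = "x", pch = 16, lty = 1,
        col = seq_len(ncol(MBs)),
        xlab = "threads (log scale)", ylab = "MB/s")
legend("topleft", legend = paste(colnames(MBs), "KB"),
       col = seq_len(ncol(MBs)), lty = 1, pch = 16)
```

With one line per I/O size, the drop in throughput at 64 threads for the smaller sizes is visible without any rotation.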
