What is database virtualization? Database virtualization, as opposed to operating system virtualization, is sharing a read only copy of a source database duplicate blocks between all clones of a source databases. Where VMware shares the compute tier between multiple virtual machines, virtual databases share the duplicate datablocks between the virtual databases. The clone databases are called virtual databases (also called thin provision clones as distinct from full physical copy clones). The virtual databases appear as full size copes of the source databases. Database changes made by a virtual database are only seen by that virtual database as if the virtual database had its own full private copy of the source database.
Database Virtualization gives:
- Enormous storage savings. Clones initially take up almost no space.
- Instant provisioning of virtual databases. Clones can be made instantly since there is no copying of data.
A virtual database can be made in seconds and takes up almost no space since the virtual database only creates new control files, redo log files and a new temporary table space. All the rest of the data is shared initially.
For example, if one wanted to supply a team of 100 developers each with their own copy of a production database, then for every 1TB of production data files it would cost 100TB of disk to support the full copies. Copying 100TB of data would require not only massive amounts of storage but would also be extremely time consuming. Because of the time it takes to create those 100 copies, the copies would be out of date with production. On the other hand with virtual databases, the amount of disk required would be about 1TB total for all 100 virtual databases and because of how quick virtual databases can be created, the virtual database would be closely in sync with production
Coming of age
Database virtualization is finally coming of age. The core technology to virtualize databases has been around for almost 2 decades but because virtualizing databases has required specialized hardware, expert knowledge, and complex scripting. Thus virtualizing databases remained an underutilized technology. An analogy is that the usage of the internet. The internet existed before the web browser and it was possible to share photos, music, and documents via the internet but few people did because the tools were relatively unknown and difficult. Once the web browser came out and sharing became as easy as point and click and then usage of the internet exploded.
The same explosion is happening with database virtualization. Delphix for example now eliminates the requirements for specialized hardware, expert knowledge and complex scripting. Delphix is so completely turnkey that a developer can provision a fully operational virtual database in seconds with 3 clicks of a mouse. Delphix is not the only one automating database virtualization. Oracle as well will be introducing a point and click interface in Oracle 12c.
Barriers to entry
There have been two basic barriers to entry for adoption of database virtualization
1. Specialized Hardware
One of the barriers to entry to database virtualization has been the requirement of specialized storage systems such as EMC, Netapp or Oracle ZFS storage appliance that were capable of taking snapshots of the file system. The requirement for specialized hardware has been eliminated with new technologies such as open source ZFS file system, Oracle dNFS Clonedb and Delphix file system (DxFS) .
2. Costly Administration
Even without the requirement of specialized hardware, the adoption of virtualized databases has been low. Adoption has been low because of the complicated scripting requirements for managing virtual databases. Managing the process of virtualization has been too high a barrier for major adoption. Some of the steps that have to be managed in a database virtualization system are administering the RMAN backups, incremental backups, archive log files, file system snapshots, file system clones, exporting file systems on fibre channel, iscis, or nfs, mounting those file systems on virtual database host machines, creating proper startup parameters and recovering the virtual database to a point in time.
Now the hardware and management barriers are falling with fully automated database virtualization solutions the adoption of database virtualization is taking off. For the first time companies such as Delphixare adding the automation steps on top of the storage snapshot technologies. The automation allows anyone to harness the power of database virtualization.
Delphix for example is a software stack that implements Delphix file system (DxFS) and Delphix Datavisor that automates the whole process of database vitualization and doesn’t require file system snapshots or EMC or NetApp. It only requires an x86 box with about the same amount of the disk space of the database to be virtualized. The source database is backed up onto the Delphix appliance via automated RMAN APIs, the data is automatically compressed and Delphix automates syncing of the local copy with changes in production and Delphix handles the provisioning of virtual databases. A virtual database can be provisioned from any SCN or second in time during the retention window. The retention window is typically 2 weeks.
Oracle as well is attempting to bring automation to database virtualization in Oralce12c with SMU or snapshot manager utility a pay for licensed software utility . The utility is based on ZFS and even though ZFS is open source, Oracle’s implantation of SMU requires an Oracle ZFS appliance, ie specialized hardware which will be a barrier to entry for most IT shops. An even higher barrier to entry for SMU is that SMU requires the source database data files be stored on the ZFS appliance. Using the ZFS storage appliance for production databases will all but prohibit the use of SMU for production databases. Delphix on the other hand has no such limitations.
Automated virtual database solutions eliminates bureaucracy. What took weeks in the past takes seconds now by eliminating both the data copying time of the production database as well as all the time for requesting, discussing, processing and allocating resources. When a developer wants a clone they typically have to ask their manager, their DBA, their storage admin etc. The managerial decision making process, administrative tasks and coordination meetings often take weeks. With database virtualization all of the overhead can be eliminated. The developer can provision their own virtual database in seconds by themselves, when they want with a few clicks of a mouse with now storage overhead.
Speed up development: each developer gets a copy
Because the resource cost of providing database copies is eliminated it means that a team of developers can go from sharing one copy of production to each having their own private copy of the source database. Now with a private copy of the database, a developer can change schema and metadata as fast as they want instead of waiting days or weeks of review time to check in changes to a shared development database
Virtualizing databases not only saves disk space but can save RAM. RAM on the virtual database hosts can be minimized because virtual databases share the same data files and can share the same blocks and thus share the same blocks in the file system cache on Delphix, thus reducing the amount of memory required to cache data. No longer does each copy require private memory to cache the data. With virtual databases not only are blocks shared on disk but the blocks are also shared in memory. For example, Delphix is connected between the backend storage SAN and the database hosts and thus Delphix sits between the back end SAN and the virtual database hosts and serves as a shared cache for all the virtual databases. Since the virtual databases use the same underlying copy of the original database then Delphix can hold blocks in memory from the original database and those memory cached blocks can be shared between all the virtual databases.
How to virtualize databases
What are the current virtualization technologies and what are the advantages and disadvantages of each ?
- Clone DB (Oracle)
- ZFS Storage Appliance (Oracle)
- Data Director (VMware)
- Oracle 12c Snapshot Manager Utility (SMU)
Issues to consider in virtualization are
- Specialized hardware
- Golden copy problem
- Performance issues
- Size restrictions
- Database limitations
The largest barrier to entry is specialized hardware. EMC, Netapp , ZFS Storage Appliance and Oracle 12c SMU all require specialized storage hardware.
Of these choices, the least interesting is probably EMC because EMC’s snapshot technology is based on a journaling file system and not a pointer based file system and thus one is constrained by the “golden copy” problem (see below)
Of the other 3 solutions Netapp, ZFS storage appliance and Oracle 12c SMU, Netapp is the most tried and true and has limited automation such as Snap Manager for Oracle. At the opposite end with no automation is the the ZFS storage appliance which requires a custom scripting using an Oracle proprietary shell. The scripting requirements are removed largely by Oracle 12c SMU licensed product which automates the snapshot and cloning and works on top of the ZFS storage appliance. One of the biggest issues with 12c SMU is that the source database is required to use storage on the ZFS appliance.
Golden Copy problem
With pointer based file systems one can start with an initial full copy of the source database and then keep a rolling window of incremental changes from the source database. For example one could have incremental backups of the source database everyday for a month. These snapshots would consist of the new changed blocks in the incremental backup as well as the unchanged blocks from the initial full copy backup. As incremental backups fill the retention time window then blocks that are older than the time window can be freed when there are newer copies of those blocks.
Netapp, ZFS and Delphix are all based on pointer based file systems and can have rolling retention windows.
On the other hand CloneDB, Data Director and EMC all have the golden copy problem. For CloneDB there is only one copy of the source database and if a new copy is required then a new full copy has to be taken of the source databases. For Data Director and EMC they allow snapshots so the original copy can be shared with newer versions of the source database but the number of snapshot is limited and once the limit is reached, the source database has to be copied over in its entirety.
Two of the solutions have significant performance drawbacks. Oracle’s clonedb and VMware’s Data Director have performance issues. VMware’s Data Director performance degrades the more clones there are sharing the same snapshot. Also the more snapshots that are taken the more performance degrades. For Oracle’s CloneDB, Oracle doesn’t recommend cloned for databases where performance is a concern.
Net App has a limit of 255 snapshots. Snaps are limited to the same aggregate (storage pool)
Aggregates have size limits depending on controller. A 32 bit controllers is limited to 16T.
Size restrictions for VMware Data Director linked clones are limited to 30 snapshots and EMC are limited to 16 snapshots.
Delphix file system (DxFS) and ZFS solutions are for all practical purposes unbounded in size, in snapshots and in snapshots of snapshots (i.e. branching clones).
VMwares data director is limited to support x86 databases, such as Linux and OpenSolaris only.
The other solutions support any OS version of Oracle.
The solutions offer different level of automation from none to fully automated. Automation is critical for adoption and utilization.
EMC, Netapp and VMware Data Director all provide a limited set of automation but not enough to significantly create adoption. Even with automation EMC and Data Director have the golden copy problem thus can never be fully automated to create a rolling window of time from which clones can be provisioned. Netapp can be set up to create a rolling window of snapshots, but Netapp lacks an interface for provisioning the clones, thus creating clones requires significant work for DBAs and is out of reach for an end user such as a developer. Oracle’s 12c SMU will provide automated provisioning of databases but as this is an unreleased product, it’s unclear whether this will be functionality accessible to end users such as developers.]
Delphix is the only solution that fully automates the system eliminating the need to know how to run RMAN or how to recover a database and gives a friendly, slick UI to developers and end users that allows them to create virtual databases.
Here are a list of the technologies out there and the issues
- EMC – golden copy issues, hardware lock-in, requires advanced scripting
- NetApp – hardware lock-in, size limitations, requires advanced scripting
- Clone DB (Oracle) – golden copy issue, performance issues, requires advanced scripting
- ZFS Storage Appliance (Oracle) – hardware lock-in, requires advanced scripting
- Data Director (VMware) - golden copy issue, performance issues, x86 databases only
- Oracle 12c Snapshot Manager Utility (SMU) – hardware lock-in, requires source database have it’s datafiles located on Oracle ZFS Appliance
- Delphix – automated solution for both administrator and end user. Delphix works for Oracle 9,10,11 on RAC, Standard Edition and Enterprise Edition. Fully automated with time retention windows and end user self service provisioning. Also support SQL Server databases. With Delphix there are no size restrictions and unlimited clones and snapshots. Snapshots can even be taken of snapshots creating branched versions of source databases.
If interested in producing a home grown data virtualization, then either set up open source ZFS or buy an Oracle ZFS storage appliance and read Oracle’s white paper listed below in the references and start coding.
If looking for an enterprise level data virtualization system that is fully automated, hardware agnostic, runs itself, and provides self service end user provisioning then Delphix is the solution.
- ZFS Appliance
- Data Director
– http://myvirtualcloud.net/?p=1222 linked Clone
– https://communities.netapp.com/docs/DOC-10323 flexclone