Clarifying Some Differences Between Network-Based Flash Caching and SSD SANs

George Crump of Storage Switzerland published a great article titled “Cost Effectively Solving Oracle Performance Problems,” to which Kaminario (K) responded with some thoughts of their own on why SSD SANs are a better choice than network-based flash caching for solving Oracle performance problems. We disagree with some of the points made by Kaminario:

Storage Switzerland: The challenge is that typically these vendors have limited experience in delivering the types of storage services that Oracle administrators have become accustomed to.
K: This is no longer true. Today, Kaminario K2 offers sophisticated SAN features important to Oracle administrators, including lightning-fast snapshots and non-disruptive operations.

There is no such thing as non-disruptive deployment of new storage arrays. In a production environment, halting a system to perform data migration, validating that migration and then restarting the environment can be time- and resource-intensive. Converting existing scripts and operating procedures to use a new vendor’s snapshot features can be equally complicated and risky. With GridIron’s transparent network-based deployment, no changes are required to business processes or applications and there is no data migration involved, so it is truly non-disruptive!

K: The idea of caching is to quickly serve data that was served before. Most Oracle read performance problems are random read or single blocks. This is where mechanical disk storage is limited. If Oracle needs to read a block, that block needs to be in the cache appliance to improve performance. But will that block be in the cache? Only if that block is being used a lot (was served before). We call these hot-blocks. BUT, if these blocks are hot, they will already be in the Oracle internal cache and therefore Oracle will not read them. You’d end up double caching without a lot of improvements, unless the entire SAN data is placed in the caching appliance.

The difference in size between the Oracle block cache and the dataset is so vast that the Oracle block cache cannot effectively hold anything but the hottest blocks. SAN-based caches can be scaled to many times the size of the largest memory footprints (sizes impossible to construct using server DRAM) yet remain a fraction of the dataset size and of the (implicitly larger) physical storage footprint. Using sophisticated caching algorithms based on performance feedback is the key to making caches effective for large datasets. GridIron customers who evaluated server-based, storage-based and network-based caching using flash can testify to the advantages that proper algorithms bring to the picture.
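
The size argument can be illustrated with a toy two-tier simulation. This is only a sketch under stated assumptions: plain LRU replacement in both tiers (not the feedback-based algorithms mentioned above), a synthetic skewed access pattern, and made-up tier sizes. The names `LRUCache` and `simulate` are hypothetical and do not come from Oracle or GridIron.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used block cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, insert the block and evict the oldest."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


def simulate(dataset_blocks=100_000, l1_blocks=1_000, l2_blocks=20_000,
             reads=200_000, seed=1):
    """Count hits in a small first tier, a larger second tier, and reads from disk.

    All sizes are assumptions: the first tier stands in for a server-side
    block cache, the second for a much larger network-based cache.
    """
    rng = random.Random(seed)
    l1, l2 = LRUCache(l1_blocks), LRUCache(l2_blocks)
    l1_hits = l2_hits = disk_reads = 0
    for _ in range(reads):
        # Cubing a uniform sample skews requests toward low block numbers.
        block = int(dataset_blocks * rng.random() ** 3)
        if l1.access(block):
            l1_hits += 1          # served from the small first tier
        elif l2.access(block):
            l2_hits += 1          # missed tier one, served from the larger tier
        else:
            disk_reads += 1       # missed both tiers, goes to primary storage
    return l1_hits, l2_hits, disk_reads


if __name__ == "__main__":
    l1_hits, l2_hits, disk_reads = simulate()
    print("first-tier hits :", l1_hits)
    print("second-tier hits:", l2_hits)
    print("disk reads      :", disk_reads)
```

Every second-tier hit in this model is a read that the small first tier could not hold, which is the sense in which a larger cache downstream is not simply "double caching". The exact split depends entirely on the assumed sizes and access skew.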

K: Some applications are dynamic in nature with real random access. You will not get much improvement from a caching solution compared with placing the entire database on a Flash array. We have seen this with Oracle Flash Cache vs. placing the entire tablespace on SSD.

Holding a dataset and making that dataset available at high bandwidth are entirely different problems. Data held on a single SSD is available only at the bandwidth of that one drive and the queue depth of its controller. Data within a storage array populated with SSDs is limited by the RAID structure: RAID 5 cannot deliver more bandwidth or higher IOPS than 4 disks (assuming perfectly spread random reads). The GridIron architecture can spread data over 100 disks per appliance to deliver the highest concurrent bandwidth, at levels not available from primary storage arrays.
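
As a rough illustration of the scaling argument, the ideal-case totals can be worked out as follows. The per-drive bandwidth and IOPS figures are hypothetical placeholders, not measurements of any product, and the model assumes reads spread perfectly evenly across drives; only the 4-disk and 100-disk counts come from the paragraph above.

```python
def aggregate_read_capability(drives, mb_per_sec_per_drive, iops_per_drive):
    """Ideal-case totals when random reads are spread evenly over all drives."""
    return drives * mb_per_sec_per_drive, drives * iops_per_drive


# Hypothetical per-drive figures, chosen only for illustration.
PER_DRIVE_MB_PER_SEC = 250
PER_DRIVE_IOPS = 20_000

raid5_mb, raid5_iops = aggregate_read_capability(
    4, PER_DRIVE_MB_PER_SEC, PER_DRIVE_IOPS)
wide_mb, wide_iops = aggregate_read_capability(
    100, PER_DRIVE_MB_PER_SEC, PER_DRIVE_IOPS)

print("4 drives  :", raid5_mb, "MB/sec,", raid5_iops, "IOPS")
print("100 drives:", wide_mb, "MB/sec,", wide_iops, "IOPS")
print("scaling factor:", wide_mb / raid5_mb)
```

Whatever per-drive numbers are plugged in, the ratio between the two layouts in this simple model is just the ratio of drive counts.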

K: I agree that caching will improve writes BUT not as much as placing an entire database on Kaminario K2. There is no doubt there.

The write throughput of spinning disks is higher than that of SSDs. If a database is bound by write throughput, there is no doubt you would be wasting money on an all-flash array.

K: Finally, if you really like your existing SAN and want to keep it, use a solution like Kaminario K2 mirrored to the existing SAN. Use ASM or OS mirroring to assure that writes go to both storages, but reads are served only from K2 (prefer-read option). Then you will get both read and the write improvements. This solution will work for every Oracle application.

There are several issues with this approach:

  1. Preferred read is an Oracle 11g/ASM function and is not available for Oracle 10 or earlier.
  2. The doubling of the bandwidth and IOPS coming out of the servers leads to other performance problems. The goal is to have processors do more Oracle work, not spend twice as much effort moving data around.
  3. Finally, the ASM silvering process needed to integrate a new all-flash array into an existing storage environment can take upwards of 4 hours for a 50TB data warehouse, during which the Oracle server is so busy with the silvering process that it is practically offline, i.e., NOT highly available. Here are the details of the silvering process:
    • The K2 array has a write bandwidth of 8GB/sec
    • The Oracle server will need to silver the K2 plex by copying data from the main SAN array and writing it to the K2 array resulting in:
      • Sustained reads fully saturating the primary SAN – a Tier-1 SAN will saturate at 4GB/sec.
      • An Oracle server load of 4GB/sec for reading the data AND 4GB/sec of writing the data. This will essentially saturate the Oracle server.
      • Streaming writes of 4 GB/sec to the K2 array while it is silvering
    • The 4GB/sec number is actually pretty ambitious from a server to sustain – let’s say that it does so…
    • That translates to approximately 12TB/hour of silvering
    • So… when you install your K2 array and start the ASM silvering, you will bring the Oracle server and the primary storage array to their knees and take them essentially offline for 4+ HOURS
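
The silvering arithmetic above can be reproduced with a short calculation. This is a sketch that assumes decimal units (1TB = 1000GB) and treats the slowest of the three stages as the copy rate; the function and constant names are hypothetical, while the 4GB/sec, 8GB/sec and 50TB figures are the ones quoted in the list.

```python
def tb_per_hour(gb_per_sec):
    """Convert a sustained rate in GB/sec to TB/hour (decimal units assumed)."""
    return gb_per_sec * 3600 / 1000


def silvering_hours(dataset_tb, gb_per_sec):
    """Hours needed to copy dataset_tb at a sustained rate of gb_per_sec."""
    return dataset_tb / tb_per_hour(gb_per_sec)


PRIMARY_SAN_READ_GB_S = 4.0   # Tier-1 SAN read saturation point quoted above
SERVER_COPY_GB_S = 4.0        # what the server is assumed to sustain
K2_WRITE_GB_S = 8.0           # K2 write bandwidth quoted above
DATASET_TB = 50.0             # data warehouse size quoted above

# The mirror copy can only run as fast as its slowest stage.
copy_rate = min(PRIMARY_SAN_READ_GB_S, SERVER_COPY_GB_S, K2_WRITE_GB_S)

# The server both reads and writes every byte, so its total traffic doubles.
server_traffic = 2 * copy_rate

print("copy rate      :", copy_rate, "GB/sec")
print("server traffic :", server_traffic, "GB/sec")
print("at copy rate   :", round(silvering_hours(DATASET_TB, copy_rate), 2), "hours")
print("at 12TB/hour   :", round(DATASET_TB / 12.0, 2), "hours")
```

In decimal units a full 4GB/sec works out to 14.4TB/hour; the 12TB/hour figure corresponds to a sustained rate nearer 3.3GB/sec, which is consistent with the caveat that 4GB/sec is ambitious for a server to hold, and it is that figure that yields the 4+ hour window for 50TB.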

Contrast the above approach with GridIron’s SAN accelerator that starts boosting data immediately when it turns on and:

  • Learns and stores new data in RAM
  • De-stages the data to flash in the background making the flash write time a non-issue
  • NEVER overloads the server OR the primary storage array with superfluous “saturating” reads that degrade performance.
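
The general pattern in the first two bullets (accept data in RAM, then move it to flash in the background) can be sketched as below. This is not GridIron's implementation: the class `StagedCache`, its methods, and the use of in-memory dictionaries to stand in for RAM and flash are all assumptions made for illustration.

```python
import queue
import threading


class StagedCache:
    """Toy write-staging cache: blocks land in RAM first and are moved to a
    flash tier by a background worker (both tiers are plain dicts here)."""

    def __init__(self):
        self.ram = {}
        self.flash = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._destage_loop, daemon=True)
        worker.start()

    def learn(self, block_id, data):
        """Store a block in RAM and schedule it for background de-staging."""
        with self._lock:
            self.ram[block_id] = data
        self._pending.put(block_id)

    def read(self, block_id):
        """Serve from RAM if present, otherwise from flash; None means a miss
        (a real appliance would fall through to primary storage)."""
        with self._lock:
            if block_id in self.ram:
                return self.ram[block_id]
            return self.flash.get(block_id)

    def _destage_loop(self):
        """Background worker: move queued blocks from RAM to flash."""
        while True:
            block_id = self._pending.get()
            with self._lock:
                data = self.ram.pop(block_id, None)
                if data is not None:
                    self.flash[block_id] = data
            self._pending.task_done()

    def flush(self):
        """Block until every queued de-stage has completed."""
        self._pending.join()


if __name__ == "__main__":
    cache = StagedCache()
    cache.learn(42, b"block contents")
    print(cache.read(42))   # readable immediately, whichever tier holds it
    cache.flush()
    print(42 in cache.flash)
```

The caller of `learn` returns as soon as the block is in RAM, so the slower flash write never sits on the request path, which is the point the second bullet makes.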

GridIron Systems TurboCharger is the only solution in the market today that can be deployed without disrupting business operations and without requiring any changes to operational processes.
