ZFS Tuning for SSDs

Update: Some of the suggestions below have been questioned for a typical ZFS setup.  To clarify, these settings should not be implemented on most ZFS installations.  The L2ARC is designed to ensure that, by default, it will never hurt performance.  Some of the changes below can have a negative impact on workloads that do not use the L2ARC; they accept the possibility of worse performance in some workloads in exchange for better performance with cache-friendly workloads.  These suggestions are intended for ZFS implementations where the flash cache is one of the main motivating factors for deploying ZFS (think high SSD-to-disk ratios).  In particular, these changes were tested with an Oracle database on ZFS.

In 2008, Sun Microsystems announced the availability of a feature in ZFS that could use SSDs as a read or write cache to accelerate ZFS.  A good write-up on the implementation of the level 2 adaptive read cache (L2ARC) by a member of the Fishworks team is available here.  In 2008, flash SSDs were just starting to penetrate the enterprise storage market, and this cache was written with many of the early flash SSD issues in mind.  First, it warms quite slowly, defaulting to a maximum cache load rate of 8 MB/s.  Second, to avoid being in a write-heavy path, it is explicitly placed outside the data eviction path from the ZFS memory cache (ARC).  This prevents it from behaving like a traditional level 2 cache and causes it to fill more slowly, mainly with static data.  Finally, the default record size of the file system is rather large (128 KB), and the default assumption is that for sequential scans it is better to just read from the disk and skip the SSD cache.
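To put the slow warm-up in perspective, here is a rough sketch of how long a cache would take to fill at a sustained feed rate. The 700 GB cache size is an assumption chosen for illustration, the function name is hypothetical, and the estimate ignores any eviction or rewriting, so treat it as a lower bound rather than a measurement.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def hours_to_fill(cache_bytes: int, feed_rate_bytes_per_s: int) -> float:
    """Lower bound on the hours needed to fill a cache at a sustained feed rate."""
    if feed_rate_bytes_per_s <= 0:
        raise ValueError("feed rate must be positive")
    return cache_bytes / feed_rate_bytes_per_s / 3600


if __name__ == "__main__":
    default_rate = 8 * MIB   # the default 8 MB/s cache load rate
    cache_size = 700 * GIB   # assumed cache size, for illustration only
    print(f"{hours_to_fill(cache_size, default_rate):.1f} hours")  # 24.9 hours
```

Under these assumptions a cache of that size needs roughly a day of continuous feeding before it is fully warm.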

Many of these assumptions about SSDs don’t line up with current-generation SSD products.  An enterprise-class SSD can write quickly, has high capacity, offers higher sequential bandwidth than disk storage systems, and has a long life span even under a heavy write load.  There are a few parameters that can be changed as a best practice when enterprise SSDs are being used for the L2ARC in ZFS:

Record Size

Change the record size to a much lower value than 128 KB.  The L2ARC fetches the full record on a read, and a 128 KB I/O to an SSD uses up device bandwidth and increases the response time.

This can be set dynamically (for new files) with:

# zfs set recordsize <record size> <filesystem>
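As a rough illustration of why a large record size wastes SSD bandwidth, the sketch below computes how many bytes are fetched per byte requested when every read pulls in whole records. The 8 KB request size is an assumed database block size, not a value from the tests described here, and the function name is hypothetical. It also assumes requests are aligned to record boundaries.

```python
KIB = 1024


def read_amplification(recordsize_bytes: int, request_bytes: int) -> float:
    """Bytes fetched per byte requested when whole records are always read.

    Assumes the request starts on a record boundary.
    """
    if recordsize_bytes <= 0 or request_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: number of whole records that cover the request.
    records = -(-request_bytes // recordsize_bytes)
    return records * recordsize_bytes / request_bytes


if __name__ == "__main__":
    request = 8 * KIB  # assumed database block size
    for recordsize in (128 * KIB, 32 * KIB, 8 * KIB):
        factor = read_amplification(recordsize, request)
        print(f"recordsize={recordsize // KIB}K -> {factor:.0f}x bytes read")
```

With the default 128 KB record, an 8 KB request moves 16 times the data it needs; matching the record size to the request size brings that factor down to 1.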

l2arc_write_max

Change l2arc_write_max to a higher value.  Most SSDs specify a device life in terms of full device write cycles per day.  For instance, say you have 700 GB of SSDs that support 10 device cycles per day for 5 years.  This equates to a maximum write rate of 7000 GB/day, or about 83 MB/s (counting 1 GB as 1024 MB).  As the setting is the maximum write rate, I would suggest at least doubling the specified drive max rate.  As the L2ARC is a read-only cache that can abruptly fail without impacting file system availability, the only risk of too high a write rate is wearing out the drive earlier.  This throttle was put in place when early SSDs with unsophisticated controllers were the norm.  Early SSDs could experience significant performance problems during writes, which would limit the performance of the reads the cache was meant to accelerate.  Modern enterprise SSDs are orders of magnitude better at handling writes, so this is not a major concern.

On Solaris, this parameter is set by adding the following line to the /etc/system file:

set zfs:l2arc_write_max=<maximum bytes per second>
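The arithmetic above can be sketched as a small script that turns a drive's endurance rating into a byte-per-second value and formats the /etc/system line. The function names are hypothetical, the 700 GB and 10 cycles per day figures are the example values from the text, and the factor of two is the suggested doubling rather than a tested optimum.

```python
SECONDS_PER_DAY = 86_400
MIB = 1024 ** 2
GIB = 1024 ** 3


def endurance_write_rate(capacity_bytes: int, device_cycles_per_day: int) -> int:
    """Sustained write rate (bytes/s) that uses exactly the rated daily endurance."""
    if capacity_bytes <= 0 or device_cycles_per_day <= 0:
        raise ValueError("capacity and cycles per day must be positive")
    return capacity_bytes * device_cycles_per_day // SECONDS_PER_DAY


def system_line(write_max_bytes: int) -> str:
    """Format the /etc/system entry for the given l2arc_write_max value."""
    return f"set zfs:l2arc_write_max={write_max_bytes}"


if __name__ == "__main__":
    rated = endurance_write_rate(700 * GIB, 10)  # example values from the text
    print(f"rated endurance: {rated / MIB:.0f} MB/s")
    # Double the rated figure, since the setting is a maximum, not an average.
    print(system_line(2 * rated))
```

Since the value is a ceiling rather than a sustained average, the doubled figure should not by itself imply that the drive is written at twice its rated endurance.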

l2arc_noprefetch

Set l2arc_noprefetch=0.  By default this is set to 1, which skips the L2ARC for the prefetch reads used for sequential reading.  The idea is that disks are good at sequential reads, so just read from them.  With PCIe SSDs readily available with multiple GB/s of bandwidth, even sequential workloads can get a significant performance boost.  Changing this parameter puts the L2ARC in the prefetch read path and can make a big difference for workloads that have a sequential component.

On Solaris, this parameter is set by adding the following line to the /etc/system file:

set zfs:l2arc_noprefetch=0
Categories: Architectures, Oracle, PCIe, SSD, tuning