This week in TMS’s booth (booth #258) at VMworld we have a joint demo with our partner DataCore that shows an interesting combination of VMware, SSDs, and storage virtualization. We are using DataCore’s SANsymphony-V software to create an environment with the RamSan-70 as tier 1 storage and SATA disks as tier 2. The SANsymphony-V software handles the tiering, high-availability mirroring, snapshots, replication, and other storage virtualization features.
Iometer is running on four virtual machines within a server, handling just north of 140,000 4 KB read IOPS. A screenshot from Iometer on the master manager is shown below:
Running 140,000 IOPS is a healthy workload, but the real benefit of this configuration is its simplicity. It uses just two 1U servers and meets all of the requirements for a targeted VMware deployment. Much of the time, RamSans are deployed in support of a key database application where exceptionally high-performance shared SSD capacity is the driving requirement. RamSan systems implement a highly parallel hardware design to achieve extremely high performance at exceptionally low latency. This is an ideal solution for a critical database environment, where the database has integrated all of the tools that are normally “outsourced” to a SAN array (such as clustering, replication, snapshots, backup, etc.). In a VMware environment, however, many physical and virtual servers are leveraging the SAN, so pushing data management into each application is impractical.
Caching vs. Tiering
One of the key use cases for SSDs in VMware environments is automatically accelerating the most heavily accessed data as new VMs are brought online, grow over time, and are retired. The flexibility of a virtual infrastructure makes seamless, automatic access to SSD capacity all the more important. There are two accepted approaches to properly integrating an SSD in a virtual environment; I’ll call them caching and tiering. Although similar on the surface, there are some important distinctions.
In a caching approach, the data remains in place (in its primary location) and a cached copy is propagated to SSD. This setup is best suited to heavily accessed read data because write-back caches break all of the data management storage features running on the storage behind it (Woody Hutsell discusses this in more depth in this article). This approach is effective for frequently accessed static data, but it is not ideal for frequently changing data.
In tiering, the actual location of the data moves from one type of persistent storage to another. In the read-only caching case it is possible to create a transparent storage cache layer that is managed outside of the virtualized layer, but when tiering with the SSD, tiering and storage virtualization need to be managed together.
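The distinction above can be made concrete with a toy model. This is an illustrative sketch only, assuming nothing about any TMS or DataCore API (the classes and methods are hypothetical): in the caching case the primary copy never moves and the SSD only holds copies, while in the tiering case there is a single copy whose location migrates between tiers.

```python
class CachedStorage:
    """Read cache: the primary copy stays put; SSD holds a copy."""
    def __init__(self):
        self.primary = {}    # primary location, e.g. SATA disk
        self.ssd_cache = {}  # cached copies only

    def read(self, key):
        if key in self.ssd_cache:        # cache hit: fast path
            return self.ssd_cache[key]
        value = self.primary[key]        # cache miss: read from disk...
        self.ssd_cache[key] = value      # ...and propagate a copy to SSD
        return value

    def write(self, key, value):
        self.primary[key] = value        # write-through: primary stays
        self.ssd_cache.pop(key, None)    # authoritative; drop stale copy


class TieredStorage:
    """Tiering: the data's actual location moves between tiers."""
    def __init__(self):
        self.tiers = {"ssd": {}, "sata": {}}
        self.location = {}               # key -> current tier

    def read(self, key):
        return self.tiers[self.location[key]][key]

    def promote(self, key):
        """Move hot data to SSD; there is only ever one copy."""
        tier = self.location[key]
        if tier != "ssd":
            self.tiers["ssd"][key] = self.tiers[tier].pop(key)
            self.location[key] = "ssd"
```

Note how the write path differs: the cache must invalidate or write through on every change, which is why write-heavy data undermines it, while the tier simply holds the live copy wherever it currently resides.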
SSDs have solved the boot-storm startup issues that plague many virtual environments, but VMware’s recent license model updates sparked increased interest in other SSD use cases. With VMware moving to a memory-based licensing model, there is interest in using SSDs to accelerate VMs with a smaller memory footprint. In a tiering model, if VMDKs are created on LUNs that leverage SSDs, the internal paging mechanisms within each virtual guest will automatically move to low-latency SSD. Paging is write-heavy, so the tiering model is important to ensure that the page files leverage SSD as they are modified (and that the less active storage doesn’t use the SSD).
We are showing this full setup at our booth (#258) at VMworld. If you are attending I would be happy to show you the setup.
Yesterday TMS announced our latest PCIe Flash offering, the RamSan-70. Prior to the launch, I had the chance to brief a number of analysts and explain the key deployments we are tackling with this offering. One thing I discovered is that there is a lot of confusion about what PCIe SSDs bring to the table and what they don’t. So with that in mind, I present the following primer on where PCIe SSDs fit.
How would you describe a PCIe SSD?
A PCIe SSD is like Direct-attached storage (DAS) on steroids.
Doesn’t being on the PCIe bus increase performance by being as close to the CPU as possible?
Yes, but nowhere near to the degree it is promoted. Going through an HBA to a FC-attached RamSan adds about 10 µs of latency – that’s it. The reason that accessing SSDs through most SAN systems takes 1–2 ms is the software stack in the SAN head – not the PCIe-to-FC conversion. For our customers, the decision to go with a PCIe RamSan-70 over a FC/IB-attached RamSan-630 comes down to whether the architecture needs to share storage.
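A quick back-of-envelope comparison makes the point, using the figures quoted above. The device access time is my illustrative placeholder, not a measured specification:

```python
# Rough latency comparison; all numbers are approximate and illustrative.
device_latency_us = 25        # assumed flash access time on the SSD itself
fc_hba_overhead_us = 10       # quoted cost of the HBA/FC hop
san_stack_overhead_us = 1500  # typical SAN-head software stack (1-2 ms)

direct_pcie = device_latency_us
fc_attached = device_latency_us + fc_hba_overhead_us
through_san = device_latency_us + fc_hba_overhead_us + san_stack_overhead_us

print(f"PCIe-attached:      {direct_pcie} us")
print(f"FC-attached RamSan: {fc_attached} us")
print(f"Behind a SAN head:  {through_san} us")
# With these numbers the FC hop adds a fraction of the device latency,
# while the SAN software stack multiplies total latency ~40x.
```

The takeaway: it is the software stack, not the transport conversion, that dominates the latency budget.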
Are you working on a way to make the PCIe card sharable?
No, we have shared systems. If the architecture needs the shared storage, use our shared storage systems.
So why is PCIe making such a splash? Isn’t the DAS vs. SAN argument over, with SAN rising triumphant?
Well, the argument was over until two things happened: servers started to get really cheap, and really, really big clusters started getting deployed. In a shared storage model, a big core network is needed so each server can access the storage at a reasonable rate; this is one of the main reasons a dedicated high-performance Storage Area Network is used for the server-to-storage network. Once there are more than a few dozen servers, though, that network starts to become rather large. Now imagine you want tens of thousands of servers: the network becomes the dominant cost (see the aside in my post on SSDs and the Cloud for more details). In these very large clusters, the network-attached shared storage model becomes impractical.
A new computing model developed for these environments: the shared-nothing scale-out cluster. The basic idea is that each computer processes the part of the data that is stored locally, many nodes do this in parallel, and then an aggregation step compiles the results. This way all of the heavy data-to-CPU movement takes place within a single server, and only the results are compiled across the network. This is the foundation of Hadoop as well as several data warehouse appliances. In effect, rather than virtualized servers, a big network, and virtualized storage via a SAN or NAS array, the servers and storage are virtualized in a single step using hardware that has both CPU resources and direct-attached storage.
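The pattern above can be sketched in a few lines. This is a minimal single-process sketch, not Hadoop itself: the “nodes” are just functions over local shards, and the shard contents are invented for illustration.

```python
from collections import Counter

# Each node holds its own shard of the data on local (direct-attached) storage.
node_shards = [
    ["error", "ok", "ok"],
    ["ok", "error", "error"],
    ["ok", "ok", "ok"],
]

def process_local(shard):
    """Runs on each node against its local data -- no network traffic."""
    return Counter(shard)

# Only the small partial results would cross the network for aggregation.
partials = [process_local(shard) for shard in node_shards]
total = sum(partials, Counter())
print(total)  # Counter({'ok': 6, 'error': 3})
```

The bulky shard-scanning work stays local to each node; only the compact `Counter` objects are shipped for the final aggregation, which is exactly why the core network can stay small.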
PCIe SSDs are important for this compute framework because reasonably priced servers are really quite powerful and can leverage quite a bit of storage performance. With the RamSan-70 each PCIe slot can provide 2 GB/s of throughput while fitting directly inside the server. This much local performance allows building high performance nodes for a scale-out shared-nothing cluster that balances the CPU and storage resources. Otherwise, a large number of disks would be needed for each node or the nodes would have to scale to a lower CPU power than is readily available from mainstream servers. Both of these other options have negative power and space qualities that make them less desirable.
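A rough sizing calculation shows why the balance matters. The per-disk throughput figure below is my assumption for a typical enterprise HDD, not a vendor number; only the 2 GB/s per-slot figure comes from the text.

```python
# Illustrative sizing math; the HDD figure is an assumption, not a spec.
ramsan70_gb_per_s = 2.0      # throughput per PCIe slot, from the text
hdd_seq_gb_per_s = 0.15      # ~150 MB/s sequential per enterprise HDD (assumed)

hdds_to_match = ramsan70_gb_per_s / hdd_seq_gb_per_s
print(f"HDDs needed to match one RamSan-70 slot: {hdds_to_match:.0f}")
# Even on pure sequential throughput, one card replaces a shelf of disks;
# on small random IOPS the gap would be orders of magnitude larger.
```

A node built from a dozen-plus spindles per server costs power, space, and failure domain; a single in-server card avoids all three.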
The rise of SSDs has provided a quantum leap in storage price-performance at a reasonable cost per capacity, just as new compute frameworks are moving into mainstream applications. Both of these developments are still being digested by the IT community, and you can see the big vendors jostling for dominance to control the new platforms that will be used to build the datacenters of the future. At TMS we see the need for “DAS on steroids” in the new frameworks and are leveraging our hardware engineering expertise to make the best DAS solution.
From time to time I become involved in discussions about where SSDs are making an impact in the consumer market and what I think is going to happen. The biggest knock I hear against SSDs making serious inroads is this: consumers buy computers based on specs, and most just won’t accept a computer with less storage at the same price as one that has more. The details of the SSD benefits are lost on this mainstream market, and disks can maintain a price-per-GB advantage long into the future.
I attended a marketing presentation by David Kenyon from AMD recently and he pointed out something about computer marketing trends that I found insightful. The use of specs as a computer differentiator is becoming less prominent. The look and feel and the fitness for a particular use case are becoming more important selling points. The reduced prominence of specs started when the clock rates of CPUs stopped being promoted and instead the family name and model number were used. If you look at Apple products, it is hard to even find the specs until after you have selected the make you want and are trying to decide on a model.
Part of the shift towards use case based computing is a fragmentation of computing resources into multiple devices. People have many devices – laptops, work desktops, home desktops, tablets, and smart phones. Having devices that are accessible and convenient for a particular use is a wonderful thing. But there is one major headache that comes with this – having access to your data from a particular device that you would like to use is a pain. A Kindle is great to dive into a book on a quiet afternoon, but it is relatively inconvenient to take with you all the time. Being able to pull out a smartphone in a waiting room and pick up reading where you had left off is what people want. Multiple device shared access has already happened with email and it is just a matter of time until the rest of private data goes the same route. The access to data without physical device dependence is what cloud storage is all about.
So what does this have to do with SSDs? Besides the declining prominence of specs, using more and more devices makes it clear that having a bunch of storage on any particular device just isn’t valuable; the data needs to be accessible from the other devices. Consumers are not going to go through the hard work of setting up data synchronization, though. They will eventually pay to have it done for them, by whoever wins a monopoly over access to users’ data – Microsoft, Google, Facebook, or someone new. Soon, their data is going to end up in a datacenter somewhere that all of the devices can access. There may be a full copy of everything on the computer at home, but even this could fall by the wayside. In this environment, having a disk in any of the devices is just crazy, simply because at low capacities SSDs are cheaper than disks! They are also higher performance, lower power, and have a malleable form factor.
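The crossover claim can be illustrated with a simple cost model. All prices here are hypothetical placeholders, not real market quotes; the point is only that a disk carries a fixed floor cost (motor, controller, enclosure) that flash does not.

```python
# Hypothetical cost-crossover sketch; every number is an assumed placeholder.
hdd_fixed_cost = 35.0   # assumed floor cost of any disk drive, in dollars
hdd_per_gb = 0.05       # assumed HDD cost per GB
ssd_per_gb = 1.00       # assumed consumer flash cost per GB

def hdd_cost(gb):
    return hdd_fixed_cost + hdd_per_gb * gb

def ssd_cost(gb):
    return ssd_per_gb * gb

# Below this capacity, the SSD is the cheaper drive outright.
crossover_gb = hdd_fixed_cost / (ssd_per_gb - hdd_per_gb)
print(f"SSD cheaper below ~{crossover_gb:.0f} GB")
```

With these made-up numbers the crossover sits in the tens of GB, which is plenty for a thin device whose bulk data lives in the datacenter.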
There has to be a pretty robust high-speed network available almost everywhere for this to work, but that is clearly not that far off. Once the network is in place the service offerings and vendors will coalesce to develop a clear standard and price model. At that point consumer disks will move to the datacenter. This may sound like too much complexity to occur quickly, but the benefits that come from easy access to your data and the profits that will be bestowed on the vendor that becomes the gatekeeper are just too great to prevent it from happening.
If this framework develops, total disk capacity will grow more slowly as the efficiencies developed in the enterprise storage arena are brought to bear: just-in-time provisioning, deduplication, and compression. (Just imagine how much unused capacity is isolated on all of the disks in consumer computers today.) In the not-too-distant future, having a disk in your computing device will be the exception rather than the norm.
An aside on cloud computing frameworks, data, and network bandwidth
The biggest issue with cloud storage is network bandwidth. I don’t mean that network bandwidth needs to be high enough to use cloud storage as remote primary storage – that may never happen. Today, the data in successful cloud services is being fragmented by application. Keeping the data and the compute resources close together gives a big benefit in reducing the network traffic needed for processing. The drawback is that, behind the scenes, each service provider handles data differently, and managing credentials for each separate service is difficult. In effect this creates a data management nightmare for the user. This fragmentation is bad for users – they want an easy way to access and control all of the data that belongs to them.
The efficiency of having the data near the compute resource is huge, but there is no real reason the data has to fragment and move to the service providers. With the proper cloud computing framework, the applications could just as easily move to the data’s location and run in the same datacenter. This would provide the same benefit while making it easy for the user to see and manage the data that belongs to them. I don’t see an easy way to separate cloud storage from cloud computing – but at the end of the day the data is what everything else depends on, and frameworks have to account for this.