If you follow storage trends at all, you will know that vendors are yelling “All Flash Storage” and promising millions of IOPS (input/output operations per second) and fantastic data throughput. As always, there are at least two sides to every story, and I would like to give you some information to think about before you buy the shiny new boxes.
First, let me define “All Flash Storage” in simple terms. It is just like hard disk storage, but instead of hard drives, you install SSDs. It’s that simple…just a box, simple or sophisticated, that is full of SSDs.
Second, let me tell you that these vendors are essentially telling the truth. If you want the fastest storage, then purchase one of their All Flash Storage solutions right now. And now you might think this blog ends. Well, it does not, because most of us have budget limitations, and all-flash storage may not give you the significant performance gains you expect.
The Case for SSDs
I want you to step back from thinking about SSDs for a moment. Think about your personal files directory: both the disk space used and how many of those files you have touched in the last month. My personal directory on our SAN is 96.2 GB. I have files going back to the year 2000 (!), and I probably opened twenty of them in the last month…a few megabytes of data. Let me be generous and say I opened 200 MB in the last month. That means I did not touch 96 GB, or 99.8% of my data!
Looking at this issue of dormant data from a different angle, I compared the amount of changed data we back up each night with the disk space allocated to the same servers, and found that approximately two percent of our allocated disk space is modified on a daily basis.
Each of us will come up with different values for our own environments. However, I think it is a fair statement to say that most of our data is essentially dormant, with only a fairly small percentage accessed routinely and, perhaps more importantly, frequently.
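If you want to put a number on your own environment, a short script can do the "how much have I touched in the last month" exercise for you. The following is only a rough sketch in Python: the function name `dormant_share` and the 30-day window are my own choices for illustration, and it relies on file last-access times, which some systems do not update reliably (for example, volumes mounted with `noatime`), so treat the result as an estimate.

```python
import os
import time

def dormant_share(root, days=30, now=None):
    """Return (total_bytes, dormant_bytes) for all files under root.

    A file counts as dormant when its last-access time is older than
    `days` days. Assumes the filesystem records access times.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    total = dormant = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanished or cannot be read
            total += st.st_size
            if st.st_atime < cutoff:
                dormant += st.st_size
    return total, dormant
```

Calling `dormant_share(os.path.expanduser("~"))` returns the two byte counts for your home directory; dividing the second by the first gives your own version of the 99.8% figure above.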
If you determine that a large percentage of your data is dormant, is it money well spent to put that data on costly SSDs?
SAN & Processing Power
There is another issue, which I want to just touch on here. The controllers in a SAN have a limited amount of processing power, and the interconnects between the SAN and your servers have a limited amount of bandwidth.
Having a SAN full of SSDs is a bit like having a 1000 HP car. There are cars that can make use of it, and they cost a million dollars plus. There are also SANs that cost a million dollars plus, and they can make better use of a lot of SSDs. However, if your budget for a SAN is more modest, the controllers and interfaces will severely limit the maximum throughput achievable. The saying about the “weakest link in a chain” is appropriate.
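The weakest-link idea can be written down in a few lines. This is a deliberately simplified sketch: the function name and every number in it are hypothetical, chosen only to show the shape of the problem, and it models nothing but the sequential throughput ceiling. It ignores latency and random I/O, where SSDs keep a real advantage.

```python
def effective_throughput_mb_s(drive_count, per_drive_mb_s, controller_mb_s, link_mb_s):
    """Throughput ceiling of a simplified SAN: the slowest stage wins."""
    drives_total = drive_count * per_drive_mb_s
    return min(drives_total, controller_mb_s, link_mb_s)

# Hypothetical figures, not taken from any real product.
ssd_box = effective_throughput_mb_s(24, 500, controller_mb_s=3000, link_mb_s=2000)
hdd_box = effective_throughput_mb_s(24, 150, controller_mb_s=3000, link_mb_s=2000)
print(ssd_box, hdd_box)  # 2000 2000
```

With these made-up numbers, both boxes top out at the 2000 MB/s the interconnect can carry, no matter how fast the drives behind it are.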
The Alternative Solution
Fortunately, there are some great hybrid solutions which recognize the limitations of SANs and can provide greatly improved performance for a modest cost increase. These hybrid SANs typically contain a few SSDs, a large number of high-performance SAS drives, and sometimes a number of low-performance SAS/SATA drives.
Each vendor implements this differently, but the basic concept is that the SSDs act as a caching layer, the high-performance SAS drives hold your less frequently accessed data, and the low-performance drives hold your rarely accessed data. Often only the SSDs and high-performance drives are used in practice.
Does this mean you need to determine your data needs, and move your files around on these disks?
If you are looking at a SAN which requires you to do this…don’t buy it. Enterprise-grade SANs have embedded software (“Tiering software”) that does the work for you. All of your files are initially placed on your high-performance SAS drives, providing the performance you expect.
When data is accessed frequently, it is copied up to the SSD cache layer to provide a very fast response to the next read request. Data that is not accessed is migrated to the low-performance SAS/SATA drives, if this class of drives is present.
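To make the idea concrete, here is a toy model of that behaviour in Python. It is not how any particular vendor implements tiering: the class name `HybridTierModel`, the "promote after two reads" rule, and the least-recently-used eviction are all assumptions I picked to keep the sketch small, and the optional low-performance tier is left out entirely.

```python
from collections import OrderedDict

class HybridTierModel:
    """Toy model: SAS is the home tier, SSD is a small LRU read cache."""

    def __init__(self, ssd_blocks, promote_after=2):
        self.ssd = OrderedDict()          # cached block ids, ordered by recency
        self.ssd_blocks = ssd_blocks      # how many blocks fit on the SSD layer
        self.promote_after = promote_after
        self.hits = {}                    # block id -> reads served from SAS

    def read(self, block):
        """Return the tier that served this read: 'ssd' or 'sas'."""
        if block in self.ssd:
            self.ssd.move_to_end(block)   # mark as most recently used
            return "ssd"
        self.hits[block] = self.hits.get(block, 0) + 1
        if self.hits[block] >= self.promote_after:
            self._promote(block)          # the next read will come from SSD
        return "sas"

    def _promote(self, block):
        if self.ssd_blocks <= 0:
            return                        # no SSD layer configured
        if len(self.ssd) >= self.ssd_blocks:
            self.ssd.popitem(last=False)  # evict the least recently used copy
        self.ssd[block] = True
        self.hits.pop(block, None)
```

Reading the same block three times through `HybridTierModel(ssd_blocks=2)` returns `'sas'`, `'sas'`, then `'ssd'`: the first two reads come from the SAS drives, and the third is served from the cache copy.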
The Showdown
I really like to see apples-to-apples comparisons by digging into the details of the various vendors’ solutions. Although most vendors now offer Tiering solutions with their SANs, they are not all implemented the same way. You would expect that as soon as data is frequently accessed, it would migrate to the SSD cache layer and give the improved performance you expect. Unfortunately, that would be an incorrect assumption.
Some SANs, like the models we recommend, will start migrating frequently accessed data immediately…as in RIGHT NOW, and performance will ramp up over seconds and minutes. Many other SAN products will collect statistics about the data you access frequently, and then at night, when the SAN is not busy, they will copy it to the SSD cache layer. This is like reading yesterday’s news.
I think these are terrible implementations, and reflect that they are using underpowered controllers. When a vendor says they offer a Tiered solution, ask them to explain exactly how, and when, they migrate data to the cache layer.
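A small simulation shows why the timing matters. Again, this is a simplified sketch under my own assumptions: the function name `ssd_hit_rate` and the workload are hypothetical, "immediate" is modelled as caching a block right after its first read, and the cache is treated as unlimited so that only the promotion timing differs.

```python
def ssd_hit_rate(reads_by_day, immediate):
    """Fraction of reads served from SSD, given one list of reads per day.

    immediate=True  : a block is cached right after its first read.
    immediate=False : blocks read today are cached only by the nightly job.
    """
    cached = set()
    hits = total = 0
    for day in reads_by_day:
        seen_today = set()
        for block in day:
            total += 1
            if block in cached:
                hits += 1
            elif immediate:
                cached.add(block)
            else:
                seen_today.add(block)
        cached |= seen_today  # nightly batch promotion (empty when immediate)
    return hits / total if total else 0.0

# Hypothetical workload: one report read 5 times on day one, another on day two.
workload = [["report_mon"] * 5, ["report_tue"] * 5]
print(ssd_hit_rate(workload, immediate=True))   # 0.8
print(ssd_hit_rate(workload, immediate=False))  # 0.0
```

In this made-up workload the nightly approach never serves a single read from SSD, because by the time Monday's report is cached, everyone has moved on to Tuesday's.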
On That Note…
I am not a SAN genius, but the other technical staff at Lanworks and I have quite a bit of hands-on, real-life experience with SANs in the SMB (Small to Medium Business) space. If you found this blog interesting, boring, or misleading, please feel free to leave a comment or contact us for further information. We would love to have the chance to talk about this topic further.