Your guide to storage (Part 1)

by [Published on 5 Sept. 2013 / Last Updated on 7 Oct. 2013]

In this series, you will learn about storage concepts that are important to understand regardless of the kinds of applications being supported.

If you would like to read the next part of this article series please go to Your guide to storage (Part 2).

Introduction

Regardless of the kind of system you’re managing, storage is a foundational element of the overall environment and requires special consideration to get right. Storage-related issues and a lack of storage knowledge are often the culprits when an organization has a problem with a mission-critical system. As such, having a reasonable understanding of storage – its concepts and how it operates – is important, particularly as systems become more centralized. Today, the walls between various IT resource domains – servers, networking, and storage – are blurring or coming down completely, requiring that all infrastructure staff

In this series, you will learn about storage concepts that are important to understand regardless of the kinds of applications being supported. Some concepts are older, but still relevant, while others are newer and increasingly important.

Input/Output Operations Per Second (IOPS)

One of the most widely used storage performance metrics is Input/Output Operations per Second (IOPS). In short, the figure indicates how many read and write operations the underlying storage can perform each second.

Beware IOPS comparisons

To present their products in a favorable light, some storage vendors publish the number of IOPS that can be expected from a device. It would seem that simply comparing IOPS figures between vendors and products would be the perfect way to determine which system is fastest. Unfortunately, it’s not quite that simple. Many factors go into determining the number of IOPS that a system is capable of supporting. Among the factors:

  • The block size of the underlying storage, although this factor is becoming a bit convoluted as storage vendors virtualize the underlying storage, which can present a different block size to the operating system than is used on the physical storage.
  • The kind of drive in use. A traditional hard drive provides fewer IOPS than a modern solid state disk. The difference is massive… as in orders of magnitude. In general, you’ll get anywhere from about 70 to about 200 IOPS per hard disk. With solid state disks, you will see thousands or even tens of thousands of IOPS per disk.
  • Disk latency. The more time it takes to complete a storage operation, the fewer IOPS that can be supported by that disk.
  • Seek time. This is the amount of time that it takes for a disk to locate the data it needs to read or to arrive at the disk location to which data should be written. The longer this process takes, the lower the IOPS value for the disk.
  • Disk rotational speed. Hard disks used in the enterprise generally run at 7200 RPM, 10,000 RPM, or 15,000 RPM. The faster the disk spins, the more I/O workload it can support. As such, disks that spin faster generally provide more IOPS.
  • Read vs. write load. Disks perform differently when they’re reading data as opposed to when they’re writing it. The total possible IOPS from a disk is highly dependent on how the disk is being used.
  • RAID level. For those organizations using RAID, it should be understood that different RAID levels introduce different performance penalties as far as IOPS is concerned. For example, a RAID 6 set of disks imposes a 6x penalty on writes. What this means is that every time a write operation takes place, behind the scenes, it actually takes six disk operations to satisfy the request.

If you do find it necessary to compare IOPS, see if you can get information about the conditions under which particular tests were performed.
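To see how much the RAID level and the read/write mix alone can move a quoted figure, here is a minimal sketch. It assumes the commonly cited write penalties for each RAID level and a simple model in which reads cost one disk operation and writes cost the penalty; the function name, the disk count and the per-disk IOPS value are hypothetical and chosen only for illustration.

```python
# Commonly cited RAID write penalties: disk operations needed per host write.
# These are rule-of-thumb values, not vendor-specific measurements.
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def functional_iops(raw_iops_per_disk, disk_count, read_fraction, raid_level):
    """Estimate the host-visible IOPS of a RAID set (hypothetical helper).

    Model: each read costs 1 disk operation and each write costs
    `penalty` disk operations, so
        raw_total = host_iops * (read_fraction + write_fraction * penalty)
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[raid_level]
    raw_total = raw_iops_per_disk * disk_count
    write_fraction = 1.0 - read_fraction
    return raw_total / (read_fraction + write_fraction * penalty)


# Illustrative numbers only: 8 disks at 150 raw IOPS each, 70% reads.
for level in ("RAID 0", "RAID 10", "RAID 5", "RAID 6"):
    print(level, round(functional_iops(150, 8, 0.7, level)))
```

With the same disks and the same workload, the estimate for RAID 6 comes out at well under half of the RAID 0 figure, which is why the test conditions matter when comparing published numbers.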

IOPS calculations

To get a very rough raw IOPS calculation for a single disk, you need to know three values for the disk: the average latency and the average seek times for both reads and writes. Average the two seek times, convert the values to seconds, and use the following formula to determine the average IOPS value for the disk:

1 / (average latency in seconds + average seek time in seconds)

Let’s use this disk from Seagate as an example. Specifically, note that the 600GB disk carries the following specifications:

  • Rotational speed: 15,000 RPM
  • Average latency: 2 ms (0.002 s)
  • Average seek time (read): 3.4 ms (0.0034 s)
  • Average seek time (write): 3.9 ms (0.0039 s)
  • Average seek time: 3.65 ms (0.00365 s)

Formula: 1 / (0.002 + 0.00365) = 177 IOPS

So, in perfect conditions, this disk would provide about 177 raw IOPS. Bear in mind, though, that this is a very rough figure that ignores a number of other conditions, including the aforementioned RAID penalty, different block sizes and more. Still, it’s useful for comparing raw performance between individual disks.
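The same calculation can be sketched in a few lines of Python. The function names are hypothetical. The rotational latency helper is an addition that relies on the standard relationship that average rotational latency is the time for half a revolution, which agrees with the 2 ms quoted for the 15,000 RPM disk.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency: time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def raw_iops(avg_latency_ms, seek_read_ms, seek_write_ms):
    """Rough raw IOPS for one disk: 1 / (latency + average seek), in seconds."""
    avg_seek_ms = (seek_read_ms + seek_write_ms) / 2.0
    service_time_s = (avg_latency_ms + avg_seek_ms) / 1000.0
    return 1.0 / service_time_s


# Values from the example disk above.
print(average_rotational_latency_ms(15_000))   # 2.0
print(round(raw_iops(2.0, 3.4, 3.9)))          # 177
```

Feeding in the specifications of a 7200 RPM or 10,000 RPM disk gives a quick way to compare raw per-disk figures before RAID and block size effects are considered.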

Enterprise storage features

Home storage and enterprise storage are very different animals. As one may expect, the differences between the two kinds of storage are also reflected in the features that are supported by each. Whereas home-based storage typically focuses on single drives, enterprise-class storage focuses on full arrays or, at least, multiple disks pooled together for various purposes.

Deduplication

Deduplication is a storage feature that used to be a bit more difficult to obtain than it is today. In the “old days” the feature was limited to high end providers and was an additional cost option. In addition, because some array processors were booked solid with ongoing storage work, deduplication couldn’t even be run since there was no processing power to spare.

Today, that’s all changed for the better. Modern processors have imbued storage processors with cycles to spare and storage vendors have wasted no time beefing up their deduplication technologies to leverage this new processing opportunity. In addition, storage startups have eschewed the add-on nature of deduplication and many now include it as a part of the base storage array.

There are two types of deduplication technology available:

  • Inline deduplication. As data makes its way through the array, the array compares each and every block of transmitted data with what’s already stored on the array. If the block matches something already stored, the array discards the newly transmitted block and, in its place, writes a pointer to the block that already exists on the system. This storage method allows organizations to get a lot more out of their storage, particularly when there is a lot of commonality in the data. Consider VDI, for example; in VDI environments, deduplication can have major benefits since most of the workloads are practically identical.
  • Post-process deduplication. Under this deduplication technique, data is written to disk as normal, even if it’s a duplicate of something already on the array. At determined intervals, the storage system scans all newly written blocks to look for blocks that match blocks of data already existing on the array, and replaces any duplicates it finds with pointers. Windows Server 2012 uses a post-process deduplication technique.

Inline processing is generally preferred whenever it can be used.
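The inline approach described above can be illustrated with a minimal in-memory sketch. This is not how any particular array implements it: the class name, the fixed 4 KB block size and the use of SHA-256 fingerprints to detect matching blocks are all assumptions made for the example, and a real system would also have to deal with persistence, reference counting and fingerprint collisions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Toy inline-deduplicating block store (hypothetical, in-memory)."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block data, each unique block stored once
        self.volume = []   # logical layout: one fingerprint ("pointer") per block

    def write_inline(self, data):
        """Split data into blocks; store only blocks not already present."""
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block   # new block: keep the data
            self.volume.append(fingerprint)        # always record the pointer

    def read(self):
        """Reassemble the logical data by following the pointers."""
        return b"".join(self.blocks[fp] for fp in self.volume)

    def logical_bytes(self):
        return sum(len(self.blocks[fp]) for fp in self.volume)

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
store.write_inline(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)
print(len(store.blocks))                               # 2
print(store.logical_bytes(), store.physical_bytes())   # 16384 8192
```

A post-process variant would run the same fingerprint comparison later, as a scheduled pass over blocks that have already been written in full.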

Storage tiering

Storage tiering is a common method by which data is stored on different kinds of drives to meet performance targets. In a tiering scenario – which can be manual or automated – storage buyers might buy, for example, a shelf of large capacity, low performance SATA disks for file storage purposes and to meet other needs in which these characteristics apply. In addition, said buyer may buy a shelf each of SAS disks and SSDs to provide a higher performance tier and a blazing performance tier, respectively.

As mentioned, storage tiering can be a manual process, but handling this manually requires a lot of overhead. As such, in enterprise arrays, storage tiering is often automated via a system that tracks hot and cold data and moves it to appropriate storage based on pre-defined rules.
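A rough sketch of what such an automated rule might look like follows. The thresholds, tier names and function names are hypothetical; actual arrays use their own tracking windows and policies, which this example does not attempt to reproduce.

```python
from collections import Counter

# Hypothetical pre-defined rules: minimum accesses within the tracking
# window needed to qualify for each tier, listed fastest tier first.
TIER_RULES = [("SSD", 100), ("SAS", 10), ("SATA", 0)]


def choose_tier(access_count, rules=TIER_RULES):
    """Return the fastest tier whose access threshold the block meets."""
    for tier, min_accesses in rules:
        if access_count >= min_accesses:
            return tier
    return rules[-1][0]  # fall back to the slowest tier


def plan_placement(access_log, known_blocks):
    """Map each block ID to a target tier based on how often it was accessed."""
    counts = Counter(access_log)  # blocks missing from the log count as 0
    return {block: choose_tier(counts[block]) for block in known_blocks}


# Illustrative access log: block "a" is hot, "b" is warm, "c" and "d" are cold.
log = ["a"] * 150 + ["b"] * 20 + ["c"]
print(plan_placement(log, ["a", "b", "c", "d"]))
```

Running the plan at regular intervals and moving only the blocks whose target tier has changed is one simple way to keep hot data on the fastest disks.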

Summary

This concludes part 1. In part 2, we’ll discuss RAID and storage performance metrics, among other topics.

RAID

Latency

Latency is the time it takes for an entire storage operation to take place. High latency values – above 20 to 30 ms – can create problems for workloads running on the storage. There are many opportunities for latency to be introduced into the storage equation, such as:

  • The time it takes for the hypervisor to process a storage command
  • The time it takes to transfer the command over the storage link
  • The time it takes for the disk to rotate to the required location
  • The time it takes for the system to read or write the data from or to the disk
  • The time it takes to transfer the data over the storage link back to the host
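Because these delays add up, it can help to total them and see which one dominates. The sketch below does that using made-up component values that are purely illustrative, not measurements; the function names and the choice of 20 ms as the warning threshold (the low end of the range mentioned above) are assumptions.

```python
LATENCY_WARNING_MS = 20.0  # low end of the 20 to 30 ms range mentioned above


def total_latency_ms(components):
    """Sum the individual delays that make up one storage operation."""
    return sum(components.values())


def largest_contributor(components):
    """Return the name of the component that adds the most latency."""
    return max(components, key=components.get)


# Made-up values, in milliseconds, for illustration only.
example = {
    "hypervisor processing": 0.5,
    "command transfer": 0.25,
    "disk rotation": 2.0,
    "read or write": 3.5,
    "data transfer back": 0.25,
}

total = total_latency_ms(example)
print(total, largest_contributor(example))
print("problematic" if total > LATENCY_WARNING_MS else "acceptable")
```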
