Server Hardware Explained (Part 4)

Published on 17 Nov. 2011 / Last Updated on 17 Nov. 2011

In this article, we'll take a look at server memory.

Introduction

When it comes to server memory, there are two main concepts that need to be understood – parity and NUMA. I want to begin by discussing NUMA.

NUMA is an acronym standing for Non-Uniform Memory Access. Even though this might initially sound a bit intimidating, the basic premise of NUMA is actually quite simple.

Over the years I have read countless books and articles stating that memory is the fastest component in any computer. However, that statement hasn’t held true for supercomputers in quite some time and is no longer true for most network servers either. Nowadays, a server’s CPU tends to be quite a bit faster than its memory.

The difference between the CPU speed and the memory speed is something of a problem when it comes to the server’s efficiency. Most CPU operations involve executing instructions against data. Of course the data itself resides in the memory. This means that no matter how fast the CPU is, it can only work as fast as it is able to retrieve the required data from memory. Granted, sometimes data needs to be read from disk or other storage mechanisms, but such data must be committed to memory before it can be acted upon.

My point is that computer manufacturers began to realize that even though processors were becoming faster and faster, much of the speed was wasted because the processor had to spend so much time waiting on the server’s memory.

This problem would be bad enough if a server only had a single processor, but modern servers include multiple CPU cores. Often there are multiple physical processors, each of which contains multiple cores. As such, a single server can have anywhere from four to well over a dozen CPU cores (the largest number of cores that I have seen in a single server is 32). Each core acts as a logical CPU.

Equipping a server with multiple cores and / or processors makes matters worse because a memory location can only be accessed by one processor at a time. This means that not only might a CPU spend time waiting on the memory, it might also have to stand in line behind the other CPUs before it even gets a chance to access the system’s memory.

One of the ways in which server manufacturers have attempted to reduce the amount of “hurry up and wait” that a CPU experiences is by introducing NUMA memory. NUMA memory is separated into individual compartments which are known as NUMA nodes. That way, a separate NUMA node can be allocated to each processor.

In most cases, NUMA nodes are allocated on a per-CPU basis, not a per-core basis. There is usually a one-to-one relationship between physical CPU sockets and NUMA nodes, regardless of how many cores exist within a physical CPU. That isn’t to say that there aren’t exceptions. I know of at least one server that allocates two NUMA nodes to a single 12-core processor.
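
If you want to see how NUMA nodes map to logical CPUs on a particular machine, the following is a minimal Python sketch. It assumes a Linux server, where the kernel normally exposes each node as a directory named node0, node1 and so on under /sys/devices/system/node, each containing a cpulist file. The function names are my own for illustration, and other operating systems expose this information differently.

```python
import os
import re


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list means a node with memory but no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root="/sys/devices/system/node"):
    """Map each NUMA node number to the logical CPUs that belong to it.

    Returns an empty dict when the directory does not exist, for example
    on a non-Linux system (assumption: Linux sysfs layout).
    """
    topology = {}
    if not os.path.isdir(root):
        return topology
    for entry in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip files such as 'online' or 'possible'
        try:
            with open(os.path.join(root, entry, "cpulist")) as handle:
                topology[int(match.group(1))] = parse_cpulist(handle.read())
        except OSError:
            continue
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"NUMA node {node}: {len(cpus)} logical CPUs {cpus}")
```

On a two-socket server you would typically expect two nodes to be listed, each holding the logical CPUs of one socket.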

Dedicating a NUMA node to each of the server’s CPUs helps to improve performance in two ways.

First, the use of separate NUMA nodes helps to prevent multiple CPUs from competing for memory access. Multiple CPUs are able to access memory simultaneously because each CPU is accessing a separate NUMA node.

The other way in which NUMA improves performance is by limiting the amount of memory that is available to a single CPU. If a CPU is using a dedicated NUMA node exclusively then it is dealing with a smaller amount of memory than what the system is equipped with as a whole. This makes memory management more efficient.

Unfortunately, what I just described represents an ideal situation. In the real world, CPUs do not always use NUMA nodes exclusively. For instance, some processes might require more memory than what can be supplied by a single NUMA node, so a processor might have to use multiple NUMA nodes. Likewise, some applications might share data, which can also require a CPU to access multiple NUMA nodes. It isn’t usually a problem for a CPU to access data from a non-local NUMA node, but local requests are handled much more quickly than non-local requests as a result of the way that the underlying bus works.
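
The cost of non-local access can be estimated with a simple weighted average. The sketch below is only a rough model, and the latency figures in it are hypothetical numbers that I chose for illustration rather than measurements from any real server.

```python
def effective_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency when only part of the accesses are node-local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 100.0   # access to the CPU's own NUMA node
REMOTE_NS = 150.0  # access to another CPU's NUMA node

for fraction in (1.0, 0.75, 0.5):
    average = effective_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local accesses: {average:.1f} ns on average")
```

The point of the model is simply that the more a workload spills over into other NUMA nodes, the closer its average memory latency drifts toward the slower, non-local figure.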

Memory Parity

At the beginning of this article I mentioned that the two most important concepts to be familiar with in regard to server memory were NUMA nodes and parity. That being the case, I want to turn my attention to parity.

At its simplest, memory parity is a mechanism for detecting memory errors. If memory errors occur and are not detected then the errors can lead to data corruption or system instability.
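
In its most basic form, parity means storing one extra bit alongside each byte so that the total number of ones is always even (or always odd). The following Python sketch illustrates even parity. The function names are hypothetical, and real memory modules do this in hardware rather than in software.

```python
def even_parity_bit(byte):
    """Return the extra bit that makes the total count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Simulate writing a byte to memory together with its parity bit."""
    return (byte & 0xFF, even_parity_bit(byte))


def check(stored):
    """Return True if the stored byte still matches its parity bit."""
    byte, parity = stored
    return even_parity_bit(byte) == parity


word = store(0b10110010)

# Flip a single bit: the parity no longer matches, so the error is detected.
one_flip = (word[0] ^ 0b00001000, word[1])

# Flip two bits: the parity matches again, so the error goes unnoticed.
two_flips = (word[0] ^ 0b00011000, word[1])

print("intact word passes check:   ", check(word))
print("single-bit error passes:    ", check(one_flip))
print("double-bit error passes:    ", check(two_flips))
```

Notice that a single parity bit can tell you that something went wrong, but not which bit is wrong, and it misses errors that flip an even number of bits.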

Memory parity has been around in one form or another for many years. As such, there are several different flavors of memory parity that are sometimes used on servers. The most commonly used type at the moment is probably Error-Correcting Code memory, or ECC.

As I’m sure you know, data is stored in memory in a binary format. Each individual bit can store a zero or a one. ECC memory is designed to detect single-bit errors. In other words, if a bit’s value was supposed to have been zero, but was recorded as one (or vice versa), then the ECC memory should detect, and typically correct, the problem so long as it does not affect more than a single bit.
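
To show how extra check bits make it possible to pinpoint and repair a flipped bit, here is a small sketch using the classic Hamming(7,4) code, which protects four data bits with three check bits. I am using it purely as an illustration of the principle. It is not meant to represent the code used by any particular memory controller, and the function names are my own.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword.

    Layout by 1-based position: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # Read as a binary number, the three checks give the flipped position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


original = [1, 0, 1, 1]
stored = hamming74_encode(original)

damaged = list(stored)
damaged[4] ^= 1  # simulate a single-bit memory error at position 5

repaired, position = hamming74_decode(damaged)
print("error found at position:", position)
print("data recovered correctly:", repaired == original)
```

This small code can only repair one flipped bit per codeword. If two bits flip, it will "correct" the wrong position, which is why a single-bit limit matters.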

Although it is nice to be able to detect and correct single-bit errors, ECC technologies exist that are more robust. There is an extension to ECC known as Single Device Data Correction (SDDC). As the name suggests, SDDC is capable of detecting and correcting multi-bit faults that are confined to a single memory chip, up to and including the failure of that chip.

SDDC works similarly to a RAID 5 array. Data that is written to memory is scattered across multiple chips, with each chip receiving a little bit of extra data. That way no data is lost in the event of a chip failure because there is enough redundant data scattered across the remaining chips to compensate for the failure.
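
The RAID 5 comparison can be illustrated with simple XOR parity. The sketch below is only an analogy: it assumes that we already know which chip failed, it keeps all of the parity in one place for simplicity, and it does not attempt to reproduce the codes that real memory controllers use for SDDC. The names and sample data are hypothetical.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving_chunks, parity):
    """Recover the one missing chunk from the survivors plus the parity."""
    return xor_parity(list(surviving_chunks) + [parity])


# Hypothetical data striped across four memory chips.
chips = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0", b"\x55\xaa"]
parity = xor_parity(chips)

failed = 2  # pretend the third chip has died
survivors = [chunk for index, chunk in enumerate(chips) if index != failed]

recovered = rebuild_missing(survivors, parity)
print("failed chip rebuilt correctly:", recovered == chips[failed])
```

Because XOR is its own inverse, combining the surviving chunks with the parity leaves exactly the contents of the missing chunk.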

Some servers even take the process one step further by incorporating spare memory sockets. That way, extra memory can be installed into the server and that memory is only used if a failure occurs with the server’s primary memory.

One last thing that I want to quickly mention about ECC memory is that simply installing ECC memory into a server isn’t always enough to protect you against memory errors. Because ECC memory tends to be expensive, a lot of servers are designed to also accept less expensive non-ECC memory. Such servers usually require you to modify a BIOS setting prior to installing ECC memory.

Conclusion

In this article, I have introduced you to two of the more important concepts with regard to server memory. In the next article in this series, I want to turn my attention to storage.
