Most medium-sized and large businesses today rely heavily on their computer networks, and many run mission-critical applications on them. As a result, they need access to many parts of the network at all times. If a piece of equipment fails, mission-critical applications may stop working, and the business user will not care why: they still need their application. Likewise, the equipment has to behave as expected, so a network must also protect against malicious activity to provide high assurance.
I have divided this article into two sections to discuss the two different aspects of developing a high assurance network. The first section deals with security. The second section discusses high availability clusters, a popular method of overcoming hardware failures that also serves as a backup when security features have failed.
A Multiple Independent Levels of Security (MILS) architecture is a powerful technique to secure a system running multiple applications, with multiple accesses. The basic idea of a MILS architecture is that the system is partitioned so that the failure or corruption of one partition cannot affect another. This partitioning also allows each partition to be security-evaluated and certified separately.
A key component of the MILS architecture is the MILS kernel. Traditional kernels are intended to provide applications with as many services as possible, while a MILS kernel is responsible for only four things:
- Data isolation
- Control of information flow
- Periods processing
- Damage limiting
Data isolation means that the kernel will guarantee that an application can only access the memory which has been explicitly allocated for its use. Enforcing this policy greatly improves the security of a system because it addresses two major threats: data compromise and data tampering. Data compromise is the unwanted reading of data, and data tampering is the unwanted writing of data. If an application can only access a limited amount of data, both threats are greatly reduced.
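To make the idea concrete, here is a minimal sketch of data isolation in Python. It is a toy model, not a real MILS kernel: the names `ToyKernel`, `IsolationError`, `allocate`, `read` and `write` are all hypothetical, and memory is simply a `bytearray`. The point it illustrates is that every read and write is checked against the region explicitly allocated to the calling partition.

```python
class IsolationError(Exception):
    """Raised when a partition touches memory outside its allocation."""


class ToyKernel:
    """Toy model (an assumption for illustration): one address range per partition."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.regions = {}  # partition name -> range of permitted addresses

    def allocate(self, partition, start, length):
        if length <= 0 or start < 0 or start + length > len(self.memory):
            raise ValueError("region outside physical memory")
        new = range(start, start + length)
        for owner, region in self.regions.items():
            # Two ranges overlap when each one starts before the other ends.
            if new.start < region.stop and region.start < new.stop:
                raise ValueError(f"region overlaps partition {owner!r}")
        self.regions[partition] = new

    def _check(self, partition, address):
        region = self.regions.get(partition)
        if region is None or address not in region:
            raise IsolationError(f"{partition!r} may not access address {address}")

    def read(self, partition, address):
        # Blocks data compromise: no reading outside the allocation.
        self._check(partition, address)
        return self.memory[address]

    def write(self, partition, address, value):
        # Blocks data tampering: no writing outside the allocation.
        self._check(partition, address)
        self.memory[address] = value
```

With two partitions allocated side by side, a write by one partition inside its own region succeeds, while a read of that same address by the other partition raises `IsolationError`.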
Control information flow
Control of information flow means that the kernel will guarantee that applications can only communicate with each other via approved paths. Proper enforcement of this policy means that an application cannot bypass an approved path to reach data which it is not allowed to access.
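One simple way to picture approved paths is as an allow-list of directed (sender, receiver) pairs. The sketch below assumes exactly that; `FlowPolicy`, `FlowError`, `send` and `receive` are hypothetical names, and a real kernel would enforce the policy far below the level of Python objects.

```python
class FlowError(Exception):
    """Raised when a message is sent over a path that was not approved."""


class FlowPolicy:
    """Toy message router: only explicitly approved directed paths deliver."""

    def __init__(self, approved_paths):
        # Each path is a (sender, receiver) pair; direction matters.
        self.approved = frozenset(approved_paths)
        self.inboxes = {}

    def send(self, sender, receiver, message):
        if (sender, receiver) not in self.approved:
            raise FlowError(f"no approved path from {sender!r} to {receiver!r}")
        self.inboxes.setdefault(receiver, []).append((sender, message))

    def receive(self, receiver):
        inbox = self.inboxes.get(receiver, [])
        return inbox.pop(0) if inbox else None
```

Because the allow-list is directed, approving `("sensor", "filter")` does not let `filter` send anything back to `sensor`, and it does not let `sensor` skip `filter` and talk to a third partition directly.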
Periods processing is a bit of an insurance policy which will eliminate the possibility of covert communication paths being used on a MILS system. It does this by dividing processing into periods and scrubbing the processor between them. As a result, if a processor has been compromised and a covert communication path has been set up, that path will soon be erased and become unusable.
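The scrub-between-periods idea can be sketched as follows. This is only an illustration under simplifying assumptions: `Processor.scratch` is a hypothetical stand-in for registers, caches and other shared processor state, and `run_periods` is an invented scheduler that runs one task per period.

```python
class Processor:
    """Stand-in for shared processor state (registers, caches and so on)."""

    def __init__(self):
        self.scratch = {}

    def scrub(self):
        # Erase everything a task may have left behind.
        self.scratch.clear()


def run_periods(processor, schedule):
    """Run each (name, task) in its own period and scrub between periods."""
    results = []
    for name, task in schedule:
        results.append((name, task(processor)))
        processor.scrub()
    return results


def leaky_task(cpu):
    # A compromised task tries to leave data behind as a covert channel.
    cpu.scratch["covert"] = "secret bits"
    return "done"


def snooping_task(cpu):
    # A later task looks for whatever the earlier one left.
    return cpu.scratch.get("covert")
```

Running `leaky_task` and then `snooping_task` through `run_periods` leaves the second task with nothing to read, because the scratch state was cleared when the first period ended.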
Like periods processing, damage limiting is also an insurance policy. Damage limiting means that all failures are contained locally and recovered locally. Or, quite simply, a failure on one component cannot cause a failure on another. Proper enforcement of this policy will eliminate the possibility of a cascade failure.
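As a rough sketch of local containment and local recovery, the code below runs each partition's work inside its own fault boundary. The `Partition` class and `run_all` function are hypothetical, and "recovery" is reduced to counting a restart; the only point is that an exception in one partition never propagates to the others.

```python
class Partition:
    """Toy partition whose failures are caught and handled locally."""

    def __init__(self, name, step):
        self.name = name
        self.step = step
        self.restarts = 0

    def run_once(self):
        try:
            self.step()
            return "ok"
        except Exception:
            # Contain the fault here and recover locally; no other
            # partition is asked to stop or fail because of it.
            self.restarts += 1
            return "recovered"


def run_all(partitions):
    """Run every partition once and report each one's own outcome."""
    return {p.name: p.run_once() for p in partitions}
```

If one partition's step raises an error, `run_all` still runs every other partition and reports `"ok"` for them, which is the opposite of a cascade failure.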
Another key aspect of the MILS kernel is that it can be written in a small amount of code, only a few thousand lines. Because the code is this small, it is feasible, even for medium-sized businesses, to check it rigorously and to prove mathematically that these policies will always be enforced. Mathematical proof can be very time-consuming and therefore quite expensive, but the small code size keeps the cost reasonable. The proof only has to be done once, and it pays off in improved security for many networks.
Partition Communication System
This method, as I've explained it above, is designed for single processors. When an application must communicate over a network, however, you would still like it to be as secure as if it were running on just one processor. A MILS architecture can support this through end-to-end enforcement of the MILS kernel's policies. The component that enforces them across the network is the Partition Communication System, or PCS.
A distributed collection of MILS nodes, often called an enclave, will use a PCS. The PCS is present on every node and sits logically between the applications and the other partitions. It implements all network protocols and is responsible for the following:
- Identification of nodes: protection against spoofing
- Data separation
- Consistent policy management
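The article does not say how a PCS identifies nodes, so the following is only one plausible sketch of the anti-spoofing idea: each node shares a secret key with the enclave and tags its messages with an HMAC. The pre-shared-key scheme, the key table and the function names `tag_message` and `verify_message` are all assumptions made for illustration.

```python
import hashlib
import hmac


def tag_message(key: bytes, node_id: str, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag binding the payload to the claimed node."""
    data = node_id.encode("utf-8") + b"\x00" + payload
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_message(keys: dict, node_id: str, payload: bytes, tag: bytes) -> bool:
    """Accept a message only if the tag matches the claimed sender's key."""
    key = keys.get(node_id)
    if key is None:
        return False  # unknown node: reject
    expected = tag_message(key, node_id, payload)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)
```

A message tagged with one node's key verifies only when it is presented as coming from that node and with the same payload; claiming a different node identity, or altering the payload, makes verification fail.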
High availability clusters
So, all of this is fine, but we all know that things can, and usually do, go wrong. Even if our network is well protected against malicious activity, hardware failures or application bugs can easily bring down a network node. So how do you make sure that the business users have their applications and services available when things go wrong? This can be done with high availability clusters. Basically, this means that some nodes on the network will be redundant. This improves availability because software can detect failure of one node and automatically start using the back-up node.
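A common way for software to detect a failed node is a heartbeat with a timeout. The sketch below assumes a two-node cluster and passes timestamps in explicitly so the logic is easy to follow; `HeartbeatMonitor`, its method names and the timeout value are hypothetical, and a production cluster manager would do considerably more (fencing, fail-back, quorum).

```python
class HeartbeatMonitor:
    """Toy failure detector: switch to the backup when the primary goes quiet."""

    def __init__(self, primary, backup, timeout, now=0.0):
        self.primary = primary
        self.backup = backup
        self.timeout = timeout
        self.active = primary
        # Treat both nodes as seen at start-up.
        self.last_seen = {primary: now, backup: now}

    def heartbeat(self, node, now):
        """Record that a node reported in at time `now`."""
        self.last_seen[node] = now

    def check(self, now):
        """Return the node that should be serving at time `now`."""
        silent_for = now - self.last_seen[self.primary]
        if self.active == self.primary and silent_for > self.timeout:
            # Primary missed its heartbeats: fail over automatically.
            self.active = self.backup
        return self.active
```

With a timeout of three seconds, a primary that last reported at time 1.0 is still considered active at time 4.0, and the monitor switches to the backup on the first check after that. This sketch deliberately does not fail back when the primary returns.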
Of course, not all applications can be used in a high availability cluster. To be used on a high availability cluster, applications must meet the following design requirements:
- There must be an easy way for a failure detection application to start, stop, and check the status of the application. If software cannot do this, and a human is required, then the automated fail-over cannot work.
- The application must be able to use shared storage.
- The application must save all information about its state on this shared storage.
- The application must not corrupt data when it crashes.
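The four requirements above can be sketched in a small service. Everything here is hypothetical (the `CounterService` class, the JSON state file, the helper names), and the crash-safety technique shown, writing to a temporary file and renaming it over the old one with `os.replace`, is one common approach rather than the only one. Whether the rename is atomic on a given shared file system is an assumption you would need to verify.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state to a temp file, then rename it over the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # A crash before this line leaves the previous state file intact.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path, default):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


class CounterService:
    """Toy application exposing start, stop and status to cluster software."""

    def __init__(self, state_path):
        self.state_path = state_path  # assumed to live on shared storage
        self.running = False
        self.count = 0

    def start(self):
        # Resume from whatever the previous node last saved.
        self.count = load_state(self.state_path, {"count": 0})["count"]
        self.running = True

    def stop(self):
        self.running = False

    def status(self):
        return "running" if self.running else "stopped"

    def handle_request(self):
        if not self.running:
            raise RuntimeError("service is stopped")
        self.count += 1
        save_state(self.state_path, {"count": self.count})
        return self.count
```

If one instance handles two requests and then disappears, a second instance started against the same state file picks up at three, which is what lets a back-up node take over without a human stepping in.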
Let us assume your application can be used in a high availability cluster; what does this cluster look like? Well, there are several implementations of this idea. A simplistic representation of a high availability network is shown in Figure 1. You can see the heartbeat connection between the two nodes and the shared data storage, both key features of any high availability network.
Figure 1: High Availability Network diagram (Courtesy of www.dell.com)
One possible implementation is the active/active configuration. In this configuration, two or more nodes are actively running, with the traffic directed to the nodes based on load balancing scenarios. If one node were to fail, there is still at least one more node that will run with more of the load.
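As an illustration of the active/active idea, the sketch below uses plain round-robin as the load balancing rule, which is an assumption; real deployments often weigh nodes by load or capacity. `ActiveActivePool` and its methods are hypothetical names.

```python
class ActiveActivePool:
    """Toy round-robin balancer that skips nodes marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def mark_failed(self, node):
        self.healthy.discard(node)

    def route(self):
        """Return the node that should receive the next request."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes left")
```

With two nodes, requests alternate between them; once one is marked as failed, every request goes to the survivor, which now carries the whole load.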
In an active/passive configuration there is a fully redundant node for each active node. The redundant nodes are only brought online when their corresponding active node fails. Figure 1 could represent either an active/passive or an active/active high availability configuration.
N + 1
The N+1 configuration has one node responsible for several active nodes. This one node must be capable of assuming any of the roles the active nodes are responsible for.
N + M
The N+M configuration is very similar to the N+1 configuration but is more useful for large networks where one back-up node is not sufficient to provide high availability. In the N+M configuration, there are several (M) back-up nodes, each able to assume the role of any of the active nodes.
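Both N+1 and N+M can be pictured as a set of active roles plus a pool of spare nodes, with N+1 being the case where the pool holds a single spare. The sketch below is a simplified model under that assumption; `StandbyPool` and `fail_over` are hypothetical names.

```python
class StandbyPool:
    """Toy N+M model: M spare nodes can each assume any active role."""

    def __init__(self, roles, spares):
        self.roles = dict(roles)    # role -> node currently serving it
        self.spares = list(spares)  # idle back-up nodes

    def fail_over(self, failed_node):
        """Hand the failed node's role to a spare and return that spare."""
        for role, owner in self.roles.items():
            if owner == failed_node:
                if not self.spares:
                    raise RuntimeError("no back-up node left")
                replacement = self.spares.pop(0)
                self.roles[role] = replacement
                return replacement
        raise KeyError(f"{failed_node!r} is not serving any role")
```

With three active nodes and two spares, the cluster survives two failures; a third failure raises an error because the pool is empty, which is exactly the limit that makes a single spare insufficient for large networks.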
N to N
The N to N configuration is a combination of active/active and N + M configurations. In this configuration, each node is active but is capable of re-configuring itself to take on any of the responsibilities of the other nodes. This configuration will eliminate the need for back-up nodes but will require additional space on each active node.
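One way to picture N to N is as a redistribution step: when a node fails, its responsibilities are handed to the surviving active nodes. The sketch below assumes a least-loaded-first rule with ties broken by node name; that rule and the `redistribute` function are illustrative choices, not something the configuration itself prescribes.

```python
def redistribute(assignments, failed):
    """Return new assignments with the failed node's roles spread over survivors.

    `assignments` maps each node name to the list of roles it serves.
    """
    survivors = {
        node: list(roles) for node, roles in assignments.items() if node != failed
    }
    if not survivors:
        raise RuntimeError("no surviving nodes")
    for role in assignments.get(failed, []):
        # Pick the survivor with the fewest roles; break ties by name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(role)
    return survivors
```

For example, if node `c` serves `mail` and `dns` while `a` and `b` serve one role each, losing `c` moves `mail` to `a` and `dns` to `b`. Each survivor ends up with more to run, which is why this configuration needs extra room on every active node.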
Whichever configuration is used, a well-engineered high availability cluster, along with strong protection against malicious activity, can give your business users a network with consistent availability. For more information about how to protect your network against malicious activity, do not forget to check out WindowSecurity.com.