Appendix C

RAID

Redundant Arrays of Inexpensive Disks (RAID) is a storage technology that uses multiple disk drives, special disk controllers, and software to increase both the safety of your data and the performance of your disk subsystem.

RAID protects your data by spreading it among multiple disk drives, and then calculating and storing parity information. This redundancy allows any drive to fail without causing the array itself to lose data. When a failed drive is replaced, its contents can be reconstructed from the information on the remaining drives in the array.
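The reconstruction described above can be illustrated with byte-wise XOR parity, the mechanism parity-oriented RAID levels use. This is a minimal sketch with made-up byte strings standing in for drives, not any vendor's actual implementation:

```python
# Sketch: XOR parity lets an array rebuild a failed drive from the survivors.
# The three "drives" here are hypothetical byte strings, not real devices.
data_drives = [b"\x10\x22\x33", b"\x04\x05\x06", b"\xa0\xb0\xc0"]

# Parity is the byte-wise XOR across all data drives.
parity = bytes(a ^ b ^ c for a, b, c in zip(*data_drives))

# Simulate losing drive 1, then rebuild it from the other drives plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = bytes(a ^ b ^ c for a, b, c in zip(*survivors))
assert rebuilt == data_drives[1]
```

Because XOR is its own inverse, the same operation that generates the parity also regenerates any single missing member.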

RAID increases disk subsystem performance by distributing read tasks among several drives, allowing the same data to be retrieved from whichever drive can service the request soonest.

Different levels of RAID exist, each of which is optimized for certain types of data and storage requirements. RAID can be implemented in hardware or as add-on software. Modern NOSs like Novell NetWare and Microsoft Windows NT Server provide native support for one or more RAID levels.

The various component parts of RAID technology were developed originally for mainframes and minicomputers, and were until recently limited by high cost to those environments. In the past few years, though, RAID has become widely available in the PC LAN environment. The cost of disk drives has plummeted. RAID controllers have become, if not mass market items, at least reasonably priced. Cost-based objections to RAID have just about disappeared. Your server deserves to have RAID, and you shouldn't consider building a server that doesn't.

In this appendix, you learn about the various RAID levels, what each level is optimized for, and how to choose the right one for your server.

RAID Levels

Although various component parts of RAID have been used in the mainframe and minicomputer arenas for years, the RAID model was first formally defined in a white paper published in 1987 by the University of California at Berkeley. This paper set the theoretical framework upon which subsequent RAID implementations have been built. It defined five levels of RAID, numbered 1 through 5. These RAID levels are not indicative of the degree of data safety or increased performance; they simply define how the data is divided and stored on the disk drives comprising the array, and how and where parity information is calculated and stored. A higher level number is not necessarily better.

Disk drives really do only two things: they write data and they read it. Depending upon the application, the disk subsystem might be called upon to do frequent small reads and writes, or it might need to do less frequent but longer reads and writes. An application server running a client/server database, for example, tends toward frequent small reads and writes, while a server providing access to stored images tends toward infrequent large reads and writes. The various RAID levels differ in their optimization for small reads, large reads, small writes, and large writes. Although most servers have a mix of these, choosing the RAID level optimized for the predominant computing tasks in your environment should maximize the performance of your disk subsystem.

The various RAID levels are optimized for varying data storage requirements, in terms of redundancy levels and performance issues. Different RAID levels store data bit-wise, byte-wise, or sector-wise over the array of disks. Similarly, parity information may be distributed across the array, or may be contained on just one physical disk drive. RAID levels 1 and 5 are very common in PC LAN environments; all hardware and software RAID implementations provide at least these two levels. RAID level 3 is used occasionally in specialized applications, and is supported by most hardware (and some software) RAID implementations. RAID levels 2 and 4 are seldom, if ever, used in PC LAN environments, but some hardware RAID implementations offer these levels.

Although RAID actually has only levels 1 through 5 defined, you commonly see references to RAID 0, RAID 0/1, RAID 6, RAID 7, and RAID 10, all of which are de facto extensions of the original RAID specification. These usages have become so common that they now are universally accepted. Because RAID is a model or theoretical framework, rather than a defined protocol or implementation, manufacturers continue to market improved RAID technology with arbitrarily assigned RAID levels.

The RAID Advisory Board (RAB)

The RAID Advisory Board (RAB) is a consortium of manufacturers of RAID equipment, as well as some other interested parties. RAB is responsible for developing and maintaining RAID standards, and has formal programs covering education, standardization and certification. Supporting these programs are six committees, including Functional Test, Performance Test, RAID-Ready Drive, Host Interface, RAID Enclosure, and Education. RAB sells several documents, the most popular of which is RAIDbook, first published in 1993. This publication covers the fundamentals of RAID and defines each RAID level; it is a worthwhile acquisition for those who want to learn more about RAID.

The RAB Certification Program awards logos to equipment that passes its compatibility and performance testing suites. The RAB Conformance Logo certifies that the component bearing the logo complies with the named RAID level designation, as published by RAB. The RAB Gold Certificate Logo certifies that a product meets both the functional and performance specifications published by RAB.

For further information about RAB and its programs, you can contact Joe Molina, RAB Chairman, in one of the following ways:

RAID Advisory Board
Technology Forums LTD
13 Marie Lane
St. Peter, MN 56082-9423
Phone: (507) 931-0967
Fax: (507) 931-0976
E-Mail: 0004706032@mcimail.com
Web: http://www.andataco.com/rab/

RAID 0

RAID 0, illustrated in figure C.1, is a high-performance, zero-redundancy array option. RAID 0 is not technically RAID at all. It stripes blocks of data across multiple disk drives to increase the throughput of the disk subsystem, but offers no redundancy. If one drive fails in a RAID 0 array, the data on all drives in the array is inaccessible. RAID 0 is a sports car with a powerful engine, but bald tires and no brakes; some would say there's no steering wheel, either.

Figure C.1 RAID 0 uses sector striping to increase performance.

Nevertheless, there is a place for RAID 0. Understanding RAID 0 is important because the same mechanism used in RAID 0 is used to increase performance in other RAID levels. RAID 0 is inexpensive to implement, for two reasons. First, no disk space is used to store parity information, eliminating the need to buy larger (or more) disk drives for a given amount of storage. Second, the algorithms used by RAID 0 are simple ones that do not add much overhead or require a dedicated processor. RAID 0 offers high performance on reads and writes of both short and long data elements. If your application requires large amounts of fast disk storage, and if you have made other provisions for backing up this data to your satisfaction, then RAID 0 is worth considering.

RAID 0 uses striping to store data. This means that data blocks are written in turn to the various different physical disk drives that make up the logical volume represented by the array. For instance, your RAID 0 array might comprise three physical disk drives, which are visible to the operating system as one logical volume. Let's say that your block size is 8K and that a 32K file is to be written to disk. With RAID 0, the first 8K block might be written to physical drive 1, the second block to drive 2, the third to drive 3, and the fourth 8K block to drive 1. Your single 32K file is thus stored as four separate blocks residing on three separate physical hard disk drives.

This introduces two parameters used to quantify a RAID 0 array. The size of the block used (8K in our example) is referred to as the chunk size. The chunk size determines how much data is written to a disk drive in each operation. The number of physical hard disk drives comprising the array determines the stripe width. Both chunk size and stripe width impact the performance of a RAID 0 array.
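Given those two parameters, the mapping from a logical offset to a physical drive and offset can be sketched as follows. The 8K chunk size and three-drive stripe width follow the example above; the function name is ours, not part of any real RAID implementation:

```python
# Sketch: map a logical byte offset onto a RAID 0 array.
CHUNK_SIZE = 8 * 1024    # 8K chunks, as in the example above
STRIPE_WIDTH = 3         # three physical drives

def locate(logical_offset):
    chunk = logical_offset // CHUNK_SIZE       # which chunk the byte falls in
    drive = chunk % STRIPE_WIDTH               # chunks round-robin across drives
    offset = (chunk // STRIPE_WIDTH) * CHUNK_SIZE + logical_offset % CHUNK_SIZE
    return drive, offset

# The four 8K blocks of a 32K file land on drives 0, 1, 2, and 0 again.
blocks = [locate(i * CHUNK_SIZE)[0] for i in range(4)]
```

The modulo gives the round-robin behavior described in the text: the fourth block wraps back to the first drive, at the next chunk position on that drive.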

When a logical read request is made to the RAID 0 array, and fulfillment requires an amount of data larger than the chunk size to be retrieved, this request is broken down into multiple smaller physical read requests, each of which is directed to and serviced by the individual physical drives upon which the multiple blocks are stored. Although these multiple read requests are generated serially, doing so takes very little time. The bulk of the time needed to fulfill the read request is used to transfer the data itself. With sequential reads, which involve less drive head seeking, bottlenecks can occur due to the internal transfer rate of the drives themselves. With striping, the transfer activity occurs in parallel on the individual disk drives that make up the array, so the elapsed time until the read request is completely fulfilled is greatly reduced.

Striping does not come without some cost in processing overhead, and this is where chunk size impacts performance. Against the benefit of having multiple spindles at work to service a single logical read request, you must weigh the overhead processing cost required to write and then read this data from many disks rather than from just one. Each SCSI disk access requires numerous SCSI commands to be generated and then executed, and striping the data across several physical drives multiplies the effort required. Reducing the chunk size too far can cause the performance benefits of using multiple spindles to be swamped by the increased time needed to generate and execute additional SCSI commands. You can, by using too small a chunk size, actually decrease performance. The break-even point is determined by your SCSI host adapter and the characteristics of the SCSI hard disk drives themselves; generally speaking, a chunk size smaller than 8K risks performance degradation, while chunk sizes of 16K, 32K, or larger offer correspondingly greater performance benefits.

Sequential reads and writes make up a small percentage of total disk activity on a typical server disk subsystem. Most disk accesses are random, and by definition this means that the heads probably need to move to retrieve a particular block of data. Since head positioning is a physical process, it is, relatively speaking, very slow. The benefit of striping in allowing parallel data transfer from multiple spindles is much less significant in random-access situations, since everything is awaiting relatively slow head positioning to occur. Therefore, striping does little to benefit any particular random-access disk transaction. Strangely, however, it does benefit random-access disk throughput as a whole.

Here's why: Imagine a scene at your local hardware store. There is only one checkout line, and the owner is considering opening more. The single existing checkout line works well when the store is not busy, but at peak times customers have to stand in line too long. Some customers pay cash, and others use credit cards. The owner opens four additional checkout lines, but he decides to dedicate particular lines to particular items: one line for garden supplies, one for paint, one for tools, and so on. He notices that although this scheme does reduce the average wait, there are times when one checkout line has people backed up while other lines are free. His next step is to allow any of the five lines to process any type of item. He immediately notices a big drop in average wait time, and is satisfied with this arrangement until he notices that the backup hasn't completely disappeared. Because some individual transactions take longer than others, any given line can unpredictably move more slowly than others, leaving customers standing in line while other checkout lines are free. His final modification is to install a serpentine queue prior to the checkout lines, thereby allowing each customer in turn to use whichever checkout line becomes free first.

In this example, the checkout lines are analogous to the physical hard drives in the array, and the customers represent disk transactions: cash payments are like disk reads, and credit card payments are like disk writes. Just as a checkout clerk can ring up only so many items in a given amount of time, even a very fast hard drive is limited in the number of disk transactions per second it can execute. Just as many people can suddenly show up at a checkout line almost simultaneously, a server can generate many more disk requests in a short period of time than the disk can process. Because server requests tend to be bursty (many requests occurring nearly simultaneously, followed by a period with few or no requests), the disk subsystem must buffer or queue outstanding requests at times of peak demand, and then process these requests as demand slackens.

Because striping distributes the single logical volume's data across several physical drives, each of which can process disk transactions independently of the other drives, it provides the equivalent of additional checkout lines dedicated to particular products. Requests are routed to the physical drive that contains the data needed, thereby dividing a single long queue into two or more shorter queues, depending on the number of drives in the array (stripe width). Because each drive has its own spindle and head mechanism, these requests are processed in parallel, shortening the average time required to fulfill a disk request.

In most servers with a disk subsystem bottleneck, the problem is an unequal distribution of workload among the physical disk drives. It is not uncommon to see servers with several physical drives in which 90 percent or more of the total disk activity is confined to just one of the drives. RAID 0 addresses this problem through striping by distributing the workload evenly and eliminating any single drive as a bottleneck. RAID 0 improves both read and write performance for random, small-block I/O as well as sequential, large-block I/O.

What RAID 0 doesn't do is protect your data. There is no redundancy, and the loss of any single drive in a RAID 0 array renders the contents of the remaining partitions that make up the array inaccessible for all practical purposes. Because Windows NT Server's native software RAID 0 and every hardware RAID 0 implementation that I am familiar with refuse to allow you to access the remaining partitions directly, the only way to reach the data on those partitions is to use a low-level sector editor similar to the Norton Utilities DiskEdit. Because each file's data is distributed in chunks across two or more physical disk drives, the chances of salvaging usable files with this method are very small.

RAID 1

What do you do to make sure you don't lose something? The obvious answer is to make a copy of it. RAID 1, illustrated in figure C.2, works this way, making two complete copies of everything to mirrored, or duplexed, pairs of disk drives. This 100-percent redundancy means that if you lose a drive in a RAID 1 array, you have another drive that contains an exact duplicate of the failed drive's contents. This offers the greatest level of redundancy, but requires the highest expenditure on drives.

Figure C.2 RAID 1 uses mirroring or duplexing to increase data safety.

Mirroring means that each disk drive has a twin. Anything written to one drive is written to the second drive simultaneously. Mirroring is 100-percent duplication of your drives. If one drive fails, its twin can replace it without loss of data. Mirroring has two disadvantages. First and most obvious, you must purchase twice as many disk drives to yield a given amount of storage. Second, the process of writing to both drives, and maintaining coherency of their contents, introduces overhead that slows writes. The two advantages of mirroring are that your data is safely duplicated on two physical devices, making catastrophic data loss much less likely, and that read performance is greatly increased, because a read can be serviced by whichever drive's head is closer to the requested data.

Duplexing is similar to mirroring, but adds a second host adapter to control the second drive or set of drives. The only disadvantage of duplexing relative to mirroring is the cost of the second host adapter. In return, duplexing eliminates the host adapter as a single point of failure.

RAID 1 is the most common level used in mainframes, where cost has always been a low priority relative to data safety. The rapidly dropping cost of disk storage has made RAID 1 a popular choice in PC LAN servers, also. Conventional wisdom says that RAID 1 is the most expensive RAID implementation due to the requirement for purchasing twice as many disk drives. In reality, RAID 1 might be the most expensive way to implement RAID, but it also might be the least expensive, depending on your environment.

In a large server environment, the cost of duplicating every disk drive quickly adds up, making RAID 1 very expensive. With smaller servers, though, the economics can be very different. If your server has only one SCSI hard disk drive installed, you might find that you can implement RAID 1 for the relatively small cost of buying another similar disk drive.

RAID 1 is provided as a standard software feature with most NOSs. Even if your NOS doesn't offer RAID 1, you might find that your SCSI host adapter does, although it might be called something else. If the host adapter manual doesn't mention RAID 1, check for references to hardware support for mirroring or duplexing. If you find that your SCSI adapter does support hardware mirroring, you have what you need to implement RAID 1 in hardware. Simply install another drive similar to the existing one (or identical, depending on the host adapter requirements), reconfigure according to the directions in the manual, and you're running RAID 1. If you have a choice between using either the NOS native RAID 1 support or the support provided by your SCSI host adapter, choose the hardware solution every time. It should offer better performance and not put any additional load on the server. Now, let's look at RAID 1 in more detail.

RAID 1 reads usually are faster than those of a stand-alone drive. Returning to the hardware store analogy, we now have multiple checkout lines, each of which can handle any customer. With RAID 1 reads, any given block of data can be read from either drive, thereby shortening queues, lowering drive utilization, and increasing read performance. This increase occurs only with multi-threaded reads. Single-threaded reads show no performance difference, just as no gain is realized in the hardware store when all but one of the checkout lines are closed.

Most RAID 1 implementations offer two alternative methods for optimizing read performance. The first is referred to as circular queue or round-robin scheduling. Using this method, read requests are simply alternated between the two physical drives, with each drive serving every other read request. This method equalizes the read workload between the two drives, and is best suited to random-access environments where small amounts of data are being accessed frequently. It is less appropriate for sequential-access environments where large amounts of data are being retrieved. Most disk drives have buffers used to provide read-ahead optimization, where the drive hardware itself reads and stores whatever data immediately follows a requested block, on the assumption that this data is most likely to be requested next. Alternating small-block requests between two physical drives can eliminate the benefit of such read-ahead buffering.
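The round-robin method amounts to nothing more than a flip-flop between the two members of the mirror; a minimal sketch, with a class name of our own invention:

```python
# Sketch: round-robin (circular queue) read scheduling for a mirrored pair.
class RoundRobinMirror:
    def __init__(self):
        self._next = 0

    def pick_drive(self):
        # Each read request goes to the drive the previous request skipped.
        drive = self._next
        self._next = 1 - self._next
        return drive

mirror = RoundRobinMirror()
order = [mirror.pick_drive() for _ in range(6)]   # alternates 0, 1, 0, 1, ...
```

Note that the scheduler looks only at whose turn it is, not at where the data lives, which is exactly why it defeats the drives' read-ahead buffering on sequential workloads.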

The second method used in RAID 1 to increase read performance is called geometric, regional, or assigned cylinder scheduling. This method depends on the fact that head positioning is by far the slowest activity a disk drive does. By giving each of the two drives comprising the RAID 1 array responsibility for covering only half of the physical drive, this head positioning time can be minimized. For example, using mirrored drives where each has 1024 cylinders, the first drive might be assigned responsibility for fulfilling all requests for data that is stored on cylinders 0 through 511, with the second drive covering cylinders 512 through 1023.

Although this method is superficially attractive, it seldom works in practice. First, few drives have their data distributed in such a way that any specific cylinder is equally likely to be accessed. Operating system files, swap files, user applications, and other frequently read files are likely to reside near the front of the disk. In this situation, your first disk may be assigned 90 percent or more of the read requests. Second, even if the data were distributed to equalize access across the portion of the disk occupied by data, few people run their drives at full capacity, so the second drive would have correspondingly less to do. This problem could be addressed by allowing a user-defined split ratio, perhaps assigning disk 1 to cover the first 10 to 20 percent of the physical drive area, and disk 2 to cover the remainder. In practice, I've never seen a RAID 1 array that allows user tuning to this extent.
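Assigned-cylinder scheduling, including the user-defined split ratio suggested above, reduces to a single comparison. The 1024-cylinder figure comes from the earlier example; the adjustable split parameter is the hypothetical tuning knob the text wishes for, not a feature of any shipping RAID 1 implementation:

```python
# Sketch: geometric (assigned-cylinder) scheduling for a mirrored pair.
CYLINDERS = 1024

def drive_for_cylinder(cyl, split=CYLINDERS // 2):
    # Drive 0 covers cylinders 0..split-1; drive 1 covers the rest.
    return 0 if cyl < split else 1

# Default 50/50 split, as in the example in the text.
front, back = drive_for_cylinder(100), drive_for_cylinder(800)

# A 10-percent split ratio would shift most of the disk onto drive 1,
# compensating for the busy files clustered near the front of the disk.
tuned = drive_for_cylinder(300, split=CYLINDERS // 10)
```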

RAID 1 writes are more problematic. Because all data has to be written to both drives, it appears that we have a situation where the customer has to go through one checkout line and complete his transaction. He then has to go to the end of the other checkout line, wait in line again, and complete the same transaction at the other register. RAID 1 therefore provides a high level of data safety by replicating all data, and an increase in read performance by allowing either physical drive to fulfill the read request, but a low level of write performance due to the necessity of writing the same information to both drives.

It might seem that RAID 1 should have little overall impact on performance since the increase in read performance might be balanced by the decrease in write performance. In reality, this is seldom the case.

First, in most server environments, reads greatly outnumber writes. In a database, for example, any given record might be read as many as 100 times for every single time it is written. Similarly, operating system executables, application program files, and overlays are essentially read-only. Any factor that benefits read performance at the expense of write performance does greatly increase the overall performance for most servers, most of the time.

Second, although it seems reasonable to assume that writing to two separate drives would cut write performance in half, in reality the performance hit for mirrored writes is usually only 10 to 20 percent. Although both physical writes must be executed before the logical write to the array can be considered complete, and although the two write requests are generated serially, the actual physical writes to the two drives occur in parallel. Since it is the head positioning and subsequent writing that occupy the bulk of the time required for the entire transaction, the extra time needed to generate a second write request has only a small impact on the total time required to complete the dual write.

RAID 2

RAID 2, a proprietary RAID architecture patented by Thinking Machines, Inc., distributes data across multiple drives at the bit level, using Hamming code error detection and correction. RAID 2 uses multiple dedicated disks to store parity information, and therefore requires that an array contain a relatively large number of individual disk drives. For example, a RAID 2 array with four data drives requires three dedicated parity drives. Consequently, RAID 2 has the highest redundancy of any of the parity-oriented RAID schemes.
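The drive counts quoted above follow from the Hamming bound: with p check positions, 2 to the power p must cover all data positions, all check positions, and one "no error" state. A quick sketch of the arithmetic:

```python
# Sketch: how many Hamming-code check drives a RAID 2 array needs.
def parity_drives_needed(data_drives):
    # Hamming bound: 2**p >= data_drives + p + 1
    p = 1
    while 2 ** p < data_drives + p + 1:
        p += 1
    return p

four_data = parity_drives_needed(4)   # three parity drives, as stated above
ten_data = parity_drives_needed(10)   # check-drive count grows only slowly
```

The check-drive overhead shrinks proportionally as the array grows, but as the text notes, the scheme is uneconomical at the array sizes typical of PC LANs.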

The bit-wise orientation of RAID 2 means that every disk access occurs in parallel. RAID 2 is optimized for applications like imaging that require transfer of large amounts of contiguous data. RAID 2 is not a good choice for random-access applications that require frequent small reads and writes. The amount of processing overhead needed to fragment and reassemble data makes RAID 2 slow relative to other RAID levels. The large number of dedicated parity drives required makes it expensive. Because nearly all PC LAN environments have heavy random disk access, RAID 2 has no place in a PC LAN.

RAID 3

RAID 3, illustrated in figure C.3, stripes data across drives, usually at the byte level, although bit-level implementations are possible. RAID 3 dedicates one drive in the array to storing parity information. Like RAID 2, RAID 3 is optimized for long sequential disk accesses in applications like imaging, and is inappropriate for random-access environments like PC LANs. Any single drive in a RAID 3 array can fail without causing data loss, since the data can be reconstructed from the remaining drives. RAID 3 sometimes is offered as an option on PC-based RAID controllers, but seldom is used.

Figure C.3 RAID 3 uses byte striping with a dedicated parity disk.

RAID 3 can be considered an extension of RAID 0, in that RAID 3 stripes small chunks of data across multiple physical drives. For example, in a RAID 3 array that comprises four physical drives, the first block is written to the first physical drive, the second block to the second drive, and the third block to the third drive. The fourth block, however, is not written to the fourth drive; instead the round-robin moves to the first drive again, and writes the fourth block there. Instead of storing user data, the fourth drive stores the results of parity calculations performed on the data written to the first three drives. This small chunk striping provides good performance on large amounts of data since all three data drives are operating in parallel. The fourth drive, the parity drive, provides redundancy to ensure that the loss of any one drive does not cause the array to lose data.
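A toy model of that layout, with three data "drives" plus a parity drive held as byte arrays (a real array does this in the controller, and the function name here is ours):

```python
# Sketch: RAID 3 striping across three data drives plus one parity drive.
def raid3_write(data, ndata=3):
    data += b"\x00" * ((-len(data)) % ndata)   # pad to a whole stripe
    drives = [bytearray() for _ in range(ndata)]
    parity = bytearray()
    for i in range(0, len(data), ndata):
        stripe = data[i:i + ndata]
        for drive, byte in zip(drives, stripe):
            drive.append(byte)                 # one byte to each data drive
        p = 0
        for byte in stripe:
            p ^= byte
        parity.append(p)                       # every write hits the parity drive
    return drives, parity

drives, parity = raid3_write(b"ABCDEF")
# Stripe 1 is 'A','B','C'; stripe 2 is 'D','E','F'; parity holds their XOR.
```

Notice that the loop appends to the parity drive on every stripe: this is the structural reason the dedicated parity drive becomes the write bottleneck the next paragraphs describe.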

For sequential data transfers, RAID 3 offers high performance due to striping, and low cost due to its reliance on a single parity drive. It is this single parity drive, however, that is the downfall of RAID 3 for most PC LAN applications. By definition, no read to a RAID 3 array requires that the parity drive be accessed, unless data corruption has occurred on one or more of the data drives. Reads therefore proceed quickly. However, every write to a RAID 3 array requires that the single parity drive be accessed and written to in order to store the parity information for the data write that just occurred. The random access typical of a PC LAN environment means that the parity drive in a RAID 3 array is over-utilized, with long queues for pending writes, while the data drives are underutilized since they cannot proceed until parity information is written to the dedicated parity drive.

Returning to the hardware store analogy, RAID 3 allows multiple checkout lines, all but one of which accept only cash. The sole remaining checkout line accepts only credit cards. As long as most customers pay cash, this scheme works well. If, instead, many customers decide to pay by credit card, the queue for the single checkout line that accepts credit cards grows longer and longer while the checkout clerks in the cash lines have nothing to do.

Thus, RAID 3 works well in read-intensive environments, but breaks down in the random-access read/write environments typical of a PC LAN.

RAID 3 is a common option on hardware RAID implementations. In practical terms, RAID 5 is a universally available option and is usually used in preference to RAID 3, since it offers most of the advantages of RAID 3 and has none of the drawbacks. Consider using RAID 3 only in very specialized applications where large sequential reads predominate; for example, in a dedicated imaging server.

RAID 4

RAID 4 is similar to RAID 3, but stripes data at the block or sector level rather than at the byte level, thereby providing better read performance than RAID 3 for small random reads. The small chunk size of RAID 3 means that every read requires participation from every disk in the array. The disks in a RAID 3 array are therefore referred to as synchronized or coupled. The larger chunk size used in RAID 4 means that small random reads can be completed by accessing a single disk drive instead of multiple data drives. RAID 4 drives are therefore referred to as unsynchronized or decoupled.

Like RAID 3, RAID 4 suffers from having a single dedicated parity drive that must be accessed for every write. RAID 4 has all the drawbacks of RAID 3, and does not have the performance advantage of RAID 3 on large read transactions. About the only environment in which RAID 4 makes any sense at all is one in which nearly 100 percent of disk activity is small random reads. Since this situation is not seen in real-world server environments, do not consider using RAID 4 for your PC LAN.

RAID 5

RAID 5, illustrated in figure C.4, is the most common RAID level used in PC LAN environments. RAID 5 stripes both user and parity data across all the drives in the array, consuming the equivalent of one drive for parity information. With RAID 5, all drives must be the same size, and the equivalent of one drive's capacity is unavailable to the operating system. For example, in a RAID 5 array with three 1G drives, the equivalent of one of those drives is used for parity, leaving 2G visible to the operating system. If you add a fourth 1G drive to the array, the equivalent of one drive is still used for parity, leaving 3G visible to the operating system. RAID 5 is optimized for transaction-processing activity, in which users frequently read and write relatively small amounts of data. It is the best RAID level for nearly any PC LAN environment.
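The capacity arithmetic above is simply (n - 1) drives' worth of usable space, whatever the array size:

```python
# Sketch: usable capacity of a RAID 5 array.
def raid5_usable_gb(drive_count, drive_size_gb):
    # One drive's worth of space holds parity, spread across all members.
    return (drive_count - 1) * drive_size_gb

three_drives = raid5_usable_gb(3, 1)   # 2G visible, as in the example
four_drives = raid5_usable_gb(4, 1)    # 3G visible
```

The parity overhead as a fraction of total capacity therefore shrinks as drives are added, one reason larger RAID 5 arrays are economical.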

Figure C.4 RAID 5 uses sector striping with distributed parity.

The most important weakness of RAID levels 2 through 4 is that they dedicate a single physical disk drive to parity information. Reads do not require accessing the parity drive, so they are not degraded, but each write to the array must access this parity drive, so RAID levels 2 through 4 do not allow parallel writes. RAID 5 eliminates this bottleneck by striping the parity data onto all physical drives in the array, thereby allowing parallel writes as well as parallel reads.

RAID 5 reads, like reads with RAID levels 2 through 4, do not require access to parity information unless one or more of the data stripes is unreadable. Because both are optimized for sequential read performance where the block size of the requested data is a multiple of the stripe width, RAID 5 offers sequential read performance similar to that of RAID 3. Because, unlike RAID 3, RAID 5 allows parallel reads, RAID 5 offers substantially better performance on random reads. RAID 5 matches or exceeds RAID 0's performance on sequential reads because RAID 5 stripes the data across one more physical drive than RAID 0 does. RAID 5 performance on random reads at least equals the performance of RAID 0, and usually is better.

RAID 5 writes are more problematic. A RAID 0 single-block write requires only one access to one physical disk to complete the write. With RAID 5, the situation is considerably more complex. In the simplest case, two reads are required, one for the existing data block, and the other for the existing parity block. Parity is recalculated for the stripe set based on these reads and the contents of the pending write. Two writes are then required, one for the data block itself, and the other for the revised parity block. Completing a single write therefore requires a minimum of four disk operations, compared with the single operation required by RAID 0.
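The parity recalculation in that read-modify-write sequence can be sketched with XOR arithmetic: the new parity equals the old parity with the old data stripped out and the new data folded in. The helper names here are ours:

```python
# Sketch: the RAID 5 small-write sequence, four disk operations per write.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data, old_parity, new_data):
    # Reads 1 and 2 fetch old_data and old_parity (passed in here).
    # XOR removes the old data's contribution, then adds the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    # Writes 3 and 4 put new_data and new_parity back on disk.
    return new_data, new_parity

# Check against full-stripe recomputation for a hypothetical 3-block stripe.
b0, b1, b2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_bytes(xor_bytes(b0, b1), b2)
new_b1 = b"\x55\x66"
_, updated = raid5_small_write(b1, parity, new_b1)
assert updated == xor_bytes(xor_bytes(b0, new_b1), b2)
```

The shortcut matters because it touches only two drives regardless of stripe width; without it, every small write would have to read the entire stripe.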

The situation worsens when you consider what must be done to maintain data integrity. Because the modified data block is written to disk before the modified parity block is written, the possibility exists that a system failure could result in the data block being written successfully to disk, but the newly calculated parity block being lost, thereby leaving new data with old parityóand corrupting the disk. Such a situation must be avoided at all costs.

RAID 5 addresses this problem by borrowing a concept from database transaction processing. Transaction processing is so named because it treats multiple component parts of a related whole as a single transaction. Either the whole transaction is completed successfully, or none of it is. For example, when you transfer money from your checking account to your savings account, your savings account is increased by the amount of the transfer, and your checking account is reduced by the same amount. This transaction obviously involves updates to at least two separate records, and possibly more. It wouldn't be acceptable to have one of these record updates succeed, but the other one fail (either you or the bank would be upset, depending on which one fails).

The way around this problem is a process called two-phase commit. Rather than writing the altered records individually, two-phase commit first creates a snapshot image of the entire transaction, and stores this image. It then updates the affected records, and verifies that all components of the transaction have been completed successfully. As soon as this is verified, the snapshot image is deleted. If the transaction fails somewhere in the middle, the snapshot image is used to roll back the status of whatever portion had been updated, returning the system to an essentially unmodified state.
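The snapshot-apply-verify-discard cycle can be sketched in Python. The account names and the in-memory "log" are illustrative stand-ins for the bank example; a real database would write its log to stable storage.

```python
# Two-phase commit sketch: snapshot first, apply the updates, then discard
# the snapshot; on any failure, roll the records back from the snapshot.

def transfer(accounts: dict, src: str, dst: str, amount: int) -> None:
    log = {src: accounts[src], dst: accounts[dst]}  # phase 1: snapshot image
    try:
        accounts[src] -= amount                     # phase 2: apply updates
        if accounts[src] < 0:
            raise ValueError("insufficient funds")  # simulated mid-transaction failure
        accounts[dst] += amount
    except Exception:
        accounts.update(log)                        # roll back to the snapshot
        raise
    # Success: the snapshot is discarded (here, `log` simply goes out of scope).

accounts = {"checking": 100, "savings": 50}
transfer(accounts, "checking", "savings", 30)
assert accounts == {"checking": 70, "savings": 80}

try:
    transfer(accounts, "checking", "savings", 500)  # fails partway through
except ValueError:
    pass
assert accounts == {"checking": 70, "savings": 80}  # rolled back intact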

RAID 5 uses a two-phase commit process to ensure data integrity, further increasing write overhead. It first does a parallel read of every data block belonging to the affected stripe set, calculating a new parity block based on this read and the contents of the new data block to be written. The changed data and newly calculated parity information are written to a log area, along with pointers to the correct locations. After successfully writing the log information, the changed data and parity information are written in parallel to the stripe set. After verification that the entire transaction has been completed successfully, the log information is deleted.

This process obviously introduces considerable overhead to the write process, and in theory slows RAID 5 writes by 50 percent or more, relative to RAID 0 writes. In practice, the situation is not as bad as you might expect. Examining the process shows that the vast majority of the extra time involved in these overhead operations is consumed by physical positioning of drive heads, which brings up the question of caching.

At first glance, caching might appear to be of little use for drive arrays. Drive arrays range in size from a few gigabytes to a terabyte or more. Most arrays mainly service small random read requests (even frequent large sequential reads can be considered random in this context, relative to the overall size of the array). Providing enough RAM to do meaningful read caching on this amount of disk space would be cost-prohibitive. Even if you were willing to buy this much RAM, the overhead involved in doing cache searches and maintaining cache coherency would swamp any benefits you might otherwise gain.

Write caching, however, is a different story. To avoid most of the lost time described earlier, existing RAID 5 implementations relocate operations whenever possible from physical disk to non-volatile or battery-backed RAM. This caching, in conjunction with deferred writes to frequently updated data, reduces overhead by an order of magnitude (or more), and allows real-world RAID 5 write performance that approaches that of less capable RAID versions.

Returning to the hardware store analogy, RAID 5 allows multiple checkout lines, all of which accept both cash (disk reads) and credit cards (disk writes). Because each checkout line is equipped with a scanner, it doesn't take much longer to process many items (large, sequential disk access) than it does to process only a few items (small, random disk access). As long as most customers pay cash, this scheme works well. The lines are short, transactions are completed quickly, and nobody has to wait long. Even though some customers pay by credit card, the lines remain relatively short because most transactions in any given line are cash. If, instead, many customers decide to pay by credit card, each checkout line grows longer because checkout clerks take much longer to process credit card transactions than they do to process cash. In the same way, RAID 5 works well in any environment, like a typical PC LAN's, that involves mostly reads, with less frequent writes.

Proprietary and Non-Standard RAID Levels

RAID is the hottest topic in mass storage right now. Only a year or two ago, articles on RAID were seen only in magazines intended for LAN managers. Today you see RAID discussed in mass-market computer magazines like PC Computing. Inevitably, suggestions of using RAID in workstations rather than just servers have begun to appear.

As is usually the case with a hot product category, manufacturers push the envelope, developing their own proprietary extensions to the standards-based architectures. Also in keeping with tradition, some of these extensions originate with the engineering folks and represent real improvements to the field, while others come from the marketing department and represent nothing but an attempt to gain a marketplace advantage.

RAID 6

The term RAID 6 is now being used in at least three different ways. Some manufacturers simply take a RAID 5 array, add redundant power supplies and perhaps a hot spare disk, and refer to this configuration as RAID 6. Others add an additional disk to the array to increase redundancy, allowing the array to suffer simultaneous failure of two disks without causing data loss. Still others modify the striping method used by RAID 5 and refer to the result as RAID 6. Any of these modifications might yield worthwhile improvements. Be aware, though, that when you see RAID 6, you need to question the vendor carefully to determine exactly what it means.

RAID 7

RAID 7 is patented by Storage Computer Corporation. From published documents, it appears that, architecturally, RAID 7 most resembles RAID 4 with the addition of caching. RAID 7 uses a dedicated microprocessor-driven controller running an embedded proprietary real-time operating system named SOS. Storage Computer equips its arrays with dual Fast SCSI-2 multi-channel adapters, allowing one array to be simultaneously connected to more than one host, including mainframes, minicomputers, and PC LAN servers.

Storage Computer Corporation claims that RAID 7 provides performance equal to or better than RAID 3 on large sequential reads, while at the same time equaling or bettering RAID 5 on small random reads and writes. Anecdotal reports have claimed performance increases of between three and nine times, when compared with traditional RAID 3 and RAID 5 arrays. The claimed benefits of RAID 7 have been hotly debated on the Internet since the product was introduced. Some posted comments have reported significant increases in performance, while others have questioned the benefits, and even the safety, of RAID 7, particularly in a UNIX environment. The jury is still out on RAID 7.

Stacked RAID

One characteristic of all RAID implementations is that the array is seen as a single logical disk drive by the host operating system. This means that it is possible to stack arrays, with the host using one RAID level to control an array of arrays, in which individual disk drives are replaced with second-level arrays operating at the same or a different RAID level. Using stacked arrays allows you to gain the individual benefits of more than one RAID level, while offsetting the drawbacks of each. In essence, stacking makes the high-performance RAID element visible to the host while concealing the low-performance RAID element used to provide data redundancy.

One common stacked RAID implementation is referred to as RAID 0/1 or RAID 0+1, which also is marketed as a proprietary implementation named RAID 10. RAID 0/1 is illustrated in figure C.5. This method combines the performance of RAID 0 striping with the redundancy of RAID 1 mirroring. RAID 0/1 simply replaces each of the individual disk drives used in a RAID 0 array with a RAID 1 array. Since the host computer sees the array as RAID 0, performance is enhanced to RAID 0 levels. Since each drive component of the RAID 0 array is actually a RAID 1 mirrored set, data safety is at the same level you expect from a full mirror. Other stacked RAID implementations are possible. For example, replacing the individual drives in a RAID 5 array with subsidiary RAID 3 arrays results in a RAID 53 configuration.
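The stacking can be pictured as a simple address mapping: the outer RAID 0 layer picks a stripe column, and each column is secretly a RAID 1 mirrored pair, so every write lands on two physical drives. The array width, mirror depth, and naming below are illustrative.

```python
# Sketch: mapping a logical block through a RAID 0+1 stack.

STRIPE_COLUMNS = 4  # RAID 0 width seen by the host (illustrative)
MIRROR_COPIES = 2   # RAID 1 depth hidden beneath each column

def physical_targets(logical_block: int) -> list:
    """Return (column, mirror) pairs that store the given logical block."""
    column = logical_block % STRIPE_COLUMNS                   # RAID 0 striping
    return [(column, copy) for copy in range(MIRROR_COPIES)]  # RAID 1 fan-out

# Logical block 5 lands in stripe column 1 and is written to both
# halves of that column's mirrored pair.
assert physical_targets(5) == [(1, 0), (1, 1)]
```

The host only ever computes the RAID 0 step; the mirror fan-out happens below it, which is why the stack delivers RAID 0 performance with RAID 1 redundancy.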

Figure C.5 RAID 0+1 uses sector striping to mirrored target arrays.

Another benefit of stacking is that it can build arrays with extremely large capacity. For the reasons described earlier, RAID 5 is the most popular choice for PC LAN arrays. However, for technical reasons described below, a RAID 5 array normally should be limited to five or six disk drives. The largest disk drives currently available for PC LANs hold about 9G, placing the upper limit on a simple RAID 5 array at about 50G. Replacing the individual disk drives in a simple RAID 5 array with subsidiary RAID 5 arrays allows you to extend this maximum to 250G or more. In theory, it is possible to use three tiers of RAID (an array of arrays of arrays) to extend the capacity to the terabyte range.
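The back-of-the-envelope arithmetic behind those figures can be written out, assuming 9G drives, six members per array, and one member's worth of capacity given up to parity at each tier:

```python
# Capacity of stacked RAID 5 tiers. RAID 5 keeps (n - 1) members' worth of
# data out of n members; each additional tier treats a whole subsidiary
# array as one "drive". Figures are illustrative, matching the text's
# assumptions of 9G drives and six-member arrays.

drive_gb = 9
members = 6

def usable(member_gb: int) -> int:
    """Usable capacity of one RAID 5 array built from equal-size members."""
    return (members - 1) * member_gb

simple = usable(drive_gb)     # one tier: 45G, roughly the text's ~50G
stacked = usable(simple)      # two tiers: each "drive" is itself an array
three_tier = usable(stacked)  # three tiers reach the terabyte range

assert (simple, stacked, three_tier) == (45, 225, 1125)
```

Each extra tier multiplies capacity by roughly the array width, at the cost of another layer of parity overhead.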

Sizing the Array

Although we've talked a great deal about redundancy, we haven't examined in detail what happens when a drive fails. In the case of RAID 0, the answer is obvious. If your RAID 0 array comprises partitions on two physical disk drives, the data remaining on the good drive is unusable. You could, of course, spend hours, days, or weeks attempting to piece together file fragments remaining on the good drive. If your RAID 0 array comprises partitions on more than two physical disk drives, the situation is even more complex because you have file fragments distributed over additional drives. The only realistic alternative when one drive in a RAID 0 array fails is to replace the failed drive and rebuild the RAID 0 array from a backup set. With RAID 1, the answer is equally obvious. The failed drive was an exact duplicate of the remaining good drive, all your data is still available, and all your redundancy is gone until you replace the failed drive. With RAID 3 and RAID 5, the issue becomes much more complex.

Because RAID 3 and RAID 5 use parity to provide data redundancy, rather than physically replicating the data as RAID 1 does, the implications of a drive failure are not as obvious. In RAID 3, the failure of the parity drive has no effect on reads, since the parity drive is never accessed for reads. In terms of RAID 3 writes, failure of the parity drive removes all redundancy until the drive is replaced, since all parity information is stored on that drive alone. When a data drive fails in a RAID 3 array, the situation becomes more complicated. Reads of data formerly stored on the failed drive must be reconstructed using the contents of the other data drives and the parity drive. This results in a greatly increased number of read accesses, and correspondingly lowered performance.

With RAID 5, any failed drive is similar to RAID 3 with a failed data drive. Because every drive in a RAID 5 array contains both data and parity information, the failure of any drive results in the loss of both data and parity. An attempt to read data formerly resident on the failed drive requires that every remaining drive in the array be read, and parity used, to recalculate the missing data. For example, in a RAID 5 array containing 15 drives, a read and reconstruction of lost data requires 14 separate read operations and a recalculation before the data can be returned to the host. Writes to a RAID 5 array with a failed drive also require numerous disk accesses.
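The reconstruction itself is just the parity identity run in reverse: because parity is the XOR of every data block in the stripe, the missing block is the XOR of every surviving block, data and parity alike. A minimal sketch with illustrative block contents:

```python
# Reconstructing a failed drive's block in a RAID 5 stripe by XORing
# every surviving block (this is why a 15-drive array needs 14 reads
# per reconstructed block).
from functools import reduce

def xor_blocks(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def reconstruct(surviving: list) -> bytes:
    """XOR all remaining blocks of a stripe to recover the lost one."""
    return reduce(xor_blocks, surviving)

# Three data blocks plus their parity; then "lose" one data block.
data = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
parity = bytes(a ^ b ^ c for a, b, c in zip(*data))
lost = data.pop(1)  # simulate failure of the second drive
assert reconstruct(data + [parity]) == lost
```

Note that every surviving drive must be read to recover each block, which is the source of the degraded-mode performance penalty described above.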

To make matters worse, when the failed drive is replaced, its contents must be reconstructed and stored on the replacement drive. This process, usually referred to as automatic rebuild, normally occurs in the background while the array continues to fulfill user requests. Because the automatic rebuild process requires heavy disk access to all other drives in an already crippled array, performance of the array can be degraded unacceptably. The best way to limit this degradation is to use a reasonably small stripe width, limiting the number of physical drives in the array to five or six at most.

Choosing a RAID Implementation

Choosing the best RAID implementation for your needs means making two decisions. First, you must determine which RAID level or which combination of RAID levels provides the optimum mix of speed and safety for the type of data that you need to store. Second, you must decide whether to implement that RAID level by using Windows NT Server native RAID software or by purchasing a dedicated RAID hardware solution.

Picking the Best RAID Level for Your Needs

In theory, there are two important considerations in selecting the best RAID implementation for your particular needs. The first consideration is the type of data to be stored on the array. The various RAID levels are optimized for different storage requirements. The relative importance in your environment of small, random reads versus large, sequential reads (and of small, random writes versus large, sequential writes) as well as the overall percentage of reads versus writes can determine, at least in theory, the best RAID level to use. The second consideration is the relative importance to you of performance versus the safety of your data. If data safety is paramount, then you should choose a lower-performing alternative that offers greater redundancy. Conversely, if sheer performance is the primary goal, then you should choose a higher-performing alternative that offers little or no redundancy, and instead use backups and other means to ensure the safety of your data. Also lurking in the background, of course, is the real-world issue of cost.

Here are some specific guidelines that might help your decision:

Understanding RAID Product Features

RAID can be implemented in a variety of ways. The SCSI host adapter in your current server might provide simple RAID functionality; if not, you can replace it with a host adapter that offers full RAID support. Software RAID support is provided natively by most NOSs; if the RAID functionality provided by your NOS is inadequate, you can find what you need in third-party RAID software. If you are purchasing a new server, chances are that hardware RAID support is standard, or at least available as an option. If you need to upgrade your existing server, you can select among various external RAID arrays that provide features, functionality, and performance similar to that provided by internal server RAID arrays.

Each of these methods has advantages and drawbacks in terms of cost, performance, features, and convenience. Only you can decide which method best suits your needs. Let's look first at some of the features of various RAID implementations, and then at some of the tradeoffs involved with each.

Disks for Hot Swapping and Hot Sparing

Most external RAID subsystems, and many servers with internal RAID subsystems, allow hard disk drives to be removed and replaced without ever turning off the server. This feature, known as hot swapping, allows a failed disk drive to be replaced without interrupting ongoing server operations.

A similar method, known as hot sparing, goes one step further by providing a spare drive that is installed and powered up at all times. This drive automatically, and on a moment's notice, can take the place of a failed drive. Most systems that provide hot sparing also support hot swapping to allow the failed drive to be replaced at your leisure.

Obviously, both the drive itself and the system case must be designed to allow hot swapping or hot sparing. Most internal server RAID arrays, and nearly all external RAID arrays, are designed with front external access to hard drives for just this reason.

Automatic Rebuild

With either hot swapping or hot sparing, the integrity of the array itself is restored by doing a rebuild to reconstruct the data formerly contained on the failed drive, and re-create it on the replacement drive. Because rebuilding is a resource-intensive process, a well-designed RAID subsystem gives you the choice of taking down the array and doing a static rebuild, or allowing an automatic rebuild to occur dynamically in the background while the array continues to service user requests. Ideally, the array should also let you specify a priority level for an automatic rebuild, allowing you to balance your users' need for performance against the time needed to reestablish redundancy.

In practice, performance on an array with a failed drive (particularly a RAID 5 array) might already be degraded to the extent that attempting any sort of rebuild while users continue to access the array is unrealistic. The best solution in this case is to allow the users to continue to use the array as is for the remainder of the day, and then do the rebuild that night. In this situation, you choose a static rebuild, which is far faster than a background rebuild.

Disk Drive Issues

In the scramble to choose the best RAID level, fastest RAID controller, and so on, one issue that frequently is overlooked is that of the disk drives themselves. The RAID implementation you select determines how much flexibility you have in choosing the drives to go with it.

External RAID arrays and internal server RAID arrays typically offer the least flexibility in choice of drives. Although they use industry-standard disk drives, these drives are repackaged into different physical form factors to accommodate custom drive bay designs as well as the proprietary power and data connections needed to allow hot swapping. Because these proprietary designs fit only one manufacturer's servers (or sometimes even fit just one particular model), they are made and sold in relatively small numbers. This combination of low volume and a single source makes these drives quite expensive. Another related issue is that of continuing availability of compatible drives. Consider what will happen a year or two from now when you want to upgrade or replace drives. The best designs simply enclose industry-standard drives in a custom chassis that provides the mechanical and electrical connections needed to fit the array. These designs allow the user to upgrade or replace drives by merely installing a new standard drive in the custom chassis. Beware of other designs that make the chassis an integral part of the drive assembly; you will pay a high price for replacement drives, if you can find them at all.

Third-party RAID controllers offer more flexibility in choosing drives, at the expense of not providing hot swapping. These controllers simply replace your existing standard SCSI host adapter, and are designed to support standard SCSI disk drives. The situation is not as simple as it seems, however. You might reasonably expect these controllers to be able to use any otherwise suitable SCSI drive. The reality is different: most of these controllers support only a very limited number of models of disk drive, and often specify the exact ROM revision level required on the drive. Before you buy such a controller, make sure that the drives you intend to use appear on this compatibility list. Make sure also that the controller's drive tables can be easily updated via flash ROM or similar means, and that the manufacturer has a history of providing such updates. Don't assume any of this; ask.

Software-based RAID offers the most flexibility in selecting drives. Because most software-based RAID implementations (both those native to NOSs and those provided by third parties) are further isolated from the disk drives than hardware-based RAID implementations are, they care little about the specifics of your disk drives. Software-based RAID depends on the host adapter to communicate with the disk drives themselves. As long as your host adapter is supported by your RAID software, and your drives in turn are supported by the host adapter, you should have few compatibility problems with software-based RAID. Typical software-based mirroring, for example, does not even require that the second drive in a mirror set be identical to the first, only that it be at least as large.

Power Supplies

Most external RAID arrays, and some internal server RAID arrays, use dedicated redundant power supplies for the disk drives. The arrangement of these power supplies has a significant impact on the reliability of the array as a whole. Some systems provide a dedicated power supply for each individual disk drive. Although this superficially seems to increase redundancy, in fact it simply adds more points of failure to the drive component. Failure of a power supply means failure of the drive it powers. Whether the failure is the result of a dead drive or a dead power supply, the result is the same.

A better solution is to use dual load-sharing power supplies. In this arrangement, each power supply is capable of powering the entire array on its own. The dual power supplies are linked in a harness that allows each to provide half the power needed by the array. If one power supply fails, the other provides all the power needed by the array until the failed unit is replaced. Another benefit of this arrangement is that because the power supplies normally run well below their full capacity, their lives are extended and their reliability enhanced, when compared with a single power supply running at or near capacity. Power supplies also can be hot swappable, although this feature more commonly is called hot pluggable when referring to power supplies.

Stacked and Multiple-Level Independent RAID Support

Some environments require a stacked array for performance, redundancy, or sizing reasons. Others require multiple independent arrays, each running a different RAID level or mixture of RAID levels. If you find yourself in either of these situations, the best solution is probably either an external RAID array or a high-end internal server RAID array. The obvious issue is whether or not a given RAID implementation offers the functionality needed to provide stacks and multiple independent arrays. The not-so-obvious issue is the sheer number of drives that must be supported. External RAID arrays support many disk drives in their base chassis, and usually allow expansion chassis daisy-chaining to extend even further the maximum number of disks supported. High-end servers support as many as 28 disk drives internally, and again often make provision for extending this number via external chassis addition. Mid-range servers typically are more limited, both in the number of drives they physically support and in their provisions for stacking and multiple independent arrays. A typical mid-range server RAID array does not support multiple independent arrays, but might offer simple RAID 0+1 stacking.

Manageability

Look for a RAID implementation that provides good management software. In addition to providing automatic static and dynamic rebuild options, a good management package will monitor your array for loading, error rates, read and write statistics by type, and so on. Better packages will even help you decide how to configure your RAID array.

Hardware RAID Implementations

Hardware RAID is available in many forms. If you are purchasing a new server, chances are that the manufacturer offers one or more hardware RAID options for that server. If instead you plan to install hardware RAID on an existing server, you have even more choices available. You might decide to mirror or duplex drives using your existing SCSI adapter or you might install a dedicated RAID controller. Whether you are adding RAID to a new or existing server, buying an external RAID enclosure might make more sense than the alternatives. The available choices vary widely in both features and flexibility and may vary even more widely in cost. Let's look at these choices in more detail and see why one might be much better for your environment than another.

RAID as a Server Option

If you are purchasing a new server, then by all means consider the RAID options offered by the server manufacturer. Any system seriously positioned for use as a server will have RAID as an option (most low-end servers), or as standard equipment (mid-range and high-end servers). Servers that come standard with RAID often offer optional external enclosures to expand your disk storage beyond that available in the server chassis alone.

Purchasing RAID as part of your server has several advantages, most of which are related to the single-source aspect:

Upgrading an Existing Server to Hardware RAID

If your current server is otherwise suitable, upgrading it to hardware RAID is a viable alternative. This upgrade can range from something as simple and inexpensive as adding another disk drive and enabling mirroring on your SCSI host adapter, to something as complex and potentially expensive as adding an external RAID array cabinet. A happy medium in both cost and complexity is replacing your existing SCSI host adapter with a dedicated RAID controller. Each of these solutions provides the basic reliability and performance benefits of hardware RAID, and each varies in the level of features, convenience, and extended RAID functionality it provides.

Mirroring with Your Current SCSI Host Adapter

The SCSI host adapter in your current server might support RAID 0, RAID 1, or both. If it supports none of those, then replacing the host adapter with one that offers simple RAID support is an inexpensive alternative. If your server has only one or two SCSI hard drives, this method allows you to implement mirroring at the cost of simply buying a matching drive for each of your existing drives.

This method buys you 100-percent redundancy and decent performance, and does so inexpensively. What it does not provide are other features of more expensive hardware RAID implementationsóhot swappable drives, redundant power supplies, and so on. Still, for smaller servers, this is a set-and-forget choice. You can do it, walk away from it, and stop worrying about it. If your server is small enough that buying the extra disk drives is feasible, and if you don't care that you have to take down the server to replace a failed drive, this method might be the best choice. It gives you the overwhelming majority of the benefits of a full-blown RAID implementation, for a fraction of the cost.

Adding a Dedicated RAID Controller Card

The next step up in hardware RAID, in terms of both cost and performance, is the dedicated RAID controller card. This card replaces your existing SCSI host adapter, and includes a dedicated microprocessor to handle RAID processing. Such a card can range in price from less than $1,000 to perhaps $2,500, depending on the levels of RAID it supports, its feature set, the number of SCSI channels provided, the amount and type of on-board cache supplied, and so on. All cards support at least RAID 1 and RAID 5, and the newest ones offer a full range of RAID levels, often including various enhanced non-standard RAIDs.

The prices of dedicated RAID controller cards have been dropping rapidly, due to increasing sales volume and to competition from RAID software alternatives. The best cards offer RAID functionality and performance comparable to internal server RAID arrays and external RAID array enclosures. In terms of convenience features, however, the cards are obviously at an inherent disadvantage: they can do nothing to provide hot swap capabilities, redundant power supplies, and so on.

Most RAID controller cards are sold through OEM arrangements with server manufacturers. For example, the Mylex DAC960 (one of the better examples of this type of card) is used by Hewlett-Packard to provide RAID support in their NetServer line of servers. HP modifies the BIOS and makes other changes to optimize the DAC960 for use in their servers.

Think long and hard before you decide to buy one of these cards as an individual item rather than as a part of a packaged solution. Although the card itself seems inexpensive relative to the quoted price for an external RAID enclosure, you usually will find that after adding up other costs (including disk drives, cabling, and possibly an external enclosure) you have met or exceeded the price of the turnkey solution. To add insult to injury, it's then still up to you to do the systems integration, locate and install the appropriate disks and drivers, and maintain the subsystem. If you decide to use one of these cards, budget for two of them. Few organizations tolerate having their LAN down for an extended period because the RAID controller has failed. On-site maintenance is the exception rather than the rule for these cards, and even using overnight delivery, a swap requires that your LAN be down for at least a day or two.

Using an External RAID Enclosure

External RAID enclosures are the high end of hardware RAID products. They offer everything that internal server arrays do, and then some. Hot pluggable, load-sharing dual power supplies are a common feature, as are hot swappable drives, extensive management capabilities, a full range of RAID options, and provision for stacked RAID. Most of these units support multiple independent RAID arrays, and some allow connection of more than one host. Most units allow you to add additional slave enclosures to expand your disk capacity. As you might expect, all this functionality doesn't come cheaply.

These units are of two types. The first is based on one of the dedicated RAID controller cards described in the preceding section. In this type of unit, a dumb external array, all the RAID intelligence is contained on the card installed in the host, and the external enclosure simply serves to provide space and power for the disk drives. The enclosure makes provision for hot swapping, redundant power supplies, and so on, but the actual RAID functionality remains with the host server. RAID configuration and management is done at the host server. Although they physically resemble more sophisticated external arrays, in concept these units are just simple extensions of the dedicated RAID controller card method, and are accordingly relatively inexpensive. They are in the $3,000 to $5,000 range for the enclosure and controller, without disk drives.

Dumb external arrays often are assembled from mail-order components to allow second- and third-tier computer companies to offer a RAID solution for their servers. These arrays suffer from most of the same drawbacks as the dedicated RAID controller cards do: limited drive type support, infrequent driver updates, lack of on-site maintenance, and so on. Think twice before choosing one of these units; then think some more.

The second type of unit, a smart external array, relocates RAID processing to the enclosure itself, and provides one or more SCSI connectors by which the host server or servers are connected to the array. The host server sees a smart external array as just another standard SCSI disk drive or drives.

With this type of array, RAID configuration and management is done at the array itself. Because these arrays are intended for use in diverse environments, including NetWare, Windows NT Server, and UNIX, they usually offer a variety of methods for setup and programming. A typical unit might be programmable in a UNIX environment by connecting a dumb terminal to a serial port on the external array, or by using Telnet. In a NetWare or NT Server environment you instead might use provided client software for that NOS. These arrays have full software support (drivers, management utilities, and so on) available for several operating systems, although they usually come standard with support for only one operating system of your choice. Support for additional operating systems, or for extended functionality with your chosen operating system, is often an extra cost option. Smart external arrays start at around $8,000 without drives and go up rapidly from there.

Smart external arrays offer everything you might want in a RAID unit, including support for stacked RAID, multiple independent arrays, and multiple hosts. Because manufacturers realize that these are mission-critical components, on-site maintenance is available, provided either by the manufacturer itself or by a reputable third-party organization. In construction, these units resemble minicomputer and mainframe components more than typical PC components.

There are, as always, a few things to look out for when shopping. Make no assumptions about compatibility, cost, or support. Ask, and even if you like the answers, get them in writing.

The first major concern is drive support. Some units allow you to add or replace drives with any SCSI drive of the appropriate type that is at least as large as the old drive. Other units require that you use only drives that exactly match the existing drives in make, model, and sometimes even ROM revision level. Still other units can only use drives supplied by the array manufacturer, because the drives have had their firmware altered somehow. These manufacturers usually tell you they make such alterations for performance and compatibility reasons, which could be true. The net effect, though, is that you are chained to that manufacturer for new and replacement drives, and will have to pay their price for the drives.

The second major concern is software support. With smart external arrays, you are at the mercy of the array manufacturer for NOS support, drivers, management utilities, and so on. Make absolutely certain before purchasing one of these arrays that it has software support available, not only for your current NOS environment, but also for other environments that you might reasonably expect to need in the future. Examine in detail which NOSs are supported, and at what version levels. It does you no good to accept a vendor's assurance that the array supports UNIX, only to find later that the array supports SCO UNIX when what you needed was support for BSDI UNIX. Similarly, support for BSDI 1.x doesn't help if you're running BSDI 2.0.1.

Check the array manufacturer's history of providing support for NOS upgrades soon after the upgrade's release. Although this is not a perfect means of prediction (companies can change for the better or worse), a history of frequent updates for many NOS environments is a reasonable indicator that the company is committed to providing continuing support for its users. On the other hand, a supported driver list that includes older versions of NOSs but fails to include later versions might indicate that the array vendor tends to drop support for environments that do not sell in large volumes. This might not be a major concern if you use a mainstream product like NetWare or Windows NT Server, but is cause for great concern if your NOS is less popular.

The third major concern is understanding the pricing structure of the array manufacturer. Since these units do not sell in high volume, development costs for drivers and so on have to be distributed over a relatively small number of users. This can make updates, enhanced optional features, and support for additional NOSs very expensive. If you might want to add, for example, SNMP management to your array in the future, do not assume that it will be inexpensive or free. Ask lots of questions before you buy.

External RAID enclosures can be your best choice, particularly if you require large amounts of disk storage, have multiple servers, or use more than one NOS. Don't rule out external RAID enclosures simply on the basis of sticker shock. Examine the true cost involved in acquiring, maintaining, and managing one of these units, versus the cost involved, including increased staff time, in providing similar functionality using other means.

One final item: some external RAID enclosures exist that use no RAID processor at all, depending instead on the server processor to perform RAID operations. These units run only with NetWare, using an NLM to provide RAID processing. Because these units support only NetWare, and for other reasons covered fully in the upcoming "Novell NetWare Software RAID" section, they are best avoided.

Software RAID Implementations

All the RAID implementations we have examined so far have their basis in specialized hardware. It is possible, however, to use the server CPU to perform RAID processing, and thereby avoid buying additional hardware. NetWare and Windows NT Server, which between them dominate the PC LAN NOS market, both provide native RAID support. Although there are obvious cost advantages to using software-based RAID, there are also subtle drawbacks to doing so, both in performance and reliability.

In theory, at least, software RAID offers a scalability advantage. Because software RAID depends upon the server processor, upgrading the server processor simultaneously upgrades the RAID processor. In practice, however, this potential advantage turns out to be illusory. Benchmark tests nearly always show software RAID bringing up the rear of the pack, and hardware RAID out in front.

Microsoft Windows NT Server Software RAID

RAID 1 mirroring is supported directly by Microsoft Windows NT Server for any hardware configuration with at least two disk drives of similar size. NT Server does not require that the mirrored drive be identical to the original drive, only that it be at least as large. This considerably simplifies replacing failed drives if the original model is no longer available. Similarly, RAID 1 duplexing is supported directly for any hardware configuration with at least two disk drives of similar size, and two disk controllers. As with any duplex arrangement, this removes the disk controller as a single point of failure. As with mirrored drives, NT Server does not require duplexed drives to be identical.
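The logic of mirroring is simple enough to sketch in a few lines of Python. The in-memory MirrorSet class below is a hypothetical illustration (not NT Server's actual implementation): every write is duplicated to both members, so a read can be satisfied by either member even after a single drive failure.

```python
# Toy model of RAID 1 mirroring: two "drives" (byte buffers) hold
# identical data. Writes go to both; reads come from any surviving member.

class MirrorSet:
    def __init__(self, size):
        # Two equal-size members, as in a two-drive mirror set.
        self.drives = [bytearray(size), bytearray(size)]

    def write(self, offset, data):
        # A mirrored write is duplicated to every member.
        for drive in self.drives:
            drive[offset:offset + len(data)] = data

    def read(self, offset, length, failed=None):
        # A read can be satisfied by any member that has not failed.
        for i, drive in enumerate(self.drives):
            if i != failed:
                return bytes(drive[offset:offset + length])

mirror = MirrorSet(16)
mirror.write(0, b"payroll")
# Data survives the loss of either single member.
assert mirror.read(0, 7, failed=0) == b"payroll"
assert mirror.read(0, 7, failed=1) == b"payroll"
```

Duplexing is the same idea with each member on its own controller; the data path, not the logic, is what changes.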

RAID 5 also is supported natively by NT Server for any hardware configuration with at least three disk drives, and one or more disk controllers. NT Server allows as many as 32 drives in a stripe set, although for the reasons mentioned above it is a better idea to limit this number to five or six drives. Microsoft refers to this RAID 5 support as Disk Striping with Parity.
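The parity arithmetic behind striping with parity can be shown in a short Python sketch (a toy model, not Microsoft's implementation): the parity block for each stripe is the XOR of that stripe's data blocks, so any single lost block can be rebuilt by XOR-ing the survivors.

```python
# Toy model of RAID 5 parity: parity = XOR of the data blocks in a stripe.
# Because XOR is its own inverse, XOR-ing the surviving blocks of a stripe
# reproduces the one missing block.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# One stripe across a four-drive array: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block
# from the remaining data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

This also shows why rebuilds are expensive: reconstructing one failed drive requires reading every surviving drive in the array, stripe by stripe.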

Since NT Server itself provides these disk redundancy options, you might wonder why anyone would purchase expensive additional hardware to accomplish the same thing. The first issue is performance. Although Microsoft has done a good job of incorporating RAID functionality into NT Server, a well-designed hardware RAID solution offers better performance, particularly on larger arrays. Also, although using the server CPU to perform RAID processing can be acceptable on a small (or lightly loaded) server, doing so on a heavily loaded server, particularly one running as an application server, steals CPU time from user applications, and therefore might degrade overall server performance.

The second issue that speaks against using NT Server software RAID is that of flexibility and convenience. Unless you are running NT Server on a system equipped with hot-swappable drives and other RAID amenities (which usually would be equipped with a hardware RAID controller anyway), you lose the ability to hot swap drives or otherwise maintain the array without taking down the server.

If yours is a small array on a server supporting a limited number of users, then the drawbacks of NT Server software RAID might be an acceptable tradeoff for reduced costs. For larger arrays and mission-critical environments, however, do it right and buy the appropriate RAID hardware solution.

Novell NetWare Software RAID

Like Microsoft Windows NT Server, Novell NetWare provides native software RAID support. Unlike NT Server, which provides both RAID 1 and RAID 5, NetWare offers only RAID 1 mirroring and duplexing. As a result of user demand for RAID support beyond that provided natively by NetWare, various third-party vendors supply software to add enhanced RAID functionality to NetWare.

NetWare software RAID solutions, both those native to NetWare and those supplied by third-party vendors, suffer the same performance and flexibility drawbacks as do the solutions for NT Server. In addition, NetWare products have problems all their own. To understand why, it is necessary to understand a little bit about the way Intel processors work, and some of the architectural differences between NT Server and NetWare.

Intel processors allow processes to run at different privilege levels. The most privileged level, referred to as Ring 0, is the fastest and most dangerous. Processes at Ring 0 have complete access to the processor, and a rogue process running at Ring 0 can crash the whole system. Processes running in higher-numbered, less privileged rings are more restricted in how much damage they can do, but they in turn run more slowly because of the extra overhead involved in moving between rings.

NT Server does not run any user processes at Ring 0, including low-level processes represented by vendor-supplied drivers. Windows NT is therefore inherently more stable than NetWare, but at the same time is inevitably slower. NetWare uses NetWare Loadable Modules (NLMs) as plug-ins to enhance and extend the capabilities of the base operating system. Because NLMs run at Ring 0, any NLM, no matter how poorly written, has full access to the CPU, and therefore has the potential to crash the entire server. NLM technology is speedy but dangerous. With later versions of NetWare, Novell has made provision for testing NLMs at higher ring levels to determine their stability, but a functioning NLM in a production server still runs at Ring 0.

Consider, then, what might happen to the NLM you use to provide RAID services. Even if the RAID NLM itself is well-written and stable, any other NLM running on the server can cause a crash. If this occurs, the possibility exists that the contents of your RAID array will be corrupted beyond salvage. You can minimize this risk by using only Novell-certified NLMs, but even then the possibility of data corruption remains. NetWare integral mirroring has been available for years, and has been used on thousands of servers with few problems. Reports of problems with NetWare duplexing are somewhat more common, but still rare. Attempting to extend NetWare RAID functionality with third-party products is more problematic.

If, for some reason, you must use one of these products, at least understand the differences between the two Novell levels of approval. The less rigorous level is indicated by a symbol incorporating the Novell logo with the word Yes superimposed, and the phrase It runs with NetWare appearing below the logo. This symbol means little more than that the manufacturer has tested the product with NetWare, and represents that the product is NetWare-compatible. The second, more rigorous level also uses the Novell logo with the word Yes superimposed, but has the words NetWare Tested and Approved below the logo. This symbol indicates that the product has undergone testing by Novell Labs and has been certified by them as NetWare-compatible. Give strong preference to the latter symbol.

Recommendations

Given the wide diversity of RAID products available, and the equally broad range of needs and budgets, it is difficult to make hard and fast recommendations for the most appropriate means of implementing RAID. However, the following observations should serve as useful guidelines:

- If your disk storage requirements are small, consider using your existing SCSI host adapter to implement mirroring. If your host adapter does not support hardware mirroring, consider replacing it with one that does. Purchase an additional disk drive to mirror each existing drive. The result is greatly increased data safety, and better performance, at minimal cost. Choose this hardware method in preference to the RAID 1 software functions of NT Server or NetWare on the basis of performance and minimizing load on the server.