Chapter 5

The Server Platform

The file server is the focal point of the client/server network model. It is the means by which users on the network gain access to the resources they need to accomplish their daily tasks. The operative concept behind the local area network is to spread a company's computing resources around the network, instead of centralizing everything as in a mainframe system. The file server functions as a gateway to all these resources, whether or not they are located in the server computer itself. The server also functions as a security device, allowing users access only to the resources they need, and protecting the core operating system from interference that could disrupt the operations of other users.

Once access is granted to the necessary network resources, users actually perform their computing tasks using the hardware within their workstations. This is the basis of the client/server model. Processing tasks are performed by every computer on the network, decreasing the load on the central systems. Unlike a mainframe system, in which all processing takes place within one computer, file servers allow individual computers to perform the same functions without the need for redundant resources at every workstation. Thus, one centrally located copy of a particular software application can be accessed by dozens of users, each using the processor and memory of their individual machines to run it, and outputting their final results to a single, centrally located printer without placing an excessive burden on the server. The file server can be substantially smaller than a mainframe in size, processing capability, and, most importantly, cost. For a far smaller price than the "big iron," a local area network can be assembled that provides users the same (or greater) functionality, along with far greater flexibility and fault tolerance.

Many people who are new to networking believe that there is an inherent difference between a file server and a normal workstation PC. From a pure hardware standpoint, this is rarely the case. The primary difference between a file server and a workstation computer is the software that is running on it. As far as the hardware is concerned, the differences are mostly a matter of degree, and of marketing. A file server is simply a PC with the same resources as a workstation, but in greater abundance. It usually has a faster processor, more memory, a greater amount of disk space, and a wider array of peripherals, but this need not be so. The primary difference is that this machine runs a network operating system (NOS) such as Novell NetWare, as opposed to a client operating system (OS) such as DOS. See Part II for an in-depth examination of the differences between NOSs and OSs.

How Many Servers?

As stated earlier, the basic concept of the LAN is to spread resources around the network. In many cases, this results in the use of multiple file servers that provide network access to different areas of the enterprise. Depending on the size and needs of the company, the use of multiple servers can provide several advantages as well as disadvantages. The primary alternative to this is a server configuration that has gained increasing popularity in recent years, colloquially called the super-server. This consists of a single file server that takes the place of several smaller servers. This machine has a far greater amount of hardware resources than a normal server does. It ends up functioning in almost the same way as a mainframe, in that it is the single contact point of the entire network.

The primary disadvantage of the super-server is the same as that of the mainframe: fault tolerance. When operating in a distributed network environment (that is, one utilizing multiple servers spread around the network), users can, in the case of a server failure, log in to another server and continue working, as long as the network has been designed and configured to accommodate this sort of redundancy. In a super-server environment, users tend to be completely dependent on a single machine. Should it fail for any reason, productivity often comes to a halt until repairs are made.

One of the primary advantages of the super-server, though, is the fact that administrative tasks are centralized. Only one set of user accounts needs to be maintained, and the server hardware is all located in one place, rather than having multiple file servers located in different areas of the office, building, or campus.

One of the first questions to be considered when planning an upgrade or expansion of an existing network is whether to augment the capabilities of existing file servers or to add additional servers to the network. Hardware costs are certainly a major factor in this decision, but there are other criteria to be taken into account as well. One of these is the skill level of your user base. When administering a large network, having knowledgeable users or workgroup administrators spread throughout the enterprise can be a tremendous aid to the MIS department. If these people can be counted upon to responsibly administer the small parts of the network local to their department, then a distributed network might be more desirable. If the user base is familiar only with the operations of a workstation, and has little knowledge of networking, then having fewer file servers of greater capacity allows for simpler administration, and less traveling time between servers or departments.

Other criteria include the working habits and the geographical organization of the enterprise. Discrete departmental workgroups in remote locations, with little interaction and different computing requirements, would be better candidates for multiple servers than a large group of identically tasked workers in one location. Of course, circumstances are rarely this clear-cut, and the decision as to the number of servers needed is likely to be a compromise between several of these factors.

As we examine the various components of the network server, we will cover the ways in which the hardware can be assembled to provide a smooth passage of data from source to destination that is free of bottlenecks. Other sections of this book examine the ways in which a suitable level of fault tolerance can be provided for a selected hardware configuration, to accommodate the needs of the organization that uses it. We will also attempt to identify the components that are the most critical to particular types of network use. This way, you should be able to find the information that allows you to build or upgrade a network into something that suits the way your business works, instead of the other way around, and to provide additional performance or fault tolerance in the most critical areas and in the most economical manner.

About the Motherboard

Although the microprocessor or CPU is often thought of as defining the capabilities of a computer, it is useless without an architecture designed to support it properly. At the most basic level, a computer operates on data by performing three fundamental operations: storage, transfer, and calculations. Data may be stored in memory or on a hard or floppy disk drive, and calculations are performed within the microprocessor. The task of providing the means by which the data is transferred from the storage media to the processor and back again is accomplished by the motherboard.

In years past, several different means were used to connect the disparate parts of a computer. Minicomputers and mainframe systems used backplanes or wiring systems of a modular design that allowed for the simple replacement of individual parts. The space considerations of the modern PC, however, have led to the use of a single board onto which the processor, system memory, I/O bus, and other vital circuitry are all integrated. This is referred to as the system board, main board, or motherboard (see fig. 5.1).

Figure 5.1 This is a typical PC motherboard.

In this chapter, we examine this component, so crucial to a well-designed file server, as well as all the basic components that are mounted on a motherboard. We look at the means by which the different parts of the computer communicate, and at the little-known ancillary components that can be crucial to the assembly of a system with true compatibility. A time will come when it seems as though your system is obsolete before you can even improve it; with this book, however, you should be able to judge whether or not you need "the latest thing" in the networking industry, and whether or not you can easily adapt your existing system to it.

Documentation

If there is one piece of advice that you retain from this entire book, it should be this one:

Do not buy a PC for use as a file server (or even as a workstation, for that matter) without finding out specific information about the motherboard. If the vendor cannot tell you who makes their motherboards, hang up the phone! If they cannot tell you what chipset or BIOS is on their motherboards, hang up the phone! If they cannot furnish detailed documentation for their motherboards (by this I mean more than an 8-page pamphlet), do not buy from them! Hardware incompatibilities are a major cause of file server outages, and if you do not know what you have inside the box, then no one can help you.

When purchasing PCs as a unit, as opposed to assembling them yourself, you must be aware of the places in which unscrupulous vendors are most liable to cut corners. They realize that potential customers are likely to ask what make of hard drive is in the machine, or how much memory it has in it, but they are far less likely to ask about the speed of the memory, or the chipset on the motherboard. Any reputable vendor should be able to supply you with a spec sheet containing all the technical details concerning a particular system. If they can't, look for a different vendor!

If you are dealing with legacy equipment for which you have no documentation, then use a notebook, spreadsheet, or text file to keep your own documentation. As you identify aspects of the hardware in a machine, write them down in a safe place. This will save you from repeatedly having to open the system to see what's inside, and will help to prevent you from sounding like a fool when you're on the phone with vendors, salespeople, and consultants.
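If you prefer to keep this record electronically, a few lines of code are enough to maintain a consistent inventory file. The sketch below is merely one way to do it; the field names are arbitrary choices, and the sample board, chipset, and BIOS values are invented placeholders that you would replace with whatever you actually find inside your machines.

```python
# A minimal hardware-inventory log: one record per machine, appended as you
# identify components. Field names and sample values are arbitrary examples.
import csv
import io

FIELDS = ["server", "motherboard", "chipset", "bios", "ram_speed_ns", "notes"]

def record(out, **info):
    """Write one inventory row, filling any missing fields with blanks."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writerow({field: info.get(field, "") for field in FIELDS})

# In practice you would open a real file in append mode; a StringIO buffer
# stands in for it here so the example is self-contained.
buf = io.StringIO()
record(buf, server="FS1", motherboard="ExampleBoard/100", chipset="430FX",
       bios="AMI 1.00.03", ram_speed_ns="70", notes="parity SIMMs")
print(buf.getvalue().strip())
```

However you store it, the point is the same: every fact you record once is a case you never have to reopen.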

Many simple upgrades can end up being a nightmare. Always budget what you believe to be a sufficient amount of downtime when upgrading a server, and then double it. Murphy's Law always seems to be in force when you've got the case open, and the guts of your file server are all over the floor. A properly documented system often means the difference between a successful upgrade and having to put the server back the way it was because you ordered the wrong parts.

Microprocessors

The microprocessor, sometimes known (rather inaccurately) as the central processing unit or CPU, is the place where all the actual calculations within a computer take place. Hard drive space, memory, and even caching RAM are all merely storage media of different types and speeds that facilitate the delivery of data to the CPU, where the actual processing takes place. A microprocessor is built around thin silicon wafers that have been grown like crystals and subsequently exposed to intense heat. During the heating process, the wafers are exposed to gases containing particles that are impregnated into the silicon, thus creating different degrees of conductivity in precisely specified areas of the chip. Since the late 1950s, when the first techniques for clustering multiple transistors on a single chip were patented, these integrated circuits have evolved in complexity and capability at a phenomenal rate, resulting in today's microprocessors, each of which represents the equivalent computing power of a football stadium full of 1950s vacuum tubes.

On the simplest level, a microprocessor is a component that, when furnished with a particular set of electrical signals input at specified points, will always return exactly the same response. The phenomenal achievement of these devices is the speed at which millions of these responses can be generated and the way in which sequential responses can be made to interact, based on previous responses. These input signals correspond to the actions that a program requests from a processor and the data that the actions are performed upon. Every microprocessor has a vocabulary of these actions, called its instruction set or command set, which is used to form the language by which programmers create their code. The code, which is stored on the computer's memory chips, on a hard drive, or on some other storage device, is then reduced to binary electrical signals that are the only means of communication with the microprocessor, and are fed to its input/output (I/O) unit.

The I/O unit of the microprocessor is essentially the staging area for the electrical input, precisely controlling the rate and timing by which signals are fed to the other two parts of the chip, the control unit and arithmetic-logic unit (ALU). In strict technical terms, it is these two units, exclusive of the input/output unit, that compose the CPU. The I/O unit consists of two kinds of connections to the motherboard of the computer: the address bus, which furnishes the processor with information about the memory location of the commands and data to be processed, and the data bus, through which the information itself travels. Once the signals are fed to the control unit, this part of the chip's architecture then relays the data and command instructions to the ALU, where the actual calculations are performed at the proper time. The control unit contains its own internal clock which is used to control the rate at which all the units function, but the entire processor is also highly reliant on the motherboard's system clock, which is used to ensure that the vast number of electrical signals entering and exiting the microprocessor are not garbled or confused.

All three of these basic units, however, have an influence on the overall speed of the processor. The width and speed of the I/O unit's buses control the rate at which data enters and exits the processor, while the size and number of the ALU's registers determine the amount of data that can be operated on at any one time. Registers are holding areas where data is temporarily stored while calculations are being performed by the ALU (see fig. 5.2). Because even simple arithmetic calculations require multiple steps when they are reduced to binary electrical signals, the registers are an integral part of the decision-making process, retaining the interim results of calculations within the processor itself, instead of having to offload this information through the I/O bus to the computer's memory chips.

Figure 5.2 Registers temporarily hold data and interim results while the ALU performs its calculations.
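The fetch-and-execute relationship between the control unit, the ALU, and the registers can be made concrete with a deliberately simplified sketch. The three-instruction "vocabulary" below is invented purely for illustration, not taken from any real chip, but it shows the essential point: interim results stay in registers inside the processor rather than making a round trip to memory after every step.

```python
# A toy fetch-decode-execute loop. The loop plays the role of the control
# unit, decoding each instruction; the arithmetic plays the role of the ALU;
# and two registers (A and B) hold interim results inside the "processor."

def run(program, memory):
    regs = {"A": 0, "B": 0}            # register file: on-chip holding areas
    for op, *args in program:           # fetch the next instruction in sequence
        if op == "LOAD":                # move a value from memory into a register
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":               # ALU operation: result stays in a register
            dst, src = args
            regs[dst] = regs[dst] + regs[src]
        elif op == "STORE":             # write a final result back to memory
            reg, addr = args
            memory[addr] = regs[reg]
    return memory

# Compute memory[2] = memory[0] + memory[1] in four steps.
memory = [5, 7, 0]
program = [
    ("LOAD", "A", 0),
    ("LOAD", "B", 1),
    ("ADD", "A", "B"),    # the interim sum is held in register A, not memory
    ("STORE", "A", 2),
]
run(program, memory)
print(memory[2])   # 12
```

Even an addition this trivial takes four instructions, which is why keeping the interim values in registers, rather than shuttling them over the I/O bus, matters so much to overall speed.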

CISC and RISC Processors

Although different microprocessor platforms, on a high level, may be performing what are fundamentally the same operations, they often go about them in very different ways. Even a simple arithmetic calculation, when it is reduced to binary electrical signals, is performed in a series of steps, the number and complexity of which may vary greatly in different types of processors. This is due primarily to differences in the various processors' instruction sets. This can be compared to two sentences with the exact same meaning, expressed in two different languages, using two different alphabets. The meaning transmitted is the same, but the intermediate steps can be utterly unlike each other.

The first microprocessors executed many of the instructions in their command sets through the use of numerous intermediate steps. The various combinations of these intermediate steps allowed a large number of highly complex instructions to be added to the command set. These intermediate steps, by which instructions were carried out, came to be known as microcode. The processor's microcode is essentially the alphabet upon which its command set (or language) is based. It runs on a separate operational level within the microprocessor, called a nanoprocessor.

This technique allows the processor a rich and varied instruction set, but also introduces an added layer of processing overhead that can lessen the overall speed at which instructions are executed. This has become known as complex instruction set computing or CISC, and is the operational model for the entire Intel PC microprocessor family. In the 1970s, it was discovered during studies of the actual usage of CISC microprocessors of the day that roughly 20% of the instructions in the command set were being used to perform 80% of the work. In other words, only a small subset of the existing vocabulary was being used most of the time. This discovery led to the development of microprocessors that were designed to optimize the performance of the most frequently used commands, at the expense of the seldom-used ones. This resulted in what became known as reduced instruction set computing (RISC). RISC processors eliminate the microcode layer that slows down the CISC processing method, substituting a relatively small vocabulary of simpler commands that can be combined to adequately emulate the seldom-used, more complex commands that comprise the other 80% of the typical CISC processor instruction set.
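The distinction can be sketched in a few lines. The two functions below compute the same memory-to-memory addition; the first mimics a single "complex" instruction that operates directly on memory, while the second performs the equivalent work as a sequence of simple load, add, and store operations. Both mini-languages are invented for illustration and do not correspond to any real instruction set.

```python
# CISC versus RISC in miniature: one complex memory-to-memory instruction
# versus an equivalent sequence of simple register-oriented instructions.

def cisc_add_mem(memory, dst, src):
    """One 'complex' instruction: add memory to memory in a single step.
    A real CISC chip would decompose this internally into microcode steps."""
    memory[dst] += memory[src]

def risc_add_mem(memory, dst, src):
    """The same work as three 'simple' instructions, each of which maps
    directly onto hardware with no intervening microcode layer."""
    r1 = memory[dst]      # LOAD  r1, [dst]
    r2 = memory[src]      # LOAD  r2, [src]
    r1 = r1 + r2          # ADD   r1, r2
    memory[dst] = r1      # STORE [dst], r1

a = [10, 32]
b = [10, 32]
cisc_add_mem(a, 0, 1)
risc_add_mem(b, 0, 1)
print(a[0], b[0])   # 42 42 -- same result, different instruction granularity
```

The results are identical; the trade-off is between a compact program of complex instructions and a longer program of simple instructions that each execute very quickly.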

At the time of their introduction, RISC processors, by streamlining the command set and reducing processor overhead in other ways, significantly reduced the number of clock cycles required to execute a specific instruction, and were clearly faster than their CISC counterparts in many operations. However, the continued refinement of both techniques has resulted today in microprocessors of both types that are far more comparable than their predecessors were. CISC command sets have also been streamlined to some extent, and RISC command sets have been enlarged to accommodate today's needs. Comparing the state-of-the-art in both schools of microprocessor design today demonstrates that, while the fastest processors available today still use the RISC model, both processor types offer the speed and the flexibility required by contemporary operating systems.

In addition, the ever-expanding market for these processors has resulted in development cycles that are significantly shorter than they were only a few years ago. Faster and more powerful processors are continually being introduced, and the benchmark ratings of the RISC processors used in many UNIX and Windows NT machines can now be all but matched by the Intel platform, with its Pentium and Pentium Pro processors. When considering an operating system that can run on machines using either CISC or RISC processors, such as Windows NT or SunSoft's Solaris, the RISC solution is likely to provide more pure speed, but this should not necessarily be the deciding factor in the purchase. Costs between the two processor types can vary widely, and care should be taken to decide whether the additional performance gained is worth the additional expenditure. The availability and cost of application software should also be a part of the equation.

File Server Processing

In a network file server, due to the nature of the client/server model, the speed and effectiveness of the microprocessor often do not have a major impact on the overall performance of the network, beyond a minimum required level of adequacy. On most networks, the majority of application processing is carried out by the workstation's processor. While it is obviously an essential part of the system, the microprocessor in the average file server should not be overtaxed by the administrative functions (such as network communications, file sharing, and printer sharing) that are its primary everyday tasks. Unless the processor is grossly underpowered or the server is heavily overloaded, upgrading the processor is likely to provide little return for what might be a large expenditure of time, money, and effort.

However, in the case of servers that run database applications or other processor-intensive tasks, it is important to ensure that the processor is not overburdened, because delays or even stoppages in logins and network file access can result. Machines such as these can more accurately be called application servers, rather than file servers. If your business requires the use of processor-intensive server modules such as these, then this is a good reason to consider using multiple servers to separate these functions from the traditional network tasks, rather than overburdening a single machine's microprocessor.

As the networking industry progresses, we should begin to see a clearer division between application and file servers. This will result naturally from the trend toward heterogeneous networks composed of mixed platforms, and the continued development of network operating systems and hardware configurations that are clearly better suited for one or another of these tasks. In the next few sections, we examine the Intel microprocessor family in detail, as well as some other processors available for network use today. We also discuss the future in which the use of multiple processors in a single server will allow for even more powerful applications to be run across a network, using the client/server model to spread the resource load more efficiently between server and workstation.

Novell NetWare

While different NOSs are designed for use on machines running different types of microprocessors, and while some NOSs, such as Microsoft Windows NT and SunSoft Solaris, are available in versions compiled for various different processors, the most commonly used NOS, Novell NetWare, is currently limited to operating solely on the Intel processor platform. NetWare is used on such a vast majority of network file servers primarily because of the ease and flexibility with which it provides resource sharing and security features, two basic requirements of a network. Although it is likely that the future will see something of a reduction in the overall market share that NetWare currently enjoys, it is equally likely that many networks will continue to run NetWare servers for basic networking tasks.

While the NetWare versions currently shipping can run on any PC with an Intel 80386 microprocessor or better, it is economically and technologically impractical, given the state of today's market, to consider running a NetWare file server on anything less than an Intel 80486 processor. Most of the 386 file servers constructed during the time when that was the cutting edge of processor technology have long since been relegated to the status of workstation.

I have no doubt that there are still many sites running 386 servers, and as long as the services required from the network remain unchanged, this is fine, but attempts to upgrade these machines are probably not worth the effort. Technology has advanced considerably in the past few years, not only in processors, but in memory, storage systems, and expansion buses as well. To bring one of these machines up to speed for today's definition of a file server, you probably would be left with nothing from the original machine but the case, and even this might be inadequate.

486 and Pentium machines do, however, have processor upgrade paths, but due to the many different chips on the market, these paths may be somewhat convoluted, to say the least. The process of upgrading a NetWare server's microprocessor is no different from that of a workstation. For this reason, details of the upgrade process, as well as material on the history and architecture of the Intel microprocessor line, may be found in chapter 6, "The Workstation Platform."

Buying New Processors

Although processor upgrades may be complex, if there is one component in a new NetWare file server whose purchase can be considered a no-brainer, it is that of the microprocessor. Buy a Pentium. While I would not have said this during the initial release period of the 60MHz and 66MHz models, the technology is now more than stable enough to warrant reliance on the currently available family of chips, ranging in speed from 75MHz to 133MHz. Sufficient care is now being taken to ensure that the chips are properly cooled. At this point in time, however, I recommend a certain amount of caution concerning the new Pentium Pro units, based solely on the fact that they are the result of a new manufacturing process, and have not yet been fully proven in the marketplace. Remember that your file servers are not a place to experiment. Buy the fastest, most reliable products that you can afford, and since you probably will end up paying less for a processor than for nearly any other major component of a server, why not get the best available?

Other Microprocessors

While the Intel platform dominates the file server microprocessor market by a tremendous margin, due primarily to the enduring popularity of Novell NetWare, it is by no means the only game in town. The growing popularity of Windows NT and the explosive growth of the Internet have boosted sales of several alternative processor platforms. The cross-platform support provided by Windows NT is a clear indication of the direction in which the networking industry is progressing. SunSoft Corporation also supports several different processors with its Solaris operating system, and rumors and announcements have been circulating for years concerning a processor-independent version of NetWare. While nothing is likely to disturb Intel's reign in the desktop and home PC markets, several of the cross-platform NOSs now available might be better served by non-Intel processors.

While many of the UNIX platforms available are more mature, they have also become more specialized. Our focus in this section is on platforms supporting the newer NOSs that have attracted the attention of many exclusive NetWare shops, due to the growing number of powerful client/server applications now available and in development, which run on them. The future of networking will be in open, mixed environments in which compatibility is a given, and platforms can be chosen for their suitability to the task at hand, rather than their ability to conform to the needs of a restricting, proprietary architecture. This philosophy has given a new lease on life to many processor manufacturers who at one time faced a limited future in the commercial networking industry.

MIPS Technologies

Like the other major processor manufacturers who rival Intel, MIPS Technologies markets a line of RISC-based microprocessors that are used for a wide range of applications, from embedded electronics and desktop workstations to high-end graphics terminals and supercomputers. Their R4x00 line of processors is used in workstations that can run Windows NT 3.5 or UNIX V.4, as well as in specialized online transaction processing (OLTP) systems. Windows NT support allows for compatibility with the entire line of x86-based 16-bit DOS and Windows software, as well as the newer 32-bit applications being designed for the NT platform, thus opening the door to a whole new world of users.

The MIPS R4x00 microprocessors are based on a true 64-bit architecture, with full 64-bit registers and 64-bit integer and floating-point operations. An extended 36-bit address bus gives them a 64G physical address space, and although the architecture defines a full 64-bit-wide virtual address space (which is what makes them true 64-bit processors), the implemented virtual address space is 1 terabyte. Able to execute one instruction per clock cycle, the R4400 has separate 16K level 1 write-back caches for instructions and data. The caches are virtually indexed to allow for simultaneous data accesses, and the data cache has a two-entry store buffer so that two store operations per cycle can be executed without latency penalties or the need for instruction pairing, as in the Pentium. There is also a one-line write buffer that allows processing to continue while output is waiting to be written to memory. Support is also provided for communication with a level 2 cache up to 4M in size over a 128-bit data bus. Using standard static RAM chips, the level 2 cache can be configured as a unified cache, or split into separate instruction and data caches.
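The address-space figures quoted above follow from simple powers of two: n address lines can select 2 to the nth power distinct byte locations. The quick check below assumes a 36-bit physical address bus, as stated, and takes the 1-terabyte virtual figure to correspond to a 40-bit implemented virtual address (an assumption on my part; consult the processor's documentation for the exact width).

```python
# Address-space arithmetic: n address bits select 2**n byte locations.

def addressable_bytes(bits):
    return 2 ** bits

GB = 2 ** 30   # one gigabyte
TB = 2 ** 40   # one terabyte

print(addressable_bytes(36) // GB)   # 64 -> a 36-bit bus spans 64G
print(addressable_bytes(40) // TB)   # 1  -> a 40-bit address spans 1 terabyte
```

The same arithmetic explains the limits of every processor discussed in this chapter, so it is worth keeping in mind whenever an address bus width is quoted.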

Other processors in the MIPS line are designed for more specialized applications. For example, the R8000 utilizes a separate FPU that, in combination with its integer unit, can execute up to four instructions per clock cycle. It is now being used in 3D and graphics workstations manufactured by firms like Silicon Graphics.

DEC Alpha AXP

Digital Equipment Corporation, in designing its Alpha AXP architecture, set about plotting the growth of microprocessor technology for the next 25 years. By quantifying the advancements made since the introduction of the first microprocessors, DEC has predicted that the processors available 25 years from now will have to be 1000 times faster than current technology to keep pace with other developments. They have, therefore, eliminated anything from their designs that they felt would become a bottleneck at any time during that period. For example, since they saw the possibility for an ultimate limit in clock speed, they concentrated on executing more instructions per clock cycle. Indeed, this has become a primary design restriction for the Alpha processors. Everything that could have a negative effect on their multiple-instruction-issue technology has been eliminated from their designs, resulting in an architecture intended to eventually sustain the execution of up to ten instructions per clock cycle.

DEC has made significant inroads toward this goal. The Alpha AXP 21164 microprocessor, available in speeds of 266MHz and 300MHz, is the first processor able to execute over 1 billion instructions per second (bips). It is listed in the Guinness Book of Records as the world's fastest microprocessor. Manufactured using DEC's own fourth-generation CMOS technology, the Alpha AXP processors, like the MIPS, are also of 64-bit RISC architecture, and are designed for the pipelining of instructions, in which execution of subsequent instructions is begun before the first is completed.
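The billion-instructions-per-second figure is easy to verify arithmetically: peak throughput is simply the clock rate multiplied by the number of instructions the chip can issue per clock cycle. The sketch below assumes a four-instruction issue width for the 21164; that number is my assumption for illustration, so check DEC's documentation for the exact figure.

```python
# Peak instruction throughput = clock rate x instructions issued per cycle.
# Issue width of 4 is assumed here for illustration.

def peak_ips(clock_hz, issue_width):
    """Best-case instructions per second, ignoring stalls and memory waits."""
    return clock_hz * issue_width

print(peak_ips(300_000_000, 4))   # 1_200_000_000 -> over 1 billion per second
```

Note that this is a theoretical peak; real workloads stall on memory and branches, which is exactly why benchmark figures should be treated with the skepticism discussed below.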

As with MIPS, DEC's processor platform is supported by Windows NT, as well as DEC's own OSF/1 and OpenVMS operating systems. Prior to the release of Windows NT, DEC's networking solutions were all highly proprietary and not at all designed for compatibility with other systems. They have recently expanded their presence in the conventional LAN world, though, by introducing lines of PCs based on both Intel and Alpha AXP processors. All processor manufacturers have vast amounts of benchmark data readily available to prove that their processors are the fastest, and DEC is no exception. I have deliberately avoided using this information because benchmarks rarely have any significance in the real world of computing, but I think it is safe to say that DEC has the fastest processors on the market. This performance comes at a heavy cost, though, as their top-of-the-line processors are priced well above the $2,000 mark, while the fastest Pentium is available for under $1,000. For a heavy-duty application server managing multiple databases or other processor-intensive tasks, however, a single Alpha AXP processor might be able to replace a significantly more complex, and more costly, multiprocessor system.

PowerPC

Unlike the other processors discussed in this section, the PowerPC is not the product of a single company. It is a microprocessor designed in accordance with a standard developed jointly by IBM, Motorola, and Apple Computer. Although based on the POWER processor architecture used in IBM RS/6000 workstations, the standard calls for a common instruction set architecture that allows any manufacturer to design and build chips running the same code.

PowerPC microprocessors currently are available only in 32-bit implementations, but the standard is scalable up to a full 64-bit data path. Motorola's chips range in clock speed from 50MHz to 100MHz, with all but the lowest-end processors being manufactured using a 0.5 micron CMOS technology, and running separate instruction and data caches of up to 16K. A prototype processor called the MPC620, manufactured by the same process but not yet in production, has full 64-bit data and bus widths and runs at a 133MHz clock speed. Incorporating 7 million transistors and including 32K data and instruction caches, this is the first in what is intended to be a line of high-performance processors that will rival the Alpha and MIPS chips for the workstation and application server market.

Many hardware and software companies have expressed interest in the PowerPC, due in part to the fact that an open standard will eliminate the possibility of a single chip manufacturer dominating the industry with a technological monopoly, as Intel currently does with the x86 platform. PowerPC processors are currently being manufactured by both IBM and Motorola for use in IBM RS/6000 servers and workstations, and in Apple Power Macintosh computers. Power Macs use a software emulation routine that allows them to run the standard Apple System 7.x operating system, and the RS/6000s run IBM's AIX flavor of UNIX without alteration, as the PowerPC is fully binary-compatible with the POWER architecture used in earlier RS/6000s.

The RS/6000 systems are also PReP compliant. The PowerPC Reference Platform (PReP) is a standard created by IBM that allows systems built by different manufacturers to be compatible with each other. All PReP-compliant machines should be able to run AIX, as well as the PowerPC versions of OS/2, Solaris, and Windows NT. Another open platform standard, called the Common Hardware Reference Platform (CHRP), has been developed by Apple, IBM, and Motorola, and is designed to be compatible with both PReP and Power Mac machines. Thus, CHRP systems (which are not yet available) should be able to run the MacOS, AIX, and the PowerPC versions of OS/2 and Windows NT. A version of NetWare for the PowerPC is in development, and plans are in the works for the use of PowerPC processors in future generations of video game systems and automobiles.

The PowerPC microprocessors themselves have been available for some time, and desktop systems based on the PowerPC are available from a handful of manufacturers, including IBM. However, attempts to gain a foothold in the Intel market have been hampered by the platform's failure to deliver a persuasive reason to switch. Windows NT for the PowerPC is available, but existing NT applications must be recompiled to run on the processor. In addition, the performance improvement touted for the new processor turns out to hover somewhere around 15% over an Intel processor of the same clock speed, as opposed to the 30% originally estimated. And finally, the price advantage that was also supposed to facilitate the acceptance of the platform is virtually nonexistent. Entry-level IBM PowerPC systems are priced at $3,700 and up, hardly an incentive to experiment with a new processor that has limited OS and application support.

Up to this point, the PowerPC platform has been blessed with a surfeit of good intentions and exciting speculation, but outside of the Power Macintosh, there is as yet no good reason to believe that it will have a major impact on the computing industry anytime soon.

While its potential has been realized in only a limited fashion so far, the PowerPC has garnered tremendous interest from both hardware and software manufacturers. Open systems are the way of the future in the networking industry, fostering increased competition and the technological advances that always accompany such competition.

Multiple Processors

After examining the way in which manufacturers have incorporated such remarkable technology into today's microprocessors, the prospects seem almost unlimited. The fact is, however, that software designers keep adding additional power and capabilities to their programs, and it is up to hardware designers to continue making machines that can run them faster and better. The next step in increasing the speed and capability of client/server computing in the LAN environment, as previously demonstrated in the mainframe world, is to use multiple processors in a single machine, accessing a single memory array to share an application's processing tasks evenly among them. This is known as symmetric multiprocessing (SMP), and hardware features facilitating SMP have been designed into all the contemporary microprocessor designs discussed in this section.

The use of multiple processors requires significant alterations in both the hardware of the computer and the software running on it. Among LANs, it is primarily on various flavors of UNIX that multiprocessing has been realized effectively. High-performance workstations used for graphics, CAD, and financial work, as well as large database and OLTP servers, have long made use of multiple microprocessors manufactured by IBM, DEC, and MIPS. It was only with the multiprocessor support built into Windows NT, however, that the possibilities of multiprocessor systems were made available to the primarily Intel-based commercial desktop LAN environment. In addition, Novell has recently released an SMP add-on package for NetWare 4.1 that should further establish the viability of this technology in the marketplace.

The existing multiprocessor LAN technologies are mostly the products of individual UNIX development companies, utilizing proprietary hardware with operating systems written specifically for that hardware. Systems range from dual-processor UNIX graphics workstations to the Massively Parallel Processing (MPP) supercomputers recently marketed by Cray Research, which can utilize up to 2,048 Alpha AXP processors (this top-of-the-line model can be had for a mere $31 million; call Cray at (612)452-6650 for more information). With the introduction of multiprocessing on the Intel platform by Windows NT, however, and the subsequent NetWare release, it has quickly become apparent that there will be a commercial market for this technology in the near future. Intel has therefore set about creating a hardware standard for Intel-based multiprocessor systems that will make it financially feasible for the developers of operating systems (the traditional laggards in the realization of new technologies) to modify their environments to accommodate the special needs of multiprocessing systems. Without such a standard, it would be necessary to customize operating systems and applications for specific hardware designs, all but relegating this technology to a niche market that would never gain widespread acceptance.

The Intel multiprocessing hardware standard is based on SMP. An SMP system is one in which all the processors are functionally identical. Each processor is capable of communicating with every other processor, and with shared memory and I/O systems. Note that some multiprocessing systems are not symmetric. These asymmetric multiprocessing systems usually function by assigning specific tasks to individual processors. For example, the three-stage multiprocessor development effort undertaken by Novell for its NetWare product eventually will allow individual domains created by the NetWare 4.x NOS to be assigned to specific processors, thus allowing for an added measure of fault tolerance. The Intel standard, however (which may or may not be utilized by Novell), is designed to allow an operating system to distribute tasks dynamically among processors for increased speed. This is not to say, however, that two processors deliver twice the speed of a uniprocessor system. In fact, this is almost never the case, and if the jury remains out on this technology, it is because overall speed in some systems increases at a diminishing rate with the addition of each extra processor.
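The diminishing returns described above are commonly approximated by Amdahl's law, which ties the overall speedup to the fraction of a workload that can actually be parallelized. The sketch below is an illustration only; the 80% parallel fraction in the comments is an arbitrary assumption, not a measured figure for any real system:

```python
def speedup(n_procs, parallel_fraction):
    """Amdahl's law: the serial portion of a workload caps the
    overall gain, no matter how many processors are added."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)

# With a workload that is 80% parallelizable:
# 2 processors -> ~1.67x, 4 processors -> 2.5x,
# and even 1,000 processors -> only ~5x.
```

This is why doubling the processor count almost never doubles the throughput: the serial 20% of the work is still executed by one processor at a time.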

The goal of the Intel standard is to allow SMP systems to run all existing shrink-wrapped uniprocessor software as well as multiprocessor-enabled applications with a minimal amount of modification to the hardware of the system. The essential difference in hardware will be the inclusion of an advanced programmable interrupt controller (APIC) that allows communications between individual processors as well as between processors and the I/O system over a separate bus called the interrupt controller communications (ICC) bus. It is through this conduit that interrupts are delivered from source to destination anywhere in the SMP system. In this way, no additional interrupt traffic is created on the system's memory bus, and the processors have a dedicated communications conduit to facilitate the sharing of tasks among themselves.

Although it functions as a single unit, the APIC uses a distributed architecture, and can actually be divided among several chips. The local APIC is the module associated with the microprocessor itself. All Pentiums of the second generation or later (that is, 75MHz or above) have a local APIC integrated into the chip; other processors require the use of the Intel 82489DX interrupt controller, which contains the local APIC as well as the I/O APIC. The I/O APIC is also available as part of an I/O chipset, such as Intel's 82430 PCI-EISA bridge chipset. Each local and I/O APIC has a unit ID register that functions as the physical name of the unit for purposes of internal communications over the ICC bus, as well as for identification of specific I/O and interprocessor interrupts by software.

Intel has also incorporated a cache consistency protocol, named MESI, into the second-generation Pentium's internal data cache design. This protocol assigns one of four states to each line in the data cache, which allow determinations to be made regarding the read/write status of each line. This is done to ensure that data in the cache remains consistent with data in memory, despite access to the same memory addresses by multiple processors.
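The four MESI states (Modified, Exclusive, Shared, Invalid) and the transitions between them can be sketched as a simple table. The model below is a deliberately simplified software illustration; the real protocol lives in the Pentium's cache hardware, and details such as write-backs, bus arbitration, and the shared/exclusive distinction on read misses are omitted:

```python
# Simplified MESI transition table: (current state, event) -> next state.
# Events are what one cache sees: its own reads and writes, plus bus
# traffic generated by other processors touching the same line.
MESI_TRANSITIONS = {
    ("I", "local_read"):   "S",  # simplification: assume the line is shared
    ("I", "local_write"):  "M",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",  # peers' copies are invalidated
    ("S", "remote_write"): "I",
    ("E", "local_write"):  "M",
    ("E", "remote_read"):  "S",
    ("E", "remote_write"): "I",
    ("M", "remote_read"):  "S",  # after writing the line back to memory
    ("M", "remote_write"): "I",
}

def next_state(state, event):
    """Return the new MESI state; unlisted events leave it unchanged."""
    return MESI_TRANSITIONS.get((state, event), state)
```

Chaining transitions shows how consistency is preserved: a line this processor has modified ("M") drops to shared ("S") when another processor reads it, and is invalidated ("I") outright if another processor writes it.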

Apart from the APIC, the primary hardware modifications that have to be made are in the system BIOS. It is necessary to have a BIOS that can identify and initialize all the microprocessors and other MP-related hardware in the system. Specifications beyond these are more general, and are largely a matter of common sense. Obviously, a system with more processing power places a larger burden on other subsystems, particularly memory. Intel recommends the use of a level 2 cache with high-performance features such as write-back technology and error correction, but does not include any specification other than that it be completely software-transparent. The Intel specification also does not recommend a specific system bus type, but the company clearly has plans to make PCI the bus of choice for these implementations, either in place of or in addition to one of the other AT standard buses.

Other standards that rival Intel's are in development, including one by its traditional competitor Cyrix. It is too early to tell how popular the multiple-processor concept will become in the corporate network, but it is a technology that certainly merits watching. Digital's 25-year plan, which predicts microprocessors becoming one thousand times faster than those available today, specifies higher instruction-per-clock-cycle ratios as the source of the first tenfold speed increase. The second tenfold increase is allocated to the use of multiple processors. As our technologies approach the ultimate physical limits of their component parts, it will be replication that provides the next step in efficiency. Whether this concept is marketable on a large scale remains to be seen, but multiple-processor systems are one of the most viable methods of pushing the performance envelope that forms the core of networking technology today.

Of course, for any appreciable gain in speed to be realized, applications software must also be altered to accommodate SMP. NetWare 4.1 SMP, an add-on to the standard NetWare 4.1 product, is backward-compatible with all existing NetWare NLMs but provides only a marginal performance improvement in their operation, as well as in the standard OS file and print services. For significant gains to be realized, server-based applications will have to be modified to utilize the functionality provided by the new OS. Novell has also released an API to facilitate this process, but the extent to which applications developers will commit their efforts to the new environment remains to be seen.

At this time, I would say that if you are already using a Windows NT or NetWare application (such as a database engine) that has become available in an SMP-enabled version, and you are having problems achieving the level of performance that you require, an SMP server may be worth looking into. Otherwise, until the technology is ratified by a greater level of acceptance and commitment in the Intel community, it remains experimental, and therefore risky.

Memory

During our examination of the microprocessors used in today's network file servers, we have seen the current state of an evolutionary development process that began with the first PC and has progressed rapidly ever since. This section is concerned with memory, one of the other primary components of a file server, and one that has always lagged behind the rest of the computer in technological development. Intel microprocessors have advanced in speed from 4.77MHz to 133MHz, with more speed to come, and potential I/O bus throughput has increased equally dramatically, but the standard dynamic random access memory (DRAM) chips that populate the average PC have all but topped out, technologically, at a minimum access time of 54 nanoseconds (a nanosecond is one billionth of a second), which works out to an approximate peak transfer speed of 18.5MHz on a bus that can be running at speeds of up to 66MHz.
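The peak transfer figure quoted above falls straight out of the access time: a part that needs 54ns per access can complete roughly 18.5 million accesses per second. A quick back-of-the-envelope check:

```python
ACCESS_TIME_NS = 54  # standard DRAM, per the figure above

# Assuming one access per cycle: frequency (MHz) = 1000 / access time (ns),
# since 1 MHz corresponds to a 1,000ns-per-million-cycles period.
peak_mhz = 1000.0 / ACCESS_TIME_NS
print(round(peak_mhz, 1))  # 18.5
```

The same arithmetic shows why 10ns SRAM (discussed next) keeps pace with a 66MHz bus: 1000 / 10 gives a 100MHz peak rate.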

Additional speed can be gained, though, by placing a layer of static RAM (SRAM) cache between the main memory banks and the microprocessor. SRAM is available with access times as fast as 10ns, more than enough for today's high-speed buses, but it is still just a cache of 512K at most. While a system that uses SRAM as its primary memory array is possible, SRAM chips are not only much faster, but unfortunately also much larger, run much hotter, and are ten times more expensive than normal DRAM.

Fortunately, there may be a solution in sight for this memory bottleneck. Several new RAM technologies are entering the market that promise a performance rate similar to that of SRAM, but at only a fraction of the size, heat, and cost. While it is too early to say which, if any, of these new memory types will revolutionize the industry, be assured that the players with the most at stake in the new microprocessor and I/O bus technologies (in particular, Intel) are eager to support these new memory types as they become available. In this section, we examine the types of memory used in today's file servers, cover the essential aspects of memory upgrades, and then take a look at the new technologies that hopefully will bring RAM technology up to speed with the rest of the motherboard.

The Primary Memory Array (DRAM)

The basic hardware design of the dynamic RAM chip has changed little since the first PCs hit the market. DRAM chips are essentially composed of capacitors that hold a charge, and transistors that read the state of the capacitors' charges and report them in binary format (1=charge, 0=no charge). The chips have gotten faster, and contain more memory in a single package thanks to the use of semiconductor equivalents on an integrated circuit instead of actual capacitors, but the inherent problems of the design remain unchanged.

Capacitors have a natural tendency to leak off their charge over time, causing a 1 to become a 0, and data corruption to occur. To prevent this, a transistor must be dedicated to each capacitor for the purpose of checking the state of the charge and refreshing it before it can decay. This refresh overhead is what imposes a speed limitation on this technology.

What has changed over the years is the hardware used to mount the chips in a computer system. In the earliest PCs, individual DRAM chips of far smaller capacity than today's models were mounted by hand onto a board. If you wanted to add more memory to your system, you bought more chips and popped them into any free sockets available on the memory board. This became increasingly impractical, due to both the increase in the average system's memory requirements and the larger amount of space that was required to make the chips accessible to human fingers or tools.

PCs today all use memory chips that have been premounted on small circuit boards called SIMMs, or single inline memory modules (see fig. 5.3). SIMMs make it much easier to handle large amounts of memory, because several chips can be mounted on one board, and since they are permanently soldered in place, they can be placed much closer together. The primary drawback to this design is that if one chip on a SIMM goes bad, the entire unit must be replaced (unless you are very good with a soldering iron). Older PCs may use SIPPs, or single inline pin package modules, which substitute pins for the SIMMs' edge connectors, or even the individually socketed dual inline package (DIP) chips mentioned above. Since we are concerned only with file servers in this chapter, we continue to assume that a minimum of an 80486-based PC is being used.

Figure 5.3 This is a typical SIMM.

Adding and Replacing Memory

SIMMs are mounted roughly perpendicular to a PC's motherboard in slots specifically designed to hold them. The process of adding or removing SIMMs is quite easy, but knowing which ones to buy as well as how many and where to put them is highly dependent on the rest of the system, and can be rather confusing.

The main concerns when purchasing SIMMs are the number of pins, the capacity, the chip speed, and the presence or absence of parity; each of these is covered in the sections that follow.

If, as recommended earlier, you have obtained the documentation for your motherboard, the upgrade process should be easy. The manual should tell you what type of SIMMs to buy, and how you can safely array them in the slots provided. If you don't have the documentation, the best thing to do is to examine the SIMMs already in the machine and duplicate them as closely as possible.

Memory is arrayed on the motherboard in banks that correspond to the width of the microprocessor's data bus. A single bank may consist of up to four individual SIMM slots. Thus, a motherboard designed for a 486 is likely to have 36-bit banks (with a parity-checking mechanism accounting for the extra 4 bits above 32). A Pentium motherboard, by the same logic, has 72-bit banks (64 bits plus 8 for parity). Since SIMMs are available either as 30-pin (9-bit) or 72-pin (36-bit) modules, the 486 motherboard has either four 30-pin slots per bank (9 bits times 4 slots for a 36-bit bank) or one 72-pin slot. Most newer 486 systems use 72-pin slots, and virtually all Pentium motherboards use two 72-pin slots per bank.

It is important to identify how many slots per bank your system has, because all the SIMMs in any one bank must be identical. Depending on the configuration of the motherboard, this in itself can impose severe limitations on the amount of memory that you may add. If, for example, you have a 486 PC with two banks of four 30-pin slots each, you are limited to a relatively small number of memory configurations, because SIMMs are available only in 1M, 4M, 16M, and 64M sizes. Since each bank must consist of four identical SIMMs, any one bank must hold either 4M, 16M, 64M, or 256M of memory. If your system has more than one bank, you can safely put 1M SIMMs in one bank and 4M SIMMs in the other, or even leave one bank completely empty, but this still leaves you a limited growth path. In addition, you must take the possibility of future upgrades into account. If all your slots are full and you wish to add more memory, some of the old SIMMs must be replaced with higher-capacity models. The SIMMs that are removed can be used in another machine, or else go to waste. This is one of several good reasons to maintain a measure of uniformity in your PC purchases. Virtually every network has servers or workstations that could use a few more megabytes of memory; as long as other machines use the same type, SIMMs can easily be transferred.
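The bank arithmetic in the last two paragraphs is easy to mechanize. The sketch below derives slots per bank from the data-path width and the SIMM width, and lists the capacities a fully populated bank can hold; the widths and SIMM sizes are the ones named in the text:

```python
def slots_per_bank(bank_bits, simm_bits):
    """A bank must span the full data path (plus parity), so the
    slot count is simply the ratio of bank width to SIMM width."""
    return bank_bits // simm_bits

SIMM_SIZES_MB = [1, 4, 16, 64]  # the sizes commonly sold

def bank_capacities(slots):
    """All SIMMs in a bank must be identical, so a full bank holds
    slots * size megabytes for each available SIMM size."""
    return [size * slots for size in SIMM_SIZES_MB]

# 486, 36-bit bank: four 30-pin (9-bit) SIMMs or one 72-pin (36-bit) SIMM.
# Pentium, 72-bit bank: two 72-pin SIMMs.
print(slots_per_bank(36, 9), slots_per_bank(36, 36), slots_per_bank(72, 36))
print(bank_capacities(4))  # [4, 16, 64, 256]
```

The second line of output is exactly the 4M/16M/64M/256M restriction described above for a bank of four 30-pin slots.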

For reasons of flexibility, then, it is preferable that you purchase a motherboard that uses the 72-pin SIMMs. In a 486 machine, this means one slot per bank, and many more possible ways to configure the server's memory to your needs, rather than the other way around.

Identifying SIMMs

SIMMs generally are encountered in one of two ways, both of which can be confusing: on paper or in your hand. That is, looking at one of the SIMMs currently installed in your PC and at the listings of available SIMM types in catalogs or advertisements, and then correlating the two to figure out what you already have and what you need, can be difficult.

Catalogs usually do not provide illustrations of individual SIMM types, since they're not much to look at, and you probably wouldn't be able to read the printing on the chips in a photograph, anyway. This section provides the means for identifying the specifications of the memory in a PC in order to purchase additional chips that are compatible. When you lack a definitive source for documentation of your motherboard, however, it is impossible to be absolutely sure that a SIMM will be compatible, unless you are able to purchase exact duplicates of those already in the machine, and this is unlikely. Therefore, be sure you know your selected chip vendor's return policies in case you have problems. Most memory vendors allow SIMMs to be returned, primarily because inserting them into an incompatible motherboard rarely results in damage. They simply do not work. Also, there is little or no packaging overhead in the sale of SIMMs. Usually, they arrive loose, in a small anti-static bag, so there are no concerns about shrink-wrapping or resale of returned modules.

Removing SIMMs from the Motherboard

It can be difficult to identify SIMMs while they are mounted on the motherboard, so the best way to know what type of SIMMs are in the machine is to remove one of them. In many cases, SIMM sockets hold the SIMMs at an angle slightly less than perpendicular to the motherboard. This is because when they are installed, they are inserted straight in and then tilted a few degrees to engage two small clips that secure the SIMM at either end (see fig. 5.4). Some SIMMs, however, are inserted at the angle and drawn perpendicular to lock in place. The locking clips are made of metal or plastic. Be particularly careful with plastic clips; if you break one, you have to choose from unattractive alternatives: replacing the motherboard (very expensive and a lot of work), or holding the SIMM in place with a rubber band or a piece of tape (unreliable at best).

Figure 5.4 Closeup view of a SIMM installed on a motherboard, and the clips holding it in place.

Before touching the motherboard or SIMMs at all, unplug the PC from its power source, and then ground yourself to prevent a static discharge while you're working. Many components of a PC are sensitive to static, but none are more sensitive than DRAM chips, so it is advisable to wear a grounding strap or work in an environment not prone to static buildup. Avoid carpeting and wool sweaters, and be sure to touch metal before actually handling SIMMs. Also, be sure to store any loose SIMMs in an anti-static bag, even if they will be needed in just a few minutes.

Since SIMMs usually have to be installed in order, with each one overlapping the previous one, only the first one is immediately accessible. To remove it, release the clips at either side of the first module so that you can freely move the entire chip into an upright position, perpendicular to the motherboard. Once it is free of the clips, you should be able to lift the entire SIMM out of the socket and remove it from the system.

Number of Pins

When you look closely at a SIMM, you will see that it looks like a very small expansion card. A row of 30 or 72 gold connectors runs across the bottom of a printed circuit board that has a number of chips soldered onto one or both sides. There might be a small "30" or "72" printed on the circuit board itself, to identify the number of pins. The number of pins obviously is an indication of the type of sockets that are installed on your motherboard, but it also indicates the width of the module. 72-pin SIMMs are 32 bits wide (or 36 bits with parity checking), while 30-pin SIMMs are 8 bits wide (or 9 with parity checking).

Chip Speed

Looking at the memory chips that are mounted on the SIMM you have removed, you might see the manufacturer's name, identified by word or symbol, and a part number. A relatively small number of manufacturers make memory chips. Many more companies, however, buy the chips and mount them on SIMMs. If you see a manufacturer's name on the SIMM board, do not confuse it with the maker of the chips.

The last digit in a chip's part number usually is a 6, 7, or 8, and often has a dash in front of it. This digit, with a zero appended to it, indicates the speed of the chip. The speed is measured in nanoseconds and reflects the access time of the chips; the smaller the number, the faster the memory performance. 60ns DRAM chips are among the fastest that are commercially available. It usually is not a problem to install faster memory than the motherboard specifications require. It is not recommended, though, that different speeds be used in the same bank, and you should never use slower memory than a system specifies.
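The rule of thumb above can be expressed as a one-liner. The part numbers shown here are hypothetical; real markings vary from one chip manufacturer to the next, so treat this as an illustration of the convention rather than a universal parser:

```python
def chip_speed_ns(part_number):
    """The trailing digit after the dash, with a zero appended
    (that is, multiplied by ten), is the chip's rated access
    time in nanoseconds. Smaller is faster."""
    suffix = part_number.rsplit("-", 1)[-1]
    return int(suffix) * 10

print(chip_speed_ns("XY514400-6"))  # 60 (a hypothetical 60ns part)
print(chip_speed_ns("XY514400-7"))  # 70
```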

Parity

In their first PCs, IBM introduced a basic error-checking concept called the parity bit; it has remained in popular use ever since. The system works by including one extra bit for every 8 bits (1 byte) of memory. Most SIMMs do this by including an extra memory chip on the board. Non-parity SIMMs have two or eight chips installed on them, while parity SIMMs have three or nine. As data is written to each byte of memory, this extra bit is set to 0 or 1 by a dedicated circuit on the motherboard, depending on the contents of the other eight bits. If the number of 1s in the byte is even, the parity bit is turned on to make the count odd. If the number of 1s is already odd, the parity bit is left off, keeping the count odd. As each byte is read from memory, the same mechanism checks whether the total number of 1s, including the parity bit, is still odd. If it is not, the data in that byte is considered corrupted. Most systems generate a non-maskable interrupt (NMI) when a parity error is detected, and immediately halt the system to prevent the tainted data from propagating itself to any other media. An error message usually is displayed, supplying the location of the error in memory, and giving the option of continuing to work or rebooting the system. It should be emphasized that a single parity error does not necessarily indicate the failure of a memory chip. Environmental conditions can cause isolated errors that seldom or never recur. Repeated parity errors, however, usually indicate that a SIMM needs to be replaced.
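The odd-parity scheme just described can be modeled in a few lines. This is an illustration of the logic only; in a PC the same work is done by dedicated circuitry on the motherboard, not by software:

```python
def parity_bit(byte):
    """Odd parity: set the ninth bit so that the total number of
    1 bits, including the parity bit itself, comes out odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1

def parity_ok(byte, parity):
    """On read, the byte plus its parity bit must contain an odd
    number of 1s; an even count signals corruption."""
    return (bin(byte & 0xFF).count("1") + parity) % 2 == 1
```

Flipping any single bit of a stored byte makes the check fail, which is exactly the condition that triggers the NMI described above.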

Obviously, the parity bit is a very limited mechanism for error checking. While it cannot produce a false positive, it is incapable of detecting any corruption that affects an even number of bits in a byte, since the count of 1s remains odd in those cases. It is, nevertheless, an added measure of security that creates little overhead for the system. SIMMs containing the parity bit, however, cost more than those that do not, and some vendors have been known to disable parity checking on a motherboard in order to save a few dollars by populating the machine with cheaper SIMMs. If you wish to use parity checking, make sure that the system's motherboard is capable of supporting it (usually through DIP switch or jumper settings) and that all SIMMs installed in the system have a parity bit.

Bear in mind that the relative infrequency of corruption errors in today's DRAM chips makes parity checking something of a moot point in most new purchasing decisions. Do not pass up an otherwise well-equipped system merely because it lacks memory parity checking; many vendors and motherboard manufacturers are beginning to leave this feature out of their newest products.

Error Checking and Correcting Memory

While parity checking has been available for a long time, a newer technique takes the same concept further by actually correcting memory errors as well as detecting them. On servers in which system halts due to non-maskable interrupts are unacceptable, error checking and correcting (ECC) memory systems utilize a checksum technique to detect errors in any one or two bits of a data word, and to correct single-bit errors on the fly.

To perform these functions, a coprocessor or application-specific integrated circuit (ASIC) is used to compute a checksum on a data block of designated size. The checksum is appended to the block and sent to its destination. There, the checksum is computed again and compared with the original results. The nature of the algorithm is such that, should an error in a single bit be detected, the exact location of the bit can be derived from the checksum and corrected. Remember that since we are dealing with binary data at this point, simply locating the exact bit in which the error occurs is all that is needed to correct it. Errors in two bits generate an NMI and halt the system. Compared to parity checking, this technique requires much more overhead, in both data storage and processing cycles. The amount of overhead is dependent on the size of the block used and the location of the ECC mechanism in the system. ECC features have been incorporated into devices ranging from tape backup drives to processor caches, and everywhere in-between.
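A concrete, if miniature, example of the principle: the classic Hamming code protects one byte with four check bits, and the checksum (the "syndrome") computed on read names the exact position of a single flipped bit. This sketch is for illustration only; real ECC memory implements a wider code in a dedicated ASIC, not in software:

```python
# Data bits occupy the non-power-of-two positions of a 12-bit word;
# check bits occupy positions 1, 2, 4, and 8.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]

def encode(byte):
    """Scatter the 8 data bits, then set each check bit to the
    parity of the data positions whose index includes it."""
    word = {p: (byte >> i) & 1 for i, p in enumerate(DATA_POSITIONS)}
    for p in (1, 2, 4, 8):
        word[p] = sum(bit for q, bit in word.items() if q & p) % 2
    return word  # maps bit position (1-12) -> bit value

def decode(word):
    """The syndrome is the XOR of all positions holding a 1; if
    nonzero, it is the position of the single corrupted bit."""
    syndrome = 0
    for pos, bit in word.items():
        if bit:
            syndrome ^= pos
    if syndrome:
        word[syndrome] ^= 1  # simply invert the bad bit
    return sum(word[p] << i for i, p in enumerate(DATA_POSITIONS))
```

Corrupting any single bit of the 12-bit word and then decoding recovers the original byte, which is exactly the "locate the bit and flip it" behavior described above.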

For example, in a 32-bit 486-based system, ECC increases the data path from 32 to 39 bits wide, as compared to parity, which needs a 36-bit width. This requires a significant change in the memory architecture that might not be worth the effort and expense. In 64-bit Pentium systems, however, as well as in other systems using 64-bit processors, ECC requires eight check bits for a 72-bit word, exactly the same overhead as parity checking. For little more than the price of the ECC coprocessor, occasional memory errors that previously would have halted the system can be corrected on the fly.

There is a certain amount of disagreement in the industry regarding the real benefits of ECC memory, beyond peace of mind. System outages due to transient memory corruption are far less common than hard drive or power regulation problems, and many people feel that they occur so seldom as to be a negligible risk. If you want to have every measure of protection that you can get for your server's data, however, ECC is a giant step beyond parity checking.

Purchasing Memory

When shopping for memory, you will see SIMMs listed in ads and catalogs using an X×Y format (for example, 1×9) along with a speed listing. The first numeral identifies the amount of memory on the SIMM, and the second indicates the width of the module, the presence of parity, and the number of pins on the SIMM. In the example above, the 1 indicates that it is a 1M SIMM, and the 9 indicates that it is nine bits wide (eight data bits plus one parity bit) and therefore has a 30-pin connector.

Since 72-pin SIMMs have four times the density of their 30-pin counterparts, the second numeral is always a 32 or a 36 (with parity). For the same reason, the first numeral is one-fourth of what normally would be shown for an 8-bit-wide SIMM. Thus, a 1×36 SIMM consists of 4M of parity memory on a 72-pin board. The same SIMM without parity would be a 1×32. If the width aspect confuses you, it is always possible to tell by the price whether you are getting 4M or just 1M.

You might also see memory described using three numerals, such as 1×9×3. This indicates that the manufacturer has utilized wider DRAM chips to make a SIMM with a lower chip count. Though it stores exactly the same data as a 1×9 SIMM, this configuration has only three chips mounted on the board instead of nine: two 4-bit chips plus one 1-bit chip for parity. The same configuration has also been expressed as 1×3 by some vendors. While these are functionally equivalent to normal 1×9 SIMMs, some motherboards are intolerant of this configuration. Do not attempt to use such SIMMs if your motherboard manufacturer recommends against it.
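The X×Y notation lends itself to a small decoder. The function below is a sketch covering the labels described above (X is the depth in millions of words, Y the word width in bits); a third numeral, when present, describes only the chip count and does not change the capacity:

```python
def decode_simm(label):
    """Interpret an 'XxY' SIMM label: widths of 9 or 36 imply
    parity, widths of 8 or 9 imply a 30-pin module."""
    fields = label.lower().split("x")
    depth_m, width = int(fields[0]), int(fields[1])
    parity = width in (9, 36)
    data_bits = width - (width // 9 if parity else 0)  # drop parity bits
    return {
        "pins": 30 if width <= 9 else 72,
        "parity": parity,
        "capacity_mb": depth_m * data_bits // 8,
    }

print(decode_simm("1x36"))  # {'pins': 72, 'parity': True, 'capacity_mb': 4}
```

Decoding "1x9" gives a 1M, 30-pin parity module, and "4x32" a 16M, 72-pin non-parity module, matching the examples in the text.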

Installing Memory

Installing new SIMMs into the motherboard is simply the opposite of the removal procedure outlined above. Follow the same precautions regarding static, and note also the way that the SIMMs should lean when fully installed. Very often, you have to insert SIMMs in order, or else previously installed SIMMs block access to other sockets. Insert a SIMM into its slot in the proper manner, either perpendicular or at an angle to the motherboard. One side of the module has a notched corner to prevent you from inserting it the wrong way. Once the SIMM is seated firmly in the slot, tilt it gently until the clips fasten themselves around the board (see fig. 5.5). You might have to open the clips by hand to accomplish this, rather than force the SIMM into place. Unlike some expansion boards and processor chips, a great amount of force is not required here, so remain gentle.

Figure 5.5 This is a correctly installed SIMM.

Once all your new memory is installed, double-check that all the banks being used are completely filled with identical SIMMs, connect the power cord, and boot the machine. If the memory check display of the BIOS (basic input/output system) is enabled, you should immediately see the results of your efforts as the system counts up to the total amount of memory installed in the machine. If the total does not reflect the additional memory you have installed, reflects only part of it, or the machine fails to boot at all, then one or more of the SIMMs is defective or is not fully seated in its slot, or you have not properly filled a memory bank.

If, however, the memory check displays the proper amount, yet you receive an error message from the BIOS indicating a "memory error" or "memory size mismatch," then everything is all right. What has happened is that the computer's BIOS has detected a different amount of memory in the system than is specified in its configuration ROM. Simply run the BIOS setup program (usually by holding down a certain key combination during the boot process) and save the BIOS configuration as though you had altered some of its settings. After you reboot the machine again, you should have no further problems.

Adding additional memory is one of the simplest and most effective upgrades that can be performed on a server. NetWare, as well as other NOSs, always runs better when it has enough memory to utilize, because many of its processes are flexible enough to take advantage of additional memory as soon as it is supplied. For example, with more memory, greater amounts of file system information can be cached, resulting in faster file access. Therefore, while NOSs often have precise formulas to determine the amount of memory required in a server that depend on the amount of disk space in the machine and other variables, it is always recommended that some extra DRAM be added. Even as much as 25% or 50% more will not go to waste. It is not expensive, and it often makes a noticeable difference in performance.

RAM Caches (SRAM)

The word "cache" is one that is much overused in the computer industry today. Modern computers can have several different caches operating at the same time in different places. Simply put, a cache is an area of memory that is used to buffer input and/or output between two resources of different speeds. Your system might be running a software-based cache, like the MS-DOS SmartDrive program, that utilizes part of your computer's memory to buffer I/O from the hard drive. Hard drives or I/O controllers might have memory chips integrated into their construction to provide for onboard caching. Microprocessors like the Intel 80486 and Pentium chips have small internal caches where data moving to or from memory can be temporarily stored. This section, though, concerns separate RAM caches that function much the same as those within the processors listed above.

As stated earlier, the DRAM technology that is used for the primary memory arrays in modern PCs is far slower than today's processors and I/O buses. This means that the faster components often sit idle as they wait for data to be transferred to or from memory. Caching is an attempt to minimize these memory latency periods by storing a small subset of the data residing in DRAM in a separate, smaller, but much faster memory array. Thus, a RAM cache is a completely separate memory bank with its own controller that is located between the microprocessor and the primary DRAM array. The memory chips comprising this cache are called static random access memory (SRAM). While DRAM has access times of no less than 54ns, SRAM technology can provide access times of as little as 10ns.

How Caches Work

When a call is made by the microprocessor for data that is present in memory, the microprocessor first checks its own internal cache arrangement (designated the level 1 cache or primary cache); if the data is not found, the request leaves the processor chip on its way to the memory array. When an SRAM cache (called a level 2 cache or secondary cache) is present, however, the data request is intercepted by the cache's controller chip. This controller's function is to maintain a continuously updated listing of the data that is currently in the cache, indexed by the data's memory address in the primary array. If the data is not found in the cache, then a cache miss is declared, and the request is passed on to DRAM, where it is processed in the usual manner. If the data is found in the cache, however, this is called a cache hit, and the data is immediately sent to the processor at its much higher speed, eliminating the delay caused by access to the slower DRAM.
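The hit-or-miss decision described above can be modeled with a toy lookup table. The class below is a deliberately simplified sketch of my own (a real controller indexes by cache line and must also evict old entries, which is omitted here):

```python
class SimpleCache:
    """Toy model of a level 2 cache lookup; the class and its
    interface are invented here purely for illustration."""

    def __init__(self):
        self.lines = {}       # the index: memory address -> cached data
        self.hits = 0
        self.misses = 0

    def read(self, address, dram):
        if address in self.lines:     # cache hit: serve at SRAM speed
            self.hits += 1
            return self.lines[address]
        self.misses += 1              # cache miss: fall through to DRAM
        data = dram[address]
        self.lines[address] = data    # keep a copy for the next request
        return data

dram = {0x1000: "a", 0x1004: "b"}
cache = SimpleCache()
cache.read(0x1000, dram)            # first access: a miss
cache.read(0x1000, dram)            # second access: a hit
print(cache.hits, cache.misses)     # 1 1
```

The second request for the same address never reaches DRAM, which is precisely the latency savings a hit provides.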

While DRAM chips use equal numbers of capacitors (to hold a charge) and transistors (to read and relay the status of the charge for each capacitor), SRAM chips utilize several transistors, typically four to six, for each memory bit. There is no leak-off of charge from a transistor as there is from a capacitor, so there is no refresh overhead. This is why such dramatically faster speeds are possible. As a tradeoff for the greater speed, however, SRAM chips are larger than DRAM chips, run hotter, consume more power, and are far more expensive to manufacture. This is why SRAM is used only in small amounts, for caching, rather than as a primary memory resource.

Because of the extra heat and power consumption, SRAM caches are rarely found in laptops or portables.

Cache Designs

Different cache controller designs utilize different methods of data storage and indexing within the cache, providing various levels of performance. A direct mapped cache splits the main memory array into virtual blocks the same size as the cache itself, and then splits each block into lines. Data from any particular line in any DRAM block can be stored only in its corresponding line in the cache. This method allows for a very small index, but severely limits the possible locations in which a datum from DRAM can be stored in the cache, resulting in many more cache misses.

A fully associative cache can store data from any location in the primary memory array in any location in the cache. This improves the cache hit ratio, but also creates a much larger index that has to be checked by the cache controller for every request, increasing overhead and decreasing overall performance. A set associative cache is something of a compromise between the previous two methods. This design calls for splitting the cache into a number of discrete direct mapped areas. A two-way set associative cache is thus split into two areas, a four-way into four, and so on. In this way, there is greater flexibility regarding where a particular piece of data can be stored, without creating an index as large as that of the fully associative cache. Difficulties implementing this design have made it more expensive to manufacture than the other two methods, but it nevertheless is the most popular design today.
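The difference in placement flexibility can be shown numerically. Assuming a hypothetical cache of eight lines (the size and function names are my own, chosen only for illustration), the functions below list the slots where a given memory line may legally be stored under each design:

```python
CACHE_LINES = 8   # a hypothetical eight-line cache

def direct_mapped_slot(line_number):
    # Exactly one legal home: the corresponding line of the cache.
    return [line_number % CACHE_LINES]

def set_associative_slots(line_number, ways):
    # The cache is split into `ways` direct mapped areas; the line
    # may occupy its slot in any one of them.
    set_size = CACHE_LINES // ways
    base = line_number % set_size
    return [base + i * set_size for i in range(ways)]

def fully_associative_slots():
    # Any line of the cache is a legal home.
    return list(range(CACHE_LINES))

print(direct_mapped_slot(13))         # [5]
print(set_associative_slots(13, 2))   # [1, 5]
print(fully_associative_slots())      # [0, 1, 2, 3, 4, 5, 6, 7]
```

More legal homes mean fewer forced evictions (and thus fewer misses), at the price of more index entries to search on every request.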

Another factor to take into account in cache design is the two-directional nature of memory access. The microprocessor is continually sending data to the main memory array and reading data from it. Wait states can be caused by memory latency in either direction, and a cache can operate either only on data being read from memory, or bidirectionally. A write-through cache is one that operates in a single direction only. Data being written to memory by the processor is saved in the cache and immediately passed on to main memory. If main memory cannot accept the write at that exact moment, a wait state occurs until it signals its readiness. A write-back cache operates in both directions. Data written to the cache is held there until main memory signals its readiness to receive it. This allows other operations to continue in the interim, increasing overall performance. As you can imagine, a write-back cache design essentially doubles the complexity of the caching operation, adding overhead to the controller and expense to the unit. For maximum performance, though, most PC manufacturers have accepted that a four-way set-associative write-back cache is the best possible design.
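A minimal sketch of the write-back behavior, with invented names and a simple dirty flag standing in for the controller's bookkeeping:

```python
class WriteBackCache:
    """Toy write-back cache: writes are held in the cache and flushed
    to main memory later. All names here are invented for illustration."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}               # address -> (data, dirty flag)

    def write(self, address, data):
        # The processor continues immediately; DRAM is not touched yet.
        self.lines[address] = (data, True)

    def flush(self):
        # Runs when main memory signals its readiness to receive data.
        for address, (data, dirty) in self.lines.items():
            if dirty:
                self.memory[address] = data
                self.lines[address] = (data, False)

memory = {}
cache = WriteBackCache(memory)
cache.write(0x2000, "x")    # no wait state, even if DRAM is busy
print(memory)               # {} -- nothing written through yet
cache.flush()
print(memory)               # {8192: 'x'}
```

A write-through design would simply perform the memory update inside write() itself, stalling whenever main memory was not ready.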

Static RAM caches range in size from 64K to 512K. They sometimes are integrated into motherboard designs, but more often are packaged as small daughterboards that plug into a socket on the motherboard. This allows for varying amounts of cache to be provided in a server. The general rule of thumb in caching is "more is better." While some other types of caches might run into the law of diminishing returns, in which the addition of more memory beyond a certain point yields a disproportionately small gain in performance, SRAM caches utilize small enough amounts of memory not to be affected by this. The only constraints are motherboard design and cost. As with most things, it is not wise to use more cache than your motherboard manufacturer indicates as a maximum. It is important also to check the width of the bus between the microprocessor and the RAM cache. Some motherboards include a dedicated 128-bit bus for this purpose, which can provide an extra measure of performance.

The SRAM cache is the most common technique in use today for overcoming the speed limitations of DRAM memory arrays. A well-designed cache can provide a tremendous increase in a file server's performance. Care should be taken, however, to utilize only the cache design recommended by the manufacturer of the server's motherboard. There are far fewer manufacturing "standards" in place for SRAM cache design than there are for most other PC components, so the odds of buying the SRAM cache unit that you want and having it work in a particular motherboard are slim. When not purchased with the motherboard itself, cache upgrades are best purchased directly from the motherboard vendor or manufacturer. When purchasing a new file server or motherboard, however, a measure of care taken in selecting a model that provides a fast, efficient SRAM cache can provide a great deal of extra performance for the cost.

Memory Interleaving

Another technique for speeding up memory access times is interleaving. This is a technique in which two or more memory banks of equal capacity are used in parallel. Data is written alternately to the banks, so that sequential memory reads alternate their accesses between them.

While one bank is being accessed, the other is being refreshed, effectively reducing wait states. When non-sequential accesses are required, the overall efficiency diminishes, as chance determines whether or not the next datum required is located on the other bank. This can be mitigated, however, by the use of more banks, and there is no performance penalty (beyond traditional DRAM access times) for sequential reads from the same bank. Requiring no alterations to the DRAM chips, this is probably the least expensive means of reducing memory access delays, and it can be used in conjunction with a cache for additional performance gains. This technique, however, is a property of the motherboard, not of the DRAM chips themselves. Check the motherboard documentation to find out whether memory interleaving is possible on your server.
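A simple interleave can be modeled as nothing more than the access number modulo the bank count. This sketch (my own model, not any particular chipset's) shows why sequential accesses ping-pong between banks:

```python
def bank_for_access(access_number, banks=2):
    """Which bank services the Nth sequential access in a simple
    interleave (an illustrative model, not any particular chipset)."""
    return access_number % banks

# Sequential reads ping-pong between two banks, so each bank can
# refresh while the other is being accessed.
print([bank_for_access(n) for n in range(6)])            # [0, 1, 0, 1, 0, 1]
# With four banks, a same-bank repeat occurs only every fourth access.
print([bank_for_access(n, banks=4) for n in range(6)])   # [0, 1, 2, 3, 0, 1]
```

The four-bank pattern illustrates the mitigation mentioned above: the more banks there are, the less likely a random access is to land on a bank that is still busy refreshing.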

New Memory Technologies

While SRAM caching provides a measure of relief for the slow performance of standard DRAM chips, it is no more than a stopgap measure. Processor and bus speeds continue to rise, and the added cost of additional SRAM chips and more powerful caching controllers will become increasingly unmanageable.

An alternative is clearly required that will provide performance near that of SRAM for the entire main memory array at a manageable cost, preferably without a major change in motherboard architecture. This sounds like a tall order, but there are several new technologies in the works that may provide just that.

As is usually the case in the computer industry, however, one problem generates several possible solutions, and the requisite period of corporate bickering over standards will no doubt ensue before any technology emerges as the dominant solution.

Nevertheless, steps are being taken in the right direction, and the first systems utilizing some of these technologies are already on the market. Never a trailblazer as far as my file servers are concerned, I do not recommend the use of any of these technologies until they have been adequately tested under real world conditions, but I certainly will be keeping a watchful eye on their progress.

EDO DRAM

Extended data out (EDO) DRAM is the first of these new memory technologies to be available on the open market today. Several giant semiconductor corporations, including Samsung, Mitsubishi, Hyundai, and particularly Micron Technology, have announced that they will manufacture the chips, and system and chipset manufacturers are also beginning to endorse this new technology, which is said to deliver zero wait states at bus speeds of up to 66MHz. Pentium systems using EDO DRAM are already available from Micron Computer and Toshiba. These systems contain motherboards that utilize the new Intel Triton core-logic chipset, the only chipset on the market at this time to support EDO DRAM (and synchronous DRAM as well).

EDO DRAM is based on traditional fast page mode DRAM, is pin-compatible with the old RAM chips, and does not require any substantial alterations to the system board architecture. A small modification to the DRAM chips themselves allows data to be passed through them at increased rates, and a subsequent level of this technology, called burst EDO, uses a pipelining technique that allows an operation to begin before the previous one has been completed, adding further speed.

Synchronous DRAM

Another new technology that could end up rivaling EDO for the first solid niche in the marketplace is synchronous DRAM (SDRAM). Current DRAM technology uses asynchronous row and column addressing, and the SDRAM concept simply alters this arrangement so that inputs and outputs can both be conducted with each clock cycle. Column access is also pipelined, allowing a second request to begin execution before the first has finished. The result is performance equivalent to 10ns SRAM, at prices comparable to traditional DRAM.

The Triton Pentium chipset from Intel contains built-in support for both SDRAM and EDO DRAM. With this level playing field on which to compete, it remains to be seen which technology takes hold in the marketplace.

Cached DRAM

Two other manufacturers have taken separate approaches toward essentially the same end. Mitsubishi's cached DRAM (CDRAM) and Ramtron's enhanced DRAM (EDRAM) each include a small amount of SRAM cache memory along with DRAM on each chip. They differ in that CDRAM uses a set-associative cache design, while EDRAM uses a direct mapped cache design. The SRAM portions of both chips have access times of approximately 15ns, and the apparent advantage held by CDRAM due to its superior cache design is effectively nullified by the faster DRAM access time of the EDRAM chip (35ns versus 70ns). Both designs are estimated to cost $50 per megabyte.

RDRAM

While all the technologies previously listed have attempted to retain the basic memory architecture of the PC, a company named Rambus has designed a completely revolutionary scheme that uses the system's processor or a custom-designed ASIC to control the transfer of data over a proprietary Rambus channel in packets of up to 256 bytes. Twin 250MHz clocks and the ability to transfer data on both clock edges yield a maximum transfer rate of 500M/sec, equivalent to a 2ns access time. This fantastic performance is strongly offset, however, by the need for major hardware design changes, which motherboard manufacturers will be very reluctant to adopt. This is likely to end up as a proprietary design which, if adopted by system manufacturers with sufficient clout in the industry, could be successful; but unless the innovations described earlier prove to be unusable, the performance improvement of RDRAM over EDO DRAM will not justify the investment of millions of dollars in new motherboard designs and manufacturing techniques.

I/O Bus Types

It is fairly easy to predict the trials and tribulations that the new memory technologies will go through before a definitive standard is realized. This is because the personal computer industry has seen the same process occur several times, most notably in designs for the I/O bus that were developed as a means of improving the performance standard set by the archaic ISA bus from the original IBM PC designs. It's difficult to believe that so many systems still are being sold today with high-powered processors like the 486 and the Pentium that still rely so heavily on this antiquated architecture.

The fundamental reason that these persistent difficulties plague the PC industry is not technological, but economic. The original PC was more of a marketing coup than a technological one. Many other computer designs existed that were as good or better than the PC, and in the same way, many bus designs have come and gone over the years, but they didn't last because they were proprietary systems. At one time, a giant conglomerate like IBM could create a standard just by making a decision, and this was how the ISA bus came about. When IBM said, "This is how we are going to build our machines," many other companies were more than willing to design, manufacture, and market expansion boards that would work in those machines. Unfortunately, this eventually became the problem rather than the solution. As technology moved forward, the old standards became inadequate, and new ones had to be developed that were compatible with the old so that the investments made by other companies would not be lost. This is where IBM found out that they were no longer the sole trendsetter in the PC industry. This led to a battle of competing I/O bus standards that is being waged to this day, and although the players have changed, the game remains the same.

ISA

The Industry Standard Architecture (ISA) I/O bus was developed for the original IBM PCs. Using early Intel 8088 microprocessors that ran at 4.77MHz, this 8-bit-wide bus was designed to run at the same speed. Indeed, it functioned more as an extension to the processor itself than as a separate subsystem. The processor's clock was the centralized timing source for the whole system, forcing even simple data transfers that required no calculation to be routed through the processor. In essence, this was a true local bus, with the expansion slots hardwired directly to the microprocessor's I/O pins.

It became apparent quickly that this would not do. Within months, processors were being manufactured that ran at unheard-of speeds, like 6MHz or even 8MHz. The expansion cards designed by other manufacturers for the original PC bus would not run at these increased speeds, so it became necessary to unlink the bus from its direct connection to the processor clock. To relieve the processor of some of its routine tasks, a separate controller was added, called a direct memory access (DMA) controller. The original DMA controller in IBM machines ran at 5MHz and could address only 1M of memory, the limit of the 8088's 20-bit address bus. These early systems still had a bus speed roughly equivalent to the processor's clock speed, but they were two separate clocks, and the bus was no longer dependent on the processor for timing information.

By the time IBM released the PC-AT in 1984, the company had chosen to expand the existing bus design to accommodate the new Intel 80286 processors. They added an additional connector to the original 8-bit slot (see fig. 5.6) that allowed for a 16-bit data width while retaining compatibility with earlier designs. Still running at 8MHz despite processor speeds of 10MHz or 12MHz, the AT bus could now address up to 16M of memory, the limit of the 286's 24-bit addressing, and had a theoretical maximum data transfer rate of 8M/sec. This figure, of course, represents at least double the actual transfer rate that would be achieved under normal conditions. Factor in the need to accommodate the 32-bit-wide data path of today's microprocessors, and the result is a bus that is generally inadequate for many of today's computing needs. Nevertheless, this is the same ISA bus that is still used today in the vast majority of the world's PCs.

Figure 5.6 This is an adapter card's 16-bit ISA bus connector.

For use in a network file server, the ISA bus should not even be considered, except as a secondary bus type used for non-networked resources such as video adapters. ISA expansion boards still use DMA transfers, and still cannot address memory above 16M. Furthermore, they force the processor to spend part of its time operating at the bus's top speed of 8MHz and still use a 16-bit data path, so that the performance capabilities of the high-speed 32- or 64-bit processor are largely going to waste.

As we consider alternatives throughout the rest of this section, bear in mind that any of them is preferable to the ISA bus for use in a file server. I don't mean to condemn it out of hand, for the ISA bus is still a valid choice for workstation use and has led a surprisingly long and useful life in a highly volatile industry. The primary function of a file server, however, is data transfer. When I see someone spend a great deal of money on a server with the latest processor and fastest drives and then run them through a 16-bit SCSI card and wonder why it's no faster than the old machine, I have only one answer.

MCA

When IBM realized the ultimate inadequacy of even their expanded AT bus, they set about creating a new bus standard that would be able to accommodate the continual improvements being made in other components of the PC. Still the dominant player in the desktop PC market, they proved their skills by developing a bus type using what they called Micro Channel architecture (MCA) for the PS/2 systems that went to market in 1987.

MCA was remarkably visionary in its design, capabilities, and expandability, but IBM proceeded to let the euphoria over this accomplishment go to their heads, and made two drastic mistakes that forever prevented the MCA bus from becoming an industry standard. First, they decided that their requirements for the new bus overshadowed the need for backward compatibility with existing expansion cards (ISA cards cannot run in MCA slots). Second, they implemented a licensing scheme that attempted to force other manufacturers to pay retroactive royalty fees on the ISA bus technology in order to market systems using the MCA bus.

This caused enough of an uproar within the industry that a "gang of nine" (other PC manufacturers, usually cutthroat competitors with one another) got together to develop their own standard, which resulted in the EISA bus. As a result, the MCA bus never enjoyed the popularity that it deserved outside of IBM shops, and far fewer MCA expansion cards were made by third-party manufacturers than EISA and ISA cards. IBM continued to market and maintain MCA as a proprietary standard for nearly a decade, and only recently announced that they would discontinue further development of MCA and gradually phase out its use in favor of PCI (see the "PCI" section below). This is not due to any unviability of the standard or low quality of the resulting products, but solely to the antiquated business practices of a former industry leader now relegated to a position farther back in the crowd.

The most obvious improvement of the MCA bus was its 32-bit data path. Designed to accommodate the new 80386 processors that were just being released, the bus also widened the address path to 32 bits, allowing a full 4G of memory to be addressed. However, a full 32-bit interface is not required of every card. The bus reads a signal from an expansion card that informs it of the card's capabilities. Thus, while MCA cards can be 32-bit, they do not have to be. This should be considered whenever purchasing Micro Channel adapters. The Adaptec AHA-1640 Micro Channel SCSI host adapter, for example, is a 16-bit MCA card that suffers from the same 16M memory limitation as an ISA adapter using DMA.

From an installation standpoint, MCA expansion cards are well ahead of ISA, although they do not approach the Plug and Play standards that we are striving for today. A reference disk containing the configuration program is needed, but gone are the jumpers and DIP switches that plagued the older technology. Just plug an MCA card into the slot, and the hardware aspect of the installation is finished.

The MCA bus runs at 10MHz, not blazingly fast by today's standards, but with a slight edge over ISA and even EISA systems. The primary innovation of the bus that made it incompatible with previous designs, however, was hardware-mediated bus arbitration. In a remarkably foresighted move, IBM created a bus that would be quite capable of supporting today's multiprocessor systems. The MCA bus can arbitrate between eight microprocessors and up to eight other devices, such as DMA controllers or graphics coprocessors, directly at the hardware level. This is done through the inclusion of a controller on the bus called the central arbitration point (CAP) that determines which device in the system gets control of the bus at any given time. By utilizing a hierarchy that gives added weight to more critical processes (such as memory refreshes or NMIs), the CAP is able to satisfy the needs of all its "customers" without monopolizing the system's microprocessor. For this to function properly, however, all devices involved must have comparable arbitration circuitry.

The bus also utilizes level-sensitive interrupts, which are incompatible with the edge-triggered interrupts of the ISA bus. Level-sensitive interrupts can be shared, while edge-triggered interrupts cannot, and level-sensitive ones can be sensed during the actual interrupt, while the AT bus interrupts can be sensed only at the moment that the interrupt request changes state.

Also included in the original MCA bus design was a burst mode that was a direct response to the extensive microprocessor overhead required for data transfers on the AT bus. Instead of the two-step "addressing and mailing" process of the older technology, which required the processor to be involved in the transfer of every byte, MCA is able to transfer larger blocks of data, or bursts, without processor intervention by utilizing its own hardware to detect the status of the transfer.

Two years after the release of the initial PS/2 machines with the Micro Channel bus, IBM enhanced its specification to allow better performance. The original MCA design, running at 10MHz and using 32-bit words, had a maximum theoretical transfer rate of 20M/sec. Compared to the 8M/sec of the ISA bus, this is a vast improvement, but the Micro Channel 2 specification goes even further. While the original burst mode design was created to facilitate the transfer of random data bytes that required individual addresses, the enhanced MCA bus incorporates a process called streaming data mode that greatly enhances the transfer of sequential blocks of data.

It was recognized by studying the data access patterns of normal system use that larger, contiguous blocks of data are often transferred. Executables and large data files, for example, might reside in one large block on a drive, and have to be read into memory by the methodical access of each subsequent byte. In this situation, it is not necessary to transfer an address location for each byte, because the source and the destination are both a series of contiguous addresses. Streaming data mode eliminates the clock cycles devoted to the individual addressing of data bytes. Since a normal transfer utilizes one cycle for addressing and one for the actual transfer, the elimination of the former effectively doubles the maximum transfer rate, to 40M/sec. However, since the bus design allots 32 bus lines for addressing as well as 32 for the transfer, streaming data mode leaves address lines idle after the original address for the entire block is sent. MCA multiplexes the data transfer, using the 32 idle lines to double the throughput again to a 64-bit transfer, running at 80M/sec. Designed for use with bus masters and slaves, the best part of this extension to the MCA specification is that it is completely optional. Hardware that cannot support these techniques simply performs data transfers using the original method.
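The figures quoted above all follow from one piece of arithmetic: clock speed times transfer width, divided by the number of bus cycles each transfer consumes. A quick sketch:

```python
def peak_rate_mb(clock_mhz, width_bits, cycles_per_transfer):
    """Theoretical peak rate in M/sec: one transfer of width_bits
    every cycles_per_transfer bus clocks."""
    return clock_mhz * (width_bits / 8) / cycles_per_transfer

# The AT bus figure quoted earlier: 8MHz, 16 bits, address + data cycles.
print(peak_rate_mb(8, 16, 2))     # 8.0
# Standard MCA: 10MHz, 32 bits, one cycle to address and one to transfer.
print(peak_rate_mb(10, 32, 2))    # 20.0
# Streaming data mode drops the per-transfer address cycle.
print(peak_rate_mb(10, 32, 1))    # 40.0
# Multiplexing data onto the idle address lines doubles the width.
print(peak_rate_mb(10, 64, 1))    # 80.0
```

Remember that these are theoretical maximums; as noted earlier, real-world transfer rates are considerably lower.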

As you can see, when combined with hardware that can support its extended features, the Micro Channel architecture makes quite a robust bus that can more than adequately service the needs of the average server. Since IBM is now combining this technology in machines with the PCI bus (and eventually phasing it out altogether), it would be unwise to purchase a server that relies on MCA at this time. In addition, there never developed as wide a range of third-party hardware that supported MCA as there did for EISA and other bus types. If you already possess servers that use this bus, though, you can be sure that they will perform at least as well as EISA, and far better than ISA.

EISA

As stated earlier, the EISA bus was a direct reaction to the incompatibility and licensing practices of IBM's MCA bus. Therefore, the EISA design holds compatibility with existing ISA adapters as its highest priority. Beyond that, the standard amounts to a collection of most of the desirable attributes of the Micro Channel bus and other proprietary standards. Perhaps due in part to the fact that they had the MCA bus as a model to work from, the committee successfully created a standard that has endured for many years. Although not as fast and capable as some newer emerging standards, such as VLB and PCI, EISA definitely is thoroughly tested, completely reliable, and well-suited for use in the average server. I do not hesitate to recommend purchasing an EISA-based server, as EISA promises to remain a viable standard for some time to come.

One of the cleverest aspects of the EISA bus design is that of the connector itself. Unlike an MCA slot, into which an ISA card could not hope to be inserted, an EISA slot has connectors of two lengths. The upper level of a connector corresponds exactly to the edge connector of a standard ISA adapter. This allows all existing ISA cards to be plugged into an EISA slot just as though it were an AT bus. Adapter cards designed to the EISA standard, however, have shorter connectors that plug into the same ISA-compatible slot, as well as longer connectors that plug into a series of contacts located deeper in the EISA slot. Special cutouts on EISA cards allow them to be fully inserted into the slot at both levels, while ISA cards are blocked from extending down to the EISA connectors and possibly causing a short circuit. Thus, the additional advantages of the EISA bus are made available without utilizing any more motherboard real estate than that taken by an ISA card.

Of course, as with MCA, the EISA designers took care of the primary requirement of expanding the bus to a 32-bit data path. Thirty-two address lines are used, allowing the bus to address the full 4G of memory that 32-bit processors like the 386 and 486 can support. The EISA specification also takes great pains to accommodate the varying power needs of the large array of ISA and EISA adapters that are to be supported. The design calls for 45 watts at four possible voltages per EISA slot. This is far more than most ISA or EISA cards will ever need, which is lucky, because a fully populated EISA bus containing eight adapters, each drawing the maximum amount of power available, would require a power supply of over 300 watts for the I/O bus alone, discounting the needs of the motherboard, processor, and other devices in the machine. Obviously, this is more than the average PC power supply can furnish, but it is another indication of the designers' forward-looking intentions for the EISA bus.
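The power arithmetic above is easy to verify with a quick calculation (a hypothetical Python sketch using the figures from the text, not anything defined by the EISA specification itself):

```python
# Worst-case EISA I/O bus power budget, using the figures above.
WATTS_PER_SLOT = 45   # maximum draw permitted per EISA slot
SLOTS = 8             # a fully populated bus

bus_power = WATTS_PER_SLOT * SLOTS
print(f"Worst-case I/O bus draw: {bus_power} watts")  # 360 watts, for the bus alone
```

In practice, a typical adapter draws only a small fraction of its 45-watt allowance, which is why ordinary PC power supplies suffice.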

In the interest of backward compatibility, though, it was necessary to keep the speed of the EISA bus the same as that of the AT bus: 8.33MHz. This speed amounts to one-fourth of 33MHz, the highest processor speed available at the time; when running in its basic mode, EISA is essentially a synchronous bus, coordinating its speed with that of the system processor. Like MCA, however, EISA has special bus mastering capabilities that greatly enhance the speed of data transfers under certain conditions. Bus mastering is a system in which an adapter contains its own processor for managing data transfers, thus reducing overhead and improving performance. This frees the adapter from burdening the system CPU or relying on the motherboard's DMA controller for processing. The EISA counterpart of Micro Channel's CAP is called the Integrated System Peripheral (ISP) chip.

Aside from managing the transfer of bus control between the various devices in the computer, the ISP chip allows for a compressed transfer mode that uses a special timing signal running at double the normal bus speed, as well as a burst mode. Unlike the burst mode of the MCA bus, though, EISA's furnishes an address with each transfer, allowing non-sequential data to be moved in a single burst. Overhead is reduced by transferring only the 10 lowest bits of each address. This limits the contents of a single burst to bytes that reside within a block of 1024 32-bit double words. Each burst is also limited to either read or write data; the two cannot be mixed. Using burst mode, a theoretical transfer rate of 33M/sec can be achieved (although, to swipe a disclaimer from the automobile ads, your mileage may vary).
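Both burst-mode figures follow directly from the numbers already given: the bus clock, the transfer width, and the 10 truncated address bits. A hypothetical Python sketch of the arithmetic (the variable names are my own, not part of any specification):

```python
BUS_CLOCK_MHZ = 8.33   # EISA bus clock, inherited from the AT bus
BYTES_PER_CYCLE = 4    # one 32-bit double word transferred per burst cycle

# Theoretical peak: one full-width transfer every bus cycle.
peak_rate = BUS_CLOCK_MHZ * BYTES_PER_CYCLE   # ~33M/sec

ADDRESS_BITS = 10      # only the 10 low-order address bits accompany each transfer
block_span = 2 ** ADDRESS_BITS                # 1024 double words reachable per burst

print(f"Peak burst rate: {peak_rate:.1f}M/sec within a {block_span} double-word block")
```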

Some EISA adapters utilize bus mastering and others do not. Those that do usually offer increased performance and, not surprisingly, carry a higher price. Also, not every EISA slot is capable of supporting a bus mastering card. As always, consult your motherboard documentation before making a purchase decision.

EISA also expands considerably on the capabilities of DMA transfers. In addition to supporting the capabilities of the standard DMA transfer from the AT bus, EISA's DMA controller allows access to the full 4G of memory supported by the specification, instead of just 16M. Three additional DMA transfer modes are also added in the EISA specification, reducing the number of clock cycles needed to perform a transfer. The original ISA bus required eight bus clock cycles for each 8- or 16-bit transfer, most of this being superfluous. EISA type A transfers eliminate two of the eight bus cycles and type B transfers eliminate four. These modes are designated for use by legacy ISA adapters, many of which have the capability to run at one or another of these accelerated speeds. Type C transfers, synonymous with the burst mode described above, are designed solely for use by EISA adapters, and eliminate all but a single bus cycle from the sequence, with the same 1024 double word address limitation.

The ability to recognize and translate between different bus widths is also built into the EISA specification. An EISA bus controller is responsible for breaking down 32-bit transfers into two or four sequential streams, 16 or 8 bits wide respectively, so that no data is lost when transferred between devices of different bus widths.
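Conceptually, that width translation amounts to slicing each 32-bit word into narrower pieces and sending them in sequence. A hypothetical Python sketch of the idea (the function and the low-order-first ordering are illustrative assumptions; the actual sequencing is defined by the EISA specification):

```python
def split_transfer(value32: int, width_bits: int) -> list[int]:
    """Break one 32-bit transfer into 16- or 8-bit pieces, low-order
    first, as an EISA bus controller would for a narrower device."""
    assert width_bits in (8, 16)
    mask = (1 << width_bits) - 1
    count = 32 // width_bits   # two 16-bit or four 8-bit pieces
    return [(value32 >> (i * width_bits)) & mask for i in range(count)]

split_transfer(0x12345678, 16)   # two 16-bit pieces: [0x5678, 0x1234]
split_transfer(0x12345678, 8)    # four 8-bit pieces: [0x78, 0x56, 0x34, 0x12]
```

Reassembling the pieces on the receiving side reverses the process, which is why no data is lost in the translation.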

One of the design decisions that made the Micro Channel architecture incompatible with the AT bus was the use of level-sensitive interrupts that could be shared by several devices. The 15 edge-triggered interrupts of the ISA specification (which already had been raised from eight on the original PC) were clearly insufficient, and the prospect of adding more became overly complex and did not seem to be a good overall solution. The quandary was resolved by making each interrupt in an EISA machine individually configurable to edge- or level-triggered operation. The same limitations of the ISA design still apply. Edge-triggered interrupts cannot be shared with other edge- or level-triggered interrupts. However, EISA cards effectively can share level-sensitive interrupts with other EISA cards.

Installing EISA expansion cards into a PC is not quite as simple as installing MCA cards, but it is still a great improvement over installing ISA cards. Rather than having a number of possible memory addresses that can be used by any adapter in the system, the EISA specification calls for each slot to have an assigned range of addresses that can be used only by that slot. In addition, all EISA cards are assigned a product identifier that makes them uniquely recognizable to the system. Configuration is performed through software (although DIP switches and/or jumpers are still present on many EISA cards as a backup), with settings saved in the system's nonvolatile CMOS memory, which has been enlarged for this purpose. Usually, it is necessary only to identify the slot into which a particular card has been inserted, and the software is able to manage the selection of all other necessary parameters.

It took several years for EISA to become as stable a standard as it is today. Software support for its full capabilities was slow in coming, and the extra circuitry involved in building an EISA machine raised the price of a PC by as much as $1000. Both of these problems have long since been overcome, though. An EISA bus adds approximately $300 to the price of a PC today, and full software support, as well as a wide range of expansion cards, has been available for some time.

The next sections examine two newer bus types that significantly expand on the capabilities of those already considered. These up-and-coming bus types are very new technology, however, so while they hold great promise, there are definite indications that all the kinks are not yet worked out. As stated earlier, your file servers are not a place to experiment. For this reason, you should make a detailed study of every aspect of the new local bus types before purchasing a server that relies exclusively upon them.
