Chapter 1

The WWW and the Intranet



The World Wide Web and the Intranet are closely related, but there are some differences. The WWW is designed to be used by everyone, while an Intranet is used primarily by company employees.

This chapter describes the underlying technology used by both the WWW and the Intranet. We will also discuss how they work and what can be done to make your Intranet more usable.

In this chapter, you will learn:

World Wide Web versus Intranet

The growth of the Internet in the past few years has been explosive; in fact, it has been so rapid that the exact number of users is still not known. While the media has touted the Internet as the greatest computing invention ever, Intranets (internal Internets, usually complete with WWW servers and mailing lists) are the big hit in many companies. Intranets are easier and cheaper to set up and maintain than proprietary systems.

A recent survey by "International Treasurer" shows that 40 percent of U.S. multinational companies have an Intranet. Other studies estimate that one out of four Fortune 1000 companies have an Intranet, and two-thirds of all large companies have Intranets.

NOTE
The term Intranet refers to an internal network designed to be used by company employees. This network commonly includes a WWW server but can also include other servers, such as Usenet news servers, FTP servers, database servers, or other applications. In this text, we may use the term Web server to mean Intranet server.

IP Networks

Intranets are IP networks, like the Internet, but designed to be used inside a company. IP networks are already in use in many companies because IP is an open standard, which allows any software company to adopt IP technology and incorporate it into its products. In fact, IP networks have become so popular in the past few years that companies that developed competing standards, such as Novell and Microsoft, have given in to market demand and added IP as a networking option.

IP networking is now included in Microsoft Windows 95, Microsoft Windows NT, OS/2 Warp, Warp Server, and Novell NetWare. It is also available for Windows for Workgroups, Windows 3.x, and Macintosh clients. IP networking has been part of almost every UNIX system for years, and it would be hard to find a UNIX vendor that shipped a current operating system without it.

TCP/IP and UDP/IP
IP is used by many applications, such as FTP (File Transfer Protocol), NFS (Network File System), SMTP (Simple Mail Transfer Protocol, used for e-mail), and rlogin (remote login). IP networks use two main transport protocols: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP sets up a connection and delivers a reliable, ordered stream of data; UDP sends individual datagrams with no connection setup and no delivery guarantee. Most user programs, such as telnet, rlogin, and FTP, use TCP layered on top of IP, hence the popular name TCP/IP networks. UDP is used mostly by file systems, such as NFS, and by name servers, such as DNS.
In addition to TCP and UDP, there is a third protocol, ICMP (Internet Control Message Protocol). The most popular user program that uses ICMP is ping, which tests the network connection between two hosts.
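To make the TCP/UDP difference concrete, here is a minimal sketch in modern Python. The language is chosen only for illustration. It uses the standard socket and threading modules on the loopback address 127.0.0.1, and the function names are hypothetical. The UDP half just addresses a datagram and sends it. The TCP half must first set up a connection, after which the data flows as a stream until one side closes it.

```python
import socket
import threading

def udp_roundtrip(message: bytes) -> bytes:
    """Send one UDP datagram to a local socket and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))      # port 0: the OS picks a free port
        receiver.settimeout(2)
        # No connection setup: each datagram is addressed on its own.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()

def tcp_roundtrip(message: bytes) -> bytes:
    """Connect to a local TCP echo server and return the echoed bytes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server.settimeout(5)                     # so accept() cannot block forever

    def echo_once():
        conn, _addr = server.accept()        # the TCP handshake completes here
        with conn:
            data = b""
            while chunk := conn.recv(4096):  # read until the client stops sending
                data += chunk
            conn.sendall(data)

    worker = threading.Thread(target=echo_once)
    worker.start()
    try:
        # create_connection performs the handshake before any data is sent.
        with socket.create_connection(server.getsockname(), timeout=5) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal "no more data" to the server
            reply = b""
            while chunk := client.recv(4096):
                reply += chunk
            return reply
    finally:
        worker.join()
        server.close()
```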

IP is unique in several ways, but one of its most important features is that it is a layered protocol (see Figure 1.1). IP was designed for heterogeneous networks and can run on any type of machine and over almost any type of network. Whether a company uses Token Ring, Ethernet, or almost any other network technology, IP can work with it.

Figure 1.1 : IP networks are layered to be more flexible.

Because IP networks are layered, they also work over WAN links such as 56 Kbps dedicated lines or T1 lines. This also allows companies to use IP between corporate sites, so IP works very well as both a LAN and a WAN transport.

IP also works over ordinary dial-up lines, so remote users can easily dial in with a modem. This connection is usually a PPP (Point-to-Point Protocol) or SLIP (Serial Line IP) link (see Figure 1.2). Software exists for almost any computer to use PPP or SLIP to connect to a LAN.

Figure 1.2 : IP can be used over modems to allow remote network connections.

HTTP

HTTP, which stands for HyperText Transfer Protocol, is the protocol Web browsers and servers use to communicate. Like IP, HTTP is an open standard, and any computer that talks TCP/IP can talk HTTP, assuming someone has written the software.

HTTP is a connectionless protocol: the browser opens a TCP connection, requests a single document, and the connection is closed as soon as the server has sent it. This allows many quick connections without having to hold connections open. Since Web pages were originally only a few kilobytes each, and the next request often went to a different server, this design decision made sense.

Connectionless protocols have drawbacks in a commercial environment. First, they are stateless: users can't be tracked through a site, since they open a new connection for each page. Second, when a browser makes many requests to the same site, setting up and tearing down a new TCP connection for each one can cause a significant amount of overhead.

NOTE
Although we use the term page to refer to a single access, one viewed page may actually require several connections. Each hit uses one connection. A hit is a single file access, whether that file is text, an image, or a sound, and each hit requires a TCP setup and teardown.
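A minimal Python sketch of a single hit, under the one-connection-per-request model described above. It sends an HTTP/1.0 GET by hand and reads until the server closes the connection, which is how the end of the response is detected. The function name http_get is hypothetical, and only standard-library sockets are used.

```python
import socket

def http_get(host: str, port: int, path: str = "/") -> bytes:
    """Fetch one document over a fresh TCP connection, HTTP/1.0 style."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"                      # a blank line ends the request headers
    ).encode("ascii")
    # Every call sets up and tears down its own TCP connection: one hit, one connection.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        response = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:           # server closed the connection: response complete
                break
            response += chunk
    return response
```

A page with three inline images would call something like this four times, paying the connection setup cost each time.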

HTTP allows two-way communication, so the browser can send information to the server as well as receive it. The specification also allows content negotiation between the server and browser, though this is not currently in use.

Commands

HTTP supports several commands: GET retrieves a page, HEAD retrieves only the headers of a document, POST sends information to the server, PUT places information on a server, DELETE removes a page, and TRACE is used for debugging.

These commands are part of the HTTP 1.1 specification. The most popular commands are GET, HEAD, and POST. DELETE and TRACE are rarely used.
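The difference between the two most common commands can be seen with Python's standard http.client module. The following is a sketch only, with a hypothetical helper name. It sends GET and then HEAD for the same path. Both should return the same status, but HEAD returns no body.

```python
import http.client

def get_and_head(host: str, port: int, path: str = "/") -> dict:
    """Issue GET and HEAD for one path; return {method: (status, body)}."""
    results = {}
    for method in ("GET", "HEAD"):
        # A fresh connection per request, matching the one-hit-per-connection model.
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request(method, path)
            resp = conn.getresponse()
            # For HEAD, http.client knows no body follows, so read() returns b"".
            results[method] = (resp.status, resp.read())
        finally:
            conn.close()
    return results
```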

Headers

The HTTP specification defines a number of headers, some of which must be sent. The Web server and browser use these headers to pass information back and forth. Some servers allow the Web developer to read the headers and make decisions based on the information in them. Some of the headers include:

These are the most commonly used header fields; there are many others. For more information on headers, consult the HTTP specification.
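A common way for a developer to read request headers is through CGI. A CGI-capable server passes each request header to the script as an environment variable named HTTP_ followed by the header name, for example HTTP_USER_AGENT. CONTENT_TYPE and CONTENT_LENGTH are passed without the prefix. A minimal Python sketch of recovering the header names (the function name is hypothetical):

```python
import os

def request_headers(environ=None) -> dict:
    """Collect request headers that a CGI server passes as HTTP_* variables."""
    environ = os.environ if environ is None else environ
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> User-Agent
            name = "-".join(part.capitalize() for part in key[5:].split("_"))
            headers[name] = value
    return headers
```

A script could then, for example, look at headers.get("User-Agent") to decide which features a browser should get.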

Basic Authentication

HTTP also supports basic authentication, which allows the server to require a username and password for each security domain (called a realm in the specification) that it defines. Normally, a server has only one security domain, but it can have several.

Clients normally remember the username and password for each domain for the rest of the session. This keeps the user from having to type in this information for each access.

Basic authentication is a fairly simple procedure. When the client requests a page, the server replies that authentication is required. The client then displays a username and password prompt. Once the user enters the information, the client resends the request with the username and password included. The server then checks whether they are correct. If not, it returns a message saying the username or password was wrong; if they are correct, it returns the document.
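The exchange can be sketched in Python. The server challenges with a WWW-Authenticate header. The client answers with an Authorization header holding "username:password" in base64, and the server decodes and checks it. The realm name and the in-memory user table are hypothetical. Note that base64 is an encoding, not encryption, which is the weakness described in the caution below.

```python
import base64

REALM = "Intranet"   # hypothetical realm name

def challenge() -> str:
    """Header the server sends with its 401 reply when credentials are missing."""
    return f'WWW-Authenticate: Basic realm="{REALM}"'

def authorization_header(username: str, password: str) -> str:
    """Header the client adds when it resends the request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def check(header_value: str, users: dict) -> bool:
    """Server side: decode 'Basic <token>' and compare against a user table."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:              # covers bad base64 and bad UTF-8
        return False
    username, sep, password = decoded.partition(":")
    return bool(sep) and users.get(username) == password
```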

CAUTION
Basic authentication does not encrypt usernames or passwords; they are merely base64-encoded, which anyone can reverse. A system cracker attached to any network that carries traffic between the client and server could therefore sniff passwords off the wire.
If this is not enough security then a stronger protocol such as Secure Sockets Layer (SSL) or Secure HTTP (S-HTTP) is required. SSL and S-HTTP are covered in the section named "Secure Protocols."

Content Negotiation

Content negotiation is also specified in the HTTP specification. This allows clients and servers to negotiate on file formats, languages or other specifics.

Content negotiation is one of the important but underused aspects of HTTP. It works like this: the client requests an image and sends a list of the formats it can handle. The server looks at the list and returns the format best suited to the browser.
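On the server side, the "list of formats" is the Accept request header. It can carry q (quality) values from 0 to 1 and wildcards such as image/* or */*. Below is a simplified Python sketch of how a server might pick a format. The function names are hypothetical, and the fallback rule (exact type, then major/*, then */*) is a simplification of the full specification.

```python
def parse_accept(header: str) -> dict:
    """Turn 'image/png, image/gif;q=0.5' into {'image/png': 1.0, 'image/gif': 0.5}."""
    prefs = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        quality = 1.0                      # no q parameter means fully acceptable
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs[fields[0].lower()] = quality
    return prefs

def choose_format(accept_header: str, available: list):
    """Return the available type the client prefers most, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        mt = media_type.lower()
        major = mt.split("/")[0]
        # Exact match first, then 'image/*', then '*/*'; q=0 means "not acceptable".
        q = prefs.get(mt, prefs.get(f"{major}/*", prefs.get("*/*", 0.0)))
        if q > best_q:
            best, best_q = media_type, q
    return best
```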

Content negotiation also allows servers and clients to agree on different languages or multimedia formats.

NOTE
Currently, no browsers support content negotiation, so scripts are often written to decide which features a particular browser should get.
This makes more work for Web developers, work that could be avoided if browsers supported the standard for negotiating content.

Caching

Since the WWW is used across the Internet, performance can be greatly improved by using caches to store recently retrieved documents. Caching must be done carefully, however, since Web documents change frequently.

Caching can also be used in an Intranet to help reduce network load. Many networks are made up of multiple segments; some are connected by fast links, while others may be connected over slower ones. Caching can reduce the traffic crossing the slower links and also help reduce total network traffic.

The HTTP specification contains many headers that are used to determine if a document in a cache can be used and for how long. These headers are discussed in the header section.

Caching works like this: the client requests a page from the proxy server, which checks whether the page is in its cache. If it is, and the cached copy is current, the page is sent directly from the cache and the proxy never has to query the actual server. If it is not in the cache, the proxy server gets the page from the actual server, stores it in its cache, and passes it on to the client.
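The proxy logic just described fits in a few lines of Python. This is a sketch only. The fetch function, which stands in for contacting the actual server, is a hypothetical callable that returns the page plus the time its copy stops being current. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time

class ProxyCache:
    """Minimal proxy-cache logic: serve current copies locally, else fetch and store."""

    def __init__(self, fetch, clock=time.time):
        self.fetch = fetch      # hypothetical: fetch(url) -> (body, expires_at)
        self.clock = clock      # injectable clock, handy for testing
        self.store = {}         # url -> (body, expires_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None:
            body, expires_at = entry
            if self.clock() < expires_at:
                return body, "HIT"              # current copy: origin is never contacted
        body, expires_at = self.fetch(url)      # missing or stale: ask the actual server
        self.store[url] = (body, expires_at)
        return body, "MISS"
```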

There is also an Expires header that tells cache servers how long they may use a document before retrieving a new copy. This is useful for dated documents and can also be used to make documents generated by scripts expire automatically.
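The Expires value is an ordinary HTTP date, such as "Wed, 01 Jan 1997 01:00:00 GMT". A cache can turn it into a remaining lifetime with Python's standard email.utils.parsedate_to_datetime, as in this sketch. The function name is hypothetical. Treating an unparseable date as already expired is a conservative choice assumed here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until_expiry(expires_value: str, now: datetime) -> float:
    """How many more seconds a cache may serve a document, per its Expires header."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):      # older Pythons raise TypeError on bad dates
        return 0.0                       # unparseable: treat as already expired
    if expires.tzinfo is None:           # a date given as -0000 comes back naive
        expires = expires.replace(tzinfo=timezone.utc)
    return max(0.0, (expires - now).total_seconds())
```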

Caching is very useful for reducing bandwidth for commonly accessed documents. Some browsers do caching both in memory and to local disk. Others only cache to memory or not at all.

Secure Protocols

Since HTTP is so flexible and popular, many companies are trying to use HTTP to transfer sensitive data. Examples of this data would be credit card numbers, purchase order numbers, or confidential data, such as personnel records.

Some studies show that 80 percent of all computer attacks come from inside the company, so it's easy to see why security is important inside an Intranet as well as over the Internet. Most companies have almost no security in place except for a firewall, and once an intruder gets past the firewall, they have free rein over what they can see and do. Securing the Intranet itself makes it harder for anyone, inside or outside, to get at information, which helps to solve this problem.

HTTP, unfortunately, is not a secure protocol; it is prone to different attacks, such as:

In order to make HTTP safe enough for confidential traffic to pass over a network, a means of encrypting and authenticating data connections is required. This is handled by using a secure protocol to transfer information between the server and client. There are currently two popular secure protocols in use.

One protocol is SSL, for Secure Sockets Layer. It is the more popular of the two since it can be used with protocols other than HTTP. The other protocol is S-HTTP, for Secure HTTP. These protocols are discussed in the next two sections.

SSL

SSL, or Secure Sockets Layer, is a protocol that sits between HTTP (or another application protocol) and the TCP/IP stack. It uses digital certificates to set up secure connections and provides authentication, encryption, and data integrity.

NOTE
Authentication allows clients and servers to be sure who they are talking to.
Encryption ensures no one else can read a document or eavesdrop on a transaction.
Data integrity makes sure the document hasn't been altered during transmission.

SSL is available on the Netscape Enterprise Server, but not the Netscape FastTrack server. If you need secure transactions, you have to purchase the Enterprise server. SSL is also available on Microsoft IIS and a version of Apache called ApacheSSL.

CAUTION
SSL servers only work with browsers that understand SSL. If a browser doesn't understand SSL, it can't communicate with an SSL-secured server.

How SSL Works

SSL requires a digital certificate. This is a piece of data that contains specific certificate information, such as the name of the server, the server's public key, the expiration date, and the name of the Certificate Authority (CA).

The other part of the digital certificate is the signature. The digital signature is an unforgeable piece of data proving that the certificate has been signed by the certificate authority. The certificate authority is an organization that is known and trusted by many servers and clients. The CA is used to verify the relationship between a server and its public key.

To obtain a certificate, you must generate a public and private key pair. The private key is stored and kept secret, and the public key is sent to the CA along with proof of the server's identity and some other information, such as:

The CA then generates a digital signature for the server and sends back a signed certificate. This certificate is then published or attached to messages that the server sends out.

NOTE
Anyone can see this certificate, since it contains only the server's public key. Anything encrypted with the public key in the certificate can be read only with the matching private key. The public key in the certificate can't be changed without invalidating the signature.

Users can then verify the certificate using the digital signature and the public key of the CA who signed the certificate. Once the certificate has been verified, the data inside the certificate can be trusted.

In use, the client will send a connection request to the server. The server will return a signed digital certificate. The client then authenticates the certificate using the digital signature and the public key of the CA.

If the certificate is not authentic, the connection is dropped. If it is authentic, the client generates a session key, encrypts it with the server's public key, and sends it to the server. This ensures that only the server can read it, since decrypting it requires the server's private key. This is why the private key must be kept secret.
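The two public-key steps above, verifying the CA's signature and sending the session key under the server's public key, can be shown with "textbook" RSA on tiny primes. This is a toy sketch for illustration only. The key sizes are trivially breakable, there is no padding, and real SSL uses much larger keys and more involved message formats. All names, primes, and the certificate text are hypothetical.

```python
import hashlib
import secrets

def make_toy_keypair(p: int, q: int, e: int):
    """Textbook RSA from two small primes. Far too small to be secure: illustration only."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)
    return (n, e), (n, d)               # (public key, private key)

def digest(data: bytes, n: int) -> int:
    # Reduce a SHA-256 hash into the toy modulus so it can be signed.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest(data, n), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(data, n)

def encrypt(m: int, public_key) -> int:
    n, e = public_key
    return pow(m, e, n)

def decrypt(c: int, private_key) -> int:
    n, d = private_key
    return pow(c, d, n)

# 1. The CA signs the server's certificate data with the CA's private key.
ca_public, ca_private = make_toy_keypair(61, 53, 17)
server_public, server_private = make_toy_keypair(101, 113, 17)
certificate = f"CN=www.company.com; key={server_public}".encode()
signature = sign(certificate, ca_private)

# 2. The client checks the signature using only the CA's public key.
assert verify(certificate, signature, ca_public)

# 3. The client picks a random session key and encrypts it with the server's public key.
session_key = secrets.randbelow(server_public[0] - 2) + 2
wrapped = encrypt(session_key, server_public)

# 4. Only the holder of the server's private key can recover the session key.
assert decrypt(wrapped, server_private) == session_key
```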

Once the server has the session key, it can use this to encrypt and decrypt data with the client. Since the data sent between the client and server is encrypted, it can't be read by anyone else.
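In practice, a client does not implement these steps itself. The following sketch uses Python's standard ssl module, which speaks TLS, the successor to SSL. The default context loads the system's trusted CA certificates and enforces certificate and host-name checks. wrap_socket performs the handshake, including the session-key exchange, and raises an error, dropping the connection, if the certificate does not verify. The function names are hypothetical.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    # Trusted CA certificates plus certificate and host-name verification:
    # the "is this certificate authentic?" step described above.
    return ssl.create_default_context()

def fetch_over_tls(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send one HTTP/1.0 GET over a verified TLS connection and return the raw reply."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # The handshake happens here; an untrusted certificate raises
        # ssl.SSLCertVerificationError and the connection is abandoned.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            reply = b""
            while chunk := tls.recv(4096):
                reply += chunk
            return reply
```

Once the handshake completes, everything sent with sendall is encrypted with the negotiated session key.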

Using SSL

To enable SSL you must do the following:

  1. Generate a key pair. Netscape servers can generate a key pair by filling out a form in the Server Manager, located in the configuration area under "Generate a key." On IIS, you can use the keygen command. The syntax for keygen is: keygen password keypair.txt certreq.txt "C=US, S=Massachusetts, L=Boston, O=org, OU=Sales, CN=www.company.com". The information in quotes is the identifying information the CA requires: country, state, locality, organization, organizational unit, and the server's common name. With ApacheSSL, you use the "req" command with the newkey option.
  2. Generate a certificate request. Netscape servers have a form in the Server Manager, under the link labeled "Request or Renew a Certificate," that must be filled out to request a certificate. IIS and ApacheSSL automatically generate a certificate request when creating a key pair.
  3. Install the signed certificate. Once the signed certificate comes back from the CA, you need to tell your server where it is located. On Netscape servers, use the "Install a Certificate" link in the Server Manager. IIS uses the setkey command, with the syntax: setkey Password keypair.txt certif.txt. Password and keypair.txt are the same values used in the keygen command, and certif.txt is the file in which the signed certificate was saved. ApacheSSL needs to be told via the SslCertificateFile directive in the httpd.conf file.
  4. Add SSL security for the server. Netscape again uses the Server Manager: select the link "Activate Security and verify ciphers" and fill out the information, including the paths to the files used above, the port (the default is 443), and which ciphers to use. On IIS, enable SSL in the Internet Server Manager by selecting the directory and checking the "Requires Secure SSL" box.

S-HTTP

Secure HTTP is a version of HTTP that provides secure transactions. It allows for data integrity, encryption, and authentication. Unlike SSL, Secure HTTP (S-HTTP) can be used alongside plain HTTP: the server recognizes when a browser can't handle secure transactions and either refuses to communicate or notifies the user that the transaction is insecure.

S-HTTP allows browsers to perform several cryptographic functions.

How S-HTTP Works

The default cryptographic behavior is for the client to encrypt traffic but not sign it. The server transmits in the clear. The idea behind this default is to allow the client to encrypt payment information, such as credit card numbers. The server normally responds with a page saying the order has been placed. This is not considered sensitive enough to be encrypted.

Normally, when a client receives a form that needs to be securely submitted, it also receives the Distinguished Name (DN) of the server. The client can tell whether a form is secure from its ACTION URL: if the protocol is HTTP, the submission is not encrypted; if it is S-HTTP, it must be encrypted. The DN allows the client to look up the server's public key.

Once the client has the server's public key, it could encrypt the whole message with it. However, public-key encryption is very slow, so instead the client generates a symmetric key, which is much faster to encrypt and decrypt with. The client encrypts the message with the symmetric key and then encrypts the symmetric key with the server's public key. Since the symmetric key can only be recovered with the server's private key, only the server can read the message.

Once the server receives the message, it decrypts the symmetric key and then the rest of the message.

The Webmaster can also require the returned document to be encrypted, authenticated, or signed. This is done with the Privacy-Enhancement header, which can instruct the server to encrypt, authenticate, or sign.

The client can also be asked to sign, authenticate, or encrypt by adding a CRYPTOPTS block to the FORM tag. The CRYPTOPTS block can contain a field defining how the form should be submitted. The syntax would be CRYPTOPTS "Shttp-Privacy-Enhancements: recv-required=sign." To ask the client to authenticate or encrypt as well, list the options separated by commas.

When signing a form submission, the client is required to have a digital certificate. This digital certificate is used the same way that SSL uses it. The client attaches its signed certificate. When the server receives that certificate it verifies its authenticity. If it is authentic then the server can be assured it was really the client that submitted it.

The same is true when the server signs a document. The client verifies the signature and if it is authentic can be assured the server really sent it.

S-HTTP can also handle out-of-band key exchanges. This allows a client and server to use a shared secret that has been prearranged by phone or other means, which allows faster encryption and decryption than public-key encryption.

Using S-HTTP

S-HTTP isn't as popular as SSL, and very few browsers and servers support it. One browser that does is Secure Mosaic, and one server is Secure NCSA httpd. More servers and browsers that support S-HTTP may appear, but most companies seem to be more interested in SSL.

You can get more information on Secure Mosaic from http://www.commerce.net/software/SMosaic. More information for Secure HTTPD from NCSA is available from http://www.ncsa.uiuc.edu/.

HTML

HTML, or HyperText Markup Language, allows browsers to display documents based on their logical structure. It does not allow exact formatting, since different screens would display things differently. Instead, HTML lets you describe what each part of the document is and gives the browser the flexibility to display the text however it looks best on the screen. This allows, for example, an employee handbook to be created that can be read on any size screen on any type of machine.

One disadvantage of HTML is that the formatting is not as exact as it would be using a programming language such as Visual Basic or a word processor such as FrameMaker. Newer HTML standards are addressing this issue, and much more can be done using HTML 3.0 tags and external viewers than could be done previously. HTML 3.0 is still being defined and may change before it becomes the official standard. The current HTML standard is 2.0.

HTML also allows hypertext links, or hyperlinks, in the document. These special areas in the text tell the browser to get a different document to display. The linked document doesn't need to be text; it can be pictures, sounds, or video.

Open Standards

Open standards and Internet technology can also be used to make corporate networks more efficient; in fact, many companies are doing exactly that. Using Web technology for internal use is a natural evolution for several reasons:

In the "Why Intranets Make Sense" section we will discuss in detail why companies are finding Intranet technology so useful.

What Are Some Intranet Applications?

Intranets are used in many different types of companies, from high-tech computer companies to real estate firms to oil refineries. Intranets aren't just useful in large companies, either; smaller companies can gain a competitive advantage by using an Intranet to deliver information to the right people faster. Intranets can also shorten development time and help save costs.

This section will cover some uses of Intranets and give some examples of what can be done using Intranets. Different companies will have different needs and this is by no means a definitive list of what can be done. It is merely a list of ideas that may fit in with your company, or help to give you new ideas of what can be done.

Almost all Intranet applications fall into three main categories:

  1. Publishing applications
  2. Discussion applications
  3. Interactive applications

Publishing Applications

Publishing applications are usually the first step in creating an Intranet. These applications are easy to set up and may not even require a WWW server.

NOTE
It is possible to view simple static HTML files without a WWW server. Most browsers allow you to open a local file and view it. The term local file simply means that the file is not delivered over HTTP; it can still be remotely mounted or shared from a server.

Document Repository

Intranets were originally used to allow groups of people to share documents. This makes sense, since that is what the WWW was originally developed for. Using a Web server as a central document repository lets employees quickly locate documents and saves time searching for a particular piece of information. Also, since the documents are all stored in one place, changes are easier to make, and because everyone reads the same copy, changing it updates everyone's copy at once.

Since HTML documents don't need to be simply text, companies can also store graphics, audio, or even video. This allows almost any type of information to be centrally located and easily found.

Documents may include employee handbooks, company newsletters or policies, design specifications, phone or e-mail lists, manuals, or job postings.

Bulletin Boards

Most companies have bulletin boards where company notices are posted and employees can place "for sale" items. Converting bulletin boards to HTML allows employees to get to a central bulletin board from their desks. This is helpful if the company is spread across different buildings or countries.

Having a Web server act as a bulletin board also makes sense for two other reasons: searching and ease of maintenance. Since HTML is text-based, it is easy to set up a search program. This would allow employees to easily search for items of interest, for example "Car for sale."
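Because each bulletin-board item is just a text file, a basic search program is short. Here is a Python sketch that assumes a hypothetical layout of one .html file per item in a single directory. It strips tags crudely with a regular expression, which is adequate for simple pages but is not a full HTML parser, and then does a case-insensitive phrase match.

```python
import html
import re
from pathlib import Path

def search_board(board_dir: str, phrase: str) -> list:
    """Return names of bulletin-board HTML files whose text contains the phrase."""
    needle = phrase.lower()
    matches = []
    for page in sorted(Path(board_dir).glob("*.html")):
        raw = page.read_text(encoding="utf-8", errors="replace")
        # Replace tags with spaces and decode entities so markup doesn't hide matches.
        text = html.unescape(re.sub(r"<[^>]+>", " ", raw))
        if needle in " ".join(text.split()).lower():   # collapse runs of whitespace
            matches.append(page.name)
    return matches
```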

Since the bulletin board items would be files on a Web server, standard tools can be used to remove them after a certain period of time. Also, since the files are protected by the operating system, people can't simply remove or change them.
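As an example of such a tool, the following Python sketch deletes item files whose last-modified time is older than a chosen limit. It assumes the same hypothetical one-.html-file-per-item layout and treats the modification time as the posting date. It could be run periodically, for example from cron.

```python
import time
from pathlib import Path
from typing import List, Optional

def expire_old_items(board_dir: str, max_age_days: float,
                     now: Optional[float] = None) -> List[str]:
    """Delete bulletin-board pages older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60    # age limit, in seconds
    removed = []
    for page in sorted(Path(board_dir).glob("*.html")):
        # The file's last-modified time stands in for the posting date.
        if page.stat().st_mtime < cutoff:
            page.unlink()
            removed.append(page.name)
    return removed
```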

Workgroup Server

The Web can be used to share information between group members as well as between different groups. For some companies, it might make sense to give each workgroup an area where its members can discuss what they are working on, who they are, and what they know. This can help groups get to know each other and can help people find out who knows about a particular type of software or component.

Workgroup servers can also be used to track project status. This can be done by having a team leader make changes to a page containing project schedules. Workgroup servers can also be used to store design specifications, notes, memos, or other information useful to the group.

Workgroup servers can also be used to introduce new team members and could be a central area for them to gather information on what has been done.

NOTE
Some of these applications actually could also be considered discussion or interactive applications.

Group Bookmarks

The Internet has many useful Web pages, but they aren't always easy to find, so setting up an area with links to external pages often lets users quickly find the information they are looking for. This bookmark page is often how Web sites start; one person may be designated as an Internet researcher who builds a directory of useful pages for others to use.

Discussion Applications

Intranets can do more than simply store documents; they can also be used as a front end for group discussions. This can be as simple to set up as writable pages or as complex as real-time chat servers.

Usenet news servers also work nicely to facilitate communication between group members or different departments. News servers are readily available, and most browsers allow reading news without the need for a special program. Mailing list software can also be used to allow groups to communicate.

Discussion lists can include many topics from design specifications to the best restaurants in town. Some discussion topics can include software products, engineering issues, outages and problems, corporate policy, and general discussion. Since lack of communication is one of the biggest problems in many corporations, allowing groups to communicate is one of the important uses of an Intranet server.

Good communication can save money in many ways such as:

Interactive Applications

Interactive applications are the applications that do work. These applications are used to query or search databases or to view what is happening on the network. You can, for example, see which machines are busy or what is running on a server. Interactive applications are handled by using CGI or Java or another programming language. These topics are covered in detail in Part III "Writing HTML for the Intranet."

Standard User Interface

Using HTML as a standard front end to existing software allows users to access the system from any type of machine. Since an HTML front end looks much the same on different machines, users will be more comfortable using different types of hardware.

Creating HTML front ends also makes it easier to develop sophisticated looking applications without having to learn a complicated programming language such as C++. Reducing the time it takes to develop applications means more work gets done.

Central Form Submission

Many companies have different forms for different requirements. When a new person starts, his manager usually needs to fill out a new user account form, a request for a network drop, a request for a phone and other forms.

HTML allows forms to be built and accessed via a WWW browser. Creating a central form area can make getting equipment or services much easier and in some cases automatic. Even if the form can't be handled automatically, having it submitted electronically can help eliminate forms getting lost or misdirected.

CAUTION
Acting on data received from an HTML form can be a time saver, but it can also be dangerous. Automatically creating user accounts or making other changes may save time, but it can cause many problems if used improperly. Data of this sort should be forwarded to an administrator who can verify it and then act on it if required.
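Following that advice, a form handler can simply package the submission for a person to review instead of acting on it. This is a minimal Python sketch. The administrator address, the hidden "form" field naming the form type, and the function name are all hypothetical. It builds the message with the standard email.message module but does not send it.

```python
from email.message import EmailMessage
from urllib.parse import parse_qs

ADMIN_ADDRESS = "helpdesk@company.com"   # hypothetical administrator mailbox

def build_request_message(form_body: str, submitted_by: str) -> EmailMessage:
    """Turn a submitted form into an e-mail for an administrator to review.

    Nothing is acted on automatically; a person checks the request first.
    """
    fields = parse_qs(form_body, keep_blank_values=True)
    msg = EmailMessage()
    msg["To"] = ADMIN_ADDRESS
    msg["From"] = submitted_by
    # 'form' is a hypothetical hidden field identifying which form was used.
    msg["Subject"] = "Form submission: " + fields.get("form", ["unknown"])[0]
    lines = [f"{name}: {', '.join(values)}" for name, values in sorted(fields.items())]
    msg.set_content("\n".join(lines) + "\n")
    return msg
```

The message could then be handed to smtplib.SMTP(...).send_message for delivery to the administrator.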

Almost any form can be converted to HTML and placed on a Web server. Some examples are new user account forms, time-off requests, equipment request forms, and problem tracking forms.

Development Platform

With the introduction of Sun Microsystems' Java to the Web, developers can start building applications that are cross-platform and distributed. Java allows the same piece of code to run on any machine to which a Java virtual machine has been ported. Examples of such machines are Solaris 2.x, Windows NT, and Windows 95, with Macintosh System 7.x expected soon.

Building Java applications, or "applets," not only lets developers write the same code for multiple platforms; it is also true distributed computing. The processing is done on each client machine, not on the server. This allows the server to be dedicated to I/O instead of CPU processing and can help reduce costs. Java applets can increase network traffic, since an applet must be downloaded again each time someone uses it. However, they can also reduce traffic by eliminating the need for a client to send every processing request to the server.

System Status Tools

Different operating systems have different ways to query print queues and other system-specific information. By creating an HTML front end to these applications, any user on any system can easily check out what is going on, without having to know the correct command or syntax.
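A minimal Python sketch of such a front end follows. Users pick a status page by name, and each name maps to a fixed command, so form input is never passed to a shell. The command list is a hypothetical example that depends on the host system: lpstat, for instance, exists only where a System V or CUPS print system is installed. The output is HTML-escaped so characters such as < and & display literally. The command runner is injectable so the page can be tested without running anything.

```python
import html
import subprocess

# Hypothetical allowlist: page name -> fixed command. Adjust for your systems.
COMMANDS = {
    "printers": ["lpstat", "-o"],
    "uptime": ["uptime"],
}

def run_command(argv):
    """Run an allowlisted command and return its combined output as text."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run {argv[0]}: {exc}"
    return result.stdout + result.stderr

def status_page(name: str, runner=run_command) -> str:
    """Render the output of an allowlisted command as a simple HTML page."""
    argv = COMMANDS.get(name)
    if argv is None:
        return "<html><body><p>Unknown status page.</p></body></html>"
    output = runner(argv)
    return (f"<html><head><title>{html.escape(name)}</title></head><body>"
            f"<pre>{html.escape(output)}</pre></body></html>")
```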

Almost any command that can be done from a command line can easily be converted to an HTML page. Some examples of status or monitoring tools are:

Why Intranets Make Sense

This section covers why it makes sense to create an Intranet server for your company. With Internet servers, many companies decide they need to get on the Net because everyone else is. Whether this is reason enough to create an external Web server is debatable, but publicity is a reason for some companies to develop an Internet presence. Intranets, however, are not created for publicity and, if done properly, no one outside your company will even know you have one. There must therefore be a better reason to create one. This section discusses some reasons to integrate an Intranet into your existing or new network.

Using the Intranet to Merge Technology

Many software products work best with specific operating systems. Microsoft BackOffice works well with NT servers and other Microsoft products but not as well with UNIX machines. The same is true of Novell's GroupWise: it works very well on a Novell network, but integrating it with other platforms can be troublesome.

Intranets, however, can integrate with Microsoft products as well as products from Novell, Lotus, or UNIX vendors. An Intranet can act as middleware, merging your disparate systems into one homogeneous computing environment (see Figure 1.3).

Figure 1.3 : Using the WWW can allow users to talk to different types of systems.

Companies with multiple databases, such as Oracle, Sybase, or Microsoft SQL Server, can use the Web as a front end for all of them. Users no longer need to learn the syntax for extracting data from each one, or even where the data comes from.

Even different types of file servers can be combined into one seamless Web of information. Web servers can run on Netware servers, NT servers, or UNIX servers and can be set up to allow users to get information from them without knowing what type of server they are talking to.

There are other products that allow communication between different types of systems but none are as universally accepted as the combination of HTTP and HTML.

Intranets can allow many different systems to talk the same language. This can lead to many benefits including:

Saving Money with Intranets

Intranets are cost-effective. The Web server and browser markets are highly competitive, which keeps prices down and features plentiful.

Web browsers range in price from free, for Mosaic or Microsoft Internet Explorer, to under a hundred dollars for Netscape or Lotus InterNotes. Web servers such as Apache or IIS are free; others, such as Netscape Enterprise Server or the NetWare Web Server, cost a few thousand dollars or less.

Browsers and servers are also available for almost every platform in use today. This means existing servers can be reused, saving both hardware and management costs. These costs must be considered when comparing Web servers to other technology, since the cost of administering a new server can add up quickly.

Programmers can quickly adapt to programming for the Web, since the biggest change is in how data is received and presented. Instead of being read from the keyboard, input from a Web page is usually passed to the program in environment variables, on standard input (for form data sent with the POST method), or on the command line. Output must also be formatted as HTML. These are small changes, and most programmers should pick them up quickly.
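A minimal sketch of such a program follows, assuming a GET request whose form data arrives in the QUERY_STRING environment variable; the `name` form field is hypothetical. Apart from where the input comes from and the Content-Type header printed before the HTML, it is ordinary code.

```python
#!/usr/bin/env python3
import html
import os
import sys
from urllib.parse import parse_qs


def main(environ=os.environ, out=sys.stdout):
    # For GET requests the server puts the form data in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field; fall back to a default value.
    name = query.get("name", ["world"])[0]

    # A CGI response starts with headers, then a blank line, then the body.
    out.write("Content-Type: text/html\n\n")
    out.write("<html><body>\n")
    # Escape user input before placing it in the page.
    out.write(f"<p>Hello, {html.escape(name)}!</p>\n")
    out.write("</body></html>\n")


if __name__ == "__main__":
    main()
```

Passing the environment and output stream as parameters is only a convenience for testing; when the Web server runs the script, the defaults pick up the real environment and standard output.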

Many converters and other useful utilities can be downloaded free over the Internet. These include programs that convert word processor files or spreadsheets to HTML, link checkers that make sure links point to pages that actually exist, and HTML editors that make creating pages easier.
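To give a flavor of what a link checker does, here is a minimal sketch that extracts the href targets from a page with Python's standard html.parser module and reports relative links whose target file does not exist. For simplicity it assumes every relative link is resolved against the document root and skips remote (http:, ftp:, mailto:) links entirely; the function and class names are hypothetical.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for attr, value in attrs:
                if attr == "href" and value:
                    self.links.append(value)


def broken_local_links(page_html, doc_root):
    """Return relative links that do not point to an existing file."""
    collector = LinkCollector()
    collector.feed(page_html)
    broken = []
    for link in collector.links:
        parts = urlparse(link)
        # Skip absolute URLs and pure in-page anchors such as "#top".
        if parts.scheme or parts.netloc or not parts.path:
            continue
        target = os.path.join(doc_root, parts.path.lstrip("/"))
        if not os.path.exists(target):
            broken.append(link)
    return broken
```

A real checker would also resolve links relative to each page's own directory and fetch remote URLs to check their status codes.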

Expandability

Web servers can start out small, with just a few pages, and expand to several thousand pages very easily. Since HTML allows links to point to pages on separate servers, it is easy to add a new server when one gets too busy.

This lets groups run separate departmental servers, with a central corporate server acting as the main directory. The result scales well, limited only by network bandwidth and the number of machines available.

Because Web servers can start out small, even small companies can try using Intranets and expand only as needed. Large companies can also grow as large as needed without expensive per-user charges (see figure 1.4).

Figure 1.4 : Using Web servers allows unlimited scalability.

User Friendliness

World Wide Web technology is designed to make complicated tasks accessible from one user interface. Mosaic, the first widely used graphical WWW browser, let users transfer files by FTP, read news, connect to gopher servers, and view HTML files, all from the same GUI. This made it easy for users who were not power users to find information.

Web browsers have only gotten better. In addition to everything Mosaic let WWW pioneers do, newer browsers can post news articles, read and send e-mail, and much more.

Using HTML as a front end for different processes makes it easy for users to learn new things, since they don't need to relearn the look and feel of the software. It also puts users at ease, since they are familiar with the environment. Even on a different hardware platform, the Web looks the same. Standardizing on a corporate browser makes it even easier, since the browser is then exactly the same regardless of system software.

NOTE
Many new releases of programs are coming with either an HTML interface or HTML output. Online help is also commonly written in HTML to allow easy reading.

Reduced Development Time

Using CGI interfaces and HTML can help developers create front-end interfaces to complex programs quickly and easily. Best of all, if there is an HTML interface, any browser can take advantage of it, reducing porting time and cost. Less time spent on GUI design lets developers spend more time optimizing internal code.

Since the WWW is client-server, the company can take advantage of the benefits of this flexible environment without having to understand all the details of it.

Java lets the company go a step beyond client-server to true distributed computing. Java applets are downloaded to the client and run on the client's processor instead of the server's. This greatly reduces the load on the server, especially when the applets are CPU-intensive.

Java programming is more demanding than standard CGI programming, since Java is a separate language that must be learned. CGI programs can be written in any language the programmer is comfortable with, as long as the program can read its input from environment variables, standard input, or the command line and write HTML as output. Programmers familiar with object-oriented languages such as C++ should find Java fairly easy to learn.

CAUTION
Not all browsers support Java, so care must be taken when choosing a browser. This is covered in Chapter 8, "Choosing and Configuring Browser Software." Java also does not run on all platforms, so consider your platforms carefully before deciding to develop applications in Java.

What Are Web Servers?

Web servers are programs that understand and speak HTTP, the HyperText Transfer Protocol. They answer HTTP requests with HTTP responses. A basic Web server can perform any HTTP operation and return the correct headers and documents. More sophisticated servers, though, have many features that make it easier to serve HTML documents.
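The core job of a Web server can be sketched with Python's standard http.server module. This toy handler is only an illustration: it answers GET requests for a single hypothetical page with a status line, headers, and a body, and returns 404 for everything else. A real server adds file lookup, the features described below, and much more careful error handling.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class TinyHandler(BaseHTTPRequestHandler):
    """Answer GET requests for "/" with a fixed HTML page."""

    def do_GET(self):
        if self.path != "/":
            # Any other path gets a standard 404 response.
            self.send_error(404, "Not Found")
            return
        body = b"<html><body><p>Welcome to the Intranet</p></body></html>\n"
        self.send_response(200)                        # status line
        self.send_header("Content-Type", "text/html")  # what kind of document
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                             # blank line ends headers
        self.wfile.write(body)


def run(port=8080):
    """Serve until interrupted; port 8080 is an arbitrary choice."""
    HTTPServer(("", port), TinyHandler).serve_forever()
```

Calling `run()` and pointing a browser at http://localhost:8080/ should return the page.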

Some of the more popular features include ways to have the server parse files or run external programs, more sophisticated authentication (such as DBM password files instead of text files), advanced logging, and access controls that limit access by IP address.

Server Parsed Files

To serve anything besides static HTML files, the server must parse files or run external programs. There are many ways to do this, but the two most common are Server Side Includes and CGI programs.

Server Side Includes (SSIs) let an HTML page pull in other content at the moment the page is requested. For instance, a page can run a program and insert its output. One common use for SSIs is to insert a standard header and footer, so you can change the look and feel of a whole site by editing one or two files instead of every file.
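The include mechanism can be sketched in a few lines. The function below is a hypothetical, greatly simplified stand-in for a server's SSI parser: it only understands the `<!--#include virtual="..." -->` and `<!--#include file="..." -->` directives used by NCSA-style servers such as Apache, does not handle nested includes, and looks files up in a dictionary instead of on disk.

```python
import re

# Matches directives such as <!--#include virtual="/header.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')


def expand_includes(page, files):
    """Replace each include directive with the named file's contents.

    `files` maps a path to its text; a real server would read the disk.
    Missing files are replaced with an HTML comment noting the error.
    """
    def substitute(match):
        path = match.group(1)
        if path in files:
            return files[path]
        return f"<!-- include error: {path} not found -->"

    return INCLUDE_RE.sub(substitute, page)


# Hypothetical shared header and footer for every page on the site.
site_files = {
    "/header.html": "<h1>Acme Intranet</h1>",
    "/footer.html": "<hr><address>webmaster@acme.example</address>",
}
page = ('<!--#include virtual="/header.html" -->\n'
        "<p>Quarterly results are posted.</p>\n"
        '<!--#include virtual="/footer.html" -->')
print(expand_includes(page, site_files))
```

Editing the header text in one place changes it on every page that includes it, which is the maintenance benefit described above.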

CGI, the Common Gateway Interface, lets the server run external programs. These can do almost anything any other program can do. The only real difference between CGI programming and other types of programming is how the program gets its input and delivers its output. CGI normally passes input in environment variables, on standard input (for POST requests), or as command-line arguments. The output should be a short header, such as Content-Type, followed by valid HTML.
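As a complement, here is a minimal sketch of a CGI program reading form data for either request method. Under the CGI convention, the method arrives in the REQUEST_METHOD environment variable; a POST body is read from standard input, with its length given by CONTENT_LENGTH. The `read_form` helper is a hypothetical name.

```python
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=None, stdin=None):
    """Return the submitted form fields as a dict of lists."""
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The body is on standard input; read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = sys.stdin.buffer if stdin is None else stdin
        data = stream.read(length).decode("utf-8", "replace")
    else:
        # Otherwise any form data is carried in the query string.
        data = environ.get("QUERY_STRING", "")
    return parse_qs(data)
```

Reading no more than CONTENT_LENGTH bytes matters, because the server need not signal end-of-file after the body.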

Authentication

The HTTP specification defines basic authentication, which lets the server require a username and password. How these are verified is up to the Web server.

Most Web servers can perform basic authentication by checking the username and password against a text file. Depending on the server, however, this file may store the passwords in clear text. While no one should be able to see this file, it is possible to misconfigure the server and expose your passwords to the Internet. Searching a text file for a password is fine for a small number of users, but once you have a few hundred users it becomes too slow.
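To see what the server actually receives, the sketch below decodes a Basic `Authorization` header and checks it against an in-memory user table. The credentials are only Base64-encoded, not encrypted, so anyone who can watch the network traffic can recover them. The `users` table and `check_basic_auth` function are hypothetical, and keeping plain passwords in a table like this is exactly the weakness described above; it is done only to keep the example short.

```python
import base64
import binascii
import hmac


def check_basic_auth(header, users):
    """Return the user name if `header` carries valid Basic credentials, else None.

    `header` is the value of the Authorization request header, for example
    "Basic YWxpY2U6c2VjcmV0". `users` maps user names to passwords.
    """
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The decoded form is "username:password"; the password may contain ':'.
    user, sep, password = decoded.partition(":")
    if not sep or user not in users:
        return None
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(users[user].encode(), password.encode()):
        return user
    return None
```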

More sophisticated servers offer other ways to verify passwords. Some can use DBM password files, which are much faster to search than plain text files. Others look users up in the system's native directory; for example, the NetWare Web Server can authenticate users against NDS.
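Python's standard dbm module gives a rough feel for why DBM files are faster: each lookup goes straight to a keyed database entry instead of scanning a text file line by line. This sketch stores a salted PBKDF2 hash rather than the password itself. The record layout and function names are hypothetical and do not match the format any particular Web server reads.

```python
import dbm
import hashlib
import hmac
import os

ITERATIONS = 100_000  # arbitrary work factor chosen for this example
SALT_LEN = 16


def _hash(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def add_user(db_path, user, password):
    """Store salt + hash for `user` in the DBM file at db_path."""
    salt = os.urandom(SALT_LEN)
    with dbm.open(db_path, "c") as db:  # "c" creates the file if needed
        db[user] = salt + _hash(password, salt)


def verify_user(db_path, user, password):
    """Look the user up by key and compare the stored hash."""
    with dbm.open(db_path, "r") as db:
        record = db.get(user)
    if record is None:
        return False
    salt, stored = record[:SALT_LEN], record[SALT_LEN:]
    return hmac.compare_digest(stored, _hash(password, salt))
```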

Logging

Many sites need to know who is using their Web server. This may be used for billing or for helping to decide what features users are after. For whatever reason, logging features are one of the most important features a Web server can have.

Most Web servers now support the common log format. It records the client's machine name (or IP address), the remote identity reported by identd (rarely available), the user name if the request was authenticated, the date and time, the HTTP request line, the status code of the response, and the number of bytes transferred.
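A common log format entry can be picked apart with a regular expression. The sample line below is made up for illustration, and the parser is a minimal sketch that assumes well-formed entries; a `-` marks a field with no value.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one common log format line into a dict, or return None."""
    match = CLF_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" byte count means no body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


sample = ('pc42.example.com - jsmith [10/Oct/1996:13:55:36 -0700] '
          '"GET /reports/q3.html HTTP/1.0" 200 2326')
print(parse_clf(sample))
```

Tallying the `request` or `authuser` fields across a whole log is then a short loop, which is the kind of usage reporting described above.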

Other log file formats record more information. Netscape Proxy Server, for example, offers extended and extended-2 log formats, which include further details about each transaction, such as the communication between the proxy and the remote server and transfer times. Apache lets administrators define custom log formats, and even write new logging modules, to record whatever header information they want.

Access Controls

Many Web servers let the administrator create access lists that control which machines or users can connect to different parts of the server.

Restricting access by IP address, for example, lets you isolate your Intranet server from the Internet.
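The idea behind an address-based access list can be sketched with Python's standard ipaddress module. The network ranges below are hypothetical (private address blocks a company might use internally); a real server expresses the same rule in its own configuration syntax.

```python
import ipaddress

# Hypothetical internal networks allowed to reach the Intranet server.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny
    # An IPv6 address is never "in" an IPv4 network, so it is denied here too.
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Denying by default and listing only the internal networks is what keeps requests from the Internet out.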

Administrative Interface

Web servers can be very complicated to administer, so the administrative interface is an important part of any Web server. Early Web servers required editing configuration files by hand and knowing what every option did and how the options worked together.

Newer Web servers can be administered via a GUI. This GUI helps to keep you from doing things that could cause problems. It also makes it easier to change different options. Netscape's administrative interface is shown in figure 1.5.

Figure 1.5 : Netscape's server manager can be used to administer the server.

How It All Works

Web servers and browsers talk using HTTP, but how does the exchange actually work? This simple example will help to explain the process that happens when a browser requests a document.

In this example, we mentioned file types and helper applications. The next section discusses MIME types and helper applications in more detail.

MIME Types

MIME (Multipurpose Internet Mail Extensions) types tell a remote client what sort of file is being sent. The server usually keeps its MIME type mappings in a file called mime.types.

This file acts as a reference that tells the server which MIME type to use for each file, usually based on the file extension. A sample mime.types file looks like this:

application/postscript   eps ps
application/zip          zip
image/gif                gif
video/mpeg               mpeg mpg mpe
text/html                html htm
text/plain               txt

When the server sends a document, it attaches a Content-Type header containing the matching MIME type. For example, when sending index.html, the server tells the client that the document is text/html.
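The server-side lookup can be sketched as follows: read a table in the format of the sample above into a map from extension to type, then pick the type for a requested file name. The text/plain fallback for unknown extensions is an assumption made for this example; servers typically let the administrator choose the default.

```python
import os

MIME_TABLE = """\
application/postscript   eps ps
application/zip          zip
image/gif                gif
video/mpeg               mpeg mpg mpe
text/html                html htm
text/plain               txt
"""


def load_mime_types(text):
    """Build an {extension: mime_type} map from mime.types-style lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def content_type_for(filename, mapping, default="text/plain"):
    """Return the Content-Type header value for a requested file."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return mapping.get(ext, default)


types = load_mime_types(MIME_TABLE)
print("Content-Type:", content_type_for("index.html", types))  # Content-Type: text/html
```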

The client has a lookup table similar to the server's. It knows it can display text/html in the browser without starting an external viewer, so it does. If, however, it receives a file of type video/mpeg and cannot display MPEG video itself, it looks for an external viewer (a helper application) to display it, or prompts the user to save the file.
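The browser's side of this amounts to a three-way decision: display, hand off to a helper, or save. In this sketch, the set of types the browser renders itself and the helper-application table are hypothetical examples; real browsers keep such a table in their preferences.

```python
# Types this hypothetical browser can display in its own window.
INLINE_TYPES = {"text/html", "text/plain", "image/gif"}

# Hypothetical helper-application table: MIME type -> external viewer command.
HELPERS = {
    "video/mpeg": "mpeg_play",
    "application/postscript": "ghostview",
}


def handle_response(content_type):
    """Decide what to do with a downloaded document of this type."""
    # Drop parameters such as "; charset=iso-8859-1" before the lookup.
    mime_type = content_type.split(";")[0].strip().lower()
    if mime_type in INLINE_TYPES:
        return ("display", None)
    if mime_type in HELPERS:
        return ("helper", HELPERS[mime_type])
    return ("save", None)  # unknown type: ask the user where to save it
```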

This allows great flexibility, especially in Intranets. For example, if your company develops an application, say bar, that takes .foo files as input, you could set up a MIME type to handle them. On the server you would add a line like:

application/bar         foo

On the clients, you would configure the browser to use the external viewer bar whenever it receives the application/bar MIME type. Now you have an easy way to distribute foo files throughout your company.
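The same registration can be tried out with Python's standard mimetypes module, which keeps an extension table much like a server's. Here application/bar and .foo are the hypothetical names from the example above, while `add_type` and `guess_type` are the module's own functions.

```python
import mimetypes

# Register the hypothetical company format: .foo files are application/bar.
mimetypes.add_type("application/bar", ".foo")

# guess_type returns a (type, encoding) pair based on the file extension.
mime_type, encoding = mimetypes.guess_type("budget.foo")
print(mime_type)  # application/bar
```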