Chapter 3

Installing and Configuring HTTPD for UNIX


If your site runs UNIX and you are looking for a free Web server, then you are probably looking for Apache. Apache is currently the most popular free server for UNIX machines. It is also a drop-in replacement for the Web server from NCSA.

Apache runs on most versions of UNIX and comes in source code form so it can be easily compiled for other platforms. Having source code also makes it easy for a programmer to make changes to the program.

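In this chapter, you will learn how to obtain and compile Apache, how to configure it through access.conf, httpd.conf, srm.conf, and the mime-types file, and how to start, troubleshoot, and maintain the running server.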

Apache

Apache is a high-performance, UNIX-based httpd server. It is developed by the Apache Group and is available free of charge under the normal free software terms.

Apache is a drop-in replacement for the NCSA httpd server. If you are already running NCSA httpd, you can simply compile Apache and replace the httpd binary from NCSA with the new Apache binary.

Apache comes in source form and can be compiled on many platforms, such as AIX, HPUX, IRIX, Linux, SCO UNIX, SunOS, NeXT, BSDI, FreeBSD, and Solaris.

There is also a version of Apache for OS/2. It is available from SoftLink Services (http://www.slink.com). There is also a version of Apache that supports Secure Sockets Layer, called ApacheSSL. It is available from Community Connexion (http://apachessl.c2.org).

Apache is currently one of the leading UNIX Web servers. NCSA httpd can be installed using almost the same instructions as Apache, so these instructions, together with the NCSA documentation, will allow you to install that server as well. Other UNIX Web servers have similar installation procedures. By using this chapter in conjunction with the instructions that come with those servers, you should have no problem installing them.

You can download Apache from many sites including:

Once you have saved the file, you need to unzip and untar it to get at the files. The filename should be apache_1.1.tar.gz. To unzip the file, use the program gunzip. The command "gunzip apache_1.1.tar.gz" creates a file called apache_1.1.tar. This tar file can be untarred using "tar xvf apache_1.1.tar", which extracts the files into a directory called apache_1.1.

NOTE
Some WWW browsers can automatically unzip a file for you. If you don't have a .gz file, check to see if it is already unzipped.
Gunzip is available from most GNU mirrors. The main GNU site is ftp://prep.ai.mit.edu/pub/gnu
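The unpacking steps above can be collected into a small shell function. This is only a minimal sketch: the name unpack_apache is hypothetical, and it assumes gunzip and tar are on your PATH. It also covers the case described in the note, where the browser has already unzipped the file and left a plain .tar file.

```shell
# unpack_apache ARCHIVE
# Hypothetical helper: unzip ARCHIVE if its name still ends in .gz,
# then extract the resulting tar file into the current directory.
unpack_apache() {
    archive=$1
    case "$archive" in
        *.gz)
            gunzip "$archive" || return 1   # apache_1.1.tar.gz -> apache_1.1.tar
            archive=${archive%.gz}          # strip the .gz suffix from the name
            ;;
    esac
    tar xvf "$archive"                      # unpacks into the apache_1.1 directory
}
```

A typical call would be unpack_apache apache_1.1.tar.gz, followed by cd apache_1.1.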

Compiling Apache

It is recommended that you use gcc for compilation, though any ANSI C compiler should work. If you need gcc, it is available from the GNU archives at ftp://prep.ai.mit.edu/pub/gnu.

The first step in building Apache is to edit the configuration file. To start, cd into the src directory and copy "Configuration.tmpl" to "Configuration". This file contains compiler directives and other Makefile lines, such as CFLAGS, CC, and LFLAGS, which allow you to compile in machine-specific options. The file is very well commented, and by reading the comments you can easily set the right options for your system.

Module Definitions

After the Makefile rules come the module definitions. Apache is very customizable: you can add modules that you need and remove modules that you don't want. By removing any unused modules you can have a very small, fast server.

The different modules vary depending on the release, but for the current release 1.1, the following basic modules are present:

These basic modules should probably be left in unless you are sure you do not need them. There are also some other useful modules that can be commented out if not being used. These include:

There are also experimental modules. These are probably okay if you are running a test server and like debugging programs. For a server that people depend on, though, they should be used sparingly.

NOTE
Once you have defined which modules you want, you need to run Configure. (The defaults will work if you don't want to worry about modules yet.) Configure makes changes to the Makefile and creates a file called modules.c, which is where the modules get included. One of the popular features of Apache is the ability to add new modules. This should be done only by someone who is very familiar with the C language and how the server works. There is very good documentation on this on the Apache Group's Web server, http://www.apache.org/.

After Configure finishes, typing make will build Apache. This may take a few minutes. If you get any errors during the compiling or linking phase you will need to fix them before trying to run Apache.

Once Apache finishes building you should end up with a binary file called httpd. If you are switching from NCSA httpd to Apache you can simply copy this file over your existing one and restart.

NOTE
A simple test to make sure httpd built OK is to run httpd -v. This should tell you what version it is. If this doesn't work, stop and try to find out why the program doesn't work before going further.
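The build steps described in this section can be run in order from one shell function that stops at the first failure. This is a sketch rather than an official build script: the name build_apache and the MAKE override are hypothetical, and it assumes you have already unpacked the source and pass it the path to the src directory.

```shell
# build_apache SRC_DIR
# Hypothetical helper that runs the build steps in order. The body is a
# subshell, so the caller's working directory is left unchanged.
build_apache() (
    cd "$1" || exit 1
    # Keep an existing Configuration so that earlier edits are not lost.
    [ -f Configuration ] || cp Configuration.tmpl Configuration || exit 1
    ./Configure || exit 1          # updates the Makefile and creates modules.c
    ${MAKE:-make} || exit 1        # builds the httpd binary
    ./httpd -v                     # prints the version if the binary runs
)
```

For example, build_apache apache_1.1/src. Edit Configuration between the copy and the Configure step if the defaults do not suit your machine.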

If you are installing Apache from scratch, there are a few things you need to do before you can test it out.

Testing It Out

First create a "Home Page". This is simply an HTML file called index.html in the ServerRoot/htdocs directory. If you aren't familiar with HTML, then you can simply add this to the index.html file:

<HTML>
<HEAD>
<TITLE>It works!</TITLE>
</HEAD>
<BODY>
Apache is running!!!!
</BODY>
</HTML>

Start httpd. Unless you changed the port in the httpd.conf file, it will run on port 80, which requires you to be logged in as root. If you receive any errors, go back and check your configuration files. Also check to make sure your httpd is running. On SunOS and BSD-based UNIXes you can use "ps -aux | grep httpd" to see if the process is still running. On System V-based UNIX versions you might need to use "ps -aef | grep httpd".

If it seems to be running OK, then point your Web browser to http://server/. You should see your index.html file displayed in your browser.

NOTE
If you don't have a WWW browser you can test out the server using telnet. Simply "telnet server 80" and type "GET /" after you are connected. This should return your index.html file to you and close the connection.

Apache logs errors to ServerRoot/logs/error_log. If you have problems you should check in this file to try to find the cause. ServerRoot normally is /usr/local/etc/httpd.

Configuration

In addition to the compile time module definitions and other compile time options, there are many runtime options that you need to decide on. These include running as a standalone server or via inetd, where the server should reside, how many processes to start, and many other options.

These options are covered in the next few sections. They are split up by file to make it easier to reference later.

access.conf file

The access.conf file defines what features are available to the users. Features are defined on a directory-by-directory basis using directives.

A directory directive is made up of several lines. The first line must contain:

<Directory [directory/name]>

[directory/name] is the absolute path to the directory, for example /usr/local/etc/httpd/htdocs.

The directory directive ends with a line that looks like:

</Directory>

Inside a Directory directive you can place Options and Limit directives.

Options can be one of the following:

The Limit directive allows you to limit who can get what from this directory. This can be used to limit which IP addresses have access to this directory.

The Limit directive must start with a line like:

<Limit [access]>

[access] can be GET, PUT, POST, or DELETE.

The limit directive must end with:

</Limit>

Limit directives can contain:

XBITHACK tells Apache that text/html files with the owner execute bit set should be server parsed. To use XBITHACK you must have compiled with the -DXBITHACK flag set. There are three different options to XBITHACK:

AllowOverride tells Apache which options can be overridden by the .htaccess file. It uses the same names as Options. It can be used with the AuthConfig option to allow authentication files and methods to be overridden on a per-directory basis.
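To see how the pieces of a Directory directive fit together, the following sketch prints a complete section that you can review and then add to access.conf by hand. The function name directory_block is hypothetical. The Options values and the order, deny, and allow lines are standard Apache access directives, but they are chosen here purely for illustration and are not taken from this chapter.

```shell
# directory_block DIR DOMAIN
# Hypothetical helper: print a <Directory> section for DIR in which only
# hosts in DOMAIN may issue GET requests. Nothing is written to disk.
directory_block() {
    cat <<EOF
<Directory $1>
Options Indexes FollowSymLinks
AllowOverride AuthConfig
<Limit GET>
order deny,allow
deny from all
allow from $2
</Limit>
</Directory>
EOF
}
```

For example, directory_block /usr/local/etc/httpd/htdocs .company.com prints a section for the default document directory. Note that the Limit directive is closed before the Directory directive that contains it.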

httpd.conf

The httpd.conf file describes the server process and other global parameters. It has directives for:

NOTE
Virtual hosting is used by many ISPs to allow companies who don't have a WWW server to have a virtual one. This would allow one machine to be configured with multiple IP addresses and multiple Apache servers running and acting as though they were separate machines.
Here is a sample VirtualHost directive:
<VirtualHost virtual.company.com>
ServerAdmin webmaster@virtual.company.com
DocumentRoot /virtual/httpd/htdocs
ServerName virtual.company.com
</VirtualHost>
The first line tells Apache that this directive applies to virtual.company.com. This is the address on which the request came in (see BindAddress). The ServerAdmin for this server is webmaster@virtual.company.com, the DocumentRoot is /virtual/httpd/htdocs, and the ServerName is virtual.company.com.

srm.conf

The srm.conf file is used to tell Apache how to handle requests. It defines such things as user HTTP directories, icon definitions, and languages.

Some of the directives in srm.conf are:

AddHandler footer-action html
Action footer-action /cgi-bin/footer

Mime-Types

The mime-types file maps extensions to file types. These file types are then used on the client to assign a viewer to be used. There are many "standard" mime-types and you may never need to add a new one. If you do want a custom mime-type, it is very easy to add one to the mime-types file.

The format for the mime-types file is simple:

File-type    file-extension1 file-extension2

Adding a new mime-type is as simple as adding a line to mime-types and restarting the server. For example to add a mime-type called "foo" that deals with files ending with ".bar" you would add the following line to mime-types:

application/foo   bar

You would also need to configure the client software to start an external viewer to view the .bar files. This is done differently on different browsers but most are either controlled by a GUI or a mime-types file.
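Before adding a line to mime-types it is worth checking that the extension is not already mapped to another type. The sketch below shows one way to do that with awk. Both function names are hypothetical, and the code assumes that comment lines in the file begin with #.

```shell
# mime_type_for FILE EXT
# Print the file type that the mime-types file FILE maps EXT to, if any.
mime_type_for() {
    awk -v ext="$2" '
        /^#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++)        # fields 2..NF are the extensions
              if ($i == ext) { print $1; exit } }
    ' "$1"
}

# add_mime_type FILE TYPE EXT
# Append "TYPE EXT" to FILE unless EXT is already mapped to some type.
add_mime_type() {
    existing=$(mime_type_for "$1" "$3")
    if [ -n "$existing" ]; then
        echo "$3 is already mapped to $existing" >&2
        return 1
    fi
    printf '%s\t%s\n' "$2" "$3" >> "$1"
}
```

For the example above you would run add_mime_type mime-types application/foo bar and then restart the server.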

Running Apache

Once Apache is installed and configured, there is not much to do to keep it running. It must be started when the machine boots, the log files should be checked, and any problems should be diagnosed and fixed.

This section will cover these topics and help to give you an overview of how to run a Web server.

Starting Apache

Most systems can start programs automatically when the machine starts up. In UNIX there are three basic ways: from inittab, from RC scripts, and via inetd.

These ways may not all be available on your version of UNIX but at least one of them should be.

The first way we will discuss is starting the server via inittab. Inittab is a file that init uses to start programs at certain run levels. The most common levels are 1 or S for single user mode, 0 for halted, and 2 for multiuser mode. Init is the first program started and always has the process id of 1.

The inittab file normally consists of lines made up of:

id:rstate:action:process

id is a unique identifier. rstate is the run level that the process should run in. If a process is not defined for a run level, it is terminated when init enters that level. A process can be defined to run in multiple run levels by adding another level to rstate. Multiple run levels are written together with no separator, for example 23.

The action field tells init how to run the process. Common action fields are:

The process is the program and arguments that should be run.

To start Apache from inittab we need to add a line for run level 2. We can use the respawn or once action. A normal inittab line would look like:

as:2:once:/usr/local/etc/httpd/httpd
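Because every id in inittab must be unique, it helps to check for a clash before appending a new line. The following is a minimal sketch of that check. The name add_inittab_entry is hypothetical, and the example assumes the httpd binary lives directly under the ServerRoot, as in the inetd.conf line later in this chapter. Try it on a copy of your inittab first.

```shell
# add_inittab_entry FILE ENTRY
# Hypothetical helper: append ENTRY (id:rstate:action:process) to FILE,
# refusing if the id field is already used by another line.
add_inittab_entry() {
    id=${2%%:*}                               # text before the first colon
    if grep "^$id:" "$1" >/dev/null 2>&1; then
        echo "inittab id '$id' is already in use" >&2
        return 1
    fi
    printf '%s\n' "$2" >> "$1"
}
```

For example, add_inittab_entry inittab.copy 'as:2:once:/usr/local/etc/httpd/httpd' appends the entry only if no other line already uses the id as.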

It is also possible to start Apache from RC scripts. These are commonly located in /etc/rc.local or in a separate directory under /etc, such as /etc/rc2.d.

These scripts are run at bootup. To start Apache using RC scripts like /etc/rc.local you would simply add a line at the end of the file such as:

/usr/local/etc/httpd/httpd

If your system has separate directories, you need to create a start script. The directories are named using the syntax rc#.d, where # is replaced with the run level. In the case of Apache the run level directory would be /etc/rc2.d. The script names usually begin with S and a number, followed by a name. One example would be S99Apache. The scripts are run in numerical order, so you can place the script wherever you need to in the startup process.

To start Apache with a script you would create a file called /etc/rc2.d/S99Apache. In it would be:

#!/bin/sh
/usr/local/etc/httpd/httpd
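On many System V style systems the rc scripts are called with an argument such as start or stop, so a slightly fuller script can handle both. The sketch below is one possible shape, not a script shipped with Apache. The function name apache_rc is hypothetical, and both paths are assumptions: the httpd binary directly under the ServerRoot, and a process id file named httpd.pid in ServerRoot/logs.

```shell
# Assumed locations. Adjust both for your system.
HTTPD=${HTTPD:-/usr/local/etc/httpd/httpd}
PIDFILE=${PIDFILE:-/usr/local/etc/httpd/logs/httpd.pid}

# apache_rc ACTION
# Start or stop the server depending on ACTION.
apache_rc() {
    case "$1" in
        start)
            [ -x "$HTTPD" ] || { echo "$HTTPD not found" >&2; return 1; }
            "$HTTPD"
            ;;
        stop)
            [ -f "$PIDFILE" ] || { echo "no pid file, is httpd running?" >&2; return 1; }
            kill "$(cat "$PIDFILE")"          # ask the parent httpd to exit
            ;;
        *)
            echo "usage: apache_rc start|stop" >&2
            return 1
            ;;
    esac
}
```

To use this as a start script, put #!/bin/sh on the first line and add a final line that calls apache_rc "$1".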

The last way to start Apache automatically is via inetd. Inetd is a process that listens on network ports. When it gets a connection, it starts the correct server. Inetd reads a configuration file called inetd.conf, which is usually in /etc or /etc/inet.

The lines in inetd.conf consist of comments or server lines. The server line has a number of fields that are separated by spaces or tabs. They may vary from UNIX to UNIX but they are usually:

To get Apache to run from inetd we also need to look at the services file. This is usually in /etc or /etc/inet. It consists of lines of the form:

service-name  port/protocol   aliases

Service name is the name of the service. This is used by inetd and must be spelled the same in both the inetd.conf file and the services file. Port/protocol is the port number and the protocol, for example 80/tcp. Aliases are other names the service might be known as.

In the services file we need to add a line that looks like:

httpd    80/tcp    

In inetd.conf we need to add the following line:

httpd   stream  tcp  nowait  nobody /usr/local/etc/httpd/httpd httpd

NOTE
Don't forget to change the httpd.conf file in ServerRoot/conf. The ServerType directive must be changed to inetd.
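Since the service name must be spelled the same in both files, a quick check can catch a mismatch before inetd is restarted. This is a minimal sketch: the function name check_inetd_service is hypothetical, and it simply looks for the name as the first field of a line in each file, so commented-out lines do not count.

```shell
# check_inetd_service NAME SERVICES_FILE INETD_CONF
# Succeed only if NAME is the first field of some line in both files.
check_inetd_service() {
    for f in "$2" "$3"; do
        if ! awk -v n="$1" '$1 == n { found = 1 } END { exit !found }' "$f"; then
            echo "service '$1' not found in $f" >&2
            return 1
        fi
    done
}
```

For example, check_inetd_service httpd /etc/services /etc/inetd.conf prints nothing and succeeds when both entries are in place.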

Troubleshooting Apache

Occasionally, Apache will have problems running. These problems are often related to other processes, and rebooting the machine eliminates them. Other times, though, rebooting either is not possible or simply doesn't work. This section will cover some common error messages and how to fix them.

These are, of course, just the more common error messages. It is possible to have other messages that aren't listed. If so, you need to try to narrow down the problem.

Checking to see if the server is compiled okay is a good first step to narrowing down the problem. If httpd -v works, it is probably compiled okay. It is also very common for the network to have problems. Telnetting to the port from the localhost eliminates the network. If it appears to be compiled okay and eliminating the network doesn't fix the problem, then it is most likely a configuration problem.

NOTE
It is fairly uncommon for a binary with a compiler problem to build without errors and run properly with the -v flag, but it does happen. If you can't find the problem, you may have a bug. Bugs or suggestions can be submitted to the Apache Group at apache-bugs@mail.apache.org.
If you aren't sure if you have a bug or just a problem, you should check out the Usenet newsgroup comp.infosystems.www.servers.unix. Many people on this newsgroup are very helpful and knowledgeable.

File Pruning

Apache logs connections and error messages. These log files continue to grow until they are removed or truncated. Simply removing or renaming a file isn't enough, though, since Apache references its log files by inode. After you move a file it may appear under a different name in a listing, but it isn't really closed until the server restarts, and Apache will happily continue writing to the moved file. If you truncate the file, you will find that Apache keeps writing at the offset it had reached and simply fills the file with blanks up to the size of the original log file. There is an easy way to truncate log files though.
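One common approach, offered here as a sketch and not necessarily the method the author had in mind, is to rename the log file and then signal the server to restart so that it opens a fresh file. The function name rotate_log is hypothetical, and the example assumes that the server's process id is stored in a file such as ServerRoot/logs/httpd.pid and that the server restarts and reopens its logs when it receives a HUP signal.

```shell
# rotate_log LOGFILE PIDFILE
# Hypothetical helper: rename LOGFILE to LOGFILE.old, then send HUP to the
# process whose id is stored in PIDFILE so that it reopens its log files.
rotate_log() {
    [ -f "$1" ] || { echo "$1 does not exist" >&2; return 1; }
    [ -f "$2" ] || { echo "$2 does not exist" >&2; return 1; }
    mv "$1" "$1.old" || return 1       # the server still writes to the old inode
    kill -HUP "$(cat "$2")"            # restart, so a new log file is opened
}
```

For example, rotate_log /usr/local/etc/httpd/logs/access_log /usr/local/etc/httpd/logs/httpd.pid. Once the server has restarted, the .old file can be compressed or removed.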

In addition to the log files, if you are using Apache as a caching proxy server, the disk space used for cache needs to be maintained.

This is normally done automatically by Apache as long as you set the CacheMaxSize and CacheGC directives correctly.

CacheMaxSize tells the server how much space it can use. If you set CacheMaxSize too high, you will use all of your disk space for cache. If you set it too low, you will not have good cache performance. Experimentation is required to get an acceptable compromise of space and speed.

CacheGC tells the server how often to clean out the cache's old files. If CacheGC doesn't run often enough, you will have a stale cache. If it runs too often, you may have performance problems. It is important to experiment and find the best value for this parameter.