Chapter 11

Gateways




Several different types of network services are available on the Internet, ranging from e-mail to database lookups to the World Wide Web. The ability to use one service to access other services is sometimes convenient. For example, you might want to send e-mail or post to USENET news from your Web browser. You might also want to do a WAIS search and have the results sent to your Web browser.

A gateway is a link between these various services. Think of a gateway between two different pastures: one representing one service and the other representing another. In order to access one service through another, you need to go through the gateway (see Figure 11.1).

Figure 11.1 : A gateway.

Very often, your CGI programs act as gateways between the World Wide Web and other services. After all, CGI stands for Common Gateway Interface, and it was designed so that you could use the World Wide Web as an interface to other services and applications.

In this chapter, you see a couple of examples of gateway applications, beginning with a simple finger gateway. You learn how to take advantage of existing client applications within your CGI applications, and you learn the related security issues. You see an example of developing a gateway from scratch rather than using an existing application. Finally, you learn how to design a powerful e-mail gateway.

Using Existing Network Applications

Network applications all work in a similar fashion. You need to know how to do two things: connect to the service and communicate with it. The language that you use to communicate with the service is called the protocol. You have already seen one type of protocol in great detail: the Web's HTTP protocol, discussed in Chapter 8, "Client/Server Issues."

Most network services already have clients that know how to connect to the server properly and that understand the protocol. For example, any Web browser understands HTTP. If you want to get information from a Web server, you don't need to know the protocol. All you need to do is tell the browser what information you want, and the browser does all the communicating for you.

If you already have a suitable client for a service, you can easily write a Web gateway that gets input from the browser, runs the client program with that input, and sends the client's output back to the browser. A diagram of this process is in Figure 11.2.

Figure 11.2 : Using existing clients to create a Web gateway.

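Because the existing client does all the communicating for you, your CGI program only needs to do a few things:

  • Get the input from the browser.
  • Run the client program using that input.
  • Capture and parse the client's output, if necessary.
  • Send the output back to the browser.
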
The first and last steps are easy. You know how to get input from and send output to the browser using CGI. The middle two steps are slightly more challenging.

Running a Program Using C

Several ways exist to run a program from within another program; some of them are platform specific. In C, the standard function for running other programs is system() from stdlib.h. Its argument is a command string that is handed to the operating system's command processor, so exactly how that string is interpreted depends on the operating system. In the following examples, assume the UNIX platform, although the concepts apply generally to all platforms and languages.

On UNIX, system() accepts a single string containing the program and its command-line arguments, written exactly as you would type them on the command line. For example, if you wanted to write an application that printed the contents of your current directory, you could use system() to call the UNIX program /bin/ls. The program myls.c in Listing 11.1 does just that.


Listing 11.1. The myls.c program.
#include <stdlib.h>

int main()
{
  system("/bin/ls"); /* assumes ls resides in the /bin directory */
}

Tip
When you use system() or any other function that runs programs, remember to use the full pathname. This ensures that the program you intend is the one that runs, and it reduces the security risk by not depending on the PATH environment variable.

When the system() function is called on UNIX, the C program spawns a shell process (usually /bin/sh) and tells the shell to use the string as its command line. Although this is a simple and portable way to run programs, it carries extra overhead and some inherent risks on UNIX. When you use system(), you start a shell, which then runs the program, rather than running the program directly. Additionally, because UNIX shells interpret special characters (metacharacters) such as ;, |, and `, passing user input to system() can inadvertently allow the user to run any program he or she wishes. For more information about the risks of the system() call, see Chapter 9, "CGI Security."

Running programs directly in C on UNIX platforms is more complex and requires the exec() family of functions, declared in unistd.h. Each exec() function is described in Table 11.1.

Table 11.1. The exec() family.

Function   Description
execv()    The first argument is the path to the program. The second is a NULL-terminated array of pointers to the argument strings; the first element is usually the name of the program.
execl()    The first argument is the path to the program. The remaining arguments are the program's arguments, ending with a null pointer; the second argument is usually the name of the program.
execvp()   Same as execv(), except the first argument is the name of the program, and the function searches the PATH environment variable for that program.
execlp()   Same as execl(), except the first argument is the name of the program, and the function searches the PATH environment variable for that program.
execle()   Same as execl(), except it also passes an environment to the program. The environment array follows the null pointer that terminates the argument list.

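A successful exec() call replaces the calling program with the new one and never returns. To run a program directly under UNIX and still continue afterward, you therefore first create a new process for it with the fork() function. fork() returns 0 in the new process (known as the child) and the child's process ID in your program (the parent). The child then calls exec(), while the parent waits until the child has finished executing. You do this using the wait() function.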

Using fork(), exec(), and wait(), I rewrote myls.c, shown in Listing 11.2. The program is longer and more complex, but it is more efficient because it does not start a shell. If you do not understand this example, you might want to either read a book on UNIX system programming or just stick to the system() function, keeping its risks in mind.


Listing 11.2. The myls.c program (using exec()).
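#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
  pid_t pid;
  int status;

  if ((pid = fork()) < 0) {
    perror("fork");
    exit(1);
  }
  if (pid == 0) { /* child process */
    /* the argument list must end with a null pointer */
    execl("/bin/ls","ls",(char *)NULL);
    perror("execl"); /* reached only if execl() fails */
    _exit(1);        /* _exit() does not flush stdio buffers copied from the parent */
  }
  /* parent process: wait for the child to finish */
  while (wait(&status) != pid) ;
  return 0;
}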
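Listing 11.2 hard-codes its argument list with execl(). As a sketch of the same fork/exec/wait pattern in reusable form, the following hypothetical helper, run_program(), takes a NULL-terminated argument vector, runs it with execv() (so no shell is involved and PATH is not searched), waits for that particular child with waitpid(), and returns the child's exit status. The function name and the convention of exiting with 127 when execv() fails (borrowed from the shell's "command not found" status) are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at path with the NULL-terminated
   argument vector argv (argv[0] is conventionally the program's name),
   without involving a shell. Returns the program's exit status, 127 if
   execv() failed, or -1 if no child could be created or the child did
   not exit normally. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid, r;
    int status;

    fflush(stdout);                 /* don't hand unflushed output to the child */
    if ((pid = fork()) < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                 /* child process */
        execv(path, argv);          /* returns only if it fails */
        _exit(127);                 /* the status sh uses for "command not found" */
    }
    /* parent process: wait for this particular child, retrying on signals */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, declaring char *args[] = { "ls", "/", NULL }; and calling run_program("/bin/ls", args) does the same job as Listing 11.5.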

Parsing the Output in C

These programs print their output, unparsed, to stdout. Although most of the time this is satisfactory, sometimes you might want to parse the output. How do you capture the output of these programs?

Instead of using the system() function, you use the popen() function, which uses UNIX pipes (popen stands for pipe open). UNIX users will be familiar with the concept of the pipe. For example, if you had a program called dosomething that manipulates the output of the ls command, you could feed the output of ls to it with a pipe on the command line (| is the pipe symbol):

ls | dosomething

This command line takes the output of ls and feeds it into the input of dosomething.

The popen() function emulates the UNIX pipe from within a program. For example, if you wanted to pipe the output of the ls command to the parse_output() function, your code might look like the following:

FILE *output;

output = popen("/bin/ls","r");
parse_output(output);
pclose(output);

popen() works like system(), except that instead of sending the program's output to stdout, it connects that output to a pipe and returns a FILE pointer for reading it. You can then read from that file handle, parse the data, and print the parsed data to stdout yourself. The second argument of popen() determines whether you read from or write to the pipe: "r" reads the program's output, and "w" lets you write to the program's input. Close the pipe with pclose(), which also waits for the program to finish. Because popen() runs the command through a shell just like system(), it is also susceptible to the same security risks. Filter any user input for metacharacters before using it in a popen() command.
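Here is a complete sketch of the popen() fragment above. The parse_output() step is filled in with a hypothetical job that suits a Web gateway: it copies the program's output while converting the HTML special characters <, >, and & into entities, so that output placed inside <pre> cannot be mistaken for markup, and it returns the number of lines it read. The names parse_output() and run_and_parse() are illustrative only; the command string still passes through a shell, so it must not contain unescaped user input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical parse_output(): copies everything readable from in to out,
   escaping HTML special characters, and returns the number of newline
   characters seen. */
int parse_output(FILE *in, FILE *out)
{
    int c, lines = 0;

    while ((c = getc(in)) != EOF) {
        switch (c) {
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '&': fputs("&amp;", out); break;
        default:  putc(c, out); break;
        }
        if (c == '\n')
            lines++;
    }
    return lines;
}

/* Runs command through popen() (and therefore through /bin/sh), feeds its
   output to parse_output(), and returns the line count, or -1 if the pipe
   could not be opened. */
int run_and_parse(const char *command, FILE *out)
{
    FILE *output = popen(command, "r");
    int lines;

    if (output == NULL)
        return -1;
    lines = parse_output(output, out);
    pclose(output);                /* waits for the command to exit */
    return lines;
}
```

In a gateway you would call it as run_and_parse("/bin/ls", stdout).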

Because popen() suffers from the same problems as system(), you might sometimes prefer to use the pipe() function in conjunction with fork() and an exec() function. pipe() takes an array of two integers as its argument. If the call succeeds, the array holds two file descriptors: element 0 is the read end of the pipe, and element 1 is the write end. You must call pipe() before you fork and execute the program so that both processes share the pipe. Again, this process is complex. If you don't understand it, don't worry; you probably don't need to use it. A Perl version of this fork-and-pipe technique appears later in this chapter, in "Parsing the Output in Perl."

In each of these examples, output is buffered by default: the C standard I/O library collects output in a buffer and writes it to the underlying file descriptor in larger chunks. This is usually faster and more efficient than writing one byte at a time, but it can cause trouble when another program writes to the same place. If your CGI program prints its HTML header and then runs a program that writes directly to stdout, the header may still be sitting in your buffer when the other program's output arrives, so the two appear out of order. After a fork(), any unflushed data is also copied into the child process and may be written twice. To prevent this, flush your file handles before running another program. In C, you do this using the fflush() function, which writes out any data buffered for the given file handle. (If you want stdout to be unbuffered from the start, call setvbuf(stdout, NULL, _IONBF, 0) before writing anything.) For example, to flush stdout before calling system() or fork(), use the following call:

fflush(stdout);
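The effect of unflushed buffers on a child process can be demonstrated with a short sketch; the function name size_after_fork() is hypothetical. A line is written to a temporary file through a stdio FILE, and then the process forks. The child calls exit(), which flushes every stdio buffer it inherited. If the parent did not call fflush() before forking, both processes hold a copy of the same unwritten line, and the file ends up containing it twice. This is also why a child that fails to exec() is often made to call _exit(), which skips the stdio flush, rather than exit().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demonstration: writes "hello\n" (6 bytes) to a temporary
   file, optionally flushes it, forks, and lets the child exit with exit().
   Returns the final size of the file in bytes, or -1 on error. */
long size_after_fork(int flush_first)
{
    FILE *fp = tmpfile();
    pid_t pid;
    long size;

    if (fp == NULL)
        return -1;
    fflush(stdout);                 /* keep stdout out of the experiment */
    fprintf(fp, "hello\n");         /* sits in fp's buffer, not yet in the file */
    if (flush_first)
        fflush(fp);                 /* write it to the file before forking */
    if ((pid = fork()) < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0)
        exit(0);                    /* child: exit() flushes its copy of fp's buffer */
    waitpid(pid, NULL, 0);
    fflush(fp);                     /* parent: flush its own copy */
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);               /* file size in bytes */
    fclose(fp);
    return size;
}
```

With flush_first set, the file holds one copy of the line (6 bytes); without it, the same line is written by both processes (12 bytes).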

Running a Program Using Perl

The syntax for running a program within a Perl program is less complex than in C, but no less powerful. Perl also has a system() function, which usually works exactly like its C equivalent. myls.pl in Listing 11.3 demonstrates the Perl system() function.


Listing 11.3. The myls.pl program.
#!/usr/local/bin/perl

system("/bin/ls");

As you can see, the syntax is exactly like the C syntax. Perl's system() function, however, does not necessarily spawn a shell. If you pass system() a list of two or more arguments, Perl runs the program directly, which is equivalent to forking and executing the program in C. For example, Listing 11.4 shows the Perl code for listing the contents of the root directory, and Listing 11.5 shows the C equivalent.


Listing 11.4. The lsroot.pl program.
#!/usr/local/bin/perl

system "/bin/ls","/";


Listing 11.5. The lsroot.c program.
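#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
  pid_t pid;
  int status;

  if ((pid = fork()) < 0) {
    perror("fork");
    exit(1);
  }
  if (pid == 0) { /* child process */
    /* the argument list must end with a null pointer */
    execl("/bin/ls","ls","/",(char *)NULL);
    perror("execl"); /* reached only if execl() fails */
    _exit(1);
  }
  /* parent process: wait for the child to finish */
  while (wait(&status) != pid) ;
  return 0;
}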

You will find it considerably easier to obtain the efficiency and security of forking and then executing a program in Perl than in C. Note, however, that if you had used the following:

system("/bin/ls /");

instead of this:

system "/bin/ls","/";

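then Perl would have treated the string as a shell command line. If the string contains shell metacharacters, Perl passes it to a shell, exactly like the C system() call; if it contains none, Perl splits it into words and runs the program directly. Because you cannot predict what a user will type, pass the arguments as a list whenever user input is involved.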

Note
You can also run programs directly in Perl using fork() and exec(). The syntax is similar to the C syntax using fork() and the exec() functions. Perl, however, has only one exec() function; given a list of arguments, it behaves like C's execvp().
Used by itself, exec() is like system() except that it never returns: the new program replaces the running Perl script. If you pass all of the arguments to exec() as one string, the same rule as for system() applies, so the command may be run through a shell. To prevent exec() from spawning a shell, pass the arguments as a list, just as you would with system().

Parsing the Output in Perl

Capturing and parsing the output of programs in Perl is also simpler than in C. The easiest way to capture the output of another program in Perl is to call it using back ticks (`). Perl executes the command within the back ticks (through a shell, as with a single-string system() call) and returns the command's output. For example, the following runs /bin/ls and stores the output in the scalar $files:

$files = `/bin/ls`;

You can then parse $files or simply print it to stdout.

You can also use pipes in Perl using the open() function. If you want to pipe the output of a command (for example, ls) to a file handle, you would use the following:

open(OUTPUT,"ls|");

Similarly, you could pipe data into a program using the following:

open(PROGRAM,"|sort");

This syntax is equivalent to C's popen() function and suffers from similar problems. In order to read from a pipe without opening a shell, use

open(OUTPUT,"-|") || exec "/bin/ls";

To write to a pipe, use

open(PROGRAM,"|-") || exec "/usr/bin/sort";

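Make sure each argument for the program gets passed as a separate argument to exec(). These forms of open() fork a child process: in the parent, open() returns the child's process ID (a true value), while in the child it returns 0, so the exec() after || runs only in the child. The child's standard output (for "-|") or standard input (for "|-") is connected to the file handle in the parent.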

To unbuffer a file handle in Perl, use

select(FILEHANDLE); $| = 1;

For example, to unbuffer the stdout, you would do the following:

select(stdout); $| = 1;

Finger Gateway

Using the methods described in the preceding section, you can create a Web gateway using existing clients. Finger serves as a good example. Finger enables you to get certain information about a user on a system. Given a username and a hostname (in the form of an e-mail address), finger will contact the server and return information about that user if it is available.

The usage for the finger program on most UNIX systems is

finger username@hostname

For example, the following returns finger information about user eekim at the machine hcs.harvard.edu:

finger eekim@hcs.harvard.edu

You can write a Web-to-finger CGI application, as shown in Listings 11.6 (in C) and 11.7 (in Perl). The browser passes the username and hostname to the CGI program finger.cgi, which in turn runs the finger program. Because finger writes its results to stdout, which it inherits from finger.cgi, the output goes straight to the browser.

You want finger.cgi to be flexible. In other words, you should be able to specify the user and host in the URL, and you should also be able to receive that information from a form. Input for finger.cgi must be in the following form:

finger.cgi?who=username@hostname

If you use finger.cgi as the action parameter of a form, you must make sure you have a text field with the name who.
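When a form like this is submitted, the browser URL-encodes the field, so a request for eekim@hcs.harvard.edu actually arrives as who=eekim%40hcs.harvard.edu. The cgi-lib libraries used in this chapter decode such values for you. Purely to illustrate what that decoding involves, here is a minimal C sketch of a hypothetical url_decode() that turns + into a space and %XX into the corresponding byte, working in place; it is not the cgi-lib code.

```c
#include <ctype.h>

/* Value of one hexadecimal digit (the caller has checked isxdigit()). */
static int hexval(int c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decodes an application/x-www-form-urlencoded value
   in place. '+' becomes a space and %XX becomes the byte 0xXX; a '%' that
   is not followed by two hex digits is copied unchanged. Returns s. */
char *url_decode(char *s)
{
    char *src = s, *dst = s;

    while (*src) {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (src[0] == '%' && isxdigit((unsigned char)src[1])
                   && isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hexval((unsigned char)src[1]) * 16
                            + hexval((unsigned char)src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```

Because the decoded string is never longer than the encoded one, decoding in place is safe.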


Listing 11.6. The finger.cgi.c program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"

#define FINGER "/usr/bin/finger "

void print_form()
{
  html_begin("Finger Gateway");
  h1("Finger Gateway");
  printf("<form>\n");
  printf("Who? <input name=\"who\">\n");
  printf("</form>\n");
  html_end();
}

int main()
{
  char *command,*who;
  llist entries;

  html_header();
  if (read_cgi_input(&entries)) {
    if (cgi_val(entries,"who")) {
      who = newstr(escape_input(cgi_val(entries,"who")));
      html_begin("Finger results");
      printf("<pre>\n");
      command = malloc(strlen(FINGER) + strlen(who) + 1);
      strcpy(command,FINGER);
      strcat(command,who);
      fflush(stdout);
      system(command);
      printf("</pre>\n");
      html_end();
    }
    else
      print_form();
  }
  else
    print_form();
  list_clear(&entries);
}


Listing 11.7. The finger.cgi program (Perl).
#!/usr/local/bin/perl

require 'cgi-lib.pl';

select(stdout); $| = 1;
print &PrintHeader;
if (&ReadParse(*input)) {
    if ($input{'who'}) {
        print &HtmlTop("Finger results"),"<pre>\n";
        system "/usr/bin/finger",$input{'who'};
        print "</pre>\n",&HtmlBot;
    }
    else {
        &print_form;
    }
}
else {
    &print_form;
}

sub print_form {
   print &HtmlTop("Finger Gateway");
   print "<form>\n";
   print "Who? <input name=\"who\">\n";
   print "</form>\n";
   print &HtmlBot;
}

The C and Perl versions of finger.cgi are remarkably similar. Both parse the input, make sure their own output has been sent before finger writes (the C version calls fflush(), and the Perl version unbuffers stdout), and run finger. The two versions, however, differ in how they run the program. The C version uses the system() call, which spawns a shell and runs the command. Because it spawns a shell, it must escape all metacharacters before passing the input to system(); hence the call to escape_input(). In the Perl version, the arguments are passed to system() separately, so Perl runs finger directly without a shell. Consequently, no metacharacter filtering is necessary.

You can avoid filtering the input in the C version as well, if you avoid the system() call. Listing 11.8 lists a version of finger.cgi.c that uses execl() instead of system(). Notice that in this version of finger.cgi.c, you no longer need escape_input() because no shell is spawned.


Listing 11.8. The finger.cgi.c program (without spawning a shell).
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"

#define FINGER "/usr/bin/finger"

void print_form()
{
  html_begin("Finger Gateway");
  h1("Finger Gateway");
  printf("<form>\n");
  printf("Who? <input name=\"who\">\n");
  printf("</form>\n");
  html_end();
}

int main()
{
  char *who;
  llist entries;
  pid_t pid;
  int status;

  html_header();
  if (read_cgi_input(&entries)) {
    if (cgi_val(entries,"who")) {
      who = newstr(cgi_val(entries,"who"));
      html_begin("Finger results");
      printf("<pre>\n");
      fflush(stdout); /* send our output before finger writes its own */
      if ((pid = fork()) < 0) {
        perror("fork");
        exit(1);
      }
      if (pid == 0) { /* child process */
        /* who is passed as a single argument; no shell ever sees it */
        execl(FINGER,"finger",who,(char *)NULL);
        _exit(1);
      }
      /* parent process */
      while (wait(&status) != pid) ;
      printf("</pre>\n");
      html_end();
    }
    else
      print_form();
  }
  else
    print_form();
  list_clear(&entries);
}

For a variety of reasons, you might want to parse the output before sending it to the browser. Perhaps, for example, you want to surround e-mail addresses and URLs with <a href> tags. The Perl version of finger.cgi in Listing 11.9 has been modified to pipe the output to a file handle. If you want to, you can then parse the data from the file handle before sending it to the output.


Listing 11.9. The finger.cgi program (Perl using pipes).
#!/usr/local/bin/perl

require 'cgi-lib.pl';

select(stdout); $| = 1;
print &PrintHeader;
if (&ReadParse(*input)) {
    if ($input{'who'}) {
        print &HtmlTop("Finger results"),"<pre>\n";
        open(FINGER,"-|") || exec "/usr/bin/finger",$input{'who'};
        while (<FINGER>) {
            print;
        }
        print "</pre>\n",&HtmlBot;
    }
    else {
        &print_form;
    }
}
else {
    &print_form;
}

sub print_form {
   print &HtmlTop("Finger Gateway");
   print "<form>\n";
   print "Who? <input name=\"who\">\n";
   print "</form>\n";
   print &HtmlBot;
}

Security

It is extremely important to consider security when you write gateway applications. Two specific security risks exist that you need to avoid. First, as previously stated, avoid spawning a shell if possible. If you cannot avoid spawning a shell, make sure you escape any non-alphanumeric characters (metacharacters). You do this by preceding the metacharacter with a backslash (\).
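The following hypothetical escape_shell_input() is a sketch of that rule in C. It is not the escape_input() from the string-lib library used in this chapter, whose exact behavior isn't shown here; it simply returns a copy of the input in which every character that is not a letter or digit is preceded by a backslash. Avoiding the shell altogether, as in Listing 11.8, remains the safer choice.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of input in which
   every non-alphanumeric character is preceded by a backslash, or NULL if
   memory runs out. The caller must free() the result. Note that sh treats
   a backslash before a newline as a line continuation and removes both,
   so an embedded newline is dropped rather than starting a new command. */
char *escape_shell_input(const char *input)
{
    size_t len = strlen(input);
    char *out = malloc(2 * len + 1);   /* worst case: every character escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *input; input++) {
        if (!isalnum((unsigned char)*input))
            *p++ = '\\';
        *p++ = *input;
    }
    *p = '\0';
    return out;
}
```

For example, the input x; rm -rf / comes back as x\;\ rm\ \-rf\ \/, which the shell reads as one harmless word.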

Note also that a Web gateway can circumvent access restrictions. For example, suppose your school, school.edu, only allowed people to finger from within the school. If you set up a finger gateway running on www.school.edu, then anyone outside the school could finger machines within the school. Because the gateway runs the finger program from inside school.edu, the finger servers accept its requests, and the gateway sends the output to anyone who asks, including people outside school.edu.

If you want to maintain access restrictions, you need to build the same restrictions into your CGI program. You can use the REMOTE_ADDR and REMOTE_HOST environment variables to determine where the browser is connecting from.
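A minimal sketch of such a check in C follows, assuming the server sets REMOTE_ADDR to a dotted-quad IPv4 address and that the permitted network can be described by an address prefix. The function names and the 192.0.2. prefix (a range reserved for documentation) are hypothetical; a real school network would use its own addresses. The sketch does not rely on REMOTE_HOST, which a server may leave unset or fill with the bare IP address when it does not perform reverse DNS lookups.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr (a dotted-quad IPv4 string such
   as the value of REMOTE_ADDR) begins with allowed_prefix, else 0. End the
   prefix with a dot ("192.0.2.") so that, for example, "10.1." does not
   also match addresses beginning with "10.10". */
int addr_allowed(const char *addr, const char *allowed_prefix)
{
    if (addr == NULL)
        return 0;
    return strncmp(addr, allowed_prefix, strlen(allowed_prefix)) == 0;
}

/* Checks the address of the connecting browser, as reported by the
   Web server in the REMOTE_ADDR environment variable. */
int client_allowed(const char *allowed_prefix)
{
    return addr_allowed(getenv("REMOTE_ADDR"), allowed_prefix);
}
```

A gateway such as finger.cgi could call client_allowed("192.0.2.") right after printing the header, and print an error page instead of running finger when it returns 0.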

True Client/Server Gateways

If you do not already have an adequate client for a network service, or if you want to avoid the overhead of running a separate client program, you can implement the appropriate protocol within your CGI application. This way, your CGI gateway talks directly to the network service (see Figure 11.3) rather than calling another program that communicates with the service.

Figure 11.3 : A gateway that talks directly to the network service.

Although this way has an efficiency advantage, your programs are longer and more complex, which means longer development time. Additionally, you generally duplicate the work in the already existing client that handles the network connections and communication for you.

If you do decide to write a gateway client from scratch, you first need to find the protocol specification. You can get most of the Internet network protocols via ftp at ds.internic.net. A nice Web front end to various Internet protocols and RFCs exists at <URL:http://www.cis.ohio-state.edu/hypertext/information/rfc.html>.

Network Programming

To write any direct gateways, you need to know some basic network programming. This section briefly describes network client programming on UNIX using Berkeley sockets. The information in this section is not meant to serve as a comprehensive tutorial to network programming; you should refer to other sources for more information.

TCP/IP (Internet) network communication on UNIX is performed using something called a socket (or a Berkeley socket). As far as the programmer is concerned, the socket works the same as a file handle (although internally, a socket is very different from a file handle).

Before you can do any network communication, you must open a socket using the socket() function (in both C and Perl). In C, socket() takes three arguments (a domain, a socket type, and a protocol) and returns a file descriptor; Perl's socket() takes a file handle as an additional first argument and returns true on success. The domain, also called the address family, tells the operating system how to interpret network addresses. Because you are doing Internet programming, you use the domain AF_INET, defined in the header file <sys/socket.h> (usually /usr/include/sys/socket.h).

The socket type is either SOCK_STREAM or SOCK_DGRAM. You will almost certainly use SOCK_STREAM, which guarantees reliable, in-order delivery of data to the server. Network services such as the World Wide Web, ftp, gopher, and e-mail use SOCK_STREAM. SOCK_DGRAM sends data in datagrams, small packets of information that are not guaranteed to be delivered, or to be delivered in order. The Network File System (NFS) is an example of a protocol that traditionally uses SOCK_DGRAM.

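Finally, the protocol argument selects the transport-layer protocol. Because you are using a stream socket over TCP/IP, the protocol is TCP. You can look up its number with getprotobyname("tcp"), as Listing 11.10 does in Perl, or pass 0 to let the system choose the default protocol for the socket type.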

Note
AF_INET, SOCK_STREAM, and SOCK_DGRAM are defined in <sys/socket.h>. In Perl, these values are not defined unless you have converted your C headers into Perl headers using the h2ph utility (in Perl 5, the standard Socket module also exports them). The following values will work for almost any UNIX system:
  • AF_INET: 2
  • SOCK_STREAM: 1 (2 if using Solaris)
  • SOCK_DGRAM: 2 (1 if using Solaris)
Solaris users should note that the values for SOCK_STREAM and SOCK_DGRAM are reversed.

After you create a socket, your client tries to connect to a server through that socket. It uses the connect() function to do so (again, this process works in both Perl and C). In order for connect() to work properly, it needs to know the socket, the IP address of the server, and the port to which to connect.
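In C, the socket() and connect() steps can be sketched as follows. This sketch swaps in getaddrinfo(), the POSIX successor to gethostbyname() and getservbyname(), to turn the host and service names into a ready-to-use address; it also supplies the domain (AF_INET), socket type (SOCK_STREAM), and protocol for the socket() call. The function name tcp_connect() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens a TCP connection to host on port (a service
   name such as "finger" or a number such as "79"). Returns a connected
   socket descriptor, or -1 on failure. */
int tcp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4, as in the text */
    hints.ai_socktype = SOCK_STREAM;  /* reliable, ordered byte stream (TCP) */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    /* Try each address the resolver returned until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The returned descriptor can be read and written like a file descriptor, and closed with close() when the conversation is over.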

A Direct Finger Gateway

In order to demonstrate network programming, this chapter shows finger.cgi programmed to do a direct network connection. This example appears in Perl; the C equivalent works in a similar way. Once again, check a book on network programming for more information.

In order to modify finger.cgi into a direct finger gateway, you need to change three things. First, you need to initialize various network variables. Second, you need to split the value of who, which has the form of an e-mail address, into a separate username and hostname. Finally, you need to create the socket, make the network connection, and communicate directly with the finger server. Listings 11.10 and 11.11 show the code for the first two tasks.


Listing 11.10. Initialize network variables.
$AF_INET = 2;
$SOCK_STREAM = 1; # Use 2 if using Solaris
$sockaddr = 'S n a4 x8';
$proto = (getprotobyname('tcp'))[2];
$port = (getservbyname('finger', 'tcp'))[2];


Listing 11.11. Separate the username and hostname and determine IP address from hostname.
($username,$hostname) = split(/@/,$input{'who'});
$hostname = $ENV{'SERVER_NAME'} unless $hostname;
$ipaddr = (gethostbyname($hostname))[4];
if (!$ipaddr) {
    print "Invalid hostname.\n";
}
else {
    &do_finger($username,$ipaddr);
}
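For comparison, the splitting step of Listing 11.11 might look like this in C. The function split_who() and its buffer-size parameters are hypothetical; the default host is passed in rather than read from SERVER_NAME so that the function can be tested on its own, and resolving the hostname remains a separate step. One small difference: for an input such as a@b@c, Perl's split() in Listing 11.11 keeps only b as the host, whereas this version splits at the first @ and keeps b@c.

```c
#include <string.h>

/* Hypothetical helper: splits who ("user@host") at the first '@' into user
   and host, each copied into a caller-supplied buffer. If who contains no
   '@', or nothing follows it, host is set to default_host (dfinger.cgi uses
   SERVER_NAME for this). Returns 0 on success, or -1 if there is no host
   or a part does not fit in its buffer. */
int split_who(const char *who, const char *default_host,
              char *user, size_t user_len, char *host, size_t host_len)
{
    const char *at = strchr(who, '@');
    size_t n = at != NULL ? (size_t)(at - who) : strlen(who);
    /* Use the default when there is no '@' or nothing follows it. */
    const char *h = (at != NULL && at[1] != '\0') ? at + 1 : default_host;

    if (h == NULL || n >= user_len || strlen(h) >= host_len)
        return -1;
    memcpy(user, who, n);
    user[n] = '\0';
    strcpy(host, h);
    return 0;
}
```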

Communicating directly with the finger server requires understanding the finger protocol. The finger server normally listens on TCP port 79. After a client connects, the server expects the username followed by a CRLF (a carriage return and a line feed). When it has the username, the server looks up information about that user, sends it to the client over the socket, and closes the connection.
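The same exchange can be sketched in C on top of an already-connected socket descriptor, for instance one obtained with socket() and connect() as described earlier. The function name finger_exchange() is hypothetical; the protocol steps are the ones just described: send the username and CRLF, then read until the server closes the connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: carries out one finger query on fd, an already
   connected socket. Sends the username followed by CRLF, then copies
   everything the server sends to out until the server closes the
   connection. Returns 0 on success, -1 on an error. */
int finger_exchange(int fd, const char *username, FILE *out)
{
    char buf[512];
    const char *p = buf;
    size_t len;
    ssize_t n;
    int r;

    /* The request is just the username followed by CRLF. */
    r = snprintf(buf, sizeof buf, "%s\r\n", username);
    if (r < 0 || (size_t)r >= sizeof buf)
        return -1;                       /* username too long for the buffer */
    len = (size_t)r;
    while (len > 0) {                    /* write() may send only part of it */
        n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    /* The reply ends when the server closes the connection (read() returns 0). */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, out);
    return n < 0 ? -1 : 0;
}
```

In a gateway, out would be stdout, just as the Perl &do_finger prints each line it reads.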

Tip
You can communicate directly with the finger server using the telnet command. Suppose you want to finger ed@gunther.org:
% telnet gunther.org 79
Trying 0.0.0.0...
Connected to gunther.org
Escape character is '^]'.
ed
After you press Enter, the finger information is displayed.

The code for connecting to and communicating with the finger server appears in the &do_finger function, listed in Listing 11.12.


Listing 11.12. The &do_finger function.
sub do_finger {
    local($username,$ipaddr) = @_;

    $them = pack($sockaddr, $AF_INET, $port, $ipaddr);
    # get socket
    socket(FINGER, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
    # make connection
    if (!connect(FINGER,$them)) {
        die "connect: $!";
    }
    # unbuffer output
    select(FINGER); $| = 1; select(stdout);
    print FINGER "$username\r\n";
    while (<FINGER>) {
        print;
    }
}

The completed program, dfinger.cgi, appears in Listing 11.13. Although this program is more efficient than the older version (finger.cgi), you can see that it is also more complex, and the extra complexity might not be worth the small gain in efficiency. For larger client/server gateways, however, you might see a noticeable advantage to making a direct connection instead of running an existing client from the gateway.


Listing 11.13. The dfinger.cgi program (Perl).
#!/usr/local/bin/perl

require 'cgi-lib.pl';

# initialize network variables
$AF_INET = 2;
$SOCK_STREAM = 1; # Use 2 if using Solaris
$sockaddr = 'S n a4 x8';
$proto = (getprotobyname('tcp'))[2];
$port = (getservbyname('finger', 'tcp'))[2];

# unbuffer output
select(stdout); $| = 1;

# begin main
print &PrintHeader;
if (&ReadParse(*input)) {
    if ($input{'who'}) {
        print &HtmlTop("Finger results"),"<pre>\n";
        ($username,$hostname) = split(/@/,$input{'who'});
        $hostname = $ENV{'SERVER_NAME'} unless $hostname;
        $ipaddr = (gethostbyname($hostname))[4];
        if (!$ipaddr) {
            print "Invalid hostname.\n";
        }
        else {
            &do_finger($username,$ipaddr);
        }
        print "</pre>\n",&HtmlBot;
    }
    else {
        &print_form;
    }
}
else {
    &print_form;
}

sub print_form {
    print &HtmlTop("Finger Gateway");
    print "<form>\n";
    print "Who? <input name=\"who\">\n";
    print "</form>\n";
    print &HtmlBot;
}

sub do_finger {
    local($username,$ipaddr) = @_;

    $them = pack($sockaddr, $AF_INET, $port, $ipaddr);
    # get socket
    socket(FINGER, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
    # make connection
    if (!connect(FINGER,$them)) {
        die "connect: $!";
    }
    # unbuffer output
    select(FINGER); $| = 1; select(stdout);
    print FINGER "$username\r\n";
    while (<FINGER>) {
        print;
    }
}

E-Mail Gateway

This chapter ends with examples of a very common gateway found on the World Wide Web: a Web to e-mail gateway. The idea is that you can take the content of a form and e-mail it to the specified location using this gateway.

Many current browsers have built-in e-mail capabilities that enable users to send e-mail to anyone, anywhere, from the browser. Clicking a link such as the following causes the browser to run a mail client that sends a message to the recipient specified in the mailto: URL of the <a href> tag:

<a href="mailto:eekim@hcs.harvard.edu">E-mail me</a>

Why does anyone need a Web to e-mail gateway if most browsers can act as e-mail clients?

An e-mail gateway can give you considerably more control than built-in mail clients and mailto references. For example, you could force all e-mail to have the same format by using a fill-out form and a custom mail gateway. This becomes useful if you are collecting information for later parsing, such as the results of a poll. If people e-mailed their answers in all sorts of different formats, parsing them would be extremely difficult.

This section shows the development of a rudimentary mail gateway in C. This gateway requires certain fields, such as to, and uses an authentication file to limit the potential recipients of e-mail from this gateway. Next, you see form.cgi, the generic form-parsing CGI application developed in Chapter 10, "Basic Applications," extended to support e-mail.

A Simple Mail Program (C)

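mail.cgi is a simple e-mail gateway with the following specifications:

  • It reads the form fields to, name, email, subject, and content, and ignores any other fields.
  • If the to field is empty, it sends the message to the Web administrator (WEBADMIN).
  • Otherwise, the recipient in to must be listed in a central authentication file.
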
As you can see, mail.cgi is fairly inflexible, but it serves its purpose adequately. It ignores any fields other than those specified, so you could not include a poll on your HTML form; that information would simply be ignored by mail.cgi. Apart from the authentication file, this CGI program is essentially equivalent to a mailto reference.

Why use an authentication file? Mail sent through this gateway is easily forged. Because the CGI program has no way of knowing the identity of the user, it asks the user to supply that information, and the user could easily supply false information. To prevent people from using this gateway to send forged e-mail to anyone on the Internet, mail.cgi sends e-mail only to addresses listed in a central authentication file maintained by the server administrator. As an added protection against forged e-mail, mail.cgi adds an X-Sender mail header that says this e-mail was sent using this gateway.

The authentication file contains valid e-mail recipients, one on each line. For example, your authentication file might look like this:

eekim@hcs.harvard.edu
president@whitehouse.gov

In this case, you could only use mail.cgi to send e-mail to me and the President.
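A line-matching check like this one has to cope with a few details of fgets(): the line ending (which may be \n or, for a file edited on DOS, \r\n) must be removed before comparing, the file's last line may have no newline at all, and a line longer than the buffer arrives in pieces. A common mistake is to remove the last character of every line unconditionally, which cuts off a real character when the final line has no trailing newline. The following sketch, built around a hypothetical is_authorized() function, strips only an actual line ending and skips over-long lines instead of matching fragments of them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if dest appears as a complete line in the
   open authentication file auth, 0 otherwise. */
int is_authorized(FILE *auth, const char *dest)
{
    char line[256];
    size_t len;
    int c;

    rewind(auth);                          /* always scan from the top */
    while (fgets(line, sizeof line, auth) != NULL) {
        len = strcspn(line, "\n");
        if (line[len] != '\n' && !feof(auth)) {
            /* Line too long for the buffer: skip the rest of it rather
               than comparing a fragment. */
            while ((c = getc(auth)) != EOF && c != '\n')
                ;
            continue;
        }
        line[len] = '\0';                  /* drop the newline, if any */
        if (len > 0 && line[len - 1] == '\r')
            line[len - 1] = '\0';          /* and a DOS-style carriage return */
        if (strcmp(line, dest) == 0)
            return 1;
    }
    return 0;
}
```

With the example file above saved without a final newline, president@whitehouse.gov is still accepted, and a truncated address such as president@whitehouse.go is not.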

Finally, you need to decide how to send the e-mail. A direct connection does not seem like a good solution: the Internet e-mail protocol can be fairly complex, and making direct connections to mail servers seems unnecessary. The sendmail program, a widely used mail transport agent, is up-to-date, fairly secure, and fairly easy to use. This example uses popen() to pipe the data into the sendmail program, which then delivers the message to the specified address.
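Once the pipe to sendmail is open, the program writes an ordinary mail message into it: header lines, a blank line, and then the body. Because values such as the e-mail address and subject come straight from the form, a value containing a newline could add header lines of the user's choosing. The following sketch is not the code of Listing 11.14; write_message(), put_header(), and the particular headers written are hypothetical. It replaces carriage returns and newlines in header values with spaces and makes sure the body ends with a newline.

```c
#include <stdio.h>
#include <string.h>

/* Writes "Name: value" and a newline, replacing any CR or LF in value with
   a space so that user input cannot start a new header line. */
static void put_header(FILE *mail, const char *name, const char *value)
{
    fprintf(mail, "%s: ", name);
    for (; *value; value++)
        putc((*value == '\r' || *value == '\n') ? ' ' : *value, mail);
    putc('\n', mail);
}

/* Hypothetical helper: writes a simple message (headers, a blank line,
   then the body) to mail, which would normally be the FILE pointer
   returned by popen() on a "/usr/lib/sendmail address" command line. */
void write_message(FILE *mail, const char *to, const char *email,
                   const char *subject, const char *content)
{
    size_t len = strlen(content);

    put_header(mail, "To", to);
    put_header(mail, "Reply-To", email);
    put_header(mail, "Subject", subject);
    putc('\n', mail);                      /* a blank line ends the headers */
    fputs(content, mail);
    if (len == 0 || content[len - 1] != '\n')
        putc('\n', mail);                  /* end the body with a newline */
}
```

After writing the message, the program would call pclose() on the pipe so that sendmail sees the end of its input.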

The code for mail.cgi appears in Listing 11.14. Note that even though this example uses popen(), it doesn't bother escaping the user input, because mail.cgi checks every user-supplied recipient address against the ones in the central authentication file; only an address from that file, or the hard-coded Web administrator's address (defined as WEBADMIN), ever reaches the sendmail command line. This assumes that neither the addresses in the central authentication file nor WEBADMIN contain shell metacharacters.


Listing 11.14. The mail.cgi.c program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"

#define WEBADMIN "web@somewhere.edu"
#define AUTH "/usr/local/etc/httpd/conf/mail.conf"

void NullForm()
{
  html_begin("Null Form Submitted");
  h1("Null Form Submitted");
  printf("You have sent an empty form. Please go back and fill out\n");
  printf("the form properly, or email <i>%s</i>\n",WEBADMIN);
  printf("if you are having difficulty.\n");
  html_end();
}

void authenticate(char *dest)
{
  FILE *access;
  char s[80];
  short FOUND = 0;

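  if ( (access = fopen(AUTH,"r")) != NULL) {
    /* compare each line of the authentication file with the recipient */
    while ( !FOUND && (fgets(s,80,access) != NULL) ) {
      s[strcspn(s,"\r\n")] = '\0';  /* strip the newline, if present */
      if (!strcmp(s,dest))
        FOUND = 1;
    }
    fclose(access);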
    if (!FOUND) {
      /* not authenticated */
      html_begin("Unauthorized Destination");
      h1("Unauthorized Destination");

      html_end();
      exit(1);
    }
  }
  else { /* access file not found */
    html_begin("Access file not found");
    h1("Access file not found");

    html_end();
    exit(1);
  }
}

int main()
{
  llist entries;
  FILE *mail;
  char command[256] = "/usr/lib/sendmail ";
  char *dest,*name,*email,*subject,*content;

  html_header();
  if (read_cgi_input(&entries)) {
    if ( !strcmp("",cgi_val(entries,"name")) &&
        !strcmp("",cgi_val(entries,"email")) &&
        !strcmp("",cgi_val(entries,"subject")) &&
        !strcmp("",cgi_val(entries,"content")) )
      NullForm();
    else {
      dest = newstr(cgi_val(entries,"to"));
      name = newstr(cgi_val(entries,"name"));
      email = newstr(cgi_val(entries,"email"));
      subject = newstr(cgi_val(entries,"subject"));
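      if (dest[0]=='\0')
        /* no recipient given: use a fresh copy of the default, since
           dest holds only one byte and cannot be strcpy()'d into */
        dest = newstr(WEBADMIN);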
      else
        authenticate(dest);
      /* no need to escape_input() on dest, since we assume there aren't
         insecure entries in the authentication file. */
      strcat(command,dest);
      mail = popen(command,"w");
      if (mail == NULL) {
        html_begin("System Error!");
        h1("System Error!");
        printf("Please mail %s and inform\n",WEBADMIN);
        printf("the web maintainers that the comments script is improperly\n");
        printf("configured. We apologize for the inconvenience<p>\n");
        printf("<hr>\r\nWeb page created on the fly by ");
        printf("<i>%s</i>.\n",WEBADMIN);
        html_end();
      }
      else {
        content = newstr(cgi_val(entries,"content"));
        fprintf(mail,"From: %s (%s)\n",email,name);
        fprintf(mail,"Subject: %s\n",subject);
        fprintf(mail,"To: %s\n",dest);
        fprintf(mail,"X-Sender: %s\n\n",WEBADMIN);
        fprintf(mail,"%s\n\n",content);
        pclose(mail);
        html_begin("Comment Submitted");
        h1("Comment Submitted");
        printf("You submitted the following comment:\r\n<pre>\r\n");
        printf("From: %s (%s)\n",email,name);
        printf("Subject: %s\n\n",subject);
        printf("%s\n</pre>\n",content);
        printf("Thanks again for your comments.<p>\n");
        printf("<hr>\nWeb page created on the fly by ");
        printf("<i>%s</i>.\n",WEBADMIN);
        html_end();
      }
    }
  }
  else {
    html_begin("Comment Form");
    h1("Comment Form");
    printf("<form method=POST>\n");
    printf("<input type=hidden name=\"to\" value=\"%s\">\n",WEBADMIN);
    printf("<p>Name: <input name=\"name\"><br>\n");
    printf("E-mail: <input name=\"email\"><br>\n");
    printf("Subject: <input name=\"subject\"></p>\n");
    printf("<p>Comments:<br>\n");
    printf("<textarea name=\"content\" rows=10 cols=70></textarea></p>\n");
    printf("<input type=submit value=\"Mail form\">\n");
    printf("</form>\n");
    html_end();
  }
  list_clear(&entries);
  return 0;
}

You might notice that the example uses statically allocated buffers for some values, such as the 256-character command string. This works only because you know the length of the command path (in this case, /usr/lib/sendmail) and assume that no authorized e-mail address is long enough to push the combined string past the limit. The example cheats on this step to save coding time; if you want to extend and generalize the program, you should allocate the command string dynamically.
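A minimal sketch of the dynamic version follows. It computes the exact length needed for the program path, a space, the address, and the terminating null character, and then builds the string with snprintf(). The function name build_command() is hypothetical; like the listing, it performs no escaping, so the address must already be trusted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for the fixed-size command buffer: returns a
   newly allocated "<program> <address>" string, or NULL if malloc()
   fails. The caller must free() the result. No escaping is done. */
char *build_command(const char *program, const char *dest)
{
    size_t len = strlen(program) + 1 + strlen(dest) + 1;  /* space + NUL */
    char *cmd = malloc(len);

    if (cmd != NULL)
        snprintf(cmd, len, "%s %s", program, dest);
    return cmd;
}
```

The caller owns the returned string: pass it to popen() and free() it once the pipe has been opened.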

Extending the Mail Program (Perl)

mail.cgi is not a tremendously useful gateway for most people, although it offers some nice features over a plain <a href="mailto:..."> link. The ideal would be a fully configurable mail program that could parse any form, send customized default forms, and format the e-mail in a customizable way.

These desires sound suspiciously like the specifications for form.cgi, the generic forms parser developed in Chapter 10. In fact, the only difference between the form.cgi program described earlier and the program described here is that the program described here sends the results via e-mail rather than saving them to a file.

Instead of writing a completely new program, you can use form.cgi as a foundation and extend it to support e-mail. This requires two changes. First, if the configuration file contains a MAILTO option, form.cgi e-mails the results to the address specified by MAILTO. Second, form.cgi returns an error only if neither a MAILTO nor an OUTPUT option is specified in the configuration file. The new form.cgi with e-mail support appears in Listing 11.15.
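The decision rule can be stated compactly. The following C sketch only illustrates that rule (the actual program is the Perl in Listing 11.15): it scans a configuration stream for non-empty MAILTO= and OUTPUT= lines and returns a bit mask, where 0 means form.cgi should report an error. The names delivery_actions(), ACT_MAIL, and ACT_FILE are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flags: what to do with a submitted form. */
enum { ACT_MAIL = 1, ACT_FILE = 2 };

/* Scans KEY=VALUE lines and returns ACT_MAIL and/or ACT_FILE; a
   return value of 0 means neither MAILTO nor OUTPUT is set. */
int delivery_actions(FILE *config)
{
    char line[256];
    int actions = 0;

    while (fgets(line, sizeof line, config) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* drop CR/LF */
        if (strncmp(line, "MAILTO=", 7) == 0 && line[7] != '\0')
            actions |= ACT_MAIL;
        else if (strncmp(line, "OUTPUT=", 7) == 0 && line[7] != '\0')
            actions |= ACT_FILE;
    }
    return actions;
}
```

As in the Perl version, an option that is present but empty (for example, MAILTO= with nothing after it) counts as unset.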


Listing 11.15. The form.cgi program (with mail support).
#!/usr/local/bin/perl

require 'cgi-lib.pl';

$global_config = '/usr/local/etc/httpd/conf/form.conf';
$sendmail = '/usr/lib/sendmail';

# parse config file
$config = $ENV{'PATH_INFO'};
if (!$config) {
    $config = $global_config;
}
open(CONFIG,$config) || &CgiDie("Could not open config file");
while ($line = <CONFIG>) {
    $line =~ s/[\r\n]+$//;    # strip the trailing newline (and any CR)
    if ($line =~ /^FORM=/) {
    ($form = $line) =~ s/^FORM=//;
    }
    elsif ($line =~ /^TEMPLATE=/) {
    ($template = $line) =~ s/^TEMPLATE=//;
    }
    elsif ($line =~ /^OUTPUT=/) {
    ($output = $line) =~ s/^OUTPUT=//;
    }
    elsif ($line =~ /^RESPONSE=/) {
    ($response = $line) =~ s/^RESPONSE=//;
    }
    elsif ($line =~ /^MAILTO=/) {
        ($mailto = $line) =~ s/^MAILTO=//;
    }
}
close(CONFIG);

# process input or send form
if (&ReadParse(*input)) {
    # read template into list
    if ($template) {
    open(TMPL,$template) || &CgiDie("Can't Open Template");
    @TEMPLATE = <TMPL>;
    close(TMPL);
    }
    else {
    &CgiDie("No template specified");
    }
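    if ($mailto) {
        $mail = 1;
        # fork; in the parent, MAIL writes to the child's standard input
        $pid = open(MAIL,"|-");
        &CgiDie("Can't fork sendmail") unless defined($pid);
        if ($pid == 0) {
            # child: run sendmail directly (no shell) with the recipient
            exec($sendmail,$mailto) || die "Can't exec $sendmail\n";
        }
        print MAIL "To: $mailto\n";
        print MAIL "From: $input{'email'} ($input{'name'})\n";
        # assumes the form may include an optional "subject" field
        print MAIL "Subject: $input{'subject'}\n" if ($input{'subject'});
        print MAIL "X-Sender: form.cgi\n\n";
        foreach $tmpl (@TEMPLATE) {
            # work on a copy so @TEMPLATE is unchanged for OUTPUT below
            $line = $tmpl;
            if ( ($line =~ /\$/) || ($line =~ /\%/) ) {
                # form variables
                $line =~ s/^\$(\w+)/$input{$1}/;
                $line =~ s/([^\\])\$(\w+)/$1$input{$2}/g;
                # environment variables
                $line =~ s/^\%(\w+)/$ENV{$1}/;
                $line =~ s/([^\\])\%(\w+)/$1$ENV{$2}/g;
            }
            print MAIL $line;
        }
        close(MAIL);
    }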
    else {
       $mail = 0;
    }
    # write to output file according to template
    if ($output) {
    open(OUTPUT,">>$output") || &CgiDie("Can't Append to $output");
    foreach $line (@TEMPLATE) {
        if ( ($line =~ /\$/) || ($line =~ /\%/) ) {
        # form variables
        $line =~ s/^\$(\w+)/$input{$1}/;
        $line =~ s/([^\\])\$(\w+)/$1$input{$2}/g;
        # environment variables
        $line =~ s/^\%(\w+)/$ENV{$1}/;
        $line =~ s/([^\\])\%(\w+)/$1$ENV{$2}/g;
        }
        print OUTPUT $line;
    }
    close(OUTPUT);
    }
    elsif (!$mail) {
    &CgiDie("No output file specified");
    }
    # send either specified response or dull response
    if ($response) {
    print "Location: $response\n\n";
    }
    else {
    print &PrintHeader,&HtmlTop("Form Submitted");
    print &HtmlBot;
    }
}
elsif ($form) {
    # send default form
    print "Location: $form\n\n";
}
else {
    &CgiDie("No default form specified");
}

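The changes to form.cgi are minor. All you need to add is a MAILTO case to the configuration-parsing loop, a few lines that feed the filled-in template to sendmail, and a check that reports an error only when neither MAILTO nor OUTPUT is set. Unlike mail.cgi, which uses popen() and therefore a shell, form.cgi forks and executes sendmail directly, so the address from the configuration file is never interpreted by a shell.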
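For comparison with mail.cgi, here is a minimal C sketch of what a Perl implicit-fork pipe open followed by exec does: create a pipe, fork, connect the read end to the child's standard input, and execvp() the program with an argument list, so no shell ever parses the recipient address. It assumes a POSIX system; pipe_to_program() and close_pipe() are hypothetical names, not part of the book's libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] (found via PATH by execvp) with its
   standard input connected to the returned stream. No shell is involved.
   Returns NULL on failure; on success *pid holds the child's ID. */
FILE *pipe_to_program(char *const argv[], pid_t *pid)
{
    int fds[2];
    FILE *stream;

    if (pipe(fds) == -1)
        return NULL;
    *pid = fork();
    if (*pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (*pid == 0) {                      /* child */
        close(fds[1]);                    /* child only reads */
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);   /* pipe becomes stdin */
            close(fds[0]);
        }
        execvp(argv[0], argv);
        _exit(127);                       /* reached only if exec failed */
    }
    close(fds[0]);                        /* parent only writes */
    stream = fdopen(fds[1], "w");
    if (stream == NULL) {
        close(fds[1]);                    /* child sees EOF and exits */
        waitpid(*pid, NULL, 0);
    }
    return stream;
}

/* Closes the stream (sending EOF to the child) and waits for the child.
   Returns its exit status, or -1 if it did not exit normally. */
int close_pipe(FILE *stream, pid_t pid)
{
    int status;

    fclose(stream);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To mail form results this way, you would pass an argument list such as { "/usr/lib/sendmail", address, NULL }, write the headers and body to the returned stream, and call close_pipe() to wait for sendmail to finish. A status of 127 means the program could not be started.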

Summary

You can write CGI programs that act as gateways between the World Wide Web and other network applications. There are two approaches to writing a CGI gateway: embed an existing client in your CGI program, or program your CGI application to understand the appropriate protocol and make the network connections directly. Both methods have advantages and disadvantages, but for most purposes running an existing client from within your CGI application is more than adequate. If you take this approach, carefully consider the security risks in your code: filter out shell metacharacters from any user input that reaches a shell, and restrict access where necessary, as mail.cgi does with its authentication file.