Chapter 10 Basic Applications

Two Small Programs
- Content Negotiation
- Redirection Manager
Generic Form Parser
Summary

You are now ready to begin the real learning process: programming useful CGI applications. In this and the other chapters in this section, the techniques and the CGI protocol from the first part of this book are demonstrated by developing and discussing real applications.

This chapter begins with some very basic applications. Most of the examples are relatively small. The purpose is to show how you can apply your basic knowledge to perform powerful tasks. The chapter begins with two small programs written in Perl: a redirection manager and a content negotiation program. Finally, you move on to a larger application. You develop a generic program that parses any form and saves the content to a file, and then you extend it to perform flexible manipulation of the data.

Two Small Programs

In my two years developing Web sites and applications, I've found myself reusing some small, very simple CGI programs over and over again. Two small CGI programs are reproduced here: a redirection manager and a content negotiation application. These utilities are written in Perl to emphasize the CGI routines rather than the text processing routines; however, if you are running a heavily accessed server, you might want to rewrite these applications in C for more efficient responses.

Note

The tasks that these two applications perform are so common that many servers enable you to perform these tasks internally without the extra overhead of a CGI program. For example, among UNIX servers is the Apache server, which internally controls redirection and content negotiation using configuration files. If you have a heavily accessed site that often uses these features, you might want to see whether other servers for your platform support these features internally.

Content Negotiation

As an HTML author, you might want to use some of the unique HTML extensions certain browsers support; however, you might be afraid that the pages with extensions look bad on browsers that don't support those extensions. Or, you might have a graphics-heavy Web page, and you would like to send a text-only page to browsers such as lynx that don't support graphics. Ideally, you could write a program that would determine the capabilities of the browser and then send the appropriate page.

This chapter shows a simple version of such a program called cn (which stands for content negotiation). Given the location and prefix of a document (index is the prefix of index.html, for example), cn does the following:

Checks to see what kind of HTML the browser supports
Sends the appropriate page

In order to determine what the browser is capable of viewing, cn checks two environment variables: HTTP_ACCEPT and HTTP_USER_AGENT. If you recall from Chapter 5, "Input," HTTP_ACCEPT stores a list of MIME types the browser can view. Cn determines whether the browser is text-only by scanning HTTP_ACCEPT for the word image (as in image/gif or image/jpeg). If cn finds this word, it assumes the browser is a graphical browser; otherwise, it assumes the browser is text-only.

If the browser is graphical, cn then checks HTTP_USER_AGENT to determine the brand of the browser. This book primarily focuses on Netscape Navigator and Microsoft Internet Explorer. You can easily expand this program to fit your specific browser needs. Netscape and some versions of Internet Explorer store the word Mozilla in HTTP_USER_AGENT; other versions of Internet Explorer store Internet Explorer.

After cn determines the browser type, it then tries to send the appropriate file. The proper HTML files are determined by filename extension. The default, global extension that cn will use if it cannot find any other files is .html. The other extensions are .thtml (for text HTML) and .mhtml (for Mozilla/Microsoft HTML, whatever suits your need). Cn reads the PATH_TRANSLATED environment variable to determine where to look for the files and what the filename prefix is.

For example, suppose you have three different versions of the same HTML document: one standard document, one text-only document, and one that supports Mozilla/Microsoft extensions. The three filenames and locations are

/index.html
/index.thtml
/index.mhtml

In order to tell cn to send one of these three files according to browser type, you reference cn as follows:

<a href="/cgi-bin/cn/index">Go to Index</a>

Assume your document root is /usr/local/etc/httpd/htdocs/. When you click on the preceding link, you run cn with PATH_TRANSLATED set to /usr/local/etc/httpd/htdocs/index. Cn first checks the HTTP_ACCEPT variable to see if you have a text-only browser; if you do, it tries to send index.thtml. If you have a graphical browser, cn checks to see if you have either Netscape or Internet Explorer running. If you do, it tries to send index.mhtml. If cn can't find either index.thtml or index.mhtml, or if the browser is a non-Netscape/Microsoft graphical browser, cn tries to send index.html. If cn cannot find index.html, it sends a File Not Found error message (status code 404).

The following steps summarize the algorithm:

Check PATH_TRANSLATED for file location and prefix. If PATH_TRANSLATED isn't defined, send an error message.
Check HTTP_ACCEPT to see if the browser is text-only. If so, try to send filename.thtml.
If the browser is graphical, check to see if it is either Netscape or Internet Explorer. If it is either of these two, try to send filename.mhtml.
If the browser is none of these, or if cn fails to find either filename.thtml or filename.mhtml, cn should try and send filename.html. If it can't find filename.html, it should send an error message.
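The selection logic in these steps can be isolated from the file handling. The following is a minimal sketch, not the code from Listing 10.1: the function name candidate_extensions is hypothetical, and it only returns the extensions cn would try, in order, for given HTTP_ACCEPT and HTTP_USER_AGENT values.

```perl
use strict;
use warnings;

# Return the filename extensions cn would try, in order, for the given
# HTTP_ACCEPT and HTTP_USER_AGENT values. The last entry is always the
# default .html extension. (candidate_extensions is a hypothetical name.)
sub candidate_extensions {
    my ($accept, $agent) = @_;
    $accept = '' unless defined $accept;
    $agent  = '' unless defined $agent;

    my @extensions;
    if ($accept =~ /image/) {
        # Graphical browser: only Netscape/Microsoft browsers get .mhtml.
        push @extensions, '.mhtml' if $agent =~ /Mozilla|Microsoft/;
    }
    else {
        # No image types advertised: treat the browser as text-only.
        push @extensions, '.thtml';
    }
    push @extensions, '.html';
    return @extensions;
}
```

A caller would append each extension to PATH_TRANSLATED in turn and send the first file that exists, falling back to a 404 response if none does.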

The complete source code for cn is in Listing 10.1.

Listing 10.1. The cn source code.

#!/usr/local/bin/perl

# store environment variables in local variables
$PATH_INFO = $ENV{'PATH_INFO'};
$PATH_TRANSLATED = $ENV{'PATH_TRANSLATED'};
$HTTP_ACCEPT = $ENV{'HTTP_ACCEPT'};
$HTTP_USER_AGENT = $ENV{'HTTP_USER_AGENT'};

$SENT = 0;

if ($PATH_TRANSLATED) {
    if ($HTTP_ACCEPT =~ /image/) {
        if ( ($HTTP_USER_AGENT =~ /Mozilla/) ||
             ($HTTP_USER_AGENT =~ /Microsoft/) ) {
            if (-e "$PATH_TRANSLATED.mhtml") {
                &send_contents("$PATH_TRANSLATED.mhtml");
                $SENT = 1;
            }
        }
    }
    else { # text-only browser
        if (-e "$PATH_TRANSLATED.thtml") {
            &send_contents("$PATH_TRANSLATED.thtml");
            $SENT = 1;
        }
    }
    if ($SENT == 0) {
        if (-e "$PATH_TRANSLATED.html") {
            &send_contents("$PATH_TRANSLATED.html");
        }
        else {
            print <<EOM;
Status: 404 File Not Found
Content-Type: text/html

<html> <head>
<title>File Not Found</title>
</head>

<body>
<h1>File Not Found</h1>

<p>Could not find the file (Error 404).</p>
</body> </html>
EOM
        }
    }
}
else {
    print <<EOM;
Status: 403 Forbidden File
Content-Type: text/html

<html> <head>
<title>Forbidden File</title>
</head>

<body>
<h1>Forbidden File</h1>

<p>Could not open file (Error 403).</p>
</body> </html>
EOM
}

sub send_contents {
    local($filename) = @_;

    print "Content-Type: text/html\n\n";
    open(FILE,$filename); # or error
    while (<FILE>) {
        print;
    }
}

The Perl code for cn is about as straightforward as source code gets. All input was from environment variables. No parsing was necessary, so no external programming libraries such as cgi-lib.pl are needed. In order to send the appropriate HTML files, you could use the Location header rather than open the file. Opening the files yourself, however, enables you to check for the existence of files and then look for other files if the ones you wanted didn't already exist. Sending a Location header would have been inefficient because the server would have parsed the header and then once again checked to see whether the file existed or not, something cn had already determined.

Redirection Manager

Redirection works much like telephone call forwarding. With call forwarding, you dial one phone number and are automatically connected to another.

If you restructure your Web site and move files around, you might want to specify a redirect for a file at its old location to its new location. For example, if the file mom.html moved from your document root to the directory /parents, you might want to redirect the request from the following:

http://myserver.org/mom.html

to:

http://myserver.org/parents/mom.html

A few options exist for you to handle this problem. You could create the file /mom.html with the following message:

<html> <head>
<title>Mom Moved</title>
</head>

<body>
<h1>Mom Moved</h1>

<p>Mom moved <a href="/parents/mom.html">here</a>.</p>
</body> </html>

Although this idea provides an adequate solution, it requires more maintenance and is not really the proper way to handle a redirect. To properly handle a redirect, you send a redirect status code (see Table 10.1; a complete list is available in Chapter 8, "Client/Server Issues") and let the browser determine how to properly retrieve the file at its new location.

Table 10.1. Redirect status codes.

Status Code	Status Message	Meaning
`301`	Moved Permanently	The page is now located at a new URI, specified by the `Location` header.
`302`	Moved Temporarily	The page is temporarily located at a new URI, specified by the `Location` header.

When the browser receives a status code of 301 (Moved Permanently), it redirects all subsequent requests at the old URL to the new location. For example, if you request

http://myserver.org/mom.html

and receive a status code of 301 and the new location:

http://myserver.org/parents/mom.html

it will redirect the request to the latter location. The next time you try to access the first URL, the browser doesn't bother trying the old URL again; it instead directly accesses the new URL.

Upon receipt of status code 302 (Moved Temporarily), the browser redirects only that one specific request; upon subsequent requests, it tries the original URL again. For example, suppose you request the following again:

http://myserver.org/mom.html

This time you receive a status code of 302 along with the new location:

http://myserver.org/parents/mom.html

It once again redirects your request to the latter location, but the next time you try to access the first URL, it tries the original URL again instead of going directly to the new one. Figure 10.1 summarizes the different browser procedures for the two different redirect status codes.

Figure 10.1: This diagram demonstrates how the browser reacts upon receiving status codes 301 and 302.
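From the CGI program's side, the only difference between the two kinds of redirect is the Status line it prints before the Location header. The following is a minimal sketch under that assumption; the function name redirect_headers is hypothetical, and the reason phrases are taken from Table 10.1.

```perl
use strict;
use warnings;

# Build the CGI header block for a redirect. Only 301 and 302 are
# accepted; any other value falls back to 302 (temporary).
# (redirect_headers is a hypothetical helper name.)
sub redirect_headers {
    my ($url, $status) = @_;
    my %reason = (
        301 => 'Moved Permanently',
        302 => 'Moved Temporarily',
    );
    $status = 302 unless defined $status && exists $reason{$status};

    # The blank line after Location ends the header block.
    return "Status: $status $reason{$status}\n" . "Location: $url\n\n";
}
```

A CGI program would print the returned string first and could follow it with a short HTML body for browsers that do not follow redirects automatically.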

Instead of creating a new HTML document that routes to the new URL, you send a status code and Location header to properly redirect a request. You can accomplish this task in two ways. You can configure many servers to redirect specific URLs. Although this has the advantage of very low overhead, your redirect options are limited. Some servers might only enable the Web administrator to configure redirections. Other servers might have limited configurability. For example, you cannot configure the NCSA server to send a status code of 301 instead of 302 with its Redirect feature.

Tip

To establish redirection using the NCSA server or a derivative (like Apache), edit the conf/srm.conf file and add the following line:

Redirect fakename newURL

where fakename is the name of the file you want to redirect relative to your document root and newURL is the new location. The latter must be a complete URL, not just a filename relative to document root. For example:

Redirect /mom.html http://myserver.org/parents/mom.html

temporarily redirects (using status code 302) all requests for

http://myserver.org/mom.html

to:

http://myserver.org/parents/mom.html

Note that the following is not a legal Redirect request because the last parameter must be a proper URL:

Redirect /mom.html /parents/mom.html

The second way to accomplish a redirect is with a CGI program. Although this method is less efficient than having the server directly process the redirect request, a CGI program offers more flexibility. For example, you can write your CGI program so that users can configure their own redirections.

Businesses that provide advertising space on their Web sites can use a redirection script to log every time someone clicks on an advertisement to go to another site. For example, suppose you have an advertisement for Mom and Pop's Candy Store (as depicted in Figure 10.2). If you click on the advertisement, you go to Mom and Pop's Web site. Normally, the HTML document for such an advertisement might look something like Listing 10.2.

Figure 10.2 : Advertisements for Mom and Pop's Candy Store.

Listing 10.2. advertisement.html.

<html> <head>
<title>Front Page News</title>
</head>

<body>
<h1>Today's Headlines</h1>

<dl>
  <dt><b>Harvard Beats Yale!</b>
  <dd>In yesterday's football game, Harvard crushed Yale 64-3. The
      key play of the game was one Crimson linebacker Elbert Baquero
      sacked Bulldog quarterback Tony "the Tornado" with a minute
      and a half to play.
  <dt><b>Gates Steps Down!</b>
  <dd>In a move that shocked the software world, William Gates, III
      retired as CEO of Microsoft, stating "I want to kick back and
      enjoy my cash with my family." The Board of Directors
      appointed Matt Howitt to succeed him.
</dl>

<hr>

<a href="http://www.mnpcandy.com/">
<img src="ad.gif" alt="Go to Mom and Pop's Candy Store's Web Site!">
</a>

<hr>
</body> </html>

Now it's time for you to collect your revenues from the candy store and attempt to renew your contract. Unfortunately, with the Web page in Listing 10.2, you have no way of knowing how many times people actually clicked on the advertisement to go to Mom and Pop's Web site.

Note

Although you might have no way of determining how many times people went from your Web site to the candy store's site, the candy store can usually determine this information. Many servers enable you to log the referring pages, provided the browser supplies this information. Unfortunately, not all browsers do supply this information. Besides, you might not want to rely on your customer's referrer logs for that information. The best way to record this information is to use a logging redirection script.

If you had a program that updated an access file every time someone clicked on that link, and then redirected the person to the new site, you could keep track of how many people visited the site because of the advertisement on your Web page.

The following list provides some specifications for a redirection manager:

It should enable people other than the site administrator to configure their own redirection.
It should flexibly enable you to specify either status code 301 or 302.
It should give the option of logging redirects.

Call this CGI program redirect. You can use redirect in two ways. You can either access the redirect program directly, passing its instructions through QUERY_STRING, or you can have your server call redirect every time it cannot find a document.

For example, consider the preceding scenario. You want to log all accesses to Mom and Pop's Web site from the advertisement on your page. Instead of just specifying the URL in the <a href> tag, you could use redirect:

<a href="/cgi-bin/redirect?url=http://www.mnpcandy.com/&log=/var/logs/redirect.log">
<img src="ad.gif" alt="Go to Mom and Pop's Candy Store's Web Site!">
</a>

Now, every time someone clicks on this advertisement, redirect would log the request to the /var/logs/redirect.log file and would redirect that person to the Candy Store Web site.
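Once requests are being logged, counting clicks is a matter of tallying the log. The following is a minimal sketch, assuming the log holds one destination URL per line (the format written by the listings later in this section); the function name count_redirects is hypothetical.

```perl
use strict;
use warnings;

# Count how many times each URL appears in a list of log lines.
# (count_redirects is a hypothetical helper name.)
sub count_redirects {
    my @lines = @_;    # a copy, so the caller's data is left untouched
    my %count;
    foreach my $line (@lines) {
        $line =~ s/[\r\n]+$//;    # strip the line ending
        next if $line eq '';      # ignore blank lines
        $count{$line}++;
    }
    return %count;
}
```

To use it, you would open the log file for reading, pass all of its lines to count_redirects, and print the count stored under the advertiser's URL.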

To enable users to specify their own redirections without having access to any global configuration file, you need to configure your server to run the redirect program every time it cannot find a file. (See the following note for instructions on how to do this for the NCSA server.)

Note

By default, when you try to access a page that doesn't exist on a server, the server sends a Status: 404 header with an accompanying error message. Some servers, including NCSA, enable you to send a customized error message or to run a CGI program in place of the standard response.

To specify an alternative HTML error message or CGI error handler, edit the conf/srm.conf file and add the following line:

ErrorDocument 404 /alternate.html

where alternate.html is your customized error message. If you want to specify the program /cgi-bin/redirect as your error handler, add the following line:

ErrorDocument 404 /cgi-bin/redirect

The server sends three new environment variables to CGI error handlers: REDIRECT_REQUEST, REDIRECT_URL, and REDIRECT_STATUS. REDIRECT_REQUEST contains the complete browser request, REDIRECT_URL contains the URL the browser tried to access, and REDIRECT_STATUS contains the status code the server wants to return.

If the server cannot find a file, it will run the redirect program, which searches for a configuration file (.redirect) in the appropriate directory. The configuration file looks something like this:

LOGFILE=/var/logs/redirect.log
STATUS=302
/index.html http://myserver.org/parents/index.html
/mom.html /parents/mom.html 301

LOGFILE specifies where to log requests. If this line is absent, then redirect will not log requests. STATUS contains the default status code for redirection. If STATUS is not specified, redirect assumes a status code of 302 (temporarily moved). Finally, the redirect command follows this form:

document newlocation status

Document is the old document relative to the document root, and newlocation contains the new location of the file. Note that unlike the Redirect option for NCSA servers, newlocation does not have to be a complete URL. Status is optional; if you include it, redirect uses that status code for this document instead of the default.

It is time to begin coding. Figure 10.3 is a flowchart describing the program design. Two parts exist: one that handles redirects if called by the server, and the other that handles redirects specified in QUERY_STRING.

Begin with the easier of the two tasks: the Perl code that will handle redirects if given some CGI input. The task is simple:

Check to make sure redirect is not being called by the server. (Do this by looking for the existence of the REDIRECT_URL environment variable.)
Parse the input for url and log.
If log is defined, log the request.
Send a redirect request to url.

The code for this task is in Listing 10.3. The CGI input and output are straightforward. The ReadParse function parses the input. If no URL is specified, redirect sends an error message. If a log file is specified, redirect tries to append to the log file. If it can't append to the log file, it sends an error. If no errors occur, redirect sends a Status and Location header along with some HTML in case the browser does not properly handle redirects.

Listing 10.3. Handling redirects specified in QUERY_STRING.

require 'cgi-lib.pl';

# reads and parses input
&ReadParse(*input);
$logfile = $input{'log'} unless !($input{'log'});
$url = $input{'url'} unless !($input{'url'});
if (!$url) {
    &CgiDie("No URL Specified");
}
if ($logfile) {
    # try to open and append to $logfile
    # if that doesn't work, send an error message
    open(LOG,">>$logfile") || &CgiDie("Can't Append to Logfile: $logfile");
    print LOG "$url\n";
    close(LOG);
}
# prints forwarding output in HTML to the user
print "Status: 302 Forwarding to Another URL\n";
print "Location: $url\n\n";
print &HtmlTop("Web Forwarding");
print "<p>Go to: <a href=\"$url\">$url</a></p>\n";
print &HtmlBot;

Now the second part of the program is added: user configurable redirections. The steps are as follows:

Check for REDIRECT_URL.
If it exists, parse it.
Look for the configuration file in the directory specified by REDIRECT_URL.
Parse the configuration file.
Take the appropriate action.

Look at the second step for a moment. After you have the REDIRECT_URL, you need to determine where the directory is located. REDIRECT_URL tells you a relative directory in one of two forms:

/somedir/file.html
/~username/somedir/file.html

You need to translate either of these two cases into the appropriate, full pathname. Translating the first case is fairly simple. Append the value of REDIRECT_URL to the value of the DOCUMENT_ROOT environment variable, and then remove the filename. If your document root were /usr/local/etc/httpd/, then

/somedir/file.html

would translate to

/usr/local/etc/httpd/somedir/

The second possibility presents more of a challenge. You need to extract the username, determine where the user's home directory is, append the name of the public HTML directory to this home directory, and then append the rest of the directories. For example, if your home directory was in /home/username and the public HTML directory was in public_html, then

/~username/somedir/file.html

would translate into

/home/username/public_html/somedir/

You can use the getpwnam() function to determine the home directory of the user. A CGI program cannot determine the name of the public HTML directory, so you can make that a user configurable item. The code to extract the directory from REDIRECT_URL is in Listing 10.4.

Listing 10.4. Extracting directory information from REDIRECT_URL.

$public_html = '/public_html';
$config = '.redirect';

if ($redirect_url = $ENV{'REDIRECT_URL'}) {
    $request = $redirect_url;
    $server_prefix = "http://$ENV{'SERVER_NAME'}:$ENV{'SERVER_PORT'}";
    if ($redirect_url =~ /^\/\~/) {
        # /~username/...: look in the user's public HTML directory
        $redirect_url =~ s/^\/\~//;
        if ( ($end = index($redirect_url,'/')) < 0 ) {
            # no slash after the username; the whole string is the username
            $end = length($redirect_url);
        }
        $username = substr($redirect_url,0,$end);
        $prefix = &return_homedir($username);
        if (!$prefix) {
            &CgiDie("Invalid Directory");
        }
        $start = index($redirect_url,'/');
        $end = rindex($redirect_url,'/')+1;
        $suffix = $public_html.substr($redirect_url,$start,$end - $start);
    }
    else {
        # ordinary path: look under the document root
        $prefix = $ENV{'DOCUMENT_ROOT'};
        $suffix = substr($redirect_url,0,rindex($redirect_url,'/')+1);
    }
    $config_loc = $prefix.$suffix.$config;
}
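The same translation can be written as a function that takes its inputs as arguments, which makes the two cases easy to check by hand. This is a minimal sketch rather than the code used in redirect: the name config_directory is hypothetical, and the home directory lookup is passed in as a code reference (for example, a wrapper around getpwnam) so that the function does not depend on real user accounts.

```perl
use strict;
use warnings;

# Translate a REDIRECT_URL-style path into the directory that should
# hold the configuration file. $homedir_of is a code reference mapping
# a username to a home directory. Returns undef for an unknown user.
# (config_directory is a hypothetical helper name.)
sub config_directory {
    my ($url, $document_root, $public_html, $homedir_of) = @_;

    if ($url =~ m{^/~([^/]+)(/.*)?$}) {
        my ($username, $rest) = ($1, defined $2 ? $2 : '/');
        my $home = $homedir_of->($username);
        return unless $home;              # unknown user
        $rest =~ s{[^/]*$}{};             # drop the filename, keep the slash
        return $home . $public_html . $rest;
    }

    (my $dir = $url) =~ s{[^/]*$}{};      # drop the filename
    return $document_root . $dir;
}
```

With a document root of /usr/local/etc/httpd (no trailing slash) and a lookup that maps username to /home/username, the two example paths above translate to /usr/local/etc/httpd/somedir/ and /home/username/public_html/somedir/.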

Now that you know where to look for a configuration file, you must open and parse that file. If it doesn't exist, then you just send a regular File not found error message (status code 404). If it does exist, parse it for the options listed earlier. The code for opening and parsing the file is in Listing 10.5. Listing 10.6 contains the function not_found, which sends the appropriate 404 error message.

Listing 10.5. Parsing the configuration file.

if (-e $config_loc) {
    open(CONFIG,$config_loc) || &CgiDie("Can't Open Config File");
    $FOUND = 0;
    while ($line = <CONFIG>) {
        $line =~ s/[\r\n]+$//;
        if ($line =~ /^LOGFILE=/) {
            ($logfile = $line) =~ s/^LOGFILE=//;
        }
        elsif ($line =~ /^STATUS=30[12]/) {
            ($status = $line) =~ s/^STATUS=//;
        }
        else {
            ($old,$new,$this_status) = split(/ /,$line);
            if ($old eq $request) {
                if (!$new) {
                    &CgiDie("No New URL Specified");
                }
                if (!($new =~ /^http:\/\//)) {
                    $new = $server_prefix.$new;
                }
                $FOUND = 1;
                if ($logfile) {
                    open(LOG,">>$logfile") ||
                        &CgiDie("Can't Append to Logfile: $logfile");
                    print LOG "$new\n";
                    close(LOG);
                }
                $status = $this_status unless (!$this_status);
                $status = 302 unless ($status);
                # fall back to 302 for anything other than 301 or 302
                if ( ($status != 301) && ($status != 302) ) {
                    $status = 302;
                }
                print "Status: $status\n";
                print "Location: $new\n\n";
                print &HtmlTop("Request Redirected");
                print "<p>Request redirected to:\n";
                print "<a href=\"$new\">$new</a></p>\n";
                print &HtmlBot;
                last; # send only one response
            }
        }
    }
    close(CONFIG);
    if (!$FOUND) {
        &not_found($request);
    }
}
else {
    &not_found($request);
}

Listing 10.6. List for File Not Found.

sub not_found {
    local($request) = @_;

    print "Status: 404 File Not Found\n";
    print &PrintHeader,&HtmlTop("File Not Found");
    print "<p>Error 404: $request could not be found on this server.</p>\n";
    print &HtmlBot;
}

You can now put together all of the code into one full-fledged application, redirect, listed in Listing 10.7.

Listing 10.7. Redirect: the finished application.

#!/usr/local/bin/perl

require 'cgi-lib.pl';

$public_html = '/public_html';
$config = '.redirect';

if ($redirect_url = $ENV{'REDIRECT_URL'}) {
    # called by the server as an error handler
    $request = $redirect_url;
    $server_prefix = "http://$ENV{'SERVER_NAME'}:$ENV{'SERVER_PORT'}";
    if ($redirect_url =~ /^\/\~/) {
        $redirect_url =~ s/^\/\~//;
        if ( ($end = index($redirect_url,'/')) < 0 ) {
            # no slash after the username; the whole string is the username
            $end = length($redirect_url);
        }
        $username = substr($redirect_url,0,$end);
        $prefix = &return_homedir($username);
        if (!$prefix) {
            &CgiDie("Invalid Directory");
        }
        $start = index($redirect_url,'/');
        $end = rindex($redirect_url,'/')+1;
        $suffix = $public_html.substr($redirect_url,$start,$end - $start);
    }
    else {
        $prefix = $ENV{'DOCUMENT_ROOT'};
        $suffix = substr($redirect_url,0,rindex($redirect_url,'/')+1);
    }
    $config_loc = $prefix.$suffix.$config;
    if (-e $config_loc) {
        open(CONFIG,$config_loc) || &CgiDie("Can't Open Config File");
        $FOUND = 0;
        while ($line = <CONFIG>) {
            $line =~ s/[\r\n]+$//;
            if ($line =~ /^LOGFILE=/) {
                ($logfile = $line) =~ s/^LOGFILE=//;
            }
            elsif ($line =~ /^STATUS=30[12]/) {
                ($status = $line) =~ s/^STATUS=//;
            }
            else {
                ($old,$new,$this_status) = split(/ /,$line);
                if ($old eq $request) {
                    if (!$new) {
                        &CgiDie("No New URL Specified");
                    }
                    if (!($new =~ /^http:\/\//)) {
                        $new = $server_prefix.$new;
                    }
                    $FOUND = 1;
                    if ($logfile) {
                        open(LOG,">>$logfile") ||
                            &CgiDie("Can't Append to Logfile: $logfile");
                        print LOG "$new\n";
                        close(LOG);
                    }
                    $status = $this_status unless (!$this_status);
                    $status = 302 unless ($status);
                    # fall back to 302 for anything other than 301 or 302
                    if ( ($status != 301) && ($status != 302) ) {
                        $status = 302;
                    }
                    print "Status: $status\n";
                    print "Location: $new\n\n";
                    print &HtmlTop("Request Redirected");
                    print "<p>Request redirected to:\n";
                    print "<a href=\"$new\">$new</a></p>\n";
                    print &HtmlBot;
                    last; # send only one response
                }
            }
        }
        close(CONFIG);
        if (!$FOUND) {
            &not_found($request);
        }
    }
    else {
        &not_found($request);
    }
}
else {
    # called directly, with instructions in QUERY_STRING
    &ReadParse(*input);
    $logfile = $input{'log'} unless !($input{'log'});
    $url = $input{'url'} unless !($input{'url'});
    if (!$url) {
        &CgiDie("No URL Specified");
    }
    if ($logfile) {
        # try to open and append to $logfile
        # if that doesn't work, send an error message
        open(LOG,">>$logfile") || &CgiDie("Can't Append to Logfile: $logfile");
        print LOG "$url\n";
        close(LOG);
    }
    print "Status: 302 Forwarding to Another URL\n";
    print "Location: $url\n\n";
    print &HtmlTop("Web Forwarding");
    print "<p>Go to: <a href=\"$url\">$url</a></p>\n";
    print &HtmlBot;
}

sub return_homedir {
    local($username) = @_;
    local($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell) =
        getpwnam($username);

    return $dir;
}

sub not_found {
    local($request) = @_;

    print "Status: 404 File Not Found\n";
    print &PrintHeader,&HtmlTop("File Not Found");
    print "<p>Error 404: $request could not be found on this server.</p>\n";
    print &HtmlBot;
}

The majority of redirect's source code is dedicated to determining where the configuration file is located and to parsing the file. Determining where the configuration file is located depends on your ability to manipulate the appropriate server variables. As usual, the CGI input and output routines seem almost trivial in this program; the real substance lies in determining how to take advantage of the environment variables and of parsing configuration files.

Generic Form Parser

One of CGI's most important contributions to the World Wide Web is its capability to collect input from the user. Although many CGI programs depend on this input to determine what to send back (for example, a search front-end to a database), perhaps the most basic use of CGI is to simply collect the information from the user and store it somewhere for the provider to look at later.

You see these types of applications all over the Web, ranging from forms soliciting comments to online voting booths to guestbooks. You can reduce all of these applications to these steps:

Collect the input submitted via a form.
Store the input somewhere, perhaps for later processing.
Send confirmation of receipt.

Instead of writing a separate application every time you need to collect data, you can write one generic forms parser that performs the preceding three steps. Such an application is developed here, starting with the most basic type of program and later extending it so that any user can easily configure it. Chapter 11, "Gateways," extends the program further so that it e-mails the results rather than storing the information on disk.

Tip

With many browsers, you don't even need a CGI application to act as a generic form parser. If you specify a mailto: reference in the action parameter of the form tag, when the user submits the form, the encoded input will be e-mailed to the person specified in the action parameter.

For example, the following form will encode your input and e-mail it to eekim@hcs.harvard.edu:

<form action="mailto:eekim@hcs.harvard.edu">
<input type=text name="item">
</form>

You can then parse and process the contents of your e-mail.

Simple Parser

This section develops a very specific application. You are conducting a poll on a controversial topic, and you want to collect people's choices and their ages and store these results in a comma-delimited file. Because the application is so specific, I hard-code the form into the CGI application. The completed programs in Perl and C are in Listings 10.8 and 10.9, respectively.

Listing 10.8. The poll.cgi program (in Perl).

#!/usr/local/bin/perl

require 'cgi-lib.pl';

$file = '/home/poll/results.txt';

if (&ReadParse(*input)) {
    open(FILE,">>$file") || &CgiDie("Can't Append to $file");
    print FILE "$input{'cola'},$input{'age'}\n";
    close(FILE);
    print &PrintHeader,&HtmlTop("Thanks!");
    print "<p>Thanks for filling out the poll!</p>\n";
    print &HtmlBot;
}
else {
    print &PrintHeader,&HtmlTop("Poll");
    print <<EOM;
<form method=POST>
<p>Which is better?</p>
<ul>
  <li><input type=radio name="cola" value="coke" checked>Coke
  <li><input type=radio name="cola" value="pepsi">Pepsi
</ul>

<p>How old are you? <input type=text name="age"></p>

<input type=submit>
</form>
EOM
    print &HtmlBot;
}

Listing 10.9. The poll.cgi program (in C).

#include <stdio.h>
#include <stdlib.h>
#include "cgi-lib.h"
#include "html-lib.h"

#define OUTPUT "/home/poll/results.txt"

int main()
{
    llist entries;
    FILE *output;

    html_header();
    if (read_cgi_input(&entries)) {
        if ( (output = fopen(OUTPUT,"a")) == NULL) {
            html_begin("Can't Append to File");
            h1("Can't Append to File");
            html_end();
            exit(1);
        }
        /* one comma-delimited record per line */
        fprintf(output,"%s,%s\n",cgi_val(entries,"cola"),cgi_val(entries,"age"));
        fclose(output);
        html_begin("Thanks!");
        h1("Thanks!");
        printf("<p>Thanks for filling out the poll!</p>\n");
        html_end();
    }
    else {
        html_begin("Poll");
        h1("Poll");
        printf("<form method=POST>\n");
        printf("<p>Which is better?</p>\n");
        printf("<ul>\n");
        printf(" <li><input type=radio name=\"cola\" value=\"coke\" checked>Coke\n");
        printf(" <li><input type=radio name=\"cola\" value=\"pepsi\">Pepsi\n");
        printf("</ul>\n");
        printf("<p>How old are you? <input type=text name=\"age\"></p>\n");
        printf("<input type=submit>\n");
        printf("</form>\n");
        html_end();
    }
    list_clear(&entries);
    return 0;
}

Note

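On most UNIX systems, a single write to a file opened in append mode is atomic, meaning it completes without being interleaved with writes from other processes. Because each poll entry is one short line, you normally don't have to worry about file locking here. This behavior is not guaranteed on every file system (network file systems such as NFS are a common exception), so consider explicit locking if losing or garbling an entry would matter.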
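If you would rather not rely on append behavior, Perl's built-in flock function provides advisory locking. The following is a minimal sketch under that assumption, not part of poll.cgi; the function name locked_append is hypothetical, and the lock only protects against other programs that also call flock on the same file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Append one line to $file while holding an exclusive advisory lock.
# Returns 1 on success and 0 if the file cannot be opened, locked,
# or closed. (locked_append is a hypothetical helper name.)
sub locked_append {
    my ($file, $line) = @_;

    open(my $fh, '>>', $file) or return 0;
    flock($fh, LOCK_EX)       or do { close($fh); return 0; };
    seek($fh, 0, SEEK_END);   # in case another process appended while we waited
    print {$fh} "$line\n";
    close($fh) or return 0;   # closing flushes the buffer and releases the lock
    return 1;
}
```

In the poll program, a call such as locked_append($file, "$input{'cola'},$input{'age'}") would take the place of the open, print, and close sequence.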

Both the Perl and C versions of poll.cgi consist mostly of printing the appropriate HTML. Obtaining and parsing the input is one line of code in both versions.

In Perl:

if (&ReadParse(*input)) { … }

In C:

if (read_cgi_input(&entries)) { … }

Appending the results to a file is three lines.

In Perl:

open(FILE,">>$file") || &CgiDie("Can't Append to $file");
print FILE "$input{'cola'},$input{'age'}\n";
close(FILE);

In C:

if ( (output = fopen(OUTPUT,"a")) == NULL) { … }
fprintf(output,"%s,%s\n",cgi_val(entries,"cola"),cgi_val(entries,"age"));
fclose(output);

You could easily create this kind of program any time you need one.

You want to avoid this kind of effort, however. Instead of having a separate program for each task, you want one program that parses input and saves it to a file. In order to achieve this result using poll.cgi as the basis for your code, you need to do the following:

Remove the built-in form. You want the program to work with any form, so having a built-in form probably is not very useful.
Allow the user to specify the filename and location for saving the information.
Save any and all values to the file. You can no longer assume that you have only two values labeled cola and age.

To achieve the second step, allow the user to specify the filename and location via the PATH_INFO variable. The Perl and C source code for this simple but general form parser appears in Listings 10.10 and 10.11, respectively.

Listing 10.10. The parse-form program (in Perl).

#!/usr/local/bin/perl

require 'cgi-lib.pl';

$file = $ENV{'PATH_INFO'};
if (!$file) {
    &CgiDie("No output file specified");
}

&ReadParse(*input);
open(FILE,">>$file") || &CgiDie("Can't Append to $file");
foreach $name (keys(%input)) {
    # multiple values for one name arrive joined by "\0"
    foreach (split("\0", $input{$name})) {
        # keep each record on one line
        ($value = $_) =~ s/\r?\n/<br>/g;
        # since it's comma delimited, escape commas by preceding them
        # with backslashes; backslashes must be escaped first
        $value =~ s/\\/\\\\/g;
        $value =~ s/,/\\,/g;
        print FILE "$value,";
    }
}
print FILE "\n";
close(FILE);

print &PrintHeader,&HtmlTop("Form Submitted");
print &HtmlBot;

Listing 10.11. The parse-form program (in C).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cgi-lib.h"
#include "html-lib.h"

/* return a malloc'd copy of str with commas and backslashes escaped */
char *escape_commas(char *str)
{
  int i,j = 0;
  int len = strlen(str);
  char *new = malloc(sizeof(char) * (len * 2 + 1));

  for (i = 0; i < len; i++) {
    if ( (str[i] == ',') || (str[i] == '\\') ) {
      new[j] = '\\';
      j++;
    }
    new[j] = str[i];
    j++;
  }
  new[j] = '\0';
  return new;
}

int main()
{
  llist entries;
  node *window;
  FILE *output;
  char *escaped;

  html_header();
  if (PATH_INFO == NULL) { /* remember, cgi-lib.h defines PATH_INFO */
    html_begin("No output file specified");
    h1("No output file specified");
    html_end();
    exit(1);
  }
  read_cgi_input(&entries);
  if ( (output = fopen(PATH_INFO,"a")) == NULL) {
    html_begin("Can't Append to File");
    h1("Can't Append to File");
    html_end();
    exit(1);
  }
  window = entries.head;
  while (window != NULL) {
    escaped = escape_commas((*window).entry.value);
    fprintf(output,"%s,",escaped);
    free(escaped);
    window = (*window).next; /* advance, or the loop never ends */
  }
  fprintf(output,"\n"); /* end of this record */
  fclose(output);
  html_begin("Form Submitted");
  h1("Form Submitted");
  html_end();
  list_clear(&entries);
  return 0;
}

To use parse-form, include it in the action parameter of your HTML form with the full pathname of the output file. For example, the following will save the results of the form, comma-delimited, in the /var/adm/results.txt file:

<form action="/cgi-bin/parse-form/var/adm/results.txt">

The code is smaller even though the program is more general, because the built-in form has been removed. Even in its new, more general form, however, parse-form is still not quite satisfactory. First, the confirmation message is fairly unhelpful and ugly. You might want to send a custom message for each type of form.

Second, the output file is somewhat unhelpful. The point of parsing the data before saving it to a file is to make the saved file easier to process later. For example, with poll.cgi, it's easier to parse a file like the following:

coke,15
pepsi,21
pepsi,10

than one like:

cola=coke&age=15
cola=pepsi&age=21
cola=pepsi&age=10

Here, because you know the variables, you can assume, for example, that the responses won't contain any commas. You can make no such assumption in general, though. What if you had a form that asked for comments? People might use commas when they fill out their comments. You need to escape these commas so that a clear distinction exists between the delimiter and actual commas. If your data is very complex, then a comma-delimited file might not be easier to parse than a CGI-encoded one.
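To see what escaping costs the program that later reads the file, here is a minimal sketch in C of the reverse operation: splitting one record on unescaped commas and removing the escaping backslashes. The helper name split_record is hypothetical, and the sketch assumes the convention described above, in which a backslash precedes every literal comma or backslash.

```c
#include <stddef.h>

/* Split line in place on unescaped commas and strip the escaping
 * backslashes. Stores up to max_fields pointers in fields[] and returns
 * the number of fields found. (split_record is a hypothetical helper.) */
int split_record(char *line, char *fields[], int max_fields)
{
    char *src = line;   /* next character to read */
    char *dst = line;   /* next position to write unescaped text */
    int count = 0;

    if (max_fields < 1)
        return 0;
    fields[count++] = dst;

    while (*src != '\0' && *src != '\n') {
        if (*src == '\\' && src[1] != '\0') {
            src++;                  /* drop the backslash ... */
            *dst++ = *src++;        /* ... and keep the next char as is */
        } else if (*src == ',') {
            src++;
            *dst++ = '\0';          /* end the current field */
            if (count == max_fields)
                break;              /* no room for further fields */
            fields[count++] = dst;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return count;
}
```

Because the unescaped text is never longer than the escaped text, the split can be done in place without allocating memory. Note that a record ending in a trailing comma yields one extra, empty field.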

Customizable Parser (form.cgi)

Form.cgi solves the other parsers' problems. Form.cgi reads a configuration file (either defined by PATH_INFO or the predefined default) and does the following:

Sends a customized HTML form if no input exists.
If there is input, writes to an output file in a format specified by a template file. Both files are defined in the configuration file.
Sends either a standard response or a customized response.

Because form.cgi requires some amount of text processing and because this text focuses on the algorithm rather than the programming implementation, form.cgi is written in Perl. You might already have Perl code for reading and parsing a configuration file from the redirection manager. This code has been adapted to read a configuration file that looks like the following:

FORM=/form.html
TEMPLATE=/usr/local/etc/httpd/conf/template
OUTPUT=/usr/local/etc/httpd/conf/output
RESPONSE=/thanks.html

FORM and RESPONSE define HTML documents relative to the document root. The TEMPLATE and OUTPUT variables contain full pathnames to the template file and the output file. If you do not define RESPONSE in the configuration file, then form.cgi sends the same response as parse-form. The code for parsing the configuration file appears in Listing 10.12. This code ignores any line that is not in one of the four forms shown.

Listing 10.12. Parsing the configuration file.

$global_config = '/usr/local/etc/httpd/conf/form.conf';

# parse config file
$config = $ENV{'PATH_INFO'};
if (!$config) {
    $config = $global_config;
}
open(CONFIG,$config) || &CgiDie("Could not open config file");
while ($line = <CONFIG>) {
    $line =~ s/[\r\n]+$//;   # strip the line ending (LF or CRLF)
    if ($line =~ /^FORM=/) {
        ($form = $line) =~ s/^FORM=//;
    }
    elsif ($line =~ /^TEMPLATE=/) {
        ($template = $line) =~ s/^TEMPLATE=//;
    }
    elsif ($line =~ /^OUTPUT=/) {
        ($output = $line) =~ s/^OUTPUT=//;
    }
    elsif ($line =~ /^RESPONSE=/) {
        ($response = $line) =~ s/^RESPONSE=//;
    }
}
close(CONFIG);
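If you later rewrite form.cgi in C for a heavily accessed server, the same configuration format is straightforward to read. The following is a minimal sketch, not a drop-in replacement: struct form_config, match_key, parse_config, and the CONF_MAX limit are all hypothetical names chosen for this example.

```c
#include <stdio.h>
#include <string.h>

#define CONF_MAX 256

/* The four settings form.cgi reads; an empty string means "not set".
 * (struct form_config and parse_config are hypothetical names.) */
struct form_config {
    char form[CONF_MAX];
    char template_file[CONF_MAX];
    char output[CONF_MAX];
    char response[CONF_MAX];
};

/* If line starts with "KEY=", copy the rest into dest and return 1. */
static int match_key(const char *line, const char *key, char *dest)
{
    size_t klen = strlen(key);

    if (strncmp(line, key, klen) != 0 || line[klen] != '=')
        return 0;
    strncpy(dest, line + klen + 1, CONF_MAX - 1);
    dest[CONF_MAX - 1] = '\0';
    return 1;
}

/* Read KEY=value lines from path; any other line is ignored.
 * Returns 0 on success, -1 if the file cannot be opened. */
int parse_config(const char *path, struct form_config *cfg)
{
    char line[CONF_MAX + 16];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    memset(cfg, 0, sizeof(*cfg));
    while (fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (match_key(line, "FORM", cfg->form)) continue;
        if (match_key(line, "TEMPLATE", cfg->template_file)) continue;
        if (match_key(line, "OUTPUT", cfg->output)) continue;
        match_key(line, "RESPONSE", cfg->response);
    }
    fclose(fp);
    return 0;
}
```

After a successful call, a test such as cfg.response[0] == '\0' tells you whether RESPONSE was left out and the standard response should be sent.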

The template file tells form.cgi the format of the output file. In order to specify the form values, you precede the field name with a dollar sign ($). For example, the template for a comma-delimited output file for the cola poll would look like the following:

$cola,$age

The input name must be only one word and consist entirely of alphanumeric characters. This example also has the capability to write the values of CGI environment variables to the file. To add this capability, you specify the environment variable name preceded by a percent symbol (%). For example, if you want to label each line of your cola poll's output file with the name of the machine where the browser resides, you would use the template file:

%REMOTE_HOST $cola,$age

If you want to print a literal dollar sign or percent symbol, precede the symbol with a backslash (\). To print a backslash, precede it with another backslash (\\).

How do you implement this? After you have read and parsed the form input, you need to read the template file and parse each line, replacing any variables with the appropriate form values. The code for this process appears in Listing 10.13.

Listing 10.13. Use template to define output file format.

# read template into list
if ($template) {
    open(TMPL,$template) || &CgiDie("Can't Open Template");
    @TEMPLATE = <TMPL>;
    close(TMPL);
}
else {
    &CgiDie("No template specified");
}

# write to output file according to template
if ($output) {
    open(OUTPUT,">>$output") || &CgiDie("Can't Append to $output");
    foreach $line (@TEMPLATE) {
        # one pass over the line: \$, \% and \\ become literal symbols,
        # $name becomes a form value, %NAME an environment variable
        $line =~ s/\\([\\\$%])|\$(\w+)|%(\w+)/defined($1) ? $1 : defined($2) ? $input{$2} : $ENV{$3}/ge;
        print OUTPUT $line;
    }
    close(OUTPUT);
}
else {
    &CgiDie("No output file specified");
}
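The substitution rules are easier to check when they are written out one character at a time. The following C sketch expands one template line under the rules described earlier: $name is replaced by a form value, %NAME by an environment variable, and a backslash makes the next $, %, or \ literal. The names expand_template, find_value, and is_word are hypothetical, and the form values are passed as two parallel arrays purely to keep the example self-contained.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the value stored for name, or NULL if there is no such field. */
static const char *find_value(const char *name, const char *names[],
                              const char *values[], int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return values[i];
    return NULL;
}

/* Variable names consist of letters, digits, and underscores. */
static int is_word(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Expand tmpl into out, truncating the result to fit size bytes.
 * (expand_template is a hypothetical helper for this sketch.) */
void expand_template(const char *tmpl, const char *names[],
                     const char *values[], int count,
                     char *out, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return;
    while (*tmpl != '\0') {
        if (*tmpl == '\\' && tmpl[1] != '\0' && strchr("$%\\", tmpl[1])) {
            /* escaped symbol: copy it literally, drop the backslash */
            if (len + 1 < size)
                out[len++] = tmpl[1];
            tmpl += 2;
        } else if ((*tmpl == '$' || *tmpl == '%') && is_word(tmpl[1])) {
            char kind = *tmpl++;
            char name[64];
            size_t n = 0;
            const char *val;

            while (is_word(*tmpl) && n + 1 < sizeof(name))
                name[n++] = *tmpl++;
            name[n] = '\0';
            val = (kind == '$') ? find_value(name, names, values, count)
                                : getenv(name);
            /* unknown names expand to nothing */
            while (val != NULL && *val != '\0' && len + 1 < size)
                out[len++] = *val++;
        } else {
            if (len + 1 < size)
                out[len++] = *tmpl;
            tmpl++;
        }
    }
    out[len] = '\0';
}
```

Because the line is scanned only once, text that arrives in a form value is copied as is and is never treated as a variable reference itself.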

Putting all of the code together results in form.cgi, shown in Listing 10.14. Form.cgi seems to overcome the shortcomings of the previous attempts at a generic form parser. It serves as a customizable, robust application that will probably save you a great deal of time.

Listing 10.14. The form.cgi program.

#!/usr/local/bin/perl

require 'cgi-lib.pl';

$global_config = '/usr/local/etc/httpd/conf/form.conf';

# parse config file
$config = $ENV{'PATH_INFO'};
if (!$config) {
    $config = $global_config;
}
open(CONFIG,$config) || &CgiDie("Could not open config file");
while ($line = <CONFIG>) {
    $line =~ s/[\r\n]+$//;   # strip the line ending (LF or CRLF)
    if ($line =~ /^FORM=/) {
        ($form = $line) =~ s/^FORM=//;
    }
    elsif ($line =~ /^TEMPLATE=/) {
        ($template = $line) =~ s/^TEMPLATE=//;
    }
    elsif ($line =~ /^OUTPUT=/) {
        ($output = $line) =~ s/^OUTPUT=//;
    }
    elsif ($line =~ /^RESPONSE=/) {
        ($response = $line) =~ s/^RESPONSE=//;
    }
}
close(CONFIG);

# process input or send form
if (&ReadParse(*input)) {
    # read template into list
    if ($template) {
        open(TMPL,$template) || &CgiDie("Can't Open Template");
        @TEMPLATE = <TMPL>;
        close(TMPL);
    }
    else {
        &CgiDie("No template specified");
    }
    # write to output file according to template
    if ($output) {
        open(OUTPUT,">>$output") || &CgiDie("Can't Append to $output");
        foreach $line (@TEMPLATE) {
            # one pass over the line: \$, \% and \\ become literal symbols,
            # $name becomes a form value, %NAME an environment variable
            $line =~ s/\\([\\\$%])|\$(\w+)|%(\w+)/defined($1) ? $1 : defined($2) ? $input{$2} : $ENV{$3}/ge;
            print OUTPUT $line;
        }
        close(OUTPUT);
    }
    else {
        &CgiDie("No output file specified");
    }
    # send either specified response or dull response
    if ($response) {
        print "Location: $response\n\n";
    }
    else {
        print &PrintHeader,&HtmlTop("Form Submitted");
        print &HtmlBot;
    }
}
elsif ($form) {
    # send default form
    print "Location: $form\n\n";
}
else {
    &CgiDie("No default form specified");
}

Using form.cgi as a Guestbook

You can use form.cgi as a very primitive guestbook. To do so, you need to create a configuration file, a form and a response HTML file, and a template file that describes the format of the guestbook.

Assume the following specifications:

The document root is /usr/local/etc/httpd/htdocs/.
The add form is add.html (see Listing 10.15) and the response is thanks.html (see Listing 10.16). Both are in the document root.
The configuration and template files are in /usr/local/etc/httpd/conf/guestbook.conf (see Listing 10.17) and /usr/local/etc/httpd/conf/guestbook.template (see Listing 10.18), respectively.
The guestbook is in /usr/local/etc/httpd/htdocs/guestbook.html.

Listing 10.15. The add.html program.

<html><head>
<title>Add Entry</title>
</head>

<body>
<h1>Add Entry</h1>
<hr>
<form action="/cgi-bin/form.cgi/usr/local/etc/httpd/conf/guestbook.conf">
<p>Name: <input name="name"><br>
Email: <input name="email"><br>
URL: <input name="url"></p>

<p>
<textarea name="message" rows=10 cols=70>
</textarea>
</p>

<input type=submit value="Sign Guestbook">
</form>
<hr>
</body></html>

Listing 10.16. The thanks.html program.

<html><head>
<title>Thanks!</title>
</head>

<body>
<h1>Thanks!</h1>

<p>Thanks for submitting your entry! You can
<a href="/guestbook.html">look at the guestbook.</a></p>
</body></html>

Listing 10.17. The guestbook.conf program.

FORM=/add.html
RESPONSE=/thanks.html
TEMPLATE=/usr/local/etc/httpd/conf/guestbook.template
OUTPUT=/usr/local/etc/httpd/htdocs/guestbook.html

Listing 10.18. The guestbook.template program.

<p><b>From <a href="$url">$name</a>
<a href="mailto:$email">$email</a></b></p>
<pre>$message</pre>
<hr>

By creating the text files in Listings 10.15 through 10.18, you have created a guestbook without one extra line of CGI programming. Remember, however, that you have a very rudimentary guestbook, lacking features such as date-stamping and filtering greater than (>) and less than (<) symbols. The guestbook examples in Chapter 5, "Input," and Chapter 6, "Programming Strategies," are superior to this primitive example. Regardless, form.cgi can save the CGI developer a great deal of time.
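The missing filter is small. The following is a minimal sketch in C of the kind of routine a fuller guestbook would apply to each value before writing it into an HTML page. The name html_escape is hypothetical (it is not part of html-lib), and the sketch also escapes ampersands and double quotes, which goes slightly beyond the two symbols mentioned above because the template places $url inside a quoted attribute.

```c
#include <string.h>

/* Copy src into dst, replacing <, >, & and " with HTML entities.
 * The result is truncated to fit size bytes, terminator included.
 * (html_escape is a hypothetical helper, not part of html-lib.) */
void html_escape(const char *src, char *dst, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return;
    for (; *src != '\0'; src++) {
        char single[2];
        const char *rep;
        size_t rlen;

        switch (*src) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:
            single[0] = *src;
            single[1] = '\0';
            rep = single;
            break;
        }
        rlen = strlen(rep);
        if (len + rlen + 1 > size)
            break;              /* stop rather than emit half an entity */
        memcpy(dst + len, rep, rlen);
        len += rlen;
    }
    dst[len] = '\0';
}
```

With a filter like this in place, a visitor who types HTML tags into the message field sees them displayed as text in the guestbook instead of having the browser interpret them.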

Summary

The examples in this chapter were minimalistic as far as CGI programs go. The complexity came in manipulating CGI environment variables, in parsing input, and in sending output. All other routines either read and wrote data to a file, or they manipulated text.

The applications in this chapter (the content negotiator, the redirection manager, and the form parser), together with the techniques applied, are enough to cover the majority of CGI programs that most people will ever need to write. The remainder of this book focuses on more specialized, advanced applications.
