Appendix D

cgi-lib.ol Reference Guide


CONTENTS

  • Using cgi-lib.pl
  • Routines and Variables
  • Source Code

    Steve Brenner's cgi-lib.pl was one of the first CGI programming libraries available, and it is widely used. cgi-lib.pl greatly simplifies CGI programming in Perl by providing parsing libraries and other useful CGI routines. It is written for Perl 4, although it will work with Perl 5.

    The primary function of cgi-lib.pl is to parse form input. It parses form input and places it in an associative array keyed by the name of the field. This library has evolved since its first release and can handle both regular form decoded input (application/x-www-form-urlencoded, data that is sent as arguments in the URL itself) and the multipart form decoded input used for the newly proposed HTML file uploading (multipart/form-data, data which is sent as standard input like a multipart e-mail attachment).

    This appendix presents a very simple example of how to use cgi-lib.pl and describes each available routine. The complete source code for cgi-lib.pl appears at the end of this appendix. The library is also available on the CD-ROM provided with this book.

    Note
    I have refrained from discussing Perl 5 in this book for a number of reasons, most of them listed in the Introduction. However, I would highly encourage you to explore Perl 5 and some of its nice improvements over Perl 4. Although Perl 5 is slightly more complex than Perl 4 conceptually and syntactically, the tools you gain make it worth the time you spend learning it.
    Lincoln Stein has written a very good class library for Perl 5 called CGI.pm, which includes support for form parsing, HTML form output, and internal debugging. If you know Perl 5 or plan to learn it, I highly recommend you take a look. It is available on the included CD-ROM; more information is at <URL:http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html>;

    Using cgi-lib.pl

    To use cgi-lib.pl, you must place it either in the same directory as your Perl scripts or in the global directory of Perl libraries (normally located in /usr/lib/perl on UNIX machines). On UNIX machines, make sure cgi-lib.pl is world-readable.

    Using cgi-lib.pl requires two steps: including the library and calling the functions. A very minimal CGI program using cgi-lib.pl is

    #!/usr/local/bin/perl

    if (&ReadParse(*input)) {
        print &PrintHeader,&HtmlTop("Form Results");
        print &PrintVariables,&HtmlBot;
    }
    else {
        print &PrintHeader,&HtmlTop("Entry Form");
        print <<EOM;
    <form method=POST>
    <p>Name: <input name="name"><br>
    Age: <input name="age"></p>
    <p><input type=submit></p>
    </form>
    EOM
        print &HtmlBot;
    }

    This program does the following:

    The main routine is &ReadParse, which takes each form name/value pair and inserts it into the associative array %input. The array is keyed by name, so $input{'name'} is equal to 'value'.

    &PrintHeader, &HtmlTop, &PrintVariables, and &HtmlBot are all HTML output functions described in more detail in the next section.

    Routines and Variables

    In this section, I have listed and defined the functions and variables made available in the cgi-lib.pl library.

    &ReadParse

    &ReadParse parses form input of MIME types application/x-www-form-urlencoded and multipart/form-data. Pass it the variable *varname and it will place the parsed form data in the associative array %varname in the form:

    $varname{name} = value

    If a name has more than one associated value, the values are separated by a null character. You can use the &SplitParam function to separate the value of $varname{name} into its multiple values.

    If you want &ReadParse to save files uploaded using HTML file upload, you must change the value of $cgi-lib'writefiles in cgi-lib.pl from 0 to 1.

    &PrintHeader

    &PrintHeader returns the following string:

    Content-Type: text/html\n\n

    Here is how this function is called:

    print &PrintHeader;

    &HtmlTop

    &HtmlTop accepts a string that is used between the <title> tags and the <h1> tags. It returns a valid HTML header. For example, the following:

    print &HtmlTop("Hello, World!");

    prints this:

    <html><head>
    <title>Hello, World!</title>
    </head>

    <body>
    <h1>Hello, World!</h1>

    &HtmlBot

    &HtmlBot is the complement of &HtmlTop and returns the HTML footer string.

    </body> </html>

    &SplitParam

    &SplitParam splits a multivalued parameter returned by the associative array from &ReadParse and returns a list containing each separate element. For example, if you had the following form:

    <form method=POST>
    Street 1: <input name="street"><br>
    Street 2: <input name="street"><br>
    <input type=submit>
    </form>

    and you parsed it using this:

    &ReadParse(*input);

    the following is the value of $input{'street'}:

    value1\0value2

    To split these values, you can do the following:

    @streets = &SplitParam($input{'street'});

    which would return this list:

    (value1, value2)

    &MethGet

    &MethGet returns 1 if REQUEST_METHOD equals GET; otherwise, it returns 0.

    &MethPost

    &MethPost returns 1 if REQUEST_METHOD equals POST; otherwise, it returns 0.

    &MyBaseUrl

    &MyBaseUrl returns the URL without the QUERY_STRING or PATH_INFO. For example, if the URL were the following:

    http://hcs.harvard.edu/cgi-bin/finger?eekim

    &MyBaseUrl would return the following:

    http://hcs.harvard.edu:80/cgi-bin/finger

    &MyFullUrl

    &MyFullUrl returns the complete URL including any QUERY_STRING or PATH_INFO values. For example, if your URL is

    http://hcs.harvard.edu/cgi-bin/counter.cgi/~eekim?file.html

    &MyFullUrl returns the following:

    http://hcs.harvard.edu:80/cgi-bin/counter.cgi/~eekim?file.html

    &CgiError

    &CgiError accepts a list of strings and prints them in the form of an error message. The first string is inserted between <title> and <h1> tags; all subsequent strings are placed between <p> tags. If no strings are provided, the default headline and title of the message is

    Error: script $name encountered fatal error

    where $name is the value of &MyFullUrl. For example, the following:

    &CgiError("Error","Cannot open file","Please report to web admin.");

    returns this HTML message:

    <html><head>
    <title>Error</title>
    </head>

    <body>
    <h1>Error</h1>

    <p>Cannot open file</p>

    <p>Please report to web admin.</p>

    </body> </html>

    &CgiDie

    The same as &CgiError except it does a die when finished. die prints the error message to stderr.

    &PrintVariables

    &PrintVariables returns a definition list (<dl>) of each name and value pair. For example, given the name and value pairs (name, eugene) and (age, 21), &PrintVariables returns the following:

    <dl compact>
    <dt><b>name</b>
    <dd><i>eugene</i>:<br>
    <dt><b>age</b>
    <dd><i>21</i>:<br>
    </dl>

    &PrintEnv

    &PrintEnv returns a definition list of all the environment variables.

    Source Code

    This section contains a full listing of the cgi-lib.pl library.


    Listing D.1. The cgi-lib.pl program.
    # Perl Routines to Manipulate CGI input
    # S.E.Brenner@bioc.cam.ac.uk
    # $Id: cgi-lib.pl,v 2.8 1996/03/30 01:36:33 brenner Rel $
    #
    # Copyright (c) 1996 Steven E. Brenner
    # Unpublished work.
    # Permission granted to use and modify this library so long as the
    # copyright above is maintained, modifications are documented, and
    # credit is given for any use of the library.
    #
    # Thanks are due to many people for reporting bugs and suggestions
    # especially Meng Weng Wong, Maki Watanabe, Bo Frese Rasmussen,
    # Andrew Dalke, Mark-Jason Dominus, Dave Dittrich, Jason Mathews

    # For more information, see:
    #     http://www.bio.cam.ac.uk/cgi-lib/

    ($cgi_lib'version = '$Revision: 2.8 $') =~ s/[^.\d]//g;


    # Parameters affecting cgi-lib behavior
    # User-configurable parameters affecting file upload.
    $cgi_lib'maxdata    = 131072;    # maximum bytes to accept via POST - 2^17
    $cgi_lib'writefiles =      0;    # directory to which to write files, or
                                     # 0 if files should not be written
    $cgi_lib'filepre    = "cgi-lib"; # Prefix of file names, in directory above

    # Do not change the following parameters unless you have special reasons
    $cgi_lib'bufsize  =  8192;    # default buffer size when reading multipart
    $cgi_lib'maxbound =   100;    # maximum boundary length to be encounterd
    $cgi_lib'headerout =   0;    # indicates whether the header has been printed


    # ReadParse
    # Reads in GET or POST data, converts it to unescaped text, and puts
    # key/value pairs in %in, using "\0" to separate multiple selections

    # Returns >0 if there was input, 0 if there was no input
    # undef indicates some failure.

    # Now that cgi scripts can be put in the normal file space, it is useful
    # to combine both the form and the script in one place. If no parameters
    # are given (i.e., ReadParse returns FALSE), then a form could be output.

    # If a reference to a hash is given, then the data will be stored in that
    # hash, but the data from $in and @in will become inaccessable.
    # If a variable-glob (e.g., *cgi_input) is the first parameter to ReadParse,
    # information is stored there, rather than in $in, @in, and %in.
    # Second, third, and fourth parameters fill associative arrays analagous to
    # %in with data relevant to file uploads.

    # If no method is given, the script will process both command-line arguments
    # of the form: name=value and any text that is in $ENV{'QUERY_STRING'}
    # This is intended to aid debugging and may be changed in future releases

    sub ReadParse {
      local (*in) = shift if @_;    # CGI input
      local (*incfn,                # Client's filename (may not be provided)
             *inct,                 # Client's content-type (may not be provided)
             *insfn) = @_;          # Server's filename (for spooled files)
      local ($len, $type, $meth, $errflag, $cmdflag, $perlwarn);

      # Disable warnings as this code deliberately uses local and environment
      # variables which are preset to undef (i.e., not explicitly initialized)
      $perlwarn = $^W;
      $^W = 0;

      # Get several useful env variables
      $type = $ENV{'CONTENT_TYPE'};
      $len = $ENV{'CONTENT_LENGTH'};
      $meth = $ENV{'REQUEST_METHOD'};

      if ($len > $cgi_lib'maxdata) { #'
          &CgiDie("cgi-lib.pl: Request to receive too much data: $len bytes\n");
      }

      if (!defined $meth || $meth eq '' || $meth eq 'GET' ||
          $type eq 'application/x-www-form-urlencoded') {
        local ($key, $val, $i);

        # Read in text
        if (!defined $meth || $meth eq '') {
          $in = $ENV{'QUERY_STRING'};
          $cmdflag = 1; # also use command-line options
        } elsif($meth eq 'GET' || $meth eq 'HEAD') {
          $in = $ENV{'QUERY_STRING'};
        } elsif ($meth eq 'POST') {
            $errflag = (read(STDIN, $in, $len) != $len);
        } else {
          &CgiDie("cgi-lib.pl: Unknown request method: $meth\n");
        }

        @in = split(/[&;]/,$in);
        push(@in, @ARGV) if $cmdflag; # add command-line parameters

        foreach $i (0 .. $#in) {
          # Convert plus to space
          $in[$i] =~ s/\+/ /g;

          # Split into key and value.
          ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.

          # Convert %XX from hex numbers to alphanumeric
          $key =~ s/%([A-Fa-f0-9]{2})/pack("c",hex($1))/ge;
          $val =~ s/%([A-Fa-f0-9]{2})/pack("c",hex($1))/ge;

          # Associate key and value
          $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator
          $in{$key} .= $val;
        }

      } elsif ($ENV{'CONTENT_TYPE'} =~ m#^multipart/form-data#) {
        # for efficiency, compile multipart code only if needed
    $errflag = !(eval <<'END_MULTIPART');

        local ($buf, $boundary, $head, @heads, $cd, $ct, $fname, $ctype, $blen);
        local ($bpos, $lpos, $left, $amt, $fn, $ser);
        local ($bufsize, $maxbound, $writefiles) =
          ($cgi_lib'bufsize, $cgi_lib'maxbound, $cgi_lib'writefiles);


        # The following lines exist solely to eliminate spurious warning messages
        $buf = '';

        ($boundary) = $type =~ /boundary="([^"]+)"/; #"; # find boundary
        ($boundary) = $type =~ /boundary=(\S+)/ unless $boundary;
        &CgiDie ("Boundary not provided") unless $boundary;
        $boundary = "--" . $boundary;
        $blen = length ($boundary);

        if ($ENV{'REQUEST_METHOD'} ne 'POST') {
          &CgiDie("Invalid request method for multipart/form-data: $meth\n");
        }

        if ($writefiles) {
          local($me);
          stat ($writefiles);
          $writefiles = "/tmp" unless -d _ && -r _ && -w _;
          # ($me) = $0 =~ m#([^/]*)$#;
          $writefiles .= "/$cgi_lib'filepre";
        }

        # read in the data and split into parts:
        # put headers in @in and data in %in
        # General algorithm:
        #   There are two dividers: the border and the '\r\n\r\n' between
        # header and body. Iterate between searching for these
        #   Retain a buffer of size(bufsize+maxbound); the latter part is
        # to ensure that dividers don't get lost by wrapping between two bufs
        #   Look for a divider in the current batch. If not found, then
        # save all of bufsize, move the maxbound extra buffer to the front of
        # the buffer, and read in a new bufsize bytes. If a divider is found,
        # save everything up to the divider. Then empty the buffer of everything
        # up to the end of the divider. Refill buffer to bufsize+maxbound
        # Note slightly odd organization. Code before BODY: really goes with
        # code following HEAD:, but is put first to 'pre-fill' buffers. BODY:
        # is placed before HEAD: because we first need to discard any 'preface,'
        # which would be analagous to a body without a preceeding head.

        $left = $len;
       PART: # find each part of the multi-part while reading data
        while (1) {
          last PART if $errflag;

          $amt = ($left > $bufsize+$maxbound-length($buf)
                  ?  $bufsize+$maxbound-length($buf): $left);
          $errflag = (read(STDIN, $buf, $amt, length($buf)) != $amt);
          $left -= $amt;

          $in{$name} .= "\0" if defined $in{$name};
          $in{$name} .= $fn if $fn;

          $name=~/([-\w]+)/; # This allows $insfn{$name} to be untainted
          if (defined $1) {
            $insfn{$1} .= "\0" if defined $insfn{$1};
            $insfn{$1} .= $fn if $fn;
          }

         BODY:
          while (($bpos = index($buf, $boundary)) == -1) {
            if ($name) { # if no $name, then it's the prologue -- discard
              if ($fn) { print FILE substr($buf, 0, $bufsize); }
              else     $in{$name} .= substr($buf, 0, $bufsize); }
            }
            $buf = substr($buf, $bufsize);
            $amt = ($left > $bufsize ? $bufsize : $left); #$maxbound==length($buf);
            $errflag = (read(STDIN, $buf, $amt, $maxbound) != $amt);
            $left -= $amt;
          }
          if (defined $name) { # if no $name, then it's the prologue -- discard
            if ($fn) { print FILE substr($buf, 0, $bpos-2); }
            else     { $in {$name} .= substr($buf, 0, $bpos-2); } # kill last \r\n
          }
          close (FILE);
          last PART if substr($buf, $bpos + $blen, 4) eq "--\r\n";
          substr($buf, 0, $bpos+$blen+2) = '';
          $amt = ($left > $bufsize+$maxbound-length($buf)
                  ?  $bufsize+$maxbound-length($buf) : $left);
          $errflag = (read(STDIN, $buf, $amt, length($buf)) != $amt);
          $left -= $amt;


          undef $head; undef $fn;
         HEAD:
          while (($lpos = index($buf, "\r\n\r\n")) == -1) {
            $head .= substr($buf, 0, $bufsize);
            $buf = substr($buf, $bufsize);
            $amt = ($left > $bufsize ? $bufsize : $left); #$maxbound==length($buf);
            $errflag = (read(STDIN, $buf, $amt, $maxbound) != $amt);
            $left -= $amt;
          }
          $head .= substr($buf, 0, $lpos+2);
          push (@in, $head);
          @heads = split("\r\n", $head);
          ($cd) = grep (/^\s*Content-Disposition:/i, @heads);
          ($ct) = grep (/^\s*Content-Type:/i, @heads);

          ($name) = $cd =~ /\bname="([^"]+)"/i; #";
          ($name) = $cd =~ /\bname=([^\s:;]+)/i unless defined $name;

          ($fname) = $cd =~ /\bfilename="([^"]*)"/i; #"; # filename can be null-str
          ($fname) = $cd =~ /\bfilename=([^\s:;]+)/i unless defined $fname;
          $incfn{$name} .= (defined $in{$name} ? "\0" : "") . $fname;

          ($ctype) = $ct =~ /^\s*Content-type:\s*"([^"]+)"/i; #";
          ($ctype) = $ct =~ /^\s*Content-Type:\s*([^\s:;]+)/i unless defined $ctype
    ;
          $inct{$name} .= (defined $in{$name} ? "\0" : "") . $ctype;

          if ($writefiles && defined $fname) {
            $ser++;
            $fn = $writefiles . ".$$.$ser";
            open (FILE, ">$fn") || &CgiDie("Couldn't open $fn\n");
          }
          substr($buf, 0, $lpos+4) = '';
          undef $fname;
          undef $ctype;
        }

    1;
    END_MULTIPART
      &CgiDie($@) if $errflag;
      } else {
        &CgiDie("cgi-lib.pl: Unknown Content-type: $ENV{'CONTENT_TYPE'}\n");
      }


      $^W = $perlwarn;

      return ($errflag ? undef : scalar(@in));
    }


    # PrintHeader
    # Returns the magic line which tells WWW that we're an HTML document

    sub PrintHeader {
      return "Content-type: text/html\n\n";
    }


    # HtmlTop
    # Returns the <head> of a document and the beginning of the body
    # with the title and a body <h1> header as specified by the parameter

    sub HtmlTop
    {
      local ($title) = @_;

      return <<END_OF_TEXT;
    <html>
    <head>
    <title>$title</title>
    </head>
    <body>
    <h1>$title</h1>
    END_OF_TEXT
    }


    # HtmlBot
    # Returns the </body>, </html> codes for the bottom of every HTML page

    sub HtmlBot
    {
      return "</body>\n</html>\n";
    }


    # SplitParam
    # Splits a multi-valued parameter into a list of the constituent parameters

    sub SplitParam
    {
      local ($param) = @_;
      local (@params) = split ("\0", $param);
      return (wantarray ? @params : $params[0]);
    }


    # MethGet
    # Return true if this cgi call was using the GET request, false otherwise

    sub MethGet {
      return (defined $ENV{'REQUEST_METHOD'} && $ENV{'REQUEST_METHOD'} eq "GET");
    }


    # MethPost
    # Return true if this cgi call was using the POST request, false otherwise

    sub MethPost {
      return (defined $ENV{'REQUEST_METHOD'} && $ENV{'REQUEST_METHOD'} eq "POST");
    }


    # MyBaseUrl
    # Returns the base URL to the script (i.e., no extra path or query string)
    sub MyBaseUrl {
      local ($ret, $perlwarn);
      $perlwarn = $^W; $^W = 0;
      $ret = 'http://' . $ENV{'SERVER_NAME'} .
             ($ENV{'SERVER_PORT'} != 80 ? ":$ENV{'SERVER_PORT'}" : '') .
             $ENV{'SCRIPT_NAME'};
      $^W = $perlwarn;
      return $ret;
    }


    # MyFullUrl
    # Returns the full URL to the script (i.e., with extra path or query string)
    sub MyFullUrl {
      local ($ret, $perlwarn);
      $perlwarn = $^W; $^W = 0;
      $ret = 'http://' . $ENV{'SERVER_NAME'} .
             ($ENV{'SERVER_PORT'} != 80 ? ":$ENV{'SERVER_PORT'}" : '') .
             $ENV{'SCRIPT_NAME'} . $ENV{'PATH_INFO'} .
             (length ($ENV{'QUERY_STRING'}) ? "?$ENV{'QUERY_STRING'}" : '');
      $^W = $perlwarn;
      return $ret;
    }


    # MyURL
    # Returns the base URL to the script (i.e., no extra path or query string)
    # This is obsolete and will be removed in later versions
    sub MyURL {
      return &MyBaseUrl;
    }


    # CgiError
    # Prints out an error message which which containes appropriate headers,
    # markup, etcetera.
    # Parameters:
    #  If no parameters, gives a generic error message
    #  Otherwise, the first parameter will be the title and the rest will
    #  be given as different paragraphs of the body

    sub CgiError {
      local (@msg) = @_;
      local ($i,$name);

      if (!@msg) {
        $name = &MyFullUrl;
        @msg = ("Error: script $name encountered fatal error\n");
      };

      if (!$cgi_lib'headerout) { #')
        print &PrintHeader;
        print "<html>\n<head>\n<title>$msg[0]</title>\n</head>\n<body>\n";
      }
      print "<h1>$msg[0]</h1>\n";
      foreach $i (1 .. $#msg) {
        print "<p>$msg[$i]</p>\n";
      }

      $cgi_lib'headerout++;
    }


    # CgiDie
    # Identical to CgiError, but also quits with the passed error message.

    sub CgiDie {
      local (@msg) = @_;
      &CgiError (@msg);
      die @msg;
    }


    # PrintVariables
    # Nicely formats variables. Three calling options:
    # A non-null associative array - prints the items in that array
    # A type-glob - prints the items in the associated assoc array
    # nothing - defaults to use %in
    # Typical use: &PrintVariables()

    sub PrintVariables {
      local (*in) = @_ if @_ == 1;
      local (%in) = @_ if @_ > 1;
      local ($out, $key, $output);

      $output = "\n<dl compact>\n";
      foreach $key (sort keys(%in)) {
        foreach (split("\0", $in{$key})) {
          ($out = $_) =~ s/\n/<br>\n/g;
          $output .= "<dt><b>$key</b>\n <dd>:<i>$out</i>:<br>\n";
        }
      }
      $output .= "</dl>\n";

      return $output;
    }

    # PrintEnv
    # Nicely formats all environment variables and returns HTML string
    sub PrintEnv {
      &PrintVariables(*ENV);
    }


    # The following lines exist only to avoid warning messages
    $cgi_lib'writefiles = $cgi_lib'writefiles;
    $cgi_lib'bufsize = $cgi_lib'bufsize ;
    $cgi_lib'maxbound = $cgi_lib'maxbound;
    $cgi_lib'version = $cgi_lib'version;

    1; #return true