16
Advanced CGI/HTML


This chapter explains some of the more advanced aspects of CGI programming and of HTML in general. CGI programs, as you can probably tell, are simple transaction-based programs, each performing a single small task. Suppose that you want to take that task further and do heavier-duty work. You will find that the CGI paradigm by itself cannot handle some of the basic needs of more advanced applications without the help of a few workarounds. There are also some new proposals under way to enhance the capability of HTML. After reading this chapter, you should have the tools and information you need to perform robust operations using CGI and HTML.

Sessions

As you have seen in previous chapters, one sorely missed concept in CGI programming is that of a session or preserved state among transactions. A simple example of this might be keeping track of items in a virtual shopping cart while the user happily goes searching for more items to buy. Each visit to a page is a single transaction, which means that once the CGI program is finished, it remembers nothing about what it just did. You will see that the need for an extended session is essential when you are implementing anything more than a simple fill-out form handler.

The Need for an Extended Session

Sessions can be thought of as a combination of one or more transactions. As we have seen, a CGI program is capable of nothing more than a single transaction. The shopping cart is a good example of the need for something more. Another might be keeping track of user information while the user is visiting your site. Perhaps the .htaccess security on your Web pages isn't quite enough, and you'd like the user to log in to your site through a login CGI script. Yet another example might be keeping track of what the user is searching for if you provide a search engine on the Web.

All of these examples show that there is a crucial need for maintaining an extended session across invocations of your CGI program. There are certain tricks that one can play to maintain state across separate and distinct CGI transactions. One of these tricks has already been discussed in Chapter 4, "HTML Forms--The Foundation of an Interactive Web." Another trick is a bit more expensive in terms of performance, although it is a more secure approach. We will show you the techniques and use them in some, hopefully, useful examples.

Data and State Preservation Between Transactions

As we have seen in Chapter 4, the HTML form specification includes the concept of a hidden field. A hidden field serves no purpose other than to pass data from the HTML form back to the CGI script. In fact, all fields serve this general purpose. The main difference with the hidden field is that the browser neither displays it nor lets the user edit it in the form, so it is a means by which you can send data back to yourself. Keep in mind that the value is still visible in the page source and that a determined user can submit an altered value, so do not trust it blindly. This is the first and most obvious way to maintain state across transactions. You simply put the data you wish to retain into a hidden field and then examine the contents of the hidden field the next time your CGI script is called. This is a neat trick, but remember that each transaction causes this data to travel across the wire for everyone to see! This is obviously not a good place to keep track of passwords and credit-card numbers.

Another way of preserving state between transactions is by using files on the server. What you can do is create a unique file instance perhaps based on either the REMOTE_HOST environment variable or the REMOTE_ADDR environment variable if REMOTE_HOST is not available. By definition of the CGI specification, at least REMOTE_ADDR should be set by the server prior to execution of the CGI program. You may also want to include the current date or time with this file as you may wish to control the expiration of a user's session. As you might guess, this method will most likely be slower due to the fact that you must always open the session file upon each script invocation. The overhead of doing this might be worthwhile depending on the amount of data you wish to preserve. Disk access is still faster than network transmission. There are other dangers of using a server-based session file that we will discuss.
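As a minimal sketch of the file-naming idea just described, the following helper builds a session file name from REMOTE_HOST, falling back to REMOTE_ADDR, and appends the current date. The function name sessionFile, the /tmp/visitors directory, and the host.YYYYMMDD naming scheme are our own illustrative choices, not part of the CGI specification.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a per-client session file name from the CGI environment.
# sessionFile and the "client.YYYYMMDD" scheme are illustrative only.
sub sessionFile {
   my($dir,$env,$now)=@_;
   # Prefer REMOTE_HOST; fall back to REMOTE_ADDR if it is not set.
   my $client=$env->{REMOTE_HOST} || $env->{REMOTE_ADDR};
   return unless defined $client && $client ne "";
   # Keep only characters that are safe inside a file name.
   $client=~s/[^A-Za-z0-9.\-]/_/g;
   # The date stamp lets a session expire when the day changes.
   my $stamp=strftime("%Y%m%d",gmtime($now));
   return "$dir/$client.$stamp";
}

print sessionFile("/tmp/visitors",\%ENV,time) || "no client information", "\n";
```

Because the date is part of the name, a client automatically starts a fresh session file each day. Old files still have to be removed by some other means.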

A third approach is to use a combination of the first two approaches. You may wish to provide a login screen or use a protected script to obtain a userid when the user first enters your CGI session. You can then create a one-way hash of the userid perhaps again with the current date or time and pass this hash value back in a hidden field for continuous authentication between transactions.
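The hashed-token idea might be sketched as follows. This is only one possible reading of the approach: we assume a server-side secret ($SECRET, a placeholder) and use a keyed hash (HMAC-SHA256 from the Digest::SHA module) rather than a plain hash, so that a client cannot compute the token on its own. The names makeToken and checkToken are hypothetical.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use POSIX qw(strftime);

# Placeholder secret known only to the server (an assumption of this sketch).
my $SECRET="change-me";

# Hash the userid together with the current date.
sub makeToken {
   my($userid,$now)=@_;
   my $day=strftime("%Y%m%d",gmtime($now));
   return hmac_sha256_hex("$userid:$day",$SECRET);
}

# Recompute the token and compare it with the one the form sent back.
sub checkToken {
   my($userid,$token,$now)=@_;
   return defined $token && $token eq makeToken($userid,$now);
}

my $now=time;
my $token=makeToken("bdeng",$now);
print "token for the hidden field: $token\n";
print "token accepted\n" if checkToken("bdeng",$token,$now);
```

The script would write both the userid and the token into hidden fields after a successful login and call checkToken at the start of every later transaction. Because the date is part of the hashed string, a token issued today stops working when the date changes.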

Let's now take a look at a few examples using these methods of state preservation. Our first example will be a simple calculator, shown in Listing 16.1. We will maintain the current value and allow add, subtract, multiply, and divide operations. The data we will want to retain across transactions is the current value.

Listing 16.1. A very simple calculator.

#!/usr/local/bin/perl

use CGI::Form;
$q = new CGI::Form;
print $q->header();
print $q->start_html(-title=>'A Very Simple Calculator');
print "<H1>A Very Simple Calculator</H1>\n";

if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {
   # First visit: start the calculator at zero.
   $val=0;
   &printForm($q,$val);
} else {
   # The running value comes back to us in the hidden field.
   $val=$q->param('hiddenValue');
   $modifier=$q->param('Modifier');
   if ($modifier=~/^\d+$/) {
      $op=$q->param('Action');
      if ($op eq "Add") {
         $val+=$modifier;
      } elsif ($op eq "Subtract") {
         $val-=$modifier;
      } elsif ($op eq "Multiply") {
         $val*=$modifier;
      } elsif ($op eq "Divide") {
         # Guard against division by zero, which would abort the script.
         if ($modifier==0) {
            print "<P><STRONG>Cannot divide by zero!</STRONG><BR><BR>\n";
         } else {
            $val/=$modifier;
         }
      }
   } else {
      print "<P><STRONG>Please enter a numeric value!</STRONG><BR><BR>\n";
   }
   # Store the new value so that the next form carries it along.
   $q->param('hiddenValue',$val);
   &printForm($q,$val);
}
print $q->end_html();

sub printForm {
   my($q,$val)=@_;
   print "<P>The current value is: $val\n";
   print "<P>Please enter a value and select an operation.\n<BR>";
   print $q->start_multipart_form();
   print $q->hidden(-name=>'hiddenValue',-value=>$val);
   print "<TABLE><TR><TD COLSPAN=4>\n";
   print $q->textfield(-name=>'Modifier',-size=>12,-maxlength=>5);
   print "</TD></TR>\n<TR><TD>\n";
   print $q->submit(-name=>'Action',-value=>'Add');
   print "\n</TD><TD>\n";
   print $q->submit(-name=>'Action',-value=>'Subtract');
   print "\n</TD><TD>\n";
   print $q->submit(-name=>'Action',-value=>'Multiply');
   print "\n</TD><TD>\n";
   print $q->submit(-name=>'Action',-value=>'Divide');
   print "\n</TD>\n";
   print "</TR></TABLE>\n";
   print $q->end_form;
}

You will note that in this example, we use the hidden field hiddenValue to retain the current value of the calculator. We do some very basic field validation to make sure the user actually gave us a number. As written, this calculator accepts only non-negative integer input. Remember that when the user leaves our CGI program and returns, all state is lost. Figure 16.1 shows the calculator as it appears in the Web browser.
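If you wanted the calculator to accept signed and decimal input, the validation and arithmetic could be pulled into one helper along the following lines. This is a sketch only. The name applyOp and the choice to return undef for unusable input are ours and are not part of the listing.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Apply one calculator operation. Returns the new value, or undef
# when the input is not a number or the operation cannot be done.
sub applyOp {
   my($val,$op,$modifier)=@_;
   # Accept an optional sign, digits, and an optional fractional part.
   return undef unless defined $modifier && $modifier=~/^-?\d+(?:\.\d+)?$/;
   if    ($op eq "Add")      { return $val+$modifier; }
   elsif ($op eq "Subtract") { return $val-$modifier; }
   elsif ($op eq "Multiply") { return $val*$modifier; }
   elsif ($op eq "Divide")   { return $modifier==0 ? undef : $val/$modifier; }
   return undef;   # unknown operation
}

my $result=applyOp(10,"Divide","4");
print defined $result ? "$result\n" : "Please enter a numeric value!\n";
```

The calling script would keep the old value and print an error message whenever applyOp returns undef.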

Let's look at another example now in Listing 16.2 that will make use of a file to retain state about a certain client. We will use the REMOTE_ADDR CGI environment variable to distinguish between clients. In this example, we will be allowing users to enter our Web site and write whatever they like in a big text field and then store the contents of that field in a file for them to later come back to and modify. Additionally, we will keep track of how many times users submitted changes to their text and display that value to them. Our example will simply use the /tmp/visitors directory to store these files and not worry about cleaning up these files.

Figure 16.1. The calculator example in the browser.

Listing 16.2. A personal notepad.

#!/usr/local/bin/perl

# The directory must already exist and be writable by the Web server.
$directory = "/tmp/visitors";
use CGI::Form;
$q = new CGI::Form;
print $q->header();
print $q->start_html(-title=>'A Personal Notepad');
print "<H1>A Personal Notepad</H1>\n";

$client=$q->cgi->var('REMOTE_ADDR');
if ($client eq "") {
   print "<P>Sorry, I don't know who you are. I can't continue\n";
   print $q->end_html();
   exit;
} else {
   if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {
      if (-e "$directory/$client") {
         ($text,$visits)=&getUserData("$directory/$client");
      } else {
         $text="";
         $visits=0;
      }
      &printForm($q,$text,$visits);
   } else {
      # Load the saved state first so $text and $visits are always set.
      ($text,$visits)=&getUserData("$directory/$client");
      if ($q->param('Action') eq "Submit") {
         $text=$q->param('Text');
         $visits++;
         &setUserData("$directory/$client",$text,$visits);
      }
      &printForm($q,$text,$visits);
   }
}
print $q->end_html();

sub printForm {
   my($q,$text,$visits)=@_;
   $q->param('Text',$text);
   print "<P>You have modified this notepad $visits times.<P>\n";
   print $q->start_multipart_form();
   print $q->textarea(-name=>'Text',-default=>$text,
                      -rows=>20,-columns=>50);
   print "<BR>";
   print $q->submit(-name=>'Action',-value=>'Submit');
   print $q->end_form;
}

sub getUserData {
   my($file)=@_;
   my($text,$visits)=("",0);
   if (open(IN,"< $file")) {
      # The first line holds the count; the rest is the notepad text.
      $visits=<IN>;
      chomp($visits);
      $text=join('',<IN>);
      close(IN);
   }
   return($text,$visits);
}

sub setUserData {
   my($file,$text,$visits)=@_;
   if (open(OUT,"> $file")) {
      print OUT "$visits\n";
      print OUT $text;
      close(OUT);
   } else {
      # This is an error condition that shouldn't happen.
      # Handle it properly just the same.
      print "<P>Error! I cannot save your text. Sorry!<BR>\n";
   }
}

This solution is probably more efficient if you need to save larger amounts of data. However, this solution can also drain your server's resources very quickly. Be sure that you have enough machine resources to handle the expected number of clients for which you'll need to maintain state. Figure 16.2 shows the personal notepad as it appears in the browser.
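One way to keep server-side session files from piling up is to delete those that have not been touched for a while. The following is a sketch under our own assumptions: the helper name expireSessions and the one-day limit are arbitrary, and the code would be run periodically or at the start of the notepad script.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Delete plain files in $dir that were last modified more than
# $maxAge seconds before $now. Returns the number of files removed.
sub expireSessions {
   my($dir,$maxAge,$now)=@_;
   my $removed=0;
   opendir(my $dh,$dir) or return 0;
   for my $name (readdir $dh) {
      my $path="$dir/$name";
      next unless -f $path;             # skips . and .. and subdirectories
      my $mtime=(stat $path)[9];        # last modification time
      if (defined $mtime && $now-$mtime > $maxAge) {
         $removed++ if unlink $path;
      }
   }
   closedir($dh);
   return $removed;
}

# Drop notepads that have not been modified for a day.
my $count=expireSessions("/tmp/visitors",24*60*60,time);
print "removed $count stale session file(s)\n";
```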

Hopefully, these examples will be enough to get you thinking about how you can implement your own persistent CGI sessions across multiple transactions.

Figure 16.2. The personal notepad.

Execution and Shell-Like Interfaces

Using this idea of preserving session information, we can take the example further and provide a simple remote-login type of shell for executing programs on the server. What we'll do in Listing 16.3 is provide a login form; each subsequent call to the CGI program then presents a command line from which the user can remotely execute programs. This can also be accomplished within a protected directory, allowing the login action to be done within the browser. You might also consider limiting the types of commands one can execute using this CGI session.

Listing 16.3. A Web-based command shell.

#!/usr/local/bin/perl

use CGI::Form;
$q = new CGI::Form;
print $q->header();
print $q->start_html(-title=>'A Web-based Command Shell');
print "<H1>A Web-based Command Shell</H1>\n";

if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {
   &loginForm($q);
} else {
   if ($q->param('Action') eq "Login") {
      $uid=$q->param('uid');
      $pw=$q->param('pw');
      if (&validateLogin($uid,$pw)) {
         $history="";
         &shellForm($q,$history);
      } else {
         &unauthorized($q);
      }
   } elsif ($q->param('Action') eq "Doit") {
      # Note: this branch accepts any POST with Action=Doit;
      # it does not re-check the login.
      $history=$q->param('cmdHistory');
      $command=$q->param('command');
      if ($command ne "") {
         $history.=&doCommand($q,$command);
      }
      # Truncate the history at 1,024 characters. Keeping the most
      # recent output is our choice here.
      $history=substr($history,-1024) if length($history)>1024;
      $q->param('cmdHistory',$history);
      &shellForm($q,$history);
   } else {
      &unauthorized($q);
   }
}
print $q->end_html();

sub validateLogin {
   my($userid,$pw)=@_;
   my($retval)=0;
   if (open(PASSWD,"/etc/passwd")) {
      while (<PASSWD>) {
         chomp;
         my($login,$passwd,$uid,$gid,$rest)=split(/:/);
         # Encrypt the supplied password with the stored value as salt.
         if ($login eq $userid && crypt($pw,$passwd) eq $passwd) {
            $retval=1;
            last;
         }
      }
      close(PASSWD);
   }
   return $retval;
}

sub unauthorized {
   my($q)=@_;
   print "<STRONG>Sorry! You are not authorized to enter this area!</STRONG><BR>\n";
   print "<A HREF=\"/cgi-bin/cmdshell.pl\">Try again</A>";
}

sub loginForm {
   my($q)=@_;
   print "<P>Please enter your userid and password:<P>\n";
   print $q->start_multipart_form();
   print $q->textfield(-name=>'uid');
   print "<BR>";
   print $q->password_field(-name=>'pw');
   print "<BR>";
   print $q->submit(-name=>'Action',-value=>'Login');
   print $q->reset;
   print $q->endform;
}

sub shellForm {
   my($q,$history)=@_;
   print $q->start_multipart_form();
   print $q->hidden(-name=>'cmdHistory',-value=>$history);
   print $q->textfield(-name=>'command');
   print "<BR>";
   print $q->submit(-name=>'Action',-value=>'Doit');
   print " ";
   print $q->reset;
   print $q->endform;
   print "<P>Command History:<P>\n";
   # Escape HTML special characters so command output displays literally.
   my($shown)=$history;
   $shown=~s/&/&amp;/g;
   $shown=~s/</&lt;/g;
   $shown=~s/>/&gt;/g;
   print "<PRE>$shown</PRE>";
}

sub doCommand {
   my($q,$cmd)=@_;
   # Record the command line, followed by whatever it printed.
   my($hist)="$cmd\n";
   $hist.=`$cmd`;
   return "$hist\n";
}

You can see how the login form would look in Figure 16.3. Once the user has logged in and issued a few commands, the shell would look as shown in Figure 16.4.

Figure 16.3. The command shell login form.

Figure 16.4. The command shell with history.

You will notice in this example that we keep a running history of command output. When the output exceeds a certain limit, in this case 1,024 characters, it will be truncated. The session is maintained as long as continuous POST requests are made. As soon as the user leaves this CGI session, it will reinitialize on the GET request.
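The earlier suggestion to limit the types of commands can be sketched as a replacement for doCommand that consults a fixed table of permitted programs. Everything in the table is a hypothetical example (the paths vary from system to system), and rejecting all arguments is a simplification we chose to keep the sketch short.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical table of permitted commands: name => program path.
my %ALLOWED=(
   date   => "/bin/date",
   uptime => "/usr/bin/uptime",
   who    => "/usr/bin/who",
);

# Return the program path for a permitted command line, or undef.
sub allowedCommand {
   my($cmd)=@_;
   return undef unless defined $cmd;
   $cmd=~s/^\s+|\s+$//g;                  # trim surrounding whitespace
   return undef unless $cmd=~/^[a-z]+$/;  # bare command name, no arguments
   return $ALLOWED{$cmd};
}

# Run a permitted command and return a history entry for it.
sub doCommand {
   my($cmd)=@_;
   my $path=allowedCommand($cmd);
   return "$cmd\nCommand not permitted.\n\n" unless defined $path;
   my $out="";
   # The path comes from our own table, never from the user's input.
   if (open(my $ph,"-|",$path)) {
      local $/;                           # read all output at once
      $out=<$ph>;
      close($ph);
   }
   $out="" unless defined $out;
   return "$cmd\n$out\n";
}

print doCommand("rm -rf /");
```

Because the user's text is used only as a lookup key, nothing the user types is ever handed to a shell.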

Warning and Dangers of Multiple Persistent Servers

It is very important that you understand the impact that your CGI scripts can have on your server's resources. Remember that, on most servers, for each CGI request, a new process is spawned to handle that request. There are a few servers that will allow you to embed CGI invocations as function calls from within the server.

You must also consider the possibility of colliding clients when defining how a session is maintained. There are certain basic parameters that can be used to distinguish sessions. Our example used the REMOTE_ADDR environment variable, which separates most clients from one another, although users who share a proxy server or a multiuser machine will present the same address. You may also wish to create multiple sessions for a given client, in which case you could throw in another variable such as time.

Embedded Objects--An Internet Proposal

HTML is an evolving standard. As more and more people access information on the World Wide Web, we find more limitations in the original HTML specification. There are plenty of new proposals for adding functionality to the HTML language. One of these proposals addresses the ability to embed generic objects into an HTML document. These generic objects include such things as Java applets, Microsoft ActiveX controls, documents, and other multimedia objects. The new tag is the <OBJECT> tag, and it allows HTML authors to provide richer content for those browsers that support the newer document types.

The <OBJECT> tag has been designed in such a way that it will include support for many different types of embeddable objects. The first parameter is ID, which can be used to give the object a unique identifier for hypertext links. The CLASSID parameter specifies an implementation for the object. CODEBASE allows for an additional URL to be specified for the object's implementation. DATA and TYPE are used to specify the object's data and media type. CODETYPE is similar to TYPE, except that it refers to the media type of the implementation named by CLASSID. STANDBY allows you to specify a text string that is displayed by the browser while the data is being loaded. This tag also contains parameters for appearance and alignment such as ALIGN, WIDTH, HEIGHT, BORDER, HSPACE, and VSPACE. The NAME parameter allows the object's data to be included in the contents of a form if the <OBJECT> element is found within the <FORM> block.

One example of something that might use the <OBJECT> tag is a Java applet. Here is how the object would be placed within the HTML code.

<OBJECT CLASSID="java:myproduct.myclass" HEIGHT=100 WIDTH=100>
Sorry, this browser cannot execute Java applets.
</OBJECT>

The text within the <OBJECT> block is displayed if the Java applet cannot be executed in the browser. This might be a good place to put a hypertext link to the Web site where one can download a plug-in to handle the type of object you are trying to display. For example, here is an object reference to a PDF document:

<OBJECT DATA="mydoc.pdf" TYPE="application/pdf" HEIGHT=200 WIDTH=100>
<A HREF="http://www.adobe.com">Works best with the Acrobat plug-in</A>
</OBJECT>

Netscape Cookies and Other Netscape Feature Tags

Netscape Navigator 1.1 and later support a feature called cookies. Cookies provide the capability to maintain session information on the client end. Cookies are essentially little bits of information that are stored on the client's local disk and then may be obtained by a CGI program for future use. Cookies provide a better way to store persistent session information for individual clients. Server maintained files can become unruly as more clients are managed. It makes sense that client-specific information be stored at the client. There is a function in the CGI.pm module that provides a simple callable interface for access to these cookies.

The way to create a cookie is by using the cookie() method and then passing the cookie created by that method to the header() method to be incorporated into the document header. You can then access the cookie by just passing the -name argument without the -value argument. Here is an example of how to create a cookie as part of your document:

$cookie=$q->cookie(-name=>'Userid',-value=>'bdeng',-expires=>"+1h",
                   -path=>"/cgi-bin");
print $q->header(-cookie=>$cookie);

This creates a cookie called Userid, which is set to bdeng and expires one hour from the time it is created. This cookie will also be valid only for documents that begin with the partial path of /cgi-bin. Later, when a program in the /cgi-bin path is called, it can query the value of the Userid cookie by calling the cookie method with -name=>'Userid'. At the time of this writing, this capability existed only in the CGI.pm module and not in the CGI modules contained within the LWP modules.
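To see what happens underneath the cookie() method, recall that the browser returns its cookies in the Cookie request header, which the server passes to a CGI program in the HTTP_COOKIE environment variable. The following sketch parses that variable by hand. The helper name parseCookies is ours, and the sketch assumes values are escaped in the URL style (%XX).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Split a Cookie header ("name=value; name2=value2") into a hash.
sub parseCookies {
   my($header)=@_;
   my %cookies;
   return %cookies unless defined $header;
   for my $pair (split /;\s*/,$header) {
      my($name,$value)=split /=/,$pair,2;
      next unless defined $name && defined $value;
      # Undo URL-style %XX escaping in the value.
      $value=~s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
      $cookies{$name}=$value;
   }
   return %cookies;
}

my %cookies=parseCookies($ENV{HTTP_COOKIE});
my $userid=defined $cookies{Userid} ? $cookies{Userid} : "(no Userid cookie)";
print "Userid: $userid\n";
```

In a real script you would normally let the cookie() method do this work. The sketch is meant only to show where the data comes from.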

Another feature of Netscape that other browsers are beginning to adopt is that of frames. You can create a page with multiple frames where each frame refers to a separate URL. You can then target these frames with links from the other frames. This provides the user with a nice uniform way of navigating through a site without having to jump all over the place. Frame support is also available in the CGI.pm module as part of the header() and start_form() methods. These methods both support the parameter -target where you can specify a target frame for displaying the document or form.

Summary

This chapter touched on some of the more advanced issues surrounding CGI programming and HTML markup. If you are interested in learning more about the CGI specification and what it is capable of, a good starting point would be the NCSA Web site at the following URL:

http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

The most important issue you will probably face with respect to CGI programming is that of maintaining a persistent state across transactions. The examples provided in this chapter should help you in understanding solutions to that problem.

The best reference for the evolving HTML specification is at the W3C Web site at the following URL:

http://www.w3.org/pub/WWW/MarkUp/MarkUp.html

You might also like to keep up to date with the specific features provided by Netscape Navigator and Microsoft Internet Explorer. Netscape Navigator specifics can be found at the following URL:

http://developer.netscape.com/library/documentation/htmlguide/index.htm

Microsoft Internet Explorer specifics can be found at this URL:

http://www.microsoft.com/intdev/prog-gen/webpage.htm