7
Simple PleasuresExamples


This chapter will provide several examples of things that all Webmasters might want to incorporate into their Web sites. It will, hopefully, illustrate the power of Perl and prove why Perl has become the de facto standard when it comes to CGI programming. All of these examples involve server-side CGI programs, and they are ubiquitous real-world programs out on the World Wide Web.

You have already learned the basics of Perl5 and how the WWW libraries are used. It is now time to put what you've learned to practical use. You will see how to implement a guest book, a page hit counter, a clickable image map, a text file searching program, and an automatic e-mail notification script.

Guest Book

One of the most common CGI applications out on the Web today is the guest book. Everyone wants to build up a customer contact list, and what better way to do it than on the Web? The paradigm of direct-mail is slowly being replaced by the guest book application on the World Wide Web. One of the best ways to find sales leads is to monitor the people who visit your Web site. This example shows how to implement this application easily with Perl5 and the WWW libraries.

Determining Fields of Information

Before you actually start writing code, it would be a good idea to figure out what information fields you want to store about the visitors. The obvious information comes to mind first: name, title, company, address, phone number(s), fax number, e-mail address. There may be others that are specific to your business. Let's also include these for this example: How did you hear about us? What products are you interested in? Comments/feedback.

Setting Up the Database

A very important piece of work in the initial design is deciding on how you will store the data. For simplicity's sake this example stores the information in a plain text file, with fields separated by a delimiting set of characters. There's no reason why you couldn't store the information into a relational or object database. There are several modules available that address the need of connecting Perl to relational databases. If you are interested in this capability, search the CPAN for ODBC modules.

In this example, you define one row in a file to be a single visitor. Our delimiting characters will be <*>. Therefore, the database would be structured like the following:

Name<*>Title<*>Company<*>Address<*>City<*>State<*>Zip<*>Phone<*>Fax<*>e-mail<*>How<*>What<*>Comments

You'll define a distinct set of values for How and What. For example, How might include Friend, Magazine, Salesperson, Newspaper, and Television. You would define What as an array of products that are in our product line.
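As a rough illustration of this record layout, the following sketch joins a list of field values into one row and splits the row back into fields. The helper names makeRow and parseRow and the sample values are invented for this illustration. Because the delimiter contains the regular expression metacharacter *, the sketch quotes it with \Q...\E before splitting.

```perl
use strict;
use warnings;

# The delimiter used throughout this example.
my $DELIM = '<*>';

# Hypothetical helper: build one database row from a list of field values.
sub makeRow {
   my(@fields) = @_;
   return join($DELIM, @fields);
}

# Hypothetical helper: split one row back into its fields.
# \Q...\E keeps the * in the delimiter from acting as a quantifier;
# the -1 limit preserves trailing empty fields.
sub parseRow {
   my($row) = @_;
   chomp($row);
   return split(/\Q$DELIM\E/, $row, -1);
}

my $row = makeRow('Pat Smith', 'Buyer', 'Acme', '', 'Magazine');
my @fields = parseRow("$row\n");
print scalar(@fields), " fields, company is $fields[2]\n";
```

Keeping empty fields in place matters here, because the position of a value in the row is the only thing that identifies which field it belongs to.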

Displaying the Form

The form you use to obtain this information will use many of the field types described in Chapter 4, "HTML Forms--The Foundation of an Interactive Web." You'll also use a table to make the fields align nicely. Ultimately, you should end up with a form as shown in Figure 7.1.

Figure 7.1. The guest book form as it appears in the browser.

To display this form, you can use a combination of <TABLE> tags and CGI::Form methods, as in the Perl subroutine in Listing 7.1.

Listing 7.1. Perl subroutine for printing the guest book form.

sub guestBookForm {
   my($q)=@_;
   my(@states)=('AK','AL','AR','AZ','CA','CO','CT','DE','FL','GA',
                'HI','IA','ID','IL','IN','KS','KY','LA','MA','MD',
                'ME','MI','MN','MO','MS','MT','NC','ND','NE','NH',
                'NJ','NM','NV','NY','OH','OK','OR','PA','RI','SC',
                'SD','TN','TX','UT','VA','VT','WA','WI','WV','WY');

   my(@hows)=('A Friend','Magazine','Newspaper','Television',
              'Sales Person','Other');
   my(@products)=('Widget','Whatsit','Whatchamacallit','Thingamajig');
   print "<H1>Welcome to Widget World</H1>\n";
   print "<P>Please take a moment to fill out our guest book ";
   print "to help us serve you better.\n";
   print $q->start_multipart_form();
   # Each row of the table holds a label cell and a field cell.
   print "<TABLE>\n";
   print "<TR>\n";
   print "<TD>Name:<TD>\n";
   print $q->textfield(-name=>'Name',-size=>32,-maxlength=>32);
   print "<TR><TD>Title:<TD>\n";
   print $q->textfield(-name=>'Title',-size=>32,-maxlength=>32);
   print "<TR><TD>Company:<TD>\n";
   print $q->textfield(-name=>'Company',-size=>32,-maxlength=>32);
   print "<TR><TD>Address:<TD>\n";
   print $q->textarea(-name=>'Address',-rows=>3,-cols=>32);
   print "<TR><TD>City:<TD>\n";
   print $q->textfield(-name=>'City',-size=>32,-maxlength=>32);
   print "<TR><TD>State:<TD>\n";
   print $q->popup_menu(-name=>'State',-values=>\@states);
   print "<TR><TD>Zip Code:<TD>\n";
   print $q->textfield(-name=>'ZipCode',-size=>10,-maxlength=>10);
   print "<TR><TD>Phone:<TD>\n";
   print $q->textfield(-name=>'Phone',-size=>12,-maxlength=>12);
   print "<TR><TD>Fax:<TD>\n";
   print $q->textfield(-name=>'Fax',-size=>12,-maxlength=>12);
   print "<TR><TD>e-mail:<TD>\n";
   print $q->textfield(-name=>'email',-size=>32,-maxlength=>32);
   print "<TR><TD>How did you hear about us?<BR>";
   print $q->radio_group(-name=>'How',-values=>\@hows,-linebreak=>'true');
   print "<TD>Which product(s) are you interested in?<BR>";
   print $q->scrolling_list(-name=>'What',-values=>\@products,
                            -multiple=>'true',-linebreak=>'true');
   print "</TABLE>\n";
   print "Any Comments?<BR>\n";
   print $q->textarea(-name=>'Comment',-rows=>10,-cols=>60);
   print "<HR>\n";
   print $q->submit(-name=>'Action',-value=>'Submit');
   print " ";
   print $q->reset(-value=>'Start from Scratch');
   print $q->endform();
}

Processing the POST

When the user clicks Submit, the form data is sent back up to the server and the Perl script is called with the request method POST. All the values in the form can be obtained using the param() method. The Perl subroutine in Listing 7.2 gathers all the information into a single record of data and returns the delimited string that makes up a row in the database.

Listing 7.2. Perl subroutine to process the form data.

sub gatherData {
   my($q)=@_;
   my(@orderedList)=();
   my($field);
   # The fields are listed in the order in which they appear in a row.
   foreach $field ('Name','Title','Company','Address','City','State',
                   'ZipCode','Phone','Fax','email','How','What','Comment') {
      # join() keeps multiple selections (as in 'What') in a single field
      # and yields an empty string when nothing was entered, so every
      # row has the same number of fields.
      my($value)=join(', ',$q->param($field));
      # Keep each visitor on one row: turn embedded line breaks into spaces.
      $value=~s/\r?\n/ /g;
      push(@orderedList,$value);
   }
   return join('<*>',@orderedList);
}

Putting It All Together

The main code in the guest book program is now pretty simple. All you need to do is figure out whether you're handling a GET or a POST and do the appropriate thing. An environment variable called REQUEST_METHOD tells you whether it is POST or GET. You obtain the value of this environment variable using the var() method of the inherited CGI::Base class. Assume you keep the database in a file called guests.list. You can now put this whole thing together with just a few lines of code, as shown in Listing 7.3.

Listing 7.3. Main Perl guest book CGI program.

#!/public/bin/perl5

# Standard header stuff
use CGI::Form;
$q = new CGI::Form;
print $q->header();
# No backslash is needed before @ inside single quotes.
print $q->start_html(-title=>'Welcome to Widget World',
                     -author=>'webmaster@widgets.com');

if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {
   # A GET request means the visitor wants the empty form.
   &guestBookForm($q);
} else {
   # A POST request carries the completed form; append it to the database.
   open(DATABASE,">> guests.list") ||
      die "Cannot open guest book list for append!\n";
   print DATABASE &gatherData($q);
   print DATABASE "\n";
   close(DATABASE);
   print "<P>Thank you for taking the time to enter our guest book! ";
   print "We look forward to doing business with you.";
}
print "<HR>\n<P>If you have any problems with this form, please contact ";
print "our <A HREF=mailto:webmaster\@widgets.com>Web master</A>";
print $q->end_html();
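Listing 7.3 appends to guests.list without any locking, so two visitors who submit at the same moment could end up with interleaved rows. One possible way to guard against that is to hold an exclusive lock while writing. The following is only a sketch: the helper name appendRow is invented here, and it uses the three-argument form of open and a lexical filehandle, which assume a newer Perl release than the listings in this chapter were written for.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one row to the guest list while holding
# an exclusive lock on the file.
sub appendRow {
   my($file, $row) = @_;
   open(my $db, '>>', $file) or die "Cannot open $file for append: $!\n";
   flock($db, LOCK_EX)       or die "Cannot lock $file: $!\n";
   print $db "$row\n";
   # Closing the filehandle flushes the row and releases the lock.
   close($db) or die "Cannot close $file: $!\n";
}
```

In the main program, a call such as appendRow("guests.list", &gatherData($q)) would then take the place of the open, print, and close statements.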

Displaying the Complete Guest List

One last thing you might want to do (now that you have a database with some data in it) is to provide a CGI script that displays the guest book in a nice format (see Figure 7.2). This can be done by parsing the database file and generating the HTML markup on-the-fly. The example in Listing 7.4 shows one option on how you can do this.

Listing 7.4. CGI program to display the guest list.

#!/public/bin/perl5
use CGI::Form;
$q = new CGI::Form;
print $q->header();
print $q->start_html(-title=>'Guest List');
print "<H1>Guest List</H1>\n";
print "<TABLE BORDER>\n";
print "<TR><TH>Name<TH>Title<TH>Company<TH>Address<TH>City<TH>State<TH>Zip";
print "<TH>Phone<TH>Fax<TH>e-mail<TH>How they heard<TH>Products of interest";
print "<TH>Comments";
open(DATABASE,"< guests.list") ||
   die "Cannot open guest list database for read!\n";
while(<DATABASE>) {
   # Remove the newline so it does not end up in the last cell.
   chomp;
   print "<TR>\n";
   print "<TD>";
   # Split the row on the <*> delimiter; * must be escaped in the pattern.
   @fields=split(/<\*>/);
   print join('<TD>',@fields);
   print "</TR>\n";
}
close(DATABASE);
print "</TABLE>\n";
print $q->end_html();

Figure 7.2. The table of guests in the guest book.
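The guest list prints whatever visitors typed, so a field containing characters such as < or & would be interpreted by the browser as markup. A cautious variation escapes each field before printing it. The sketch below shows one way to do that; the helper names escapeHtml and rowToHtml are invented for this illustration and are not part of the WWW libraries.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escapeHtml {
   my($text) = @_;
   $text =~ s/&/&amp;/g;     # must come first, or later escapes get re-escaped
   $text =~ s/</&lt;/g;
   $text =~ s/>/&gt;/g;
   $text =~ s/"/&quot;/g;
   return $text;
}

# Hypothetical helper: turn one database row into one table row.
sub rowToHtml {
   my($row) = @_;
   chomp($row);
   my @fields = map { escapeHtml($_) } split(/<\*>/, $row, -1);
   return '<TR><TD>' . join('<TD>', @fields) . "</TR>\n";
}

print rowToHtml("Pat <b>Smith</b><*>Acme & Sons\n");
```

The row must be split on the delimiter before the fields are escaped, because escaping first would also rewrite the < and > that belong to the delimiter.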

Review

What you've seen in this section is how to implement your own guest book and the ease with which it can be done using Perl5 and the WWW libraries. You should also consider adding some nice images to your guest book. You should use everything that HTML offers to provide a nice experience for your visitor.

Hit Counter

Another common use of CGI scripts is the hit counter. A hit counter is used to determine how many times your page has been accessed. Web servers can be configured to perform certain levels of logging. Although this can slow down the server somewhat, it can also provide valuable information to you. Many people want to know how popular their Web sites are, and the hit counter allows a Webmaster to show off to new visitors just exactly how popular a site is.

Introduction

Hit counters come in all different forms. You can have a normal ASCII text counter, or you can get creative and make a graphical counter. A common approach is to use the concept of an odometer on a car. I will show you how to obtain the number in this example and give you one example on how to make the display graphical.

The number of accesses is not the only type of counter you can provide. You can also find out how many times your page was referred by another page and also what type of browsers are accessing your page.

Setting Up the Web Server to Log Access

This section describes how to set up the NCSA httpd Web server for logging access to your Web site. This mechanism also applies to the Apache Web server and a few others that are based on httpd. Windows- and Macintosh-based servers usually provide a GUI front-end to these server options.

The NCSA httpd server has a configuration file called httpd.conf. This is an ASCII text file that is used to configure the server options. Within this configuration file, four variables are used to define where certain logs are kept. ErrorLog defines where the Web server should redirect STDERR; TransferLog defines where the page accesses are logged; AgentLog defines where the client information is logged; and RefererLog defines where the referring pages are logged. ErrorLog isn't something you would worry about in this example, although it is a very useful file to be aware of. This example focuses on TransferLog, AgentLog, and RefererLog.

Parsing the Access Log

To determine how many times a certain page has been visited, you need to scan the TransferLog. First, find out where your log file is kept. Let's assume that the Web server is installed in /usr/etc/httpd and that you have set TransferLog to logs/access_log. The file that you need to look at is /usr/etc/httpd/logs/access_log. Each line in this file pertains to one hit on a single object in your Web site. By using a Perl regular expression, you can search for the page in question and return the number of times that page has been found in the access log. The following line is an example of a record from the TransferLog.

www-proxy - - [06/Dec/1995:13:40:52 -0800] "GET /index.html HTTP/1.0" 200 638

The Perl program in Listing 7.5 opens the access log and uses a regular expression to search for the number of occurrences in the file. The page to search for is passed in as an argument to this function.

Listing 7.5. Perl subroutine to count the number of hits on a given page.

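sub pageCount {
   my($page)=@_;
   # Prepend the GET method to limit the search scope.
   my($srchStr) = "GET $page";
   open(IN,"< /usr/etc/httpd/logs/access_log") ||
      die "Cannot open access log! $!\n";
   # \Q...\E keeps characters such as . in the page name from being
   # treated as regular expression metacharacters.
   my($count) = scalar(grep(/\Q$srchStr\E/, <IN>));
   close(IN);
   return $count;
}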

This code can be included in your CGI script to display the number of hits on a given page. It can also be used outside of the Web site to provide statistics. $page is defined as a document path, relative to the document root of your server.

This routine can also be used in conjunction with some images to display a graphical hit counter. Suppose, for example, you have an image for each digit. You could take the resulting number from this function, treat it as a string, and use each digit value to locate the image associated with the digit, as shown in Listing 7.6.

Listing 7.6. CGI script that displays a graphical hit counter.

my($count)=&pageCount("/index.html");
my($len)=length($count);
my($i);
my($imageStr)="";
# Append one image tag per digit, left to right.
for ($i=0;$i<$len;$i++) {
    my($digit)=substr($count,$i,1);
    $imageStr .= "<IMG SRC=digit$digit.gif>";
}
print "$imageStr\n";

Therefore, if the hit count turns out to be 342, this code results in HTML that looks like

<IMG SRC=digit3.gif><IMG SRC=digit4.gif><IMG SRC=digit2.gif>

which will appear in the browser as in Figure 7.3.

Figure 7.3. Demonstration of a graphical hit counter.

You can also use printf formatting to pad the number with zeros prior to producing the image string. This would give the effect of an odometer look.
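As a small sketch of that idea, the following function pads the count with sprintf before building the image string. The function name odometerHtml and the choice of six digits in the example call are assumptions made for this illustration; the image file names follow the digitN.gif pattern used above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the <IMG> string for a zero-padded counter.
# $width is the number of odometer digits to display.
sub odometerHtml {
   my($count, $width) = @_;
   # %0*d pads the number with leading zeros to $width digits.
   my $padded = sprintf("%0*d", $width, $count);
   return join('', map { "<IMG SRC=digit$_.gif>" } split(//, $padded));
}

print odometerHtml(342, 6), "\n";
```

A count that is already wider than the requested width is not truncated; sprintf simply prints all of its digits.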

In order to keep your Web server from being bogged down by this script on every page access, it might be wise to run this code as a cron job every 15 minutes or so. With the cron job, you can also keep track of the current seek point in the file and avoid having to count hits multiple times. You can then access the counter file that the cron job maintains. If you need up-to-the-second results, you can have the CGI script use the same logic as the cron job and go to a particular seek point in the log to begin the counting. It is also always a good idea for the Web server administrator to rotate the logs every so often. When rotating the logs, it is important to then reset the seek position of your counter file back to zero. To move the current read position of an open file, use the Perl seek function as in the following code:

seek LOGHANDLE, $seekPoint, 0;
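To make the seek-point idea more concrete, here is a minimal sketch of a routine that a cron job might call. It counts only the hits that were logged after a saved byte offset and returns the new offset to store for the next run. The function name countNewHits is invented here, and the rule of starting over when the log is shorter than the saved offset is just one simple way to notice a rotated log.

```perl
use strict;
use warnings;

# Hypothetical helper: count new hits on $page in $log, starting at byte
# offset $seekPoint. Returns the number of new hits and the new offset.
sub countNewHits {
   my($log, $page, $seekPoint) = @_;
   open(my $in, '<', $log) or die "Cannot open $log: $!\n";
   # A log shorter than the saved offset has been rotated: start over.
   $seekPoint = 0 if $seekPoint > (-s $in);
   seek($in, $seekPoint, 0);
   my $hits = 0;
   while (my $line = <$in>) {
      $hits++ if $line =~ /GET \Q$page\E /;
   }
   my $newPoint = tell($in);   # where the next run should begin
   close($in);
   return ($hits, $newPoint);
}
```

The caller would add the returned hit count to the running total in its counter file and save the returned offset alongside it.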

There is a useful package available in the CPAN from Gisle Aas called CounterFile.pm. This module provides an easy programmable interface for maintaining a counter file. By using this module, you can easily lock the file while you increment the counter. A CounterFile object is used as in the following example:

use File::CounterFile;
$counter = new File::CounterFile "page1.count", "0";
# Hold the lock while the counter is incremented.
$counter->lock;
$current_count = $counter->inc;
$counter->unlock;

You can decrement the counter by using the dec method. You can also use the value method to peek at the counter value without incrementing or decrementing the counter.

Parsing the RefererLog

To determine how many times you were referred by a particular Web page, you would scan the RefererLog. This is useful to see where people may be coming from. Here is an example of a record from the RefererLog:

http://vader/sales.html -> /technical.html

You can search this log using the same code as with the access log. The difference would be that you might want to search for the number of times you have been referred from a given page. This can be accomplished by simply modifying the regular expression in the grep statement to contain the -> string as part of the pattern to search, like this:

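my($srchStr) = "$referer ->";
# scalar() makes grep return the number of matches; \Q...\E quotes any
# regular expression metacharacters in the URL.
my($count) = scalar(grep(/\Q$srchStr\E/,@lines));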

Parsing the AgentLog

The AgentLog is useful to find out what kinds of browsers are accessing your Web site. The most popular Web browser out there today is Netscape, which is known in the agent log as Mozilla. Depending on the browser, you may also be able to determine what platform the browser was running on. This can be useful if you want to know how many are Windows users and how many are Macintosh users. Here is an example of a record from the AgentLog.

Mozilla/1.1N (Macintosh; I; 68K)  via proxy gateway  CERN-HTTPD/3.0 libwww/2.17

Unfortunately, not all browsers emit this information the same way. The first part of the line can always be used to determine the browser. For example, Netscape shows Mozilla, and Mosaic shows NCSA Mosaic. It is interesting to note that Microsoft's Internet Explorer also announces itself as Mozilla with a qualifier (compatible MSIE 3.0), so that it is sent the same HTML that Netscape Navigator would be sent. This allows Internet Explorer to display its own Netscape-compatible extension capabilities. The following regular expression might provide useful information for determining the user agent.

if ($line=~/(.*)\((.*)\)(.*)/) {
   my($browser)=$1; # $1 contains the first set of parens
   my($platform)=$2; # $2 contains the 2nd set of parens
   my($proxy)=$3; # and so on.
}
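Building on that pattern, the sketch below tallies browsers by name. The helper name parseAgent is invented for this illustration, the second sample record is made up, and the pattern is a slightly stricter variation of the one above that stops at the first pair of parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: split one AgentLog record into browser and platform.
sub parseAgent {
   my($line) = @_;
   chomp($line);
   if ($line =~ /^([^(]*?)\s*\(([^)]*)\)/) {
      return ($1, $2);
   }
   return ($line, '');   # no parenthesized platform information
}

my %browsers;
my @sample = (
   "Mozilla/1.1N (Macintosh; I; 68K)  via proxy gateway  CERN-HTTPD/3.0 libwww/2.17\n",
   "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)\n",   # made-up record
);
foreach my $line (@sample) {
   my($browser, $platform) = parseAgent($line);
   my($name) = split(/\//, $browser);   # drop the version number
   $browsers{$name}++;
}
foreach my $name (sort keys %browsers) {
   print "$name: $browsers{$name}\n";
}
```

Both sample records are counted under Mozilla, which illustrates the point made above: to tell Internet Explorer from Netscape, you would also have to look inside the parentheses.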

Review

The important thing to remember about this example is that the code used to open and read the log files is very easily reusable. The main difference is in what you're searching for, which boils down to the regular expression. As you've seen in previous chapters, regular expressions can be as simple as ordinary words or they can be extremely complex. This is one of many features that illustrates the simplicity of Perl, yet shows the power of Perl for the more advanced user. You should always explore the capabilities of the Perl regular expression before you attempt to write your own parsing routines.

Clickable Maps

Clickable maps are used to visually navigate within a Web site. Generally, you see the clickable map on the main page or index page. The concept of a clickable map replaces that of an ordinary unordered URL list. For example, suppose you have five separate organizations in your site. You can display links either like this:

<UL>

<LI><A HREF="sales.html">Sales</A>

<LI><A HREF="service.html">Service</A>

<LI><A HREF="support.html">Support</A>

<LI><A HREF="training.html">Training</A>

<LI><A HREF="technical.html">Technical Information</A>

</UL>

Or you can put a pretty picture in place of this list and provide a nicer look and feel. Images are nice, but there are still some issues regarding limited bandwidth. This section explains how to create a clickable image map, as well as some tips on how to reduce your image byte size for those users viewing your page through a slower modem.

Introduction

To create a clickable map, you first need to create an image. Once you have an image, you need to determine which portion of an image will navigate to some specific URL when clicked. This is called your image map. When the user clicks on the image, a point is returned in pixel coordinates, which might correspond to an area defined in your image map. This area is associated with a specific URL that will be retrieved.

Creating an Image

The preferred image formats for the Web are JPEG and GIF. GIFs are nice because they provide transparency, which allows the background color of your page to show through the image. This is especially useful if your image is not an ordinary rectangle. Because of the way in which GIFs are compressed, you will get a better compression ratio if you have large areas of a continuous single color. Images with dithered colors, for example, will not compress well in the GIF format. The JPEG format, which is intended for photographic scans, does a better job at compressing these types of images. It is important that you choose the correct format for your image. As a rule, consider using the GIF format for 4- or 16-color images and the JPEG format for images with a greater color depth. I suggest that you try different approaches with your image until you find the best balance between the smallest byte size and the greatest quality. A product such as Adobe Photoshop is well suited to this type of work.

There are some things you can do to reduce the byte size of your image. The most important thing is to choose the appropriate format for your image, as described in the previous paragraph. Once you have the proper format for your image, you can work on reducing the size by limiting the color depth to the actual number of colors you need. Most images will look fine with 16 or 256 colors. You need to go above that only for photographic scans. You can customize the palette of colors for your image if necessary. Another way to reduce byte size, of course, is to reduce the physical size of the image. If you are using the GIF format, yet another way to reduce image size is by designing the image with large areas of continuous color. As mentioned earlier, this increases the effectiveness of the GIF compression algorithm.

Creating the Map

Now that you have an image, you can map out the locations of your image in pixels. Pixel areas can be defined as rectangles, circles, polygons, and points. The coordinate system originates at the top-left corner of the image. If you do not cover all areas of your image in the map, you can set a default URL value for those pixels not covered. The following example shows a map for an image.

rect http://vader/sales.html 167,32 262,155
rect http://vader/service.html 16,14 40160,68
circle http://vader/support.html 215,215 215,175
poly http://vader/training.html 25,130 100,80 160,140 105,195
poly http://vader/software.html 20,285 90,200 160,285
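To give a feel for what happens when a click arrives, the following sketch tests a point against a rectangle and a circle taken from map entries like those above. It is not the server's actual implementation. The function names are invented, and the sketch assumes that a circle entry gives the center followed by a point on the edge, and that a rect entry gives the top-left and bottom-right corners.

```perl
use strict;
use warnings;

# Is the point ($x,$y) inside the rectangle with corners ($x1,$y1)
# and ($x2,$y2)?
sub inRect {
   my($x, $y, $x1, $y1, $x2, $y2) = @_;
   return $x >= $x1 && $x <= $x2 && $y >= $y1 && $y <= $y2;
}

# Is the point ($x,$y) inside the circle with center ($cx,$cy) that
# passes through the edge point ($ex,$ey)?
sub inCircle {
   my($x, $y, $cx, $cy, $ex, $ey) = @_;
   my $r2 = ($ex-$cx)**2 + ($ey-$cy)**2;    # radius squared
   return ($x-$cx)**2 + ($y-$cy)**2 <= $r2;
}

print "sales\n"   if inRect(200, 100, 167, 32, 262, 155);
print "support\n" if inCircle(215, 200, 215, 215, 215, 175);
```

Comparing squared distances avoids a square root, which is all that is needed to decide whether a point lies inside the circle.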

To set up your image map, use the following HTML code:

<A HREF="http://vader/imagemaps/splash_map.map">

<IMG SRC=/images/splash_map.gif ISMAP>

</A>



NOTE:

This method of referring to an image map with the HREF attribute does not work with all Web servers. You should make sure your Web server supports this before attempting to use it.


You also can display an image from your CGI script and query the pixel coordinate that has been clicked by the user. Listing 7.7 shows how to do this, using CGI::Form. Figure 7.4 shows how the image map appears in the browser.

Listing 7.7. Clickable image CGI example.

#!/public/bin/perl5
use CGI::Form;
use CGI::ImageMap qw(action_map map_untaint);
$q = new CGI::Form;
print $q->header;
print $q->start_html("Clickable Map Demonstration");
print "<H1>Clickable Map Demonstration</H1>\n";
print $q->startform;
print $q->image_button('picture', "/images/cool_image.gif");
print $q->endform;
print "<HR>\n";

if ($q->param) {
   # The browser sends the click position as picture.x and picture.y.
   ($x,$y) = ($q->param('picture.x'),$q->param('picture.y'));
   print "<P>The user clicked location <EM>($x,$y)</EM>\n";
   # Read the map into an array so that the same variable is the one
   # passed to action_map().
   my @map = $q->param('splash_map.map');
   $action = action_map($x,$y,@map);
   print "<P>The corresponding action is <EM>$action</EM>\n";
}
print $q->end_html;

Figure 7.4. The image map as it appears in the browser.

This returns the coordinate clicked by the user. You can imagine the possibilities of visual navigation using clickable maps. The more visual your page is, the easier it should be for the average user to navigate through it. You can also sometimes cross language barriers with this approach. The limitation today, of course, is the bandwidth at which images are transferred through the wire. Given the advances in network technology, this problem should soon be alleviated.

Review

Clickable maps are a great way to present a friendly navigation model. However, you have to be considerate of the average user when creating your images. Even though you might have a 100Mbps network running at your location, you should keep in mind that most users today are accessing your page at a rate of 28.8Kbps; thus, the larger the image, the more frustrating your page may be. When ISDN or cable modems become the standard means of connection, you will have more freedom and flexibility when it comes to image size.

Text File Search

The earlier example of the hit counter showed you how to scan a single file for occurrences of a string. This text file search example shows you how to scan your entire Web site for occurrences of a string. You may have visited the popular Web search sites available, such as Yahoo!, Excite, Lycos, and InfoSeek. These search engines work with large amounts of indexed data from Web sites around the world. In some cases, these search engines are implemented in Perl. As you have already seen in previous chapters, Perl is a perfect language for text manipulation and searching. It is very efficient in processing files, which, combined with its powerful regular expression capability, makes it a perfect language for this type of work.

Introduction

This example shows you how to provide a search engine into your own Web site. The front end is a simple form with a text field, a Submit button, and a Reset button. The back end recurses through your Web site's directories, scanning the HTML files for the existence of the specified string. The resulting page will contain either a message that no items have been found, or it will display a list of navigable links to those pages that match the search criteria.

Defining the Search Scope

The form for this example is a simple one. Using CGI::Form, Listing 7.8 contains the code.

Listing 7.8. Subroutine to return a search form.

sub searchForm {
   my($q)=@_;
   print $q->header;
   print $q->start_html("Search My Site");
   print "<H1>Search My Site</H1>\n<HR>\n";
   print "<P>Please enter one or more words to search for";
   print " and click 'Search'<BR>\n";
   print $q->start_multipart_form();
   print $q->textfield(-name=>'SearchString',-maxlength=>100,-size=>40);
   print "<BR><BR><BR>";
   print $q->submit(-name=>'Action',-value=>'Search');
   print " ";
   print $q->reset();
   print $q->endform();
   print $q->end_html();
}

This form appears in your browser as shown in Figure 7.5.

Figure 7.5. The search form as it appears in your browser.

When the user clicks Search, the real work begins. In this example, you will search the entire site, but depending on the size of your site, you might want to limit the search scope by adding another field to your form. This can be accomplished by using a pull-down menu or a group of radio buttons.

The Power of Perl in Text File Processing

Now that you have the front end, it's time to write the search engine itself. Use the File::Find library, available in the Perl distribution. This library does all of the directory scanning for you, leaving you to simply implement the scanning algorithm. This scanning algorithm searches for each word, keeping a count of occurrences of each word. When it comes time to display the search results, you can display them in the order of occurrences, which will give the user the most likely page they are looking for right at the top. This concept should not be entirely new to you if you have visited one of the popular search sites on the Web.

Assuming you have extracted the list of words to search for, you'll simply write a function that accepts a word list as an argument, along with the file to scan. Let's leave it up to the File::Find module to pass you the files, as shown in Listing 7.9.

Listing 7.9. Subroutine to search for a list of words.

sub wanted {
   # This line gets rid of all Unix-type hidden files/directories.
   return if $File::Find::name=~/\/\./;
   # Only look at HTML files.
   if ($File::Find::name=~/^.*\.html$/) {
      if (!open(IN, "< $File::Find::name")) {
         # This error message will appear in your error_log file.
         warn "Cannot open file: $File::Find::name...$!\n";
         return;
      }
      my(@lines)=<IN>;
      close(IN);
      my($count)=0;
      foreach (@words) {
         # Quote any regular expression metacharacters the user typed,
         # and make the search case-insensitive.
         my($word)=quotemeta($_);
         $count+=grep(/$word/i,@lines);
      }
      if ($count>0) {
         # Add this page to the list of found items.
         push(@foundList,"$File::Find::name");
         # Store the hit count in an associative array
         # with the page as the key.
         $hitCounts{"$File::Find::name"}=$count;
      }
   }
}



NOTE:

If you are running on a UNIX system where the egrep command is available, you should consider replacing the majority of this Perl code with a call to egrep, as follows:

@hitList=`egrep -ci '(word1|word2|word3)' $File::Find::name`;

This would be more efficient in terms of memory requirements and processor use.


File::Find contains a function called finddepth(), which takes at least two arguments: a filter function and one or more directory names to recurse. The filter function you are using is the one above called wanted(). finddepth() calls wanted() for each file that it comes across. The filename is contained in the variable $_. The directory that contains the file is held in the variable $File::Find::dir. You have used the variable $File::Find::name, which is the combination of the other two variables, with a path separator stuck in between. By using the functionality provided by File::Find, all you need to do is add in your search filter and not worry about recursion and figuring out what's a file and what's a directory.
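The following sketch shows those three variables at work outside of the search engine. It simply collects the full path of every .html file under a directory. The helper name listHtmlFiles is invented for this illustration, and the anonymous subroutine stands in for wanted().

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the full path of every .html file
# under the directory $root.
sub listHtmlFiles {
   my($root) = @_;
   my @found;
   finddepth(sub {
      # $_ is the bare filename, $File::Find::dir is the directory it
      # lives in, and $File::Find::name is the two joined together.
      return unless -f $_ && /\.html$/;
      push(@found, $File::Find::name);
   }, $root);
   return sort @found;
}
```

A call such as listHtmlFiles("/user/bdeng/Web/docs") would return the same set of files that the search engine in this section scans, apart from its extra rule that skips hidden files.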

The code used to initiate the search looks like this:

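# Splitting on ' ' ignores leading blanks and treats a run of blanks
# as one separator, so no empty words end up in the list.
@words=split(' ',$q->param('SearchString'));
if (@words>0) {
   finddepth(\&wanted,"/user/bdeng/Web/docs");
}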

It's a good idea to check that the @words array contains at least one value. There is no need to make finddepth() do all that work if you have nothing to search for. In that case, you might instead emit some HTML that politely reminds the user to specify something to search for.
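How the search string is split affects this check. A pattern of a single space, / /, produces an empty word whenever the user types two spaces in a row, and an empty word matches every line of every file. The special pattern ' ' avoids that. The sketch below wraps the split in a small function; the name searchWords is invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the raw search string into a list of words.
sub searchWords {
   my($raw) = @_;
   $raw = '' unless defined $raw;
   # split(' ') skips leading whitespace and treats any run of
   # whitespace as a single separator.
   return split(' ', $raw);
}

my @words = searchWords("  perl   cgi ");
print scalar(@words), " words\n";
```

An empty or missing search string yields an empty list, which is exactly the case the check on @words is meant to catch.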

Displaying the Results

All you need to do now is display the results in a meaningful format. What you're aiming for is an ordered list of likely candidates for what the user is trying to find. You have an array of pages and an associative array of hit counts. What you need first is a sort routine to rearrange the array in the correct order. The following sort routine should work just fine:

@foundList = sort sortByHitCount @foundList;

sub sortByHitCount {
   return $hitCounts{$b} - $hitCounts{$a};
}

The first line in this code is the call to sort(), using the subroutine sortByHitCount(). The $a and $b variables are package global variables that sort() uses to tell the sorting routine which items to compare. The items that you're comparing in this case are filenames that are keys into the hitCounts associative array. Returning a negative value indicates that $a is less than $b, and returning a positive value indicates $a is greater than $b. Returning 0 indicates that the two values are equal. What you are actually comparing in sortByHitCount() is the hit count of each page.
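As a self-contained illustration of this kind of comparison routine, the sketch below sorts a few made-up pages by hit count, highest first. It uses the numeric comparison operator <=> in place of subtraction and breaks ties alphabetically, which is an addition for this sketch and is not part of the routine above. The page names and counts are invented.

```perl
use strict;
use warnings;

# Sample data invented for this sketch.
my %hitCounts = ('/a.html' => 2, '/b.html' => 7, '/c.html' => 7, '/d.html' => 1);

# Highest hit count first; pages with equal counts are ordered by name.
sub byHitCount {
   return $hitCounts{$b} <=> $hitCounts{$a} || $a cmp $b;
}

my @ranked = sort byHitCount keys %hitCounts;
print join(' ', @ranked), "\n";
```

Putting $b on the left of the comparison is what reverses the order, so that the page with the most hits comes first.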



NOTE:

Remember that in the previous example, the %hitCounts associative array must be within the scope of the sortByHitCount function. It would be a very difficult problem to debug if you decided to move sortByHitCount into a different package scope one day.


Now you have a sorted list of files that need converting to URLs. To do this, you simply chop off the first n characters, where n is the length of the $serverRoot variable. This can be done with the following line:

$url=substr($file,length($serverRoot));

You can now format the string as a link by adding the <A> tag around the $url. The final main code appears in Listing 7.10.

Listing 7.10. A simple CGI searching program.

#!/public/bin/perl5

use CGI::Form;

use File::Find;



# Variables for storing the search criteria/results.

@words;

@foundList;

%hitCounts;



$q = new CGI::Form;

$serverRoot="/user/bdeng/Web/docs";



if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {

   &searchForm($q);

} else {

   @words=split(' ',$q->param('SearchString'));

   print $q->header;

   print $q->start_html("Search Results");

   print "<H1>Search Results</H1>\n<HR>\n";

   if (@words>0) {

      finddepth(\&wanted,$serverRoot);

      @foundList = sort sortByHitCount @foundList;

      if (@foundList>0) {

         foreach $file (@foundList) {

            $item=substr($file,length($serverRoot));

            print "<A HREF=\"$item\">$item</A> has ";

            print "$hitCounts{$file} occurrences.<BR>\n";

         }

      } else {

         print "<P>Sorry, I didn't find anything based on your criteria. ";

      }

   } else {

       &searchForm($q);

       print "<HR><P>Please enter something to search for. ";

   }

   print $q->end_html();

}

This example is provided simply to show you the capability of Perl for text processing. If you have a very large Web site with a lot of files to search through, it would make much more sense to run an index generator on your data, perhaps on a nightly basis, and then use that index from your CGI script. The script in Listing 7.10 can easily be modified to search an index rather than your entire Web site. A good indexing package called Isearch can be found at http://cnidr.org/isearch.html.
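To give a rough feel for what searching an index means, here is a minimal sketch of a word index built in memory. It is not how Isearch works; the function names buildIndex and searchIndex and the sample pages are assumptions for illustration, and it counts whole words only.

```perl
use strict;
use warnings;

# Build a word index: $index{word}{page} = number of occurrences.
# The page text passed in is hypothetical sample data standing in
# for the contents of files read from disk.
sub buildIndex {
    my (%pages) = @_;
    my %index;
    foreach my $page (keys %pages) {
        foreach my $word (split(/\W+/, lc $pages{$page})) {
            next if $word eq '';
            $index{$word}{$page}++;
        }
    }
    return \%index;
}

# Look up each search word and add up the hits per page.
sub searchIndex {
    my ($index, @words) = @_;
    my %hitCounts;
    foreach my $word (map { lc($_) } @words) {
        my $pagesForWord = $index->{$word} or next;
        $hitCounts{$_} += $pagesForWord->{$_} foreach keys %$pagesForWord;
    }
    return %hitCounts;
}

my $index = buildIndex(
    '/a.html' => 'Perl makes CGI easy. Perl is fun.',
    '/b.html' => 'CGI forms and counters',
);
my %hitCounts = searchIndex($index, 'perl', 'cgi');
foreach my $page (sort { $hitCounts{$b} <=> $hitCounts{$a} } keys %hitCounts) {
    print "$page: $hitCounts{$page}\n";
}
```

In a real setup the nightly job would write the index to disk and the CGI script would only read it, so each request costs a few lookups rather than a walk over the whole directory tree.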

Review

This example is pretty basic. You can certainly take this and extend it to suit your needs. One important concept in this example is that you should utilize existing libraries wherever possible. Some things, such as case-sensitivity, scope limitation, and filename filters, can be made optional by adding to the search form. This example was limited to a case-insensitive search on all HTML files within the root directory tree of the server. You can also consider extracting the titles of the Web pages that you search by scanning for the <TITLE> tag, because you've already read the entire file into an array. This can be stored in another associative array and displayed in the results page as the label of your link.
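The title extraction mentioned above might look something like the following sketch. The name extractTitle and the sample lines are assumptions for illustration; the array stands in for the file contents your search routine has already read.

```perl
use strict;
use warnings;

# Return the text of the first <TITLE> element found in the given lines,
# or undef if there is none. Matching is case-insensitive and the title
# may span more than one line.
sub extractTitle {
    my (@lines) = @_;
    my $text = join('', @lines);
    if ($text =~ m{<title[^>]*>\s*(.*?)\s*</title>}is) {
        my $title = $1;
        $title =~ s/\s+/ /g;    # collapse line breaks inside the title
        return $title;
    }
    return undef;
}

my @lines = ("<HTML><HEAD>\n", "<TITLE>Simple\n", "Pleasures</TITLE>\n", "</HEAD>\n");
my $title = extractTitle(@lines);
print "$title\n";    # prints Simple Pleasures
```

You could store the result in an associative array keyed by filename, next to %hitCounts, and fall back to the URL when a page has no title.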

Again, it might be wise to look into existing indexing programs for a more efficient searching capability. This is especially true if you are managing a large site with a lot of large HTML files. You might also have other types of files in your site, such as PDF files, for which you can also create indexes that provide optimized searches.

E-Mail Notification

The next and last example shows you how you can set up a Web page to automatically send e-mail. What this example does is keep a database of license plate numbers and their owners' e-mail addresses. If a person sees a car with the lights left on, he or she can bring up this Web page and send an automatic e-mail notification to the owner of that car.

Introduction

The first thing you need is a database of license plate numbers. You can use the same format as in the guest book example. The only difference would be the fields that are stored in the database. In addition to the license plate number, you should store the owner's name and e-mail address, as well as the color, make, and model of the car. The database will have the following format:

license<*>owner<*>email<*>color<*>make<*>model
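As a quick illustration of this format, the following sketch splits one record into an associative array. The record contents and the name parseRecord are made up for the example.

```perl
use strict;
use warnings;

# Field names in the order they appear in a record.
my @FIELDS = qw(license owner email color make model);

# Split one database line into a hash keyed by field name.
sub parseRecord {
    my ($line) = @_;
    chomp $line;
    my %record;
    # The * must be escaped in the pattern; an unescaped <*> would
    # mean "zero or more < characters followed by >".
    @record{@FIELDS} = split(/<\*>/, $line, -1);
    return %record;
}

# Hypothetical sample record.
my %car = parseRecord("ABC1234<*>Pat Doe<*>pat\@example.com<*>red<*>Honda<*>Civic\n");
print "$car{owner} drives a $car{color} $car{make} $car{model}\n";
```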

Displaying the Form

The next thing you need is the form that allows users to issue the e-mail notification. This form is another simple one. You'll use a field to indicate the license plate number, another field for the user to enter his or her name, and two submission options. The first submit option is to send e-mail to the owner. The other submit option is to display only the license plate information. The form code looks like Listing 7.11.

Listing 7.11. Subroutine to print a license plate form.

sub licensePlateForm {

   my($q)=@_;

   print "<P>Please enter a license plate number";

   print " and select one of the options.\n";

   print "<P><B>Notify</B> will send e-mail to the owner.\n";

   print "<P><B>Query</B> will display the information about";

   print " this license plate.\n";

   print $q->start_multipart_form();

   print "<P>License plate: ";

   print $q->textfield(-name=>'LicensePlate',-maxlength=>7,-size=>7);

   print "<P>Your name: ";

   print $q->textfield(-name=>'FindersName',-maxlength=>32,-size=>32);

   print "<BR><BR><BR>";

   print $q->submit(-name=>'Action',-value=>'Query');

   print " ";

   print $q->submit(-name=>'Action',-value=>'Notify');

   print " ";

   print $q->reset();

   print $q->endform();

}

This form is shown in Figure 7.6.

Figure 7.6. The license plate search form.

Querying the License Plate Database

You need a few other functions for this example. The first is the one used to look up the license plate in the database; another is to print the information about that license plate; and the third is the one that sends e-mail notification to the owner.

The first function opens the database file and scans it for a license plate match (see Listing 7.12).

Listing 7.12. Subroutine to search for a specific license plate in the database.

sub findLicensePlate {

   my($licensePlate)=@_;

   my(%info);

   if (open(DATABASE, "< $DATABASEFILE")) {

      # \Q...\E keeps any pattern characters in the user's input literal.

      $srchStr="^(?i)\Q$licensePlate\E\\<\\*\\>";

      while (<DATABASE>) {

         if (/$srchStr/) {

            chomp;

            ($info{'lic'},$info{'name'},$info{'email'},

             $info{'color'},$info{'make'},$info{'model'})=split(/<\*>/);

            last;

         }

      }

      close(DATABASE);

   }

   return %info;

}

The next function prints the information about a given license plate number, as shown in Listing 7.13.

Listing 7.13. Subroutine to print the information found about the license plate.



sub printInfo {

   my($licensePlate)=@_;

   my(%info)=&findLicensePlate($licensePlate);

   if (defined($info{'name'})) {

      print "<P><B>Owner</B> is: $info{'name'}<BR>\n";

      print "<P><B>E-mail Address</B> is: $info{'email'}<BR>\n";

      print "<P><B>Color</B> is: $info{'color'}<BR>\n";

      print "<P><B>Make</B> is: $info{'make'}<BR>\n";

      print "<P><B>Model</B> is: $info{'model'}<BR>\n";

   } else {

      print "<P>Sorry, that license plate number was not found";

      print " in our database<BR>\n";

   }

}

Formatting the Mail Text

The last thing you need is the function that sends the e-mail. There is an existing Perl module for sending mail, called Mail::Send. You can use this library to make your script simpler. The Mail::Send module provides methods to set the destination address, subject, and so on, and then open a file handle to which you can write the body of the mail message. Given that this module does all this for you, all you need to do is decide on the wording for the message. The function used to send the e-mail notification looks like Listing 7.14.

Listing 7.14. Subroutine for sending e-mail notification.

use Mail::Send;



sub notifyOwner {

   my($licensePlate,$notifier)=@_;

   my(%info)=&findLicensePlate($licensePlate);

   if (defined($info{'email'})) {

      $msg = new Mail::Send;

      $msg->to($info{'email'});

      $msg->subject("Hey! Your lights are on! ");

      $fh = $msg->open();

      print $fh "$info{'name'},\n   Are you the owner of that ";

      print $fh "$info{'color'} $info{'make'} $info{'model'}?\n";

      print $fh "If so, your lights are on.\n";

      print $fh "Sincerely yours,\n$notifier\n";

      $fh->close();

      print "$info{'name'} has been notified! ";

   } else {

      print "<P>Sorry, that license plate number was not ";

      print "found in our database<BR>\n";

   }

}
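One possible refinement, which is not part of Listing 7.14, is to build the message text in its own function so that you can check the wording without sending any mail. The sketch below assumes a hypothetical helper named buildMessage and made-up owner data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the body of the notification as one string.
sub buildMessage {
    my ($notifier, %info) = @_;
    return "$info{name},\n"
         . "   Are you the owner of that "
         . "$info{color} $info{make} $info{model}?\n"
         . "If so, your lights are on.\n"
         . "Sincerely yours,\n$notifier\n";
}

# Hypothetical sample data.
my %info = (name => 'Pat Doe', color => 'red', make => 'Honda', model => 'Civic');
print buildMessage('A. Neighbor', %info);
```

Inside notifyOwner() you would then write the returned string to the file handle in a single print statement.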

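You are now ready to write the main line code. In the case of the GET request method, display the form; otherwise, process either the Query or the Notify action (see Listing 7.15). The subroutines from Listings 7.11 through 7.14 must be included in the same script, and $DATABASEFILE, which findLicensePlate() reads, must be set to the path of your database file.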

Listing 7.15. The license plate notification CGI program.

#!/public/bin/perl5

use CGI::Form;



$q = new CGI::Form;



print $q->header();

print $q->start_html(-title=>'Lights Are On!');

print "<H1>Lights Are On!</H1><HR>\n";

if ($q->cgi->var('REQUEST_METHOD') eq 'GET') {

   &licensePlateForm($q);

} else {

   my($action)=$q->param('Action');

   if ($action eq 'Query') {

      &printInfo($q->param('LicensePlate'));

   } elsif ($action eq 'Notify') {

      &notifyOwner($q->param('LicensePlate'),$q->param('FindersName'));

   }

}

print $q->end_html();

Review

This is another simple example that shows you how to send e-mail to someone from a CGI script. There are many other useful applications for this type of script, such as an automated request for information.

Summary

I hope you have obtained some valuable tips from this chapter on how to implement some basic tasks using Perl as your CGI implementation language. I will attempt to provide some more complex examples in later chapters, which will build on this foundation. I also hope that you find some of these algorithms useful and, more importantly, reusable, and I encourage you to share your own ideas with the rest of the Perl community.