Chapter 13

Field Verification



Forms are useful for a variety of tasks: joining a mailing list, placing an order, or completing a survey, for example. Only the creativity of the Web designer limits the list. By design, virtually anything can be entered into the fields of a form, but it's the job of the server software to process it properly.

There are times, however, when you need to make certain that a form was filled out correctly. For example, if you're releasing a new version of your company's software product to the Internet for public testing (something that's very popular these days), you probably want every potential downloader to fill out a form so you can keep track of who's playing with your program. In cases like this, ensuring that the user answers all your required questions is vital. You might also want to know, for example, whether a downloader gave a fake e-mail address.

Server-Side Validation

Validating form information on the server side involves checking the values that the Web server passes to the CGI program. The simplest level of validation is making certain that the necessary fields were filled out and, if so, whether they were filled out correctly.

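Before you can test any fields, you need to load the field values into memory. When a form is submitted with METHOD=POST, the field data arrives on the Perl script's standard input, and the CONTENT_LENGTH environment variable tells the script how many bytes to read. (With METHOD=GET, the same data arrives in the QUERY_STRING environment variable instead.) Individual field-name/field-value pairs are separated by ampersands. The first step in the cracking process is to read CONTENT_LENGTH bytes from standard input and split the result into individual name/value pairs: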

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);

Once the fields have been separated into individual elements of the @pairs array, the next step builds a hash, %contents, that has one entry for each field. This is demonstrated by the code fragment in listing 13.1.


Listing 13.1  Creating a Named List of Form Fields
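foreach $pair (@pairs) {
   # Split on the first "=" only; the value itself may contain "="
   ($name, $value) = split(/=/, $pair, 2);
   # Undo the URL encoding: "+" stands for a space...
   $value =~ tr/+/ /;
   # ...and %XX stands for the character with hex code XX
   $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
   # Store the decoded value under the field's name
   $contents{$name} = $value;
}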
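The cracking rules in listing 13.1 are easy to experiment with outside a CGI environment. The following is a minimal sketch of the same decoding logic, written in JavaScript (the chapter's client-side language) so it can be run on its own; the function names decodeToken() and parseFormData() are hypothetical. Unlike listing 13.1, this sketch also decodes field names and skips empty pairs. Like listing 13.1, it turns each %XX escape into a single character code rather than decoding multibyte characters.

```javascript
// Decode one URL-encoded token the way listing 13.1 does:
// "+" becomes a space, and each %XX becomes the character with hex code XX.
// "+" is handled first, so an encoded plus sign (%2B) survives as "+".
function decodeToken(token) {
  return token
    .replace(/\+/g, " ")
    .replace(/%([0-9a-fA-F]{2})/g, (_, hex) =>
      String.fromCharCode(parseInt(hex, 16)));
}

// Crack "NAME=value&NAME=value..." into an object keyed by field name,
// the JavaScript counterpart of the Perl %contents hash.
function parseFormData(buffer) {
  // Object.create(null) avoids inherited keys such as "toString".
  const contents = Object.create(null);
  for (const pair of buffer.split("&")) {
    if (pair === "") continue; // ignore "a=1&&b=2" or a trailing "&"
    // Split on the first "=" only, like split(/=/, $pair, 2) in Perl.
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    contents[decodeToken(name)] = decodeToken(value);
  }
  return contents;
}

const fields = parseFormData("REALNAME=Joe+Smith&EMAIL=joe%40example.com&COMMENT=");
console.log(fields.REALNAME); // prints Joe Smith
console.log(fields.EMAIL);    // prints joe@example.com
```

Note that the blank COMMENT field still produces an entry holding an empty string, which is exactly the situation the emptiness tests in the next section look for.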

NOTE
The %contents hash is referred to as a named list (Perl also calls it an associative array), because individual elements within it are referenced by a name rather than by a number. In the case of %contents, the name corresponds to a name defined in the NAME attribute of a field from the form. For example, if one of the fields in the form was defined as follows:
<INPUT TYPE=TEXT NAME=REALNAME ...>
once %contents is filled, there is a related element accessible like this:
$contents{'REALNAME'}

Empty Fields

Once the fields have been cracked, meaning they've been converted from one long string of text into a list, checking the required ones for emptiness (that is, for fields that contain no data) is also straightforward. A text field left blank is still sent to the server, but only as its field name followed by an equals sign with nothing after it. The cracking loop above then assigns an empty string to $value, putting an empty element into the list for that field. (Unchecked check boxes and radio buttons aren't sent at all, so they have no element in the list.) In Perl, you can check whether a variable contains something simply by using it as a condition:

if ($variable) {
   # variable has data
} else {
   # variable has no data
}

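This means that you can easily test whether a particular element is empty (see listing 13.2). Be aware that Perl also treats the string "0" as false, so a field containing only a zero fails this test; if 0 is a legitimate answer, check the value's length() instead. Testing several fields this way, however, can be a bit cumbersome.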


Listing 13.2  Simple Emptiness Testing
if(! $contents{'EMAIL'}) {
   # Warn user of the incomplete form
   exit;
}
...

A cleaner method is to define a subroutine that generates an HTML page telling the user to complete the form, and then use Perl's unless modifier to call that subroutine whenever a required field has no value (as shown in listing 13.3).


Listing 13.3  Centralizing Error Handling in a Subroutine
print "Content-type: text/html\n\n";
...
sub emptyFields {
   print "<HTML><HEAD><TITLE>Form Incomplete</TITLE></HEAD>";
   print "<BODY>";
   print "I'm sorry, the form wasn't completely filled out.";
   print "<P>Please return to the form and try again.";
   print "</BODY></HTML>";

   exit;
}
...
&emptyFields unless $contents{'REALNAME'};
&emptyFields unless $contents{'EMAIL'};
...
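Listing 13.3 stops at the first empty field it finds. If you'd rather tell the user about every missing field at once, you can list the required field names in one place and collect all the empty ones. The sketch below shows that idea in JavaScript so it can be run on its own; missingFields() is a hypothetical helper, and a Perl version would loop over a list of field names in the same way.

```javascript
// Return the names of required fields that are absent or empty, in the
// order they were listed, so every problem can be reported at once.
function missingFields(contents, required) {
  return required.filter(
    // == null catches both undefined (never sent) and null
    (name) => contents[name] == null || contents[name] === ""
  );
}

const submitted = { REALNAME: "Joe Smith", EMAIL: "" };
console.log(missingFields(submitted, ["REALNAME", "EMAIL", "COMMENT"]));
// prints [ 'EMAIL', 'COMMENT' ]
```

The resulting list can then be used to build a single error page that names each field the user skipped.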

Invalid Data

Sometimes it's also necessary (where possible) to test the validity of field data. If your form has a field for an e-mail address, you can make certain assumptions about what a valid e-mail address contains: for example, an @ sign between the username and the domain. Using the same unless modifier as listing 13.3, listing 13.4 demonstrates a quick test of an e-mail address field.


Listing 13.4  Testing Field Validity
print "Content-type: text/html\n\n";
...
sub InvalidEmail {
   print "<HTML><HEAD><TITLE>Invalid Email Address</TITLE></HEAD>";
   print "<BODY>";
   print "I'm sorry, your email address wasn't valid.";
   print "<P>Please return to the form and try again.";
   print "</BODY></HTML>";

   exit;
}
...
&InvalidEmail unless ($contents{'EMAIL'} =~ /@/);
...

Ping!

The test in listing 13.4, however, doesn't prevent a user from making up a totally fake name and domain; as long as he or she includes an @ sign in the string, the test passes. A slightly more rigorous test would involve ensuring that the domain given belongs to an actual server. One way of doing this is through the UNIX program ping, which checks whether the specified host responds. (Keep in mind that this test is imperfect: mail for a domain is often handled by a different machine, and some hosts don't answer ping at all, so a valid address can fail.)

From the UNIX command prompt, ping is run as follows:

ping hostname

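where hostname is the domain name you wish to ping (as in visi.com). The exact behavior varies between UNIX versions: some (Linux and BSD, for example) keep pinging until interrupted unless you limit them with a count option such as -c 1. On a system that stops after the first reply, such as Solaris, a host that is online and answers makes ping display: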

hostname is alive

On the other hand, if the host isn't valid, that is, if the host name can't be resolved through DNS, ping displays:

ping: unknown host hostname

Testing the output of ping by searching the displayed string for the word "alive," for example, is one way to validate a host, but there is an easier way. To run an external program such as ping from inside Perl, you use the system() function, which returns the exit status of the UNIX command specified (the exit code itself sits in the upper eight bits of this value, but zero still means success):

$exitCode = system('ping hostname');

The exit code is a numeric value, and most UNIX programs follow the same convention:

If the command is successful, the code returned is zero (0).
If the command generates an error, the code returned is something other than zero.
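Perl's system() hands back the same wait status it stores in $?, so the exit code has to be unpacked from it: the code is in the upper eight bits, the low seven bits hold the number of any signal that stopped the program, and bit 7 records a core dump. system() returns -1 if the program couldn't be started at all. The arithmetic is sketched below in JavaScript; decodeWaitStatus() is a hypothetical helper, not part of Perl (in Perl you would write $? >> 8 and $? & 127 directly).

```javascript
// Split a UNIX wait status (the value Perl's system() returns and stores
// in $?) into its parts. A status of 0 means the command exited with code 0.
function decodeWaitStatus(status) {
  if (status === -1) {
    return { started: false }; // the program could not be run at all
  }
  return {
    started: true,
    exitCode: status >> 8,            // upper byte: the program's exit code
    signal: status & 127,             // low 7 bits: signal that stopped it
    coreDumped: (status & 128) !== 0, // bit 7: a core file was written
  };
}

console.log(decodeWaitStatus(0).exitCode);   // prints 0
console.log(decodeWaitStatus(256).exitCode); // prints 1
```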

The exact number returned in the event of an error varies from program to program (and each number implies a different type of error), but for the purposes of testing host validity, you only need to know whether system() returned zero. Listing 13.5 demonstrates a Perl fragment that extracts the host name from the e-mail address and tries to ping that host. If the ping is unsuccessful, the InvalidEmail subroutine is called.


Listing 13.5  Pinging a Host
...
($username, $host) = split('\@', $contents{'EMAIL'});
$host =~ s/([;<>\*\|'&\$!?#\(\)\[\]\{\}:"\\`^\n])/\\$1/g;

&InvalidEmail unless (system("ping $host > /dev/null") == 0);
...

Listing 13.5 also demonstrates three other points worth noting. First, because ping normally writes its results to standard output, the command redirects that output to /dev/null, which throws it away so that it doesn't appear within the user's browser. Second, the command string is enclosed in double quotes so that Perl substitutes the value of $host into it; inside single quotes, Perl would pass the literal text $host to the shell instead. Third, consider the second line:

$host =~ s/([;<>\*\|'&\$!?#\(\)\[\]\{\}:"\\`^\n])/\\$1/g;

This takes the extracted host name and escapes any special characters (called metacharacters) that may be embedded within it by prefacing each one with a backslash (\). Besides obvious candidates such as the semicolon, the list includes the backtick (`), which makes the shell run a command and insert its output; the caret (^), which older Bourne shells treat as a pipe; and the newline, which separates commands just as a semicolon does. Escaping is necessary because the value in $host becomes part of a command string that is executed by system(), creating a possible security risk. For example, if you didn't escape metacharacters, and the user typed the following as an "e-mail address":

sjwalter@visi.com ; /bin/rm -rf /

The system() command that would be generated would look like:

$exitCode = system('ping visi.com ; /bin/rm -rf /');

You recognize the semicolon (;) as Perl's end-of-statement delimiter, and it has the same effect in the UNIX shell. Like Perl, the shell also supports "statement stacking," where multiple statements can be placed on the same command line separated by semicolons. In effect, the user has just asked your system to ping visi.com, then run rm and try to erase your hard disk!

You can easily prevent this kind of hacking by rendering any metacharacters harmless, which is done by escaping them. To do this, place a backslash in front of each metacharacter. This instructs the shell to treat the character as a literal, that is, as a plain old character, and to ignore any special effect it may have. Using the previous system() command as an example, escaping the line would generate:

$exitCode = system('ping visi.com \; /bin/rm -rf /');

which keeps the shell from running any other program: the escaped semicolon and everything after it are simply passed to ping as extra arguments. ping will probably report an error, depending on your system, but at that point the attempt to invade your system has been successfully thwarted.
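The escaping step itself is a single substitution, so it is easy to test in isolation. The following is a minimal sketch of it in JavaScript. The character set covers the characters escaped in listing 13.5 and makes sure to include the backtick, the caret, and the newline. escapeShellMetachars() and SHELL_METACHARS are hypothetical names.

```javascript
// Characters that mean something special to the UNIX shell, including the
// backtick (command substitution), the caret (a pipe in old Bourne shells),
// and the newline (a command separator, like ";").
const SHELL_METACHARS = /([;<>*|'&$!?#()\[\]{}:"\\`^\n])/g;

// Put a backslash in front of every metacharacter so the shell treats
// it as a plain character.
function escapeShellMetachars(text) {
  return text.replace(SHELL_METACHARS, "\\$1");
}

console.log(escapeShellMetachars("visi.com ; /bin/rm -rf /"));
// prints visi.com \; /bin/rm -rf /
```

In Perl, you can also avoid the shell completely by passing system() a list of arguments, as in system('ping', $host). However, you then lose the shell's > /dev/null redirection.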

NOTE
Another term often used for this method of escaping text to render it harmless is sanitizing.
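A complementary approach to escaping is to reject anything that doesn't look like a host name before it ever reaches the shell. The sketch below accepts only dot-separated labels made of letters, digits, and hyphens. Each label must be 1 to 63 characters long and may not start or end with a hyphen, following the usual DNS host-name rules. At least two labels are required, as in a typical e-mail domain. looksLikeHostName() is a hypothetical helper. It checks only the shape of the name, not whether the host exists.

```javascript
// One DNS label: letters, digits, and hyphens, 1-63 characters,
// not beginning or ending with a hyphen.
const LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

// Accept a host name only if it has at least two labels and every label
// is well formed. Anything else (spaces, semicolons, slashes, backticks)
// is rejected outright instead of being escaped.
function looksLikeHostName(host) {
  if (host.length === 0 || host.length > 253) return false;
  const labels = host.split(".");
  return labels.length >= 2 && labels.every((label) => LABEL.test(label));
}

console.log(looksLikeHostName("visi.com"));                 // prints true
console.log(looksLikeHostName("visi.com ; /bin/rm -rf /")); // prints false
```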

Server-Side Validation in Retrospect

This is just a simple look at validation from the server side, and you can always make your testing more involved. For example, instead of using ping, you could feed the domain name from the e-mail address through a name server utility like nslookup to confirm that the domain exists. The downside of server-side validation is that every check requires a full round trip:

  1. The form must be transmitted back to the server.
  2. The server must spawn a process to run the CGI script.
  3. The server must process the output from the script.
  4. The processed data must be transmitted back to the user.

This creates additional connection overhead that you can avoid with a little creative client-side scripting.

Client-Side Validation

JavaScript now enables you to pre-check your forms before they're transmitted to the server. This reduces the connection overhead and makes field validation immediate, instead of making the user wait for the server to parse the form, validate the data, and generate a response document.

To JavaScript, an HTML <FORM> is just another object attached to the document object, with all its elements seen as properties of the associated form object. To intercept the form before it's posted to the server, you hook the onSubmit event of the form object like this:

<FORM NAME="helpForm" METHOD=POST
      ONSUBMIT="return Validate(this)" ...>

Validate() is a JavaScript function you write yourself to deal with the fields of your form. Because the handler returns whatever Validate() returns, that true or false value determines whether the form is actually submitted. If the handler returns true, form processing continues. If it returns false, submission is canceled and the user is left at the form, exactly as it was when he or she clicked the Submit button.

Emptiness

For the purposes of validation, a JavaScript value is empty if it's either null or an empty string (""). Listing 13.6 is an example of an isEmpty() function. (Because it compares with ==, a value of undefined also counts as null.)


Listing 13.6  JavaScript Emptiness
function isEmpty(str) {
   return (str == null || str == "");
}
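isEmpty() in listing 13.6 accepts a field that contains nothing but spaces. If you'd rather treat whitespace-only input as empty, a variant can trim the value first. The sketch below is one way to do that. isBlank() is a hypothetical name, and String.prototype.trim() was only added to JavaScript in ECMAScript 5, so very old browsers lack it.

```javascript
// Treat null, undefined, "" and strings of only whitespace as empty.
function isBlank(str) {
  return str == null || str.trim() === "";
}

console.log(isBlank("   "));   // prints true
console.log(isBlank(" Joe ")); // prints false
```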

With this function, checking whether the realname field has been filled in is simple (here theForm is the form object passed to Validate()):

if(isEmpty(theForm.realname.value)) {
   alert("You must give me your name!");
}

NOTE
Remember that fields within a form object are not simple strings. They are actually objects with various properties that depend on the type of field. In the case of TEXT objects, the value property contains the data entered by the user.

Invalid Data

Testing the validity of the data in JavaScript fields is also straightforward. Because the value property of a TEXT object is a string, you can use the string's indexOf() method to search for a character sequence. If the sequence isn't found, indexOf() returns -1, as demonstrated in listing 13.7.


Listing 13.7  E-Mail Field Testing
if(theForm.email.value.indexOf("@") == -1) {
   alert("\nEmail addresses are usually of the form:" +
         "\n\nsomebody@someplace\n\n" +
         "yours doesn't seem to be valid!");
}
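Looking only for an @ sign accepts strings such as "@" or "joe@". A slightly stricter sketch requires exactly one @, something before it, and a domain part with at least one dot that isn't at either end. looksLikeEmail() is a hypothetical helper. Real addresses allow more variety than this, so treat it as a sanity check, not a complete validator.

```javascript
// Rough shape check for an e-mail address: exactly one "@",
// a non-empty user name, and a domain with a dot somewhere inside it.
function looksLikeEmail(address) {
  const at = address.indexOf("@");
  if (at <= 0 || at !== address.lastIndexOf("@")) {
    return false; // no "@", nothing before it, or more than one
  }
  const domain = address.slice(at + 1);
  const dot = domain.indexOf(".");
  // The first dot can't lead the domain, and the last can't end it.
  return dot > 0 && domain.lastIndexOf(".") < domain.length - 1;
}

console.log(looksLikeEmail("sjwalter@visi.com")); // prints true
console.log(looksLikeEmail("joe@"));              // prints false
```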

Implementing More Detailed Error Messages

Because JavaScript runs within the browser, field testing is limited only by the speed of the user's computer; no trip to the server is involved. You can take advantage of this speed to provide more descriptive responses when a form isn't properly filled out.

The code in listing 13.8 sets up an array of strings, then demonstrates an implementation of Validate() that displays a different message depending on which fields were not completed.


Listing 13.8  Fancy Error Messages
<SCRIPT LANGUAGE="JavaScript">
<!-- begin hide
function initArray(size) {
   this.length = size;

   for(var i = 1; i <= size; i++) {
      this[i] = null;
   }

   return this;
}

msgCase     = new initArray(3);
msgCase[1]  = "You need to include your email address!";
msgCase[2]  = "You must include your name!";
msgCase[3]  = "What?  A blank form?\nSorry, you need to fill it out first!";

// isEmpty() is the function from listing 13.6
function Validate(theForm) {
   iName    = isEmpty(theForm.realname.value) ? 1 : 0;
   iEmail   = isEmpty(theForm.email.value) ? 1 : 0;
   iCase    = (iName << 1) | iEmail;

   if(iCase) {
      alert("\n" + msgCase[iCase]);
      return false;
   }

   // Low-level verification of email address
   //
   if(theForm.email.value.indexOf("@") == -1) {
      alert("\nEmail addresses are usually of the form:" +
            "\n\nsomebody@someplace\n\n" +
            "yours doesn't seem to be valid!");

      return false;
   }

   return true;
}
// end hide -->
</SCRIPT>
...

The real essence of this script lies in the following lines:

iName    = isEmpty(theForm.realname.value) ? 1 : 0;
iEmail   = isEmpty(theForm.email.value) ? 1 : 0;

The variables iName and iEmail are set to 1 if the associated field is empty and to 0 if it isn't. Using 1 to flag a missing field seems logically backward, and it is, but it's necessary in order for the following line to work:

iCase    = (iName << 1) | iEmail;

The left-shift and bit-wise OR operators take the values of iName and iEmail and construct a number between 0 and 3. If that number (iCase) is 0, both fields were filled in; otherwise, each value selects a different message from the message array to send back to the user. If the realname field was empty, for example, but the email field wasn't, iName would equal 1 and iEmail would be 0. Feeding these values through the line above would set iCase to 2 (1 shifted left one bit is 2, and 2 OR 0 is still 2). If you look at msgCase[2], you'll find that the stored string is:

You must include your name!

which names exactly the field that's empty. Once you've written a message for every possible combination of empty fields, this technique makes dealing with a large number of empty fields much simpler.
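The shift-and-OR step extends naturally to any number of fields. Each field shifts the running value left one bit and adds 1 if that field is empty, so the first field ends up in the highest bit, just as realname does in listing 13.8. The sketch below assumes the field values are passed as a plain array of strings; emptyMask() is a hypothetical helper.

```javascript
// Same emptiness test as listing 13.6.
function isEmpty(str) {
  return str == null || str === "";
}

// Generalize (iName << 1) | iEmail to any number of fields:
// first field -> highest bit, last field -> lowest bit.
function emptyMask(values) {
  let mask = 0;
  for (const value of values) {
    mask = (mask << 1) | (isEmpty(value) ? 1 : 0);
  }
  return mask;
}

// realname empty, email filled in: the case 2 worked through above
console.log(emptyMask(["", "joe@example.com"])); // prints 2
```

With n required fields there are 2^n - 1 nonzero cases, each needing its own message, so the number of messages you must write doubles with every field you add.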

VBScript

VBScript follows much the same structure as JavaScript, with some minor syntactical differences. The easiest way to get a feeling for how the two languages differ is to compare them in action. Listing 13.9 uses the same technique as listing 13.8, written in VBScript and extended to a third field (a comment) plus a pair of use/do-not-use options that say whether the comment may be used.


Listing 13.9  Error Messages in VBScript
<SCRIPT LANGUAGE="VBScript">
function isEmpty(field)
   isEmpty = (field.value = "")
end function

Dim CR
CR = chr(13)

Dim msgCase(7)
msgCase(1)  = "You need to submit a comment as well," & CR & _
   "not just your name and address!"
msgCase(2)  = "You need to include your email address!"
msgCase(3)  = "You have to give me more than just your name!"
msgCase(4)  = "You must include your name!"
msgCase(5)  = "You have to give me more than just your email address!"
msgCase(6)  = "You need to include your name and address!"
msgCase(7)  = "What?  A blank form?" & CR & _
   "Sorry, you need to fill it out first!"

function Validate()
   Dim iName, iEmail, iText, iCase
   iName = 0 : iEmail = 0 : iText = 0 : iCase = 0

   if isEmpty(document.commentForm.realname) then
      iName = 1
   end if

   if isEmpty(document.commentForm.email) then
      iEmail = 1
   end if

   if isEmpty(document.commentForm.comment) then
      iText = 1
   end if

   Validate = true

   iCase    = (iName * 4 ) + (iEmail * 2) + iText

   if(iCase) then
      MsgBox msgCase(iCase)
      Validate = false
   elseif not (document.commentForm.use.checked or _
               document.commentForm.donotuse.checked) then
      MsgBox CR & "You need to tell me whether I can use" & CR & _
         "your comments or not!"
      Validate = false
   elseif InStr(document.commentForm.email.value, "@") = 0 then
      MsgBox CR & "Email addresses are usually of the form:" & CR & _
         CR & "somebody@someplace" & CR & CR & _
         "yours doesn't seem to be valid!"

      Validate = false
   end if
end function
</SCRIPT>

As you can see, the differences are subtle and purely syntactic:

Functions in VBScript return values by setting the function name equal to the value you wish to return.
Statements aren't terminated with semicolons; the end of the line ends the statement. A statement that continues onto the next line must end with a space and an underscore ( _).
Arrays (and variables) are declared with the Dim statement, rather than by writing a special array function. Once created, individual array elements are accessed using parentheses instead of square brackets.
The MsgBox function replaces alert().
Testing for an empty field involves comparing the value property of the field against an empty string ("").
To format the output of MsgBox, carriage returns are embedded within a long string by concatenating chr(13), or a variable that contains that character, instead of using JavaScript's "\n" escape sequence.
Rather than doing bit-wise math on the iCase variable, simple multiplication and addition are used.
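The multiplication and addition in listing 13.9 produce exactly the same case numbers as shifts and ORs would. Each flag is either 0 or 1, and the powers of two it is multiplied by never overlap, so adding them is the same as ORing them. This short JavaScript check runs through all eight combinations; caseByBits() and caseByMath() are hypothetical names.

```javascript
// Listing 13.8 style: shift each flag into its own bit and OR them.
function caseByBits(iName, iEmail, iText) {
  return (iName << 2) | (iEmail << 1) | iText;
}

// Listing 13.9 style: multiply by powers of two and add.
function caseByMath(iName, iEmail, iText) {
  return iName * 4 + iEmail * 2 + iText;
}

// Compare the two for every combination of empty (1) and filled in (0).
let allMatch = true;
for (let n = 0; n <= 1; n++) {
  for (let e = 0; e <= 1; e++) {
    for (let t = 0; t <= 1; t++) {
      if (caseByBits(n, e, t) !== caseByMath(n, e, t)) allMatch = false;
    }
  }
}
console.log(allMatch); // prints true
```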

From Here…

This chapter demonstrates how to double-check the validity of the data in submitted forms, both before the server gets the information and after. For related information, check out: