Webmaster in a Nutshell


9.2 URL Encoding

Before data supplied on a form can be sent to a CGI program, each form element's name (specified by the NAME attribute) is paired with the value entered by the user to create a key-value pair. For example, if the user entered "30" in a field named "age", the key-value pair would be "age=30". In the transferred data, key-value pairs are separated by the ampersand (&) character.
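
As a rough illustration of this pairing, the following Perl sketch joins a list of form fields into the name=value&name=value layout. The subroutine name join_pairs and the sample fields are invented for the example, and the values are deliberately left unencoded, so the sketch is only safe for values made of letters and digits.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Join name/value pairs into "name=value&name=value" form.
# Takes a flat list: name1, value1, name2, value2, ...
# No URL encoding is applied (assumption: plain alphanumeric values).
sub join_pairs {
    my @fields = @_;
    my @pairs;
    while (@fields) {
        # Take the next name and its value off the front of the list.
        my ($name, $value) = splice(@fields, 0, 2);
        push @pairs, "$name=$value";
    }
    return join('&', @pairs);
}

# Hypothetical form with two fields; prints "age=30&city=Boston".
print join_pairs('age', 30, 'city', 'Boston'), "\n";
```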

Under the GET method, form information is sent as part of the URL, so it can't include spaces or other characters that are not allowed in URLs, or characters that have special meanings in URLs, such as slashes (/). (For the sake of consistency, the same constraint applies when the POST method is used.) The Web browser therefore encodes user-supplied information before sending it.
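
Because the encoded data travels in the URL under GET and in the request body under POST, a CGI program has to look in different places for it. The sketch below shows one way to do that, using the standard CGI environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH. The subroutine name read_form_data is made up for this example, and treating a missing request method as GET is an assumption of the sketch.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return the raw (still encoded) form data for the current request.
# read_form_data is a hypothetical helper, not part of Perl or CGI.
sub read_form_data {
    # Assumption: treat a missing request method as GET.
    my $method = $ENV{'REQUEST_METHOD'} || 'GET';
    if ($method eq 'POST') {
        my $form_info = '';
        # POST: the data arrives on standard input, CONTENT_LENGTH bytes long.
        read(STDIN, $form_info, $ENV{'CONTENT_LENGTH'} || 0);
        return $form_info;
    }
    # GET: the data is the part of the URL after the "?".
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}
```

A script would call read_form_data() once, near the top, and then decode whatever string it returns.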

Encoding involves replacing spaces and other special characters in the query string with a percent sign (%) followed by the character's two-digit hexadecimal code. (For this reason, URL encoding is sometimes called hexadecimal encoding.) Suppose a user fills out and submits a form containing a birthday in the format mm/dd/yy (e.g., 11/05/73). The forward slashes in the birthday are among the special characters that can't appear unencoded in the form data, so the browser encodes them before it issues the request. The following sample request shows the resulting encoding:

POST /cgi-bin/birthday.pl HTTP/1.0
.
. [information]
.
Content-length: 21

birthday=11%2F05%2F73

The sequence %2F is a percent sign followed by 2F, the hexadecimal ASCII code (47 decimal) of the slash ( / ) character.
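
The browser's side of this exchange can be approximated in Perl. The following is a minimal sketch, not the exact algorithm any particular browser uses: the name url_encode is chosen for the example, the set of characters left unencoded (letters, digits, hyphen, underscore, and period) is a conservative assumption, and the input is assumed to contain only single-byte characters.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Encode a string for use as a form value.
# url_encode is a name chosen for this sketch, not a Perl built-in.
# Assumes single-byte characters.
sub url_encode {
    my ($text) = @_;
    # Replace every character except letters, digits, "-", "_", and "."
    # with "%" followed by its two-digit hexadecimal code.
    $text =~ s/([^A-Za-z0-9\-_.])/sprintf("%%%02X", ord($1))/eg;
    return $text;
}

# Prints "birthday=11%2F05%2F73", the body of the sample request.
print "birthday=", url_encode("11/05/73"), "\n";
```

Encoding more characters than strictly necessary is harmless, because a decoder converts every %XX sequence back in the same way.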

CGI scripts have to provide some way to "decode" form data the client has encoded. Here's a short CGI program, written in Perl, that can process this form:

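#!/usr/local/bin/perl
# Read the encoded form data from the body of the POST request.
$size_of_form_information = $ENV{'CONTENT_LENGTH'};
read (STDIN, $form_info, $size_of_form_information);
# A plus sign stands for a space; convert it before decoding %XX codes.
$form_info =~ tr/+/ /;
# Replace each %XX sequence with the character whose hex code is XX.
$form_info =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
# Split on the first "=" only, so an "=" in the value is kept.
($field_name, $birthday) = split (/=/, $form_info, 2);
print "Content-type: text/plain", "\n\n";
print "Hey, your birthday is on: $birthday. ",
      "That's what you told me, right?", "\n";
exit (0);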

The line:

$form_info =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;

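is a Perl substitution that converts every encoded sequence, such as "%2F", back to the character it stands for, such as "/". The regular expression matches a percent sign followed by two hexadecimal digits; hex converts those digits to a number, and pack turns the number into a character. The e modifier tells Perl to evaluate the replacement as an expression, and the g modifier applies the substitution to every match in the string. To dissect this program further, see Chapter 15, Perl Quick Reference.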

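As a special case, the space character can be encoded as a plus sign (+) in addition to its hexadecimal notation (%20). A plus sign typed by the user is itself sent as %2B, so a decoder should convert plus signs to spaces before it converts the hexadecimal sequences.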
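
Putting the pieces together, a form with several fields can be decoded into a hash. The sketch below is one possible arrangement rather than a definitive implementation: url_decode and parse_form are hypothetical names, the sample data is invented, and repeated field names are not handled (a later value overwrites an earlier one).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decode one URL-encoded name or value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;    # plus signs become spaces first
    $text =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
    return $text;
}

# Split encoded form data into a hash of decoded names and values.
sub parse_form {
    my ($form_info) = @_;
    my %form;
    foreach my $pair (split(/&/, $form_info)) {
        next if $pair eq '';
        # Split on the first "=" only, and do so before decoding, so an
        # encoded "=" or "&" in a value is not mistaken for a separator.
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;
        $form{url_decode($name)} = url_decode($value);
    }
    return %form;
}

# Invented sample data with an encoded space, slashes, and plus sign.
my %form = parse_form('name=Mary+Smith&birthday=11%2F05%2F73&sum=1%2B1');
print "$_: $form{$_}\n" foreach sort keys %form;
```

With this sample data, the name field decodes to "Mary Smith", the birthday field to "11/05/73", and the sum field to "1+1".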

