Appendix A

CGI Reference


This appendix provides a reference for the CGI protocol and related variables, including MIME types, environment variables, and hexadecimal encoding for nonalphanumeric characters.

Output

To send output from a CGI application, print it to standard output (stdout). Format the output as follows:

headers
(blank line)
body/data

Headers

Each header consists of the header name, a colon, a space, and the value. Each header line should end with a carriage return and a line feed (\r\n). The header block ends with a blank line, which is also terminated by \r\n.

Header name: header value

CGI output must include at least one of the following headers:

Location: URI
Content-Type: MIME type/subtype
Status: code message

You can include additional headers, including any HTTP-specific headers (such as Expires or Server) and any custom headers. See Chapter 4, "Output," for a discussion of the Location header. Table A.1 lists the status codes, which tell the client whether the transaction succeeded and what to do next. See Chapter 8, "Client/Server Issues," for more about status codes.
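The following Python sketch shows the output format described above: one header per line, a blank line, and then the body. The function name cgi_response is hypothetical. The sketch assumes the server connects the script's stdout to the client and that UTF-8 is an acceptable body encoding.

```python
import sys

CRLF = "\r\n"


def cgi_response(headers: list[tuple[str, str]], body: str) -> str:
    """Build CGI output: 'Name: value' header lines, a blank line, then the body."""
    # Every header line ends with CRLF; one extra CRLF produces the blank line
    # that separates the headers from the body.
    header_block = "".join(f"{name}: {value}{CRLF}" for name, value in headers)
    return header_block + CRLF + body


if __name__ == "__main__":
    page = cgi_response([("Content-Type", "text/html")],
                        "<html><body>Hello, world</body></html>\n")
    # Write bytes so that Python's text layer cannot rewrite the \r\n pairs.
    sys.stdout.buffer.write(page.encode("utf-8"))
```

The sketch writes raw bytes rather than using print(). On some platforms, text-mode output translates line endings, which would corrupt the \r\n sequences.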

Table A.1. Valid HTTP status codes.

Status Code  Definition
200 The request was successful and a proper response has been sent.
201 If a resource or file has been created by the server, it sends a 201 status code and the location of the new resource. Of the methods GET, HEAD, and POST, only POST is capable of creating new resources (for example, file uploading).
202 The request has been accepted although it might not have been processed yet. For example, if the user requested a long database search, you could start the search, respond with a 202 message, and inform the user that the results will be e-mailed later.
204 The request was successful but there is no content to return.
301 The requested document has a new, permanent URL. The new location should be specified in the Location header.
302 The requested document is temporarily located at a different location, specified in the Location header.
304 If the client requests a conditional GET (that is, it only wants to get the file if it has been modified after a certain date) and the file has not been modified, the server responds with a 304 status code and doesn't bother resending the file.
400 The request was bad and incomprehensible. You should never receive this error if your browser was written properly.
401 The client has requested a file that requires user authentication.
403 The server understands the request but refuses to fulfill it, most likely because either the server or the client does not have permission to access that file.
404 The requested file is not found.
500 The server experienced some internal error and cannot fulfill the request. You often will see this error if your CGI program has some error or sends a bad header that the server cannot parse.
501 The command requested has not been implemented by the server.
502 While the server was acting as a proxy server or gateway, it received an invalid response from the other server.
503 The server is too busy to handle any further requests.
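A CGI program reports one of these codes through the Status header. This Python sketch builds that header. It borrows reason phrases from the standard library's http.HTTPStatus, which follows current RFC wording. For example, it uses "Found" for 302, where HTTP/1.0 used "Moved Temporarily". The names status_header and error_response are hypothetical.

```python
import sys
from http import HTTPStatus

CRLF = "\r\n"


def status_header(code: int) -> str:
    """Return a CGI Status header such as 'Status: 404 Not Found'."""
    # HTTPStatus supplies the standard reason phrase and raises ValueError
    # for codes it does not recognize.
    return f"Status: {code} {HTTPStatus(code).phrase}"


def error_response(code: int, message: str) -> str:
    """Build a complete plain-text CGI error response."""
    return (status_header(code) + CRLF
            + "Content-Type: text/plain" + CRLF
            + CRLF
            + message)


if __name__ == "__main__":
    sys.stdout.buffer.write(error_response(404, "No such document.\n").encode("ascii"))
```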

MIME

MIME headers look like the following:

type/subtype

where type is one of the following:

text
image
audio
video
application
multipart
message

The subtype provides specific information about the data format in use. A subtype beginning with x- indicates an experimental subtype that has not been registered. Table A.2 lists several MIME types and subtypes. A complete list of registered MIME types is available at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types.
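To make the type/subtype structure concrete, the Python sketch below splits a MIME value into its two parts and flags experimental x- subtypes. The names MimeType and parse_mime are hypothetical. Stripping a trailing parameter such as "; charset=..." is an added assumption about how the value may arrive.

```python
from typing import NamedTuple


class MimeType(NamedTuple):
    type: str
    subtype: str

    @property
    def experimental(self) -> bool:
        # Subtypes beginning with "x-" are unregistered, experimental subtypes.
        return self.subtype.startswith("x-")


def parse_mime(value: str) -> MimeType:
    """Split 'type/subtype' (optionally followed by '; parameters')."""
    # Drop any parameters before splitting, and compare case-insensitively.
    essence = value.split(";", 1)[0].strip().lower()
    main, sep, sub = essence.partition("/")
    if not sep or not main or not sub:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    return MimeType(main, sub)
```

For example, parse_mime("image/x-xbitmap").experimental is True, while parse_mime("image/gif").experimental is False.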

Table A.2. MIME types/subtypes.

Type/Subtype  Function
text/plain Plain text. By default, if the server doesn't recognize the file extension, it assumes that the file is plain text.
text/html HTML files.
text/richtext Rich Text Format. Most word processors understand rich text format, so it can be a good portable format to use if you want people to read it from their word processors.
text/enriched The text enriched format is a method of formatting similar to HTML, meant for e-mail and news messages. It has a minimal markup set and uses multiple carriage returns and line feeds as separators.
text/tab-separated-values Tab-delimited text, the simplest common format for databases and spreadsheets.
text/sgml Standard Generalized Markup Language.
image/gif GIF images, a common, compressed graphics format specifically designed for exchanging images across different platforms. Almost all graphical browsers display GIF images inline (using the <img> tag).
image/jpeg JPEG is another popular image compression format. Although a fairly common format, JPEG is not supported internally by as many browsers as GIF is.
image/x-xbitmap X bitmap is a very simple pixel-by-pixel description of images. Because it is simple and because most graphical browsers support it, it can be useful for creating small, dynamic images such as counters. Generally, X bitmap files have the extension .xbm.
image/x-pict Macintosh PICT format.
image/tiff TIFF format.
audio/basic Basic 8-bit, μ-law compressed audio files. Filenames usually end with the extension .au.
audio/x-wav Microsoft Windows audio format.
video/mpeg MPEG compressed video.
video/quicktime QuickTime video.
video/x-msvideo Microsoft Video. Filenames usually end with the extension .avi.
application/octet-stream Any general, binary format that the server doesn't recognize usually uses this MIME type. Upon receiving this type, most browsers give you the option of saving the data to a file. You can use this MIME type to force a user's browser to download and save a file rather than display it.
application/postscript PostScript files.
application/atomicmail
application/andrew-inset
application/rtf Rich Text Format (see text/richtext above).
application/applefile
application/mac-binhex40
application/news-message-id
application/news-transmission
application/wordperfect5.1 WordPerfect 5.1 word processor files.
application/pdf Adobe's Portable Document Format for the Acrobat reader.
application/zip The Zip compression format.
application/macwriteii Macintosh MacWrite II word processor files.
application/msword Microsoft Word word processor files.
application/mathematica
application/cybercash
application/sgml Standard Generalized Markup Language.
application/x-www-form-urlencoded Default encoding for HTML forms.
multipart/mixed Contains several pieces of many different types.
multipart/x-mixed-replace Similar to multipart/mixed except that each part replaces the preceding part. Used by Netscape for server-side push CGI applications.
multipart/form-data Contains form name/value pairs. Encoding scheme used for HTTP File Upload.

As an example, the header you'd use to denote HTML content to follow would be

Content-Type: text/html
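A CGI program that serves files can choose this header from the file extension. The following Python sketch uses the standard mimetypes module and falls back to text/plain, as the server default in Table A.2 does. The function name content_type_header is hypothetical. The module's extension table is combined with the system's own mime.types files, so results may vary by platform.

```python
import mimetypes


def content_type_header(filename: str) -> str:
    """Return a Content-Type header guessed from the file's extension."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to text/plain when the extension is unknown.
    return f"Content-Type: {guessed or 'text/plain'}"
```

For binary files with an unknown extension, application/octet-stream may be a safer fallback than text/plain.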

No-Parse Header

No-Parse Header (NPH) CGI programs communicate directly with the Web browser. The server does not parse their headers (hence the name No-Parse Header), and buffering is usually turned off. Because the program's output goes straight to the browser, it must begin with a complete, valid HTTP response header. The first header must be

HTTP/1.0 nnn message

where nnn is the three-digit status code and message is the status message. Any headers that follow are standard HTTP headers such as Content-Type.

You generally designate a program as NPH by starting its filename with nph-.
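Because an NPH program bypasses the server's header processing, it must write the HTTP status line itself. The following Python sketch shows this. The function name nph_response is hypothetical. Whether a given server also expects headers such as Date or Server from an NPH program is an assumption to check against that server's documentation.

```python
import sys


def nph_response(code: int, reason: str,
                 headers: list[tuple[str, str]], body: bytes) -> bytes:
    """Build a complete HTTP/1.0 response for a no-parse-header (NPH) program."""
    # The first line is the HTTP status line that the server would normally add.
    lines = [f"HTTP/1.0 {code} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    response = nph_response(200, "OK", [("Content-Type", "text/plain")], b"Hello\n")
    sys.stdout.buffer.write(response)
    # NPH output goes straight to the browser, so flush rather than leave it buffered.
    sys.stdout.buffer.flush()
```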

Note that HTTP is at version 1.0 currently, but 1.1 is being worked on as this book is being written, and some features and headers from 1.1 have already been implemented in some browsers and servers.

Input

CGI applications obtain input using one or a combination of three methods: environment variables, standard input, and the command line.

ISINDEX

The ISINDEX tag lets the user enter keywords. The keywords are appended to the end of the URL following a question mark (?) and separated by plus signs (+). CGI programs can read ISINDEX keywords either from the environment variable QUERY_STRING or from the command-line arguments, one keyword per argument.
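The following Python sketch splits an ISINDEX query string into keywords. The function name isindex_keywords is hypothetical. Decoding %XX escapes inside each keyword assumes the browser URL-encodes special characters the same way it does for form data.

```python
import os
import sys
from urllib.parse import unquote


def isindex_keywords(query_string: str) -> list[str]:
    """Split an ISINDEX query string into its keywords."""
    # Keywords are separated by '+'; other special characters arrive as %XX.
    return [unquote(word) for word in query_string.split("+") if word]


if __name__ == "__main__":
    # Prefer QUERY_STRING, falling back to the command-line arguments.
    keywords = isindex_keywords(os.environ.get("QUERY_STRING", "")) or sys.argv[1:]
    output = "Content-Type: text/plain\r\n\r\n" + "\n".join(keywords) + "\n"
    sys.stdout.buffer.write(output.encode("utf-8"))
```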

Environment Variables

CGI environment variables provide information about the server, the client, the CGI program itself, and sometimes the data sent to the server. Tables A.3 and A.4 list some common environment variables.

Table A.3. CGI environment variables.

Environment Variable  Description
GATEWAY_INTERFACE The version of the CGI protocol. Set to CGI/1.1.
SERVER_PROTOCOL The name and version of the protocol used for the request. Usually set to HTTP/1.0.
REQUEST_METHOD Either GET or POST, depending on the method used to send data to the CGI program.
PATH_INFO Extra path information appended to the program's URL after a slash. Typically used to describe some path relative to the document root.
PATH_TRANSLATED PATH_INFO translated into a full filesystem path, typically by prefixing it with the document root.
QUERY_STRING Contains input data if using the GET method. Always contains the data appended to the URL after the question mark (?).
CONTENT_TYPE How the input data is encoded. Typically application/x-www-form-urlencoded; for HTTP File Upload, it is set to multipart/form-data.
CONTENT_LENGTH The length of the input in bytes if you are using the POST method.
SERVER_SOFTWARE Name and version of the server software.
SERVER_NAME Host name of the machine running the server.
SERVER_ADMIN E-mail address of the Web server administrator.
SERVER_PORT Port on which the server is running, usually 80.
SCRIPT_NAME The name of the CGI program, given as the URL path used to call it.
DOCUMENT_ROOT The value of the document root on the server.
REMOTE_HOST Name of the client machine requesting or sending information.
REMOTE_ADDR IP address of the client machine connected to the server.
REMOTE_USER The username if the user has authenticated himself or herself.
REMOTE_GROUP The group name if a user belonging to that group has authenticated himself or herself.
AUTH_TYPE The authorization scheme being used, if any, usually Basic.
REMOTE_IDENT The username of the person running the client connected to the server. Works only if the client machine is running an ident server (identd) as specified by RFC 931.
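As an example of combining these variables, the following Python sketch rebuilds the URL that invoked the program from SERVER_NAME, SERVER_PORT, SCRIPT_NAME, PATH_INFO, and QUERY_STRING. The function name self_url and the fallback values "localhost" and "80" are hypothetical. The sketch also assumes plain http, since none of the variables above indicate a secure connection.

```python
import os
import sys
from typing import Mapping


def self_url(environ: Mapping[str, str]) -> str:
    """Rebuild the URL that invoked this CGI program from its environment."""
    host = environ.get("SERVER_NAME", "localhost")
    port = environ.get("SERVER_PORT", "80")
    # Omit the default HTTP port, as browsers normally do.
    netloc = host if port == "80" else f"{host}:{port}"
    script = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    url = f"http://{netloc}{script}{path_info}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    output = "Content-Type: text/plain\r\n\r\n" + self_url(os.environ) + "\n"
    sys.stdout.buffer.write(output.encode("utf-8"))
```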

Table A.4. Common HTTP variables.

Environment Variable  Description
HTTP_ACCEPT Contains a comma-delimited list of MIME types the browser is capable of interpreting.
HTTP_USER_AGENT The browser name, version, and usually its platform.
HTTP_REFERER The URL of the page that referred the user to the current URL.
HTTP_ACCEPT_LANGUAGE Languages the Web browser accepts; for example, en is English.
HTTP_COOKIE Contains cookie values if the browser supports HTTP cookies and has stored cookies for this site. A cookie is a name/value pair that the server asks the browser to store and send back with later requests.
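HTTP_COOKIE arrives as a single string of name=value pairs separated by semicolons. The following Python sketch parses it with the standard library's http.cookies.SimpleCookie. The function name read_cookies is hypothetical. SimpleCookie is fairly strict and may ignore values it considers malformed.

```python
import os
import sys
from http.cookies import SimpleCookie


def read_cookies(http_cookie: str) -> dict[str, str]:
    """Turn an HTTP_COOKIE value such as 'a=1; b=2' into a dict."""
    jar = SimpleCookie()
    jar.load(http_cookie)  # Parses name=value pairs separated by semicolons.
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    cookies = read_cookies(os.environ.get("HTTP_COOKIE", ""))
    lines = [f"{name} = {value}" for name, value in cookies.items()]
    output = "Content-Type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"
    sys.stdout.buffer.write(output.encode("utf-8"))
```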

A full list of HTTP 1.0 headers can be found at the following location:

http://www.w3.org/hypertext/WWW/protocols/HTTP/1.0/spec.html

Getting Input from Forms

Input from forms is sent to the CGI application using one of two methods: GET or POST. Both methods by default encode the data using URL encoding. Names and their associated values are separated by equal signs (=), name/value pairs are separated by ampersands (&), and spaces are replaced with plus signs (+), as follows:

name1=value1&name2=value2a+value2b&name3=value3

Every other nonalphanumeric character is URL encoded. This means that the character is replaced by a percent sign (%) followed by its two-digit hexadecimal equivalent. Table A.5 contains a list of nonalphanumeric characters and their hexadecimal values.

Table A.5. Nonalphanumeric characters and their hexadecimal values.

Character  Hexadecimal
Tab 09
Space 20
" 22
( 28
) 29
, 2C
. 2E
; 3B
: 3A
< 3C
> 3E
@ 40
[ 5B
\ 5C
] 5D
^ 5E
` 60
{ 7B
| 7C
} 7D
~ 7E
? 3F
& 26
/ 2F
= 3D
# 23
% 25
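The decoding rules above can be written out directly. This Python sketch turns + back into a space, converts each %XX into the character with that hexadecimal value, and splits the string into name/value pairs. The names url_decode and parse_form are hypothetical. Python's urllib.parse.parse_qs provides similar behavior; the sketch spells the rules out by hand.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def url_decode(text: str) -> str:
    """Undo URL encoding: '+' becomes a space and %XX becomes the byte 0xXX."""
    out = bytearray()
    i = 0
    while i < len(text):
        char = text[i]
        if char == "+":
            out.append(0x20)
            i += 1
        elif char == "%" and i + 2 < len(text) and set(text[i + 1:i + 3]) <= HEX_DIGITS:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Leave anything else, including a malformed %, unchanged.
            out.extend(char.encode("utf-8"))
            i += 1
    return out.decode("utf-8", errors="replace")


def parse_form(encoded: str) -> dict[str, list[str]]:
    """Split 'name1=value1&name2=...' into names mapped to lists of values."""
    fields: dict[str, list[str]] = {}
    for pair in encoded.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # A name can repeat (for example, several checkboxes), so keep every value.
        fields.setdefault(url_decode(name), []).append(url_decode(value))
    return fields
```

Applied to the example string name1=value1&name2=value2a+value2b&name3=value3, parse_form maps name2 to the single value "value2a value2b".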

The GET method passes the encoded input string in the environment variable QUERY_STRING. The POST method passes the length of the input string in the environment variable CONTENT_LENGTH and sends the input string itself to standard input.
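The following Python sketch reads the raw input for either method. The function name read_cgi_input is hypothetical. The body reader is passed in as a function so that the real program can use sys.stdin.buffer.read; that design choice is ours, not part of CGI.

```python
import os
import sys
from typing import Callable, Mapping


def read_cgi_input(environ: Mapping[str, str], read_body: Callable[[int], bytes]) -> str:
    """Return the raw URL-encoded input for either GET or POST requests."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # CONTENT_LENGTH says exactly how many bytes to read from standard input;
        # reading more could block waiting for data that never arrives.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = read_body(length) if length > 0 else b""
        # URL-encoded data is ASCII; latin-1 maps every byte without raising errors.
        return data.decode("latin-1")
    # GET (and any other method) carries its input in QUERY_STRING.
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    raw = read_cgi_input(os.environ, sys.stdin.buffer.read)
    output = "Content-Type: text/plain\r\n\r\n" + raw + "\n"
    sys.stdout.buffer.write(output.encode("utf-8", errors="replace"))
```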