6

MIME Documents


MIME is an acronym for Multipurpose Internet Mail Extensions. In the previous chapter, we talked about how message headers in MIME format are used in an HTTP request. Despite its somewhat misleading name, MIME is used to define the structure and content of many types of Internet messages other than mail. MIME allows the transmission of non-text files such as graphics, audio, video, and program files. These non-text elements are encoded in the message as ASCII text, so the MIME message can be transmitted easily via e-mail or other text-based protocols. When the MIME message is received, the non-text elements are decoded at the other end.

MIME, WWW, and CGI

In this chapter, we will look at how MIME messages and encoding schemes relate to the WWW and CGI. The MIME standard was introduced in RFC 1341, which has since been completely replaced by RFC 1521. A full discussion of the MIME standard is beyond the scope of this chapter, but the text of RFC 1521 can be found at the following URL:

http://www.freesoft.org/Connected/RFC/1521/index.html

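MIME provides three mechanisms that allow non-textual data to be specified and encoded into text documents that can be transferred over text e-mail gateways:

  1. A MIME-Version header field, which declares that a message conforms to the MIME standard.

  2. A Content-Type header field, which specifies the type and subtype of the data in the message body.

  3. A Content-Transfer-Encoding header field, which specifies how the data was encoded for transport.

In addition to these three main mechanisms, two optional header fields, Content-ID and Content-Description, may be used to add a unique ID and a description of the body of the message.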

In this chapter, we will look at the preceding header fields and their significance to programming for the Web. In the "How MIME Data Is Encoded" section, we will look at how the Perl 5 MIME:: modules can be used to encode data for use in MIME messages. We will then look at a multipart MIME message generated by Netscape Mail and see how these mechanisms are used in a real-world situation.

The MIME-Version Header Field

The MIME-Version header field uses a version number to declare that a message conforms to the MIME specification. It allows mail processing agents to distinguish MIME messages from those generated by older or non-conformant software, which are presumed to lack such a field. Messages that conform to RFC 1521 carry the field MIME-Version: 1.0, as in Listing 7.1.
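To make the MIME-Version check concrete, here is a minimal Perl sketch, assuming the raw header block of a message is already held in a string. The helper names parse_headers() and is_mime_message() are hypothetical, not part of any library, and the header parsing is deliberately simplified: it unfolds continuation lines and lowercases field names, but ignores comments and keeps only the last copy of a repeated field.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an RFC 822-style header block into a hash.
# Field names are lowercased (they are case-insensitive), continuation
# lines are unfolded, and a repeated field overwrites the earlier one.
sub parse_headers {
    my ($text) = @_;
    my (%headers, $last);
    for my $line (split /\r?\n/, $text) {
        last if $line eq '';                          # blank line ends the headers
        if ($line =~ /^[ \t]+(.*)$/ && defined $last) {
            $headers{$last} .= " $1";                 # unfold a continuation line
        } elsif ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $last = lc $1;
            $headers{$last} = $2;
        }
    }
    return \%headers;
}

# Hypothetical helper: true if the headers declare MIME-Version 1.0.
sub is_mime_message {
    my ($headers) = @_;
    my $version = $headers->{'mime-version'};
    return defined $version && $version =~ /^1\.0\b/;
}

my $raw = "From: Chris Kemp <ckemp\@ro.com>\n"
        . "MIME-Version: 1.0\n"
        . "Content-Type: multipart/mixed;\n"
        . "  boundary=\"------------6B797646304\"\n"
        . "\n"
        . "This is a multi-part message in MIME format.\n";

my $h = parse_headers($raw);
print is_mime_message($h) ? "MIME message\n" : "not a MIME message\n";
print "Content-Type: $h->{'content-type'}\n";
```

A message without the field, such as one produced by pre-MIME software, makes is_mime_message() return false, which is exactly the distinction the MIME-Version field exists to support.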

The Content-Type Header Field

The Content-Type header field is used to specify the type and subtype of the data in the body of a message and to fully specify the native representation (encoding) of that data. For example, the field Content-Type: text/plain; charset=us-ascii declares plain text in the US-ASCII character set. There are seven top-level Content-Types, each with a growing number of subtypes. MIME was designed as an extensible mechanism, and the set of type/subtype pairs and their associated parameters is expected to grow significantly over time. The seven top-level Content-Types are described below, along with commonly used subtypes. In each list, the type/subtype pair is followed by the file extensions commonly associated with it; subtypes beginning with x- are private, non-standard values.

Text

A text Content-Type value can be used to represent textual information in a number of character sets and formatted text description languages in a standardized manner:

text/html                        html htm

text/plain                       txt pl

text/richtext                    rtx

text/tab-separated-values        tsv

text/x-setext                    etx

Multipart

A multipart Content-Type value can be used to combine several body parts, possibly of differing types of data, into a single message. The parts are separated by a boundary string, which is declared in the boundary parameter of the Content-Type header field:

multipart/alternative    The parts are alternative versions of the same (or similar) information; the receiving agent displays the best version it supports.

multipart/digest         A series of included mail messages. Each part is of type message/rfc822 unless an explicit Content-Type is specified for that part.

multipart/mixed          Independent parts, possibly of different content types, bundled in a particular order.

multipart/parallel       Like multipart/mixed, except that the parts are intended to be presented in parallel (simultaneously).

Application

An application Content-Type value can be used to transmit application data or binary data and, among other uses, to implement an electronic mail file transfer service:

application/octet-stream          bin

application/oda                   oda

application/pdf                   pdf

application/postscript            ai eps ps

application/rtf                   rtf

application/x-mif                 mif

application/x-maker               fm

application/x-csh                 csh

application/x-dvi                 dvi

application/x-hdf                 hdf

application/x-latex               latex

application/x-netcdf              nc cdf

application/x-sh                  sh

application/x-tcl                 tcl

application/x-tex                 tex

application/x-texinfo             texinfo texi

application/x-troff               t tr roff

application/x-troff-man           man

application/x-troff-me            me

application/x-troff-ms            ms

application/x-wais-source         src

application/zip                   zip

application/x-bcpio               bcpio

application/x-cpio                cpio

application/x-gtar                gtar

application/x-shar                shar

application/x-sv4cpio             sv4cpio

application/x-sv4crc              sv4crc

application/x-tar                 tar

application/x-ustar               ustar

Message

A message Content-Type value can be used to encapsulate another message within the document:

message/rfc822           An included (MIME) mail message.

message/news             An included (MIME) USENET news message.

message/partial          A single part of a message that was split into multiple mail messages.

Image

An image Content-Type value is used to transmit still image (picture) data:

image/gif                          gif

image/ief                          ief

image/jpeg                         jpeg jpg jpe

image/tiff                         tiff tif

image/x-cmu-raster                 ras

image/x-portable-anymap            pnm

image/x-portable-bitmap            pbm

image/x-portable-graymap           pgm

image/x-portable-pixmap            ppm

image/x-rgb                        rgb

image/x-xbitmap                    xbm

image/x-xpixmap                    xpm

image/x-xwindowdump                xwd

Audio

An audio Content-Type value is used to transmit audio or voice data:

audio/basic                       au snd

audio/x-aiff                       aif aiff aifc

audio/x-wav                        wav

Video

A video Content-Type value is used to transmit video or moving image data, possibly with audio as part of the composite video data format:

video/mpeg                         mpeg mpg mpe

video/quicktime                    qt mov

video/x-msvideo                    avi

video/x-sgi-movie                  movie
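A CGI program that serves files usually needs the reverse of these lists, a mapping from file extension to Content-Type. The following Perl sketch uses a small, hand-copied subset of the lists above. The names %type_for_ext and content_type_for() are hypothetical, and falling back to application/octet-stream for unknown extensions is a choice made for this sketch.

```perl
use strict;
use warnings;

# A few type/subtype pairs and extensions copied from the lists above.
# A real server would typically load a complete table instead.
my %type_for_ext = (
    html => 'text/html',       htm  => 'text/html',
    txt  => 'text/plain',
    gif  => 'image/gif',
    jpeg => 'image/jpeg',      jpg  => 'image/jpeg',  jpe => 'image/jpeg',
    pdf  => 'application/pdf',
    zip  => 'application/zip',
    au   => 'audio/basic',     snd  => 'audio/basic',
    mpeg => 'video/mpeg',      mpg  => 'video/mpeg',
    mov  => 'video/quicktime', qt   => 'video/quicktime',
);

# Hypothetical helper: map a file name to a Content-Type, falling back to
# application/octet-stream (arbitrary binary data) for unknown extensions.
sub content_type_for {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.]+)$/;
    return 'application/octet-stream' unless defined $ext;
    return $type_for_ext{lc $ext} // 'application/octet-stream';
}

# A CGI script prints the header, then a blank line, then the data itself.
my $file = 'F22.jpg';
print "Content-Type: ", content_type_for($file), "\n\n";
```

Lowercasing the extension before the lookup lets F22.JPG and F22.jpg map to the same type.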

The Content-Transfer-Encoding Header Field

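The Content-Transfer-Encoding header field specifies an auxiliary encoding that was applied to the data so that it can pass through mail transport mechanisms that may have data or character set limitations. RFC 1521 defines the values 7bit, 8bit, binary, quoted-printable, and base64. In Listing 7.1, for example, the plain-text part is marked 7bit and the JPEG attachment is marked base64.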
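Choosing a Content-Transfer-Encoding can be sketched in Perl with the MIME::Base64 and MIME::QuotedPrint modules covered later in this chapter. The policy here is an assumption for illustration, not a rule from RFC 1521: Base64 for any non-text type, quoted-printable for text containing 8-bit bytes or lines longer than 998 characters, and 7bit otherwise. The helper encode_body() is hypothetical.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical helper: pick a Content-Transfer-Encoding for a body part and
# return the header line together with the encoded body.
sub encode_body {
    my ($content_type, $data) = @_;

    # Non-text data (images, audio, programs) is sent as Base64.
    return ('Content-Transfer-Encoding: base64', encode_base64($data))
        unless $content_type =~ m{^text/}i;

    # Text containing 8-bit bytes, or lines too long for SMTP (over 998
    # characters), is sent as quoted-printable.
    return ('Content-Transfer-Encoding: quoted-printable', encode_qp($data))
        if $data =~ /[^\x00-\x7F]/ || $data =~ /^[^\n]{999,}/m;

    # Short-line, 7-bit text passes through unchanged.
    return ('Content-Transfer-Encoding: 7bit', $data);
}

my ($header, $body) = encode_body('text/plain', "Caf\xe9 menu\n");
print "$header\n\n$body";
```

Quoted-printable keeps mostly-ASCII text readable, while Base64 is more compact for data that is mostly non-ASCII, which is why the sketch reserves it for non-text types.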

Additional Header Fields

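Two additional header fields can be used to further describe the data in a message body. The Content-ID header field assigns a unique identifier to a body part so that it can be referenced from elsewhere, and the Content-Description header field associates a short piece of descriptive text with it.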
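The following Perl sketch puts the header fields together. It assembles a two-part multipart/mixed message similar in shape to Listing 7.1. The helper build_message(), the Content-ID value, and the image bytes are hypothetical placeholders. Lines end in \n here for readability, whereas SMTP transports lines ending in CRLF.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: assemble a multipart/mixed message that uses each
# header field described above. The boundary must not appear in any part.
sub build_message {
    my ($boundary, $text, $image_bytes) = @_;
    my $encoded = encode_base64($image_bytes);
    chomp $encoded;                                   # drop the final newline
    return join "\n",
        'MIME-Version: 1.0',
        qq{Content-Type: multipart/mixed; boundary="$boundary"},
        '',
        'This is a multi-part message in MIME format.',
        "--$boundary",
        'Content-Type: text/plain; charset=us-ascii',
        'Content-Transfer-Encoding: 7bit',
        '',
        $text,
        "--$boundary",
        'Content-Type: image/gif',
        'Content-Transfer-Encoding: base64',
        'Content-ID: <part2.demo@example.com>',      # hypothetical identifier
        'Content-Description: A placeholder image',
        '',
        $encoded,
        "--$boundary--",                              # closing delimiter
        '';                                           # message ends with a newline
}

print build_message('------------6B797646304', 'Hello!', 'GIF89a');
```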

How MIME Data Is Encoded

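MIME data can be encoded in two different ways, Base64 and Quoted-Printable. Each has advantages and disadvantages, which are described later. In message bodies, the encoding is selected with the Content-Transfer-Encoding header field. In header fields such as Subject, non-ASCII text is carried in an encoded-word (defined in RFC 1522) of the form =?charset?encoding?encoded-text?=. Here the encoding is B (Base64) or Q (a variant of Quoted-Printable). For example, =?ISO-8859-1?Q?Caf=E9?= represents the word "Café". The Q encoding is recommended when most of the characters to be encoded are in the ASCII character set; otherwise, the B encoding should be used. The underlying transformations are available in Perl through the MIME::Base64 (B) and MIME::QuotedPrint (Q) modules.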

Only a subset of the printable ASCII characters may be used in encoded-text. Space and tab characters are not allowed, so that the beginning and end of an encoded-word are obvious. The ? character is used within an encoded-word to separate the various portions of the encoded-word from one another and thus cannot appear in the encoded-text portion. Other characters are also illegal in certain contexts. For example, an encoded-word in a "phrase" preceding an address in a From header field may not contain any of the "specials" defined in RFC 822. Finally, certain other characters are disallowed in some contexts to ensure reliability for messages that pass through internetwork mail gateways.

The B encoding automatically meets these requirements. The Q encoding allows a wide range of printable characters to be used in non-critical locations in the message header (such as Subject), with fewer characters available for use in other locations.

B Base64 Encoding

The B encoding is identical to the Base64 encoding defined by RFC 1521.

Q Quoted-Printable Encoding

The Q encoding is similar to the Quoted-Printable content-transfer-encoding defined in RFC 1521. It is designed to allow text containing mostly ASCII characters to be decipherable on an ASCII terminal without decoding.

For more information on RFC 1521 and specific information about the rules used to encode data, go to:

http://www.freesoft.org/Connected/RFC/1521/index.html

Data encoded with the Q method, a variant of Quoted-Printable, follows these basic rules:

  1. Any 8-bit value may be represented by a = followed by two hexadecimal digits. For example, if the character set in use were ISO-8859-1, the = character would be encoded as =3D and a SPACE as =20. (Uppercase should be used for hexadecimal digits A through F.)

  2. The 8-bit hexadecimal value 20 (for example, ISO-8859-1 SPACE) may be represented as _ (underscore, ASCII 95). (This character may not pass through some internetwork mail gateways, but its use will greatly enhance readability of Q encoded data with mail readers that do not support this encoding.) Note that the _ always represents hexadecimal 20, even if the SPACE character occupies a different code position in the character set in use.

  3. 8-bit values that correspond to printable ASCII characters other than =, ?, _ (underscore), and SPACE may be represented as those characters.
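The three Q rules, along with the simpler B form, can be sketched in Perl as follows. The helpers q_encoded_word() and b_encoded_word() are hypothetical. The sketch assumes the text is already a byte string in the named character set. It applies rule 3 as written, so it does not enforce the stricter character limits that apply in contexts such as an address phrase.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build a Q encoded-word using rules 1-3 above.
# $text must already be a byte string in $charset.
sub q_encoded_word {
    my ($charset, $text) = @_;
    my $encoded = '';
    for my $c (split //, $text) {
        if ($c eq ' ') {
            $encoded .= '_';                          # rule 2: SPACE becomes _
        } elsif ($c =~ /[!-~]/ && $c !~ /[=?_]/) {
            $encoded .= $c;                           # rule 3: safe printable ASCII
        } else {
            $encoded .= sprintf '=%02X', ord $c;      # rule 1: =XX, uppercase hex
        }
    }
    return "=?$charset?Q?$encoded?=";
}

# Hypothetical helper: build a B encoded-word (Base64, no line breaks).
sub b_encoded_word {
    my ($charset, $text) = @_;
    return "=?$charset?B?" . encode_base64($text, '') . '?=';
}

my $word = "Caf\xe9 menu";                            # ISO-8859-1 bytes
print q_encoded_word('ISO-8859-1', $word), "\n";       # =?ISO-8859-1?Q?Caf=E9_menu?=
print b_encoded_word('ISO-8859-1', $word), "\n";       # =?ISO-8859-1?B?Q2Fm6SBtZW51?=
```

The Q form stays mostly readable on an ASCII terminal, while the B form is opaque. This illustrates why Q is recommended for mostly-ASCII text.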

Encoding and Decoding MIME with libwww

Functions for encoding and decoding MIME data are provided by the MIME::Base64 and MIME::QuotedPrint modules.

Using MIME::Base64

MIME::Base64 includes two functions: encode_base64() and decode_base64(). To use MIME::Base64 in your script, include the following line near the beginning of your script:

use MIME::Base64;

Once the module is loaded, encoding and decoding strings is simple. To encode, pass a string of unencoded text (which can be stored in a variable) to the encode_base64() function:

$MyEncodedMime = encode_base64($MyPlainText);

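Here, $MyEncodedMime will contain the Base64-encoded version of $MyPlainText. By default, encode_base64() breaks its output into lines of no more than 76 characters, each ending in a newline.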

Decoding is handled in the same way, using decode_base64.

Here's an example:

$MyDecodedText = decode_base64($MyEncodedMime);
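The following complete script round-trips a string through MIME::Base64, using the variable names from the snippets above. The sample text is arbitrary. The optional second argument to encode_base64() sets the line ending. Passing an empty string produces a single unbroken line, which is the form a B encoded-word needs.

```perl
use strict;
use warnings;
use MIME::Base64;                     # exports encode_base64() and decode_base64()

my $MyPlainText   = 'Hello, world!';
my $MyEncodedMime = encode_base64($MyPlainText);   # "SGVsbG8sIHdvcmxkIQ==\n"
my $MyDecodedText = decode_base64($MyEncodedMime);

print $MyEncodedMime;
print "Round trip OK\n" if $MyDecodedText eq $MyPlainText;

# With the default line ending, long output is split into 76-character lines;
# an empty second argument returns a single unbroken line instead.
my $Wrapped = encode_base64('x' x 100);
my $OneLine = encode_base64('x' x 100, '');
print "Wrapped:\n$Wrapped", "Unwrapped:\n$OneLine\n";
```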

Many popular e-mail clients, such as Netscape Mail, have standardized on Base64 encoding to attach binary files to ASCII text e-mail messages. Let's take a look at an e-mail message generated by Netscape Mail that has Base64-encoded data as an attachment (see Listing 7.1).

Listing 7.1. E-mail with a Base64-encoded MIME attachment

From - Mon Jan 13 01:50:26 1997
Return-Path: <ckemp@ro.com>
Received: from chris (ts8p1.ro.com [205.216.92.168])
          by sh1.ro.com (8.8.4/8.8.4) with SMTP
  id CAA12517 for <ckemp@ro.com>; Mon, 13 Jan 1997 02:50:08 -0600
Message-ID: <32DA04C3.75E@ro.com>
Date: Mon, 13 Jan 1997 01:47:47 -0800
From: Chris Kemp <ckemp@ro.com>
Reply-To: ckemp@ro.com
Organization: Silicon Graphics
X-Mailer: Mozilla 3.01 (Win95; I)
MIME-Version: 1.0
To: ckemp@ro.com
Subject: MIME Test
Content-Type: multipart/mixed; boundary="------------6B797646304"
X-UIDL: 48f2146a82b1f706edbd5e7c350b352a
X-Mozilla-Status: 0001

This is a multi-part message in MIME format.

--------------6B797646304
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Chris,

I have attached a picture of the Boeing F-22 to this message.

--------------6B797646304
Content-Type: image/jpeg; name="F22.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="F22.jpg"

/9j/4AAQSkZJRgABAgEASABIAAD/7QG4UGhvdG9zaG9wIDMuMAA4QklNA+kAAAAAAHgAAwAA
AEgASAAAAAAC2gIo/+H/4gL5AkYDRwUoA/wAAgAAAEgASAAAAAAC2AIoAAEAAABkAAAAAQAD
AwMAAAABJw8AAQABAAAAAAAAAAAAAAAAYAgAGQGQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAA4QklNA+0AAAAAABAASAAAAAEAAQBIAAAAAQABOEJJTQPzAAAAAAAIAAAAAAAA
AAA4QklNJxAAAAAAAAoAAQAAAAAAAAACOEJJTQP1AAAAAABIAC9mZgABAGxmZgAGAAAAAAAB
AC9mZgABAKGZmgAGAAAAAAABADIAAAABAFoAAAAGAAAAAAABADUAAAABAC0AAAAGAAAAAAAB
OEJJTQP4AAAAAABwAAD/////////////////////////////A+gAAAAA////////////////
/////////////wPoAAAAAP////////////////////////////8D6AAAAAD/////////////
////////////////A+gAADhCSU0EBgAAAAAAAgAG/+4ADkFkb2JlAGQAAAAAAf/bAIQAAgIC
AgICAgICAgMCAgIDBAMCAgMEBQQEBAQEBQYFBQUFBQUGBgcHCAcHBgkJCgoJCQwMDAwMDAwM
DAwMDAwMDAEDAwMFBAUJBgYJDQoJCg0PDg4ODg8PDAwMDAwPDwwMDAwMDA8MDAwMDAwMDAwM
DAwMDAwMDAwMDAwMDAwMDAwM/8AAEQgCWAMYAwERAAIRAQMRAf/EAaIAAAAHAQEBAQEAAAAA
AAAAAAQFAwIGAQAHCAkKCwEAAgIDAQEBAQEAAAAAAAAAAQACAwQFBgcICQoLEAACAQMDAgQC
BgcDBAIGAnMBAgMRBAAFIRIxQVEGE2EicYEUMpGhBxWxQiPBUtHhMxZi8CRygvElQzRTkqKy
Y3PCNUQnk6OzNhdUZHTD0uIIJoMJChgZhJRFRqS0VtNVKBry4/PE1OT0ZXWFlaW1xdXl9WZ2
hpamtsbW5vY3R1dnd4eXp7fH1+f3OEhYaHiImKi4yNjo+Ck5SVlpeYmZqbnJ2en5KjpKWmp6
ipqqusra6voRAAICAQIDBQUEBQYECAMDbQEAAhEDBCESMUEFURNhIgZxgZEyobHwFMHR4SNC
FVJicvEzJDRDghaSUyWiY7LCB3PSNeJEgxdUkwgJChgZJjZFGidkdFU38qOzwygp0+PzhJSk
tMTU5PRldYWVpbXF1eX1RlZmdoaWprbG1ub2R1dnd4eXp7fH1+f3OEhYaHiImKi4yNjo+DlJ
WWl5iZmpucnZ6fkqOkpaanqKmqq6ytrq+v/dAAQAY//aAAwDAQACEQMRAD8A/P8A4q7FXYq7
<a few hundred K of encoded data removed>
sS+TZPzi8xpdeS/Jmoana6tf6VYteahqd8v1ixlmto4bi91KSRI45P8AX4tir9CMVdirsVdi
rsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirs
VdirsVdirsVdirsVeRfn5y/5UX+dHH7X+BPMVP8AuGXGLCX0v//Z
--------------6B797646304--

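Notice the MIME header fields throughout the message. The top-level Content-Type field declares the message to be multipart/mixed and names the boundary string that separates the parts. Each part then carries its own Content-Type and Content-Transfer-Encoding fields. A practical application of the MIME::Base64 decode_base64() function would be extracting the encoded data to a file. Using this method, you could let users submit pictures by e-mail and then integrate them into a Web page.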
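As a rough sketch of that idea, the Perl code below splits a multipart message on its boundary and decodes every part marked Content-Transfer-Encoding: base64. The helper extract_base64_parts() is hypothetical and deliberately simplified. It assumes a single level of parts (no nested multiparts) and no folded headers inside the parts. The demo message is a trimmed-down, made-up stand-in for Listing 7.1.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use File::Temp qw(tempfile);

# Hypothetical helper: return [filename, bytes] pairs for each base64 part of a
# single-level multipart message. Simplified: no nested multiparts and no
# folded headers inside the parts.
sub extract_base64_parts {
    my ($message) = @_;
    my ($boundary) = $message =~ /boundary="?([^";\r\n]+)"?/i
        or die "no multipart boundary found\n";
    my @parts;
    # Each part is introduced by a line of "--" plus the boundary string;
    # the final delimiter has an extra "--" after the boundary.
    for my $chunk (split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $message) {
        my ($head, $body) = split /\r?\n\r?\n/, $chunk, 2;
        next unless defined $body
            && $head =~ /^Content-Transfer-Encoding:\s*base64/mi;
        my ($name) = $head =~ /filename="?([^";\r\n]+)"?/i;
        push @parts, [ $name // 'attachment.bin', decode_base64($body) ];
    }
    return @parts;
}

my $demo_boundary = '------------6B797646304';
my $message = join '',
    "MIME-Version: 1.0\n",
    qq{Content-Type: multipart/mixed; boundary="$demo_boundary"\n},
    "\n",
    "This is a multi-part message in MIME format.\n",
    "--$demo_boundary\n",
    "Content-Type: text/plain; charset=us-ascii\n",
    "Content-Transfer-Encoding: 7bit\n",
    "\n",
    "Chris,\n",
    "--$demo_boundary\n",
    qq{Content-Type: application/octet-stream; name="demo.bin"\n},
    "Content-Transfer-Encoding: base64\n",
    qq{Content-Disposition: inline; filename="demo.bin"\n},
    "\n",
    encode_base64("\x00\x01binary\xff"),
    "--$demo_boundary--\n";

for my $part (extract_base64_parts($message)) {
    my ($name, $bytes) = @$part;
    # Never use the sender's file name as a path; write to a temporary file.
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;
    close $fh or die "close failed: $!";
    printf "%s: %d bytes saved to %s\n", $name, length $bytes, $path;
}
```

Each decoded part is written to a temporary file rather than to the file name the sender supplied, because that name comes from untrusted input. A production script would also need to handle nested and folded headers, which this sketch skips.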

Using MIME::QuotedPrint

MIME::QuotedPrint includes the functions encode_qp() and decode_qp(), which work in the same way as their Base64 counterparts. The syntax is as follows:

use MIME::QuotedPrint;

$encoded = encode_qp($decoded);

$decoded = decode_qp($encoded);
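Here is a complete round trip through MIME::QuotedPrint. The sample string is made up and written as ISO-8859-1 bytes, so the e-acute (byte E9) and the = sign must be escaped while the rest of the line stays readable.

```perl
use strict;
use warnings;
use MIME::QuotedPrint;                 # exports encode_qp() and decode_qp()

my $decoded = "Caf\xe9 prices: 2 = 2\n";   # ISO-8859-1 bytes; \xe9 is e-acute
my $encoded = encode_qp($decoded);          # "Caf=E9 prices: 2 =3D 2\n"
my $back    = decode_qp($encoded);

print $encoded;
print "Round trip OK\n" if $back eq $decoded;
```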

MIME is significant to the Web programmer in two main ways. First, MIME content types define standard types of data that can be transmitted as part of a document. MIME-style headers are also used in every HTTP transaction, so familiarity with them is essential in advanced applications where your program must generate its own headers. Second, MIME encoding schemes allow binary data to be included in text messages that can be transported over older e-mail (not HTTP) gateways.