Chapter 6

Creating an HTML Document

by Robert Meegan


When you first venture out onto the Web, it all seems very much like a big cloud of gas with no form or structure. Later, as you develop some experience, you can begin to see the structure that constitutes the Web. At the top are the links that connect pages together, while at the bottom are the HTML documents that form the foundation.

Documents provide most of the content and a great deal of the form for the Web. It's your job as an author to create informative and interesting HTML documents about the subjects you know well. The largest part of each document will be the body element, where you put the text and images that make up the content.

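In this chapter, you will learn how to create the body element for your own HTML documents: how to lay out the required elements, outline and format your text, use the more specialized text elements and preformatted text, add hidden comments, and take advantage of the HTML 3.0 additions.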

The Basics of the Body Element

Despite the graphical nature of the Web, the vast majority of its information is in the form of text documents. Most people who view your documents will be interested in what you have to say. Because of this, whether you are converting existing documents or creating new ones, you will spend much of your time working in the body.

Starting with the Required Elements

Before you can fill in your document, you need to lay out a basic working framework. As you saw in chapter 4, "Building Blocks of HTML," HTML documents must follow a defined pattern of elements if they are to be interpreted correctly. It is a good idea for you to create a template to use for each of your pages so that you are less likely to leave out an important detail. Listing 6.1 is an example of a basic template.


Listing 6.1  A Basic Document Template
<HTML>
<HEAD>
<TITLE> A Basic Document Template </TITLE>
</HEAD>
<BODY>
Put the body text in here.
</BODY>
</HTML>

This template begins with the <HTML> tag (see fig. 6.1), which is necessary for every HTML document. Next is the <HEAD> tag, which opens the head of the document. The head contains the <TITLE> element, which gives your document a title. Most browsers will display a page without one, but the HTML 2.0 specification calls for a title in every document, and a good one helps readers know what they are reading. The head is closed with the </HEAD> tag. Finally, the <BODY> element follows. This is where you place the bulk of the material in your document. Remember to close the body element with the </BODY> tag and to finish the page with the </HTML> tag.

Figure 6.1 : The basic framework creates a document with a title and a single line of text.

Because HTML is a markup language, the body of your document is turned on with the start tag, <BODY>. Everything that follows this tag is interpreted according to a strict set of rules that tell the browser about the contents. The body element is closed with the end tag, </BODY>.

Note
Strictly speaking, it isn't absolutely necessary to use the <BODY> start and end tags, as HTML allows you to skip a tag if it is obvious from the context. It's still a good idea to use them. Some older browsers and other HTML programs may become confused without them.

In the basic template shown above, the body text is a single line. In your document, you will replace this line with the main text of your document. Unless you are using a special HTML editor, you must enter your text using a strict ASCII format. This limits you to a common set of characters that can be interpreted by computers throughout the world. The text that you enter here, whether typed for the first time or taken from an existing document, must be completely free of any special formatting. Note that some characters can only be added to the document by using a special coding scheme. This will be discussed later in this chapter.

Note
Most browsers treat non-blank white space (tabs, end-of-line characters, etc.) as a blank, and they normally condense any run of white space to a single blank.
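
As a small illustration of this rule, consider the fragment below. The wording is my own and is only a sketch; the extra blanks and the line feed in the source should not show up in the browser.

```html
<P>These     words   are
spread     out in the source.
```

A browser that follows the rule displays this as one evenly spaced sentence, wrapped wherever its own window requires.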

Listing 6.1 leaves out one element that belongs in the body of nearly every document: the address element. I'll tell you more about this later in the chapter.

Breaking Text into Paragraphs

Your old English teacher taught you to break your writing up into paragraphs that expressed complete thoughts, and an HTML document shouldn't be an exception. Unfortunately, line and paragraph breaks are a little more complicated in HTML than you might expect.

As a markup language, HTML requires that you make no assumptions about your reader's machine. The readers of your document can set whatever margins and fonts they want to use. This means that text wrapping must be determined by the browser software, as it is the only part of the system that knows about the reader's setup. Line feeds in the original document are ignored by the browser, which then reformats the text to fit the context. This means that a document that may be perfectly legible in your editor (see fig. 6.2) is badly mashed together in the browser, as shown in figure 6.3.

Figure 6.2 : Line feeds separate the paragraphs in the editor.

Figure 6.3 : The browser ignores the line feeds and runs the text together.

The proper way to break text into paragraphs is by using paragraph elements. Place a paragraph start tag, <P>, at the beginning of each new paragraph, and the browser will know to separate the paragraphs. Adding a paragraph end tag, </P>, is optional, as it is normally implied by the next start tag that comes along. Still, adding the </P> tag at the end of your text can help to protect your documents against browsers that don't precisely follow the HTML 2.0 standard. HTML 1.0 did use the paragraph tag as a container and documents created to that standard have all their text between paragraph start and end tags.

Figure 6.4 shows what the document looks like in the editor after the paragraph tags have been added. You can see that the tags were added to the start of each paragraph and that the line feeds are still in the document. Because the browser ignores the line feeds anyway, it is best to keep them in the source document to make it easier to edit later.

Figure 6.4 : You must begin each paragraph with the <P> tag.

When you look at the document in figure 6.5, you can see that the browser separated the paragraphs correctly by adding a double-spaced line between them.

Figure 6.5 : With paragraph elements, the text becomes much easier to read in the browser.
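
As a rough sketch of the kind of source that figure 6.4 illustrates, the fragment below marks two paragraphs with <P> tags. The sentences are placeholders of my own, not the text from the figure. The line feeds stay in the source for readability, and the optional </P> end tags are included for the sake of less forgiving browsers.

```html
<BODY>
<P>This is the first paragraph. The line feed after this sentence
is ignored, so the browser wraps the text to fit its own window.</P>

<P>This is the second paragraph. The start tag above tells the
browser to separate it from the first one.</P>
</BODY>
```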

Note
In some HTML documents, you will see a paragraph start tag, <P>, used repeatedly in order to create additional white space. This is not supported in HTML, and most current browsers will ignore all of the <P> tags after the first one.

Adding Line Breaks

As you have seen, HTML does all of the formatting at the browser rather than at the source. This has the advantage of device independence. But what do you do if you have a reason to break up a line of text at a certain point?

The way to end a line where you want is to use the line break tag, <BR>. This forces the browser to start a new line, regardless of the position in the current line. Unlike the paragraph element, the line break does not double-space the text. Because the line break element is not a container, it does not have an end tag.

One reason you might want to force line breaks is to show off your poetic muse, as shown in listing 6.2.


Listing 6.2  A Limerick Showing the Use of the <BR> Tag
<HTML>
<HEAD>
<TITLE>Creating an HTML Document</TITLE>
</HEAD>
<BODY>
<P>A very intelligent turtle<BR>
Found programming UNIX a hurdle<BR>
The system, you see,<BR>
Ran as slow as did he,<BR>
And that's not saying much for the turtle.<BR>
<CITE>Mercifully anonymous</CITE>
</BODY>
</HTML>

When this source is viewed in figure 6.6, you can see how the line break element works.

Figure 6.6 : Use line breaks to force a new line in the browser.

Tip
Multiple line breaks can be used to provide extra white space in your document. The problem is that some browsers will condense multiple line breaks (multiple <BR> or <P> tags) to a single line break.

You need to be careful when using line breaks; if the line has already wrapped in the browser, your break may appear after only a couple of words in the next line. This is particularly the case if the browser that you test your documents on has wider margins than your reader's browser. Figure 6.7 shows an example where the author saw that the break was occurring in the middle of the quotation, so she added a <BR>. Unfortunately, when displayed on a screen with different margins, the word "actually" ends up on a line by itself.

Figure 6.7 : Careless use of line breaks can produce an unexpected result.

Creating a Text Outline

So far, your HTML document probably looks a little dull. To make it more interesting, the first thing that you need to do is add a little more structure to it. Users of the Web want to be able to quickly scan a document to determine whether or not it has the information for which they are looking. The way to make this scanning easier is to break the document up into logical sections, each covering a single topic.

After you have broken up the document, the next step is to add a meaningful heading to each section, which enables your reader to jump quickly to the material of interest.

Adding Headings

Headings in HTML provide an outline of the text that forms the body of the document. As such, they direct the reader through the document and make your information more interesting and usable. They are probably the most commonly used formatting tags that you will find in HTML documents.

The heading element is a container and must have a start tag (<H1>) and an end tag (</H1>). HTML has six levels of headings: H1 (the most important), H2, H3, H4, H5, and H6 (the least important). Each of these levels will have its own appearance in the viewer's browser, but you have no direct control over what that appearance will be. This is part of the HTML philosophy: you, as the document writer, have the responsibility for the content, while the browser has the responsibility for the appearance. See the example in listing 6.3.


Listing 6.3  An HTML Document Showing the Use of Headings
<HTML>
<HEAD>
<TITLE>Creating an HTML Document</TITLE>
</HEAD>
<BODY>
<H1>Level 1 Heading</H1>
<H2>Level 2 Heading</H2>
<H3>Level 3 Heading</H3>
<H4>Level 4 Heading</H4>
<H5>Level 5 Heading</H5>
<H6>Level 6 Heading</H6>
</BODY>
</HTML>

Note
Although it is not absolutely necessary to use each of the heading levels, as a matter of good practice you should not skip levels because it may cause problems with automatic document converters. In particular, as new Web indexes come online, they will be able to search Web documents and create retrievable outlines. These may become confused if heading levels are missing.

Figures 6.8 and 6.9 show how these headings look when they are displayed in Netscape Navigator and Microsoft Internet Explorer. You can see that not only do they use different fonts, but the sizes of the headings are different.

Figure 6.8 : Here are the six heading levels as they appear in Netscape.

Figure 6.9 : Here are the six heading levels as they appear in Internet Explorer.

Note
Remember that forgetting to add an end tag will definitely mess up the appearance of your document. Headings are containers and require both start and end tags. Another thing to remember is that headings also have an implied paragraph break before and after each one. You can't apply a heading to text in the middle of a paragraph to change the size or font. The result will be a paragraph broken into three separate pieces, and the middle one will have a heading format.

The best way to use headings is to consider them the outline for your document. Figure 6.10 shows a document in which each level of heading represents a new level of detail. Generally, it is good practice to use a new level whenever you have two to four items of equal importance. If more than four items are of the same importance under a parent heading, however, try breaking them into two different parent headings.

Figure 6.10: Headings provide an outline of the document.
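
To make the outline idea concrete, here is a minimal sketch of a document body organized by heading level. The subject and the section names are invented for illustration and are not taken from figure 6.10. Note that no heading level is skipped on the way down.

```html
<H1>Planning a Garden</H1>
<P>Introductory text for the whole document.
<H2>Choosing a Site</H2>
<P>Text about choosing a site.
<H3>Sunlight</H3>
<P>Text about sunlight.
<H3>Drainage</H3>
<P>Text about drainage.
<H2>Preparing the Soil</H2>
<P>Text about preparing the soil.
```

If you read only the headings, you should be left with a sensible outline of the document.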

Adding Horizontal Lines

Another method for adding divisions to your documents is the use of horizontal lines. These provide a strong visual break between sections and are especially useful for separating the various parts of your document. Many browsers use an "etched" line that presents a crisp look and adds visual depth to the document.

You can create a horizontal line using the horizontal rule element, <HR>. This tag draws a shaded horizontal line across the browser's display. The <HR> tag is not a container and does not require an end tag. There is an implied paragraph break before and after a horizontal rule.

Figure 6.11 shows how horizontal rule tags are used, and figure 6.12 demonstrates their appearance in the Internet Explorer browser.

Figure 6.11: Horizontal rules divide major sections.

Figure 6.12: Most browsers interpret the <HR> tag as an etched line.

Horizontal rules should be reserved for instances when you want to represent a strong break in the flow of the text. Some basic guidelines for adding rules are that they should never come between a heading and the text that follows the heading and that they should not be used to create "white-space" in your document.
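
A short sketch of these guidelines follows. The section names are placeholders. The rule sits between the end of one section and the heading of the next, never between a heading and its own text.

```html
<H2>Section One</H2>
<P>Text of the first section.
<HR>
<H2>Section Two</H2>
<P>Text of the second section.
```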

Formatting Your Text

Your readers are used to seeing sophisticated media presentations. The books, magazines, and even newspapers that they read are created with a variety of text styles designed to catch the eye and enable the reader to identify the significant elements quickly. This formatting makes up for the lack of voice inflection that would normally exist if the author were actually speaking.

Even with the addition of headings, the documents that you have created so far still lack interest. You are speaking to your readers in a monotone voice that displays none of the enthusiasm you have for your topic. This section covers methods that you can use to bring life to your documents.

Caution
Just as in any other form of computer publishing, it is possible to overuse any of these elements. Remember that attractive and informative documents will use these techniques sparingly.

Logical Format Elements

One of the ideas behind HTML is that documents should be laid out in a logical and structured manner. This gives the users of the documents as much flexibility as possible. With this in mind, the designers of HTML created a number of formatting elements that are labeled according to the purpose they serve rather than by their appearance. The advantage of this approach is that documents are not limited to a certain platform. Although they may look different on various platforms, the content and context will remain the same.

These logical format elements are as follows:

<CITE>Tom Sawyer</CITE> remains one of the classics of American literature.
One of the first lines that every C programmer learns is:
<CODE>puts("Hello World!");</CODE>
The actual line reads, "Alas, poor Yorick. I knew him, <EM>Horatio</EM>."
To run the decoder, type <KBD>Restore</KBD> followed by your password.
The letters <SAMP>AEIOU</SAMP> are the vowels of the English language.
The most important rule to remember is <STRONG>Don't panic</STRONG>!
The sort routine rotates on the <VAR>I</VAR>th element.

Note that all of these elements are containers, and as such, they require an end tag. Figure 6.13 shows how these logical elements look when seen in the Netscape browser.

Figure 6.13: Samples of the logical format elements are displayed in Netscape.

You have probably noticed that a lot of these format styles use the same rendering. The most obvious question to ask is, why use them if they all look alike?

The answer is that these elements are logical styles. They indicate what the intention of the author was, not how the material should look. This is important because future uses of HTML may include programs that search the Web to find citations, for example, or the next generation of Web browsers may be able to read a document aloud. A program that can identify emphasis would be able to avoid the deadly monotone of current text-to-speech processors.

Physical Format Elements

Having said that HTML is intended to leave the appearance of the document up to the browser, I will now show you how you can have limited control over what the reader sees. In addition to the logical formatting elements, it is possible to use physical formatting elements that will change the appearance of the text in the browser. These physical elements are as follows:

This is in <B>bold</B> text.
This is in <I>italic</I> text.
This is in <TT>teletype</TT> text.

If the proper font isn't available, the viewer's browser must render the text in the closest possible manner. Once again, each of these is a container element and requires the use of an end tag. Figure 6.14 shows how these elements look in Internet Explorer.

Figure 6.14: Samples of the physical format elements are shown in Internet Explorer.

These elements can be nested, with one element contained entirely within another. Overlapping elements are not permitted and can produce unpredictable results. Figure 6.15 gives some examples of nested elements and how they can be used to create special effects.

Figure 6.15: Logical and physical format elements can be nested to create additional format styles.
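
The difference between nesting and overlapping is easiest to see side by side. The two lines below are a sketch of my own rather than the contents of figure 6.15. Only the first one is legal.

```html
<!-- Nested correctly: the I element sits entirely inside the B element -->
This is <B>bold and <I>bold italic</I></B> text.

<!-- Overlapping, which is not permitted: B is closed while I is still open -->
This is <B>bold and <I>bold italic</B> text.</I>
```

A simple check is to close your elements in the reverse of the order in which you opened them.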

Additional Text Elements

Not everything that is in the body of your document is strictly paragraph text. There are other text elements that you might want to use in your documents. These are more specialized and should be reserved for cases that can't be handled any other way.

Special Characters

There are a number of special characters that are not found in the basic ASCII set or that HTML reserves for its own markup. These include letters and characters used by other European languages, some mathematical symbols, and an assortment of other characters. These can be added to your document using the special character entity. The format of this entity is an ampersand (&), followed by the name of the character, followed by a semicolon (;). The example in listing 6.4 shows how you can use the special characters.


Listing 6.4  Using Special Characters
<H3>The Use of Character Format Elements</H3>
This is how to add &lt;EM&gt;emphasis&lt;/EM&gt; to a word.<BR>
Which gives the result:<BR>
This is how to add <EM>emphasis</EM> to a word.<BR>

Figure 6.16 shows what this example looks like in Netscape.

Figure 6.16: Special characters can be added to HTML documents using the special character entities.
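
The same technique works for accented letters and for the ampersand itself. The fragment below is a small sketch with a made-up business name. It uses the entity names eacute, uuml, amp, lt, and gt, each written with a leading ampersand and a trailing semicolon.

```html
<P>Caf&eacute; M&uuml;ller &amp; Sons<BR>
The expression 3 &lt; 5 is true, and 5 &gt; 3 is too.
```

In the browser, the first line should read as the name with an accented e, an umlauted u, and a plain ampersand, and the second line should show the less than and greater than signs as ordinary characters.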

The Address Element

One of the most important elements for your documents is the address element. This is where you identify yourself as the author of the document and (optionally) let people know how they can get in touch with you. Any copyright information for the material in the page can be placed here as well. The address element is normally placed at either the top or bottom of a document. Figure 6.17 is an example of one such address element.

Figure 6.17: The address element is used to identify the author or maintainer of the document.

Note
A very important addition to the address is to indicate the date that you created the document and the last revision date. This will enable people to determine if they have already seen the most up-to-date version of the document.
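
Here is one way such an address element might look. The author name, the e-mail address, and the dates are all hypothetical, and the layout is only a suggestion, not a copy of figure 6.17.

```html
<HR>
<ADDRESS>
Jane Q. Author (jqauthor@example.com)<BR>
Created: January 5, 1996<BR>
Last revised: March 12, 1996
</ADDRESS>
```

Setting the address off with a horizontal rule at the bottom of the page is a common convention, but it is a matter of taste.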

The <Blockquote> Element

You may have the opportunity to quote a long piece of work from another source in your document. To indicate that this quotation is different from the rest of your text, HTML provides the <Blockquote> element. This container functions as a body element within the body element and can contain any of the formatting or break tags. As a container, the <Blockquote> element is turned off by using the end tag.

The normal method used by most browsers to indicate a <Blockquote> element is to indent the text away from the left margin. Some text-only browsers may indicate a <Blockquote> using a character, such as the greater than sign, in the leftmost column on the screen. Because most browsers are now graphical in nature, the <Blockquote> element provides an additional service by enabling you to indent normal text from the left margin. This can add some visual interest to the document.

Figure 6.18 shows how a <Blockquote> is constructed, including some of the formatting available in the container. The results of this document when read into Netscape can be seen in figure 6.19.

Figure 6.18: The <Blockquote> element serves as a text container within the body.

Figure 6.19: This is the appearance of the document in Netscape.
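
If you don't have the figures in front of you, the following sketch shows the general shape of a block quotation. The quoted text is a placeholder of my own. It uses an emphasis element and a line break to show that formatting and break tags are allowed inside the container.

```html
<P>The author introduces the idea this way:
<BLOCKQUOTE>
<P>This is the quoted passage. It can contain <EM>formatting</EM>
elements,<BR>
and it can contain line breaks.
</BLOCKQUOTE>
<P>The normal text of the document resumes here.
```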

Using Preformatted Text

Is it absolutely necessary to use paragraph and line break elements for formatting text? Well, not really; HTML provides containers that can hold preformatted text. This is text that gives you, the author, much more control over how the browser displays your document. The trade-off for this control is a loss of flexibility.

The <PRE> Container

The most useful and most common of the preformatting tags is the <PRE> container. Text in a <PRE> container is basically free-form with linefeeds causing the line to break at the beginning of the next clear line. Line break tags and paragraph tags are also supported. This versatility enables you to create such items as tables and precise columns of text. Another common use of the <PRE> element is to hold large blocks of computer code that would otherwise be difficult to read.

Text in a <PRE> container can use any of the physical or logical text formatting elements. You can use this feature to create tables that have bold headers or italicized values. The use of paragraph formatting elements, such as <Address> or any of the heading elements, is not permitted, however. Anchor elements, which are described in chapter 7, "Linking HTML Documents," can be included within a <PRE> container.

The biggest drawback to the <PRE> container is that any text within it is displayed in a monospaced font in the reader's browser. This tends to make long stretches of preformatted text look clunky and out of place.

Figure 6.20 shows an example of some preformatted text in an editor. You can use the editor to line up the columns neatly before adding the character formatting tags. The result of this document is shown in figure 6.21.

Figure 6.20: Preformatted text can be used to line up columns of numbers.

Figure 6.21: A preformatted table can look professional in a document.
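
A minimal sketch of a preformatted table is shown below. The items and prices are invented for illustration. The columns are lined up with spaces only, and the bold tags around the header row take up no room on the screen, so the header still lines up with the rows beneath it.

```html
<PRE>
<B>Item        Quantity     Price</B>
Widgets           12    $ 4.50
Gadgets            3    $12.00
Sprockets        144    $ 0.25
</PRE>
```

It is easiest to line everything up in your editor first and to add the character formatting tags last.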

Tip
HTML 3.0 introduces table elements that automatically line up text and graphic elements. If you are sure that your readers will have a proper browser, use these instead.

Caution
By definition, a tab character moves the cursor to the next position that is an integer multiple of eight. The official HTML specification recommends that tab characters not be used in preformatted text because they are not supported in the same way by all browsers. Spaces should be used for aligning columns.

The <XMP> Container

There are other preformatted container classes. The <XMP> container gives you the capability to create text that is already laid out. However, there are some disadvantages to the <XMP> container. HTML elements are not permitted inside of an <XMP> container. Browsers are not allowed to recognize any markup tags except the end tag. Unfortunately, many browsers don't comply with this standard properly, and the official specification for HTML lists <XMP> as obsolete.

The <XMP> container must be rendered in a font size that permits at least eighty characters on a line. Figure 6.22 is an example of the <XMP> container in use.

Figure 6.22: The <XMP> container allows preformatted text in a proportional font.

The <LISTING> Container

Another preformatted text container is the <LISTING> element. This container must display at least 132 characters on a line, but is in all other ways identical to the <XMP> container. The <LISTING> element is also obsolete as of HTML 2.0.

Caution
You should avoid using the <XMP> and <LISTING> elements unless it is absolutely necessary. Because they have been declared obsolete, browsers are not required to support them any longer. You will be more certain of what your readers are seeing if you use the <PRE> element instead.

Adding Hidden Comments

It is possible to add comments to your HTML document that won't be seen by a reader. The syntax for this is to begin your comment with <!-- and to end it with -->. Anything located between the two markers will not be displayed in the browser. This is a convenient way to leave notes for yourself or others. An example might be a comment that records the date when new material was added to a document.

Caution
Don't assume that your comments can't be seen by your readers. Most browsers allow the source of your document to be viewed directly, including any comments that you have added.

On the other hand, don't try to use comments to "comment out" any HTML elements. Some browsers interpret any > as the end of the comment. In any case, the chances of the browser becoming confused are pretty good, with the result that the rest of your document will be scrambled badly.
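
A safe comment, then, is a short note that contains no markup of its own. The sketch below uses a made-up revision note and date.

```html
<!-- Added the section on horizontal rules, 12 March 1996 -->
<H2>Adding Horizontal Lines</H2>
```

Because the note contains no greater than sign and no tags, even a browser with sloppy comment handling should hide it cleanly.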

HTML 3.0 Additions

By now you may be wondering where all of these rules come from. The World Wide Web Consortium (or W3C, as it is known) is an unofficial body that publishes specifications for HTML and the Web. These specifications are prepared in draft format and then debated at great length across the Internet. At a predetermined date, a final specification is published and it becomes the standard for the Web.

Unfortunately, the W3C is an unofficial organization and can take a long time completing specifications. The problem with this process is that the developers of Web browsers and other software often introduce new features into HTML before they are approved by the W3C (and sometimes in a different form than is finally released).

The result of all of this maneuvering is that the HTML 3.0 specification (also known as HTML+) has never actually been finalized. Despite this, many of the most popular browsers already support the new proposed features. In this section, I will be referring to these features as HTML 3.0, even though such a standard doesn't actually exist yet.

Note
At this time, Netscape Navigator, Microsoft Internet Explorer, and NCSA Mosaic all support a significant fraction of the HTML 3.0 standard. Unfortunately, the feature sets of these three popular browsers don't completely overlap.

Text Positioning

One of the biggest additions to the standard in the HTML 3.0 specification is the ability to control the positioning of text horizontally across the page. This will give you the option of placing your headings either against the left margin, in the center of the page, or against the right margin. The flexibility to locate your headings where you want them will enable you to make your documents more appealing.

Alignment is specified for headings in the same way that it is for paragraphs, using the ALIGN attribute. The acceptable choices for heading alignment are left, right, center, and justify. Setting alignment to justify will start the heading at the left margin and add spaces to fill the entire line length, if possible. Figure 6.23 provides examples of how to specify heading alignment, and figure 6.24 shows the results of these examples.

Figure 6.23: Alignment can be specified in the heading element.

Figure 6.24: The use of alignment can improve the appearance of headings.
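
For readers without the figures, here is a sketch of how the ALIGN attribute might be written. The heading text is my own, and whether a given browser honors each value is something you will need to test.

```html
<H1 ALIGN=CENTER>A Centered Heading</H1>
<H2 ALIGN=LEFT>A Heading Against the Left Margin</H2>
<H2 ALIGN=RIGHT>A Heading Against the Right Margin</H2>
<P ALIGN=CENTER>A centered paragraph.
```

A browser that doesn't understand the attribute should simply ignore it and display the heading in its usual position.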

Additions to the <BR> Element

The <BR> element also has a new attribute that can be used for locating text adjacent to floating images. The CLEAR attribute can be set to LEFT, RIGHT, or ALL to break the line and start the next line of text where the left margin, right margin, or both margins are free of any images. Figure 6.25 shows an example of a <BR CLEAR=LEFT> element.

Figure 6.25: The CLEAR attribute can be used to avoid wrapping text around images.
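
The sketch below shows the idea. It assumes an image file named logo.gif, which is hypothetical, placed against the left margin with the image element's ALIGN attribute so that text flows around it.

```html
<IMG SRC="logo.gif" ALIGN=LEFT ALT="Company logo">
This caption wraps along the right side of the image.
<BR CLEAR=LEFT>
This text starts below the image, where the left margin is clear.
```

Without the CLEAR attribute, the line break would start a new line that is still beside the image.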

The <NOBR> Element

Just as there are instances in which it is convenient to break a line at a specified point, there are also times when you would like to avoid breaking a line at a certain point. Any text between a <NOBR> start tag and the associated end tag is guaranteed not to break across lines.

Note
This can be very useful for items, such as addresses, where an unfortunate line break can cause unexpected results. Don't overuse the <NOBR> element, however. Text can look very strange when the natural line breaks have been changed.

Tip
If you think you might need a break inside of an <NOBR> element, you can suggest a breaking point with a <WBR> tag. The browser will only use the <WBR> if it needs it.
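
Putting the two together might look like the sketch below. The company and street address are invented. The address is kept on one line if possible, and the <WBR> tag marks the one place where a break is acceptable.

```html
Send your order to
<NOBR>Acme Widgets, 123 Main Street,<WBR> Springfield</NOBR>
before the end of the month.
```

Remember that these elements are browser extensions, so a browser that doesn't recognize them will wrap the text as usual.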

Text Format Elements

The arrival of HTML 3.0 will also add a number of new physical font style elements to the ones listed above. These are used just as the older elements are, but with the caveat that if you use these elements, readers using some browsers may not see the effects that you intend.

The new elements are as follows:

This text is <U>underlined</U>.
This is a <S>strikethrough</S> example.
This is <BIG>big</BIG> text.
This is <SMALL>small</SMALL> text.
This is a <SUB>subscript</SUB>.
This is a <SUP>superscript</SUP>.

Note
Netscape Navigator uses an alternative tag, <STRIKE>, for strikethrough. Microsoft Internet Explorer and NCSA Mosaic permit both <STRIKE> and <S>.
Netscape Version 2.0 does not recognize the <U> element.
Internet Explorer Version 2.0 doesn't support <BIG> and <SMALL> or <SUB> and <SUP>.
Is your head spinning yet?

Body Element Attributes

A number of new attributes for the <BODY> element have been added. Placed in the body start tag, they give the document author considerable latitude in the display of the text. Because they are set on the body element, they apply to the entire document. For example, change the text to a bright purple as follows:

<BODY TEXT="#ff00ff">

The new attributes that can be used in the body element are BACKGROUND, BGCOLOR, TEXT, LINK, ALINK, and VLINK.

Note
If a BACKGROUND image is specified but not loaded, the browser will then attempt to use the BGCOLOR attribute. If BGCOLOR hasn't been specified, the browser will ignore the TEXT, LINK, ALINK, and VLINK attributes in order to avoid the possibility of the text disappearing against the background.
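
A body start tag that sets all of these at once might look like the sketch below. The image file paper.gif is hypothetical, and the colors are just one readable combination: black text on white, with blue for unvisited links (LINK), purple for visited links (VLINK), and red for a link as it is being selected (ALINK).

```html
<BODY BACKGROUND="paper.gif" BGCOLOR="#ffffff" TEXT="#000000"
      LINK="#0000ff" VLINK="#800080" ALINK="#ff0000">
<P>Black text on a white (or paper-textured) background.
</BODY>
```

Specifying BGCOLOR along with BACKGROUND gives the browser a sensible fallback if the image can't be loaded.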