Chapter 9

Basic HTML and Simple Pages



HTML is one of the building blocks of any Intranet. HTML lets the document writer be confident that a document will look reasonably good on any screen, from a large high-resolution monitor to a dumb ASCII terminal. Although this book doesn't go into intricate detail on HTML, it does cover the basics.

After reading this chapter, you will know the following:

Introduction to HTML

Chapter 2 discussed HTTP, the language Web browsers and servers use to communicate. This chapter covers the HyperText Markup Language, commonly called HTML. HTML allows browsers to display information in a display-independent way. It does this by describing how the document is laid out, not how it should look.

HTML 1.0 is a basic language with support for preformatted text, headers, and different type styles such as bold and italic. HTML 2.0 adds to these formatting tags and allows for more advanced controls; the latest specification, HTML 3.0, adds text flow around images, tables, and many other features.

NOTE
This book is not a tutorial on HTML. We will quickly cover the basics of HTML, but if you want to learn HTML, you should read Que's Special Edition Using HTML, 2nd Edition.

Content versus Style

Because HTML describes a document's layout rather than how the document should look, the content of a document matters more than the style it is written in. Normally, when creating a document, the writer must decide which font to use, what size it should be, and many other formatting options.

HTML, however, takes these decisions away from the developer and leaves the formatting details to the browser. This allows HTML to be viewed on dumb terminals as well as on high-resolution monitors.

NOTE
There have been "advancements" in HTML that some browsers take advantage of. These advancements include defining the colors, font sizes, and other features of how the document looks. These are discussed in the section "Advanced HTML."

About HTML

HTML documents are standard text documents with special markup "tags" embedded in them. A tag is one or more characters enclosed in < and >. Tags tell the browser what the text is and how to display it.

NOTE
Tags can describe either a logical or a physical style. A logical tag tells the browser what a piece of text is; for example, a major heading. Physical tags, on the other hand, tell the browser how to display something, such as in italics.
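As a short illustration, the fragment below marks up the same sentence twice, first with logical tags and then with physical tags. <EM> (emphasis) and <STRONG> (strong emphasis) are logical tags, while <I> (italic) and <B> (bold) are physical tags. A graphical browser typically renders each pair the same way, but a text-only browser is free to show the logical tags in whatever way suits its display. The heading text is only an example.

```html
<!-- Logical tags: describe what the text is -->
<H1>Employee Handbook</H1>
<P>Please <EM>read</EM> this section. It is <STRONG>required</STRONG>.</P>

<!-- Physical tags: describe how the text should look -->
<P>Please <I>read</I> this section. It is <B>required</B>.</P>
```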

There are three main rules to HTML:

HTML documents are made up of three parts: the declaration, the header, and the body.

The declaration is simply a tag that defines where the HTML document starts and ends. The tag to use is <HTML>, and the closing tag is </HTML>. This goes at the beginning and end of all HTML documents.

The header contains information about the document. It must include a <TITLE> tag and can also include a <BASE> tag. The title is usually displayed at the top of the browser window and should let the user understand what the document contains. The <BASE> tag defines the URL against which relative URLs in the document are resolved. The header is enclosed by <HEAD> and </HEAD>.

The body of the document is contained between the <BODY> and </BODY> tags. This is where the document's content is stored, along with the tags that describe it.
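Putting the three parts together, a minimal document skeleton might look like the following. The address in the <BASE> tag is a hypothetical Intranet URL used only for illustration; with it in place, the relative link to policies.html resolves to http://intranet.example.com/handbook/policies.html.

```html
<HTML>
<HEAD>
<TITLE>Employee Handbook</TITLE>
<!-- Relative URLs in this document are resolved against this address -->
<BASE HREF="http://intranet.example.com/handbook/">
</HEAD>
<BODY>
<!-- The content, and the tags that describe it, go here -->
See the <A HREF="policies.html">company policies</A>.
</BODY>
</HTML>
```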

Basic HTML Tags

The earliest version of HTML contained only a handful of tags. These few tags allow documents to be viewed on many different types of machines in a format that mimics the way the document is laid out.

The HTML specification requires that tags be enclosed in less-than and greater-than signs (< and >). Some tags may be nested inside other tags. To end a tag, you repeat its name preceded by a forward slash (/). For example, for preformatted text you would start with <pre>, put your text in, and then end the preformatted section with </pre>.
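The fragment below illustrates these rules: each tag is closed with the same name preceded by a slash, and nested tags are closed in the reverse order in which they were opened. The text inside <PRE> keeps its spaces and line breaks exactly as typed. The name and extension number are made up.

```html
<!-- Nested tags: close the inner tag (I) before the outer tag (B) -->
<B>This is bold, and <I>this is bold italic</I>.</B>

<!-- Preformatted text keeps its spacing and line breaks -->
<PRE>
Name        Extension
J. Smith    4512
</PRE>
```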

NOTE
HTML tags aren't case sensitive. <PRE> is the same as <pre> and <Pre>. The tags can also be ended with tags using different capitalization, so <PRE> can be ended with </pRe>.

The most common tags available include the following:

These are just some of the HTML 1.0 tags. With these simple tags, you can create very usable documents that can be viewed on any screen. You can also link documents to other documents and embed images in the text. What HTML 1.0 does not allow is defining the screen color, font size, or other attributes that control how a document looks; instead, it lets you define how the document is structured.
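For example, a link and an embedded image each take a single tag. The file names below are hypothetical. The ALT attribute supplies text that a browser can show in place of the image, which matters on text-only terminals.

```html
<!-- A hyperlink to another document on the same server -->
See the <A HREF="phonelist.html">company phone list</A>.

<!-- An image embedded in the text; the ALT text is shown if the image is not -->
<IMG SRC="logo.gif" ALT="Company logo">
```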

Here is a simple HTML document:

<HTML>
<HEAD>
<TITLE> Test Document </TITLE>
</HEAD>
<BODY>
This is line 1.
This is line 2. Note there are no line breaks unless we specify them.<BR>
Like we just did.
<P>
<UL>
<LI> This is item1</LI>
<LI> This is item2</LI>
</UL>
<HR>
</BODY>
</HTML>

This page is shown in figure 9.1.

Figure 9.1: A Test HTML page.

Converting Static Documents

Many companies start their Intranets by putting existing documents on-line. This has many advantages over paper-based documents. These advantages include the following:

With all these benefits, it's no wonder that so many companies are starting to build Intranets.

Most companies have many static documents, which can include employee handbooks, newsletters, specifications, phone lists, policies, or job postings.

These can all be easily converted to electronic documents and placed on the company Intranet server. Because the documents are fairly static, they can easily be added without having to learn CGI programming or advanced HTML topics.

Saving as ASCII

The quickest way to get your Intranet started is to store all your documents in a directory on your Web server. They can be saved as plain text and viewed in a WWW browser; most word processors let you save a document as text, or ASCII. Browsers usually have a Find option, which makes searching a file easy.

NOTE
It is possible to view documents without having a Web server running. Simply save all the documents in a central directory and use the Open Local option that most browsers have.

This is a quick way to get started, and for some companies or departments, it may be all that is needed. Other companies might want to go a step further and convert their documents to HTML, which lets them take advantage of hyperlinks and embedded images.
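One low-effort way to take that step is to wrap an existing text file in a few HTML tags. The sketch below assumes a hypothetical phone list saved as plain text: the <PRE> tag keeps its column layout intact, while the heading and the link back to a hypothetical index page are things the plain text version could not offer.

```html
<HTML>
<HEAD>
<TITLE>Company Phone List</TITLE>
</HEAD>
<BODY>
<H1>Company Phone List</H1>
<!-- The original text file, pasted in unchanged -->
<PRE>
Name          Department     Extension
J. Smith      Accounting     4512
M. Jones      Engineering    4730
</PRE>
<!-- A hyperlink the plain text version could not offer -->
<A HREF="index.html">Back to the Intranet home page</A>
</BODY>
</HTML>
```

If the original text contains the characters <, >, or &, they must be written as &lt;, &gt;, and &amp; so that the browser does not mistake them for markup.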

Saving as HTML

In the past year or so, most of the large word processors have come out with updates that allow documents to be saved as HTML. Some of these word processors actually can be used as HTML editors by adding a template or style sheet.

Microsoft Word, Adobe's FrameMaker, and WordPerfect all have new versions that allow you to save a document as HTML.

NOTE
Because HTML is an evolving specification, some features may not be available using these applications. It may be necessary to save the document as HTML and then edit it with a text editor to add some HTML features.

Let's look at a few word processor add-ons that can make saving as HTML easier.

CU_HTML

CU_HTML is a Microsoft Word 2.0 or 6.0 template that allows easy creation of HTML files. It is available from http://www.cuhk.hk/csc/cu_html/cu_html/htm.

After you get the .zip file and uncompress it, you can install it by following the instructions in cu_html.htm or cu_html.doc. After it is installed and the template is selected, you will have an additional pull-down menu and another toolbar. These let you insert links and images and apply tags. There is also an option to write HTML, which saves the document as an HTML file. Figure 9.2 shows Word after loading the CU_HTML template.

Figure 9.2: Word with the CU_HTML template loaded.

After you create the text, you can simply highlight it and select the style you want to apply. This is just like the normal Word tools; the only difference is the style names in the list. Some of the styles you can apply include bold, italics, underline, Header 1-6, addresses, preformatted text, ordered lists, unordered lists, and horizontal rule.

Internet Assistant

Internet Assistant (IA) is a free add-on for Word that allows editing and browsing of WWW documents. It is available from Microsoft's Web site (http://www.microsoft.com/msoffice/freestuff/msword/download/ia/default.htm), or you can order a disk from Microsoft for $5.00. To order a disk, call 1-800-426-9400.

When you install the package, you will have the option of setting Internet Assistant up as the default viewer. If you do this, then every time you start Word the Internet Assistant will also be loaded. Figure 9.3 shows Internet Assistant.

Figure 9.3: Microsoft's Internet Assistant loaded in Word.

To start editing, select the HTML.DOT template; alternatively, opening an existing HTML document (one with an .htm extension) loads the HTML template automatically. Loading the template adds many new features to the pull-down menus and also adds new toolbars.

Using Internet Assistant is almost the same as using standard Word, with a few more styles, such as bold, italics, and underline. IA also lets you add special symbols and correctly converts them to their HTML equivalents.
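For instance, symbols such as the copyright sign or accented letters are written in HTML as character entities. The line below illustrates the entities themselves; it is not a capture of Internet Assistant's actual output, and the business name is made up.

```html
<!-- &copy; = copyright sign, &eacute; = e with an acute accent,
     &amp; = ampersand, &lt; and &gt; = less-than and greater-than signs -->
Copyright &copy; Example Caf&eacute; &amp; Bakery. Use &lt;BR&gt; for a line break.
```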

Internet Assistant also has a forms generator, similar to the forms tool built into Excel. It lets you add checkboxes, submit and reset buttons, pull-down lists, text areas, and radio buttons. IA prompts you for the information needed to finish the form.

Converters

Having a word processor save documents as HTML is a good way to add existing documents or to convert a few documents to HTML. If you have many files, though, it can be time consuming to start the word processor, open each document, make any changes, and save the file under a different name.

If you have many documents, you really need a converter. Converters let you automate the conversion by using a script or batch file, and many of them are available on the Internet. This section covers converters for FrameMaker, Microsoft Word, and WordPerfect documents.

Using MIFtran

MIFtran is a program that converts FrameMaker documents saved in the Maker Interchange Format (MIF) to HTML files. MIFtran is a C program written by Jim McBeath. It is available from ftp://ftp.alumni.caltech.edu/pub/mcbeath/web/miftran. It has been compiled and run on many versions of UNIX and should run on most machines with a C compiler.

After you download the file and unpack it, you should have a number of files. You should read the README file and follow these instructions to get miftran working:

  1. Edit Makefile and make changes if necessary.
  2. Type make to build miftran.

Once miftran is built, you can test it by changing to the html directory and running make there.

To use miftran, you need an "RC file," which controls everything about miftran. See the Chap3.html file for details on the RC file. You may be able to use the miftran.rc file in the mtinc directory.

The command line options for miftran are

A sample miftran run would be

miftran -rc miftest.rc test.mif

Using rtftohtml

rtftohtml is a filter that can translate Rich Text Format (RTF) documents to HTML. Many word processors can save files in RTF format, so rtftohtml can be used to turn almost any word processor into an HTML editor.

Rtftohtml is available in binary form from ftp://ftp.skypoint.com/pub/members/s/spimis/latest/binaries/. You can also get the source code for compiling on UNIX machines from ftp://ftp.skypoint.com/pub/members/s/spimis/latest/src/unix.tar.Z.

Rtftohtml binaries are available for Macintosh, SunOS, Solaris, DOS and OS/2.

We are going to take a look at using rtftohtml to convert an RTF file. In this example, we are using rtftohtml version 2.7.5a for DOS.

Rtftohtml uses the html-trans file to decide how to convert RTF code to HTML. If you add a new RTF style, you will need to understand this file. It has four basic sections: .PTag, .TTag, .Tmatch, and .Pmatch. These sections define paragraph styles, text styles, text matching, and paragraph matching, respectively.

TIP
The html-trans file is complicated. If you need to add a style, read the instructions found in the Users Guide (guide.htm).

The command line options for rtftohtml are

Therefore, the following command will convert doc.rtf to doc.html without a table of contents, and all images will be linked using an HREF.

rtftohtm -T doc.rtf

To convert doc.rtf to test.html without re-creating graphics and with links to ".jpg" files instead of ".gif" files, you would use

rtftohtm -o test -G -P jpg doc.rtf

Using wp2html

wp2html is a shareware program written by Andy Scriven. You can download the evaluation copy from many sources; one of them is http://www.res.bbsrc.ac.uk/wp2html. If you decide you like the program, you need to register it. Registration costs five English pounds, or around seven U.S. dollars.

Wp2html converts WP tables to HTML tables. It can also handle graphics, either by linking in a GIF or JPEG version of the image or by creating a description of it, usually the image's name and size.

NOTE
When you get the software, there will be one file called README.1ST and another called WPTOHTML.EXE. Run WPTOHTML.EXE; it is a self-extracting file that unpacks itself and creates several files. You can then install wp2html by following the instructions in README.1ST.
This section discusses the DOS version. There is also a UNIX version available, and the instructions vary depending on which version you get.

wp2html can be run with just a filename. If you run it this way, it will create a new file with an .htm extension. For example, if you run wp2html junk.wp, it will create a junk.htm file.

You can also use one of four flags:

To use a configuration file called test.cfg and a style file called wp.sty, with input.wp as the input file, you would run:

wp2html -c test.cfg -s wp.sty -i input.wp

HTML Editors

HTML editors are a very good way for people to create basic HTML documents without having to learn the language. They allow you to see the document the way it will look in a browser.

CAUTION
Not all browsers support all tags the same way. Users can also change the way a tag is displayed. The term WYSIWYG (What You See Is What You Get) is not always true with HTML.

HTML editors are available for Windows platforms, Macintosh, and UNIX platforms. Most newer editors are WYSIWYG editors and allow the designers to simply grab the styles they want to use and apply them. It is also easy to add in hyperlinks by typing in the URL.

NOTE
Because HTML changes so quickly, most HTML editors can't produce all of the latest features. You will often need to tweak HTML files after they come out of the editor, especially if you use Server Side Includes to set the look and feel of the site.

There are many editors on the market; most of them are for the Windows platform, though, so UNIX users have fewer choices. Two common UNIX editors are HoTMetaL Pro and Netscape Navigator Gold, both of which are also available for Microsoft Windows. FrontPage from Microsoft and HotDog from Sausage Software are two other very good HTML editors.

HoTMetaL Pro

HoTMetaL is developed by SoftQuad, which offers both a free version and a professional version. The free version is available by FTP from ftp://papa.indstate.edu/mirror/SoftQuad/hmfree2.exe.

NOTE
HoTMetaL Free is for noncommercial use only. If you are developing Intranet pages, this is considered commercial use, and you will need to purchase the Pro edition.

To run HoTMetaL under Windows, you will need Windows 3.1 or higher running on a 486/33 or better, and at least 8 MB of RAM. SoftQuad also makes a version of HoTMetaL for UNIX machines.

Save the downloaded hmfree2.exe executable in a temporary directory. This allows you to remove the installation files after the installation is complete.

In your temporary directory, run the hmfree2.exe program. This will extract the distribution files. In Windows, run the setup.exe program you just extracted. Once setup finishes, you can remove the files in your temporary directory.

HoTMetaL has a very good help system and supports most of the current tags. It has a nice table editor and will warn you if you try to save a document that contains invalid syntax. Figure 9.4 shows HoTMetaL Free in action.

Figure 9.4: HoTMetaL Free is a very nice editor.

Netscape Navigator Gold

Netscape's top-of-the-line browser package, Netscape Navigator Gold, is actually a browser and an editor combined. This allows you to edit documents and view them in the same software package. The Gold version costs $79 and is available from the Netscape site (http://home.netscape.com).

To edit a document with Navigator Gold, you can simply view an HTML document and then switch to edit mode. This will add three toolbars. The toolbars each have a different function.

One of the nicest features of Navigator Gold is the Drag and Drop feature. This allows you to quickly move images, links, and horizontal rules by dragging them on the screen. Figure 9.5 shows a page in edit mode of Navigator Gold.

Figure 9.5: Navigator Gold edit window.

Microsoft FrontPage

FrontPage, available from Microsoft (http://www.microsoft.com/fp), is one of the more advanced HTML tools. It is much more than just an editor; it is a site development package. In addition to creating HTML, you can use it to view your site graphically, test for broken links, and use server robots.

FrontPage requires you to add server extensions if you want to use the server robots. Server extensions are available for Netscape servers, O'Reilly's WebSite, and IIS, and extensions for more servers are being developed.

FrontPage has all the features of a WYSIWYG editor, plus the ability to develop an entire site using wizards. These wizards guide you through creating different types of Web sites, such as corporate home pages, workgroup servers, and discussion areas.

FrontPage also has a screen where you can view your site graphically and see where your links go. You can also verify your links to make sure they all actually point somewhere.

HotDog Pro

HotDog Pro from Sausage Software is another WYSIWYG editor. It can be downloaded from http://www.sausage.com.

HotDog Pro lets you add styles, images, and links easily using a graphical interface. It also has an HTML validator, a spell checker, and a real-time viewer. The viewer eliminates the need to keep a browser open and constantly click Reload to see your changes. You might still want to check your work in the standard browser you have chosen for your Intranet.

Advanced HTML

As more and more people started using HTML for more purposes, it became evident that the specification needed to be expanded if HTML was to be used for more than simple documents. Netscape decided to add its own extensions to HTML to make it more useful for users of the Navigator software. These extensions are called the Netscape Extensions, or "Netscapisms."

NOTE
When Netscape first introduced these extensions, many people were upset that Netscape didn't wait until the HTML working group approved them and added them to the new specification; now, however, most browsers support at least some of these extensions.
Netscape still seems to be pushing the envelope, adding new extensions before the HTML working group adopts them. Whether this is a good thing is still very controversial.

The HTML specification has also been expanded, and version 3.2 is now being worked on. Version 3.2 adds many features to the original specification. Together with the Netscape extensions, some of the more useful features for an Intranet are forms, tables, frames, different font sizes, and image dimensions.

Forms are used to submit data to a script; they are covered in Chapter 11, "HTML Forms." Tables display data in a more organized fashion. Frames let the user see more than one document at once. Different font sizes are useful for making pages look better, and specifying image dimensions lets you control the size at which an image is displayed, for example, to make it fit in a browser window.
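A brief sketch of the last two features follows, using the <FONT> tag and the WIDTH and HEIGHT attributes of <IMG>, both of which are part of HTML 3.2 and supported by Netscape. The text, file name, and pixel sizes are hypothetical. When the given size differs from the image's actual size, the browser scales the image to fit.

```html
<!-- SIZE can be absolute (1 to 7, default 3) or relative to the base size -->
<FONT SIZE="+2">Quarterly Results</FONT>
<FONT SIZE=2>Figures are unaudited.</FONT>

<!-- WIDTH and HEIGHT are in pixels; the browser can reserve this space
     before the image arrives, and it scales the image to this size -->
<IMG SRC="chart.gif" WIDTH=400 HEIGHT=300 ALT="Quarterly sales chart">
```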

Tables

Tables are used to display information that needs to be arranged in rows and columns. HTML doesn't let the writer force extra spaces between words (browsers collapse a run of spaces into a single space), so tables are the best way to control how such information lines up.

NOTE
You can also use the <PRE> tag, which preserves multiple spaces and line breaks, to force a document to line up properly.
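A minimal sketch of the <PRE> approach, reusing three rows of the timing data from the table example later in this section: the columns line up only because the browser shows preformatted text in a fixed-width font with every space preserved, so the spacing has to be counted out by hand.

```html
<!-- Each column is padded with spaces to a fixed character position -->
<PRE>
Machine   32768 nodes   65536 nodes
MM32K     2.2 msec      3.1 msec
i486      350 msec      700 msec
</PRE>
```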

Table Tags

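The tags needed when working with tables are <TABLE>, which encloses the whole table; <TR>, which defines a table row; <TH>, which defines a header cell; and <TD>, which defines a data cell.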

The table tag can also be changed by adding other options. These include:

The td tag can also have options such as:

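Cells can also be set up to span several rows or columns by adding one of the following attributes to a <TH> or <TD> tag: ROWSPAN, which sets the number of rows the cell spans, or COLSPAN, which sets the number of columns it spans.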

Using these tags, you can generate very interesting tables. It is also possible to embed other HTML elements inside a table cell to get special formatting.
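As a sketch of cell spanning, the table below uses ROWSPAN so that the Department heading covers both header rows and COLSPAN so that the Head Count heading stretches across two columns. It also embeds a link inside a cell. The department names, figures, and file name are made up for illustration.

```html
<TABLE BORDER=1>
<!-- "Department" spans both header rows; "Head Count" spans two columns -->
<TR><TH ROWSPAN=2>Department</TH><TH COLSPAN=2>Head Count</TH></TR>
<TR><TH>1995</TH><TH>1996</TH></TR>
<!-- Any HTML element, such as a link, can go inside a cell -->
<TR><TD><A HREF="eng.html">Engineering</A></TD><TD>40</TD><TD>52</TD></TR>
<TR><TD>Sales</TD><TD>25</TD><TD>31</TD></TR>
</TABLE>
```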

Table Example

This is a sample table that shows how a table is structured. It is not a complete HTML document; rather, it is a snippet that contains the table code and would need to be placed in the body of an HTML document.

<TABLE border=3>
<TR><TH colspan=3>Time for a Nearest Neighbor Search</TH></TR>
<TR><TH>Machine</TH><TH>32768 nodes</TH><TH>65536 nodes</TH></TR>
<TR><TD>MM32K</TD><TD>2.2 msec</TD><TD>3.1 msec</TD></TR>
<TR><TD>i486</TD><TD>350 msec</TD><TD>700 msec</TD></TR>
<TR><TD>MIPS</TD><TD>970 msec</TD><TD>1800 msec</TD></TR>
<TR><TD>Alpha</TD><TD>81 msec</TD><TD>177 msec</TD></TR>
<TR><TD>Sparc</TD><TD>410 msec</TD><TD>820 msec</TD></TR>
</TABLE>

This would generate a table that looks like the one shown in figure 9.6.

Figure 9.6: A sample table.

Frames

Frames aren't part of the HTML specification, but they can be very useful in an Intranet. They are supported by Netscape browsers and compatible browsers.

Frames allow multiple HTML documents to be displayed in a single browser window. Used well, frames can make a set of related pages easier to use, but they can also make a screen too complicated.

The most popular applications for frames in an Intranet are the following:

Frame Tags

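There are three basic frame tags: <FRAMESET>, which divides the browser window into frames; <FRAME>, which defines a single frame and the document it displays; and <NOFRAMES>, which contains content for browsers that don't support frames.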

When building frames, you use more than one file. The first file simply sets up the placement of the frames and can include a NOFRAMES section for browsers that aren't frames-aware. Each frame then loads its own document to be displayed in that frame.

For example, look at a simple frames document:

<FRAMESET COLS="50%,50%">
<FRAME SRC="frame1.html">
<FRAME SRC="frame2.html">
</FRAMESET>

The first line tells the browser to have two frames side by side, each using 50 percent of the browser. The next two lines tell which HTML document to place in each frame. The last line ends the frameset.

The FRAMESET tag describes how the window is divided, either into rows (the ROWS attribute) or into columns (the COLS attribute, as in our example). It also specifies the size of each frame. A size can be a percentage, as in our example, a number of pixels, or an asterisk (*). An * means to divide the remaining space evenly among the frames sized with *. The default is *. Framesets can also be nested so that frames are arranged both horizontally and vertically.
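For example, the following sketch nests one frameset inside another. The outer frameset splits the window into a 60-pixel row at the top and a row that takes the rest of the space (*); the inner frameset divides that lower row into two columns, where * takes the remaining 75 percent. The file and frame names are hypothetical.

```html
<FRAMESET ROWS="60,*">
  <!-- Top row: exactly 60 pixels high -->
  <FRAME SRC="banner.html" NAME="banner">
  <!-- Bottom row: the remaining space, split into two columns -->
  <FRAMESET COLS="25%,*">
    <FRAME SRC="contents.html" NAME="contents">
    <FRAME SRC="main.html" NAME="main">
  </FRAMESET>
</FRAMESET>
```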

The FRAME tag also has attributes that can be used to define a frame. These include the following:

The TARGET attribute was mentioned earlier in connection with assigning a name to a frame. TARGET tells the browser which frame an action should be applied to. The action may be following a link or submitting a form.

NOTE
The TARGET attribute can be used anywhere an HREF attribute is used, as well as anywhere an ACTION attribute is used.

The TARGET attribute is used in links such as:

<a href="home.html" target="target_name">Hyperlink</a>
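Because TARGET also works wherever an ACTION attribute is used, a form can send its results to another frame. In this sketch, the script path /cgi-bin/search and the frame name results are hypothetical; forms themselves are covered in Chapter 11.

```html
<!-- The script's output appears in the frame named "results",
     leaving this frame, and the form in it, unchanged -->
<FORM ACTION="/cgi-bin/search" METHOD="GET" TARGET="results">
<INPUT TYPE="text" NAME="query">
<INPUT TYPE="submit" VALUE="Search">
</FORM>
```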

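A target can also be set as a base. The base target, set with the <BASE> tag in the document's header, is the default target for every link that doesn't specify its own. If no base target is set, links load in the same frame. An example of a base target is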

<base target="frame2">

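There are also some reserved target names, all beginning with an underscore: _blank loads the document in a new, unnamed window; _self loads it in the same frame as the link; _parent loads it in the immediate frameset parent of the current document; and _top loads it in the full browser window, replacing all frames.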
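For example, a link on a page inside a frame can break out of the frames with the reserved name _top, or open a document in a new window with _blank. The file names are hypothetical.

```html
<!-- Replace the whole frameset with the home page -->
<A HREF="home.html" TARGET="_top">Intranet home page</A>

<!-- Open the phone list in a new browser window -->
<A HREF="phonelist.html" TARGET="_blank">Phone list</A>
```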

Frames Example

Frames are very useful both for navigational links and for a reference such as a table of contents. In this example, you will build a frameset with two columns: a small one on the left for an index and a larger one on the right for the page.

This layout could be used for the employee handbook, newsletters, or any other large document.

First, you need to create the frameset:

<html>
<head>
<title>Frames example</title>
</head>
<frameset cols="10%,90%">
<frame SRC="index.html" NAME="index">
<frame SRC="page1.html" NAME="page">
</frameset>
</html>

This will create the frameset with two frames side by side. The one on the left will take 10 percent of the space, and the one on the right will use the rest. Notice that there is no <body> tag: in a frames document, the <frameset> takes the place of the body.

The frame on the left will be called "index" and will contain the contents of the index.html file. The one on the right will contain the contents of page1.html and will have the name "page."

In an Intranet, you want all of your browsers to be able to handle frames, but since that isn't always possible, you should add a NOFRAMES section to your frameset. It goes inside the <frameset>, after the <frame> tags. Browsers that support frames ignore everything between <NOFRAMES> and </NOFRAMES>, while other browsers display it:

<NOFRAMES>
This page is designed for browsers that support frames.
Since your browser doesn't support frames, you may want to
jump to <A HREF="page1.html">the first page in this document</A>
to get started.
</NOFRAMES>

Now, in the index.html page, add a link for each entry. Because every link should update the page frame, you can set a base target of page in the document's header:

<html>
<head>
<base target="page">
</head>
<body>
<ul>
<li><a href="page1.html">Page1 </a></li>
<li><a href="page2.html">Page2 </a></li>
<li><a href="page3.html">Page3 </a></li>
<li><a href="page4.html">Page4 </a></li>
</ul>
</body>
</html>

This sets the base target to page, which tells every link in this frame to display its document in the page frame. The unordered list simply sets up the pages the user can jump to; it can be any valid HTML. As long as a link doesn't set a target of its own, its document will show up in the page frame.

All the page*.html documents are ordinary HTML documents with nothing frame-specific in them, though it might be desirable to include a NOFRAMES section with links to go back a page, forward a page, and back to the index. This would make the pages easier to navigate for people who don't have frames-capable browsers.
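A sketch of such a page follows, using page2.html as the example; the title and page text are placeholders. Browsers that support frames generally ignore everything inside <NOFRAMES>, so the navigation links appear only for readers whose browsers don't support frames.

```html
<HTML>
<HEAD>
<TITLE>Employee Handbook: Page 2</TITLE>
</HEAD>
<BODY>
<H1>Page 2</H1>
<P>The text of the page goes here.</P>
<!-- Shown only by browsers that do not support frames -->
<NOFRAMES>
<HR>
<A HREF="page1.html">Previous page</A> |
<A HREF="page3.html">Next page</A> |
<A HREF="index.html">Index</A>
</NOFRAMES>
</BODY>
</HTML>
```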

Figure 9.7 shows the frames example in the Netscape browser.

Figure 9.7: The frames example.

Using HTML to display information makes a lot of sense, and with newer features such as tables and frames being added, there is almost nothing that can't be displayed.