Chapter 19

Verifying and Checking a Web Page

by Bill Brandon



You may remember the popular line from the movie Field of Dreams: "If you build it, they will come." Everyone who creates a page or a site for the Web would like to communicate a message, attract attention and traffic, perhaps even win acclaim. Otherwise, there's no purpose in the effort. But you probably found out in your first few hours on the Web that not all pages are worthy of much attention, let alone acclaim. Building a great Web page takes a lot of time, attention to detail, and knowledge of your readers.

Four elements separate outstanding Web pages from the other 95 percent. First, great Web pages are always mechanically sound: the HTML is written correctly, the text and graphics display correctly, and the links all work. Second, an outstanding page is aesthetically pleasing; having a pleasant appearance is different from being cool and flashy. Third, a great page is built from the ground up to provide value to the viewers. Finally, a great Web page adheres to certain standard practices. These practices create a practical user interface and allow visitors to get the page to respond in predictable ways.

In this chapter, you learn about the following:

What Web page verification is, and why both your HTML source and your links need regular checking
How to find and use on-line verification services such as WebTechs, Doctor HTML, the Kinder, Gentler Validator, and Weblint
How to install the WebTechs HTML Check Toolkit and other verification suites on your own server
How the High Five and Point Top Five awards recognize quality Web pages
Where to learn standard Web practices from other developers

What Is Web Page Verification?

How often have you managed to complete an entire document or program without making a single mistake? Even when I proofread and test carefully, at least one embarrassing error always seems to get past me. HTML documents are no different, except that even more things can go wrong.

Web page verification is the continuing task of making sure your HTML source code is intelligible to browsers and provides the interface you and the Web surfer expect. Web page verification also addresses the maintenance of those vital hypertext links to other pages, to images, and to files. You can think of Web page verification as a combination Quality Assurance function and continuous improvement program.

Resources are available on the Web itself to check your HTML syntax and test your links. Some of these resources run on other people's servers. Anyone on the Web can use them. These resources can go by any of several names, such as validators or validation services, but in this chapter I refer to the entire group as verification services.

Most of this chapter is concerned with demonstrating several of these tools on the Web. I will also show you where to download a number of these tools. You can then run them on your own server (if you have one) to perform the same functions.

The tools you will see perform at least one of two essential verification functions: verifying HTML source code and verifying links. Some tools do both.

Verifying HTML Source

HTML is written to be read by Web browsers, not by human beings. Although most Web browsers are pretty forgiving, basic errors in HTML syntax prevent a page from being displayed properly. Other HTML errors cause people to have to wait longer than they like while a page loads. Such failures can destroy the effectiveness of your Web site.
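Even a single mistyped character can cause trouble. In the following fragment (an illustrative example, not from any real page), the missing closing quotation mark after the HREF value can cause a browser to swallow the rest of the line, so the link never appears:

<!-- Broken: the closing quote after goo.html is missing -->
<A HREF="http://www.foo.com/goo.html>Visit our home page</A>

A syntax checker catches this sort of error immediately; a visual proofread often does not.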

Of course, you can use an SGML-aware editor such as HoTMetaL; it does syntax checking on the fly. Such a tool makes sure you use the correct tag for any given context.

But not everyone uses an SGML-aware editor. Many people use Notepad or WordPad to prepare their HTML documents. If you are using a less capable editor, you can employ any of several verification services on the Web to check out your HTML source for errors. Such services vary in their capabilities, but they have one outstanding characteristic in common: they are free. In fact, even if you use an SGML-aware editor, you should verify your source code. Why? Change.

A fact of life on the World Wide Web is the speed with which it has grown. In less than three years, the Web has acquired millions of users. At the same time, the HTML specification has gone through three versions. The browsers used to access documents on the Web have undergone similarly explosive growth. An editor that is up-to-date in its ability to parse and correct syntax today may well be obsolete in three months or less.

Some browsers now use nonstandard tags (extensions) in documents to deliver special effects. You may have seen pages on the Web marked "This page appears best under Netscape" or "This page appears best with Internet Explorer." To say that this presents a challenge to Web page developers is an understatement.

A developer can build a Web page to conform to the HTML 3.0 standard, for example. The page may look wonderful when viewed with a Level 3.0 browser. But what does it look like when viewed with a Level 1.0 browser? There are also users who cruise the Web with text-only browsers like Lynx. Millions of copies of browsers that conform to standards less capable than HTML 3.0 are in use every day, all over the world. The developer wants all these users to be able to get her message, buy her products, and find her e-mail address. Meanwhile, a growing percentage of the Web population worldwide uses some version of Netscape. What will they see?

One solution to this problem is for every developer to obtain a copy of every browser and check out the page under each one. This solution seems a little extreme. The on-line verification services offer a much simpler answer. And the on-line services are constantly updated as well.

Later in this chapter, you will look in detail at the three leading on-line verification services: WebTechs, Weblint, and the Kinder, Gentler Validator. You'll also be introduced to the Chicago Computer Society's Suite of HTML Validation Suites, which provides a convenient interface to all three of these services and much more. I'll show you four excellent alternative verification services, too. Finally, you'll learn where to obtain verification tools that you can install on your own server and get a look at what it takes to do this.

Verifying Links

Simply checking the HTML source ensures only that your documents appear the way you expect them to appear on different browsers. You also need to make sure that all your links work the way they are supposed to work.

A browser tries to follow any link the user clicks. One possible source of problems is a simple typographical error. Every page designer makes these errors, and sometimes these errors happen while you're entering links. An SGML-aware editor doesn't catch this problem, and chances are you won't spot all of them either. Another source of trouble is the constant change that the Web undergoes. A link to a valid site last week may not go to anything next week. Web pages require continuous maintenance and verification to guard against these dead links.
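For example, a one-character typo produces a link that is syntactically perfect but leads nowhere (the address is the same illustrative one used later in listing 19.1):

<!-- Valid syntax, dead link: "htlm" should be "html" -->
<A HREF="http://www.foo.com/goo.htlm">Read the details</A>

No syntax checker will object to this line; only a link checker, or an annoyed visitor, will find the problem.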

One way to check your links is to ask all your friends to test your document periodically. This idea is good in theory, but it's a fast way to lose your friends in practice. Luckily, some verification tools and services will test your links for you.

Checking links is part of routine maintenance. Most Web browsers are very forgiving of errors in HTML. A broken link is not something that a browser can deal with, though. For this reason, you should test on a regular basis.

In the section titled, "Using Doctor HTML," you will discover an excellent on-line resource for testing your links and fine-tuning page performance. Although the Doctor is not available for installation on local servers, other link testers are; they are listed in the section titled, "Obtaining and Installing Other Verification Suites." Finally, there is a service called URL-Minder that will notify you whenever there are changes to a page to which your page is linked. This is described in the section on "Using Other Verification Services on the Web."

Examining the Options

You can obtain regular verification of your pages in either of two ways: do the job yourself, or have someone else do it.

To do verification yourself, you install and run one or more tools on your server. These tools are CGI scripts, nearly always written in perl. Many are available from Web sites at no cost. You could also write your own CGI script.

Of course, you may not have a server of your own on which to install verification tools. In this case, you can use one of the many public tools available on the Web. This capability can be extremely convenient if you are developing pages at a client's site or in a hotel room while you're traveling.

Running verification on your own server is a good idea if you have a lot of HTML source code and Web pages to maintain. Companies that run an in-house version of the Internet (often referred to as an "intranet") would find this an attractive option. In the sections titled, "Installing the WebTechs HTML Check Toolkit Locally," and "Obtaining and Installing Other Verification Suites," you'll learn exactly what is required to set up this capability.

Most people, however, only occasionally need to verify any source code and have just a handful of links to maintain. In these cases, even if you have a server available you may want to take advantage of the services on the Web. But where and how do you find these services?

Finding Verification Services on the Web

The first task is to find these on-line services. Fortunately, it's easy to locate a handy list of validation checkers. Using the Yahoo search engine (http://www.yahoo.com), search on the keywords Validation Checkers and choose the match with the label Computers and Internet:Software:Data Formats:HTML:Validation Checkers. Figure 19.1 gives you an idea of the results you get this way.

Figure 19.1 : Yahoo maintains a list of validation checkers on the Web.

The other search engines available on the Web can also be used to locate validation checkers. None of them provides the kind of precision the Yahoo list does, however. You should try a variety of keywords, such as html, URL, verification, and service, in addition to validation and checker. Use various combinations. From time to time, new validation or verification checkers will appear on the Web, and it is difficult to predict the keywords it will take to find them.

Table 19.1 lists four of the most popular verification services. Each of these will verify HTML source on the Web. Although all four perform similar functions, their reports differ in subtle ways. Each is discussed in its own major section of this chapter. Other verification services available on the Web are also described, more briefly, in the section titled "Using Other Verification Services on the Web."

Table 19.1  Four Popular Verification Services on the Web

Service Name      URL
WebTechs          http://www.webtechs.com/html-val-svc
Kinder-Gentler    http://ugweb.cs.ualberta.ca/~gerald/validate.cgi
Weblint           http://www.khoros.unm.edu/staff/neilb/weblint.html
Doctor HTML       http://imagiware.com/RxHTML.cgi

Using the WebTechs Verification Service

WebTechs was formerly HALSoft, and remains a standard for on-line verification. The WebTechs tool checks HTML. It validates a single page or a list of pages submitted together, and it lets you enter lines of your source directly. WebTechs is located at http://www.webtechs.com/html-val-svc.

On some Web pages, you may have seen a yellow box like the one shown in the margin. It indicates that the HTML on the Web page has passed WebTechs validation tests. Although getting this icon isn't exactly the same as winning an Oscar, it indicates that the person who developed the page knows his or her stuff.

When your page passes the test, the validation system itself gives you the graphic. It comes with some HTML code that makes the graphic link to the WebTechs site.
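The snippet you receive looks something like the following sketch. The graphic file name and the exact markup come from WebTechs, so treat this as an illustration of the idea rather than the literal code:

<!-- Hypothetical sketch of the validation badge; WebTechs supplies the real SRC and HREF -->
<A HREF="http://www.webtechs.com/html-val-svc/">
<IMG SRC="valid_html.gif" ALT="HTML Checked!"></A>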

So how do you go about getting this bit of public recognition? The path starts with turning your Web browser to the appropriate site, as shown in table 19.2.

Table 19.2  WebTechs Validation Server Sites

Location         URL
North America    http://www.webtechs.com/html-val-svc/
EUnet Austria    http://www.austria.eu.net/html-val-svc/
HENSA UK         http://www.hensa.ac.uk/html-val-svc/
Australia        http://cq-pan.cqu.edu.au/validate/

After you enter the appropriate site and start the service, you see a form similar to the one shown in figure 19.2. On this form, you can have your Web page (or bits of HTML) checked for conformance in a matter of seconds. You instantly get a report that lays out any problems with the HTML source.

Figure 19.2 : Check your Web page by using this WebTechs HTML Validation Service form.

Note
WebTechs changes the appearance and layout of this form from time to time. In particular, the last radio button in the first row is quite likely to change. For a time, it was HotJava, as shown here. It has since been changed to "SQ," for SoftQuad's HoTMetaL Pro extensions. In the future, it will probably be used to specify other sets of HTML extensions as well. These changes do not affect the use of the form.

Incidentally, if you have many pages to maintain, you can add some HTML to each page to save you time and work. The code in listing 19.1 adds a button labeled "Validate this URL" to your page. Whenever you update a page, all you have to do is click on the button instead of opening up the WebTechs URL. Table 19.3 gives the possible values for each of the variables.


Listing 19.1  Add This HTML to Provide a Button That Automatically Submits Your Page for Validation
<FORM METHOD="POST" ACTION="http://www.webtechs.com/cgi-bin/html-check.pl">
<INPUT NAME="recommended" VALUE="0" TYPE="hidden">
<INPUT NAME="level" VALUE="2" TYPE="hidden">
<INPUT NAME="input" VALUE="0" TYPE="hidden">
<INPUT NAME="esis" VALUE="0" TYPE="hidden">
<INPUT NAME="render" VALUE="0" TYPE="hidden">
<INPUT NAME="URLs" VALUE="http://www.foo.com/goo.html" TYPE="hidden">
<INPUT NAME="submit" VALUE="Validate this URL">
</FORM>

Caution
Remember to replace "http://www.foo.com/goo.html" with the proper address for the page on which this button is placed!

Table 19.3  Values for the Variables Used in Setting Up the Validate This URL Button

Variable       Meaning                    Range of Settings
recommended    Type of checking           0 = standard, 1 = strict
level          Level of DTD to use        2, 3, or Mozilla
input          Echo HTML input            0 = don't echo, 1 = echo
esis           Echo output of parser      0 = don't echo, 1 = echo
render         Render HTML for preview    0 = don't render, 1 = render
URLs           Full declaration of URL    (the complete address of the page)

Note
WebTechs refers to the Netscape extensions as Mozilla. WebTechs does not specify a "level" variable for HotJava, Internet Explorer, or any other DTD beyond those shown for HTML 2.0, HTML 3.0, and Netscape. Should they add more variables, you will find them by clicking on the hyperlink "About the HTML Validation Service," and then looking under the heading, "How do I add the 'Validate this URL' button to each of my pages?"

Selecting the Level of Conformance

When you arrive at the WebTechs HTML Validation Service, you may want
to set some options. WebTechs lets you specify the level of conformance for the test. That is, you can test a document for conformance to the HTML 2.0 Specification, the HTML 3.0 Specification, the Netscape Document Type Definition (DTD), or some other DTD. The radio buttons marked Level 2, Level 3, and Mozilla, respectively, indicate these different specifications (see fig. 19.3). As noted before, the identity and use of the fourth radio button on this row changes from time to time. You can select only one radio button at a time.

Figure 19.3 : Use the radio buttons to tell WebTechs what kind of HTML is in your sources.

These radio buttons tell WebTechs which DTD to use in checking your page. Successful choice of DTD requires that you understand how WebTechs works, as I will explain in the next few paragraphs.

WebTechs is actually an SGML parser. As such, it requires a DOCTYPE declaration in the first line of any document it checks; this declaration tells it which DTD to use. However, Web browsers aren't SGML parsers and ignore a DOCTYPE declaration when they find one. As a result, most Web documents do not include DOCTYPE. By selecting a radio button, you instruct WebTechs to respond as though the corresponding DOCTYPE were at the beginning of your page, if no DOCTYPE declaration is in the document when it opens.

If WebTechs finds a DOCTYPE declaration in your source, it uses that declaration and ignores the radio buttons and the check box. It also applies the correct option settings and, if your source passes, provides you with the correct validation icon.

Perhaps you don't actually know the DOCTYPE declared in your document or the species of HTML contained in it. If you select an inappropriate button, you could get a list of errors relating to a standard you perhaps didn't think applied.

Tip
You should look at the first line of your HTML source to see what's there before you try to validate a page.

A more serious problem occurs when the DOCTYPE declaration in your document is not one that WebTechs recognizes. In that case, WebTechs can generate an enormous number of spurious errors. Be sure that your DOCTYPE declaration is correct, if you have one. The correct syntax for the declaration is

<!DOCTYPE HTML PUBLIC "quoted string">

The "quoted string" is the part that WebTechs must know. WebTechs lists the strings it recognizes in its public identifier catalog. Here are the four you are most likely to need:

"-//IETF//DTD HTML 2.0//EN"
"-//IETF//DTD HTML 3.0//EN"
"-//Netscape Comm. Corp.//DTDHTML//EN"
"-//Sun Microsystems Corp.//DTD HotJava HTML//EN"

Tip
The WebTechs public identifier catalog is well hidden. You will find it at this URL:
http://www.webtechs.com/html-tk/src/lib/catalog.

These strings must appear just as they do here, including capitalization and punctuation. The DOCTYPEs are not even necessarily the same as the "official" public identifiers for their respective DTDs.
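As an example, a document written to the HTML 2.0 DTD would begin with the first of those strings on its very first line, like this (the title and body text here are illustrative):

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>A Page Ready for Validation</TITLE>
</HEAD>
<BODY>
<P>Document text goes here.</P>
</BODY>
</HTML>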

Caution
Some popular HTML editors automatically insert DOCTYPE declarations into documents. You may have to edit or remove such declarations before trying to validate your page. In some cases, the editor inserts a DOCTYPE that indicates the HTML complies with the 3.0 specification, even though this is not true. In other cases, the editor includes information in the DOCTYPE that confuses WebTechs about which DTD to use, causing your validation to fail.

Understanding Strict Mode

In the WebTechs HTML Service, a check box marked Strict appears at the beginning of the radio button row. You can use it to modify any of the radio button settings. The default is unchecked. When this item is checked, WebTechs uses a "strict" version of the DTD for the level (2.0, 3.0, Mozilla, or other choice) that you select.

In Strict mode, WebTechs accepts only recommended idioms in your HTML. This restriction ensures that a document's structure is uncompromised. In theory, all browsers should then display your document correctly. The Strict version of the DTD for each of the four specifications tries to tidy up some parts of HTML that don't measure up to SGML standards.

Unfortunately, some browsers still in common use were written when HTML 1.0 was in effect. Under this specification, the <P> tag separated paragraphs. But now <P> is a container. Suppose that you write your HTML to pass a Strict Level 2 test. You will find that an HTML 1.0-compliant browser displays a line break between a list bullet and the text that should follow it.

Tip
Don't use Strict conformance to check your pages, and don't modify your pages to comply with Strict HTML unless you are sure of the browsers that users will employ to display your page.

WebTechs provides on-line copies of the formal specifications and the DTDs for both HTML versions and for Netscape and HotJava. You can find the strict DTDs here as well. All the strict DTDs enforce four recommended idioms; the first of these is that no text appears outside paragraph elements.

Having no text outside paragraph elements means that all document text must be part of a block container. Table 19.4 shows right and wrong according to Strict Mode. Please note that the source code on the left is different from that on the right. The difference is subtle: on the left, the paragraph containers are properly used while on the right, no paragraph containers are used at all.

Table 19.4  The Ways Strict Mode Identifies Valid Paragraph Text

Paragraphs Valid in Strict Mode       Paragraphs Not Valid in Strict Mode
<HTML>                                <HTML>
<HEAD>                                <HEAD>
<TITLE>Passes Strict Test</TITLE>     <TITLE>Fails Strict Test</TITLE>
</HEAD>                               </HEAD>
<BODY>                                <BODY>
<P>First Line</P>                     First Line
<P>Veni, vidi, vici.</P>              Veni, vidi, vici.
<P>Last Line</P>                      Last Line
</BODY>                               </BODY>
</HTML>                               </HTML>

Why is this important? Browsers that are HTML 2.0 or 3.0 compliant will display both examples in table 19.4. In the case on the left, each paragraph container of text will be shown on a separate line on the screen, with one line space before and after the text. In the case on the right, all the text will be shown on a single line.

In addition, both examples will pass a simple HTML 2.0 or 3.0 validation by WebTechs. Only the one on the left will pass a Strict Mode validation, however.

It might seem desirable to always use the Strict Mode, to ensure that browsers will always correctly interpret your source code and display your page the way you intended. However, as noted before, the container elements required to pass Strict Mode may cause HTML 1.0 compliant browsers and text browsers to display your page in ways that you never anticipated.

Even if you know that your page will not be accessed by any HTML 1.0 compliant browsers, you may still not want to use Strict Mode for checking. Table 19.5 shows the container elements for Strict HTML 2.0 and the additional elements for Strict HTML 3.0. WebTechs rejects any others when the Strict level of conformance is chosen. If you are using extensions that provide container elements other than these, your source code may not pass a Strict test (see the example following table 19.5). This does not mean the code won't be readable to browsers; it just means it didn't pass the test. You will then spend time, maybe a lot or maybe a little, checking the error report from WebTechs line by line looking for the guilty party, without success.

Table 19.5  Container Elements under Strict HTML Rules

Valid under Strict HTML 2.0     Add These Elements for Strict HTML 3.0
<P>, <BLOCKQUOTE>, <PRE>        <TABLE>
<DL>, <UL>, <OL>                <FIG>
<FORM>                          <NOTE>
<ISINDEX>                       <BQ>
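To take one example from the table, <BQ> is valid only under Strict HTML 3.0. The following fragment (an illustrative sketch) would pass a Strict Level 3 check but fail a Strict Level 2 check, even though many browsers would display it without complaint:

<!-- Passes Strict HTML 3.0; fails Strict HTML 2.0, where <BLOCKQUOTE> is the valid form -->
<BQ>Veni, vidi, vici.</BQ>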

What rule of thumb should you draw from all this? Just one: When deciding whether to test with Strict Mode selected, be guided by the KISS Principle (Keep It Simple, Simon).

Selecting Responses to Include in Report

After you have set the level of conformance that you want to establish for your source code, WebTechs gives you some options about the report contents, as shown in figure 19.4.

Figure 19.4 : The report options determine what you see in the report from WebTechs.

The basic report that WebTechs sends to you will be either a message that says Check Complete. No errors found or an actual list of the errors found. The errors, of course, reflect the level of conformance you chose in the first row of radio buttons. Under some circumstances, you can get an erroneous No errors found message. The options you select for the report can help you spot these errors and help you make sense out of the error listing.

The error listing that WebTechs returns refers to error locations by using line numbers. If you check the box by Show Input, your HTML source follows the error list, with line numbers added. This report can be very helpful.

You can get additional help in interpreting the error listing by selecting Show Parser Output. This option appends a detailed list of the way WebTechs parsed your source.

Finally, by selecting Show Formatted Output, you can have WebTechs append what it tested, formatted according to the DTD you chose. This report is useful in case you enter an URL incorrectly. If you do, WebTechs gets an error message when it tries to connect to that URL. WebTechs handles some of these messages well, but not all of them. In particular, if a typo causes an Error 302 ("Document moved") message to be returned, WebTechs parses the error message and returns the report Check Complete. No errors found. If you checked Show Formatted Output, you see the actual Error message in addition to the incorrect report and therefore avoid being tricked into thinking it was your page being validated.

Testing an Entire URL or Pieces of HTML Source

If you have an existing page on the Web that you want to test, enter the URL in the text box below the banner "Check Documents by URL." In fact, you can test several documents at the same time. Just enter the URLs, one per line, including the http:// part.

Caution
If a file has many problems, the SGML parser stops after about 200 errors. This means the validation service stops as well and will not validate any remaining URLs.

If you want to test only a section of HTML source, you can paste it into the text box provided for this purpose. In either case, WebTechs applies the Level and gives you the Responses you specified in the preceding sections.

Note
WebTechs is probably the most comprehensive of the verification services on-line. Its reports can also be the most difficult to understand. For that reason, you should become familiar with the FAQ (Frequently Asked Questions) File for the service. This tool is maintained by Scott Bigham at http://www.cs.duke.edu/~dsb/wt-faq.html.

Using Doctor HTML

Although using WebTechs is an excellent way to verify that your HTML source is everything it should be, WebTechs does not check your links. Most of the systems designed to check links run only on your own server. Fortunately, if you don't have a server, you can use Doctor HTML.

Doctor HTML is different from the other tools addressed in this chapter. To begin with, it examines only Web pages; it won't take snippets of HTML for analysis. But it also provides services not found in the other tools.

Doctor HTML performs a number of functions, as you can see in figure 19.5. Some of these functions overlap with the other HTML verifiers. But the most important reasons for using Doctor HTML are to get verification that all the hyperlinks in your document are valid and to get specific advice concerning optimization of your page performance.

Figure 19.5 : You can use this form to order Doctor HTML's tests.

Doctor HTML is located at http://imagiware.com/RxHTML.cgi. The Doctor performs a complete Web site analysis, according to your specifications. The strengths of the program are in the testing of the images and the hyperlinks, functions not found in other verification services. Be sure to read the test descriptions; no separate FAQ is available.

The Doctor provides you with a report that is built "on the fly." It contains one section for each test you specified, and a summary. You are presented with the summary first, and from it you may select the individual test report sections. As an example, the three figures that follow are individual test report sections. These were returned in response to the request in figure 19.5 for examination of the Macmillan Information Superlibrary™ on the Web.

Testing Links

The hyperlinks test checks for "dead" hyperlinks on your page. The resulting report indicates whether the URL pointed to is still present or if the server returns an error, as shown in figure 19.6. The report also tells you how large the destination URL is; if you get a very small size for a destination, check it by hand to determine whether the server is returning an error message.

Figure 19.6 : This typical report from Doctor HTML describes the hyperlinks found in a document.

Note that even when the report says a link is apparently valid, the page it points to is not necessarily what it was when you set up the link. You should use the URL-Minder service, described in "Using Other Verification Services on the Web," to track changes to the pages your links identify. The Doctor uses a 10-second time-out for each link test; slow links may time out, and you will have to test them individually.

Fine-Tuning Page Performance

To tweak your page performance, you get maximum results from fixing image syntax, reducing image bandwidth requirements, and making sure that your table and form structures are right. The Doctor provides a wealth of information in all these areas.

This special report identifies images that take an excessive amount of bandwidth and that load slowly, as shown in figure 19.7. It also gives the specific image command tags to set to improve overall page performance (see fig. 19.8).

Figure 19.7 : Doctor HTML's report on images is helpful in identifying any picture that is slowing down your page.

Figure 19.8 : These image command tags require resetting, according to Doctor HTML.
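Much of this tuning comes down to describing each image fully in its tag. The sketch below shows the general idea; the file name and pixel dimensions are hypothetical, and the WIDTH and HEIGHT values must match the actual image:

<!-- WIDTH and HEIGHT let a browser lay out the page before the image arrives;
     ALT keeps the page usable in text browsers such as Lynx -->
<IMG SRC="banner.gif" WIDTH="400" HEIGHT="60" ALT="Company banner">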

Using Kinder, Gentler Validator

The Kinder, Gentler Validator (sometimes called simply KGV) is a newer tool for validating HTML source and Web pages. You will find KGV at http://ugweb.cs.ualberta.ca/~gerald/validate.cgi. It provides informative reports, even pointing to the errors it detects. Figure 19.9 is an example of just how helpful KGV can be.

Figure 19.9 : This figure shows an example of the helpful reports provided by the Kinder, Gentler Validator.

Although KGV's reports are easier to interpret than those from WebTechs, you should obtain the KGV FAQ, which explains the more impenetrable messages that still appear.

Note
The FAQ for Kinder, Gentler Validation by Scott Bigham is at http://www.cs.duke.edu/~dsb/kgv-faq.html.

KGV is very similar in some respects to WebTechs; both of them completely parse your HTML source code. Both obey the rules of the HTML language definition to the letter and both are based on James Clark's SGML parsers.

But there is at least one big difference. KGV expects that your document will either be HTML 2.0 conformant, or that it will have a DOCTYPE declaration on the first line. If KGV doesn't find a DOCTYPE, it assumes the document is supposed to be 2.0 conformant. No nice row of radio button selections here!

KGV also has a public identifier catalog, located at http://ugweb.cs.ualberta.ca/~gerald/validate/lib/catalog. This is a longer and more complete public identifier list than WebTechs's. All the warnings given under the WebTechs description about using the correct DOCTYPE and about spelling errors apply to KGV as well.

The interface for KGV (see fig. 19.10) is a bit simpler than the one for WebTechs, as you might expect. You have the option to include an analysis by Weblint, another verification tool that is discussed in the next section of this chapter.

Figure 19.10: The Kinder, Gentler Validator interface is simple but complete.

Notice that KGV provides two additional types of output. These may be helpful when dealing with difficult problems. Show Source Input displays the HTML text with line numbers. Show Parse Tree shows you how KGV parses your file. These are similar to the WebTechs options Show Input and Show Parser Output.

Finally, Kinder, Gentler Validator provides an icon when your source code passes its test, just like WebTechs. You can paste the snippet of code that KGV provides into your document so that all who view it know you build righteous HTML.

Using Weblint

Weblint takes a middle ground in HTML verification. One of its strengths is that it looks for specific common errors known to cause problems for popular browsers. This makes it a heuristic validator, as opposed to KGV and WebTechs, which are parsers. "Heuristic" simply means that it operates from a set of guidelines about HTML style.

Weblint performs 22 specific checks. It is looking for constructs that are legal HTML but bad style, as well as for mistakes in the source code. Here is the list, as shown by UniPress (Weblint's publisher) for Weblint v1.014:

On the other hand, Weblint misses some outright errors from time to time. One reason that KGV offers the option of showing Weblint's findings about a Web page is to provide style feedback that WebTechs does not give. If you routinely use WebTechs, you should make it a habit to also run your page by Weblint; or switch to KGV and always take the Weblint option. By using both a parser and a heuristic verifier, you will spot many problems that would otherwise be missed if you used only one or the other.
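For instance, the following fragment (an illustrative sketch) is legal HTML, yet a heuristic checker such as Weblint will typically flag it for style, because the image has no ALT text for nongraphical browsers and the headings jump from level 1 to level 3:

<H1>Welcome</H1>
<!-- Legal but poor style: no ALT attribute, and <H3> follows <H1> with no <H2> between -->
<IMG SRC="logo.gif">
<H3>About This Site</H3>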

You can access Weblint on the Web in three places. One is http://www.unipress.com/weblint/; this is the publisher's site. Another is http://www.khoros.unm.edu/staff/neilb/weblint/lintform.html. Figure 19.11 shows the latter interface. Finally, a Weblint Gateway has recently been opened to provide a very streamlined way to obtain verification of your Web page: http://www.cen.uiuc.edu/cgi-bin/weblint.

Figure 19.11: The Weblint interface is another simple design; you may enter either an URL or HTML code.

With Weblint, as with WebTechs, you can either submit the URL of a page to be verified or enter HTML directly into a text box. In the reports, you have the options of seeing the HTML source file (automatically line-numbered) and of viewing the page being checked. You also can have either Netscape or Java extensions checked.

Like KGV's, Weblint's reports are easy to understand, whichever site you use (see fig. 19.12). However, the reports are not as comprehensive as those provided by WebTechs or KGV.

Figure 19.12: Weblint provides an easy-to-read, brief report.

Using an All-in-One Verification Page

Wouldn't it be nice if you could do all your verification from one place, instead of having to run from one verification site to another? Well, you nearly can. Harold Driscoll, Webmaster for the Chicago Computer Society, has assembled a page at http://www.ccs.org/validate. This page will save you a lot of work (see fig. 19.13).

Figure 19.13: The Chicago Computer Society's Suite of HTML Validation Suites.

The Suite of HTML Validation Suites page includes forms that check your page using the three most popular validation services (Kinder, Gentler Validation, Weblint, and WebTechs). Fill in the URL you want checked, and select the switch settings you want. The page returns all your reports in the same format.

In addition, this one-stop service includes forms for several other tools. A spell checker (WebSter's Dictionary) returns a list of any words that it does not recognize on your page. The Lynx-Me Text Browser Display shows you what your Web page looks like to viewers using the text browser Lynx. The HTTP Head Request Form and a form titled Display Typical CGI Environment Strings can help when you are writing and debugging CGI programs and scripts. And finally, another form makes it easy to register with the URL-Minder service (see the section on URL-Minder in "Using Other Verification Services on the Web").

Troubleshooting
I notice that each verification service seems to report different problems when I submit the same URL to all of them. What can I do about this?
Always use a combination strategy when checking an URL. That is, use one of the syntax checkers (WebTechs or KGV, but not both) and one of the heuristic checkers (Weblint or its alternate at the U.S. Military Academy, described in the next section). By using both types of checkers, and only one of each, you will cut down on the apparent contradictions. Consistency in the way you do your checks is very important.
Where can I find an explanation of the error messages in WebTechs and KGV reports?
Both of these verifiers use the error messages provided by their SGML parsers. The most comprehensive list and explanation is in the FAQs by Scott Bigham referred to in previous sections.

Using Other Verification Services on the Web

It pays to look for other verification services; a large number of them are on the Web. Perform a search on the keywords verification service, or use other search tools besides Yahoo. I found the services in table 19.6 this way.

I use these services mainly as a backup. The more popular services are sometimes busy, and you can't get onto them. The Slovenian site for HTMLchek, Brown University, Harbinger Net Services, and the U.S. Military Academy, all discussed in this section, are good alternatives.

Finally, the URL-Minder service can be a true blessing to the person with too many links to maintain. It provides you with a way to know when a change occurs to a page that one of your own pages references.

Table 19.6  Other Verification Services on the Web

Service Name             URL
Slovenian HTMLchek       http://www.ijs.si/cgi-bin/htmlchek
U.S.M.A. (West Point)    http://www.usma.edu/cgi-bin/HTMLverify
Brown University         http://www.stg.brown.edu/service/url_validate.html
Harbinger                http://www.harbinger.net/html-val-svc/
URL-Minder               http://www.netmind.com/URL-minder/example.html

Using HTMLchek

HTMLchek is an interesting tool put together at the University of Texas at Austin. The on-line version, however, is offered at a site in Slovenia (http://www.ijs.si/cgi-bin/htmlchek).

HTMLchek does syntax and semantic checking of URLs, against HTML 2.0, HTML 3.0, or Netscape DTDs. It also looks for common errors. It is another heuristic verifier and can be used as an alternative to Weblint.

HTMLchek returns reports that are not as well-formatted or easy to read as Weblint's. However, they report approximately the same kinds of problems, to the same level of detail. There is no FAQ file for the Slovenian site, but full documentation is available for download at http://uts.cc.utexas.edu/~churchh/htmlchek.html.

Using the U.S. Military Academy's Verification Service

Figure 19.14 shows the HTMLverify service offered by usma.edu (that's the U.S. Military Academy at West Point, in case you aren't an alum). The URL for the service is http://www.usma.edu/cgi-bin/HTMLverify. You can enter the URL of your page, or you can paste HTML source into the window. The system checks whatever you enter or paste against plain-vanilla HTML 2.0 standards alone. You can choose to have it include a check against the Netscape extensions as well.

Figure 19.14: HTMLverify is a basic HTML verification service, offered by the U.S. Military Academy at West Point.

HTMLverify is actually an interface to a modified version of Weblint, so it's another heuristic checker. If you enter an URL in the first text box and then click the Verify button at the bottom of the form, you get a report of any problems Weblint found with the HTML source. This report may look something like the one shown in figure 19.15. As with any automatically generated report, not every error reported is really an error. However, the report does generate a worthwhile list of items.

Figure 19.15: The HTML Verification Output from HTMLverify for the Web page indicates a few problems with the source.

Using Brown University's Verification Service

Brown University's Scholarly Technology Group (STG) maintains a verification service at http://www.stg.brown.edu/service/url_validate.html. This is about as simple an interface as you will see anywhere. It consists of a text box, where you enter the URL to verify. You select the DTD to use from a pull-down list; this includes Netscape 1.1 (default), HTML 2.0, HTML 3.0, and TEI Lite. You can check a box to ask for a parse outline, and then you click the Validate button.

The output is similar to WebTechs's in its level of obscurity, but it seems to be complete. It is very fast. Like WebTechs, the STG's service is a parser, so it would be a good alternative to WebTechs or to KGV. There is no FAQ.

Using Harbinger's Verification Service

This is a site where the WebTechs HTML Check Toolkit has been installed and made available on the Web. The interface is an exact duplicate of the WebTechs site. The use of the tool and the reports it returns are also exactly the same in every respect.

This service was formerly located at Georgia Tech, but moved with Kipp Jones to Harbinger Net Services. You will find the verifier at http://www.harbinger.net/html-val-svc/.

Using URL-Minder

This isn't exactly a verification service, but it can be a great help to you in keeping your links updated and the dead links pruned. The URL-Minder service notifies you whenever there is a change to URLs to which you have embedded links on your page. You register your e-mail address and the other pages with URL-Minder at http://www.netmind.com/URL-minder/example.html. (This address also takes you to a complete description of the service.) The service sends you e-mail within a week of any changes to the pages you specify.

You can also embed a form on your page that readers can use to request notification from URL-Minder whenever your page changes. You can set this up so that customers get either a generic message or a tailored one.
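The form itself is ordinary HTML that sends the reader's e-mail address and your page's URL to the URL-Minder server. The sketch below shows only the general shape; the ACTION address and field names here are hypothetical, so copy the real ones from URL-Minder's example page:

<!-- Illustrative only: take the real ACTION URL and field names from the URL-Minder example page -->
<FORM METHOD="POST" ACTION="http://www.netmind.com/cgi-bin/url-minder">
Your e-mail address: <INPUT NAME="email" SIZE="30">
<INPUT TYPE="hidden" NAME="url" VALUE="http://www.foo.com/goo.html">
<INPUT TYPE="submit" VALUE="Tell me when this page changes">
</FORM>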

Troubleshooting
I get so many errors from some of these verification services, where should I begin fixing problems?
Most of the verifiers will return more than one error statement for each actual error. In addition, if there are a lot of errors, the verifier may become confused. The best strategy is to fix the first few problems in the report, then resubmit the URL or source code for checking. This tends to very quickly reduce the number of errors reported.
I'm really having trouble understanding these terse error statements. Where can I get help?
If the verifier offers the option, try running in "pedantic" mode. This will give you longer explanations.

Installing the WebTechs HTML Check Toolkit Locally

The WebTechs Validation Service is the definitive HTML-checker on the Web. A version of the software has always been available for installation on local servers, but it wasn't always easy to obtain. It also was not easy to install successfully.

WebTechs has solved these problems with its HTML Check Toolkit. WebTechs now offers an interactive on-line service whereby you specify the type of operating system you are running, the directories in which the software is to be installed, and the type of compressed tar file you require. The WebTechs server will build a toolkit tailored to these specifications and download it to you. It also builds a set of installation and testing instructions tailored to your system.

To install and use the toolkit, you need about 500K of disk space, and one of the following 24 operating systems (others are being added):

To obtain the toolkit, go to the WebTechs home page at http://www.webtechs.com and choose the link "HTML Check Toolkit." From that page, after reading any updates to the information you see in this book, choose "Downloading and Configuration." You're on your way to HTML verification from the comfort and convenience of your own server. When you are finished, you will be able to type html-check *.html and get a complete validation of your HTML files.

Obtaining and Installing Other Verification Suites

You can download three of the other tools discussed in this chapter and install them on your own server. A number of other tools are available as well. Several of these are listed in table 19.7.

Table 19.7  Verification Tools Available from Web Sites to be Run on Your Server

Tool               Function                   Source
Weblint            Checks syntax and style    http://www.khoros.unm.edu/staff/neilb/weblint.html
HTMLchek           Syntax checker             http://uts.cc.utexas.edu/~churchh/htmlchek.html
HTMLverify         Weblint interface          http://www.usma.edu/cgi-bin/HTMLverify
MOMspider          Robot link maintainer      http://www.ics.uci.edu/WebSoft/MOMspider
Webxref            Cross-references links     http://www.sara.nl/cgi-bin/rick_acc_webxref
Verify Web Links   Checks validity of links   http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html
Ivrfy              HTML link verifier         http://www.cs.dartmouth.edu/~crow/Ivrfy.html

Tip
In most cases, Frequently Asked Questions (FAQ) or README files accompany the scripts for these programs.

Nearly all of these are perl scripts, but not all require that your server run under UNIX. For example, HTMLchek will run on any platform for which perl and awk are available, including the Mac and MS-DOS.

After you download and install the script for the program of your choice, your server can run your maintenance program for you. Most of the programs run from the command line and report directly back to you. Some of the tools will e-mail the reports to you or to whomever you designate.

Obtaining and Using Weblint

Weblint is available at no charge via anonymous ftp from ftp://ftp.khoral.com/pub/weblint/, as a gzip tar file or a ZIP archive for PC users. The tar file (weblint-1.014.tar.gz) is 46K; the ZIP file (weblint.zip) is 53K. Neil Bowers <neilb@khoral.com> is the owner of the program and welcomes your comments, suggestions, and bug reports.

The program is also supported by two e-mail lists. Announcements for new versions are made via weblint-announce@khoral.com. Discussions related to Weblint and prerelease testing are carried on via weblint-victims@khoral.com. E-mail Neil Bowers to be added to either list, or to obtain details of system requirements for Weblint.

Obtaining and Using HTMLchek

HTMLchek, when run on your own server, will perform more functions than the version available over the Web. Specifically, it will check the syntax of HTML 2.0 or 3.0 files for errors, do local link cross-reference checking, and generate a basic reference-dependency map. It also includes utilities to process HTML files; examples include an HTML-aware search-and-replace program, a program to remove HTML so that a file can be spell-checked, and a program that makes menus and tables of contents within HTML files.

HTMLchek runs under perl and awk but is not UNIX-dependent; it can be run under any operating system for which awk and perl are available. This would include MS-DOS, Macintosh, Windows NT, VMS, Amiga, OS/2, Atari, and MVS platforms.

HTMLchek is available at no charge via anonymous ftp (use your e-mail address as password) from ftp://ftp.cs.buffalo.edu/pub/htmlchek/. The files are available as htmlchek.tar.Z, htmlchek.tar.gz, or htmlchek.zip. Download the one that suits your platform. The documentation can be browsed on line over the Web from http://uts.cc.utexas.edu/~churchh/htmlchek.html. Other ftp sites from which the program can be obtained are listed in the documentation, under the heading, "Obtaining HTMLchek." These alternatives include the Usenet (comp.sources.misc archives), Uunet, and one site in Germany.

HTMLchek is supported by the author, H. Churchyard, at <churchh@uts.cc.utexas.edu>.

Obtaining and Using HTMLverify

Erich Markert, the webmaster at the Academy, has authorized downloading of the perl CGI script for HTMLverify. All you need do is click the button marked "Source" at the bottom of the HTMLverify form (http://www.usma.edu/cgi-bin/HTMLverify) to obtain the perl script. Clicking the "About" button will bring you the details of installation.

In addition to the source code for HTMLverify, you will need perl 5, Lynx version 2.3.7, Weblint (Markert offers his modified version), Lincoln Stein's CGI Module, and Markert's HTML module. All of these except for Lynx are available from the USMA site.

HTMLverify may be the easiest of all the verification checkers to obtain and install.

Obtaining and Using MOMspider

MOMspider is a freeware robot designed to assist in the maintenance of distributed hypertext infostructures. When installed, MOMspider will periodically search a list of webs provided by you. It looks for four types of document change: moved documents, broken links, recently modified documents, and documents about to expire. MOMspider builds a special index document that lists these problems when found, plus other information you requested. MOMspider will report directly to you or by e-mail to any address you provide.

MOMspider requires perl 4.036 and runs on UNIX-based systems. You will need to customize the perl script for your site. You obtain MOMspider, with installation notes, configuration options, and instruction files, from http://www.ics.uci.edu/WebSoft/MOMspider. You can also obtain it via anonymous ftp from ftp://ftp.liege.ics.uci.edu, in the directory /pub/arcadia/MOMspider. A paper describing the MOMspider and its use can be obtained from http://www.ics.uci.edu/WebSoft/MOMspider/www94/paper.html.

Obtaining and Using Webxref

Webxref is a perl program that makes cross-references from an HTML document and the HTML documents linked from it. It is designed to provide a quick and easy check of a local set of HTML documents. It will also check the first level of external URLs referenced by the original document.

When the program has run, it prints a list, with direct and indirect references, of items it found in the file in 17 different categories, including:

You can download Webxref directly from the author at http://www.sara.nl/cgi-bin/rick_acc_webxref. The author is Rick Jansen, and you can contact him by e-mail at <rick@sara.nl>.

Obtaining and Using Ivrfy

Ivrfy is a freeware shell script that verifies all the internal links in HTML pages on your server. It also checks the inline images in the documents. Ivrfy is slow; the author reports that it can process 10,000 links to 4,000 pages in an hour and a half on a Sparc 1000 with dual 75MHz CPUs.

Ivrfy assumes that you have five programs in your path: sed, awk, csh, touch, and rm. Obviously, this means Ivrfy is a UNIX-only program. Ivrfy is not secure and should not be run as root. The script requires customization to specify the name of the server in use, the server's root directory, and three other variables. These are all identified in the README found on the Ivrfy Web page.

Ivrfy is executed from the command line. It reports back the links for which pages were successfully found, those for which the links are broken, and those for which the link was an HTTP link to another server. Broken links include nonexistent pages, unreadable pages, and server-generated index pages. There are a few known bugs and these are all listed in the README.

Download the Ivrfy script from http://www.cs.dartmouth.edu/~crow/Ivrfy.html. The author, Preston Crow, can be reached by e-mail at <crow@cs.dartmouth.edu>.

Obtaining and Using Verify Web Links

Enterprise Integration Technologies Corporation is in the process of developing a Webtest tool suite for its Web Starter Kit. One part of this suite is a link verifier for use by server administrators. It will aid in maintaining links within documents managed at a site. The link verifier tool starts from a given URL and traverses links outward to a specified limit. The verifier then produces a report on the state of the discovered links.

In its present form, the link verifier verifies only http: HREFs in SRC, A, FORM, LINK and BASE tags. It does not verify non-HTTP links (gopher, ftp, file, and so on). This is planned for the future. The verifier will exercise links to remote servers, but it does not attempt to examine the contents of the documents on those servers. Among other interesting features, the verifier can send reports to the administrator by e-mail, and will verify form POST actions. The tool does try to use bandwidth well; it uses HEAD requests on image data and remote documents.

The link verifier tool can be downloaded by anonymous ftp from ftp://ftp.eit.com/pub/wsk/<OS_TYPE>/webtest/verify_links.tar. The <OS_TYPE> must be one of the following: sunos (for 4.1.3), solaris (for 2.3), irix, aix, or osf1. No other platforms are supported at this time. A description of the tool is available at http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html.

Public Recognition for Quality Web Pages

All the verification services discussed to this point in this chapter ensure that your Web page makes sense to the Web browsers. Using valid HTML is only one of several factors in creating a quality Web page. What makes a great page is valid HTML plus outstanding content, attractive presentation, elegant layout and style, and a certain je ne sais quoi. To master these elements, considering what other Webmasters have done to create exemplary Web pages is very useful. The various recognition services can be of great use in this area.

Several recognition services appear on the Web. Many of them seem to focus on identifying the "cool" sites. Some of the "cool" sites have so many awards, plaques, badges, and other meritorious graphics displayed that they appear to have had a plate of fruit salad spilled on them.

Being cool is fine, but not necessarily a sign of quality that endures and attracts customers with money to spend (if that is your aim). Being cool is a fashion statement for the day, and perhaps you are looking for something a little more enduring. Finally, being cool and being worth a second read may be different concepts.

So how does a Web author who aspires to quality, like yourself, find the paragons of taste and utility? Two awards have distinguished themselves for their ability to pick enduring winners. They are High Five and Point Top Five. You can and should study sites that have received these honors, with the confidence that such sites had to meet extraordinarily stringent standards.

High Five

You may have seen this icon on a few especially elegant pages on the Web. The High Five Awards Committee gives this plaque to one well-designed site a week. Any site that displays this icon has been selected on the basis of design, conception, execution, and content, with an emphasis on clear information design and aesthetics. High Five (http://www.highfive.com) is sponsored by David Siegel and sustained by the efforts of his six interns. They reside in Palo Alto, California.

No matter who the other person or persons on the High Five Awards Committee may be, the guiding light is David Siegel. David is a type designer, typographer, writer, and Web site designer. He has some very definite ideas about what is good in Web site design and what is not.

Because David is a graphic designer, you will find that his ideas about quality are different from what many HTML mavens define as quality. For example, many SGML and HTML purists don't much care for Netscape. David believes that Netscape lets him do more of the things he wants to do. He does not feel obligated to make pages that are optimized for all browsers.

As a technical person who has also been a calligrapher for many years, I like what David Siegel does in his page designs. Before you make up your mind about Siegel's philosophy, take a look at the pages that receive the High Five. Let your eyes tell you what they like instead of being guided solely by what the HTML rulebook says.

Spending some time on David Siegel's Web site, the Casbah at http://www.dsiegel.com/, would be well worth your while. David provides an informative set of Tips for Writers and Designers, which includes some invaluable help with layout via the "single-pixel GIF trick." You will also like his tip on using images well.

Understanding High Five's Criteria

If you look through David Siegel's gallery of past winners, you are going to see some beautiful, effective Web pages. To understand why they work and how to make yours look like them, consider the three High Five criteria. High Five awards a perfect page five points in each of the following categories: Degree of Difficulty, Execution, and Aesthetics.

These three criteria have equal weight (in theory), and they are all subjective. You may want to read the critiques of past winners to get a handle on the meaning of each term and what each one contributes to the final appearance of a page. Reading Siegel's essays "Severe Tire Damage" and "The Balkanization of the Web" may also help.

The High Five page itself also provides some further hints. It is pretty clear that four things will rule out a page from consideration: table borders, Netscape backgrounds, GIFs that interfere with the message, and general ugliness.

The whole point to High Five and Siegel's Web site is that you, as a designer, should not just accept the way HTML tries to get you to make your pages look. You are designing pages to be read by human beings, not by Web browsers. What a human being sees and how a human being responds to what is seen is informed by thousands of years of culture and individual experience with books and art. You aren't going to change or get past that human bias with one more example of default layout. To be successful and rise above the gray mass of most cyber-publishing, appeal to the aesthetics and culture of your reader.

Obtaining Recognition by High Five

You can submit your Web page to the High Five Awards Committee for consideration. The instructions are in David Siegel's Frequently Asked Questions file, and guidelines appear on the High Five page. Read them thoroughly, along with the rest of the information on the Casbah and High Five sites.

As Siegel reminds you several times, High Five is the Carnegie Hall of Web page awards. You won't get there overnight. But when you think your site is ready, submit it by sending e-mail to submissions@highfive.com. David's interns will review your site first, and if it passes their scrutiny, they will bring it to David's attention. If it also passes David's scrutiny, he will work with you to polish your page to meet his standards.

Siegel also responds to e-mail questions about page design. Read the FAQ to find out what will catch his attention.

Another, more difficult, way to be recognized is to send up to three URLs to interns@highfive.com, along with a message about yourself. If one of the sites you submit is good enough to qualify as a High Five, Siegel will also take a look at your site.

Point Top Five

You've probably seen this icon also, but on a larger number of Web sites. This icon indicates the Point Top Five Survey award. Point also maintains a set of lists of "top tens" in a number of fields.

The HTML verification services and High Five measure Web pages against particular set standards of perfection. Point takes a different approach and tries to measure Web sites with a utilitarian scale: how good is a site from the user's point of view?

Point is a fairly large Internet communications company located in New York. ("Fairly large" is a relative term; in this case it means large enough to maintain a staff of up to 24 Web site reviewers.) Point's Web Reviews give descriptions and ratings of the top five percent of all World Wide Web sites. They consider it their mission to be a guide to the "good stuff."

The home page for Point is at http://www.pointcom.com/; from there you can get to its Top Ten list and other features. One of the first things you should grab is the FAQ file, which gives all the details about Point's award system.

Unlike High Five, Point never offers a critique of your page and does not work with award winners to help improve their products. You submit your page and wait. If the page isn't reviewed and awarded, wait a few months and notify the editors when you have added new material on your page.

Understanding Point's Criteria

Although High Five looks for aesthetic perfection, Point works hard at identifying "the best, smartest, and most entertaining sites around." In addition to the large staff of reviewers, Point considers self-nominations and nominations that it receives from Web surfers to locate sites for review.

Web sites are rated on 50-point scales against three criteria: Content, Presentation, and Experience. To be more specific, here are the official descriptions:

Point reviews each page at least four times a year, and it removes sites that have fallen to lower standards. The reviewers give the Top Five award to any page that meets the excellence criteria, whether the page is commercial, private, or student-run.

Obtaining Recognition by Point

You can submit your own page for review by using the Write Us form on the home page, or you can e-mail the URL and a description of the site to submit@pointcom.com. You are notified only if you are awarded a Top Five. If you don't hear from Point, resubmit your page at a later time.

Once a page is recognized, Point places it among the other winners in its category. Newly reviewed sites also appear in "New & Noteworthy," a daily feature on Point's home page. Finally, the best of the best are added to the Top Ten lists; they are the top ten sites in each category in the Point review catalog.

Learning Standard Web Practices from Other Developers

One of the best resources you could ever hope for comes in the form of other Web developers. Many other people have been through the process of developing a Web site into a thing of beauty, value, or usefulness. When you see a Web site or a page that you really like, drop the Webmaster or the page owner a note to say how much you enjoy the creation. If you ask a polite question or two about how that author did something, you'll most likely get an answer.

You can find other Web developers in many Usenet newsgroups and mailing lists. Here are some of the best:

Newsgroups

alt.fan.mozilla
alt.hypertext
comp.infosystems.www.authoring.cgi
comp.infosystems.www.authoring.html
comp.infosystems.www.authoring.images
comp.text.sgml

Mailing Lists

HTML Authoring Mailing List (see http://www.netcentral.net/lists/html-list.html)
NETTRAIN Mailing List

You can also find plenty of pages and other features that give you good advice about page design. Here are three of the best: