1

Introduction and Overview


It's more than a little bit intimidating to take on the task of writing an intermediate-to-advanced-level book on the subject of World Wide Web programming. The rate at which new ideas and techniques are being implemented on the Web is nothing less than exponential. Each new idea or technique has the potential to add to or enhance the vast structure that already is the Web, or to refine that structure, making it less cumbersome and easier to understand. Some ideas are adopted as standards, and others fizzle and die lonely little deaths, for lack of use or interest. Keeping up with this process, and giving you, the reader, a feel for which of these ideas and techniques seem to be the most usable, safe, and powerful, is what this book is all about.

Now, with all of that said, we're going to refine our task (thankfully) to the exploration and discussion of Web innovations related to Perl, the general purpose scripting language which is widely used and loved by many people all over the world. We'll also generally limit this work to discussions and examples which implement the features and functionality of Version 5 of Perl, which is the latest and greatest release.

So, that would seem to carve out a nice little niche, yes? Even though this seems like a relatively small development space to confine one's research to, it in fact includes the vast majority of the Web tools and innovations which are being used out there. Perl, and specifically Perl5, is in fact the cornerstone of the interactive Web as it exists today. The many ways that people use Perl to implement the features and functionality of their Web pages are as dynamic and as subject to innovation, enhancement, errors, and lonely deaths as any other component of the Web in general, and probably even more so.

"So why bother?" you might ask. If any such undertaking might be outdated and possibly even moot, by the time it is finished, then what's the point in attempting it in the first place? We'd have to answer that we've asked ourselves this question quite often in the research and preparation for this book. Maybe that fact is sufficient to justify the undertaking. Maybe not. Our primary motivation is simply that, at the present time, there just aren't a lot of books out there which explore the capabilities of Perl5 as it is used in Web programming. There are two primary reasons why this is true. The first is that Perl, as a programming language, wasn't designed with the Web in mind at all. It's been around for a long time, at least in "Internet Years," but its first and primary function was, and still is, according to its author, Larry Wall, "Text processing the UNIX way." In fact, that's where it got its name, as PERL stands for Practical Extraction and Report Language.

The first release of Perl5 just happened to coincide with the explosion in popularity of HTTP and HTML, and now everybody seems to have a Web page. And what's a Web page without a little CGI to give the browser some entertainment, or gather some data, or, yes, even make a sale or two? Now, there are plenty of ways to write a CGI script, but the most popular, for its ease of use, security features, and now with version 5, reusable modules, is Perl.

As we've said, the Web is dynamic, and always changing, especially in these, its early years. It's hard to believe sometimes, at least for me, but the Web is still but a toddler as a technology and protocol. There are a lot of things which are still being decided at high levels, to add functionality, reduce bandwidth, and generally enhance the specifications of HTTP and CGI. Some commercial entities, like Netscape, have even taken it upon themselves to implement features and functionality in their own browsers and servers which haven't been formally adopted. This has generally been a good thing, from our standpoint, but it does fly in the face of the long-standing tradition of discussing new techniques and features before adopting them into any protocol. Regardless, given their market share, they do seem to have some leeway.

Similarly, Perl is a dynamic entity. The new object-oriented features and capabilities which come with Perl5, and the tremendous amount of work which has been done to design and implement reliable "class libraries," or modules, to take advantage of these features, can definitely be considered a "work in progress." Enhancements, patches, and new features are coming along almost daily. Occasionally this may lead to a bit of confusion, and sometimes incompatibility, with the current status or version of a given Perl5 module. It's sort of up to you, as a Perl user, to try to keep up with the latest changes. It's up to us, as the authors, to try to help you understand how to do that.

We don't expect this book to go through umpteen reprints, as a truly lasting bit of prose or a high-minded mathematics text might. We do expect to be able to look back when we're done and be able to say that, as of that date, we wrote about the most widely used and accepted techniques, and that we covered most of the latest and most promising developments that relate to Web programming with Perl5, minimally providing you with a means to keep up with them, their implementation, and their implementors. We sincerely hope that the concepts, techniques, and technologies which we choose to write about here will indeed be among the ones which last. But only time will tell.

So, here we go...

What This Book Is and Is Not

Most of the relevant subjects, examples, and code will be assumed to be implemented on a UNIX system. There's just not enough time, or cross-platform capability, to deal with issues specific to other architectures in depth.

What This Book Is Not

This isn't a book for "clueless newbies." We are going to assume that you are familiar with the fundamentals of Web programming, and with the protocols, syntax, and conventions which comprise HTML, CGI, and HTTP in their latest versions. There is a plethora of texts available which document this stuff, and if you're like me, you've probably bought several texts from various publishers, and read FAQs and Web-based tutorials until your eyes rolled back in your head. If not, then we will provide you with a fairly complete set of references to texts and other information which will refresh your memory, or help you obtain the level of knowledge about the above topics that you will need to fully understand the examples.

A basic understanding of Perl programming, at least with respect to datatypes, subroutines, operators, and syntax will also be assumed. Again, we'll provide suggestions for other documents and texts which will bring you up to speed on Perl programming in general, if you've not been exposed to it before, or if you need a refresher course.

We also would like for you to actually understand how the examples really work, and how you can modify them to suit your needs, when you're done reading this book. Thus, our aim is to avoid canned scripts, or examples which are specific to some particular need. We'll try to provide explanations, along with the examples, in a verbose form throughout the text, and we'll try to make the examples generic enough to fit your needs, but with some modifications.

Finally, this book is not just about Perl. Our aim is to provide a comprehensive overview of most or all of the tasks which face the typical Webmaster, or Web Team. The emphasis, where appropriate, will be on implementation of any specific task with Perl, but in order to make this book as complete a reference as possible, and to cover important topics like security, configuration management, server configuration, and certain other tools and processes, we'll also have to provide information which has little or nothing to do with Perl. When this is necessary, we'll try to note it in the text, and get back to Perl as soon as we can.

What This Book Will Provide

Since this book will provide a large number of examples which use Perl5, we'll devote a full chapter to a tutorial and review of the overall process of implementing and using Perl5. Again, we will still assume that you're already familiar with the basics of Perl programming. As of the writing of this introduction, the brand new revision of Programming Perl (Wall, Christiansen, Schwartz - O'Reilly and Associates), the comprehensive reference for Perl programmers, is available. There are other good books, online tutorials, and other resources for those just starting out with Perl.

From what I can tell, however, there are a great number of people out there who see Perl5 only as a means to an end. I can understand this position. Not everyone can be a "Perl junkie" like me, and presumably, not everyone would want to, either. Although I can't imagine why not. :-) Regardless, we aim to teach as much of the Perl5 programming skills as are necessary to implement, use, and customize the latest and coolest tools, tricks, and techniques which are described herein.

We'll also give serious consideration to the all-important security issues (one can never stress this enough, and it will also comprise a full chapter) related to providing a Web service. These issues have, of course, been considered in many previous texts. In fact, any work which did not give consideration to these issues would be lacking at best, and dangerous at worst. Most of the examples and discussion in the security chapter will be implemented with Perl. On the other hand, we won't spend a lot of time discussing other important aspects of security which have little or nothing to do with Perl, like SSL (Secure Sockets Layer) and what the little key at the bottom of the Netscape browser means, for instance.

Once we're through with the Perl5 tutorial and security review, we'll move right into the meat of the matter, and presumably, the reason you spent your money on this book: the examples. We'll try to cover each technique with an eye towards the underlying idea, or algorithm, behind it. What does it add to your Web, and the WWW in general, that wasn't there before? How does it differ from the existing implementations, whether in Perl4 or in other languages? And what are the costs, if any, which you must absorb to implement it? Why did the implementor(s) of the tool or module feel that it was important enough to spend their time developing it?

We'll spend time covering CGI programming, of course. We'll also devote a full section to the discussion of Archivists' issues in general, and especially as they relate to maintaining a full multi-media archive which is dynamic and subject to revisions, changes, and enhancements.

Finally, we'll close with coverage of some of what we feel to be the most exciting, but also the least well developed and implemented, techniques and proposals for using Perl with the Web. Many of you will be familiar with Java, of course, but how many are aware that there is also a Perl5 interpreter available as a Netscape plugin? This is, of course, strictly a proof-of-concept implementation at this point, but it's exciting to think of the power and flexibility of having a Perl-code-aware browser. Also discussed in these final chapters will be some of the more interesting proposals for new features in the HTML/CGI language itself, which involve the implementation of embedded functionality, and abstract it to a certain degree, to include just about anything, including embedded Perl scripts.

The Layout of This Book

So, a semi-formal description of each of the chapters that comprise this book is probably in order at this point. If you're just browsing this book, considering the purchase, this might help you to decide whether or not to buy it. Alternatively, you can refer back here at any time for a short description of each of the chapters, beyond what is said in the table of contents.

A CPAN Overview

You'll see the CPAN (Comprehensive Perl Archive Network) referred to many times in this text, and after some consideration, we've decided that the overview of what it is, its history, how it works, and why it's so important belongs here, in the first chapter, rather than in the appendix. The CPAN, with its vast resources and tools, is the foundation of this book. We'll do very little without the help of a module, extension or other tool, including the source code for Perl itself, from the CPAN.

CPAN History

When this author first discovered Perl in 1990, there was little or no means to bootstrap oneself into the learning phase of programming Perl. The version that was available to me at that time was 3.10, and it wasn't very easy to learn, unless one was already familiar with sed, grep, awk, and other languages, including C, along with those dreaded UNIXisms, Regular Expressions. The UNIX manpage for Perl was long and tedious to read, and made many assumptions about the level of the reader, in terms of UNIX knowledge. At that time, I resolved that if I ever had a chance, I'd try to do something about that.

When the comp.lang.perl newsgroup came along, the common folk suddenly had access to the few true Perl wizards who existed at the time, including Larry himself, along with Randal and Tom, Mark Biggar, and a few others who either worked with Larry, were related to him, or had jumped onto the Perl bandwagon very early on. This made things a little easier when one had a problem, but the Usenet protocol, at that time, was quite a bit more respected, and thus I was a bit shy about posting arbitrary bonehead questions.

Then, in 1991, the first version of the Camel was printed, by the O'Reilly company. O'Reilly was kind enough to make available, along with UUNET, the examples which were given in the text, as a single compressed tar file. Suddenly, everything got a little easier. One could purchase the Camel, and then follow along directly with the examples. This still wasn't enough to become an expert, or even accomplish any given task, but it was a starting point.

Finally, in late 1991, I co-founded one of the first Mom 'n Pop ISPs in the country, Texas Metronet. Almost as soon as I had the root password, I took the collection of scripts which I'd gathered on the newsgroup, along with some of the other things I'd seen around here and there, and created a little ftp area, with some general hierarchy, including admin stuff, networking stuff, UI stuff, and assorted stuff; and the first "organized" Perl archive available on the Net was born. Now anyone could connect to this archive, and hopefully find a little ditty which would help them accomplish the task they had at hand, or at least get them started.

Since then, a mailing list has been formed for discussion of matters relevant to archiving Perl, and, more importantly, the efforts of many separate Perl archivists all over the Net have been unified and integrated into a single, comprehensive effort known as the CPAN.

CPAN Motivation

As you might guess, the rate of new Perl archives, and separate, distinct locations (we didn't have URLs back then) for a given work, grew rapidly. It became difficult to keep track of where this or that tool or script was located, and whether it was of any use. The most important tools which made it to the comp.sources newsgroups were generally well maintained, and available at multiple locations, but other things were quite unique. Some archives specialized in database access tools, others in documentation, and others in source code for Perl and the various ports. A unification was necessary, in order to make it simpler and more intuitive for the average newbie Perl user to get what he/she needed to implement his/her task, without reinventing the wheel.

So, due primarily to the efforts of a few tireless "Perl Packrats," including Jarkko Hietaniemi of Finland, the process of mirroring all of the existing hierarchies into a single hierarchy was set into place.

Nowadays, the CPAN is much more than simply a collection of scripts, documents, source, and miscellaneous tools and packages. It's become the de facto location for all Perl modules, and extensions, ports, patches, source code, and every other thing. There is also a dynamic, ongoing process attempting to make it easy to navigate, and relatively easy to contribute to, if you ever wish to make something you've written available on the Net.

The most important thing that the CPAN gives us is the ability to re-use code, and save effort. This lends itself to one of the true and fine qualities of any good Perl programmer: laziness. :-) We encourage you to explore the CPAN, and make use of it often.

CPAN Layout

Once you've connected to a CPAN archive, there's quite a bit of hierarchy to navigate, and it can be a bit intimidating at first to find your way around in there. The hierarchy is relatively well thought out, believe it or not. Let's take a look at it, and try to give a formal description of how to get around. When you first connect to any given CPAN site, you'll have to get to the top-level CPAN directory, to start. Of course, if you're using a Web browser, you can just feed it a URL to get there. Just connect to the site closest to you, get to the top-level CPAN directory, and then follow along here.

At the top level, you'll find the following directories:

authors         Directory for all individually written submissions of modules, scripts, etc.
doc             Documentation and other informative bits
indices         Indexes, in ls -lR format
misc            Miscellaneous stuff, emacs libraries
modules         Specific modules by name, by category, and by author
ports           Ports of Perl to various architectures
scripts         Original scripts area, specific tools/toys
src             Perl sources, Versions 4 and 5

Along with the above directories, you'll find the following files in the top level, by default:

CPAN            Textual description of the archive, its sites, contributions, and maintenance.
CPAN.html       HTML form of CPAN
ENDINGS         Filename extensions in use within the CPAN, and what they imply, in terms of MIME and specific applications.
MIRRORED.BY     Publicly accessible sites which mirror the CPAN, and are part of the Comprehensive Perl Archive Network.
MIRRORING.FROM  Sites from which the master CPAN site, at FUNET in Finland, mirrors to create the master CPAN hierarchy.
README          Notes on the intent and principle of the CPAN from Jarkko Hietaniemi, the "Self-Appointed Master Librarian (OOK!) of the CPAN"
README.html     HTML version of the README
RECENT          A listing of the most recent submissions and uploads to the CPAN.
RECENT.html     HTML version of the RECENT text file.
ROADMAP         Simple overview of the CPAN hierarchy.
ROADMAP.html    HTML version of the ROADMAP. Useful to navigate through the archive, if you don't mind waiting for repeated FTP connections through the Web browser.

As you can see, each of the files has a specific intent, and some of them provide a view into the archive.
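
The indices directory holds listings in ls -lR format, which makes it possible to hunt for a file without walking the whole archive by hand. What follows is only a minimal sketch of that idea, not a tool from the CPAN itself: the subroutine name find_in_listing and the sample listing are made up for illustration, and the code assumes the usual nine-column long format, with a "directory:" line opening each section.

```perl
use strict;
use warnings;

# Scan text in "ls -lR" format and return the full paths of entries
# whose file name matches the given pattern.
# (find_in_listing is a hypothetical helper, not part of any CPAN module.)
sub find_in_listing {
    my ($listing, $pattern) = @_;
    my $dir = '.';
    my @found;
    for my $line (split /\n/, $listing) {
        # Skip blank lines and the "total N" line of each section.
        next if $line =~ /^\s*$/ or $line =~ /^total\s+\d+/;
        if ($line =~ /^(.+):$/) {      # "some/dir:" opens a new section
            $dir = $1;
            next;
        }
        # A long-format entry: mode, links, owner, group, size,
        # three date fields, then the name (which may contain spaces).
        my @f = split ' ', $line, 9;
        next unless @f == 9;
        my $name = $f[8];
        $name =~ s/ -> .*$//;          # drop the target of a symlink
        push @found, "$dir/$name" if $name =~ $pattern;
    }
    return @found;
}

# Made-up sample data, for illustration only.
my $sample = <<'END';
authors/id/GAAS:
total 2
-rw-r--r--  1 ftp  ftp  123456 Oct  1 12:00 libwww-perl-5.02.tar.gz

modules/by-module/HTML:
total 1
lrwxrwxrwx  1 ftp  ftp  47 Oct  1 12:00 libwww-perl-5.02.tar.gz -> ../../../authors/id/GAAS/libwww-perl-5.02.tar.gz
END

print "$_\n" for find_in_listing($sample, qr/^libwww-perl/);
```

With the sample above, this prints the two paths under which the file appears, the real copy and the symlink. A file name that itself ends in a colon would be mistaken for a section header, so treat this as a starting point rather than a finished parser.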

Rather than describe each file in detail, let's take a closer look at the intent of the main top-level directories.
authors This directory is really the foundation of the newer CPAN hierarchy. It has come into being during the last couple of years, specifically for the archival of works, in any form, by specific people. All of the numerous modules and extensions which can be used with Perl5 are to be found under here. There are numerous symlinks which correspond to the full name of the authors, along with the id directory.
authors/id The layout of the authors directory is such that there exists a symlink, named with the full name of the author, which in turn points at the specific CPAN userid directory allotted to this author. The id directory, within the authors directory, contains the specific author directories. These authors/id/authorid directories are the most important in the CPAN, and form the foundation of all of the other views of the hierarchy. Thus, if you wish to obtain a module, and you know it was written by Dean Roehrich, for instance, you'd change to the authors/Dean_Roehrich directory, which in turn points to the authors/id/DMR directory. Dean himself controls what lives in the DMR directory through the automated features of the CPAN master site, and can automatically update the items as he makes new releases, and delete older releases of his works.
doc The doc directory in the top level is where the various forms of documentation for Perl live. In here you'll find all of the Perl pods, and their associated manpages, HTML, and postscript files. Also located in the doc directory are Tom Christiansen's suite of FMTEYEWTK documents, the Perl and TKPerl FAQs, and the annotated reference guides, along with other presentations, slide-series, and miscellaneous bits of information related to Perl. You'll also find here the various pod2 converters which convert the Perl pods into other documentation formats. We'll be discussing pod later in Chapter 2. Note that this directory isn't always current with the documentation pods which come with the latest release of Perl5. Grab those from the Perl source itself, within the pod directory.



NOTE:

FMTEYEWTK stands for Far More Than Everything You Ever Wanted To Know. It is Tom's suite of discourses on specific, usually advanced, topics related to Perl.


modules This directory is really more like a switchboard. It contains several subdirectories, each of which provides a different view of the archive's modules and contains many symlinks back to the latest (hopefully) version of whatever module you're interested in.
modules/by-author This one is simply a symlink back to the top-level authors directory.
modules/by-category This directory contains a number of subdirectories which provide you with a view to the modules by category; the directories include
02_Perl_Core_Modules/
03_Development_Support/
04_Operating_System_Interfaces/
05_Networking_Devices_Inter_Process/
06_Data_Type_Utilities/
07_Database_Interfaces/
08_User_Interfaces/
09_Interfaces_to_Other_Languages/
10_File_Names_Systems_Locking/
11_String_Processing_Language_Text_Proce/
12_Option_Argument_Parameter_Processing/
14_Authentication_Security_Encryption/
15_World_Wide_Web_HTML_HTTP_CGI/
16_Server_and_Daemon_Utilities/
17_Archiving_and_Compression/
18_Images_Pixmap_Bitmap_Manipulation/
19_Mail_and_Usenet_News/
20_Control_Flow_Utilities/
21_File_Handle_Input_Output/
22_Microsoft_Windows_Modules/
23_Miscellaneous_Modules/
99_Not_In_Modulelist/
Each of these contains symlinks back to the specific author's directory and module, as designated appropriate by the CPAN maintainer.
modules/by-module This directory contains a view of all of the Perl library directories, as they are created in @INC, when you install modules and extensions. Each of these directories, in turn, has a symlink to the appropriate version of the specific module(s) or extension(s) which populate that specific library directory. Thus, if you knew you needed the HTML::Element module, and you will later, you could look in modules/by-module/HTML, and find the symlink libwww-perl-5.02.tar.gz, which points back to the file ../../../authors/id/GAAS/libwww-perl-5.02.tar.gz, which is written and maintained by Gisle Aas. Pretty nifty, eh? Only one copy of any given module exists at any time, but there are a number of ways to get to it, via symlinks.
ports This directory contains Perl ports, in source and binary form, for many architectures and operating systems. Some are older than others, and both Perl4 and Perl5 ports exist. If you are on an architecture other than UNIX, you may need to grab your Perl from this directory.
scripts We mention this particular area of the archive mostly because it will be going away pretty soon. The scripts area is the author's own collection of things from USENET, and all over, beginning in late 1991, and it has just about outlived its usefulness. In its day, it saw something on the order of 10,000 retrievals per week, but with the newer Perl5 modules and authors hierarchy, it's pretty much there just for posterity now. It contains specific examples of scripts and tools, some very old, which implement a given task or tasks, within a simple hierarchy.
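
To make the symlink arrangement described above a little more concrete, here is a small sketch which follows the HTML::Element example on paper: it works out the by-module directory, resolves the relative symlink target against it, and pulls out the author id. The three subroutine names are our own inventions for illustration, and the rule that the by-module directory is named after the first component of the package name is an assumption generalized from that one example. The code only manipulates path strings, and never touches an actual archive.

```perl
use strict;
use warnings;

# Map a module name to its modules/by-module directory, assuming the
# directory is named after the first component of the package name.
sub by_module_dir {
    my ($module) = @_;
    my ($top) = split /::/, $module;
    return "modules/by-module/$top";
}

# Resolve a relative symlink target against the directory that holds
# the link, using the path strings only (no file system access).
sub resolve_link {
    my ($dir, $target) = @_;
    my @parts = split m{/}, $dir;
    for my $step (split m{/}, $target) {
        next if $step eq '' or $step eq '.';
        if ($step eq '..') { pop @parts }        # climb one level
        else               { push @parts, $step } # descend one level
    }
    return join '/', @parts;
}

# Pull the CPAN author id out of an authors/id/... path.
sub author_id {
    my ($path) = @_;
    return $path =~ m{^authors/id/([^/]+)/} ? $1 : undef;
}

my $dir  = by_module_dir('HTML::Element');
my $file = resolve_link($dir, '../../../authors/id/GAAS/libwww-perl-5.02.tar.gz');

print "$dir\n";                 # modules/by-module/HTML
print "$file\n";                # authors/id/GAAS/libwww-perl-5.02.tar.gz
print author_id($file), "\n";   # GAAS
```

The three ".." steps climb out of HTML, by-module, and modules in turn, which is why the link lands back at the top level of the archive before descending into authors/id.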

CPAN Sites

There are a large number of sites which mirror the CPAN hierarchy which we've described above. The CPAN multiplexer at

http://www.perl.com/perl

will usually point you to an appropriate one. The perl.com archive is Tom's creation and contains plenty of other useful Perl stuff.

Summary

So, now that you've familiarized yourself with the resources you're going to need to work through the examples in this book, you're ready to continue to the tutorial. Just turn the page and dig in.