Chapter 1

What is Perl?




Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from these files, and printing reports based on that information. It is also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, and complete) rather than beautiful (tiny, elegant, and minimal). Perl was written by Larry Wall (lwall@sems.com), with the help of lots of other contributors.

Why Perl?

UNIX system administrators and application developers often have to rely on several different languages to accomplish their tasks. This means learning a number of different syntaxes and having to write in multiple languages to accomplish a task. For example, to process a file, a system administrator might have to write a shell script using sh, process a file using awk or grep, and edit the file using sed. For other uses, the administrator may have to create a C program with its longer create/compile/debug development cycle.

It would be better if the administrator could combine many of these tasks in a single language that is easy to write and develop in, reasonably efficient, and complete. Along comes Perl.

In a single language, Perl combines some of the best features of C, sed, awk, and sh. People familiar with these languages have little difficulty being productive in Perl. Perl's expression syntax is very C-like. Perl uses sophisticated pattern-matching techniques to scan large amounts of data very quickly. Although optimized for scanning text, Perl can also deal with binary data. If you have a problem on which you would ordinarily use sed, awk, or sh, but it exceeds these tools' capabilities or must run a little faster and you don't want to write the program in a compiled language such as C, Perl may be the language for you.

A Brief History of Perl

It is helpful to your understanding of Perl to know a little bit about why Perl was created and how it evolved.

Larry Wall developed Perl in 1986. He was a systems programmer on a project that was developing multilevel, secure wide area networks. Larry was in charge of an installation consisting of three Vaxes and three Suns on the West Coast of the United States connected over an encrypted serial line (1200 baud!) to a similar configuration on the East Coast of the United States. Larry's primary job was system support "guru." During this stint, he developed several useful UNIX tools such as rn, patch, and warp.

Perl was developed in response to a management requirement for a configuration management and control system for all six Vaxes and all six Suns. As with most management requests, Larry had a month to develop this tool!

Larry looked for a way to build a bicoastal configuration management tool without writing it from scratch. The tool would have to be capable of viewing problem reports on both coasts, with approvals and control. His answer was B-news.

Larry installed B-news on three machines and added two control commands. Configuration management was done using RCS, and approvals and submissions were done using news and rn.

However, managers always need one thing more. Larry's manager asked him to produce reports. B-news was maintained in separate files on a master machine, with lots of cross references between files. Larry's first thought was to use awk to produce the reports. Unfortunately, awk fell a bit short. It couldn't handle opening and closing multiple files based on information in the files. Larry didn't want to code a special purpose tool just for this job, so a new language was born.

The language wasn't originally called Perl. Larry, his coworkers, friends, and family considered just about every three- and four-letter word in existence. One of the earliest names was "Gloria" (his wife's name), but this was replaced due to the confusion it caused in his household. The name became "Pearl," which was changed into the present-day "Perl," partly due to the existence of a graphics language called "pearl," but mostly because five letters was a bit much to type all the time. A trace of the former five-letter version survives in the expansion of the name: Practical Extraction and Report Language.

The early version of Perl lacked many of the features of today's version.

The manual page was only 15 pages long. But Perl was faster than sed and awk and began to be used on other aspects of the project.

Larry moved on to support research and development and took Perl with him. Perl was becoming a good tool for system administration. Larry borrowed Henry Spencer's regular expression package and modified it for Perl. Then Larry added most of the goodies he and other people wanted and released it on the Internet.

The current version (5+) of the language is a complete rewrite of the previous versions. It provides the following additional benefits:

Usability enhancements
It is now possible to write much more readable Perl code. (How any C-like language can be called readable is still beyond me!)
Simplified grammar
The new yacc grammar is one half the size of the old one. Many of the arbitrary grammar rules have been regularized. The number of reserved words has been cut by two-thirds. Despite this, nearly all old Perl scripts will continue to work the same.
Lexical scoping
Perl variables may now be declared within a lexical scope.
Arbitrarily nested data structures
Any scalar value, including any array element, may now contain a reference to any other variable or subroutine.
Modularity and reusability
The Perl library is now defined in terms of modules that can be shared easily among various packages.
Object-oriented programming
A package can function as a class. Dynamic multiple inheritance and virtual methods are supported in a straightforward manner and with very little new syntax. File handles may now be treated as objects.
Embeddability and extensibility
Perl may now be embedded easily in your C or C++ application and can either call or be called by your routines through a documented interface.
POSIX compliant
A major new module is the POSIX module, which provides access to all available POSIX routines and definitions via object classes, where appropriate.
Package constructors and destructors
The new BEGIN and END blocks provide a means to capture control as a package is being compiled and after the program exits.
Multiple simultaneous DBM implementations
A Perl program may now access DBM, NDBM, SDBM, GDBM, and Berkeley DB files from the same script, simultaneously.
Subroutine definitions may be autoloaded
The AUTOLOAD mechanism enables you to define any arbitrary semantics for undefined subroutine calls.
Regular expression enhancements
You can now specify non-greedy quantifiers and perform grouping without creating a back reference. You can write regular expressions with embedded white space and comments for readability. A consistent extensibility mechanism has been added that is upwardly compatible with all old regular expressions.
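Several of these additions are easiest to appreciate in code. The following small sketch uses made-up host names and sample strings to exercise lexical my variables, a nested data structure built from references, a non-greedy quantifier, non-capturing (?:...) grouping, and the /x modifier. It is an illustration of those features only, not a tour of everything Perl 5 added.

```perl
use strict;

# Lexical scoping: variables declared with my are visible only in the
# enclosing block or file.
# Nested data: each hash value is a reference to an anonymous array of
# (hypothetical) host names.
my %hosts = (
    ops => [ 'alpha', 'beta' ],
    dev => [ 'gamma' ],
);
my $host_count = 0;
foreach my $group (sort keys %hosts) {
    $host_count += @{ $hosts{$group} };    # dereference; += gives the count
}

my $html = '<b>bold</b> and <i>italic</i>';

# Non-greedy quantifier: .*? stops at the first '>' it can.
my ($first_tag) = $html =~ /(<.*?>)/;

# (?:...) groups the alternatives without capturing, so $1 is the word.
my @words = $html =~ /(?:<b>|<i>)(\w+)/g;

# /x allows white space and comments inside the pattern.
my ($year, $month, $day) = '1996-07-04' =~ /
    (\d{4})  -   # year
    (\d{2})  -   # month
    (\d{2})      # day
/x;

print "$host_count hosts; first tag $first_tag; words @words; ",
      "date $year-$month-$day\n";
```

A greedy /(<.*>)/ in place of the non-greedy pattern would capture the whole string from the first '<' to the last '>'.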

The Benefits of Using Perl

Perl has many advantages as a general-purpose scripting language. These benefits include its generous licensing (it's free), its interpreted nature, the fact that Perl is available for most platforms, and more. The following sections detail some of the benefits of this excellent language.

Cost and Licensing

First, Perl is generally available on most server platforms.

Perl also has the distinct advantage of being "low cost." It is distributed free of charge or, at most, for a small copying charge. Perl is distributed under the GNU "copyleft" (the GNU General Public License), which means that if you can execute Perl on your system, you should have access to the source of Perl for no additional charge beyond that copying fee. Perl may also be distributed under the "Artistic License," which some people find less threatening than the copyleft.

Availability

Perl is readily available from many sources, including any comp.sources.unix archive or CPAN site. If you don't have Perl on your server or development machine, it is easy to obtain either as source code or as precompiled binaries for many platforms. For those not on the Internet, Perl is available via anonymous UUCP from both uunet and osu-cis. Perl is often distributed with CD collections of utilities for UNIX platforms. (See appendix B, "Perl Module Archives," for information on Perl archives.)

Interpreted Language

Perl is interpreted. This can be either an advantage or a disadvantage, depending on your needs. For example, Perl has a short development cycle compared to compiled languages, but it generally will not execute as fast as a compiled language. I discuss the disadvantages in the section "What Are the Negatives of Using Perl?", but there are some definite advantages.

One advantage of an interpreted language for tool or application development is that you can perform incremental, iterative development and testing without having to go through a create/compile/test/debug/fix cycle. By eliminating the compile portion of the cycle, interpreted languages can speed the development cycle drastically. This is also helpful if you are evolving your application by implementing it with minimal capabilities and adding advanced capabilities later.

Because it is interpreted and relatively C-like, you can also use Perl as a prototyping language. This can be especially useful with complex or technically difficult projects such as network communication. You can use Perl's shortened development cycle to evaluate your design and then, once it is proven, rewrite the code in the language of your choice. By the way, C and C++ are good choices because Perl is a lot like C and supports much the same functionality.

Practical

Perl is written to be practical. This means that it is easy to use, efficient, and complete.

These design goals mean that Perl programs can generally accomplish a goal that would otherwise take several other languages, require complex programming, and take longer to process.

But for many of us, practicality goes beyond this. It means that you can get things done in Perl. In fact, there are usually several ways that Perl can accomplish the same task. It also means that the programmer can concentrate on getting the task done rather than dealing with the "beauty" of the language in which he or she is working.

Complete

As mentioned before, Perl combines some of the best features of several languages. Here's a list of these languages:

grep/awk
General pattern-matching languages for selecting elements from a file.
C
A general-purpose compiled programming language. (Perl is written in C.)
sh
A control language generally used for running programs and scripts written in other languages.
sed
A stream editor for processing text streams (STDIN/STDOUT).

These languages typically have been the tools used by UNIX administrators to accomplish tasks. In fact, they are often touted as the reason that UNIX is an excellent development platform. They are still excellent tools for the purposes for which they were written.

However, if you have to deal with several languages, you also have to deal with learning these languages. For instance, a task to process a single text file might require the administrator to write a shell script to run an awk program to select lines that are subsequently processed by sed.

Figure 1.1: A single Perl script can often do the work of several other utilities.

With Perl, the administrator or developer can accomplish his goals in a single, easy-to-use language that performs the same tasks as these languages.
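As a rough sketch of that consolidation, the script below does in one pass what might otherwise be a grep or awk selection, a sed substitution, and an awk field printout. The log format and sample lines are hypothetical; a real script would read its input with while (<>) rather than from a built-in array.

```perl
use strict;

# Hypothetical log lines: "date host level message..."
my @log = (
    '1996-07-04 alpha INFO  backup started',
    '1996-07-04 beta  ERROR disk full on /var',
    '1996-07-05 alpha ERROR tape drive offline',
);

my @report;
foreach my $line (@log) {
    next unless $line =~ /\bERROR\b/;                     # grep: select lines
    (my $edited = $line) =~ s/\bdisk full\b/DISK FULL/;   # sed: edit text
    my ($date, $host, $level, @msg) = split ' ', $edited; # awk: split fields
    push @report, "$host: @msg";                          # awk: print fields
}
print "$_\n" foreach @report;
```

The split ' ' form splits on runs of white space, the same way awk splits fields by default.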

With version 5.0 of Perl, the language also supports an object-oriented approach to programming. This means that packages/modules can be distributed as classes and used without knowledge of the underlying code. These classes can also be extended through inheritance, as in other object-oriented languages. The key is that programmers use the object-oriented features of Perl only if they need them for the particular program they are writing.

Easy to Use

Above all, Perl is a language in which you can do things. There are usually several ways to accomplish the same task. Although some techniques are more efficient with system resources than others, users can generally select the technique that is easier for them to use (and maintain/enhance in the future) and go with it.

The ease of use and completeness make Perl appropriate for quick-and-dirty, one-time utilities as well as structured, complex applications.

Efficient

Perl is a straight-line language, which means that simple programs do not have to deal with complex formatting or function/procedure or object/method structures to accomplish their task. As a simple example, let's pay homage to programming texts (including this one) with the "Hello World!" program. Here it is in C:

#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}

And here it is in Perl:

print "Hello World!\n";

Get in, get out, and get the job done.

Language Capabilities

Perl is optimized for text processing and, therefore, is very efficient at many tasks required of system administrators and application developers. Many of the files used in UNIX systems administration are plain text files. Selecting records, processing the selected records, and reporting exceptions are the heart of many tasks performed in UNIX administration.
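The select/process/report pattern might look like this for a file in /etc/passwd format. The sample records are invented, and the two checks (extra UID 0 accounts and empty password fields) are only examples of exceptions an administrator might want reported.

```perl
use strict;

# Invented records in /etc/passwd format:
# name:password:uid:gid:gecos:home:shell
my @passwd = (
    'root:x:0:0:Super-User:/:/bin/sh',
    'daemon:x:1:1::/:',
    'toor::0:0:Alt root:/:/bin/sh',
    'guest::100:100:Guest:/home/guest:/bin/csh',
);

my @exceptions;
foreach my $record (@passwd) {
    # Select the fields we care about with a list slice.
    my ($name, $pw, $uid) = (split /:/, $record)[0, 1, 2];
    push @exceptions, "$name has UID 0" if $uid == 0 && $name ne 'root';
    push @exceptions, "$name has no password" if $pw eq '';
}
print "$_\n" foreach @exceptions;    # report the exceptions
```

On systems that keep passwords in a separate shadow file, the second field usually holds only a placeholder, so the empty-password check would have to look at that file instead.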

In the current versions of Perl, the language also includes much additional functionality, making it appropriate for tasks such as processing socket calls, embedding in programs written in C, and maintaining POSIX-compliant systems.

Integration with C

Perl can access C libraries to take advantage of much of the code written for this popular language. Utilities included with Perl distributions (h2ph, for example) enable you to convert the headers for these C libraries into their Perl equivalents.

Perl 5.0 can be integrated easily into C and C++ applications: Perl can call or be called by routines written in C or C++. C code calls into Perl through a set of perl_call_* functions, and Perl code calls C libraries through the XS language interface.

Specialized Extensions to Perl

There are many specialized extensions to Perl, primarily for handling specific databases such as Oracle, Ingres, and Informix. These combine the strengths of the Perl language with access to the host database.

At the time of this writing, ftp.demon.co.uk (158.152.1.69) is the official repository for the database-specific perls (see the following list), which can be found in /pub/perl/db/perl4/. It's mirrored at ftp.cis.ufl.edu (198.17.47.33) in /pub/perl/scripts/db/.

btreeperl NDBM extensions
ctreeperl C-Tree extensions
duaperl X.500 directory user agent
ingperl Ingres
isqlperl Informix
interperl Interbase
oraperl Oracle 6 and 7
pgperl Postgres
sybperl Sybase 4
uniperl UNIFY 5.0

See appendix B, "Perl Module Archives," for more information on these repositories.

Socket Capability

Perl can read from and write to TCP/IP sockets, so it can communicate with servers of all types that rely on socket communication. It also enables you to write utility and "robot" programs in the Perl language. For example, Perl's socket capability can be used to write a robot program to automate the checking of a World Wide Web (WWW) site to verify the validity of links on your Web pages. This can be especially useful in keeping a site up-to-date, given the volatility of the Internet in its relative infancy.
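A minimal sketch of socket use, assuming the IO::Socket::INET module that ships with later Perl 5 releases: it sends an HTTP HEAD request and returns the numeric status code, which is the core of a link-checking robot. The host and path are placeholders, and a real robot would also need to follow redirects and cope with servers that misbehave.

```perl
use strict;
use IO::Socket::INET;

# Send an HTTP HEAD request; return the status code (e.g. 200 or 404),
# or undef if the connection fails or the reply is not recognizable.
sub http_status {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or return undef;
    print $sock "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my $status_line = <$sock>;    # first line of the reply
    close $sock;
    return parse_status($status_line);
}

# Extract the three-digit code from a line such as "HTTP/1.0 200 OK".
sub parse_status {
    my ($line) = @_;
    return undef unless defined $line;
    return $line =~ m{^HTTP/\d+\.\d+\s+(\d{3})} ? $1 : undef;
}

# Example (needs network access):
#   my $code = http_status('www.perl.com', '/');
```

Splitting the parsing into its own subroutine keeps the text-handling part testable without a network connection.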

Perl Is Relatively Easy to Learn

Unlike many programming languages, Perl is designed to be practical rather than beautiful. By this I mean that Perl was designed from the start to be easy to use, efficient, and complete rather than tiny, elegant, and minimal.

Programming in Perl is relatively easy, especially if you have experience in C or another C-like language. Like many scripting languages, Perl runs a simple program's statements in order, from the first line to the last, so it doesn't require complex structures to be able to create a program. It does, however, support subroutines (functions) and, in version 5.0, can be object oriented.

Perl Has Built-In Debugging Facilities

The Perl interpreter has a built-in debugger that can help reduce the time it takes to debug applications. The debugger is activated through the use of the -d switch on the command line. In addition, the -w switch turns on an extensive set of warnings that can be invaluable in debugging Perl scripts.
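The effect of warnings is easy to see in a small script. This sketch assumes Perl 5.6 or later, where use warnings is the lexically scoped equivalent of -w; on older versions, drop that line and run the script with perl -w instead. The __WARN__ handler simply collects warnings so they can be inspected rather than printed.

```perl
use strict;
use warnings;   # lexical equivalent of the -w switch (Perl 5.6 and later)

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $total;              # declared but never given a value
my $sum = $total + 1;   # triggers a "Use of uninitialized value" warning

print scalar(@warnings), " warning(s): $warnings[0]";
```

Running the same script as perl -d script.pl starts it under the interactive debugger instead.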

Perl Help Is Readily Available

Because Perl is very popular as a scripting language, there is a lot of help out there. Newsgroup discussions are a good place to start when you require help on Perl programming. There are newsgroups devoted entirely to Perl and newsgroups devoted to Web page creation in which the majority of the discussion is about Perl. Here are some of them:

Newsgroup    Comment
comp.lang.perl... This set of newsgroups covers information about Perl in general. Much of the discussion in the specific groups covers using Perl for utility purposes and also as a CGI scripting language.
comp.lang.perl.announce Provides information about new modules for Perl programming.
comp.lang.perl This is the main newsgroup about Perl.
comp.lang.perl.modules Provides discussions of Perl modules.
comp.lang.perl.tk Provides discussions of Tk used with Perl.

There are, of course, Web pages related to Perl. Check the newsgroups for announcements about these pages. Here are a few that I have found as of this writing:

URL    Comment
http://www.perl.com/ This is the Perl language homepage. It provides links to Perl resources.
http://www.eecs.nwu.edu/perl/perl.html NWU's Perl page.
http://www.yahoo.com/Computers/Languages/Perl/ Yahoo's Perl index.
http://www.virtualschool.edu/mon/Perl.html The "middle of nowhere" Perl archive (Netscape 2.0 pages).
http://www.teleport.com/~rootbeer/perl.html References with a special emphasis on using Perl for Web-related programming and on learning Perl.

See appendix B for more complete information on Perl-related Web pages.

Several lists of frequently asked questions (FAQ) are posted to the Perl newsgroups. One of the best to start with is the Perl Meta-FAQ, produced by Neil Bowers (neilb@khoros.unm.edu). As you would expect, this is an FAQ about FAQs. It's available at this writing from the following sources:

HTML: http://www.khoros.unm.edu/staff/neilb/perl/metaFAQ/metaFAQ.html
PostScript: ftp://ftp.khoros.unm.edu/pub/perl/metaFAQ.ps
ASCII: ftp://ftp.khoros.unm.edu/pub/perl/metaFAQ.txt

Perl Examples Are Readily Available

Again, because Perl is so popular as a utility language, there are lots of examples of Perl modules out there. One of the best sources is the Comprehensive Perl Archive Network (CPAN), whose sites around the world you can reach by file transfer protocol (FTP); see appendix B.

What Are the Negatives of Using Perl?

Perl has few negatives as a scripting language for system administration tasks and as a language for module development. But there are a few.

Interpreted Language

Perl is interpreted. Therefore, it will not be as fast as compiled languages such as C or C++. Given the speed of modern CPUs, in all but very large or time-critical applications, this will not make a significant difference. And in fact, the interpreted nature of the language can reduce development time significantly by eliminating the time needed to compile and debug versions of the program (see the previous section "The Benefits of Using Perl").

Perceived as Public Domain

Perl isn't strictly in the public domain (see the license agreement for details). But it's close enough. Many large companies have policies against using public domain or copylefted software. In many cases, this bias is more of a mind-set than a negative, but it can be a detriment to using Perl (see the following section, "Informal Support").

Because Perl is not a commercial product, there is no corporation that your company can apply leverage against to get something done. But you do have access to the Perl source to make changes needed for your environment, if required.

Informal Support

The support for Perl is on an informal basis through the volunteer efforts of users worldwide. Does this mean it is bad? No, not necessarily. In fact, the "support" given through the Internet newsgroups is probably as good as any given by a major corporation. But you can't depend on your question being answered, at least in a timely manner. And you don't have a corporation on which you can apply pressure to support your specific environment. On the other hand, you do have access to the source code for Perl and can look into problems yourself.

Protecting Proprietary Code

Perl isn't compiled (although there is an effort to make it so). Thus, if you distribute your solutions, you distribute source code. This can deter you from writing at least your final application in Perl. (See the earlier discussion of Perl as a prototyping language, in the section "Interpreted Language.")

Concerns About Reliability

Perl, in its version 5+ incarnation, is undergoing some major changes. Things might not work or might break later. This can be a concern for the future of applications written for a specific version and relying on a specific feature. On the positive side, a lot of people test each release through use, so bugs are usually detected and ironed out quickly.

Maintainability of Scripts

Perl has somewhat of a reputation for being unreadable. This can be a problem for system maintenance. However, Perl is probably no more unreadable than any C-like language. (In my opinion, C itself is a very un-pretty language, though I won't say ugly; Perl suffers from that heritage.)

As with any other language, the maintainability of Perl relies heavily on the willingness of the programmer to structure and comment/document the code. Because many "quick-and-dirty" utilities are written in Perl to get a specific job done and then expanded to be more generally usable, much of the available source code isn't all that pretty. (Sounds a little like the evolution of Perl itself, doesn't it?)

GNU Copyleft License Agreement

The GNU license under which Perl is distributed is really quite innocuous, but it might be a problem depending upon the type of application you are developing.

What Can Perl Do?

Perl is most commonly used to develop system administration tools. But it has also gained enormous popularity on the Internet. Perl can be, and is, used to develop many Internet applications and their supporting utility applications. The following sections describe some applications of Perl in systems administration and on the Internet.

UNIX System Maintenance

As mentioned before, Perl can perform the work of several other tools, and usually in less time. It is particularly adept at processing the text files typically used as configuration files.
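For example, many configuration files are simple key = value lines with # comments. The following is a hedged sketch of a parser for that assumed format; it treats any # as the start of a comment, so values that contain # would need a smarter rule.

```perl
use strict;

# Parse "key = value" lines into a hash, skipping comments and blanks.
sub parse_config {
    my @lines = @_;                  # a copy, so editing $_ is safe
    my %config;
    foreach (@lines) {
        chomp;
        s/#.*//;                     # strip comments
        next unless /\S/;            # skip blank lines
        my ($key, $value) = /^\s*(\w+)\s*=\s*(.*?)\s*$/ or next;
        $config{$key} = $value;      # later settings override earlier ones
    }
    return %config;
}

# Hypothetical sample input.
my %settings = parse_config(
    "# sample config\n",
    "host = alpha\n",
    "port=8080   # web port\n",
    "\n",
);
print "$_ => $settings{$_}\n" foreach sort keys %settings;
```

To read a real file, pass its lines straight in, for instance open(CONFIG, $file) or die "can't open $file: $!"; followed by my %settings = parse_config(<CONFIG>);.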

CGI Scripts

Perl is one of the most popular languages for creating CGI applications. There are literally thousands of examples of dynamic CGI programming in Perl. Perl can be used to create dynamic Web pages that can change depending on factors such as which visitor is viewing them.

One of the most common uses of Perl on the Internet is to process form input. Perl is especially adept at this chore because most of that input is text, which is Perl's strength.
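Form input arrives URL-encoded, for example name=John+Doe&email=jd%40example.com. For a GET request the server passes this string in the QUERY_STRING environment variable; POST data arrives on standard input instead. The sketch below decodes such a string by hand to show the idea. The field names are hypothetical, a repeated field name overwrites earlier values, and in practice the CGI.pm module handles all of this more completely.

```perl
use strict;

# Decode an application/x-www-form-urlencoded string into field => value.
sub parse_form {
    my ($query) = @_;
    my %form;
    foreach my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;   # skip empty pairs
        $value = '' unless defined $value;
        foreach ($name, $value) {                    # $_ aliases each variable
            tr/+/ /;                                 # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX encodes one byte
        }
        $form{$name} = $value;
    }
    return %form;
}

# In a CGI script handling a GET request:
my %form = parse_form(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '');
```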

Mail Processing

Another popular use of Perl is for the automated processing of Internet e-mail. Perl scripts have been used to filter mail based on address or content. Perl scripts have also been written to automate mailing lists. One of the most popular of these programs is Majordomo.
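A filter based on address and content can be sketched in a few lines. The folder names and matching rules here are hypothetical, and the header parsing is simplified: it ignores folded (continuation) header lines and assumes Unix newlines.

```perl
use strict;

# Decide which (hypothetical) folder a raw message belongs in.
sub classify_message {
    my ($message) = @_;
    my ($header) = split /\n\n/, $message, 2;   # header ends at first blank line
    my ($from)    = $header =~ /^From:\s*(.*)$/mi;
    my ($subject) = $header =~ /^Subject:\s*(.*)$/mi;
    $from    = '' unless defined $from;
    $subject = '' unless defined $subject;

    return 'junk' if $subject =~ /make money/i;     # filter on content
    return 'work' if $from =~ /\@example\.com\b/i;  # filter on address
    return 'inbox';
}

my $msg = "From: Pat <pat\@example.com>\nSubject: Weekly status\n\nBody text\n";
print classify_message($msg), "\n";
```

Note the escaped \@ inside double-quoted strings; without it, Perl would try to interpolate an array named @example.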

I personally have written a Perl script to automate my "What's New?" Web page. This script processes mail messages and adds them to my "What's New?" page. It also removes the entries from the page after they have been there for a certain length of time.

Automating Web Site Maintenance

Perl can be used to automate the maintenance of Web sites. Because Web pages are little more than text files in a specific format, Perl is particularly adept at processing them. Perl's socket capability can also be used to contact other sites and request information using HTTP. There has even been a Web server written in Perl.

In order to check the links on a site, a Perl program must parse the site's pages starting with the main page, extract the URLs, and determine whether these URLs are still active.
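The URL-extraction step could be sketched as follows. A regular expression only approximates HTML parsing (it misses unquoted attribute values, for example), so a serious checker would use a real parser such as the HTML::LinkExtor module; checking whether each extracted URL is still active would then take a request over a socket.

```perl
use strict;

# Return the distinct href targets of <a> tags, in order of appearance.
sub extract_links {
    my ($html) = @_;
    my (@links, %seen);
    while ($html =~ /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi) {
        push @links, $1 unless $seen{$1}++;   # keep the first occurrence only
    }
    return @links;
}

# Hypothetical page content.
my $page = '<p><a href="index.html">Home</a> '
         . "<A HREF='http://www.perl.com/'>Perl</A> "
         . '<a href="index.html">Home again</a></p>';
print "$_\n" foreach extract_links($page);
```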

Automating File Retrieval

There are several FTP clients written in Perl. Perl can be used to automate file retrieval via FTP. Again, this combines the socket capability of Perl with its text-processing capability.
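Automated retrieval might look like the sketch below, which assumes the Net::FTP module from the libnet distribution (bundled with Perl since 5.8). The host, directory, and file names are placeholders, and the anonymous password is a made-up address.

```perl
use strict;
use Net::FTP;

# Fetch one file by anonymous FTP into the current directory.
# Returns 1 on success and 0 on failure, warning with the reason.
sub fetch_file {
    my ($host, $dir, $file) = @_;
    my $ftp = Net::FTP->new($host, Timeout => 60)
        or do { warn "cannot connect to $host: $@"; return 0 };
    unless ($ftp->login('anonymous', 'user@example.com')) {
        warn 'login failed: ', $ftp->message;
        $ftp->quit;
        return 0;
    }
    $ftp->binary;                      # avoid mangling non-text files
    my $ok = $ftp->cwd($dir) && $ftp->get($file);
    warn 'transfer failed: ', $ftp->message unless $ok;
    $ftp->quit;
    return $ok ? 1 : 0;
}

# Example (placeholder names, needs network access):
#   fetch_file('ftp.example.com', '/pub', 'README');
```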

Is Perl for You?

Only you can answer that question. The next chapters will give you a grounding in the Perl language that may help you decide whether you wish to use Perl for Internet programming. If you choose not to make it your main Web programming language, then because of its versatility, ease of use, and popularity, you may find that it becomes your utility language for the Web, if nothing else.

Summary

Perl is a practical, easy-to-use, efficient programming language. Add it to your toolbox and use it especially when you have tasks that involve text processing.

Like any programming language, Perl is not the only language you should have in your toolbox, but, when chosen for the appropriate tasks, Perl can give you the ability to solve the problem quickly.

If you're looking for a language which is beautiful, elegant, or minimal, Perl isn't for you. If, on the other hand, you're looking for a tool to get things done, few languages can compare with Perl.