Chapter 3

References



This chapter describes Perl references and the concept of pointers. It also shows you how to use references to build fairly complex data structures, how to use references to subroutines, and how to pass parameters by reference.

Introduction to References

A reference is simply a pointer to something; it is very similar to the concept of a pointer in C or Pascal. That something could be a Perl variable, array, hash, or even a subroutine. A reference in your program is simply the address of a value. How you use that reference is really up to you as the programmer and what the language lets you get away with. In Perl, you can use the terms pointer and reference interchangeably without any loss of meaning.

There are two types of references in Perl 5 with which you can work: symbolic and hard.

A symbolic reference simply contains the name of a variable. Symbolic references are useful for creating variable names and addressing them at runtime. Basically, a symbolic reference is like the name of a file or a soft link on a UNIX system. Hard references are more like hard links in the file system; that is, a hard link is merely another path to the same file. In Perl, a hard reference is another name for a data item.

Hard references in Perl also keep track of the number of references to each item in an application. When the reference count becomes zero, Perl automatically frees the item being referenced. If that item happens to be a Perl object, the object is "destructed," that is, its memory is returned to the pool. When the main program terminates, all items that still exist are freed as well. Packages and modules make objects easier to use in Perl. Perl modules are covered in Chapter 4, "Introduction to Perl Modules."

When you use a symbolic reference that does not exist, Perl creates the variable for you and uses it. For variables that already exist, the value of the variable is substituted instead of the $variable token. This substitution lets you construct variable names from variable names.

Consider the following example:

$lang = "java";
$java = "coffee";

print "${lang}\n";
print "hot${lang}\n";
print "$$lang \n";

The third print line is important. $$lang is first reduced to $java, then the Perl interpreter will recognize that $java can also be reparsed, and the value of $java, "coffee", is used.

Variable names can be delimited with the ${} construct, so ${lang} translates to java, and hot${lang} translates to hotjava. If you want to address a variable named hotjava, you could use the expression ${"hot${lang}"}. This would be interpreted as, "take the value in $lang, and append it to the word hot. Now take the constructed string (hotjava) and use it as a name because there is a ${} around it."

In other words, the value of the scalar produced by $$lang is taken to be the name of a new variable, and the variable at $java is used. Here's the output from this example:

java
hotjava
coffee

Thus, the difference between using a variable directly ($lang) and using it as a symbolic reference ($$lang) is how the variable name is derived. Used directly, $lang refers to the variable's own value. With a symbolic reference, you are using another level of indirection by constructing or deriving a symbol name from an existing variable.
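The following is a minimal sketch of both forms of symbolic reference discussed above. It assumes a modern Perl running under the strict pragma, which the listings in this chapter do not use, so symbolic references are switched on explicitly for one block. The variable $hotjava and its value are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Symbolic references only reach package (global) variables, so these
# are declared with "our" rather than "my".
our $lang    = "java";
our $java    = "coffee";
our $hotjava = "browser";   # hypothetical value, for illustration only

my ($via_lang, $via_name);
{
    # strict refs rejects symbolic references; relax it for this block only.
    no strict 'refs';
    $via_lang = $$lang;          # $lang holds "java", so this reads $java
    $via_name = ${"hot$lang"};   # builds the name "hotjava", then reads $hotjava
}

print "$via_lang\n";   # prints coffee
print "$via_name\n";   # prints browser
```

Limiting the relaxed rule to a small block keeps an accidental symbolic reference elsewhere in the program from going unnoticed.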

References are easy to use in Perl as long as they are used as scalars. To use hard references as anything but scalars, you have to explicitly dereference the variable and tell it how to be used.

Using References

A scalar value in this chapter refers to a variable, such as $pointer, that contains one data item. This item is a scalar and any scalar may hold a hard reference. Arrays and hashes contain scalars; therefore, they can hold many references. Thus, with judicious use of arrays and hashes, you can easily build complex data structures of different combinations of arrays of arrays, arrays of hashes, hashes of functions, and so on.

There are several ways to construct references, and you can have references to just about anything: arrays, scalar variables, subroutines, file handles, and, yes (to the delight of C programmers), even to other references.

To use the value of $pointer as the pointer to an array, you reference the items in the array as @$pointer. The notation @$pointer roughly translates to "take the value in $pointer, and then use this value as the address to an array." Similarly, you use %$pointer for hashes. That is, "take the value of $pointer and interpret it as an address to a hash."
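As a short sketch of the @$pointer and %$pointer notation, the example below builds one array and one hash and reads each back through a reference. The variable names and data are assumptions made for the example, and the references themselves are taken with the backslash operator, which the next section describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");    # sample data, not from the chapter
my %price  = (apple => 3, pear => 5);

my $aref = \@colors;   # reference to the array
my $href = \%price;    # reference to the hash

my @copy  = @$aref;              # use the value in $aref as an array
my $count = scalar(@$aref);      # number of elements in that array
my @names = sort keys %$href;    # use the value in $href as a hash

print "colors: @copy ($count items)\n";
print "priced: @names\n";
```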

The Backslash Operator

Using the backslash operator is analogous to using the ampersand (&) operator in C to take the address of a variable. This method is usually used to create a second, new reference to the variable in question. Here's how to create a reference to a scalar variable:

$variable = 22;
$pointer = \$variable;

$ice = "jello";
$iceptr = \$ice;

Now $pointer points to the location containing the value of $variable, and $iceptr points to the location containing jello. Even if the original name ($ice) goes away, you can still access the value through $iceptr. It's a hard reference at work here, so you have to get rid of both $iceptr and $ice to free up the space in which the value jello is stored. Similarly, $variable contains the number 22, and because $pointer refers to $variable, dereferencing $pointer with the expression $$pointer returns a value of 22. In a subroutine, both $variable and $pointer have to be declared as "local" or "my" variables. If either one is not declared as such, it will persist as a global variable long after the subroutine returns. As long as either of these variables exists, the space for storing the number will also exist.
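The way a hard reference keeps a value alive can be sketched as follows. The bare block is an assumption used here to make $variable go out of scope early; the value remains reachable because $pointer still refers to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pointer;
{
    my $variable = 22;
    $pointer = \$variable;   # second reference to the same storage
}
# The name $variable is gone here, but the value is still referenced.

my $value = $$pointer;       # dereference: reads 22
$$pointer = 23;              # writing through the reference also works
my $updated = $$pointer;

print "before: $value, after: $updated\n";
```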

The variable $pointer contains the address of the $variable, not the value itself. To get the value, you have to dereference $pointer with two dollar signs, $$. Listing 3.1 illustrates how this works.


Listing 3.1. References to scalars.
#!/usr/bin/perl

$value = 10;

$pointer = \$value;

printf "\n Pointer Address $pointer of  $value \n";

printf "\n What Pointer *($pointer) points to $$pointer\n";

$value in this script is set to 10. $pointer is set to point to the address of $value. The two printf statements show how the value of the variable is being referenced. If you run this script, you'll see something very close to this output:

Pointer Address SCALAR(0x806c520) of 10

What Pointer *(SCALAR(0x806c520)) points to 10

The address shown in the output from your script definitely will be different from the one shown here. However, you can see that $pointer gave the address, and $$pointer gave the value of the scalar $value to which $pointer points.

The word SCALAR followed by a long hexadecimal number in the address value tells you that the address points to a scalar variable. The number following SCALAR is the address where the information of the scalar variable is being kept.

References and Arrays

This is perhaps the most important thing you must remember about Perl: all Perl @ARRAYs and %HASHes are always one-dimensional. As such, the arrays and hashes hold only scalar values and do not directly contain other arrays or complex data structures. If it's a member of an array, it's either a data item or a reference to a data item.

You can also use the backslash operator on arrays and hashes, just as you would for scalar variables. For arrays, you use something like the Perl script in Listing 3.2.


Listing 3.2. Using array references.
 1 #!/usr/bin/perl
 2 #
 3 # Using Array references
 4 #
 5 $pointer = \@ARGV;
 6 printf "\n Pointer Address of ARGV = $pointer\n";
 7 $i = scalar(@$pointer);
 8 printf "\n Number of arguments : $i \n";
 9 $i = 0;
10 foreach (@$pointer) { # Access the entire array.
11            printf "$i : $$pointer[$i++]; \n";
12            }

Let's examine the lines that pertain to references in this Perl script, which prints out the contents of the input argument array @ARGV. Line 5 is where the reference $pointer is set to point to the array @ARGV. Line 6 simply prints the address of ARGV out for you. You probably will never have to use the address of ARGV, but had you been using another array, this would be a quick way to get its address.

Now $pointer holds the address of the array. This should sound familiar to C programmers, where a reference to a one-dimensional array is really just a pointer to the first element of the array. In Perl, however, the reference refers to the array as a whole.

In line 7, the function scalar() (not to be confused with the type of variable scalar) is called to get the count of the elements in an array. The parameter passed in could be @ARGV, but in the case of the reference in $pointer, you have to specify the type of parameter expected by the scalar() function. Are you confused yet? There is a scalar() function; a scalar variable holds one value; and a hard reference is a scalar unless it's dereferenced to behave like a non-scalar.

Note
Remember that a reference to something will always be used as scalar. There is no implicit dereferencing in Perl. You specify how you want the scalar value of a reference to be used. Once you have a scalar reference, you can dereference it to be used as a pointer to an array, hash, function, or whatever structure you want.

The type of $pointer in this case is a pointer to the array whose number of elements you have to return. The call is made to the function with @$pointer as the passed parameter. $pointer holds the address of the array, and the @ forces that address to be used as an array.

The reference to the array in line 10 is used in the same way as in line 7. In line 11, the elements of the array are listed using the $$pointer[$i] item. How does the Perl compiler interpret this statement to dereference $pointer and get an item in the array? Well, $pointer points to the array. Then you go to the item at index $i in the array (via the use of $$pointer[$i++]) and also increment the value of $i. Finally, the value at $$pointer[$i] is returned as a scalar. Because $i++ is a postincrement, the old value of $i is used as the index, and $i is incremented afterward.

The program is appropriately called testmeout. Here is sample input and output for the code in Listing 3.2.

$ testmeout 1 2 3 4

 Pointer Address of ARGV = ARRAY(0x806c378)

 Number of arguments : 4
0 : 1;
1 : 2;
2 : 3;
3 : 4;

The number following ARRAY in the pointer address of ARGV in this example is the address of ARGV. Not that that address does you any good, but just realize that references to arrays and scalars are displayed with the type to which they happen to be pointing.
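If you need the type word without the address, Perl's built-in ref() function returns it as a string. That function is not used elsewhere in this chapter, so treat the following as a supplementary sketch; the sample variables are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $number = 10;
my @list   = (1, 2, 3);
my %table  = (a => 1);

# ref() reports what kind of thing each reference points to.
my @kinds = map { ref($_) } (\$number, \@list, \%table, sub { 1 });

# For a value that is not a reference, ref() returns an empty string.
my $plain = ref($number);

print "@kinds\n";   # prints SCALAR ARRAY HASH CODE
```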

The backslash operator can be used with associative arrays too. The idea is the same: you are substituting the $pointer for all references to the name of the associative array. You use %$pointer instead of @$pointer to refer to the hash. By specifying the percent sign (%) you are forcing Perl to use the value of $pointer as a pointer to a hash.

For pointers to functions, the address is printed with the word CODE. For a hash, it is printed as HASH. Listing 3.3 provides an example of using hashes.


Listing 3.3. Using references to associative arrays.
 1 #!/usr/bin/perl
 2
 3 #
 4 # Using References to Associative Arrays
 5 #
 6
 7 %month = (
 8             '01', 'Jan',
 9             '02', 'Feb',
10             '03', 'Mar',
11             '04', 'Apr',
12             '05', 'May',
13             '06', 'Jun',
14             '07', 'Jul',
15             '08', 'Aug',
16             '09', 'Sep',
17             '10', 'Oct',
18             '11', 'Nov',
19             '12', 'Dec',
20             );
21
22 $pointer = \%month;
23
24 printf "\n Address of hash = $pointer\n ";
25
26 #
27 # The following lines would be used to print out the
28 # contents of the associative array if %month was used.
29 #
30 # foreach $i (sort keys %month) {
31 # printf "\n $i $month{$i} ";
32 # }
33
34 #
35 # The reference to the associative array via $pointer
36 #
37 foreach $i (sort keys %$pointer) {
38            printf "$i is $$pointer{$i} \n";
39 }

The associative array is referenced via the code in line 22 that contains $pointer = \%month;. This will create a hard reference, $pointer, to the hash called %month. Now you can also refer to the %month associative array by using the value in the $pointer variable. Using the %month variable, you would refer to an element in the hash using the syntax $month{$index}. In order to use the $pointer value, you would simply replace the month with $pointer in the name of the variable. This is very similar to the procedure used with pointers to ordinary arrays. The elements of the %month associative array are referenced with the $$pointer{$index} construct. Of course, because the array is really a hash, the $index is the key into the hash and not a number.

Here is the output from running this test script.

$ mth

 Address of hash = HASH(0x806c52c)

 01 is Jan
 02 is Feb
 03 is Mar
 04 is Apr
 05 is May
 06 is Jun
 07 is Jul
 08 is Aug
 09 is Sep
 10 is Oct
 11 is Nov
 12 is Dec

Associative arrays do not have to be constructed using the comma operator. You can use the => operator instead. In later Perl modules and sample code, you'll see the use of the => operator, which separates items just as the comma operator does. Using the => operator makes the code a bit easier to read aloud. Compare the output of Listing 3.3 with the print statements in the program to see how the output was generated.

Now let's look at how pointers to arrays and hashes can be dereferenced to get individual items. See the code in Listing 3.4 to see how you can use the => operator.


Listing 3.4. Alternative use of the => operator.
 1 #!/usr/bin/perl
 2
 3 #
 4 # Using Array references
 5 #
 6
 7 %weekday = (
 8             '01' => 'Mon',
 9             '02' => 'Tue',
10             '03' => 'Wed',
11             '04' => 'Thu',
12             '05' => 'Fri',
13             '06' => 'Sat',
14             '07' => 'Sun',
15             );
16
17 $pointer = \%weekday;
18
19 $i = '05';
20
21 printf "\n ================== start test ================= \n";
22 #
23 # These next two lines should show an output
24 #
25             printf '$$pointer{$i} is ';
26             printf "$$pointer{$i} \n";
27             printf '${$pointer}{$i} is ';
28             printf "${$pointer}{$i} \n";
29
30             printf '$pointer->{$i} is ';
31             printf "$pointer->{$i}\n";
32
33 #
34 # These next two lines should not show anything
35 #
36             printf '${$pointer{$i}} is ';
37             printf "${$pointer{$i}} \n";
38             printf '${$pointer->{$i}} is ';
39             printf "${$pointer->{$i}}";
40
41 printf "\n ================== end of test ================= \n";

Here is the output from the Perl script shown in Listing 3.4.

 ================== start test =================
$$pointer{$i} is Fri
${$pointer}{$i} is Fri
$pointer->{$i} is Fri
${$pointer{$i}} is
${$pointer->{$i}} is
 ================== end of test =================

In this output, you can see that the first three lines gave you the expected output. The first reference is used in the same way as with regular arrays. The second line uses ${$pointer} and indexes using {$i}; the leftmost $ gets the scalar value at the location reached after the indexing. The third line uses the -> operator to do the same thing.

Then there are the two lines that did not work. In the fourth line of the output, $pointer{$i} refers to an element of a hash named %pointer, which does not exist, rather than to the hash that the scalar $pointer points to. The fifth line, ${$pointer->{$i}}, has an extra level of indirection: the value Fri is itself used as a pointer, and therefore nothing is printed.

The -> operator should be very familiar to C++ or C programmers. Using a reference like $variable->{$k} is synonymous with the use of $$variable{$k}. The -> simply means "use the value of the left side of -> as an address and dereference it." The brackets that follow say what to dereference it as: curly braces for a hash, square brackets for an array. So, in line 30, you use $pointer-> in place of $$pointer to refer to the hash. The {$i} is used to index into the hash directly, because the -> already treats $pointer as pointing to a hash. In the case of $$pointer{$i}, two preceding dollar signs ($$) are required: one to dereference the value in $pointer, and the other to use the value stored under the key $i as a scalar.
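The equivalence of the three notations can be checked with a small sketch like the one below. The hash and array contents are assumptions chosen for the example; each group of three expressions reads the same element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %weekday = ('05' => 'Fri', '06' => 'Sat');
my @parts   = ('solid', 'black');

my $href = \%weekday;
my $aref = \@parts;

# Three spellings of "the element under key 05 of the hash $href points to".
my @hash_forms  = ($$href{'05'}, ${$href}{'05'}, $href->{'05'});

# The same three spellings with square brackets, for an array reference.
my @array_forms = ($$aref[1], ${$aref}[1], $aref->[1]);

print "@hash_forms\n";    # prints Fri Fri Fri
print "@array_forms\n";   # prints black black black
```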

We will cover the use of the -> operator in a moment when we use it to index into elements of arrays. Let's first look at how we can use simple array concepts to construct multidimensional arrays.

Using Multidimensional Arrays

The way to create an array is with the statement @array = list. You can create a reference to a complex anonymous array by using square brackets. Consider the following statement, which sets the parameters for a three-dimensional drawing program:

$line = ['solid', 'black', ['1','2','3'] , ['4', '5', '6']];

This statement constructs an array of four elements. The array is referred to by the scalar $line. The first two elements are scalars indicating the type and color of the line to draw. The next two elements of the array referred to by $line are references to anonymous arrays; they contain the starting and ending points of the line.

To get to the elements of the inner array elements, you can use the following multidimensional syntax:

$arrayReference->[$index] for a single dimensional array, and
$arrayReference->[$index1][$index2] for a two dimensional array, and
$arrayReference->[$index1][$index2][$index3] for a three dimensional array.

Let's see how creating arrays within arrays works in practice. Refer to Listing 3.5 to print out the information pointed to by the $line reference.


Listing 3.5. Using multidimensional array references.
#!/usr/bin/perl

#
# Using Multidimensional Array references
#

$line = ['solid', 'black', ['1','2','3'] , ['4', '5', '6']];

print "\$line->[0] = $line->[0] \n";
print "\$line->[1] = $line->[1] \n";
print "\$line->[2][0] = $line->[2][0] \n";
print "\$line->[2][1] = $line->[2][1] \n";
print "\$line->[2][2] = $line->[2][2] \n";
print "\$line->[3][0] = $line->[3][0] \n";
print "\$line->[3][1] = $line->[3][1] \n";
print "\$line->[3][2] = $line->[3][2] \n";

print "\n"; # The obligatory output beautifier.

Here is the output of the program that shows how to use two-dimensional arrays.

$line->[0] = solid
$line->[1] = black
$line->[2][0] = 1
$line->[2][1] = 2
$line->[2][2] = 3
$line->[3][0] = 4
$line->[3][1] = 5
$line->[3][2] = 6

You can modify the script in Listing 3.5 to work with three-dimensional (or even n-dimensional) arrays, as shown in Listing 3.6.


Listing 3.6. Extending to multiple dimensions.
#!/usr/bin/perl

#
# Using Multidimensional Array references again
#

$line = ['solid', 'black', ['1','2','3', ['4', '5', '6']]];

print "\$line->[0] = $line->[0] \n";
print "\$line->[1] = $line->[1] \n";
print "\$line->[2][0] = $line->[2][0] \n";
print "\$line->[2][1] = $line->[2][1] \n";
print "\$line->[2][2] = $line->[2][2] \n";

print "\$line->[2][3][0] = $line->[2][3][0] \n";
print "\$line->[2][3][1] = $line->[2][3][1] \n";
print "\$line->[2][3][2] = $line->[2][3][2] \n";

print "\n";

In this example, the array is three deep; therefore, a reference like $line->[2][3][0] has to be used. For a C programmer, this is akin to the statement Array_pointer[2][3][0], where Array_pointer points to what's declared as an array with three indexes.

In the previous examples, only hard-coded numbers were used as the indexes. There is nothing preventing you from using variables instead. As with array constructors, you can mix and match hashes and arrays to create as complex a structure as you want.
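As a sketch of indexing with variables rather than hard-coded numbers, the loop below walks a small two-dimensional structure. The name $grid and its contents are assumptions for the example; $#{...} gives the last index of the array that a reference points to.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $grid = [ [1, 2, 3], [4, 5, 6] ];   # sample data, two rows of three

my @lines;
for my $row (0 .. $#{$grid}) {
    for my $col (0 .. $#{ $grid->[$row] }) {
        # Both indexes are variables here instead of literal numbers.
        push @lines, "\$grid->[$row][$col] = $grid->[$row][$col]";
    }
}

print "$_\n" for @lines;
```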

Creating complex structures is the next step. Listing 3.7 illustrates how these two types of arrays can be combined. It uses the point numbers and coordinates to define a cube.


Listing 3.7. Using multidimensional arrays.
#!/usr/bin/perl

#
# Using Multidimensional Array and Hash references
#

%cube = (
            '0', ['0', '0', '0'],
            '1', ['0', '0', '1'],
            '2', ['0', '1', '0'],
            '3', ['0', '1', '1'],
            '4', ['1', '0', '0'],
            '5', ['1', '0', '1'],
            '6', ['1', '1', '0'],
            '7', ['1', '1', '1']
            );

$pointer = \%cube;

print "\n Da Cube \n";
foreach $i (sort keys %$pointer) {
            $list = $$pointer{$i};
            $x = $list->[0];
            $y = $list->[1];
            $z = $list->[2];
            printf " Point $i =  $x,$y,$z \n";
}

In this listing, %cube contains point numbers and coordinates in a hash. Each coordinate itself is an array of three numbers. The $list variable is used to get a reference to each coordinate definition with the following statement:

$list = $$pointer{$i};

After you get the list, you can dereference it to get to each element in the list with these statements:

$x = $list->[0];
$y = $list->[1];
$z = $list->[2];

Note that the same result of assigning values to $x, $y, and $z could be achieved by this single line of code:

($x,$y,$z) = @$list;

This works because you are dereferencing what $list points to and using it as an array, which in turn is assigned to the list ($x,$y,$z).

When working with hashes or arrays, dereferencing by -> is like a dollar-sign ($) dereference. When accessing individual array elements, you are often faced with writing statements like these two:

$$names[0] = "Kamran";
$names->[0] = "Kamran";

Both lines are equivalent. The "$$names" at the start of the first line has been replaced with "$names->" to create the second line. The same procedure can be applied for hash operations:

$$lastnames{"Kamran"} = "Husain";
$lastnames->{"Kamran"} = "Husain";

Arrays in Perl do not have a fixed size: an array extends to the highest index that has been used, and it can grow on demand. Referencing an array for the first time creates the array and space for the item that is being indexed. Referencing the array again at different indexes creates those elements if they do not already exist. Array references can also be created automatically when first used on the left side of an assignment. Using a reference such as $array[$i] creates an array into which you can index with $i. The same is true of scalars and even multidimensional arrays.
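A minimal sketch of this on-demand creation follows. The variable names are assumptions for the example; note that $data starts out undefined and becomes an array reference simply by being assigned through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array;
$array[5] = "six";              # the array grows to hold indexes 0 through 5
my $size = scalar(@array);      # 6 elements; the first five are undefined

my $data;                       # no value yet
$data->[2][1] = "deep";         # creates the outer array and the inner array
my $outer = scalar(@$data);             # 3 slots in the outer array
my $inner = scalar(@{ $data->[2] });    # 2 slots in the inner array

print "size=$size outer=$outer inner=$inner\n";
```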

References to Subroutines

Just as you can reference individual items such as arrays and scalar variables, you can also point to subroutines. In C, this would be akin to pointing to a function. To construct such a reference, you use a statement like this:

$pointer_to_sub = sub { ... declaration of sub ... } ;

Note the use of the semicolon at the end of the sub() declaration. The subroutine pointed to by $pointer_to_sub points to the same function reference even if the statement is placed in a loop. This feature in Perl lets you declare several anonymous sub() functions in a loop without worrying about the fact that you are chewing up memory by declaring the same function over and over as you go about in a loop. As you come around the loop and reassign a scalar to the sub, Perl simply assigns to the same subroutine declared with the first use of the sub() statement.

To call a referenced subroutine, use this syntax:

&$pointer_to_sub( parameters );

This code works because you are dereferencing the $pointer_to_sub and using it with the ampersand (&) as a pointer to a function. The parameters portion may or may not be empty, depending on how your function is defined. The code within a sub is simply a declaration created with this statement. The code within the sub is not executed immediately; however, it is compiled and set for each use. Consider the script shown in Listing 3.8.


Listing 3.8. Using references to subroutines.
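#!/usr/bin/perl

sub print_coor {
            my ($x,$y,$z) = @_;
            print "$x $y $z \n";
            return $x;
}

# Take a reference to the named subroutine.
$pointer_to_sub = \&print_coor;

$k = 1;
$j = 2;
$m = 4;
# Call the subroutine through the reference.
$this  = &$pointer_to_sub($k,$j,$m);

$that  = &$pointer_to_sub(4,5,6);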

When you execute this listing, you get the following output:

$ test
1 2 4
4 5 6

This output tells you that $x, $y, and $z are assigned from the arguments passed at runtime, each time print_coor is called. The variables $this and $that receive the value returned by each call (1 and 4, respectively); they do not point to subroutines themselves.
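For comparison, here is a short sketch that stores an anonymous subroutine in a scalar and calls it in two ways. The &$pointer_to_sub(...) form is the one described in this section; the $pointer_to_sub->(...) form is an equivalent spelling that the chapter does not cover. The subroutine body is an assumption for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A reference to an anonymous subroutine; note the closing semicolon.
my $pointer_to_sub = sub {
    my ($x, $y, $z) = @_;
    return "$x $y $z";
};

my $first  = &$pointer_to_sub(1, 2, 4);    # ampersand form
my $second = $pointer_to_sub->(4, 5, 6);   # arrow form, same effect

print "$first\n$second\n";
```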

Using Subroutine Templates

Subroutines are not limited to returning only data types. They can return references to other subroutines, too. The returned subroutines run in the context of the calling routine but are set up in the original routine that created them. This type of behavior is caused by the way closure is handled in Perl. Closure means that if you define a function in one context, it runs in that particular context in which it was first defined. (A book on object-oriented programming would provide more information on closure.)

To see how closure works, look at Listing 3.9, which you can use to set up different types of error messages. Such subroutines are useful in creating templates of all error messages.


Listing 3.9. Using closure.
#!/usr/bin/perl

sub errorMsg {
         my $lvl = shift;
         #
         # define the subroutine to run when called.
         #
         return sub {
                  my $msg = shift;  # Define the error type now.
                  print "Err Level $lvl:$msg\n";  # print later.
         };
}

$severe  = errorMsg("Severe");
$fatal = errorMsg("Fatal");
$annoy = errorMsg("Annoying");

&$severe("Divide by zero");
&$fatal("Did you forget to use a semi-colon?");
&$annoy("Uninitialized variable in use");

The subroutine errorMsg declared here uses a local variable called $lvl. After this declaration, errorMsg uses $lvl in the subroutine it returns to the caller. The value of $lvl is therefore fixed at the time errorMsg is called, and it outlives that call even though the keyword my is used. As a result, the following three calls set up three different $lvl values, each in its own context:

$severe  = errorMsg("Severe");
$fatal   = errorMsg("Fatal");
$annoy   = errorMsg("Annoying");

Now, when the reference to a subroutine is returned by the call to the errorMsg function in each of the lines above, the value of $lvl within the errorMsg function is retained for each context in which $lvl was declared. Thus, the $msg value from the referenced call is used, but the value of $lvl is the value that was first set in the actual creation of the function.

Sound confusing? It is. This is primarily the reason why you do not see this type of code in most Perl programs.
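A smaller example may make the idea easier to see. In the sketch below, each call to the hypothetical make_counter subroutine creates a fresh $count, and the returned subroutine keeps its own copy between calls. The names are assumptions for the example and do not appear elsewhere in this chapter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a subroutine that remembers its own private $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++; };
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

my @seen;
push @seen, $from_one->();   # 1
push @seen, $from_one->();   # 2
push @seen, $from_ten->();   # 10, a separate $count
push @seen, $from_one->();   # 3, unaffected by $from_ten

print "@seen\n";
```

The two counters never interfere with each other, which is the same behavior that keeps the three $lvl values apart in Listing 3.9.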

Implementing State Machines

Using arrays and pointers to subroutines, you can come up with some nifty applications. Consider using an array of pointers to subroutines to implement a state machine. Listing 3.10 provides an example of a simple, asynchronous state machine.


Listing 3.10. A simple, asynchronous state machine.
 1 #!/usr/bin/perl
 2 # --------------------------------------------------------------
 3 # Define each state as subroutine. Then create a
 4 # reference to each subroutine. We have four states here.
 5 # --------------------------------------------------------------
 6 $s0 = sub {
 7            local $a = $_[0];
 8            print "State 0 processing $a \n";
 9            if ($a eq '0')  { return(0); }
10            if ($a eq '1')  { return(1); }
11            if ($a eq '2')  { return(2); }
12            if ($a eq '3')  { return(3); }
13            return 0;
14            };
15 # --------------------------------------------------------------
16 $s1 = sub {
17            local $a = shift @_;
18            print "State 1 processing $a \n";
19            if ($a eq '0')  { return(0); }
20            if ($a eq '1')  { return(1); }
21            if ($a eq '2')  { return(2); }
22            if ($a eq '3')  { return(3); }
23            return 1;
24            };
25 # --------------------------------------------------------------
26 $s2 = sub {
27            local $a = $_[0];
28            print "State 2 processing $a \n";
29            if ($a eq '0')  { return(0); }
30            if ($a eq '1')  { return(1); }
31            if ($a eq '2')  { return(2); }
32            if ($a eq '3')  { return(3); }
33            return 2;
34            };
35 # --------------------------------------------------------------
36 $s3 = sub {
37            my  $a = shift @_;
38            print "State 3 processing $a \n";
39            if ($a eq '0')  { return(0); }
40            if ($a eq '1')  { return(1); }
41            if ($a eq '2')  { return(2); }
42            if ($a eq '3')  { return(3); }
43            return 3;
44            };
45 # --------------------------------------------------------------
46 # Create an array of pointers to subroutines. The index
47 # into this array is the current state.
48 # --------------------------------------------------------------
49 @stateTable = ($s0, $s1, $s2, $s3);
50 # --------------------------------------------------------------
51 # Initialize the state to 0.
52 # --------------------------------------------------------------
53 $this = 0;
54 # --------------------------------------------------------------
55 # Implement the state machine.
56 #   set current state to 0
57 #   forever
58 #        get response
59 #        set current state to next state based on response.
60 # --------------------------------------------------------------
61 while (1)
62            {
63            print "\n This state is : $this -> what next? ";
64            $reply = <STDIN>;
65            chomp($reply);
66            #
67            # Stop the machine here
68            #
69            if ($reply eq 'q') { exit(0); }
70            print " Reply = $reply \n";
71            #
72            # Get the present state function.
73            #
74            $state = $stateTable[$this];
75            #
76            # Get the next state from this state.
77            #
78            $next = &$state($reply);
79            printf "Next state = $next from this state $this\n";
80            #
81            # Now advance present state to next state
82            #
83            $this = $next;
84     }

Let's see how each function implements the state transitions. Each state function takes the user's input as its first parameter. In Perl, the @_ variable is the array of input parameters into a subroutine and is always defined in each subroutine. In line 37, the shift command moves the first item from the list of input parameters into $a. The value of $a is then used to select the next state of the program.

There are four states in this state machine: S0, S1, S2, and S3. Each state accepts input in the form of a number. Each number is used to get the next state to go to. Note how $a is declared in each state function using either my or local. So if the current state is 2 and the function receives an input of 3, the program will do a state transition from 2 to 3. After the function returns, the current state will be 3.

Lines 6 through 14 define a subroutine that defines the functionality of a state. State S0 transitions to state S1 on receiving a 1, S2 on receiving a 2, and S3 on receiving a 3. All other input will not cause a state transition. The other states, {S1,S2,S3}, behave in an analogous way.

The stateTable array is used to store pointers to each of the functions of the state machine. The four entries are set in line 49. The initial state is set to 0.

Lines 61 through 84 implement the code for transitioning through the state machine by accepting input from <STDIN> and calling the present state function to handle the input. Line 74 is where you get the pointer to the function handling all input for the present state, and line 78 is where that state-handling function is called. The next state value returned by the function is assigned to the present state ($this) in line 83.
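
The table-of-function-pointers idea can be sketched in a few self-contained lines. This is a minimal two-state sketch, not the listing discussed above: the state functions stateA and stateB, the @table array, and the run_machine helper are hypothetical names chosen for illustration, and the input comes from a list rather than from <STDIN>.

```perl
use strict;
use warnings;

# Each state function takes one input and returns the index of the next state.
sub stateA { my $in = shift; return $in == 1 ? 1 : 0; }   # go to B on 1, else stay in A
sub stateB { my $in = shift; return $in == 0 ? 0 : 1; }   # go to A on 0, else stay in B

# The table holds hard references to the state functions.
my @table = (\&stateA, \&stateB);

# Feed a list of inputs through the machine and return every state visited.
sub run_machine {
    my ($start, @inputs) = @_;
    my $this = $start;
    my @visited = ($this);
    foreach my $in (@inputs) {
        my $state = $table[$this];     # pointer to the present state function
        $this = &$state($in);          # call it to get the next state
        push @visited, $this;
    }
    return @visited;
}

my @path = run_machine(0, 1, 1, 0, 0);
print "States visited: @path\n";
```

Adding a state in this arrangement means writing one more function and appending its reference to the table; the loop that drives the machine does not change.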

Passing More Than One Array into a Subroutine

Having arrays is great for collecting relevant information. Now you'll see how to work with multiple arrays via subroutines. Passing one or more arrays into Perl subroutines is done by reference. However, you have to keep in mind a few subtle things about using the @_ symbol when processing these arrays in the subroutine.

The @_ symbol is an array of all the arguments passed to a subroutine. So, if you have a call to a subroutine as follows:

$a = 2;
@b = ("x","y","z");
@c = ("cat","mouse","chase");
&simpleSub($a,@b,@c);

the @_ array within the subroutine will be (2, "x", "y", "z", "cat", "mouse", "chase"). That is, the contents of all the elements will be glued together to form one long array.
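
A short sketch makes the flattening visible. The body of simpleSub shown here is an assumption made for illustration (the text does not define it), and the variable names are chosen only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical body: report how many arguments arrived and what they were.
sub simpleSub {
    return (scalar(@_), join(",", @_));
}

my $n       = 2;
my @letters = ("x", "y", "z");
my @words   = ("cat", "mouse", "chase");

# One scalar and two arrays go in; a single flat list comes out the other side.
my ($count, $seen) = simpleSub($n, @letters, @words);
print "simpleSub received $count items: $seen\n";
```

The subroutine has no way of telling where @letters ended and @words began.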

Obviously, this ability to glue together arrays will be a problem to deal with if you want to do operations on two distinct arrays sequentially. For example, if you have a list of names and a list of phone numbers, you would want to take the first item from the names array and the first item from the number array and print an item. Then take the next name and the next number and print a combination, and so on. If you pass in the contents of the arrays to a function that simply uses @_, the subroutine will see one long array, the first half of which will be a list of strings (names) and the second half of which will be a list of numbers.

The subroutine would have to split the @_ in half into two distinct arrays before it can start processing. The problem gets more complicated if you were to pass three or four arrays such as those containing items like address and ZIP code. Now the subroutine will have to manipulate @_ even more to get the required number of arrays.

The simplest way to handle the passing of multiple arrays into a subroutine is to use references to arrays in the argument list. That is, you pass in a reference to each array that the subroutine will be using. The references appear in the @_ array in the order in which they were passed, and the code in the subroutine can dereference each item in @_ to get at the array being referenced. This procedure is known as passing by reference: the subroutine can change the value of what is being referenced. When you pass by value, only the copy that is sent on the stack is changed, not the original. In Perl, the elements of @_ are aliases for the actual arguments, so variables are passed by reference; only a constant, such as a literal number, cannot be modified. For example, from the following code:

sub doit {
$_[0] *= 3.141;
}
$\="\n";
$x = 3;
print $x;
doit ($x);
print $x;
# The following line will cause an error since you will attempt to
# modify a read-only value:
# doit(3);

you will see the following values being printed:

3
9.423

The second number is the new value of $x after the call to the doit subroutine. Calling the doit subroutine with a constant value, as in the commented-out line above, will result in an exception with an error message indicating that your program attempted to modify a read-only value. The preceding test confirms that Perl indeed passes values of variables by reference and not by value.

Note
The value of the $\ system variable is the output record separator. In the preceding example, it is set to a newline. By setting the value of $\ to \n, the print statements did not have to append a \n to any string being printed. It's a matter of style, of course, and you do not have to use the $\ variable if you do not want to. By default the $\ variable is undefined, so nothing is added after a print. The $\ variable is useful when you are writing text records with the print statement that have to end with a special record separator such as END\n or RECORDEND\n\n.
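
As a hedged illustration of the record-separator use mentioned in the note, the following sketch writes two records into an in-memory file so that the result can be inspected. The handle name $out, the $buffer variable, and the record text are assumptions made for this example.

```perl
use strict;
use warnings;

my $buffer = "";
open(my $out, '>', \$buffer) or die "Cannot open in-memory file: $!";
{
    local $\ = "END\n";          # appended after every print inside this block
    print $out "record one ";
    print $out "record two ";
}
print $out "no separator here";  # outside the block, $\ is undefined again
close($out);

print $buffer, "\n";
```

Using local keeps the change to $\ confined to one block, so other print statements in the program are unaffected.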

Listing 3.11 provides a sample subroutine that expects a list of names and a list of phone numbers.


Listing 3.11. Passing multiple arrays into a subroutine.
#!/usr/bin/perl

@names  = ("mickey", "goofy", "daffy");
@phones = (5551234, 5554321, 666);
$i = 0;

sub listem {
    # @a swallows every element of @_, so @b is always empty.
    my (@a, @b) = @_;
    foreach (@a) {
        print "a[$i] = " . $a[$i] . " " . "\tb[$i] = " . $b[$i] . "\n";
        $i++;
    }
}

&listem(@names, @phones);

Here's the output from this program:

a[0] = mickey    b[0] =
a[1] = goofy     b[1] =
a[2] = daffy     b[2] =
a[3] = 5551234   b[3] =
a[4] = 5554321   b[4] =
a[5] = 666       b[5] =

The @b array is empty, and @a is just like the array @_. This is because the @_ array is a solitary array of all parameters into a subroutine. If you pass in 50 arrays, @_ is still going to be one array of all the elements of the 50 arrays concatenated together.

In the subroutine in this example, the assignment

my (@a, @b) = @_

gets loosely interpreted by your Perl interpreter as "let's see, @a is an array, so let's assign one array from @_ to @a and then assign everything else to @b." Never mind the fact that @_ is itself an array and will therefore get assigned to @a, leaving nothing to assign to @b.

In order to get around this @_-interpretation feature and to be able to pass arrays into subroutines, you would have to pass arrays in by reference. This is done by modifying the script to look like the one shown in Listing 3.12.


Listing 3.12. Passing multiple arrays by reference.
#!/usr/bin/perl

@names  = ("mickey", "goofy", "daffy");
@phones = (5551234, 5554321, 666);
$i = 0;

sub listem {
    # $a and $b each hold a reference to one of the caller's arrays.
    my ($a, $b) = @_;
    foreach (@$a) {
        # $$a[$i] is element $i of the array that $a refers to.
        print "a[$i] = " . $$a[$i] . " " . "\tb[$i] = " . $$b[$i] . "\n";
        $i++;
    }
}

&listem(\@names, \@phones);

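Here are the major changes made to this script: the call to listem passes references to the arrays (\@names and \@phones) instead of the arrays themselves; the subroutine collects those two references into the scalars $a and $b; and the loop dereferences $a and $b to reach the elements of the original arrays.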

The output from this listing is what we expected:

a[0] = mickey    b[0] = 5551234
a[1] = goofy     b[1] = 5554321
a[2] = daffy     b[2] = 666
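
The same technique can be written with the arrow notation for dereferencing, which many readers find easier to scan. This is a sketch under assumptions: the pair_up subroutine and its variable names are hypothetical and are not part of the listings above.

```perl
use strict;
use warnings;

# Pair up the elements of two arrays that arrive as references.
sub pair_up {
    my ($names, $phones) = @_;
    my @lines;
    foreach my $i (0 .. $#{$names}) {
        # $names->[$i] is element $i of the array that $names refers to.
        push @lines, "$names->[$i] => $phones->[$i]";
    }
    return @lines;
}

my @names  = ("mickey", "goofy", "daffy");
my @phones = (5551234, 5554321, 666);

my @lines = pair_up(\@names, \@phones);
print "$_\n" foreach @lines;
```

Because each array arrives as its own scalar, the subroutine can walk the two arrays in step without splitting @_.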

Pass by Value or by Reference?

Scalar variables, when used in a subroutine argument list, are always passed by reference. You do not have a choice here. You can modify the values of these variables if you really want to. To access these variables, you can use the @_ array and index each individual element in it, using $_[$index], where $index is an integer starting at 0.

Arrays and hashes are different beasts altogether. You can either pass a single reference to the whole array or hash, or you can pass references to each element in it. For long arrays, the choice should be fairly obvious: pass only the reference to the array. In either case, you can use the reference(s) to modify what you want in the original array.

Also, the @_ mechanism concatenates all the input arrays to a subroutine into one long array. Sure, this feature is nice if you do want to process the incoming arrays as one long array. Normally, you want to keep the arrays separate when processing them in a subroutine, and passing by reference is the best way that you can do that.
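
To make the difference concrete, here is a small sketch that contrasts working on a copy of @_ with working through a reference. The subroutine names double_copy and double_in_place are hypothetical.

```perl
use strict;
use warnings;

# Changes the caller's array, because it works through the reference.
sub double_in_place {
    my $aref = shift;
    $_ *= 2 foreach @$aref;
}

# Leaves the caller's array alone, because @_ is copied into @copy first.
sub double_copy {
    my @copy = @_;
    $_ *= 2 foreach @copy;
    return @copy;
}

my @values  = (1, 2, 3);
my @doubled = double_copy(@values);
my $after_copy = "@values";            # the original array has not changed yet

double_in_place(\@values);
my $after_ref = "@values";             # now the original array itself is doubled

print "after copy: $after_copy, after reference: $after_ref\n";
```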

References to File Handles

There are times when you have to write the same output to different output files. For instance, an application programmer might want output to go to a screen in one instance, the printer in another, and a file in yet another, or perhaps even all three at the same time. Rather than make separate statements per handle, it would be nice to write something like this:

spitOut(\*STDOUT);
spitOut(\*LPHANDLE);
spitOut(\*LOGHANDLE);

Note how the file handle reference is sent with the \*FILEHANDLE syntax. This is because you're referring to the file handle's entry in the symbol table of the current package. In the subroutine handling the output to the file handle, you have code that looks something like this:

sub spitOut {
    my $fh = shift;
    print $fh "Gee Wilbur, I like this lettuce\n";
}
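
Put together as a runnable sketch, the idea looks like the following. To avoid depending on a real printer or log file, LOGHANDLE is opened on an in-memory scalar here; that choice, and the $log variable, are assumptions made for this example.

```perl
use strict;
use warnings;

# Write the same message to whatever file handle reference is passed in.
sub spitOut {
    my $fh = shift;
    print $fh "Gee Wilbur, I like this lettuce\n";
}

# LOGHANDLE writes into the scalar $log so the sketch needs no real file.
my $log = "";
open(LOGHANDLE, '>', \$log) or die "Cannot open in-memory log: $!";

spitOut(\*STDOUT);       # goes to the screen
spitOut(\*LOGHANDLE);    # goes to the log
close(LOGHANDLE);

print "Log contains: $log";
```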

What Does the *variable Operator Do?

In UNIX (and other operating systems, too) the asterisk is a sort of wildcard operator. In Perl you can refer to other variables, arrays, subroutines, and so on by using the asterisk operator like this:

*iceCream;

The asterisk used this way is also known as a typeglob. The asterisk on the front can be thought of as a wildcard match for all the mangled names used internally by Perl. When evaluated, a typeglob of *name produces a scalar value that represents all the objects with that name: the scalar, the array, the hash, the subroutine, the file handle, and so on.

A typeglob can be used the same way a reference can be used because the dereference syntax always indicates the kind of reference desired. Therefore, ${*iceCream} and ${\$iceCream} both mean the same scalar variable. Basically, *iceCream refers to the entry in the internal %main:: associative array of all symbol names for the main package. Thus, *iceCream really translates to $main::{'iceCream'} if you are in the main package context.

A package context implies the use of the associative array of symbol names, called a symbol table, by Perl for resolving variable names in a program. We will cover symbols and symbol tables in Chapter 4. What is confusing is that the terms module and package are often used interchangeably in Perl documentation, and for our purposes here they mean the same thing. Basically, your Perl program runs in the main package (think "module") and uses other modules to switch symbol tables. Code running in the context of a module has its own symbol table that is different from the symbol table in the main module.
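
The relationship between a typeglob and the variables that share its name can be sketched as follows. The example assumes Perl 5, where the symbol table hash for the main package is named %main::; the values stored in $iceCream and @iceCream are made up for illustration.

```perl
use strict;
use warnings;

our $iceCream = "vanilla";
our @iceCream = ("cone", "cup");

my $via_glob = ${*iceCream};      # scalar slot of the typeglob
my $via_ref  = ${\$iceCream};     # same variable through a hard reference
my @via_glob = @{*iceCream};      # array slot of the same typeglob

# The symbol table of package main is the hash %main::, keyed by name.
my $in_table = exists $main::{'iceCream'} ? "yes" : "no";

print "scalar: $via_glob / $via_ref\n";
print "array:  @via_glob\n";
print "in symbol table: $in_table\n";
```

One name, iceCream, leads to both the scalar and the array; the character in front of the braces selects which one you get.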

Using Symbolic References

The use of curly braces around the variable name makes it easier to construct strings:

$road = ($w) ? "free":"high";
print "${road}way";

This line will print highway or freeway, depending on the value of $w. This type of syntax will be very familiar to folks writing makefiles or shell scripts. In fact, you can use this ${variable} construct outside of double quotes, like the examples shown here:

print ${road};
print ${road} . "way";
print ${ road } . "way";
$if = "road";
print "\n ${if} way \n";

Note that you can use reserved words in the ${ } brackets, too. However, using reserved words for anything other than their purpose is playing with fire. Be imaginative and make up your own variables.

One last point. Symbolic references cannot be used on variables declared with the my construct because these variables are not kept in any symbol table. Variables declared with the my construct are valid only for the block in which they're created. Variables declared with the local keyword are visible to the enclosing block and to any subroutines called from it because they are in a symbol table.
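
The following sketch shows the difference in practice. The variable names are hypothetical, and strict checking of references is deliberately left off so that the symbolic references are allowed.

```perl
use warnings;
# strict 'refs' is deliberately left off so that symbolic references work.

our $color = "red";     # package variable: lives in the symbol table
my  $shade = "dark";    # lexical variable: not in any symbol table

my $name_of_package_var = "color";
my $name_of_lexical_var = "shade";

my $found   = ${$name_of_package_var};   # finds $main::color
my $missing = ${$name_of_lexical_var};   # looks up $main::shade, which has no value

print "color via symbolic reference: $found\n";
print "shade via symbolic reference: ",
      (defined $missing ? $missing : "(not found)"), "\n";
```

The second lookup does not fail loudly; it simply finds nothing, which is one reason symbolic references are easy to misuse.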

Declaring with Curly Braces

The previous section brings up an interesting point about curly braces for use other than as hashes. In Perl, curly braces are normally reserved for delimiting blocks of code. Let's say you are returning the passed list by sorting it in reverse order. The passed list is in @_ of the called subroutine. Thus, these two statements are equivalent:

sub backward {
            { reverse sort @_ ; }
            };

sub backward {
            reverse sort @_ ;
            };

Curly braces, when preceded with the @ operator, allow you to set up small blocks of evaluated code. The code in Listing 3.13 evaluates an array.


Listing 3.13. Evaluating references to arrays.

 1 #!/usr/bin/perl
 2 sub average {
 3     ($a,$b,$c) = @_;
 4     $x = $a + $b + $c;
 5     $x2 = $a*$a + $b*$b + $c*$c;
 6     return ($x/3, $x2/3 ); }

 7 $x = 1;
 8 $y = 34;
 9 $z = 47;

10 print "The midpt is @{[&average($x,$y,$z)]} \n";

You should see the printout of 27.3333333333333 and 1122. In line 10, when @{} is seen in the double-quoted string, the contents of @{} are evaluated as a block of code. The block creates a reference to an anonymous array containing the results of the call to the subroutine average($x,$y,$z). The array is constructed because of the [] brackets around the call. Thus, the [] construct returns a reference to an array, which @{} in turn dereferences; the elements of the resulting array are inserted into the double-quoted string, separated by spaces.
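
The same construct works with any expression that returns a list. Here is a minimal sketch; the min_max subroutine and the sample readings are made up for illustration.

```perl
use strict;
use warnings;

# Return the smallest and largest of the arguments.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

my @readings = (34, 1, 47);

# [ ... ] builds an anonymous array from the returned list;
# @{ ... } dereferences it so the elements interpolate, separated by spaces.
my $line = "The range is @{[ min_max(@readings) ]}";
print "$line\n";
```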

Multidimensional Associative Arrays

Perl has no built-in multidimensional associative array type, but it gives you two ways to get the effect. In most cases, you would not want to use multidimensional associative arrays, though they are sometimes useful for tracking synonymous variable names.

The first way uses references. In Perl 5, a statement such as this:

$description{'pan'}{'handle'};

indexes a hash whose values are references to other hashes. The second way predates references and keeps everything in one flat associative array. What you use is the following:

$description{'pan' , 'handle'};

The latter statement lets you index into the %description array using two strings, so you can index the array as

$description{'pan' , 'cake'};
$description{'pan' , 'der'};
$description{'pan' , 'da'};

Your first index here for a row would be pan and each index into the row would be cake, der, da, and handle. It's a bit cumbersome to use, but it will work.

The comma is not the only way to write a key made of more than one index. Perl joins the indexes together with the $; system variable, which is the subscript separator for all items used to index an associative array, so you can also write the joined key yourself. The default value of $; is the Ctrl-\ character, but you can set it to anything you want.

When more than one index is used to reference an associative array, all items are concatenated together with the use of the $; variable. That is, the statement

$description{"texas", "pan","handle"} ;

is interpreted as

$description{"texas" . $; . "pan" . $; . "handle"} ;
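
A brief sketch confirms the equivalence and shows how a stored key can be taken apart again. The value "region" and the variable names are assumptions made for this example.

```perl
use strict;
use warnings;

my %description;
$description{"texas", "pan", "handle"} = "region";

# Build the same key by hand with the subscript separator.
my $separator = $;;
my $key  = join($separator, "texas", "pan", "handle");
my $same = $description{$key};

# Split the stored key back into its parts.
my ($stored_key) = keys %description;
my @parts = split(/\Q$separator\E/, $stored_key);

print "value: $same, parts: @parts\n";
```

There is only one key in the hash; the three indexes exist only as pieces of that one string.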

By setting the value of $; to "::", you can write the separator directly in the index. The following lines of code illustrate how to do this:

$; = "::";
$description{"pan", "cake"} = "edible";
$description{"pan::da"} = "cute";

The "::" is now interchangeable with the comma separator. There is one catch to using "::" as a separator: "::" is also used in the package::member syntax, as you will see in Chapter 5, "Object-Oriented Programming in Perl." So a statement like this with $; set to "::"

$description{"pan::handle", "cake"}

will get translated to

$description{"pan::handle::cake"}

which is something you probably do not want! We will cover this syntax and how to work with objects in Chapter 5, so be patient.
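
The catch can be demonstrated directly: once the separator also appears inside an index, two different-looking subscripts collapse into one key. The stored values in this sketch are made up for illustration.

```perl
use strict;
use warnings;

my %description;
{
    local $; = "::";
    $description{"pan", "handle", "cake"} = "three parts";
    $description{"pan::handle", "cake"}   = "two parts";   # same key as the line above
}

my @keys  = keys %description;
my $value = $description{'pan::handle::cake'};

print "keys: @keys\n";
print "value: $value\n";
```

The second assignment silently overwrites the first, so a separator that can occur in the data is a poor choice.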

Strict References

To force only hard references in a program and protect yourself from accidentally creating symbolic references, you can use a pragma called strict, which makes Perl reject symbolic references when it runs your program. To use it, place the following statement at the top of your Perl script:

use strict 'refs';

From this point, only hard references are allowed for the rest of the script. You can place this statement within curly braces, too, in which case the checking is limited to the code block within the curly braces.

To turn off the strict checking at any time within a code block, use this statement:

no strict 'refs';
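
The following sketch shows both statements at work. It traps the error with eval so that the script keeps running; the variable names are hypothetical.

```perl
use strict;
use warnings;

our $road = "free";
my $name  = "road";

# Under strict 'refs', using a string as a reference is a run-time error.
my $value;
my $ok  = eval { $value = ${$name}; 1 };
my $err = $ok ? "" : $@;

# Inside this block the check is switched off, so the symbolic reference works.
{
    no strict 'refs';
    $value = ${$name};
}

print "strict attempt failed: ", ($ok ? "no" : "yes"), "\n";
print "value with strict off: $value\n";
```

Keeping the no strict 'refs' statement inside the smallest possible block limits where a symbolic reference can slip in unnoticed.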

For More Information

Besides the obvious documents, such as the Perl man pages, look at the Perl source code. The t/op directory in the Perl source tree has some regression test routines that should definitely get you thinking. There are lots of documents and references at the Web sites www.perl.com/index.html, mox.perl.com/index.html, and www.metronet.com/perlinfo/doc/manual/html/perl.html.

Summary

There are two types of references you can deal with in Perl 5: hard and symbolic. Hard references work like hard links in UNIX file systems. You can have more than one hard reference to the same item. Perl keeps a reference count for you. This reference count is incremented or decremented as references to the item are created or destroyed. When the count goes to zero, the item being referenced is destroyed. Symbolic references hold the name of a variable, are used via the ${} construct, and are useful in providing multiple stages of references to objects.

You can have references to scalars, arrays, hashes, subroutines, and even other references. References themselves are scalars and have to be dereferenced to the appropriate type before being used. Use @$pointer for an array, %$pointer for a hash, &$pointer for a subroutine, and so on. Multidimensional arrays are possible by using references in arrays and hashes. You can also have references to other elements holding even more references to create very complicated structures. Keep the terms straight: scalar() is a function, a scalar variable holds one value, and a hard reference is a scalar unless it's dereferenced to behave like a non-scalar. Got that?

Parameters are passed into a subroutine through references. The @_ array is really one long array of all the passed parameters concatenated together. To send separate arrays, pass references to the individual arrays.

The next chapter covers Perl objects and references to objects. I deliberately did not cover Perl objects in this chapter because they require some knowledge of objects, constructors, and packages.