Chapter 16

Searching the Database



As the previous chapter demonstrated, databases can be implemented both on the server and from within the browser. Regardless of where the database lives, once it grows substantial you need a way to locate data quickly, because searching can consume a great deal of processor time.

The Keys, Please

Before diving into database searching, an introduction to some commonly used terminology is in order. Consider the following database:

Covington|Cindy|AR
Walter|Scott|MN
Willson|Susan|TX

In it, each record has three fields: first name, last name, and state. If you were looking for a particular record, you could do a search based on any one or more of these fields.

In database jargon, the field you base your search on is a search key. Keys aren't limited to a single field; you can create a complex key that consists of more than one field, such as state plus last name. This is especially valuable when several database records may have the same information in one or more fields. A key also doesn't have to match an entire field, which is useful if, for example, you want to find all entries in a database that start with "W."

Alternatively, you can create a totally new field for tracking purposes, where you can store a custom key.
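As a quick sketch of a partial-field key match (variable names here are invented for illustration, not from the text), the following Perl fragment splits each record of the sample database and keeps those whose last-name field starts with "W":

```perl
# Sample records in last|first|state format, as shown in the text.
@records = ("Covington|Cindy|AR",
            "Walter|Scott|MN",
            "Willson|Susan|TX");

# Keep every record whose last-name field begins with "W".
foreach $record (@records) {
   ($last, $first, $state) = split(/\|/, $record);
   if ($last =~ /^W/) {
      push(@matches, $record);
   }
}
```

After the loop, @matches holds the Walter and Willson records.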

The Brute Force Technique

The most straightforward method for searching through a database is the brute force method. With it you simply start at the beginning and work your way through the database one record at a time, looking for the desired field. For small files, this isn't too costly from the standpoint of system time used. However, if the size of the database becomes relatively large, it could take a lot of time to locate the desired information. Factor in the probability that more than one user is going to be doing a search at the same time, and your system can easily slow down to a crawl as each process tries to locate information.

TIP
One way to reduce search time is to divide a large database into several smaller ones, which reduces the number of records that need to be searched. However, you only gain an improvement in search time if the division is done carefully: if a user searches on a criterion that differs from how you've divided your data, you've actually slowed the process down, because now you have to open multiple files.

NOTE
The concept of "large" when talking about database files is a very relative term and can involve several factors: the number of records within the database, the number of fields within a record, and the number of fields you're using as your search key.

To implement a standard search, loop through the database as shown in listing 16.1.


Listing 16.1  Brute Force Search
# @keyField is a list containing the record keys
# $key is the desired key to match
#
$howMany = @keyField;
for ($index = 0; $index < $howMany; $index++) {
   if ($keyField[$index] =~ /$key/) {
      last;
   }
}

if ($index < $howMany) {
   # found it!
} else {
   # couldn't find a match
}

The for loop steps through the key data, one record at a time, looking for the pattern stored in $key. If a match is found, the loop exits with $index equal to the record's position in the database. The if block then checks the value of $index; if it's less than $howMany, a match was successfully found.

This method isn't without its drawbacks, however. If the record you're looking for is located toward the end of the file, you'll have to search through every record in the database until you find it. It could take a long time if it's the very last record in a large database; as the database grows in size, the time necessary to read the entire database could become unacceptable. You have to decide if the amount of time it could take is worth the effort.
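As an aside not covered in the text, Perl's built-in grep function expresses the same linear scan in a single line. It still examines every record (and collects all matches rather than stopping at the first), so it shares the brute-force cost:

```perl
@records = ("Covington|Cindy|AR",
            "Walter|Scott|MN",
            "Willson|Susan|TX");

# grep walks the entire list and keeps every element matching the pattern.
@matches = grep(/MN/, @records);
```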

You can avoid this problem with a little planning and a different search technique, but before that technique can work, the database must first be sorted.

Sorting Database Information

If you've done any programming, you're no doubt familiar with sorting, or the ordering of data based on a preestablished criterion. Several search mechanisms that are considerably faster than the brute-force technique rely on the data being sorted.

With Perl, sorting is an easy task because a sort function is built into the language. The syntax of sort is:

@sortedArray = sort sortFunction @unsortedArray;

where @sortedArray receives the ordered result, sortFunction is a comparison subroutine you supply, and @unsortedArray holds the original data.

The only involved part of setting up a sort is the job of writing the sortFunction. The Perl sort command sets the global variables $a and $b to the two array elements being compared. It's then the job of the sortFunction to return one of three possible values:

  -1 if $a is less than $b (sort-wise)
   0 if $a equals $b (sort-wise)
   1 if $a is greater than $b (sort-wise)

as demonstrated in listing 16.2.


Listing 16.2  Simple Sort Structure
sub sortFunction {
   if ($a < $b) {
      return -1;
   } elsif ($a == $b) {
      return 0;
   } else {
      return 1;
   }
}

If the comparison being performed is a simple numeric one (as shown in listing 16.2), this can be written in Perl as:

$a <=> $b;

which reduces the entire if...elsif...else block to one line. (For string data, the cmp operator plays the same role.)
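To sort whole records on one field, the comparison subroutine can split that field out of each record before comparing. Here is a sketch using the sample data (the subroutine and variable names are invented for illustration):

```perl
@records = ("Walter|Scott|MN",
            "Covington|Cindy|AR",
            "Willson|Susan|TX");

# Compare two records by their third field (state), alphabetically.
sub byState {
   ($stateA) = (split(/\|/, $a))[2];
   ($stateB) = (split(/\|/, $b))[2];
   return $stateA cmp $stateB;
}

@sorted = sort byState @records;
```

After the sort, @sorted lists the AR record first and the TX record last.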

Once the data is sorted, a search technique like the binary search can be employed.

The Binary Search

The binary search is so named because it repeatedly splits the search range in half. It works as follows:

  1. Compare the record in the middle of the (sorted) range to the search key.
  2. If it matches, the search is done.
  3. If not, discard the half of the range that cannot contain the key, and repeat with the half that remains.

This process of split-compare-discard continues until the desired record is found. From an order-of-complexity standpoint, a binary search has to look at only about log2(n) records in an n-record database, even if the desired data is located toward the end of the file.

NOTE
The concept of order of complexity was indirectly introduced earlier in the chapter when talking about the brute-force search. Order of complexity is programming terminology for how many iterations through a dataset a search algorithm must perform (number of records that must be looked at). The goal when designing a search mechanism is to minimize the order of complexity.

Listing 16.3 demonstrates the binary search mechanism.


Listing 16.3  Binary Search
# @keyField is a sorted list containing the record keys
# $key is the desired key to match
#
$howMany = @keyField;
$first   = 0;
$last    = $howMany - 1;
$index   = $howMany;   # assume no match until one is found

while ($first <= $last) {
   $middle = int(($first + $last) / 2);

   if ($keyField[$middle] eq $key) {
      # found it
      $index = $middle;
      last;
   } elsif ($keyField[$middle] lt $key) {
      # middle record comes before key
      $first = $middle + 1;
   } else {
      # middle record comes after key
      $last = $middle - 1;
   }
}

if ($index < $howMany) {
   # found it!
} else {
   # couldn't find a match
}

Indexed Search

If you want to search your database on more than one field, simply sorting the data won't work: the records can be physically ordered only one way at a time (you'd have to sort on the least significant field first and work your way backwards, and even then only a single ordering survives). In this case, creating a separate index file makes searching more practical. The index file is a database in itself, except that instead of the entire data record with all its fields, the index file stores only two things:

  1. The field being indexed.
  2. An index counter that identifies the location of the record within the main database.

For example, given the following database:

Brandt|Carl|MN
Covington|Cindy|AR
Walter|Scott|MN
Willson|Susan|TX

The database itself is sorted alphabetically by last name, but not by state. An index file for state-based indexing would look something like this:

AR|2
MN|1
MN|3
TX|4

where the second field indicates which record has the necessary match.

Searching with an index file is no different from searching the database directly:

  1. Open the index.
  2. Scan the index for a match.
  3. Use the record number stored in the index to find the record (or records) in the main database.

As with searching the main database, you can use accelerated search techniques, like the binary search, to find matching records.
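The steps above assume the index file already exists. Building one is straightforward; the following sketch (variable names invented for illustration) indexes the sample database by state, pairing each state with its 1-based record number and then sorting the entries so they can be binary-searched:

```perl
@records = ("Brandt|Carl|MN",
            "Covington|Cindy|AR",
            "Walter|Scott|MN",
            "Willson|Susan|TX");

# Pair each state field with its (1-based) record number...
for ($i = 0; $i < @records; $i++) {
   ($state) = (split(/\|/, $records[$i]))[2];
   push(@index, $state . "|" . ($i + 1));
}

# ...then sort the index entries so fast searches are possible.
@index = sort @index;
```

The result matches the index shown earlier: AR|2, MN|1, MN|3, TX|4.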

Multiple Keys

The fields used to match records during a database search are called key fields. A key field may be "unique" (only one record has any given key) or it may be "shared" (more than one record has the same key). Binary searches are ideal for unique key databases, but shared keys cause a problem: If the binary search lands in the middle of a range of records that share a common key, which way does it go?

Shared keys can be avoided by creatively designing custom keys instead of using field data directly. For example, you could combine both a state field and a ZIP code field to create a new key.
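A sketch of such a custom composite key follows (the ZIP value is invented for illustration; the text's sample records carry no ZIP field). Either field alone might be shared by many records, but the concatenated pair is far more likely to be unique:

```perl
# Hypothetical record fields; the ZIP code is an invented example value.
($last, $first, $state, $zip) = ("Walter", "Scott", "MN", "55401");

# Concatenate two fields into one composite key for indexing.
$customKey = $state . $zip;
```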

From Here…

This chapter looked at different methods for accessing the information stored within a database. With these techniques under your belt, you can apply them to several other methods, such as: