SADSAM

A program which understands natural language with special relation to family trees.

The program is restricted to basic English which consists of about 1700 words and a grammar as defined by Ogdon in 1933. The program only accepts full stops and dashes for minor clauses. The program uses list structures to build up information for the parsing of a sentence. The goal is to be able to answer questions about the family relationship of the people mentioned in a passage of text.

The parsing program

The grammars of certain langusges may be described by rules which are called phase structure rules (Chomsky 1957) A simple grammar for parsing a sentence in basic english might be

1. S -> NP + PRED

2. S -> NP + VP + NP

3. S -> NP + V + NP

4. NP -> they

5. NP -> planes

6. NP -> A + N

7. N -> planes

8. N -> man

9. A -> the

10. A -> flying

11. VP -> AUX + V

12. V -> are

13. V -> flying

14. AUX-> are

15. PRED-> VP + NP + AD

16. AD -> swiftly

The given sentence is split into words or group[s of words and each word is matched by a part of speech. The parts of speech are components of the rules and gradually the components of the rules are combined together to parse the sentence.

Example

2. S -> NP + VP + NP

11. VP -> AUX + V

2,11 S -> NP + AUX + V + NP

they are flying planes

An alternative

3. S -> NP + V + NP

6. NP -> A + N

3,6 S -> NP + V + NP

3,6 S -> NP + V + A + N

they are flying planes

Consider the parsing of the sentence

They are flying planes swiftly

Some rules produce some odd sentences but even when these are eliminated the rules for natural language will generate more than one possible parsing. All such parsings must be generated by the program. The correct parsing can only be found from the context. It has not yet been possible to find a complete set of rules to parse all English sentences and over ten thousand have been used. In SADSAM it was assumed that all sentences could be parsed by moving from left to right. Secondly the number of intermediate groupings was restricted as the storage was limited. This cut down on the variety and complexity of the sentences that could be analysed. The sentence parsing program reads the sentence and finds the most likely part of speech for each word. The alternatives are also listed. The process works from right to left grouping in the words into substructures until all the substructures tie into an acceptable structure. If an impasse is reached at any stage a backtrack is necessary at that level to the next possible alternative.

The tree for the sentence

They are flying planes swiftly

Note the more complex parsing for the above sentence and note especially the depth.

The next task is to identify the subject object and verb as shown in the above tree. Once a word is matched the next word is examined and if it fitted into a larger structure it did so and the next word was treated likewise. Considering the tree and sentence shown above, 'the' is an article so a noun is needed, however 'very' is an adverb and it means that instead of a noun an adjective is now needed and it is found 'big'. The two words 'very big' form an adjectival phrase. Now a noun is needed and found when 'man appears and this then forms a noun phrase and is the SUBJECT. A verb 'bit' occurs and the OBJECT is a nounphrase 'the dog' an article followed by a noun.The parsing uses the above rules but also uses a bottom up approach where the parts of speech of the words are linked to appropriate grammatical rules

0. They are flying planes swiftly

1. NP V V N AD

abortive

2. NP V V NP AD

abortive

3. NP V A N AD 6

NP V NP AD 3

S AD

abortive

4. NP V A NP AD

abortive

5. NP AUX A N AD

NP AUX NP AD

abortive

6. NP AUX A NP AD

abortive

7. NP AUX V N AD

NP VP N AD

abortive

8. NP AUX V NP AD

NP VP NP AD

S AD

abortive

NP PRED

S SUCCESS

Semantic Phase

Once the sentence diagram was constructed a list of nouns was built up which included the possessive adjectives such as BILL'S. Subject-Object combinations related by parts of the verb

'to be' were marked equivalent. A search was then made for the 8 key relationships

MOTHER FATHER BROTHER SISTER MARRIED OFFSPRING

BROTHER IN LAW SISTER IN LAW

A list of elementary relations was formed these were word triplets two nouns ansd a relation. They were built up from phrases like

Jane's brother is the father of John.

A family tree was then constructed and the essential building block was a family unit. There wereseveral family units and elements in the respective family units were linked together when it was discovered that Margaret's son was John's father.

FAMILY UNIT 1 FAMILY UNIT 3

-> HUSBAND -> A ----- -> HUSBAND -> E

-> WIFE -> B | --- -> WIFE -> F

-> OFFSPRING ->----------------- | -> OFFSPRING -> ?

-> H'S PARENTS ->? | -> H'S PARENTS -> FU1

-> W'S PARENTS ->? | -> W'S PARENTS -> FU2

FAMILY UNIT 2 FAMILY UNIT 4

-> HUSBAND -> C -> HUSBAND -> G

-> WIFE -> D -> WIFE -> ?

-> OFFSPRING ->---------------- -> OFFSPRING -> ?

-> H'S PARENTS ->? -> H'S PARENTS -> FU2

-> W'S PARENTS ->? -> W'S PARENTS -> ?

Two family units 1 & 2 are created A is married to B C to D and A&B have a son E who marries F the offspring of C&D. This creates a new family unit 3. When G is encountered as a brother of F a new family unit is constructed. The whole process is additive. The order of presentation of information is crucial for storage efficiency eg X has off spring E,F,G,H requires two independent structures. Then B and H are brothers causes a linkage then a condensation. If B and H are brothers was given first the storage allocation would have been easier. The family units are built up from dialogue such as:

Joey was playing with his brother Bobby in their Aunt Jane's garden when their mother arrived.

Is Jane mother's sister or sister in law.

How can the alternatives be stored and allowed for.

SADSAM took natural language and built up fairly sophisticated data structures about families, whilst it was able to store the material it gave good answers but its major limitation was caused by the unwieldy static storage element.

BASEBALL

"did RED SOX ever win 6 games in a row" NOT ALLOWED

"who did the RED SOX lose to on July 5th?" OK

"did every team play at least once in each park in each month?" OK

"where did RED SOX play on July 7th"

PLACE = ?

TEAM = RED SOX

MONTH = JULY

DAY = 7

"what teams won 10 games in July?"

TEAM(winning) =

GAME(number of) =

MONTH = JULY

Data is organised in a hierarchical way.

MONTH = JULY

PLACE = BOSTON

DAY = 7

GAME NO = 96

TEAM = RED SOX , SCORE = 5

TEAM = YANKEES , SCORE = 3

SYNTAX ANALYSIS

[How many games] did [the Yankees] play (in [July])?

"who did the RED SOX lose to on July 5th?"

(to [who] did [the RED SOX] lose ( on [July 5th])?

"in how many places did each team play in July"

requires a count of the places for each team

"where did each team play in July" gives a single path

"on how many days in July did eight teams play"

DAYnumber of = ?

MONTH = JULY

TEAMnumber of = 8

implicit question a count kept of days when team= 8

DAY = EACH DAY = ?

MONTH = JULY MONTH = JULY

TEAM = ? TEAMnumber of = 8NOTES ON BOBROW'S STUDENT

1964 IBM 7094 10[5 ]adds per sec

The general idea was to display intelligence by solving suitable problems. The next task was to decide what type of problems to solve. Bobrow chose problems that did not require too much storage and involved numerical calculations, thus he chose high school algebra problems which consisted of problems such as

The price of a radio is 69.70 dollars. If this price is 15% less than the marked price

what is the marked price.

STUDENT then replies 82 dollars.

If 1 span equals 9in and 1 fathom equals 6 feet how many spans equal 1 fathom?

Global information

1 foot equals 12inches

STUDENT then replies 1 fathom equals 8 spans.

SEMANTIC DISCOURSE

When two people meet and talk or communicate with one another they transmit and receive messages. Each person has a model of the real world and this is used to encode the messages. The receiver needs another model to decode the messages and the models must be compatible. The models need not be the same but they must have common elements. The conversation becomes coherent if a complete understanding is achieved where the receiver can build a unique unambiguous logical model from the message.

Examples

How many chaplains are there in the U S army?

How many in the navy?

Kernel Sentences.

Kernel sentences were proposed. These are simple sentences requiring no transformations and can be directly transmitted into the model. They are in standard form and can easily be understood.

Generation of Kernel Sentences

Any particular idea can be expressed in several ways and thus the sentences may need transformation using linguistic forms.

Linguistic forms

number of---------

father of----------

---------'s father

-----plus--------

Relational linguistic forms also exist

---- equals -----

---- gave ---- to -----

Example

John's father gave 0.3 times the salary of Bill to Jill

TRANSFORMATIONS

There are two types of transformation

Syntactic

Can change the voice from active to passive or vice versa

Can make pronoun substitutions or reference substitutions

e.g. these it latter former he she or it

Definitional

examples

twice half something exceeds an item by how much

THE STUDENT DEDUCTIVE MODEL

Student performs transformations to obtain a set of kernel sentences and then transforms these sentences into expressions which are stored in model. The model stores equality and other relationships and holds the arithmetic operations as functions in LISP notation. The model demonstrates understanding by answering questions based upon these expressions which are based upon the input information. The constructions are functional LISP-based.

PROGRAM STRUCTURE

There are two programs in the project.

STUDENT

Takes the information in the question and tries to find a solution using global information.

REMEMBER

Saves information from one problem to the next and maintains the global information that is used in STUDENT. It has its own input and saves facts such as

distance equals speed times time

one half always means 0.5

feet is the plural of foot

OPERATION OF STUDENT

Once the kernel sentences are formed STUDENT then transforms them into simultaneous equations or linear equations. The program SOLVE is called and if a solution is found the STUDENT program prints the values of the unknown variables in a fixed format.

The format used is

variable IS value

If a solution cannot be found then heuristics are usedtwo similar phrases are treated as the same

global information is utilised if still in doubt help is sought from the user.

CATEGORIES OF WORDS IN A TRANSFORMATION

1 VARIABLES

variables are strings and the problem is to to link together two similar strings

2 SUBSTITUTORS

mandatory substitutors such as twice or half

optional ones such as perimeter of the rectangle becoming twice the sum of the length and breadth of the rectangle.

3 OPERATORS

complex combinations of the basic functions eg

plus n percent less than difference between

square of

some follow others precede

some are mixed in the operand

(the problem to be solved is)

(if the number of customers Tom gets is twice the square of 20% the number jof ads he runs and the number of ads he runs is 45 what is the number of customers Toms getsQ)

Mandatory substitutions made

(with mandatory substitutions the problem is)

(if the number of customers Tom gets is 2 times the square of 20 percent the number of ads he runs and the number of ads he runs is 45 what is the

number of customers Toms getsQ)

(with words tagged by function the problem is)

(if the number (of/OP) customersTom (gets/VERB) is 2 (times/OP1) the

(square/OP1) of 20 (percent/OP2) the number (of/OP1) ads (he/PRO) runs

and the number (of/OP) ads he runs is 45 (what/QWORD) is the number

(of/OP) customers Toms (gets/WORD) (QMARK/DLM))

(the simple sentences are)

(the number (of/OP) customers Tom (gets/VERB) is 2 (times/OP1) the

(square/OP1) of 20 (percent/OP2) the number (of/OP1) ads (he/PRO) runs

(period/DLM))

(the number (of/OP) ads he runs is 45 (period/DLM))

(what/QWORD) is the number (of/OP) customer Toms (gets/WORD)

(QMARK/DLM))

From kernel sentences to equations

P1 is P2 (equal P1 P2)

P1 (the number (of/OP) customer Tom (gets/VERB)

P2 2 (times/OP1) the (square/OP1) of 20 (percent/OP2) the number

(of/OP1) ads (he/PRO) runs (period/DLM))

(times 2 (expt (times 0.2 (number of ads) (he/pro) runs))2 ))

second sentence becomes

(equal (number of ads (he/PRO) runs) 45)

the third sentence

(equal X0001 (number of customers tom (gets/VERB)))

(The equations to be solved are)

(equal X0001 (number of customers tom (gets/VERBS)))

(equal (number of ads (he/PRO) runs) 45)

(equal (the number (of/OP) customers Tom (gets/VERB)

(times 2 (expt (times 0.2 (number of ads) (he/pro) runs))2 ))

(the number of customers tom gets is 162)

FURTHER PROBLEMS TACKLED

(Tom has twice as many fish as Mary has guppies. If Mary has 3 guppies what is the number of fish Tom has Q)

(the number of fish Tom has is 6)

Mary has 3 guppies becomes

The number of guppies Mary has is 3

( A number is multiplied by 6. This product is increased by 44. The result is 68. Find the number Q)

(The equations to be solved are)

(Equal X001 (number))

(Equal ( plus ( times ( number ) 6 ) 44 ) 68)

(the number is 4)

(if 1 span equals 9 inches and 1 fathom equals 6 feet how many spans equal 1 fathom)

(equal X001 ( times 1 (fathoms )))

(equal (times 1 (fathoms )) (times 6 (feet)))

(equal (times 1 (spans )) (times 9 (inches)))

We also needed

(equal (times 1 (yards )) (times 3 (feet)))

(equal (times 1 (feet )) (times 12 (inches)))

(1 fathom is 8 spans)

(The number of soldiers the Russians have is one half of the number of guns they have. The number of guns they have is 7000. What is the number of soldiers they have Q)

But assuming that

(( Number of soldiers (they/PRO) (have/VERB) is equal to (number of soldiers russians (have/VERB)))

(the number of soldiers they have is 3500)

(the problem to be solved is)

(the petrol consumption of my car is 15mpg. The distance between Boston and New York is 250 miles. What is the number of gallons of petrol used in a trip between New York and Boston Q)

Insufficient information as it stands

Assumptions

(distance) is equal to ( distance between New York and Boston)

(petrol consumption) is equal to (petrol consumption of my car)

(number of gallons used) is equal to

(number of gallons used on a trip between New York and Boston)

The number of gallons used on the trip between New York and Boston is 16 2/3

The sum of the perimeter of a triangle and a rectangle is 24.

The perimeter of a triangle is half that of the rectangle what is the perimeter of the rectangle.

The perimeter of a rectangle is 24. The length is twice the breadth what are the sides of

the rectangle.

The length of a rectangle is 8 inches more than the width, one half of the perimeter is 18 inches find the length and width of the rectangle.

CONCLUSIONS

Input is restricted however it solved the problems submitted as fast as humans.

It was based upon equality and five arithmetic operations

Tom has 2 apples, 3 bananas and 4 pears caused problemds and became invalid

The number of times I went to the cinema and 2 of the boys who passed

caused difficulty to the program

System filled IBM 7094 32k words

It demonstrated successfully that an understanding machine could be built.

The entire subset of English recognised by STUDENT is derived from the following set of Basic patterns.

(WHAT ARE * AND * ) (FIND * AND * )

(WHAT IS * ) (* IS MULTIPLIED BY * )

(HOW MANY *1 IS * ) (* IS DIVIDED BY *)

(HOW MANY * DO * HAVE) (* IS *)

(HOW MANY * DOES * HAVE) (* ( *1/VERB ) *1 *)

(FIND *)

(* ( *1/VERB ) * AS MANY * AS * ( *1/VERB ) *)

* means a string of any length

*1 means one word *1/VERB must be a verb in the dictionary.

Global information

(FEET IS THE PLURAL OF FOOT)

(ONE HALF ALWAYS MEANS 0.5)

(SUCCESSFUL CANDIDATES SOMETIMES MEANS STUDENTS

WHO PASSED THE ADMISSION TEST)

(DISTANCE EQUALS SPEED TIMES TIME)

(ONE FOOT EQUALS 12 INCHES)