A program which understands natural language with special relation to family trees.
The program is restricted to basic English which consists of about 1700 words and a grammar as defined by Ogdon in 1933. The program only accepts full stops and dashes for minor clauses. The program uses list structures to build up information for the parsing of a sentence. The goal is to be able to answer questions about the family relationship of the people mentioned in a passage of text.
The parsing program
The grammars of certain langusges may be described by rules which are called phase structure rules (Chomsky 1957) A simple grammar for parsing a sentence in basic english might be
1. S -> NP + PRED
2. S -> NP + VP + NP
3. S -> NP + V + NP
4. NP -> they
5. NP -> planes
6. NP -> A + N
7. N -> planes
8. N -> man
9. A -> the
10. A -> flying
11. VP -> AUX + V
12. V -> are
13. V -> flying
14. AUX-> are
15. PRED-> VP + NP + AD
16. AD -> swiftly
The given sentence is split into words or group[s of words and each word is matched by a part of speech. The parts of speech are components of the rules and gradually the components of the rules are combined together to parse the sentence.
Example
2. S -> NP + VP + NP
11. VP -> AUX + V
2,11 S -> NP + AUX + V + NP
they are flying planes
An alternative
3. S -> NP + V + NP
6. NP -> A + N
3,6 S -> NP + V + NP
3,6 S -> NP + V + A + N
they are flying planes
Consider the parsing of the sentence
They are flying planes swiftly
Some rules produce some odd sentences but even when these are eliminated the rules for natural language will generate more than one possible parsing. All such parsings must be generated by the program. The correct parsing can only be found from the context. It has not yet been possible to find a complete set of rules to parse all English sentences and over ten thousand have been used. In SADSAM it was assumed that all sentences could be parsed by moving from left to right. Secondly the number of intermediate groupings was restricted as the storage was limited. This cut down on the variety and complexity of the sentences that could be analysed. The sentence parsing program reads the sentence and finds the most likely part of speech for each word. The alternatives are also listed. The process works from right to left grouping in the words into substructures until all the substructures tie into an acceptable structure. If an impasse is reached at any stage a backtrack is necessary at that level to the next possible alternative.
The tree for the sentence
They are flying planes swiftly
Note the more complex parsing for the above sentence and note especially the depth.
The next task is to identify the subject object and verb as shown in the above tree. Once a word is matched the next word is examined and if it fitted into a larger structure it did so and the next word was treated likewise. Considering the tree and sentence shown above, 'the' is an article so a noun is needed, however 'very' is an adverb and it means that instead of a noun an adjective is now needed and it is found 'big'. The two words 'very big' form an adjectival phrase. Now a noun is needed and found when 'man appears and this then forms a noun phrase and is the SUBJECT. A verb 'bit' occurs and the OBJECT is a nounphrase 'the dog' an article followed by a noun.The parsing uses the above rules but also uses a bottom up approach where the parts of speech of the words are linked to appropriate grammatical rules
0. They are flying planes swiftly
1. NP V V N AD
abortive
2. NP V V NP AD
abortive
3. NP V A N AD 6
NP V NP AD 3
S AD
abortive
4. NP V A NP AD
abortive
5. NP AUX A N AD
NP AUX NP AD
abortive
6. NP AUX A NP AD
abortive
7. NP AUX V N AD
NP VP N AD
abortive
8. NP AUX V NP AD
NP VP NP AD
S AD
abortive
NP PRED
S SUCCESS
Semantic Phase
Once the sentence diagram was constructed a list of nouns was built up which included the possessive adjectives such as BILL'S. Subject-Object combinations related by parts of the verb
'to be' were marked equivalent. A search was then made for the 8 key relationships
MOTHER FATHER BROTHER SISTER MARRIED OFFSPRING
BROTHER IN LAW SISTER IN LAW
A list of elementary relations was formed these were word triplets two nouns ansd a relation. They were built up from phrases like
Jane's brother is the father of John.
A family tree was then constructed and the essential building block was a family unit. There wereseveral family units and elements in the respective family units were linked together when it was discovered that Margaret's son was John's father.
FAMILY UNIT 1 FAMILY UNIT 3
-> HUSBAND -> A ----- -> HUSBAND -> E
-> WIFE -> B | --- -> WIFE -> F
-> OFFSPRING ->----------------- | -> OFFSPRING -> ?
-> H'S PARENTS ->? | -> H'S PARENTS -> FU1
-> W'S PARENTS ->? | -> W'S PARENTS -> FU2
FAMILY UNIT 2 FAMILY UNIT 4
-> HUSBAND -> C -> HUSBAND -> G
-> WIFE -> D -> WIFE -> ?
-> OFFSPRING ->---------------- -> OFFSPRING -> ?
-> H'S PARENTS ->? -> H'S PARENTS -> FU2
-> W'S PARENTS ->? -> W'S PARENTS -> ?
Two family units 1 & 2 are created A is married to B C to D and A&B have a son E who marries F the offspring of C&D. This creates a new family unit 3. When G is encountered as a brother of F a new family unit is constructed. The whole process is additive. The order of presentation of information is crucial for storage efficiency eg X has off spring E,F,G,H requires two independent structures. Then B and H are brothers causes a linkage then a condensation. If B and H are brothers was given first the storage allocation would have been easier. The family units are built up from dialogue such as:
Joey was playing with his brother Bobby in their Aunt Jane's garden when their mother arrived.
Is Jane mother's sister or sister in law.
How can the alternatives be stored and allowed for.
SADSAM took natural language and built up fairly sophisticated data structures about families, whilst it was able to store the material it gave good answers but its major limitation was caused by the unwieldy static storage element.
BASEBALL
"did RED SOX ever win 6 games in a row" NOT ALLOWED
"who did the RED SOX lose to on July 5th?" OK
"did every team play at least once in each park in each month?" OK
"where did RED SOX play on July 7th"
PLACE = ?
TEAM = RED SOX
MONTH = JULY
DAY = 7
"what teams won 10 games in July?"
TEAM(winning) =
GAME(number of) =
MONTH = JULY
Data is organised in a hierarchical way.
MONTH = JULY
PLACE = BOSTON
DAY = 7
GAME NO = 96
TEAM = RED SOX , SCORE = 5
TEAM = YANKEES , SCORE = 3
SYNTAX ANALYSIS
[How many games] did [the Yankees] play (in [July])?
"who did the RED SOX lose to on July 5th?"
(to [who] did [the RED SOX] lose ( on [July 5th])?
"in how many places did each team play in July"
requires a count of the places for each team
"where did each team play in July" gives a single path
"on how many days in July did eight teams play"
DAYnumber of = ?
MONTH = JULY
TEAMnumber of = 8
implicit question a count kept of days when team= 8
DAY = EACH DAY = ?
MONTH = JULY MONTH = JULY
TEAM = ? TEAMnumber of = 8NOTES ON BOBROW'S STUDENT
1964 IBM 7094 10[5 ]adds per sec
The general idea was to display intelligence by solving suitable problems. The next task was to decide what type of problems to solve. Bobrow chose problems that did not require too much storage and involved numerical calculations, thus he chose high school algebra problems which consisted of problems such as
The price of a radio is 69.70 dollars. If this price is 15% less than the marked price
what is the marked price.
STUDENT then replies 82 dollars.
If 1 span equals 9in and 1 fathom equals 6 feet how many spans equal 1 fathom?
Global information
1 foot equals 12inches
STUDENT then replies 1 fathom equals 8 spans.
SEMANTIC DISCOURSE
When two people meet and talk or communicate with one another they transmit and receive messages. Each person has a model of the real world and this is used to encode the messages. The receiver needs another model to decode the messages and the models must be compatible. The models need not be the same but they must have common elements. The conversation becomes coherent if a complete understanding is achieved where the receiver can build a unique unambiguous logical model from the message.
Examples
How many chaplains are there in the U S army?
How many in the navy?
Kernel Sentences.
Kernel sentences were proposed. These are simple sentences requiring no transformations and can be directly transmitted into the model. They are in standard form and can easily be understood.
Generation of Kernel Sentences
Any particular idea can be expressed in several ways and thus the sentences may need transformation using linguistic forms.
Linguistic forms
number of---------
father of----------
---------'s father
-----plus--------
Relational linguistic forms also exist
---- equals -----
---- gave ---- to -----
Example
John's father gave 0.3 times the salary of Bill to Jill
TRANSFORMATIONS
There are two types of transformation
Syntactic
Can change the voice from active to passive or vice versa
Can make pronoun substitutions or reference substitutions
e.g. these it latter former he she or it
Definitional
examples
twice half something exceeds an item by how much
THE STUDENT DEDUCTIVE MODEL
Student performs transformations to obtain a set of kernel sentences and then transforms these sentences into expressions which are stored in model. The model stores equality and other relationships and holds the arithmetic operations as functions in LISP notation. The model demonstrates understanding by answering questions based upon these expressions which are based upon the input information. The constructions are functional LISP-based.
PROGRAM STRUCTURE
There are two programs in the project.
STUDENT
Takes the information in the question and tries to find a solution using global information.
REMEMBER
Saves information from one problem to the next and maintains the global information that is used in STUDENT. It has its own input and saves facts such as
distance equals speed times time
one half always means 0.5
feet is the plural of foot
OPERATION OF STUDENT
Once the kernel sentences are formed STUDENT then transforms them into simultaneous equations or linear equations. The program SOLVE is called and if a solution is found the STUDENT program prints the values of the unknown variables in a fixed format.
The format used is
variable IS value
If a solution cannot be found then heuristics are usedtwo similar phrases are treated as the same
global information is utilised if still in doubt help is sought from the user.
CATEGORIES OF WORDS IN A TRANSFORMATION
1 VARIABLES
variables are strings and the problem is to to link together two similar strings
2 SUBSTITUTORS
mandatory substitutors such as twice or half
optional ones such as perimeter of the rectangle becoming twice the sum of the length and breadth of the rectangle.
3 OPERATORS
complex combinations of the basic functions eg
plus n percent less than difference between
square of
some follow others precede
some are mixed in the operand
(the problem to be solved is)
(if the number of customers Tom gets is twice the square of 20% the number jof ads he runs and the number of ads he runs is 45 what is the number of customers Toms getsQ)
Mandatory substitutions made
(with mandatory substitutions the problem is)
(if the number of customers Tom gets is 2 times the square of 20 percent the number of ads he runs and the number of ads he runs is 45 what is the
number of customers Toms getsQ)
(with words tagged by function the problem is)
(if the number (of/OP) customersTom (gets/VERB) is 2 (times/OP1) the
(square/OP1) of 20 (percent/OP2) the number (of/OP1) ads (he/PRO) runs
and the number (of/OP) ads he runs is 45 (what/QWORD) is the number
(of/OP) customers Toms (gets/WORD) (QMARK/DLM))
(the simple sentences are)
(the number (of/OP) customers Tom (gets/VERB) is 2 (times/OP1) the
(square/OP1) of 20 (percent/OP2) the number (of/OP1) ads (he/PRO) runs
(period/DLM))
(the number (of/OP) ads he runs is 45 (period/DLM))
(what/QWORD) is the number (of/OP) customer Toms (gets/WORD)
(QMARK/DLM))
From kernel sentences to equations
P1 is P2 (equal P1 P2)
P1 (the number (of/OP) customer Tom (gets/VERB)
P2 2 (times/OP1) the (square/OP1) of 20 (percent/OP2) the number
(of/OP1) ads (he/PRO) runs (period/DLM))
(times 2 (expt (times 0.2 (number of ads) (he/pro) runs))2 ))
second sentence becomes
(equal (number of ads (he/PRO) runs) 45)
the third sentence
(equal X0001 (number of customers tom (gets/VERB)))
(The equations to be solved are)
(equal X0001 (number of customers tom (gets/VERBS)))
(equal (number of ads (he/PRO) runs) 45)
(equal (the number (of/OP) customers Tom (gets/VERB)
(times 2 (expt (times 0.2 (number of ads) (he/pro) runs))2 ))
(the number of customers tom gets is 162)
FURTHER PROBLEMS TACKLED
(Tom has twice as many fish as Mary has guppies. If Mary has 3 guppies what is the number of fish Tom has Q)
(the number of fish Tom has is 6)
Mary has 3 guppies becomes
The number of guppies Mary has is 3
( A number is multiplied by 6. This product is increased by 44. The result is 68. Find the number Q)
(The equations to be solved are)
(Equal X001 (number))
(Equal ( plus ( times ( number ) 6 ) 44 ) 68)
(the number is 4)
(if 1 span equals 9 inches and 1 fathom equals 6 feet how many spans equal 1 fathom)
(equal X001 ( times 1 (fathoms )))
(equal (times 1 (fathoms )) (times 6 (feet)))
(equal (times 1 (spans )) (times 9 (inches)))
We also needed
(equal (times 1 (yards )) (times 3 (feet)))
(equal (times 1 (feet )) (times 12 (inches)))
(1 fathom is 8 spans)
(The number of soldiers the Russians have is one half of the number of guns they have. The number of guns they have is 7000. What is the number of soldiers they have Q)
But assuming that
(( Number of soldiers (they/PRO) (have/VERB) is equal to (number of soldiers russians (have/VERB)))
(the number of soldiers they have is 3500)
(the problem to be solved is)
(the petrol consumption of my car is 15mpg. The distance between Boston and New York is 250 miles. What is the number of gallons of petrol used in a trip between New York and Boston Q)
Insufficient information as it stands
Assumptions
(distance) is equal to ( distance between New York and Boston)
(petrol consumption) is equal to (petrol consumption of my car)
(number of gallons used) is equal to
(number of gallons used on a trip between New York and Boston)
The number of gallons used on the trip between New York and Boston is 16 2/3
The sum of the perimeter of a triangle and a rectangle is 24.
The perimeter of a triangle is half that of the rectangle what is the perimeter of the rectangle.
The perimeter of a rectangle is 24. The length is twice the breadth what are the sides of
the rectangle.
The length of a rectangle is 8 inches more than the width, one half of the perimeter is 18 inches find the length and width of the rectangle.
CONCLUSIONS
Input is restricted however it solved the problems submitted as fast as humans.
It was based upon equality and five arithmetic operations
Tom has 2 apples, 3 bananas and 4 pears caused problemds and became invalid
The number of times I went to the cinema and 2 of the boys who passed
caused difficulty to the program
System filled IBM 7094 32k words
It demonstrated successfully that an understanding machine could be built.
The entire subset of English recognised by STUDENT is derived from the following set of Basic patterns.
(WHAT ARE * AND * ) (FIND * AND * )
(WHAT IS * ) (* IS MULTIPLIED BY * )
(HOW MANY *1 IS * ) (* IS DIVIDED BY *)
(HOW MANY * DO * HAVE) (* IS *)
(HOW MANY * DOES * HAVE) (* ( *1/VERB ) *1 *)
(FIND *)
(* ( *1/VERB ) * AS MANY * AS * ( *1/VERB ) *)
* means a string of any length
*1 means one word *1/VERB must be a verb in the dictionary.
Global information
(FEET IS THE PLURAL OF FOOT)
(ONE HALF ALWAYS MEANS 0.5)
(SUCCESSFUL CANDIDATES SOMETIMES MEANS STUDENTS
WHO PASSED THE ADMISSION TEST)
(DISTANCE EQUALS SPEED TIMES TIME)
(ONE FOOT EQUALS 12 INCHES)