CHAPTER XIX
Artificial Intelligence:
Prospects
"Almost" Situations and Subjunctives
AFTER READING Contrafactus, a friend said to me, "My uncle was
almost President of the U.S.!" "Really?" I said. "Sure," he
replied, "he was skipper of the PT 108." (John F. Kennedy was
skipper of the PT 109.)
That is what Contrafactus is all about. In everyday thought, we
are constantly manufacturing mental variants on situations we face,
ideas we have, or events that happen, and we let some features stay
exactly the same while others "slip". What features do we let slip?
What ones do we not even consider letting slip? What events are
perceived on some deep intuitive level as being close relatives of
ones which really happened? What do we think "almost" happened or
"could have" happened, even though it unambiguously did not? What
alternative versions of events pop without any conscious thought
into our minds when we hear a story? Why do some counterfactuals
strike us as "less counterfactual" than other counterfactuals?
After all, it is obvious that anything that didn't happen didn't
happen. There aren't degrees of "didn't-happen-ness". And the same
goes for "almost" situations. There are times when one plaintively
says, "It almost happened", and other times when one says the same
thing, full of relief. But the "almost" lies in the mind, not in
the external facts.
Driving down a country road, you run into a swarm of bees. You
don't just duly take note of it; the whole situation is immediately
placed in perspective by a swarm of "replays" that crowd into your
mind. Typically, you think, "Sure am lucky my window wasn't
open!"-or worse, the reverse: "Too bad my window wasn't closed!"
"Lucky I wasn't on my bike!" "Too bad I didn't come along five
seconds earlier." Strange but possible replays: "If that had been a
deer, I could have been killed!" "I bet those bees would have
rather had a collision with a rosebush." Even stranger replays:
"Too bad those bees weren't dollar bills!" "Lucky those bees
weren't made of cement!" "Too bad it wasn't just one bee instead of
a swarm." "Lucky I wasn't the swarm instead of being me." What
slips naturally and what doesn't-and why?
In a recent issue of The New Yorker magazine, the following
excerpt from the "Philadelphia Welcomat" was reprinted:

If Leonardo da Vinci had been born a female the ceiling of the
Sistine Chapel might never have been painted.

The New Yorker commented:
And if Michelangelo had been Siamese twins, the work would have
been completed in half the time.
The point of The New Yorker's comment is not that such
counterfactuals are false; it is more that anyone who would
entertain such an idea-anyone who would "slip" the sex or number of
a given human being-would have to be a little loony. Ironically,
though, in the same issue, the following sentence, concluding a
book review, was printed without blushing:
I think he [Professor Philipp Frank] would have enjoyed both of
these books enormously.

Now poor Professor Frank is dead; and
clearly it is nonsense to suggest that someone could read books
written after his death. So why wasn't this serious sentence also
scoffed at? Somehow, in some difficult-to-pin-down sense, the
parameters slipped in this sentence do not violate our sense of
"possibility" as much as in the earlier examples. Something allows
us to imagine "all other things being equal" better in this one
than in the others. But why? What is it about the way we classify
events and people that makes us know deep down what is "sensible"
to slip, and what is "silly"?
Consider how natural it feels to slip from the valueless
declarative "I don't know Russian" to the more charged conditional
"I would like to know Russian" to the emotional subjunctive "I wish
I knew Russian" and finally to the rich counterfactual "If I knew
Russian, I would read Chekhov and Lermontov in the original". How
flat and dead would be a mind that saw nothing in a negation but an
opaque barrier! A live mind can see a window onto a world of
possibilities.
I believe that "almost" situations and unconsciously
manufactured subjunctives represent some of the richest potential
sources of insight into how human beings organize and categorize
their perceptions of the world.
An eloquent co-proponent of this view is the linguist and
translator George
Steiner, who, in his book After Babel, has written:
Hypotheticals, 'imaginaries', conditionals, the syntax of
counter-factuality and contingency may well be the generative
centres of human speech.... [They] do more than occasion
philosophical and grammatical perplexity. No less than future
tenses to which they are, one feels, related, and with which they
ought probably to be classed in the larger set of 'suppositionals'
or 'alternates', these 'if' propositions are fundamental to the
dynamics of human feeling... .
Ours is the ability, the need, to gainsay or 'un-say' the world,
to image and speak it otherwise.... We need a word which will
designate the power, the compulsion of language to posit
'otherness'. . . . Perhaps 'alternity' will do: to define the
'other than the case', the counter-factual propositions, images,
shapes of will and evasion with which we charge our mental being
and by means of which we build the changing, largely fictive milieu
of our somatic and our social existence... .
Finally, Steiner sings a counterfactual hymn to
counterfactuality:
It is unlikely that man, as we know him, would have survived
without the fictive, counter-factual, anti-determinist means of
language, without the semantic capacity, generated and stored in
the 'superfluous' zones of the cortex, to conceive of, to
articulate possibilities beyond the treadmill of organic decay and
death.

The manufacture of "subjunctive worlds" happens so
casually, so naturally, that we hardly notice what we are doing.
We select from our fantasy a world which is close, in some internal
mental sense, to the real world. We compare what is real with what
we perceive as almost real. In so doing, what we gain is some
intangible kind of perspective on reality. The Sloth is a droll
example of a variation on reality-a thinking being without the
ability to slip into subjunctives (or at least, who claims to be
without the ability-but you may have noticed that what he says is
full of counterfactuals!). Think how immeasurably poorer our
mental lives would be if we didn't have this creative capacity for
slipping out of the midst of reality into soft "what if"s! And
from the point of view of studying human thought processes, this
slippage is very interesting, for most of the time it happens
completely without conscious direction, which means that
observation of what kinds of things slip, versus what kinds don't,
affords a good window on the unconscious mind.
One way to gain some perspective on the nature of this mental
metric is to "fight fire with fire". This is done in the Dialogue,
where our "subjunctive ability" is asked to imagine a world in
which the very notion of
subjunctive ability is slipped, compared to what we expect. In
the Dialogue, the first subjunctive instant replay-that where
Palindromi stays in bounds-is quite a normal thing to imagine. In
fact, it was inspired by a completely ordinary, casual remark made
to me by a person sitting next to me at a football game. For some
reason it struck me and I wondered what made it seem so natural to
slip that particular thing, but not, say, the number of the down,
or the present score. From those thoughts, I went on to consider
other, probably less slippable features, such as the weather
(that's in the Dialogue), the kind of game (also in the Dialogue),
and then even loonier variations (also in the Dialogue). I noticed,
though, that what was completely ludicrous to slip in one situation
could be quite slippable in another. For instance, sometimes you
might spontaneously wonder how things would be if the ball had a
different shape (e.g., if you are playing basketball with a
half-inflated ball); other times that would never enter your mind
(e.g., when watching a football game on TV).
Layers of Stability
It seemed to me then, and still does now, that the slippability
of a feature of some event (or circumstance) depends on a set of
nested contexts in which the event (or circumstance) is perceived
to occur. The terms constant, parameter, and variable, borrowed
from mathematics, seem useful here. Often mathematicians,
physicists, and others will carry out a calculation, saying "c is a
constant, p is a parameter, and v is a variable". What they
mean is that any of them can vary (including the "constant");
however, there is a kind of hierarchy of variability. In the
situation which is being represented by the symbols, c establishes
a global condition; p establishes some less global condition which
can vary while c is held fixed; and finally, v can run around while
c and p are held fixed. It makes little sense to think of holding v
fixed while c and p vary, for c and p establish the context in
which v has meaning. For instance, think of a dentist who has a
list of patients, and for each patient, a list of teeth. It makes
perfect sense (and plenty of money) to hold the patient fixed and
vary his teeth-but it makes no sense at all to hold one tooth fixed
and vary the patient. (Although sometimes it makes good sense to
vary the dentist ...)
We build up our mental representation of a situation layer by
layer. The lowest layer establishes the deepest aspect of the
context-sometimes being so low that it cannot vary at all. For
instance, the three-dimensionality of our world is so ingrained
that most of us never would imagine letting it slip mentally. It is
a constant constant. Then there are layers which establish
temporarily, though not permanently, fixed aspects of situations,
which could be called background assumptions-things which, in the
back of your mind, you know can vary, but which most of the time
you unquestioningly accept as unchanging aspects. These could still
be called "constants". For instance, when you go to a football
game, the rules of the game are constants of that sort. Then there
are "parameters": you think of them as more variable, but you
temporarily hold them constant. At a football game, parameters
might include the weather, the opposing team, and so forth. There
could be-and probably are-several layers of parameters. Finally, we
reach the "shakiest" aspects of your mental representation of the
situation-the variables. These are things such as Palindromi's
stepping out of bounds, which are mentally "loose" and which you
don't mind letting slip away from their real values, for a short
moment.
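
One might make this layering a bit more tangible with a minimal sketch in Python; the particular layer names and features of the football-game situation below are purely illustrative assumptions, not a definitive representation.

    # A hypothetical sketch of "layers of stability": a situation represented
    # as an ordered stack of layers, from least slippable to most slippable.
    # All names and features below are illustrative, not taken from the text.

    football_game = [
        ("constant constants", {"dimensions of space": 3}),
        ("background assumptions", {"rules of the game": "football"}),
        ("parameters", {"weather": "clear", "opposing team": "Home State U."}),
        ("variables", {"Palindromi stepped out of bounds": True}),
    ]

    def slippability(layer_index):
        """Deeper layers (smaller index) are harder to imagine varying."""
        return layer_index / (len(football_game) - 1)   # 0.0 frozen, 1.0 loose

    for i, (name, features) in enumerate(football_game):
        print(f"{name:25s} slippability={slippability(i):.2f}  {features}")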
Frames and Nested Contexts
The word frame is in vogue in AI currently, and it could be
defined as a computational instantiation of a context. The term is
due to Marvin Minsky, as are many ideas about frames, though the
general concept has been floating around for a good number of
years. In frame language, one could say that mental representations
of situations involve frames nested within each other. Each of the
various ingredients of a situation has its own frame. It is
interesting to verbalize explicitly one of my mental images
concerning nested frames. Imagine a large collection of chests of
drawers. When you choose a chest, you have a frame, and the drawer
holes are places where "subframes" can be attached. But subframes
are themselves chests of drawers. How can you stick a whole chest
of drawers into the slot for a single drawer in another chest of
drawers? Easy: you shrink and distort the second chest, since,
after all, this is all mental, not physical. Now in the outer
frame, there may be several different drawer slots that need to
be
filled; then you may need to fill slots in some of the inner
chests of drawers (or subframes). This can go on, recursively.
The vivid surrealistic image of squishing and bending a chest of
drawers so that it can fit into a slot of arbitrary shape is
probably quite important, because it hints that your concepts are
squished and bent by the contexts you force them into. Thus, what
does your concept of "person" become when the people you are
thinking about are football players? It certainly is a distorted
concept, one which is forced on you by the overall context. You
have stuck the "person" frame into a slot in the "football game"
frame. The theory of representing knowledge in frames relies on the
idea that the world consists of quasi-closed subsystems, each of
which can serve as a context for others without being too
disrupted, or creating too much disruption, in the process.
One of the main ideas about frames is that each frame comes with
its own set of expectations. The corresponding image is that each
chest of drawers comes with a built-in, but loosely bound, drawer
in each of its
drawer slots, called a default. If I tell you, "Picture a river
bank", you will invoke a visual image which has various features,
most of which you could override if I added extra phrases such as
"in a drought" or "in Brazil" or "without a merry-go-round". The
existence of default values for slots allows the recursive process
of filling slots to come to an end. In effect, you say, "I will
fill in the slots myself as far as three layers down; beyond that I
will take the default options." Together with its default
expectations, a frame contains knowledge of its limits of
applicability, and heuristics for switching to other frames in case
it has been stretched beyond its limits of tolerance.
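
One might sketch such a frame in a few lines of Python; the class, the slot names, and the particular defaults of the river-bank example are illustrative assumptions rather than a definitive implementation of frames.

    # Hypothetical sketch of a frame: a set of named slots, each holding either
    # a default value or another frame (a subframe), filled in recursively.

    class Frame:
        def __init__(self, name, **slots):
            self.name = name
            self.slots = dict(slots)       # slot name -> default value or Frame

        def fill(self, slot, value):
            """Override a default ("in a drought", "in Brazil", ...)."""
            self.slots[slot] = value

        def resolve(self, depth=3):
            """Fill in slots explicitly down to `depth` layers; below that,
            simply accept whatever defaults are sitting in the drawers."""
            out = {}
            for slot, value in self.slots.items():
                if isinstance(value, Frame) and depth > 0:
                    out[slot] = value.resolve(depth - 1)
                else:
                    out[slot] = value.name if isinstance(value, Frame) else value
            return out

    # A "river bank" frame with built-in, loosely bound defaults:
    river_bank = Frame("river bank",
                       water_level="normal",
                       location=Frame("location", country="unspecified"),
                       merry_go_round=False)
    river_bank.fill("water_level", "drought")     # "in a drought"
    print(river_bank.resolve())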
The nested structure of a frame gives you a way of "zooming in"
and looking at small details from as close up as you wish: you just
zoom in on the proper subframe, and then on one of its subframes,
etc., until you have the desired amount of detail. It is like
having a road atlas of the USA which has a map of the whole country
in the front, with individual state maps inside, and even maps of
cities and some of the larger towns if you want still more detail.
One can imagine an atlas with arbitrary amounts of detail, going
down to single blocks, houses, rooms, etc. It is like looking
through a telescope with lenses of different power; each lens has
its own uses. It is important that one can make use of all the
different scales; often detail is irrelevant and even
distracting.
Because arbitrarily different frames can be stuck inside other
frames' slots, there is great potential for conflict or
"collision". The nice neat scheme of a single, global set of layers
of "constants", "parameters", and "variables" is an
oversimplification. In fact, each frame will have its own hierarchy
of variability, and this is what makes analyzing how we perceive
such a complex event as a football game, with its many subframes,
subsubframes, etc., an incredibly messy operation. How do all these
many frames interact with each other? If there is a conflict where
one frame says, "This item is a constant" but another frame says,
"No, it is a variable!", how does it get resolved? These are deep
and difficult problems of frame theory to
which I can give no answers. There has as yet been no complete
agreement on what a frame really is, or on how to implement frames
in AI programs. I make my own stab at discussing some of these
questions in the following section, where I talk about some puzzles
in visual pattern recognition, which I call "Bongard problems".
Bongard Problems
Bongard problems (BP's) are problems of the general type given
by the Russian scientist M. Bongard in his book Pattern
Recognition. A typical BP-number 51 in his collection of one
hundred-is shown in Figure 119.
FIGURE 119. Bongard problem 51. [From M. Bongard, Pattern
Recognition (Rochelle Park, N.J.: Hayden Book Co., Spartan Books,
1970).]
These fascinating problems are intended for pattern-recognizers,
whether human or machine. (One might also throw in
ETI's-extraterrestrial intelligences.) Each problem consists of
twelve boxed figures (henceforth called boxes): six on the left,
forming Class I, and six on the right, forming Class II. The boxes
may be indexed this way:
I-A  I-B     II-A  II-B
I-C  I-D     II-C  II-D
I-E  I-F     II-E  II-F
The problem is "How do Class I boxes differ from Class II
boxes?"
A Bongard problem-solving program would have several stages, in
which raw data gradually get converted into descriptions. The early
stages are relatively inflexible, and higher stages become
gradually more flexible. The final stages have a property which I
call tentativity, which means simply that the way a picture is
represented is always tentative. Upon the drop of a hat, a
high-level description can be restructured, using all the devices
of the
later stages. The ideas presented below also have a tentative
quality to them. I will try to convey overall ideas first, glossing
over significant difficulties. Then I will go back and try to
explain subtleties and tricks and so forth. So your notion of how
it all works may also undergo some revisions as you read. But that
is in the spirit of the discussion.
Preprocessing Selects a Mini-vocabulary
Suppose, then, that we have some Bongard problem which we want
to solve. The problem is presented to a TV camera and the raw data
are read in. Then the raw data are preprocessed. This means that
some salient features are detected. The names of these features
constitute a "mini-vocabulary" for the problem; they are drawn from
a general "salient-feature vocabulary". Some typical terms of the
salient-feature vocabulary are:
line segment, curve, horizontal, vertical, black, white, big,
small, pointy, round ...
In a second stage of preprocessing, some knowledge about
elementary shapes is used; and if any are found, their names are
also made available. Thus, terms such as
triangle, circle, square, indentation, protrusion, right angle,
vertex, cusp, arrow ...
may be selected. This is roughly the point at which the
conscious and the unconscious meet, in humans. This discussion is
primarily concerned with describing what happens from here on
out.
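
The two preprocessing stages might be sketched roughly as follows; the detector names and the toy box representation are assumptions made only for the sake of illustration.

    # Hypothetical two-stage preprocessing sketch: stage one detects salient
    # low-level features, stage two recognizes elementary shapes; the names
    # found become the "mini-vocabulary" for this particular problem.

    SALIENT_FEATURES = ["line segment", "curve", "black", "white", "big", "small"]
    ELEMENTARY_SHAPES = ["triangle", "circle", "square", "vertex", "cusp"]

    def preprocess(box, detectors):
        """Run each named detector over the box; keep the names that fire.
        Here `box` is just a dict mapping feature names to booleans,
        standing in for real image analysis."""
        return [name for name in detectors if box.get(name, False)]

    def mini_vocabulary(boxes):
        vocab = set()
        for box in boxes:
            vocab.update(preprocess(box, SALIENT_FEATURES))
            vocab.update(preprocess(box, ELEMENTARY_SHAPES))
        return sorted(vocab)

    boxes = [{"curve": True, "circle": True, "small": True},
             {"line segment": True, "triangle": True, "big": True}]
    print(mini_vocabulary(boxes))   # the problem's mini-vocabulary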
High-Level Descriptions
Now that the picture is "understood", to some extent, in terms
of familiar concepts, some looking around is done. Tentative
descriptions are made for one or a few of the twelve boxes. They
will typically use simple descriptors such as
above, below, to the right of, to the left of, inside, outside
of, close to, far from, parallel to, perpendicular to, in a row,
scattered, evenly spaced, irregularly spaced, etc.
Also, definite and indefinite numerical descriptors can be
used:
1, 2, 3, 4, 5, ... many, few, etc.
More complicated descriptors may be built up, such as
further to the right of, less close to, almost parallel to,
etc.
FIGURE 120. Bongard problem 47. [From M. Bongard, Pattern
Recognition.]

Thus, a typical box-say I-F of BP 47 (Fig. 120)-could
be variously described as having:
three shapes
or
three white shapes
or
a circle on the right
or
two triangles and a circle
or
two upwards-pointing triangles
or
one large shape and two small shapes
or
one curved shape and two straight-edged shapes
or
a circle with the same kind of shape on the inside and
outside.
Each of these descriptions sees the box through a "filter". Out
of context, any of them might be a useful description. As it turns
out, though, all of them are "wrong", in the context of the
particular Bongard problem they are part of. In other words, if you
knew the distinction between Classes I and II in BP 47, and were
given one of the preceding lines as a description of an unseen
drawing, that information would not allow you to tell to which
Class the drawing belonged. The essential feature of this box, in
context, is that it includes
a circle containing a triangle.
Note that someone who heard such a description would not be able
to reconstruct the original drawing, but would be able to recognize
drawings which have this property. It is a little like musical
style: you may be an infallible recognizer of Mozart, but at the
same time unable to write anything which would fool anybody into
thinking it was by Mozart.

FIGURE 121. Bongard problem 91. [From M. Bongard, Pattern
Recognition.]
Now consider box I-D of BP 91 (Fig. 121). An overloaded but
"right" description in the context of BP 91 is
a circle with three rectangular intrusions.
Notice the sophistication of such a description, in which the
word "with" functions as a disclaimer, implying that the "circle"
is not really a circle: it is almost a circle, except that . . .
Furthermore, the intrusions are not full rectangles. There is a lot
of "play" in the way we use language to describe
things. Clearly, a lot of information has been thrown away, and
even more could be thrown away. A priori, it is very hard to know
what it would be smart to throw away and what to keep. So some sort
of method for an intelligent compromise has to be encoded, via
heuristics. Of course, there is always recourse to lower levels of
description (i.e., less chunked descriptions) if discarded
information has to be retrieved, just as people can constantly look
at the puzzle for help in restructuring their ideas about it. The
trick, then, is to devise explicit rules that say how to

    make tentative descriptions for each box;
    compare them with tentative descriptions for other boxes of either Class;
    restructure the descriptions, by
        (i) adding information,
        (ii) discarding information, or
        (iii) viewing the same information from another angle;
    iterate this process until finding out what makes the two Classes differ.
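
The overall cycle might be sketched as follows; the function names, the form of a description, and the stopping test are all assumptions, since only the general loop is specified above.

    # Hypothetical sketch of the describe-compare-restructure-iterate cycle.
    # `describe`, `restructure`, and `distinguishing_rule` stand in for the
    # machinery discussed in the surrounding text.

    def solve_bongard(class_one, class_two, describe, restructure,
                      distinguishing_rule, max_rounds=20):
        descriptions_one = [describe(box) for box in class_one]
        descriptions_two = [describe(box) for box in class_two]
        for _ in range(max_rounds):
            rule = distinguishing_rule(descriptions_one, descriptions_two)
            if rule is not None:
                return rule                  # what makes the two Classes differ
            # No clean separation yet: restructure the tentative descriptions
            # by adding, discarding, or re-viewing information.
            descriptions_one = [restructure(d) for d in descriptions_one]
            descriptions_two = [restructure(d) for d in descriptions_two]
        return None                          # give up (tentativity has limits)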
Templates and Sameness-Detectors
One good strategy would be to try to make descriptions
structurally similar to each other, to the extent this is possible.
Any structure they have in common will make comparing them that
much easier. Two important elements of this theory deal with this
strategy. One is the idea of "description-schemas" or templates;
the other is the idea of Sam-a "sameness detector".
First Sam. Sam is a special agent present on all levels of the
program. (Actually there may be different kinds of Sams on
different levels.) Sam constantly runs around within individual
descriptions and within different descriptions, looking for
descriptors or other things which are repeated. When some sameness
is found, various restructuring operations can be triggered, either
on the single-description level or on the level of several
descriptions at once.
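
A toy version of Sam, limited to spotting descriptors that recur across several descriptions, might look like this; the message format and the threshold are illustrative assumptions.

    # Hypothetical sketch of Sam, the sameness-detector: it scans a set of
    # descriptions (each a dict of slot -> descriptor) and reports any
    # descriptor that recurs, so that restructuring can be triggered.

    from collections import Counter

    def sam(descriptions, threshold=2):
        counts = Counter(descriptor
                         for description in descriptions
                         for descriptor in description.values())
        # Each repeated descriptor becomes a "message" to the rest of the program.
        return [{"sameness": descriptor, "count": n}
                for descriptor, n in counts.items() if n >= threshold]

    descriptions = [
        {"large closed curve": "circle",
         "o's in interior": "three", "o's in exterior": "three"},
        {"large closed curve": "cigar",
         "o's in interior": "three", "o's in exterior": "three"},
    ]
    print(sam(descriptions))   # spots the recurrence of "three"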
Now templates. The first thing that happens after preprocessing
is an attempt to manufacture a template, or description-schema: a
uniform format for the descriptions of all the boxes in a problem. The
idea is that a description can often be broken up in a natural way
into subdescriptions, and those in turn into subsubdescriptions,
if need be. The bottom is hit when you come to primitive concepts
which belong to the level of the preprocessor. Now it is important
to choose the way of breaking descriptions into parts so as to
reflect commonality among all the boxes; otherwise you are
introducing a superfluous and meaningless kind of "pseudo-order"
into the world.
On the basis of what information is a template built? It is best
to look at an example. Take BP 49 (Fig. 122). Preprocessing yields
the information that each box consists of several little o's, and
one large closed curve. This is a valuable observation, and
deserves to be incorporated in the template. Thus a first stab at a
template would be:
large closed curve: -----
small o's: -----
FIGURE 122. Bongard problem 49. [From M. Bongard, Pattern
Recognition.]
It is very simple: the description-template has two explicit
slots where subdescriptions are to be attached.
A Heterarchical Program
Now an interesting thing happens, triggered by the term "closed
curve". One of the most important modules in the program is a kind
of semantic net (the concept network) in which all the known nouns,
adjectives, etc., are linked in ways which indicate their
interrelations. For instance, "closed curve" is strongly linked
with the terms "interior" and "exterior". The concept net is just
brimming with information about relations between terms, such as
what is the opposite of what, what is similar to what, what often
occurs with what, and so on. A little portion of a concept network,
to be explained shortly, is shown in Figure 123. But let us follow
what happens now, in the solution of problem 49. The concepts
"interior" and "exterior" are activated by their proximity in the
net to "closed curve". This suggests to the template-builder that
it might be a good idea to make distinct slots for the interior and
exterior of the curve. Thus, in the spirit of tentativity, the
template is tentatively restructured to be this:
large closed curve: ----
little o's in interior: ----
little o's in exterior:----
Now when subdescriptions are sought, the terms "interior" and
"exterior" will cause procedures to inspect those specific regions
of the box. What is found in BP 49, box I-A is this:
large closed curve: circle
little o's in interior: three
little o's in exterior: three

And a description of box II-A of the same BP might be

large closed curve: cigar
little o's in interior: three
little o's in exterior: three

Now Sam, constantly active in
parallel with other operations, spots the recurrence of the concept
"three" in all the slots dealing with o's, and this is strong
reason to undertake a second template-restructuring operation.
Notice that the first was suggested by the concept net, the second
by Sam. Now our template for problem 49 becomes:
large closed curve: -----
three little o's in interior: -----
three little o's in exterior: -----
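
The promotion of "three" into the template might be sketched as follows; the data layout and the function name are assumptions, chosen only to mirror the BP 49 example above.

    # Hypothetical sketch of "promoting" a recurring filler into the template,
    # the second restructuring of BP 49's template.

    def promote_recurring_fillers(template, descriptions):
        """If a slot is filled identically in every description, fold the
        shared filler into the slot's name in the template."""
        new_template = []
        for slot in template:
            fillers = {d.get(slot) for d in descriptions}
            if len(fillers) == 1 and None not in fillers:
                new_template.append(f"{fillers.pop()} {slot}")
            else:
                new_template.append(slot)
        return new_template

    template = ["large closed curve",
                "little o's in interior", "little o's in exterior"]
    descriptions = [
        {"large closed curve": "circle",
         "little o's in interior": "three", "little o's in exterior": "three"},
        {"large closed curve": "cigar",
         "little o's in interior": "three", "little o's in exterior": "three"},
    ]
    print(promote_recurring_fillers(template, descriptions))
    # ['large closed curve', "three little o's in interior",
    #  "three little o's in exterior"]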
FIGURE 123. A small portion of a concept network for a program
to solve Bongard Problems. "Nodes" are joined by "links", which in
turn can be linked. By considering a link as a verb and the nodes
it joins as subject and object, you can pull out some English
sentences from this diagram.

Now that "three" has risen one level
of generality-namely, into the template-it becomes worthwhile to
explore its neighbors in the concept network. One of them is
"triangle", which suggests that triangles of o's may be important.
As it happens, this leads down a blind alley-but how could you know
in advance? It is a typical blind alley that a human would explore,
so it is good if our program finds it too! For box II-E, a
description such as the following might get generated:
large closed curve: circle
three little o's in interior: equilateral triangle
three little o's in exterior: equilateral triangle

Of course an
enormous amount of information has been thrown away concerning the
sizes, positions, and orientations of these triangles, and many
other things as well. But that is the whole point of making
descriptions instead of just using the raw data! It is the same
idea as funneling, which we discussed in Chapter XI.
The Concept Network
We need not run through the entire solution of problem 49; this
suffices to show the constant back-and-forth interaction of
individual descriptions, templates, the sameness-detector Sam, and
the concept network. We should now look a little more in detail at
the concept network and its function. A simplified portion shown in
the figure codes the following ideas:
"High" and "low" are opposites.
"Up" and "down" are opposites.
"High" and "up" are similar.
"Low" and "down" are similar.
"Right" and "left" are opposites.
The "right-left" distinction is similar to the "high-low"
distinction.
"Opposite" and "similar" are opposites.
Note how everything in the net-both nodes and links-can be
talked about. In that sense nothing in the net is on a higher level
than anything else. Another portion of the net is shown; it codes
for the ideas that
A square is a polygon.
A triangle is a polygon.
A polygon is a closed curve.
The difference between a triangle and a square is that one has 3
sides and the other has 4.
4 is similar to 3.
A circle is a closed curve.
A closed curve has an interior and an exterior.
"Interior" and "exterior" are opposites.
The network of concepts is necessarily very vast. It seems to
store knowledge only statically, or declaratively, but that is only
half the story. Actually, its knowledge borders on being procedural
as well, by the fact that the proximities in the net act as guides,
or "programs", telling the main program how to develop its
understanding of the drawings in the boxes.
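
A minimal sketch of such a concept network, covering just the ideas listed above, might look like this; representing links as labeled triples that can themselves be referred to is an illustrative assumption.

    # Hypothetical sketch of a small concept network. Links are stored as
    # labeled triples; since links have names of their own, they too can be
    # the subject or object of further links ("opposite" and "similar" are
    # opposites).

    links = {
        "L1": ("high", "opposite", "low"),
        "L2": ("up", "opposite", "down"),
        "L3": ("high", "similar", "up"),
        "L4": ("low", "similar", "down"),
        "L5": ("right", "opposite", "left"),
        "L6": ("opposite", "opposite", "similar"),   # a link about link types
    }

    def neighbors(concept):
        """Concepts directly linked to `concept`, with the joining relation."""
        out = []
        for subj, verb, obj in links.values():
            if subj == concept:
                out.append((verb, obj))
            elif obj == concept:
                out.append((verb, subj))
        return out

    print(neighbors("high"))    # [('opposite', 'low'), ('similar', 'up')]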
For instance, some early hunch may turn out to be wrong and yet
have the germ of the right answer in it. In BP 33 (Fig. 124), one
might at first
FIGURE 124. Bongard problem 33. [From M. Bongard, Pattern
Recognition.]
jump to the idea that Class I boxes contain "pointy" shapes,
Class II boxes contain "smooth" ones. But on closer inspection,
this is wrong. Nevertheless, there is a worthwhile insight here,
and one can try to push it further, by sliding around in the
network of concepts beginning at "pointy". It is close to the
concept "acute", which is precisely the distinguishing feature of
Class I. Thus one of the main functions of the concept network is
to allow early wrong ideas to be modified slightly, to slip into
variations which may be correct.
Slippage and Tentativity
Related to this notion of slipping between closely related terms
is the notion of seeing a given object as a variation on another
object. An excellent example has been mentioned already-that of the
"circle with three indentations", where in fact there is no circle
at all. One has to be able to bend concepts, when it is
appropriate. Nothing should be absolutely rigid. On the other hand,
things shouldn't be so wishy-washy that nothing has any meaning at
all, either. The trick is to know when and how to slip one concept
into another.
An extremely interesting set of examples where slipping from one
description to another is the crux of the matter is given in
Bongard problems 85-87 (Fig. 125). BP 85 is rather trivial. Let us
assume that our program identifies "line segment" in its
preprocessing stage. It is relatively simple for it then to count
line segments and arrive at the difference
FIGURE 125. Bongard problems 85-87. [From M. Bongard, Pattern
Recognition.]
between Class I and Class II in BP 85. Now it goes on to BP 86.
A general heuristic which it uses is to try out recent ideas which
have worked. Successful repetition of recent methods is very common
in the real world, and Bongard does not try to outwit this kind of
heuristic in his collection-in fact, he reinforces it, fortunately.
So we plunge right into problem 86 with two ideas ("count" and
"line segment") fused into one: "count line segments". But as it
happens, the trick of BP 86 is to count line trains rather than
line segments, where "line train" means an end-to-end concatenation
of (one or more) line segments. One way the program might figure
this out is if the concepts "line train" and "line segment" are
both known, and are close in the concept network. Another way is if
it can invent the concept of "line train"-a tricky proposition, to
say the least.
Then comes BP 87, in which the notion of "line segment" is
further played with. When is a line segment three line segments?
(See box II-A.) The program must be sufficiently flexible that it
can go back and forth between such different representations for a
given part of a drawing. It is wise to store old representations,
rather than forgetting them and perhaps having to reconstruct them,
for there is no guarantee that a newer representation is better
than an old one. Thus, along with each old representation should be
stored some of the reasons for liking it and disliking it. (This
begins to sound rather complex, doesn't it?)
Meta-Descriptions
Now we come to another vital part of the recognition process,
and that has to do with levels of abstraction and
meta-descriptions. For this let us consider BP 91 (Fig. 121) again.
What kind of template could be constructed here? There is such an
amount of variety that it is hard to know where to begin. But this
is in itself a clue! The clue says, namely, that the class
distinction very likely exists on a higher level of abstraction
than that of geometrical description. This observation clues the
program that it should construct descriptions of descriptions-that
is, meta-descriptions. Perhaps on this second level some common
feature will emerge; and if we are lucky, we will discover enough
commonality to guide us towards the formulation of a template for
the meta-descriptions! So we plunge ahead without a template, and
manufacture descriptions for various boxes; then, once these
descriptions have been made, we describe them. What kinds of slot
will our template for meta-descriptions have? Perhaps these, among
others:
concepts used: -----
recurring concepts: -----
names of slots: -----
filters used: -----
There are many other kinds of slots which might be needed in
metadescriptions, but this is a sample. Now suppose we have
described box I-E of BP 91. Its (template-less) description might
look like this:
horizontal line segment
vertical line segment mounted on the horizontal line segment
vertical line segment mounted on the horizontal line segment
vertical line segment mounted on the horizontal line segment
Of course much information has been thrown out: the fact that
the three vertical lines are of the same length, are spaced
equidistantly, etc. But it is plausible that the above description
would be made. So the meta-description might look like this:
concepts used: vertical-horizontal, line segment, mounted on
repetitions in description: 3 copies of "vertical line segment
mounted on the horizontal line segment"
names of slots: -----
filters used: -----
Not all slots of the meta-description need be filled in;
information can be thrown away on this level as well as on the
"just-plain-description" level.
Now if we were to make a description for any of the other boxes
of Class I, and then a meta-description of it, we would wind up
filling the slot "repetitions in description" each time with the
phrase "3 copies of ..." The sameness-detector would notice this,
and pick up three-ness as a salient feature, on quite a high level
of abstraction, of the boxes of Class I. Similarly, four-ness would
be recognized, via the method of metadescriptions, as the mark of
Class II.
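
This second-level move might be sketched as follows; apart from the slot names echoed from the sample template above, everything here is an assumption.

    # Hypothetical sketch of meta-description: describe the descriptions,
    # then let the sameness-detector work on that level instead.

    from collections import Counter

    def meta_describe(description_lines):
        """Build a meta-description: which concepts were used, and which
        lines repeat (and how many times)."""
        repeats = {line: n for line, n in Counter(description_lines).items()
                   if n > 1}
        return {"concepts used": sorted({w for line in description_lines
                                           for w in line.split()}),
                "repetitions in description": repeats}

    box_I_E = ["horizontal line segment",
               "vertical line segment mounted on the horizontal line segment",
               "vertical line segment mounted on the horizontal line segment",
               "vertical line segment mounted on the horizontal line segment"]

    meta = meta_describe(box_I_E)
    print(meta["repetitions in description"])
    # {'vertical line segment mounted on the horizontal line segment': 3}
    # Three-ness emerges on this higher level of abstraction.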
Flexibility Is Important
Now you might object that in this case, resorting to the method
of metadescriptions is like shooting a fly with an elephant gun,
for the three-ness versus four-ness might as easily have shown up
on the lower level if we had
constructed our descriptions slightly differently. Yes, true-but
it is important to have the possibility of solving these problems
by different routes. There should be a large amount of flexibility
in the program; it should not be doomed if, malaphorically
speaking, it "barks up the wrong alley" for a while. (The amusing
term "malaphor" was coined by the newspaper columnist Lawrence
Harrison; it means a cross between a malapropism and a metaphor. It
is a good example of "recombinant ideas".) In any case, I wanted to
illustrate the general principle that says: When it is hard to
build a template because the preprocessor finds too much diversity,
that should serve as a clue that concepts on a higher level of
abstraction are involved than the preprocessor knows about.
Focusing and Filtering
Now let us deal with another question: ways to throw information
out. This involves two related notions, which I call "focusing" and
"filtering". Focus-
FIGURE 126. Bongard problem 55. [From M. Bongard, Pattern
Recognition.]
FIGURE 127. Bongard problem 22. [From M. Bongard, Pattern
Recognition.]
ing involves making a description whose focus is some part of
the drawing in the box, to the exclusion of everything else.
Filtering involves making a description which concentrates on some
particular way of viewing the contents of the box, and deliberately
ignores all other aspects. Thus they are complementary: focusing
has to do with objects (roughly, nouns), and filtering has to do
with concepts (roughly, adjectives). For an example of focusing,
let's look at BP 55 (Fig. 126). Here, we focus on the indentation
and the little circle next to it, to the exclusion of
everything else in the box. BP 22 (Fig. 127) presents an example of
filtering. Here, we must filter out every concept but that of size.
A combination of focusing and filtering is required to solve
problem BP 58 (Fig. 128).
One of the most important ways to get ideas for focusing and
filtering is by another sort of "focusing": namely, by inspection
of a single particularly simple box-say one with as few objects in
it as possible. It can be
FIGURE 128. Bongard problem 58. [From M. Bongard, Pattern
Recognition.]
FIGURE 129. Bongard problem 61. [From M. Bongard, Pattern
Recognition.]
extremely helpful to compare the starkest boxes from the two
Classes. But how can you tell which boxes are stark until you have
descriptions for them? Well, one way of detecting starkness is to
look for a box with a minimum of the features provided by the
preprocessor. This can be done very early, for it does not require
a pre-existing template; in fact, this can be one useful way of
discovering features to build into a template. BP 61 (Fig. 129) is
an example where that technique might quickly lead to a
solution.
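
The two complementary operations, together with a crude starkness measure, might be sketched like this; the box representation and the names are assumptions.

    # Hypothetical sketch of focusing (keep only certain objects), filtering
    # (keep only certain concepts), and a rough measure of starkness.

    def focus(box, objects_of_interest):
        """Keep only the objects we care about, ignoring everything else."""
        return [obj for obj in box if obj["kind"] in objects_of_interest]

    def filter_box(box, concept):
        """View every object through one conceptual filter (e.g. only size)."""
        return [obj.get(concept) for obj in box]

    def starkness(box):
        """Fewer objects and fewer features means a starker box to start from."""
        return sum(len(obj) for obj in box)

    box = [{"kind": "circle", "size": "large"},
           {"kind": "indentation", "size": "small"},
           {"kind": "circle", "size": "small"}]

    print(focus(box, {"indentation"}))     # focusing, as in BP 55
    print(filter_box(box, "size"))         # filtering, as in BP 22
    print(starkness(box))                  # a candidate measure of starkness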
Science and the World of Bongard Problems
One can think of the Bongard-problem world as a tiny place where
"science" is done-that is, where the purpose is to discern patterns
in the world. As patterns are sought, templates are made, unmade,
and remade;
FIGURE 130. Bongard problems 70-71. [From M. Bongard, Pattern
Recognition.]
slots are shifted from one level of generality to another:
filtering and focusing are done; and so on. There are discoveries
on all levels of complexity. The Kuhnian theory that certain rare
events called "paradigm shifts" mark the distinction between
"normal" science and "conceptual revolutions" does not seem to
work, for we can see paradigm shifts happening all throughout the
system, all the time. The fluidity of descriptions ensures that
paradigm shifts will take place on all scales.
Of course, some discoveries are more "revolutionary" than
others, because they have wider effects. For instance, one can make
the discovery that problems 70 and 71 (Fig. 130) are "the same
problem", when looked at on a sufficiently abstract level. The key
observation is that both involve depth-2 versus depth-1 nesting.
This is a new level of discovery that can be made about Bongard
problems. There is an even higher level, concerning the collection
as a whole. If someone has never seen the collection, it can be a
good puzzle just to figure out what it is. To figure it out is a
revolutionary insight, but it must be pointed out that the
mechanisms of thought which allow such a discovery to be made are
no different from those which operate in the solution of a single
Bongard problem.
By the same token, real science does not divide up into "normal"
periods versus "conceptual revolutions"; rather, paradigm shifts
pervade-there are just bigger and smaller ones, paradigm shifts on
different levels. The recursive plots of INT and Gplot (Figs. 32
and 34) provide a geometric model for this idea: they have the same
structure full of discontinuous jumps on every level, not just the
top level-only the lower the level, the smaller the jumps.
Connections to Other Types of Thought
To set this entire program somewhat in context, let me suggest
two ways in which it is related to other aspects of cognition. Not
only does it depend on other aspects of cognition, but also they in
turn depend on it. First let me comment on how it depends on other
aspects of cognition. The intuition which is required for knowing
when it makes sense to blur distinctions, to try redescriptions, to
backtrack, to shift levels, and so forth, is something which
probably comes only with much experience in thought in general.
Thus it would be very hard to define heuristics for these crucial
aspects of the program. Sometimes one's experience with real
objects in the world has a subtle effect on how one describes or
redescribes boxes. For instance, who can say how much one's
familiarity with living trees helps one to solve BP 70? It is very
doubtful that in humans, the subnetwork of concepts relevant to
these puzzles can be easily separated out from the whole network.
Rather, it is much more likely that one's intuitions gained from
seeing and handling real objects-combs, trains, strings, blocks,
letters, rubber bands, etc., etc.-play an invisible but significant
guiding role in the solution of these puzzles.
Conversely, it is certain that understanding real-world
situations heavily depends on visual imagery and spatial intuition,
so that having a powerful and flexible way of representing patterns
such as these Bongard patterns can only contribute to the general
efficiency of thought processes.
It seems to me that Bongard's problems were worked out with
great care, and that they have a quality of universality to them,
in the sense that each one has a unique correct answer. Of course
one could argue with this and say that what we consider "correct"
depends in some deep way on our being human, and some creatures
from some other star system might disagree entirely. Not having any
concrete evidence either way, I still have a certain faith that
Bongard problems depend on a sense of simplicity which is not just
limited to earthbound human beings. My earlier comments about the
probable importance of being acquainted with such surely
earth-limited objects as combs, trains, rubber bands, and so on,
are not in conflict with the idea that our notion of simplicity is
universal, for what matters is not any of these individual objects,
but the fact that taken together they span a wide space. And it
seems likely that any other civilization would have as vast a
repertoire of artifacts and natural objects and varieties of
experience on which to draw as we do. So I believe that the skill
of solving Bongard
problems lies very close to the core of "pure" intelligence, if
there is such a thing. Therefore it is a good place to begin if one
wants to investigate the ability to discover "intrinsic meaning" in
patterns or messages. Unfortunately we have reproduced only a small
selection of his stimulating collection. I hope that many readers
will acquaint themselves with the entire collection, to be found in
his book (see Bibliography).
Some of the problems of visual pattern recognition which we
human beings seem to have completely "flattened" into our
unconscious are quite amazing. They include:
recognition of faces (invariance of faces under age change,
expression change, lighting change, distance change, angle change,
etc.)
recognition of hiking trails in forests and mountains-somehow
this has always impressed me as one of our most subtle acts of
pattern recognition-and yet animals can do it, too
reading text without hesitation in hundreds if not thousands of
different typefaces
Message-Passing Languages, Frames, and Symbols
One way that has been suggested for handling the complexities of
pattern recognition and other challenges to AI programs is the
so-called "actor" formalism of Carl Hewitt (similar to the language
"Smailtalk", developed by Alan Kay and others), in which a program
is written as a collection of interacting actors, which can pass
elaborate messages back and forth among themselves. In a way, this
resembles a heterarchical collection of procedures which can call
each other. The major difference is that where procedures usually
only pass a rather small number of arguments back and forth, the
messages exchanged by actors can be arbitrarily long and
complex.
Actors with the ability to exchange messages become somewhat
autonomous agents-in fact, even like autonomous computers, with
messages being somewhat like programs. Each actor can have its own
idiosyncratic way of interpreting any given message; thus a
message's meaning will depend on the actor it is intercepted by.
This comes about by the actor having within it a piece of program
which interprets messages; so there may be as many interpreters as
there are actors. Of course, there may be many actors with
identical interpreters; in fact, this could be a great advantage,
just as it is extremely important in the cell to have a multitude
of identical ribosomes floating throughout the cytoplasm, all of
which will interpret a message-in this case, messenger RNA-in one
and the same way.
It is interesting to think how one might merge the frame-notion
with the actor-notion. Let us call a frame with the capability of
generating and interpreting complex messages a symbol:
frame + actor = symbol
We now have reached the point where we are talking about ways of
implementing those elusive active symbols of Chapters XI and XII;
henceforth in this Chapter, "symbol" will have that meaning. By the
way, don't feel dumb if you don't immediately see just how this
synthesis is to be made. It is not clear, though it is certainly
one of the most fascinating directions to go in AI. Furthermore, it
is quite certain that even the best synthesis of these notions will
turn out to have much less power than the actual symbols of human
minds. In that sense, calling these frame-actor syntheses "symbols"
is premature, but it is an optimistic way of looking at things.
Let us return to some issues connected with message passing.
Should each message be directed specifically at a target symbol, or
should it be thrown out into the grand void, much as mRNA is thrown
out into the cytoplasm, to seek its ribosome? If messages have
destinations, then each symbol must have an address, and messages
for it should always be sent to that address. On the other hand,
there could be one central receiving dock for messages, where a
message would simply sit until it got picked up by some symbol that
wanted it. This is a counterpart to General Delivery. Probably the
best solution is to allow both types of message to exist; also to
have provisions for different classes of urgency-special delivery,
first class, second class, and so on. The whole postal system
provides a rich source of ideas for message-passing languages,
including such curios as self-addressed stamped envelopes (messages
whose senders want answers quickly), parcel post (extremely long
messages which can be sent some very slow way), and more. The
telephone system will give you more inspiration when you run out of
postal-system ideas.
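
The two delivery styles might be sketched as follows; this is a toy, not Hewitt's actual actor formalism, and all the names in it are assumptions.

    # Hypothetical toy sketch of two message-passing styles: addressed
    # delivery (each actor has an address) and "General Delivery" (messages
    # sit at a central dock until some actor that wants them picks them up).

    from collections import deque

    class Actor:
        """Each actor interprets incoming messages in its own idiosyncratic
        way; here, interpretation is just printing."""
        def __init__(self, name, wants):
            self.name, self.wants = name, wants

        def receive(self, message):
            print(f"{self.name} handles {message!r}")

    actors = {"triangle-finder": Actor("triangle-finder", wants={"shape"}),
              "counter": Actor("counter", wants={"count"})}

    def send_addressed(address, message):
        """Directed delivery: the message goes straight to one named actor."""
        actors[address].receive(message)

    general_delivery = deque()        # the central receiving dock

    def post_general(message):
        general_delivery.append(message)

    def poll_general():
        """General Delivery: a waiting message is picked up by any actor
        that wants its topic."""
        while general_delivery:
            topic, payload = general_delivery.popleft()
            for actor in actors.values():
                if topic in actor.wants:
                    actor.receive((topic, payload))
                    break

    send_addressed("counter", ("count", "line segments in box I-A"))
    post_general(("shape", "closed curve found"))
    poll_general()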
Enzymes and AI
Another rich source of ideas for message passing-indeed, for
information processing in general-is, of course, the cell. Some
objects in the cell are quite comparable to actors-in particular,
enzymes. Each enzyme's active site acts as a filter which only
recognizes certain kinds of substrates (messages). Thus an enzyme
has an "address", in effect. The enzyme is "programmed" (by virtue
of its tertiary structure) to carry out certain operations upon
that "message", and then to release it to the world again. Now in
this way, when a message is passed from enzyme to enzyme along a
chemical pathway, a lot can be accomplished. We have already
described the elaborate kinds of feedback mechanisms which can take
place in cells (either by inhibition or repression). These kinds of
mechanisms show that complicated control of processes can arise
through the kind of message passing that exists in the cell.
One of the most striking things about enzymes is how they sit
around idly, waiting to be triggered by an incoming substrate.
Then, when the substrate arrives, suddenly the enzyme springs into
action, like a Venus's flytrap. This kind of "hair-trigger" program
has been used in AI, and goes by the name of demon. The important
thing here is the idea of having many different "species" of
triggerable subroutines just lying around waiting to
be triggered. In cells, all the complex molecules and organelles
are built up, simple step by simple step. Some of these new
structures are often enzymes themselves, and they participate in
the building of new enzymes, which in turn participate in the
building of yet other types of enzyme, etc. Such recursive cascades
of enzymes can have drastic effects on what a cell is doing. One
would like to see the same kind of simple step-by-step assembly
process imported into AI, in the construction of useful
subprograms. For instance, repetition has a way of burning new
circuits into our mental hardware, so that oft-repeated pieces of
behavior become encoded below the conscious level. It would be
extremely useful if there were an analogous way of synthesizing
efficient pieces of code which can carry out the same sequence of
operations as something which has been learned on a higher level of
"consciousness". Enzyme cascades may suggest a model for how this
could be done. (The program called "HACKER", written by Gerald
Sussman, synthesizes and debugs small subroutines in a way not too
much unlike that of enzyme cascades.)
The sameness-detectors in the Bongard problem-solver (Sams)
could be implemented as enzyme-like subprograms. Like an enzyme, a
Sam would meander about somewhat at random, bumping into small data
structures here and there. Upon filling its two "active sites" with
identical data structures, the Sam would emit a message to other
parts (actors) of the program. As long as programs are serial, it
would not make much sense to have several copies of a Sam, but in a
truly parallel computer, regulating the number of copies of a
subprogram would be a way of regulating the expected waiting-time
before an operation gets done, just as regulating the number of
copies of an enzyme in a cell regulates how fast that function gets
performed. And if new Sams could be synthesized, that would be
comparable to the seepage of pattern detection into lower levels of
our minds.
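
An enzyme-like Sam with two "active sites" might be sketched like this; the data structures and the triggering rule are illustrative assumptions.

    # Hypothetical sketch of a demon-like Sam: it sits idle until data
    # structures bump into it, binding them to its two "active sites"; when
    # both sites hold identical structures, it emits a message and releases
    # them.

    class SamDemon:
        def __init__(self):
            self.sites = []                        # at most two bound structures

        def bump(self, structure):
            """A data structure bumps into the Sam; bind it to a free site."""
            self.sites.append(structure)
            if len(self.sites) == 2:
                a, b = self.sites
                self.sites = []                    # release both structures
                if a == b:
                    return {"sameness detected": a}   # the emitted message
            return None

    sam = SamDemon()
    for structure in ["three", "three", "circle", "cigar", "three"]:
        message = sam.bump(structure)
        if message:
            print(message)                         # {'sameness detected': 'three'}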
Fission and Fusion
Two interesting and complementary ideas concerning the
interaction of symbols are "fission" and "fusion". Fission is the
gradual divergence of a new symbol from its parent symbol (that is,
from the symbol which served as a template off of which it was
copied). Fusion is what happens when two (or more) originally
unrelated symbols participate in a "joint activation", passing
messages so tightly back and forth that they get bound together and
the combination can thereafter be addressed as if it were a single
symbol. Fission is a more or less inevitable process, since once a
new symbol has been "rubbed off" of an old one, it becomes
autonomous, and its interactions with the outside world get
reflected in its private internal structure; so what started out as
a perfect copy will soon become imperfect, and then slowly will
become less and less like the symbol off of which it was "rubbed".
Fusion is a subtler thing. When do two concepts really become 'one?
Is there some precise instant when a fusion takes place?
This notion of joint activations opens up a Pandora's box of
questions. For instance, how much do we hear "dough" and "nut"
when we say "doughnut"? Does a German who thinks of gloves
("Handschuhe") hear "hand-shoes" or not? How about Chinese people,
whose word "dong-xi" ("East-West") means "thing"? It is a matter of
some political concern, too, since some people claim that words
like "chairman" are heavily charged with undertones of the male
gender. The degree to which the parts resonate inside the whole
probably varies from person to person and according to
circumstances.
The real problem with this notion of "fusion" of symbols is that
it is very hard to imagine general algorithms which will create
meaningful new symbols from colliding symbols. It is like two
strands of DNA which come together. How do you take parts from each
and recombine them into a meaningful and viable new strand of DNA
which codes for an individual of the same species? Or a new kind of
species? The chance is infinitesimal that a random combination of
pieces of DNA will code for anything that will survive-something
like the chance that a random combination of words from two books
will make another book. The chance that recombinant DNA will make
sense on any level but the lowest is tiny, precisely because there
are so many levels of meaning in DNA. And the same goes for
"recombinant symbols".
Epigenesis of the Crab Canon
I think of my Dialogue Crab Canon as a prototype example where
two ideas collided in my mind, connected in a new way, and suddenly
a new kind of verbal structure came alive in my mind. Of course I
can still think about musical crab canons and verbal dialogues
separately-they can still be activated independently of each other;
but the fused symbol for crab canonical dialogues has its own
characteristic modes of activation, too. To illustrate this notion
of fusion or "symbolic recombination" in some detail, then, I would
like to use the development of my Crab Canon as a case study,
because, of course, it is very familiar to me, and also because it
is interesting, yet typical of how far a single idea can be pushed.
I will recount it in stages named after those of meiosis, which is
the name for cell division in which "crossing-over", or genetic
recombination, takes place-the source of diversity in
evolution.
PROPHASE: I began with a rather simple idea-that a piece of
music, say a canon, could be imitated verbally. This came from the
observation that, through a shared abstract form, a piece of text
and a piece of music may be connected. The next step involved
trying to realize some of the potential of this vague hunch; here,
I hit upon the idea that "voices" in canons can be mapped onto
"characters" in dialogues-still a rather obvious idea.
Then I focused down onto specific kinds of canons, and
remembered that there was a crab canon in the Musical Offering. At
that time, I had just
begun writing Dialogues, and there were only two characters:
Achilles and the Tortoise. Since the Bach crab canon has two
voices, this mapped perfectly: Achilles should be one voice, the
Tortoise the other, with the one doing forwards what the other does
backwards. But here I was faced with a problem: on what level
should the reversal take place? The letter level? The word level?
The sentence level? After some thought, I concluded that the
"dramatic line" level would be most appropriate.
Now that the "skeleton" of the Bach crab canon had been
transplanted, at least in plan, into a verbal form, there was just
one problem. When the two voices crossed in the middle, there would
be a short period of extreme repetition: an ugly blemish. What to
do about it? Here, a strange thing happened, a kind of
level-crossing typical of creative acts: the word "crab" in "crab
canon" flashed into my mind, undoubtedly because of some abstract
shared quality with the notion of "tortoise"-and immediately I
realized that at the dead center, I could block the repetitive
effect, by inserting one special line, said by a new character: a
Crab! This is how, in the "prophase" of the Crab Canon, the Crab
was conceived: at the crossing over of Achilles and the Tortoise.
(See Fig. 131.)
FIGURE 131. A schematic diagram of the Dialogue Crab Canon.
METAPHASE: This was the skeleton of my Crab Canon. I then
entered the second stage-the "metaphase"-in which I had to fill in
the flesh, which was of course an arduous task. I made a lot of
stabs at it, getting used to the way in which pairs of successive
lines had to make sense when read from either direction, and
experimenting around to see what kinds of dual meanings would help
me in writing such a form (e.g., "Not at all"). There were two
early versions both of which were interesting, but weak. I
abandoned work on the book for over a year, and when I returned to
the Crab Canon, I had a few new ideas. One of them was to mention a
Bach canon inside it. At first my plan was to mention the "Canon
per augmentationem, contrario motu", from the Musical Offering
(Sloth Canon, as I call it). But that started to seem a little
silly, so reluctantly I decided that inside my Crab Canon, I could
talk about Bach's own Crab Canon instead. Actually, this was a
crucial turning point, but I didn't know it then.
Now if one character was going to mention a Bach piece, wouldn't
it be awkward for the other to say exactly the same thing in the
corresponding place? Well, Escher was playing a similar role to
Bach in my thoughts and my book, so wasn't there some way of just
slightly modifying the line so that it would refer to Escher? After
all, in the strict art of canons, note-perfect imitation is
occasionally foregone for the sake of elegance or beauty. And no
sooner did that idea occur to me than the picture Day and Night
(Fig. 49) popped into my mind. "Of course!" I thought, "It is a
sort of pictorial crab canon, with essentially two complementary
voices carrying the same theme both leftwards and rightwards, and
harmonizing with each other!" Here again was the notion of a single
"conceptual skeleton" being instantiated in two different media-in
this case, music and art. So I let the Tortoise talk about Bach,
and Achilles talk about Escher, in parallel language; certainly
this slight departure from strict imitation retained the spirit of
crab canons.
At this point, I began realizing that something marvelous was
happening: namely, the Dialogue was becoming self-referential,
without my even having intended it! What's more, it was an indirect
self-reference, in that the characters did not talk directly about
the Dialogue they were in, but rather about structures which were
isomorphic to it (on a certain plane of abstraction). To put it in
the terms I have been using, my Dialogue now shared a "conceptual
skeleton" with Gdels G, and could therefore be mapped onto G in
somewhat the way that the Central Dogma was, to create in this case
a "Central Crabmap". This was most exciting to me, since out of
nowhere had come an esthetically pleasing unity of Gödel, Escher,
and Bach.
ANAPHASE: The next step was quite startling. I had had Caroline
MacGillavry's monograph on Escher's tessellations for years, but
one day, as I flipped through it, my eye was riveted to Plate 23
(Fig. 42), for I saw it in a way I had never seen it before: here
was a genuine crab canon-crab-like in both form and content! Escher
himself had given the picture no title, and since he had drawn
similar tessellations using many other animal forms, it is probable
that this coincidence of form and content was just that: a
coincidence I happened to notice. But fortuitous or not, this untitled plate was a
miniature version of one main idea of my book: to unite form and
content. So with delight I christened it Crab Canon, substituted it
for Day and Night, and modified Achilles' and the Tortoise's
remarks accordingly.
Yet this was not all. Having become infatuated with molecular
biology, one day I was perusing Watson's book in the bookstore, and
in the index saw the word "palindrome". When I looked it up, I
found a magical thing: crab-canonical structures in DNA. Soon the
Crab's comments had been suitably modified to include a short
remark to the effect that he owed his predilection for confusing
retrograde and forward motion to his genes.
TELOPHASE: The last step came months later, when, as I was
talking about the picture of the crab-canonical section of DNA
(Fig. 43), I saw that the 'A', 'T', 'C' of Adenine, Thymine,
Cytosine coincided-mirabile dictu-with the 'A', 'T', 'C' of
Achilles, Tortoise, Crab; moreover, just as Adenine and Thymine are
paired in DNA, so are Achilles and the Tortoise paired in the
Dialogue. I thought for a moment and, in another of those
level-crossings, saw that 'G', the letter paired with 'C' in DNA,
could stand for "Gene". Once again, I jumped back to the Dialogue,
did a little surgery on the Crab's speech to reflect this new
discovery, and now I had a mapping between the DNA's structure, and
the Dialogue's structure. In that sense, the DNA could be said to
be a genotype coding for a phenotype: the
structure of the Dialogue. This final touch dramatically
heightened the self-reference, and gave the Dialogue a density of
meaning which I had never anticipated.
Conceptual Skeletons and Conceptual Mapping
That more or less summarizes the epigenesis of the Crab Canon.
The whole process can be seen as a succession of mappings of ideas
onto each other, at varying levels of abstraction. This is what I
call conceptual mapping, and the abstract structures which connect
up two different ideas are conceptual skeletons. Thus, one
conceptual skeleton is that of the abstract notion of a crab
canon:
a structure having two parts which do the same thing,
only moving in opposite directions.
This is a concrete geometrical image which can be manipulated by
the mind almost as a Bongard pattern. In fact, when I think of the
Crab Canon today, I visualize it as two strands which cross in the
middle, where they are joined by a "knot" (the Crab's speech). This
is such a vividly pictorial image that it instantaneously maps, in
my mind, onto a picture of two homologous chromosomes joined by a
centromere in their middle, which is an image drawn directly from
meiosis, as shown in Figure 132.
FIGURE 132.
In fact, this very image is what inspired me to cast the
description of the Crab Canon's evolution in terms of meiosis-which
is itself, of course, yet another example of conceptual
mapping.
Recombinant Ideas
There are a variety of techniques of fusion of two symbols. One
involves lining the two ideas up next to each other (as if ideas
were linear!), then judiciously choosing pieces from each one, and
recombining them in a new symbol. This strongly recalls genetic
recombination. Well, what do chromosomes exchange, and how do they
do it? They exchange genes. What in a symbol is comparable to a
gene? If symbols have frame-like slots, then slots, perhaps. But
which slots to exchange, and why? Here is where the crab-canonical
fusion may offer some ideas. Mapping the notion of "musical crab
canon" onto that of "dialogue" involved several auxiliary mappings;
in
fact it induced them. That is, once it had been decided that
these two notions were to be fused, it became a matter of looking
at them on a level where analogous parts emerged into view, then
going ahead and mapping the parts onto each other, and so on,
recursively, to any level that was found desirable. Here, for
instance, "voice" and "character" emerged as corresponding slots
when "crab canon" and "dialogue" were viewed abstractly. Where did
these abstract views come from, though? This is at the crux of the
mapping-problem-where do abstract views come from? How do you make
abstract views of specific notions?
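The slot-for-slot correspondence just described can be sketched as a toy program. The Python fragment below is only illustrative and is not anything proposed in the text: it treats the two symbols as frame-like dictionaries, lines their slots up under shared abstract roles (so that "voice" answers to "character"), and then recombines pieces of each into a fused symbol, in crude imitation of genetic recombination. The particular slots, roles, and function names are all assumptions made for the example.

# Purely illustrative: frame-like "symbols" whose slots are matched at an
# abstract level, then recombined into a new symbol.

CRAB_CANON = {                               # the "musical crab canon" symbol
    "voice": ["voice 1", "voice 2"],         # carries the material
    "medium": "notes",
    "structure": "same line forwards and backwards",
}

DIALOGUE = {                                 # the "dialogue" symbol
    "character": ["Achilles", "Tortoise"],   # carries the material
    "medium": "dramatic lines",
    "structure": "exchange of remarks",
}

# Abstract roles under which slots of the two symbols line up.
ABSTRACT_ROLE = {
    "voice": "carrier of material",
    "character": "carrier of material",
    "medium": "medium",
    "structure": "form",
}

def fuse(sym_a, sym_b):
    """Pair up slots that share an abstract role, then build a new symbol
    by taking the carriers and the medium from one parent and the form
    from the other -- a crude analogue of genetic recombination."""
    pairs = {}
    for slot_a in sym_a:
        for slot_b in sym_b:
            if ABSTRACT_ROLE.get(slot_a) == ABSTRACT_ROLE.get(slot_b):
                pairs[slot_a] = slot_b
    fused = {}
    for slot_a, slot_b in pairs.items():
        role = ABSTRACT_ROLE[slot_a]
        fused[role] = sym_b[slot_b] if role != "form" else sym_a[slot_a]
    return pairs, fused

pairs, verbal_crab_canon = fuse(CRAB_CANON, DIALOGUE)
print(pairs)               # {'voice': 'character', 'medium': 'medium', 'structure': 'structure'}
print(verbal_crab_canon)   # form from the canon, carriers and medium from the dialogue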
Abstractions, Skeletons, Analogies
A view which has been abstracted from a concept along some
dimension is what I call a conceptual skeleton. In effect, we have
dealt with conceptual skeletons all along, without often using that
name. For instance, many of the ideas concerning Bongard problems
could be rephrased using this terminology. It is always of
interest, and possibly of importance, when two or more ideas are
discovered to share a conceptual skeleton. An example is the
bizarre set of concepts mentioned at the beginning of the
Contrafactus: a Bicyclops, a tandem unicycle, a teeter-teeter, the
game of ping-ping, a one-way tie, a two-sided Mobius strip, the
"Bach twins", a piano concerto for two left hands, a one-voice
fugue, the act of clapping with one hand, a two-channel monaural
phonograph, a pair of eighth-backs. All of these ideas are
"isomorphic" because they share this conceptual skeleton:
a plural thing made singular and re-pluralized wrongly.
Two other ideas in this book which share that conceptual
skeleton are (1) the Tortoise's solution to Achilles' puzzle,
asking for a word beginning and ending in "HE" (the Tortoise's
solution being the pronoun "HE", which collapses two occurrences
into one), and (2) the Pappus-Gelernter proof of the Pons Asinorum
Theorem, in which one triangle is reperceived as two. Incidentally,
these droll concoctions might be dubbed "demi-doublets".
A conceptual skeleton is like a set of constant features (as
distinguished from parameters or variables)-features which should
not be slipped in a subjunctive instant replay or
mapping-operation. Having no parameters or variables of its own to
vary, it can be the invariant core of several different ideas. Each
instance of it, such as "tandem unicycle", does have layers of
variability and so can be "slipped" in various ways.
Although the name "conceptual skeleton" sounds absolute and
rigid, actually there is a lot of play in it. There can be
conceptual skeletons on several different levels of abstraction.
For instance, the "isomorphism" between Bongard problems 70 and 71,
already pointed out, involves a higher-level conceptual skeleton
than that needed to solve either problem in isolation.
Multiple Representations
Not only must conceptual skeletons exist on different levels of
abstraction; also, they must exist along different conceptual
dimensions. Let us take the following sentence as an example:
"The Vice President is the spare tire on the automobile of
government."
How do we understand what it means (leaving aside its humor,
which is of course a vital aspect)? If you were told, "See our
government as an automobile" without any prior motivation, you
might come up with any number of correspondences: steering wheel =
president, etc. What are checks and balances? What are seat belts?
Because the two things being mapped are so different, it is almost
inevitable that the mapping will involve functional aspects.
Therefore, you retrieve from your store of conceptual skeletons
representing parts of automobiles, only those having to do with
function, rather than, say, shape. Furthermore, it makes sense to
work at a pretty high level of abstraction, where "function" isn't
taken in too narrow a context. Thus, of the two following
definitions of the function of a spare tire: (1) "replacement for a
flat tire", and (2) "replacement for a certain disabled part of a
car", certainly the latter would be preferable, in this case. This
comes simply from the fact that an auto and a government are so
different that they have to be mapped at a high level of
abstraction.
Now when the particular sentence is examined, the mapping gets
forced in one respect-but not in an awkward way, by any means.
In fact, you already have a conceptual skeleton for the Vice
President, among many others, which says, "replacement for a
certain disabled part of government". Therefore the forced mapping
works comfortably. But suppose, for the sake of contrast, that you
had retrieved another conceptual skeleton for "spare tire"-say, one
describing its physical aspects. Among other things, it might say
that a spare tire is "round and inflated". Clearly, this is not the
right way to go. (Or is it? As a friend of mine pointed out, some
Vice Presidents are rather portly, and most are quite
inflated!)
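The point about choosing the functional skeleton rather than the physical one can be put in toy-program form. The sketch below is illustrative only: the skeleton strings are the definitions given above, and the abstraction step (replacing the domain word with a placeholder) is an invented stand-in for mapping "at a high level of abstraction".

# Illustrative only: picking which stored skeleton of "spare tire" to use
# when mapping it onto "Vice President". Names and data are invented.

SKELETONS = {
    "spare tire": {
        "physical":   "round and inflated",
        "functional": "replacement for a certain disabled part of a car",
    },
    "Vice President": {
        "functional": "replacement for a certain disabled part of government",
    },
}

def functional_match(source, target):
    """Compare two very different concepts at the functional level:
    strip away the domain word ('a car', 'government') and see whether
    the remaining functional skeletons coincide."""
    def core(description, domain):
        return description.replace(domain, "X")
    a = core(SKELETONS[source]["functional"], "a car")
    b = core(SKELETONS[target]["functional"], "government")
    return a == b

print(functional_match("spare tire", "Vice President"))  # True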
Ports of Access
One of the major characteristics of each idiosyncratic style of
thought is how new experiences get classified and stuffed into
memory, for that defines the "handles" by which they will later be
retrievable. And for events, objects, ideas, and so on-for
everything that can be thought about-there is a wide variety of
"handles". I am struck by this each time I reach down to turn on my
car radio, and find, to my dismay, that it is already on! What has
happened is that two independent representations are being used for
the radio. One is "music producer", the other is "boredom
reliever". I am aware that the music is on, but I am bored anyway,
and before the two realizations have a chance to interact, my
reflex to reach down has been triggered. The same reaching-down
reflex one day occurred just after I'd left the radio at a repair
shop and was driving away, wanting to hear some music. Odd. Many
other representations for the same object exist, such as
shiny silver-knob haver
overheating-problems haver
lying-on-my-back-over-hump-to-fix thing
buzz-maker
slipping-dials object
multidimensional representation example
All of them can act as ports of access. Though they all are
attached to my symbol for my car radio, accessing that symbol
through one does not open up all the others. Thus it is unlikely
that I will be inspired to remember lying on my back to fix the
radio when I reach down and turn it on. And conversely, when I'm
lying on my back, unscrewing screws, I probably won't think about
the time I heard the Art of the Fugue on it. There are "partitions"
between these aspects of one symbol, partitions that prevent my
thoughts from spilling over sloppily, in the manner of free
associations. My mental partitions are important because they
contain and channel the flow of my thoughts.
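A toy model may make the notion of partitioned handles more tangible. The Python sketch below is invented for illustration (the class, its method names, and the stored memories are all assumptions): one symbol carries several ports of access, and reaching it through one port brings up only what is filed under that port.

# A toy model of "ports of access": one symbol, several handles, and
# partitions that keep the handles from opening each other automatically.

class Symbol:
    def __init__(self, name):
        self.name = name
        self.ports = {}                 # handle -> memories filed under it

    def attach(self, handle, memories):
        self.ports[handle] = memories

    def access(self, handle):
        """Accessing the symbol through one handle returns only the
        memories filed under that handle; the partitions keep the rest
        from spilling over in the manner of free associations."""
        return self.ports.get(handle, [])

car_radio = Symbol("my car radio")
car_radio.attach("music producer",   ["the time I heard the Art of the Fugue"])
car_radio.attach("boredom reliever", ["reflex: reach down and switch it on"])
car_radio.attach("lying-on-my-back-over-hump-to-fix thing",
                 ["unscrewing screws on the way to the repair shop"])

# Reaching down out of boredom opens only the boredom-reliever port:
print(car_radio.access("boredom reliever"))
# The repair memories stay behind their partition:
print(car_radio.access("music producer"))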
One place where these partitions are quite rigid is in sealing
off words for the same thing in different languages. If the
partitions were not strong, a bilingual person would constantly
slip back and forth between languages, which would be very
uncomfortable. Of course, adults learning two new languages at once
often confuse words in them. The partitions between these languages
are flimsier, and can break down. Interpreters are particularly
interesting, since they can speak any of their languages as if
their partitions were inviolable and yet, on command, they can
negate those partitions to allow access to one language from the
other, so they can translate. Steiner, who grew up trilingual,
devotes several pages in After Babel to the intermingling of
French, English, and German in the layers of his mind, and how his
different languages afford different ports of access onto
concepts.
Forced Matching
When two ideas are seen to share conceptual skeletons on some
level of abstraction, different things can happen. Usually the
first stage is that you zoom in on both ideas, and, using the
higher-level match as a guide, you try to identify corresponding
subideas. Sometimes the match can be extended recursively downwards
several levels, revealing a profound isomorphism. Sometimes it
stops earlier, revealing an analogy or similarity. And then there
are times when the high-level similarity is so compelling that,
even if there is no apparent lower-level continuation of the map,
you just go ahead and make one: this is the forced match. Forced
matches occur every day in the political cartoons of newspapers: a
political figure is portrayed as an airplane, a boat, a fish, the
Mona Lisa; a government is a human, a bird, an oil rig; a treaty is
a briefcase, a sword, a can of worms; on and on and on. What is
fascinating is how easily we can perform the suggested mapping, and
to the exact depth intended. We don't carry the mapping out too
deeply or too shallowly.
Another example of forcing one thing into the mold of another
occurred when I chose to describe the development of my Crab Canon
in terms of meiosis. This happened in stages. First, I noticed the
common conceptual skeleton shared by the Crab Canon and the image
of chromosomes joined by a centromere; this provided the
inspiration for the forced match. Then I saw a high-level
resemblance involving "growth", "stages", and "recombination". Then
I simply pushed the analogy as hard as I could. Tentativity-as in
the Bongard problem-solver-played a large role: I went forwards and
backwards before finding a match which I found appealing.
A third example of conceptual mapping is provided by the Central
Dogmap. I initially noticed a high-level similarity between the
discoveries of mathematical logicians and those of molecular
biologists, then pursued it on lower levels until I found a strong
analogy. To strengthen it further, I chose a Gödel-numbering which
imitated the Genetic Code. This was the lone element of forced
matching in the Central Dogmap.
Forced matches, analogies, and metaphors cannot easily be
separated out. Sportscasters often use vivid imagery which is hard
to pigeonhole. For instance, in a metaphor such as "The Rams
[football team] are spinning their wheels", it is hard to say just
what image you are supposed to conjure up. Do you attach wheels to
the team as a whole? Or to each player? Probably neither one. More
likely, the image of wheels spinning in mud or snow simply flashes
before you for a brief instant, and then in some mysterious way,
just the relevant parts get lifted out and transferred to the
team's performance. How deeply are the football team and the car
mapped onto each other in the split second that you do this?
Recap
Let me try to tie things together a little. I have presented a
number of related ideas connected with the creation, manipulation,
and comparison of symbols. Most of them have to do with slippage in
some fashion, the idea being that concepts are composed of some
tight and some loose elements, coming from different levels of
nested contexts (frames). The loose ones can be dislodged and
replaced rather easily, which, depending on the circumstances, can
create a "subjunctive instant replay", a forced match, or an
analogy. A fusion of two symbols may result from a process in which
parts of each symbol are dislodged and other parts remain.
Creativity and Randomness
It is obvious that we are talking about mechanization of
creativity. But is this not a contradiction in terms? Almost, but
not really. Creativity is the essence of that which is not mechanical.
Yet every creative act is mechanical-it has its explanation no less
than a case of the hiccups does. The mechanical substrate of
creativity may be hidden from view, but it exists. Conversely,
there is something unmechanical in flexible programs, even today.
It may not constitute creativity, but when programs cease to be
transparent to their creators, then the approach to creativity has
begun.
It is a common notion that randomness is an indispensable
ingredient of creative acts. This may be true, but it does not have
any bearing on the mechanizability-or rather, programmability!-of
creativity. The world is a giant heap of randomness; when you
mirror some of it inside your head, your head's interior absorbs a
little of that randomness. The triggering patterns of symbols,
therefore, can lead you down the most random-seeming paths, simply
because they came from your interactions with a crazy, random
world. So it can be with a computer program, too. Randomness is an
intrinsic feature of thought, not something which has to be
"artificially inseminated", whether through dice, decaying nuclei,
random number tables, or what-have-you. It is an insult to human
creativity to imply that it relies on such arbitrary sources.
What we see as randomness is often simply an effect of looking
at something symmetric through a "skew" filter. An elegant example
was provided by Salviati's two ways of looking at the number π/4.
Although the decimal expansion of π/4 is not literally random, it
is as random as one would need for most purposes: it is
"pseudorandom". Mathematics is full of pseudorandomness-plenty
enough to supply all would-be creators for all time.
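The pseudorandomness of π/4 is easy to exhibit. The short sketch below assumes the third-party mpmath library is available for high-precision arithmetic; it prints the frequencies of the first thousand decimal digits, which come out roughly uniform even though the expansion is completely determined.

# A small sketch of the "pseudorandomness" point, assuming mpmath is installed.
from collections import Counter
from mpmath import mp

mp.dps = 1010                      # working precision, in decimal places
digits = mp.nstr(mp.pi / 4, 1000)  # "0.78539816339744830961..."
digits = digits.replace("0.", "")[:1000]

# Fully determined, yet the digits are spread out about as evenly as a
# random source would spread them.
print(Counter(digits))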
Just as science is permeated with "conceptual revolutions" on
all levels at all times, so the thinking of individuals is shot
through and through with creative acts. They are not just on the
highest plane; they are everywhere. Most of them are small and have
been made a million times before-but they are close cousins to the
most highly creative and new acts. Computer programs today do not
yet seem to produce many small creations. Most of what they do is
quite "mechanical" still. That just testifies to the fact that they
are not close to simulating the way we think-but they are getting
closer.
Perhaps what differentiates highly creative ideas from ordinary
ones is some combined sense of beauty, simplicity, and harmony. In
fact, I have a favorite "meta-analogy", in which I liken analogies
to chords. The idea is simple: superficially similar ideas are
often not deeply related; and deeply related ideas are often
superficially disparate. The analogy to chords is natural:
physically close notes are harmonically distant (e.g., E-F-G); and
harmonically close notes are physically distant (e.g., G-E-B).
Ideas that share a conceptual skeleton resonate in a sort of
conceptual analogue to harmony; these harmonious "idea-chords" are
often widely separated, as
measured on an imaginary "keyboard of concepts". Of course, it
doesn't suffice to reach wide and plunk down any old way-you may
hit a seventh or a ninth! Perhaps the present analogy is like a
ninth-chord-wide but dissonant.
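For what it is worth, the keyboard half of this meta-analogy can be spelled out numerically. The sketch below is illustrative only: it uses semitone positions as a crude measure of physical distance, and membership in a single triad as a crude stand-in for harmonic closeness; both measures are invented for the example.

# Illustrative: "physically close but harmonically distant" (E-F-G) versus
# "harmonically close but physically distant" (G-E-B).
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def physical_span(notes):
    vals = [SEMITONE[n] for n in notes]
    return max(vals) - min(vals)          # width on the keyboard, in semitones

E_MINOR_TRIAD = {"E", "G", "B"}           # a simple stand-in for "harmonically close"

for notes in (["E", "F", "G"], ["G", "E", "B"]):
    print(notes, "span:", physical_span(notes),
          "within one triad:", set(notes) <= E_MINOR_TRIAD)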
Picking up Patterns on All Levels
Bongard problems were chosen as a focus in this Chapter because
when you study them, you realize that the elusive sense for
patterns which we humans inherit from our genes involves all the
mechanisms of representation of knowledge, including nested
contexts, conceptual skeletons and conceptual mapping,
slippability, descriptions and meta-descriptions and their
interactions, fission and fusion of symbols, multiple
representations (along different dimensions and different levels of
abstraction), default expectations, and more.
These days, it is a safe bet that if some program can pick up
patterns in one area, it will miss patterns in another area which,
to us, are equally obvious. You may remember that I mentioned this
back in Chapter 1, saying that machines can be oblivious to
repetition, whereas people cannot. For instance, consider SHRDLU.
If Eta Oin typed the sentence "Pick up a big red block and put it
down" over and over again, SHRDLU would cheerfully react in the
same way over and over again, exactly as an adding machine will
print out "4" over and over again, if a human being has the
patience to type "2+2" over and over again. Humans aren't like
that; if some pattern occurs over and over again, they will pick it
up. SHRDLU wasn't built with the potential for forming new concepts
or recognizing patterns: it had no sense of over and overview.
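To make the contrast concrete, here is a toy sketch (invented, and in no way SHRDLU's actual code): one responder reacts identically no matter how often it hears the same command, while the other keeps a count and eventually notices the repetition.

# A toy contrast between a literal-minded responder and one with a
# minimal sense of "over and overview". All names here are invented.

def literal_responder(command):
    # always reacts the same way, like an adding machine printing "4"
    return f"OK, I will {command.lower()}."

def repetition_aware_responder():
    seen = {}
    def respond(command):
        seen[command] = seen.get(command, 0) + 1
        if seen[command] > 2:
            return f"You have asked me that {seen[command]} times now. Why?"
        return f"OK, I will {command.lower()}."
    return respond

respond = repetition_aware_responder()
for _ in range(4):
    print(literal_responder("Pick up a big red block and put it down"))
    print(respond("Pick up a big red block and put it down"))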
The Flexibility of Language
SHRDLU's language-handling capability is immensely
flexible-within limits. SHRDLU can figure out sentences of great
syntactical complexity, or sentences with semantic ambiguities as
long as they can be resolved by inspecting the data base-but it
cannot handle "hazy" language. For instance, consider the sentence
"How many blocks go on top of each other to make a steeple?" We
understand it immediately, yet it does not make sense if
interpreted literally. Nor is it that some idiomatic phrase has
been used. "To go on top of each other" is an imprecise phrase
which nonetheless gets the desired image across quite well to a
human. Few people would be misled into visualizing a paradoxical
setup with two blocks each of which is on top of the other-or
blocks which are "going" somewhere or other.
The amazing thing about language is how imprecisely we use it
and still manage to get away with it. SHRDLU uses words in a
"metallic" way, while people use them in a "spongy" or "rubbery" or
even "Nutty-Puttyish" way. If words were nuts and bolts, people
could make any bolt fit into any nut: they'd just squish the one
into the other, as in some surrealistic
painting where everything goes soft. Language, in human hands,
becomes almost like a fluid, despite the coarse grain of its
components.
Recently, AI research in natural language understanding has
turned away somewhat from the understanding of single sentences in
isolation, and more towards areas such as understanding simple
children's stories. Here is a well-known children's joke which
illustrates the open-endedness of real-life situations:
A man took a ride in an airplane.
Unfortunately, he fell out.
Fortunately, he had a parachute on.
Unfortunately, it didn't work.
Fortunately, there was a haystack below him.
Unfortunately, there was a pitchfork sticking out of it.
Fortunately, he missed the pitchfork.
Unfortunately, he missed the haystack.
It can be extended indefinitely. To represent this silly story
in a frame-based system would be extremely complex, involving
jointly activating frames for the concepts of man, airplane, exit,
parachute, falling, etc., etc.
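As a rough indication of the kind of machinery involved, the following invented sketch treats each concept in the joke as a frame carrying a default expectation, and lets each successive line of the story override the expectation that the previous line set up; the frame names, slots, and defaults are all assumptions made for the example.

# An invented sketch of "jointly activating frames" for the parachute joke.
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    defaults: dict = field(default_factory=dict)

airplane  = Frame("airplane",  {"occupant stays inside": True})
parachute = Frame("parachute", {"opens when needed": True})
pitchfork = Frame("pitchfork", {"is landed on, if sticking out below": True})
haystack  = Frame("haystack",  {"gives a soft landing": True})

# Each line of the story activates a frame and then overturns the
# expectation its default had just set up -- which is why the joke can
# be extended indefinitely.
story = [
    (airplane,  "occupant stays inside",               False),  # "he fell out"
    (parachute, "opens when needed",                   False),  # "it didn't work"
    (pitchfork, "is landed on, if sticking out below", False),  # "he missed the pitchfork"
    (haystack,  "gives a soft landing",                False),  # "he missed the haystack"
]

for frame, slot, actual in story:
    expected = frame.defaults[slot]
    print(f"{frame.name}: expected '{slot}' = {expected}, but in fact {actual}")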
Intelligence and Emotions
Or consider this tiny yet poignant story:
Margie was holding tightly to the string of her beautiful new
balloon. Suddenly, a gust of wind caught it. The wind carried it
into a tree. The balloon hit a branch and burst. Margie cried and
cried.'
To understand this story, one needs to read many things between
the lines. For instance: Margie is a little girl. This is a toy
balloon with a string for a child to hold. It may not be beautiful
to an adult, but in a child's eye, it is. She is outside. The "it"
that the wind caught was the balloon. The wind did not pull Margie
along with the balloon; Margie let go. Balloons can break on
contact with any sharp point. Once they are broken, they are gone
forever. Little children love balloons and can be bitterly
disappointed when they break. Margie saw that her balloon was
broken. Children cry when they are sad. "To cry and cry" is to cry
very long and hard. Margie cried and cried because of her sadness
at her balloon's breaking.
This is probably only a small fraction of what is lacking at the
surface level. A program must have all this knowledge in order to
get at what is going on. And you might object that, even if it
"understands" in some intellectual sense what has been said, it
will never really understand, until it, too, has cried and cried.
And when will a computer do that? This is the kind of humanistic
point which Joseph Weizenbaum is concerned with making in his book
Computer Power and Human Reason, and I think it is an important
issue; in fact, a very, very deep issue. Unfortunately, many AI
workers at this time are unwilling, for various reasons, to take
this sort of point
seriously. But in some ways, those AI workers are right: it is
a little premature to think about computers crying; we must first
think about rules for computers to deal with language and other
things; in time, we'll find ourselves face to face with the deeper
issues.
AI Has Far to Go
Sometimes it seems that there is such a complete absence of
rule-governed behavior that human beings just aren't rule-governed.
But this is an illusion-a little like thinking that crystals and
metals emerge from rigid underlying laws, but that fluids or
flowers don't. We'll come back to this question in the next
Chapter.
The process of logic itself working internally in the brain may
be more analogous to a succession of operations with symbolic
pictures, a sort of abstract analogue of the Chinese alphabet or
some Mayan description of events-except that the elements are not
merely words but more like sentences or whole stories with linkages
between them forming a sort of meta- or super-logic with its own
rules.'
It is hard for most specialists to express vividly-perhaps even
to remember-what originally sparked them to enter their field.
Conversely, someone on the outside may understand a field's special
romance and may be able to articulate it precisely. I think that is
why this quote from Ulam has appeal for me, because it poetically
conveys the strangeness of the enterprise of AI, and yet shows
faith in it. And one must run on faith at this point, for there is
so far to go!
Ten Questions and Speculations
To conclude this Chapter, I would like to present ten "Questions
and Speculations" about Al. I would not make so bold as to call
them "Answers"-these are my personal opinions. They may well change
in some ways, as I learn more and as AI develops more. (In what
follows, the term "AI program" means a program which is far ahead
of today's programs; it means an "Actually Intelligent" program.
Also, the words "program" and "computer" probably carry overly
mechanistic connotations, but let us stick with them anyway.)
Question: Will a computer program ever write beautiful
music?
Speculation: Yes, but not soon. Music is a language of emotions,
and until programs have emotions as complex as ours, there is no
way a program will write anything beautiful. There can be
"forgeries shallow imitations of the syntax of earlier music-but
despite what one might think at first, there is much more to
musical expression than can be captured in syntactical rules. There
will be no new kinds of beauty turned up for a long time by
computer music-composing programs. Let me carry this thought a
little further. To think-and I have heard this suggested-that we
might soon be able to command a preprogrammed mass-produced
mail-order twenty-dollar desk-model "music box" to bring forth from
its sterile circuitry pieces which Chopin or Bach might have
written had they lived longer is a grotesque and shameful
misestimation of the depth of the human spirit. A "program" which
could produce music as they did would have to wander around the
world on its own, fighting its way through the maze of life and
feeling every moment of it. It would have to understand the joy and
loneliness of a chilly night wind, the longing for a cherished
hand, the inaccessibility of a distant town, the heartbreak and
regeneration after a human death. It would have to have known
resignation and world-weariness, grief and despair, determination
and victory, piety and awe. In it would have had to commingle such
opposites as hope and fear, anguish and jubilation, serenity and
suspense. Part and parcel of it would have to be a sense of grace,
humor, rhythm, a sense of the unexpected-and of course an exquisite
awareness of the magic of fresh creation. Therein, and therein
only, lie the sources of meaning in music.
Question: Will emotions be explicitly programmed into a
machine?
Speculation: No. That is ridiculous. Any direct simulation of
emotions-PARRY, for example-cannot approach the complexity of human
emotions, which arise indirectly from the organization of our
minds. Programs or machines will acquire emotions in the same way:
as by-products of their structure, of the way in which they are
organized-not by direct programming. Thus, for example, nobody will
write a "falling-in-love" subroutine, any more than they would
write a "mistake-making" subroutine. "Falling in love" is a
description which we attach to a complex process of a complex
system; there need be no single module inside the system which is
solely responsible for it, however!
Question: Will a thinking computer be able to add fast?
Speculation: Perhaps not. We ourselves are composed of hardware
which does fancy calculations but that doesn't mean that our symbol
level, where "we" are, knows how to carry out the same fancy
calculations. Let me put it this way: there's no way that you can
load numbers into your own neurons to add up your grocery bill.
Luckily for you, your symbol level (i.e., you) can't gain access to
the neurons which are doing your thinking-otherwise you'd get
addle-brained. To paraphrase Descartes again:
"I think; therefore I have no access
to the level where I sum."
Why should it not be the same for an intelligent program? It
mustn't be allowed to gain access to the circuits which are doing
its thinking otherwise it'll get addle-CPU'd. Quite seriously, a
machine that can pass the Turing test may well add as slowly as you
or I do, and for similar reasons. It will represent the number 2 not
just by the two bits "10", but as a full-fledged concept the way we
do, replete with associations such as its homonyms "too" and "to",
the words "couple" and "deuce", a host of mental images such as
dots on dominos, the shape of the numeral '2', the notions of
alternation, evenness, oddness, and on and on ... With all this
"extra baggage" to carry around, an intelligent program will become
quite slothful in its adding. Of course, we could give it a
"pocket calculator", so to speak (or build one in). Then it could
answer very fast, but its performance would be just like that of a
person with a pocket calculator. There would be two separate parts
to the machine: a reliable but mindless part and an intelligent but
fallible part. You couldn't rely on the composite system to be
reliable, any more than a composite of person and machine is
necessarily reliable. So if it's right answers you're after, better
stick to the pocket calculator alone-don't