Mind Reading Computer
INTRODUCTION
People express their mental states, including
emotions, thoughts, and desires, all the time through facial
expressions, vocal nuances and gestures. This is true even when
they are interacting with machines. Our mental states shape the
decisions that we make, govern how we communicate with others, and
affect our performance. The ability to attribute mental states to
others from their behavior and to use that knowledge to guide our
own actions and predict those of others is known as theory of mind
or mind-reading.
Existing human-computer interfaces are mind-blind: oblivious to the user's mental states and intentions. A computer may wait
indefinitely for input from a user who is no longer there, or
decide to do irrelevant tasks while a user is frantically working
towards an imminent deadline. As a result, existing computer
technologies often frustrate the user, have little persuasive power
and cannot initiate interactions with the user. Even if they do
take the initiative, like the now retired Microsoft Paperclip, they
are often misguided and irrelevant, and simply frustrate the user.
With the increasing complexity of computer technologies and the
ubiquity of mobile and wearable devices, there is a need for
machines that are aware of the user's mental state and that
adaptively respond to these mental states.
WHAT IS MIND READING?
A computational model of mind-reading
Drawing inspiration from psychology, computer vision and machine
learning, the team in the Computer Laboratory at the University of
Cambridge has developed mind-reading machines: computers that
implement a computational model of mind-reading to infer mental
states of people from their facial signals. The goal is to enhance
human-computer interaction through empathic responses, to improve
the productivity of the user and to enable applications to initiate
interactions with and on behalf of the user, without waiting for
explicit input from that user. There are, however, difficult challenges in doing so.
Fig: Processing stages in the mind-reading system
Using a digital video camera, the mind-reading computer system
analyzes a person's facial expressions in real time and infers that person's underlying mental state, such as whether he or she is
agreeing or disagreeing, interested or bored, thinking or
confused.
Prior knowledge of how particular mental states are expressed in
the face is combined with analysis of facial expressions and head
gestures occurring in real time. The model represents these at
different granularities, starting with face and head movements and combining them over time and space to form a clearer picture of what mental state is being represented. Software from Nevenvision
identifies 24 feature points on the face and tracks them in real
time. Movement, shape and colour are then analyzed to identify
gestures like a smile or eyebrows being raised. Combinations of
these occurring over time indicate mental states. For example, a
combination of a head nod, with a smile and eyebrows raised might
mean interest. The relationship between observable head and facial
displays and the corresponding hidden mental states over time is
modeled using Dynamic Bayesian Networks.
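The description above amounts to a three-stage pipeline: per-frame action detection, short-window display recognition, and mental-state inference over the whole sequence. The Python sketch below illustrates only the final combination step with invented display names and a toy rule standing in for the learned models; it is not the Cambridge implementation.

```python
# Toy sketch (not the actual system): combine display detections observed
# over time into a mental-state guess, mirroring the example in the text
# where a head nod, a smile and raised eyebrows together suggest interest.
from collections import Counter

def infer_mental_state(displays_over_time):
    """displays_over_time: list of display labels detected per frame,
    e.g. ['head_nod', 'smile', 'brows_raised', 'head_nod', ...]."""
    counts = Counter(displays_over_time)
    if counts['head_nod'] and counts['smile'] and counts['brows_raised']:
        return 'interested'
    if counts['head_shake'] and counts['lip_pucker']:
        return 'disagreeing'
    return 'neutral'

print(infer_mental_state(['head_nod', 'smile', 'brows_raised', 'head_nod']))
# -> interested
```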
WHY MIND READING?
Fig: Monitoring a car driver
The mind-reading computer system presents information about your
mental state as easily as a keyboard and mouse present text and
commands. Imagine a future where we are surrounded with mobile
phones, cars and online services that can read our minds and react
to our moods. How would that change our use of technology and our
lives? We are working with a major car manufacturer to implement
this system in cars to detect driver mental states such as
drowsiness, distraction and anger.
Current projects in Cambridge are considering further inputs
such as body posture and gestures to improve the inference. We can
then use the same models to control the animation of cartoon
avatars. We are also looking at the use of mind-reading to support
on-line shopping and learning systems.
The mind-reading computer system may also be used to monitor and
suggest improvements in human-human interaction. The Affective
Computing Group at the MIT Media Laboratory is developing an
emotional-social intelligence prosthesis that explores new
technologies to augment and improve people's social interactions and
communication skills.
HOW DOES IT WORK?
Fig: Futuristic headband
The mind reading here actually involves
measuring the volume and oxygen level of the blood around the
subject's brain, using technology called functional near-infrared
spectroscopy (fNIRS).
The user wears a sort of futuristic headband that sends light in
that spectrum into the tissues of the head where it is absorbed by
active, blood-filled tissues. The headband then measures how much
light was not absorbed, letting the computer gauge the metabolic
demands that the brain is making. The results are often compared to
an MRI, but can be gathered with lightweight, non-invasive
equipment.
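The step from "light not absorbed" to "metabolic demand" is usually formulated with the modified Beer-Lambert law, which converts a change in optical attenuation into an approximate change in haemoglobin concentration. The snippet below is a generic, single-wavelength illustration of that calculation with placeholder coefficients; it is not the Tufts group's actual processing code.

```python
import math

def attenuation_change(intensity_baseline, intensity_now):
    """Change in optical density: how much more light is absorbed now
    than at rest (modified Beer-Lambert law, one wavelength)."""
    return math.log10(intensity_baseline / intensity_now)

def concentration_change(delta_od, extinction_coeff, source_detector_dist_cm, dpf):
    """Approximate change in chromophore concentration (e.g. oxygenated
    haemoglobin). extinction_coeff and dpf (differential pathlength
    factor) are wavelength-dependent placeholders here."""
    return delta_od / (extinction_coeff * source_detector_dist_cm * dpf)

# Example: the detector sees 95% of the baseline light, i.e. more light is
# being absorbed by active, blood-filled tissue under the sensor.
d_od = attenuation_change(1.0, 0.95)
print(concentration_change(d_od, extinction_coeff=2.0,
                           source_detector_dist_cm=3.0, dpf=6.0))
```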
Wearing the fNIRS sensor, experimental subjects were asked to
count the number of squares on a rotating onscreen cube and to
perform other tasks. The subjects were then asked to rate the
difficulty of the tasks, and their ratings agreed with the work
intensity detected by the fNIRS system up to 83 percent of the time.
"We don't know how specific we can be about identifying users' different emotional states," cautioned Sergio Fantini, a biomedical engineering professor at Tufts. "However, the particular area of the brain where the blood-flow change occurs should provide indications of the brain's metabolic changes and, by extension, workload, which could be a proxy for emotions like frustration."
Measuring mental workload, frustration and distraction is typically limited to qualitatively observing computer users or to administering surveys after completion of a task, potentially missing valuable insight into the users' changing experiences.
A computer program which can read silently spoken words by analyzing nerve signals in our mouths and throats has been developed by NASA. Preliminary results show that, using button-sized sensors which attach under the chin and on the side of the Adam's apple, it is possible to pick up and recognize nerve signals and patterns from the tongue and vocal cords that correspond to specific words.
"Biological signals arise when reading or speaking to oneself
with or without actual lip or facial movement," says Chuck
Jorgensen.
HEAD AND FACIAL ACTION UNIT ANALYSIS
Twenty four facial landmarks are detected using a face template
in the initial frame, and their positions tracked across the video.
The system builds on Facestation [1], a feature point tracker that
supports both real time and offline tracking of facial features on
a live or recorded video stream. The tracker represents faces as
face bunch graphs [23] or stack-like structures which efficiently
combine graphs of individual faces that vary in factors such as
pose, glasses, or physiognomy. The tracker outputs the position of
twenty four feature points, which we then use for head pose
estimation and facial feature extraction.
EXTRACTING HEAD ACTION UNITS
Natural human head motion typically ranges between 70-90° of downward pitch, 55° of upward pitch, 70° of yaw (turn), and 55° of roll (tilt), and usually occurs as a combination of all three rotations [16]. The output positions of the localized feature points
are sufficiently accurate to permit the use of efficient,
image-based head pose estimation. Expression invariant points such
as the nose tip, root, nostrils, inner and outer eye corners are
used to estimate the pose. Head yaw is given by the ratio of left
to right eye widths. A head roll is given by the orientation angle
of the two inner eye corners. The computation of both head yaw and
roll is invariant to scale variations that arise from moving toward
or away from the camera. Head pitch is determined from the vertical
displacement of the nose tip normalized against the distance
between the two eye corners to account for scale variations. The
system supports up to 50°, 30° and 50° of yaw, roll and pitch respectively. Pose estimates across consecutive frames are then used to identify head action units. For example, a pitch of 20° at time t followed by 15° at time t + 1 indicates a downward head action, which is AU54 in the FACS coding.
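As a rough illustration of these image-based estimates, the sketch below computes a yaw ratio, a roll angle and a normalized pitch displacement from 2D feature points, and flags a downward head action from consecutive pitch values. The exact formulas, thresholds and sign conventions of the actual system are not reproduced here; everything below is an assumption made for illustration.

```python
import math

def head_pose(left_eye_outer, left_eye_inner, right_eye_inner, right_eye_outer,
              nose_tip, nose_tip_ref):
    """Approximate head pose from 2D feature points (x, y tuples), with
    nose_tip_ref being the nose tip position in the neutral reference frame."""
    # Yaw: ratio of apparent left/right eye widths (scale invariant).
    left_w = abs(left_eye_inner[0] - left_eye_outer[0])
    right_w = abs(right_eye_outer[0] - right_eye_inner[0])
    yaw_ratio = left_w / right_w if right_w else float('inf')

    # Roll: orientation of the line joining the two inner eye corners.
    dx = right_eye_inner[0] - left_eye_inner[0]
    dy = right_eye_inner[1] - left_eye_inner[1]
    roll_deg = math.degrees(math.atan2(dy, dx))

    # Pitch: vertical displacement of the nose tip, normalized by the
    # inter-ocular distance so it is invariant to scale.
    inter_ocular = math.hypot(dx, dy)
    pitch = (nose_tip[1] - nose_tip_ref[1]) / inter_ocular
    return yaw_ratio, roll_deg, pitch

def head_action(pitch_prev, pitch_now, threshold=0.05):
    """Image y grows downward here, so an increasing pitch value means the
    head is moving down (AU54 in FACS); the threshold is a placeholder."""
    if pitch_now - pitch_prev > threshold:
        return 'AU54 (head down)'
    if pitch_prev - pitch_now > threshold:
        return 'AU53 (head up)'
    return None

# Example with made-up points (image coordinates, y grows downward).
print(head_pose((10, 50), (30, 50), (60, 50), (80, 50), (45, 75), (45, 70)))
```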
EXTRACTING FACIAL ACTION UNITS
Facial actions are identified from component-based facial features (e.g. the mouth) comprised of
motion, shape and color descriptors. Motion and shape-based
analysis are particularly suitable for a real time video system, in
which motion is inherent and places a strict upper bound on the
computational complexity of methods used in order to meet time
constraints. Color-based analysis is computationally efficient, and
is invariant to the scale or viewpoint of the face, especially when
combined with feature localization (i.e. limited to regions already
defined by feature point tracking). The shape descriptors are first
stabilized against rigid head motion. For that, we imagine that the
initial frame in the sequence is a reference frame attached to the
head of the user. On that frame, let (Xp, Yp) be an anchor point: the 2D projection of the approximate real point around which the head rotates in 3D space. The anchor point is initially defined as the midpoint between the two mouth corners when the mouth is at rest, and lies at a distance d from the line l joining the two inner eye corners. In subsequent frames the point is measured at distance d from l, after accounting for head turns.
Fig 2: Polar distance in determining a lip corner pull and lip pucker
On each
frame, the polar distance between each of the two mouth corners and
the anchor point is computed. The average percentage change in
polar distance calculated with respect to an initial frame is used
to discern mouth displays. An increase or decrease of 10% or more,
determined empirically, depicts a lip pull or lip pucker
respectively (Figure 2). In addition, depending on the sign of the
change we can tell whether the display is in its onset, apex,
offset. The advantages of using polar distances over geometric
mouth width and height (which is what is used in Tian et al [20])
are support for head motion and resilience to inaccurate feature
point tracking, especially with respect to lower lip points.
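A minimal sketch of the polar-distance rule just described, using the 10% threshold from the text; the coordinates and helper names are illustrative only.

```python
import math

def polar_distance(point, anchor):
    """Distance from a mouth corner to the anchor point (both 2D tuples)."""
    return math.hypot(point[0] - anchor[0], point[1] - anchor[1])

def mouth_display(corners_now, corners_ref, anchor, threshold=0.10):
    """Classify a lip corner pull vs a lip pucker from the average
    percentage change in polar distance relative to the initial frame."""
    d_now = [polar_distance(c, anchor) for c in corners_now]
    d_ref = [polar_distance(c, anchor) for c in corners_ref]
    change = sum((n - r) / r for n, r in zip(d_now, d_ref)) / len(d_ref)
    if change >= threshold:
        return 'lip corner pull'   # corners move away from the anchor
    if change <= -threshold:
        return 'lip pucker'        # corners move toward the anchor
    return None

# Example: both corners move outward by ~12% -> lip corner pull (a smile).
anchor = (0.0, 0.0)
print(mouth_display([(1.12, 0.0), (-1.12, 0.0)],
                    [(1.0, 0.0), (-1.0, 0.0)], anchor))
```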
Fig 3: Plot of aperture (red) and teeth (green) in luminance-saturation space
The mouth has two color regions that are
of interest: aperture and teeth. The extent of aperture present
inside the mouth depicts whether the mouth is closed, lips parted,
or jaw dropped, while the presence of teeth indicates a mouth
stretch. Figure 3 shows a plot of teeth and aperture samples in
luminance-saturation space. Luminance, given by the relative lightness or darkness of the color, acts as a good discriminator
for the two types of mouth regions. A sample of n=125000 pixels was
used to learn the probability distribution functions of aperture
and teeth. A lookup table defining the probability of a pixel being
aperture given its luminance is computed for the range of possible
luminance values (0% for black to 100% for white). A similar lookup
table is computed for teeth. Online classification into mouth
actions proceeds as follows: For every frame in the sequence, we
compute the luminance value of each pixel in the mouth polygon. The
luminance value is then looked up to determine the probability of
the pixel being aperture or teeth. Depending on empirically
determined thresholds the pixel is classified as aperture or teeth
or neither. Finally, the total number of teeth and aperture pixels
are used to classify the mouth region into closed (or lips parted),
jaw drop, or mouth stretch. Figure 4 shows classification results
of 1312 frames into closed, jaw drop and mouth stretch.
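The lookup-table classification can be sketched as follows; only the overall logic (a per-pixel probability lookup followed by counting aperture and teeth pixels) follows the description above, while the tables and thresholds are invented placeholders.

```python
# Hypothetical sketch of the colour-based mouth analysis. p_aperture and
# p_teeth would be lookup tables learned from labelled pixels, indexed by
# integer luminance in 0..100.
def classify_mouth(pixel_luminances, p_aperture, p_teeth,
                   pixel_thresh=0.5, aperture_frac=0.15, teeth_frac=0.10):
    aperture = teeth = 0
    for lum in pixel_luminances:           # pixels inside the mouth polygon
        lum = int(round(lum))
        if p_aperture[lum] > pixel_thresh:
            aperture += 1
        elif p_teeth[lum] > pixel_thresh:
            teeth += 1
    n = len(pixel_luminances)
    if teeth / n > teeth_frac:
        return 'mouth stretch'
    if aperture / n > aperture_frac:
        return 'jaw drop'
    return 'closed or lips parted'
```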
Fig 4: Classifying 1312 mouth regions into closed, jaw drop or stretch
COGNITIVE MENTAL STATE INFERENCE
The HMM level outputs a likelihood for each of the facial expressions and head displays. However, on their own, each display is a weak classifier that does
not entirely capture an underlying cognitive mental state. Bayesian
networks have successfully been used as an ensemble of classifiers,
where the combined classifier performs much better than any
individual one in the set [15]. In such probabilistic graphical
models, hidden states (the cognitive mental states in our case)
influence a number of observation nodes, which describe the
observed facial and head displays. In dynamic Bayesian networks
(DBN), temporal dependency across previous states is also encoded.
Training the DBN model entails determining the parameters and structure of a DBN model from data. Maximum likelihood estimation is used to learn the parameters, while sequential backward elimination picks the (locally) optimal network structure for each mental state model. More details on how the parameters and structure are learnt can be found in [13].
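The role of the DBN can be illustrated with a much simplified forward-filtering update: a belief over the hidden mental state is propagated slice by slice, combining a persistence (transition) model with the likelihood of the displays observed in that slice. All probabilities below are invented for illustration; the real parameters and structure are learned from data as described in [13].

```python
# Simplified forward filtering over hidden mental states, standing in for
# the dynamic Bayesian networks described above. All numbers are placeholders.
STATES = ['agreement', 'thinking', 'unsure']

# P(state_t | state_{t-1}): mental states tend to persist between slices.
TRANSITION = {s: {t: (0.8 if s == t else 0.1) for t in STATES} for s in STATES}

# P(display present | state): toy observation model for two displays.
OBSERVATION = {
    'agreement': {'head_nod': 0.7, 'lip_pull': 0.5},
    'thinking':  {'head_nod': 0.1, 'lip_pull': 0.1},
    'unsure':    {'head_nod': 0.2, 'lip_pull': 0.2},
}

def forward_step(belief, observed):
    """One time slice: predict with the transition model, then weight by
    the likelihood of the observed display evidence."""
    new_belief = {}
    for s in STATES:
        predicted = sum(belief[p] * TRANSITION[p][s] for p in STATES)
        likelihood = 1.0
        for display, present in observed.items():
            prob = OBSERVATION[s].get(display, 0.05)
            likelihood *= prob if present else (1.0 - prob)
        new_belief[s] = predicted * likelihood
    z = sum(new_belief.values())
    return {s: v / z for s, v in new_belief.items()}

belief = {s: 1.0 / len(STATES) for s in STATES}
for observed in [{'head_nod': True, 'lip_pull': True},
                 {'head_nod': True, 'lip_pull': False}]:
    belief = forward_step(belief, observed)
print(max(belief, key=belief.get))   # most likely mental state so far
```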
EXPERIMENTAL EVALUATION
For our experimental evaluation we use the Mind Reading dataset (MR) [3]. MR is a
computer-based guide to emotions primarily collected to help
individuals diagnosed with Autism recognize facial expressions of
emotion. A total of 117 videos, recorded at 30 fps with durations
varying between 5 to 8 seconds, were picked for testing. The videos
conveyed the following cognitive mental states: agreement, concentrating, disagreement, thinking, unsure and interested.
There are no restrictions on the head or body movement of actors in
the video. The process of labeling involved a panel of 10 judges who were asked "could this be the emotion name?" When 8 out of 10 agreed, a statistically significant majority, the video was included in MR. To our knowledge MR is the only available labeled resource with such a rich collection of mental states and emotions, even if they are posed.
Fig 5: ROC curves for head and facial displays
We first evaluate the classification rate of the display
recognition layer and then the overall classification ability of
the system.
DISPLAY RECOGNITION
We evaluate the classification rate
of the display recognition component of the system on the following
6 displays: 4 head displays (head nod, head shake, tilt display,
turn display) and 2 facial displays (lip pull, lip pucker). The
classification results for each of the displays are shown using the
Receiver Operator Characteristic (ROC) curves (Figure 5). ROC
curves depict the relationship between the rate of correct
classifications and number of false positives (FP). The
classification rate of display d is computed as the ratio of
correct detections to that of all occurrences of d in the sampled
videos. The FP rate for d is given by the ratio of samples falsely
classified as d to that of all non-d occurrences. Table 2 shows the classification rate achieved by the system, and the respective FP rate, for each display.
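These two definitions translate directly into code; the sketch below computes both rates for a single display d from parallel lists of predicted and true labels (the toy data are ours).

```python
def display_rates(predictions, labels, d):
    """Classification (true positive) rate and false positive rate for
    display d, following the definitions above."""
    correct = sum(1 for p, t in zip(predictions, labels) if t == d and p == d)
    occurrences = sum(1 for t in labels if t == d)
    false_pos = sum(1 for p, t in zip(predictions, labels) if t != d and p == d)
    non_d = sum(1 for t in labels if t != d)
    return correct / occurrences, false_pos / non_d

preds = ['head_nod', 'head_nod', 'lip_pull', 'head_nod']
truth = ['head_nod', 'lip_pull', 'lip_pull', 'head_nod']
print(display_rates(preds, truth, 'head_nod'))   # (1.0, 0.5)
```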
A non-neutral initial frame is the main reason behind undetected
and falsely detected displays. To illustrate this, consider a
sequence that starts as a lip pucker. If the lip pucker persists
(i.e. no change in polar distance) the pucker display will pass
undetected. If, on the other hand, the pucker returns to neutral (i.e. an increase in polar distance), it will be falsely classified as a lip pull display. This problem could be solved by using the polar
angle and color analysis to approximate the initial mouth state.
The other reason accounting for misclassified mouth displays is inconsistent illumination. Possible solutions for dealing with
illumination changes include extending the color-based analysis to
account for overall brightness changes or having different models
for each possible lighting condition.
MENTAL STATE RECOGNITION
We
then evaluate the overall system by testing the inference of
cognitive mental states, using leave-5-out cross validation. Figure
6 shows the results of the various stages of the mind reading
system for a video portraying the mental state choosing, which
belongs to the mental state group thinking. The mental state with
the maximum likelihood over the entire video (in this case
thinking) is taken as the classification of the system.
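A small sketch of that decision rule, assuming the DBNs emit a per-frame likelihood for each mental state (the numbers below are invented):

```python
# Accumulate per-frame mental-state likelihoods over the whole video and
# pick the state with the highest total, as described above.
def classify_video(per_frame_likelihoods):
    totals = {}
    for frame in per_frame_likelihoods:
        for state, lik in frame.items():
            totals[state] = totals.get(state, 0.0) + lik
    return max(totals, key=totals.get)

frames = [
    {'thinking': 0.6, 'unsure': 0.3, 'agreement': 0.1},
    {'thinking': 0.5, 'unsure': 0.4, 'agreement': 0.1},
]
print(classify_video(frames))   # -> thinking
```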
87.4% of the videos were correctly classified. The recognition
rate of a mental class m is given by the total number of videos of
that class whose most likely class (summed over the entire video)
matched the label of the class m. The false positive rate for class
m (given by the percentage of files misclassified as m) was highest
for agreement (5.4%) and lowest for thinking (0%). Table 2
summarizes the results of recognition and false positive rates for
6 mental states.
A closer look at the results reveals a number of interesting
points. First, onset frames of a video occasionally portray a
different mental state than that of the peak. For example, the
onsets of disapproving videos were misclassified as unsure. Although
this incorrectly biased the overall classification to unsure, one
could argue that this result is not entirely incorrect and that the
videos do indeed start off with the person being unsure. Second,
subclasses that do not clearly exhibit the class signature are
easily misclassified. For example, the assertive and decided videos
in the agreement group were misclassified as concentrating, as they
exhibit no smiles, and only very weak head nods. Finally, we found
that some mental states were closer to each other and could
co-occur. For example, a majority of the unsure files scored high
for thinking too.
WEB SEARCH
For the first test of the sensors, scientists trained the
software program to recognize six words - including "go", "left"
and "right" - and 10 numbers. Participants hooked up to the sensors
silently said the words to themselves and the software correctly
picked up the signals 92 per cent of the time.
Then researchers put the letters of the alphabet into a matrix
with each column and row labeled with a single-digit number. In
that way, each letter was represented by a unique pair of number
co-ordinates. These were used to silently spell "NASA" into a web
search engine using the program.
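A toy reconstruction of that coordinate scheme is shown below; the grid layout is an assumption, since the exact matrix NASA used is not given here.

```python
# Letters laid out in a grid whose rows and columns carry single-digit
# labels, so each letter maps to a unique pair of silently spoken numbers.
import string

ROWS, COLS = 5, 6                      # a 5 x 6 grid holds the 26 letters
letters = list(string.ascii_uppercase)
grid = {(r + 1, c + 1): letters[r * COLS + c]
        for r in range(ROWS) for c in range(COLS) if r * COLS + c < len(letters)}
coords = {letter: rc for rc, letter in grid.items()}

def spell(word):
    """Return the (row, column) pairs a subject would subvocalize."""
    return [coords[ch] for ch in word.upper()]

print(spell('NASA'))   # [(3, 2), (1, 1), (4, 1), (1, 1)]
```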
"This proved we could browse the web without touching a
keyboard.
MIND-READING COMPUTERS TURN HEADS AT HIGH-TECH FAIR
Devices allowing people to write letters or play pinball using
just the power of their brains have become a major draw at the
world's biggest high-tech fair. Huge crowds at the CeBIT fair gathered round a man sitting at a pinball table, wearing a cap covered in electrodes attached to his head, who controlled the flippers with great proficiency without using his hands.
"He thinks:
left-hand or right-hand and the electrodes monitor the brain waves
associated with that thought, send the information to a computer,
which then moves the flippers," said Michael Tangermann, from the
Berlin Brain Computer Interface. But the technology is much more than a fun gadget; it could one day save your life. Scientists are researching ways to monitor motorists' brain waves to improve reaction times in a crash.
brain activity kicks in on average around 200 milliseconds before
even an alert driver can hit the brake. There is no question of braking automatically for the driver -- "we would never take away that kind of control. However, there are various things the car can do in that crucial time, tighten the seat belt, for example," he added. Using this brain-wave monitoring technology, a car can
also tell whether the driver is drowsy or not, potentially warning
him or her to take a break. At the g.tec stall, visitors watched a man with a similar "electrode cap" sitting in front of a screen with a large keyboard, the letters flashing in an ordered sequence. The user concentrates hard when the chosen letter flashes, and the brain waves stimulated at this exact moment are registered
by the computer and the letter appears on the screen. The
technology takes a long time at present -- it took the man around
four minutes to write a five-lettered word -- but researchers hope
to speed it up in the near future. Another device allows users to
control robots by brain power. The small box has lights flashing at
different frequencies.
ADVANTAGES AND USES
Mind Controlled Wheelchair
1. This prototype mind-controlled wheelchair, developed at the University of Electro Communications in Japan, lets you feel like half Professor X and half Stephen Hawking, except with the theoretical
physics skills of the former and the telekinetic skills of the
latter.2. A little different from the Brain-Computer Typing
machine, this thing works by mapping brain waves when you think
about moving left, right, forward or back, and then assigns that to
a wheelchair command of actually moving left, right, forward or
back.3. The result of this is that you can move the wheelchair
solely with the power of your mind. This device doesn't give you
MIND BULLETS (apologies to Tenacious D), but it does allow people who can't use other wheelchairs to get around more easily.
4. The sensors have already been used to do simple web searches
and may one day help space-walking astronauts and people who cannot
talk. The system could send commands to rovers on other planets,
help injured astronauts control machines, or aid disabled
people.
5. In everyday life, they could even be used to communicate on
the sly - people could use them on crowded buses without being
overheard.
6. The finding raises issues about the application of such tools
for screening suspected terrorists -- as well as for predicting
future dangerousness more generally. We are closer than ever to the
crime-prediction technology of Minority Report.7. The day when
computers will be able to recognize the smallest units in the
English languagethe 40-odd basic sounds (or phonemes) out of which
all words or verbalized thoughts can be constructed. Such skills
could be put to many practical uses. The pilot of a high-speed
plane or spacecraft, for instance, could simply order by thought
alone some vital flight information for an all-purpose cockpit
display.
DISADVANTAGES AND PROBLEMS
Tapping Brains for Future Crimes
1. Researchers from the Max Planck Institute for Human
Cognitive and Brain Sciences, along with scientists from London and
Tokyo, asked subjects to secretly decide in advance whether to add or subtract two numbers they would later be shown. Using computer
algorithms and functional magnetic resonance imaging, or fMRI, the
scientists were able to determine with 70 percent accuracy what the
participants' intentions were, even before they were shown the
numbers. The popular press tends to over-dramatize scientific
advances in mind reading. FMRI results have to account for heart
rate, respiration, motion and a number of other factors that might
all cause variance in the signal. Also, individual brains differ,
so scientists need to study a subject's patterns before they can
train a computer to identify those patterns or make predictions.
2.
While the details of this particular study are not yet published,
the subjects' limited options of either adding or subtracting the
numbers means the computer already had a 50/50 chance of guessing
correctly even without fMRI readings. The researchers indisputably
made physiological findings that are significant for future
experiments, but we're still a long way from mind reading.
3. Still, the more we learn about how the brain operates, the
more predictable human beings seem to become. In the Dec. 19, 2006,
issue of The Economist, an article questioned the scientific
validity of the notion of free will: Individuals with particular
congenital genetic characteristics are predisposed, if not
predestined, to violence.
4. Studies have shown that genes and organic factors like
frontal lobe impairments, low serotonin levels and dopamine
receptors are highly correlated with criminal behavior. Studies of
twins show that heredity is a major factor in criminal conduct.
While no one gene may make you a criminal, a mixture of biological
factors, exacerbated by environmental conditions, may well do
so.
5. Looking at scientific advances like these, legal scholars are
beginning to question the foundational principles of our criminal
justice system.
6. For example, University of Florida law professor Christopher
Slobogin, who is visiting at Stanford this year, has set forth a
compelling case for putting prevention before retribution in
criminal justice.
7. It's a tempting thought. If there is no such thing as free
will, then a system that punishes transgressive behavior as a
matter of moral condemnation does not make a lot of sense. It's
compelling to contemplate a system that manages and reduces the
risk of criminal behavior in the first place.
8. Even with studies like the Max Planck Institute's, neuroscience and bioscience are not at a point where we can reliably predict human behavior. To me, that's the most powerful objection to a preventative justice system -- if we aren't particularly good at predicting future behavior, we risk criminalizing the innocent.
9. We aren't particularly good at
rehabilitation, either, so even if we were sufficiently accurate in
identifying future offenders, we wouldn't really know what to do
with them.
10. Nor is society ready to deal with the ethical and practical
problems posed by a system that classifies and categorizes people
based on oxygen flow, genetics and environmental factors that are
correlated as much with poverty as with future criminality.
11. In time, neuroscience may produce reliable behavior
predictions. But until then, we should take the lessons of science
fiction to heart when deciding how to use new predictive
techniques.
12. The preliminary tests may have been successful because of the short lengths of the words; the test should be repeated on many different people to confirm that the sensors work on everyone.
13. The initial success "doesn't mean it will scale up", he told New Scientist. "Small-vocabulary, isolated word recognition is a quite different problem than conversational speech, not just in scale but
in kind."14. that genes and organic factors like frontal lobe
impairments, low serotonin levels and dopamine receptors are highly
correlated with criminal behavior. Studies of twins show that
heredity is a major factor in criminal conduct. While no one gene
may make you a criminal, a mixture of biological factors,
exacerbated by environmental conditions, may well do so.15. Using
computer algorithms and functional magnetic resonance imaging, or
fMRI, the scientists were able to determine with 70 percent
accuracy what the participants' intentions were, even before they
were shown the numbers. CONCLUSIONTufts University researchers have
begun a three-year research project which, if successful, will
allow computers to respond to the brain activity of the computer's
user. Users wear futuristic-looking headbands to shine light on
their foreheads, and then perform a series of increasingly
difficult tasks while the device reads what parts of the brain are
absorbing the light. That info is then transferred to the computer,
and from there the computer can adjust its interface and functions
to each individual.
One professor used the following example of a real world use:
"If it knew which air traffic controllers were overloaded, the next
incoming plane could be assigned to another controller." Hence, if we achieve 100% accuracy, these computers may find applications in many fields of electronics where we have very little time to react.
BIBLIOGRAPHY
www.eurescom.de/message/default_Dec2004.asp
blog.marcelotoledo.org/2007/10
www.newscientist.com/article/dn4795-nasa-develops-mindreading-system
http://blogs.vnunet.com/app/trackback/95409