UNCORRECTED PROOF

Mirror neurons and imitation: A computationally guided review

Erhan Oztop a,b,*, Mitsuo Kawato a,b, Michael Arbib c

a JST-ICORP Computational Brain Project, Kyoto, Japan
b ATR, Computational Neuroscience Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
c Computer Science, Neuroscience and USC Brain Project, University of Southern California, Los Angeles, CA 90089-2520, USA

Neural Networks xx (xxxx) 1–18
www.elsevier.com/locate/neunet

Abstract
Neurophysiology reveals the properties of individual mirror neurons in the macaque, while brain imaging reveals the presence of ‘mirror systems’ (not individual neurons) in the human. Current conceptual models attribute high-level functions such as action understanding, imitation, and language to mirror neurons. However, only the first of these three functions is well developed in monkeys. We thus distinguish current opinions (conceptual models) on mirror neuron function from more detailed computational models. We assess the strengths and weaknesses of current computational models in addressing the data and speculations on mirror neurons (macaque) and mirror systems (human). In particular, our mirror neuron system (MNS), mental state inference (MSI) and modular selection and identification for control (MOSAIC) models are analyzed in more detail. Conceptual models often overlook the computational requirements for posited functions, while too many computational models adopt the erroneous hypothesis that mirror neurons are interchangeable with imitation ability. Our meta-analysis underlines the gap between conceptual and computational models and points out the research effort required from both sides to reduce this gap.
© 2006 Published by Elsevier Ltd.

Keywords: Mirror neuron; Action understanding; Imitation; Language; Computational model

1. Introduction
Many neurons in the ventral premotor area F5 in macaque monkeys show activity in correlation with the grasp¹ type being executed (Rizzolatti, 1988). A subpopulation of these neurons, the mirror neurons (MNs), have multi-modal properties: they also respond when the monkey observes goal-directed movements (e.g. precision or power grasping) performed by another monkey or an experimenter, provided the observed grasps are more or less congruent with those associated with the neuron's own motor activity (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996). The same area includes auditory mirror neurons (Kohler et al., 2002) that respond not only to the sight but also to the sound of actions that have typical sounds (e.g. breaking a peanut, tearing paper). The actions associated with mirror neurons in the monkey seem to be transitive, i.e. to involve action upon an object, and this holds even for an object that has just been hidden from view (e.g. Umilta et al., 2001). Individual mirror neurons cannot generally be recorded in humans, since electrophysiology is possible only in very rare cases and at specific brain sites. Therefore, one usually talks about a ‘mirror region’ or a ‘mirror system’ for grasping identified by brain imaging (PET, fMRI, MEG, etc.). Other regions of the brain may support mirror systems for other classes of actions. An increasing number of human brain mapping studies now refer to a mirror system (although not all are conclusive). Collectively, these data indicate that action observation activates certain regions involved in the execution of actions of the same class. However, in contrast to monkeys, intransitive actions have also been shown to activate motor regions in humans.

The existence of a (transitive and intransitive) mirror system in the human brain has also been supported by behavioral experiments illustrating the so-called ‘motor interference’ effect, where observation of a movement degrades the performance of a concurrently executed incongruent movement (Brass, Bekkering, Wohlschlager, & Prinz, 2000; Kilner, Paulignan, & Blakemore, 2003; see also Sauser & Billard, this issue, for functional models addressing this phenomenon). Because of the overlapping neural substrate for action execution and observation in humans as well as other primates, many researchers have attributed high level cognitive

0893-6080/$ - see front matter © 2006 Published by Elsevier Ltd.
doi:10.1016/j.neunet.2006.02.002

* Corresponding author. Address: Department of Cognitive Neuroscience, ATR, Computational Neuroscience Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto 619-0288, Japan. Tel.: +81 774 95 1215; fax: +81 774 95 1236. E-mail address: [email protected] (E. Oztop).
¹ We restrict our discussion to hand-related neurons; F5 contains mouth-related neurons as well.
initial mirror neuron formation.⁷ Although MNS proposes that MNs initially evolved to support motor control, it does not provide computational mechanisms showing this.

⁶ Only grasp-related visual mirror neurons were addressed. A subsequent study (Bonaiuto et al., 2005) introduced a recurrent network learning architecture that not only reproduces key results of Oztop and Arbib (2002) but also addresses the data of Umilta et al. (2001) on grasping of recently obscured objects and of Kohler et al. (2002) on audiovisual mirror neurons.
⁷ Note again that monkeys have a mirror system but do not imitate. It is thus a separate question to ask “How, in primates that do imitate, does the imitation system build (both structurally and temporally) on the mirror system?”.

10.1. The Model
MNS is a systems-level model of the (monkey) mirror neuron system for grasping. The computational focus of the model is the development of mirror neurons by self-observation; the motor production component of the system is assumed to be in place and is not modeled using neural modules. The schemas⁸ (Arbib, 1981) of the model are implemented at different levels of granularity. Conceptually
those schemas correspond to brain regions as follows (see Fig. 6). The inferior premotor cortex plays a crucial role when the monkey itself reaches for an object. Within the inferior premotor cortex, area F4 is located more caudally than area F5 and appears to be primarily involved in the control of proximal movements (Gentilucci et al., 1988), whereas the neurons of F5 are involved in distal control (Rizzolatti et al., 1988). Areas IT (inferotemporal cortex) and cIPS (caudal intraparietal sulcus) provide AIP with visual input concerning, respectively, the nature of the observed object and the position and orientation of the object's surfaces. The job of AIP is to extract the affordances the object offers for grasping. By affordance we mean the object properties that are relevant for grasping, such as width, height and orientation. The upper diagonal in Fig. 6 corresponds to the basic pathway AIP → F5 canonical → M1 (primary motor cortex) for distal (grasp) control. The lower right diagonal (MIP/LIP/VIP → F4) of Fig. 6 provides the proximal (reach) control portion of the MNS model. The remaining modules of Fig. 6 constitute the sensory processing (STS and area 7a) and the core mirror circuit (F5 mirror and area 7b).
Mirror neurons do not fire when the monkey sees the hand movement or the object in isolation; the sight of the hand moving appropriately to grasp or manipulate a seen (or recently seen) object is necessary for the mirror neurons tuned to the given action to fire (Umilta et al., 2001). This requires schemas for the recognition of the shape of the hand and the analysis of its motion (performed by STS in the model), and for the analysis of the hand-object relation (area 7a). The information gathered at STS and area 7a is captured in the ‘hand state’ at any instant during movement observation and serves as an input to the core mirror circuit (F5 mirror and area 7b). Although visual feedback control was not built into MNS, the hand state components track the position of the hand and fingers relative to the object's affordance (see Oztop and Arbib (2002) for the full definition of the hand state) and can thus be used to monitor the successful progress of a grasping action, thereby supporting motor control. The crucial point is that the information provided by the hand state allows action recognition, because the relations encoded in the hand state form an invariant of the action regardless of its agent. This allows self-observation to train a system that can be used to detect the actions of others and to recognize them as one of the actions of the self.
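The agent-invariance argument can be illustrated with a small Python sketch. The four features below (wrist-to-object distance, thumb–index aperture, and the cosine and sine of the approach direction relative to the object's axis) are a hypothetical 2D subset chosen for illustration, not the full hand state defined by Oztop and Arbib (2002); the function names are likewise assumptions. Because every feature is computed relative to the object, moving and rotating the whole scene, as when the same grasp is performed by another agent elsewhere, leaves the features unchanged.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def hand_state(wrist: Point, thumb_tip: Point, index_tip: Point,
               obj_center: Point, obj_axis_angle: float) -> Tuple[float, float, float, float]:
    """Hypothetical object-centered hand-state features (2D illustration only)."""
    distance = math.dist(wrist, obj_center)     # hand-to-object distance
    aperture = math.dist(thumb_tip, index_tip)  # grip aperture
    approach = math.atan2(obj_center[1] - wrist[1], obj_center[0] - wrist[0])
    relative = approach - obj_axis_angle        # approach direction w.r.t. object axis
    # cos/sin avoid the wrap-around discontinuity of raw angles
    return distance, aperture, math.cos(relative), math.sin(relative)


def rigid_transform(p: Point, angle: float, offset: Point) -> Point:
    """Rotate p by `angle` about the origin, then translate by `offset`."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + offset[0], s * p[0] + c * p[1] + offset[1])


def observe_from_elsewhere(scene, angle: float, offset: Point):
    """The same grasp, performed at another location and orientation."""
    wrist, thumb, index, center, axis = scene
    moved = [rigid_transform(p, angle, offset) for p in (wrist, thumb, index, center)]
    return (*moved, axis + angle)


# (wrist, thumb tip, index tip, object center, object axis angle): hypothetical values
self_scene = ((0.0, 0.0), (0.28, 0.05), (0.30, -0.05), (0.4, 0.1), 0.3)
other_scene = observe_from_elsewhere(self_scene, angle=1.2, offset=(2.0, -1.0))

print(hand_state(*self_scene))
print(hand_state(*other_scene))  # same values up to floating-point rounding
```

Any feature defined relative to the object rather than to the viewer has this property, which is what allows training on self-generated grasps to transfer to the observation of others in this toy setting.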
During training, the motor code represented by active F5 canonical neurons was used as the training signal for the core mirror circuit, enabling the mirror neurons to learn which hand-object trajectories corresponded to the canonically encoded grasps. We reiterate that the input to the F5 mirror neurons is not the visual stimulus created by the hand and the object in the visual field but the ‘hand state trajectory’ (the trajectory of the relation between the hand and the object) extracted from these stimuli. Thus, training tunes the F5 mirror neurons to respond to hand-object relational trajectories independently of the owner of the action (‘self’ or ‘other’).

⁸ A schema refers to a functional unit that can be instantiated as a modular unit, or as a mode of operation of a network of modules, to fulfill a desired input/output requirement (Arbib, 1981; Arbib et al., 1998).

Fig. 6. A schematic view of the mirror neuron system (MNS) model. The MNS model learning mechanisms and simulations focus on the core mirror circuit marked by the central diagonal rectangle (7b and F5 mirror); see text for details (Oztop & Arbib, 2002; reproduced with kind permission of Springer Science and Business Media).
10.2. Relation to MNs
The focus of the simulations was the 7b-F5 complex (core mirror circuit). The inputs and outputs of this circuit were computed using various schemas that provide a context for analyzing the circuit. The core mirror circuit was implemented as a feedforward neural network (a backpropagation network with one hidden layer of sigmoidal units; hidden layer: area 7b; output layer: F5 mirror) responding to increasingly long initial segments of the hand-state trajectory. The network could be trained to recognize the grasp type from the hand state trajectory, with correct classification often achieved well before the hand reached the object. For the preprocessing and training details the reader is referred to Oztop and Arbib (2002).
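A minimal pure-Python sketch of a core mirror circuit of the kind described above: a one-hidden-layer backpropagation network of sigmoid units (the hidden layer standing in for area 7b, the output layer for F5 mirror) trained on increasingly long initial segments of a hand-state trajectory. The one-dimensional ‘aperture’ trajectories, the zero-padding of partial segments to a fixed input length, the class and function names, and all sizes and learning parameters are assumptions made for illustration; they are not the preprocessing or training regime of Oztop and Arbib (2002).

```python
import math
import random

T = 5  # samples per (hypothetical) hand-state trajectory
POWER = [0.2, 0.5, 0.8, 0.9, 0.9]      # hypothetical aperture profile, power grasp
PRECISION = [0.2, 0.3, 0.3, 0.3, 0.3]  # hypothetical aperture profile, precision pinch


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def prefix_input(trajectory, k):
    """Initial segment of length k, zero-padded to the fixed input size T (assumption)."""
    return trajectory[:k] + [0.0] * (T - k)


class CoreMirrorCircuit:
    """Hidden layer ~ area 7b, output layer ~ F5 mirror (one unit per grasp type)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # the last weight in every row is a bias
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr):
        """One backpropagation step on the squared error."""
        xb, hb, y = self.forward(x)
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # hidden deltas use the output weights *before* they are updated
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def classify(self, x):
        y = self.forward(x)[2]
        return "power" if y[0] > y[1] else "precision"


def train(epochs=3000, lr=0.5, seed=0):
    net = CoreMirrorCircuit(T, 4, 2, seed)
    data = [(prefix_input(traj, k), target)
            for traj, target in ((POWER, [1.0, 0.0]), (PRECISION, [0.0, 1.0]))
            for k in range(1, T + 1)]
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)
    return net


net = train()
for k in range(1, T + 1):
    print(k, [round(v, 3) for v in net.forward(prefix_input(POWER, k))[2]])
```

In these toy profiles the first sample is identical for both grasps, so the one-sample segment is inherently ambiguous, while longer segments can be told apart before the trajectory is complete; this is only a loose analogue of early classification in the model.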
Despite the use of a non-physiological neural network, simulations with the model generated a range of predictions about mirror neurons that suggest new neurophysiological experiments. Notably, the trained network responded not only to hand state trajectories from the training set but also showed interesting responses to novel grasping modes. For example, Fig. 7 shows one prediction of the MNS model. An […] (power and precision grasp responsive neurons) during the early portion of the movement observation. Only later does the activity of the precision pinch neuron dominate and the power grasp neuron's activity diminish.
Other predictions were derived from the spatial perturbation experiment, in which the hand did not reach the goal (i.e. a ‘fake’ grasp), and from the altered kinematics experiment, in which the hand moved with a constant velocity profile. The former case showed a gradual decrease in mirror neuron activity, while the latter showed a sharp decrease. The reader is referred to Oztop and Arbib (2002) for the details and other simulation experiments.
Recently, Bonaiuto, Rosta, and Arbib (2005) developed the MNS2 model, a new version of the MNS model of action recognition learning by mirror neurons of the macaque brain, using a recurrent architecture that is biologically more plausible than that of the original model. Moreover, MNS2 extends the capacity of the model to address data on audio–visual mirror neurons (Kohler et al., 2002) and on the responses of mirror neurons when the target object was recently visible but is currently hidden (Umilta et al., 2001).
11. The mental state inference (MSI) model: forward model hypothesis for MNs
The anatomical location (i.e. premotor cortex) and motor response of mirror neurons during grasping suggest that the fundamental function of mirror neurons may be rooted in grasp control. The higher cognitive functions of mirror neurons should then be seen as a later utilization of this system, augmented with additional neural circuits. Although the MNS model of the previous section adopted this view, it did not model the motor component, which is addressed by the MSI model.

[Fig. 7 plot: firing rate (0.0–1.0) versus normalized time (0.0–1.0) for the power grasp and precision grasp neurons]
Fig. 7. Power and precision grasp resolution. (a) The left panel shows the initial configuration of the hand, while the right panel shows the final configuration of the hand, with circles showing positions of the wrist in consecutive frames of the trajectory. (b) The distinctive feature of this trajectory is that the hand initially opens wide to accommodate the length of the object, but then the thumb and forefinger move into position for a precision grip. Even though the model had been trained only on precision grips and power grips separately, its response to this input reflects the ambiguities of this novel trajectory: the curves for power and precision cross towards the end of the action, showing the resolution of the initial ambiguity by the network. (Oztop & Arbib, 2002, reproduced with kind permission of Springer Science and Business Media).
11.1. Visual feedback control of grasping and the forward model hypothesis for the mirror neurons
The mental state inference (MSI) model builds upon a visual feedback circuit involving the parietal and motor cortices, with a predictive role assigned to mirror neurons in area F5. For understanding others' intentions, this circuit is extended into a mental state inference mechanism (Oztop et al., 2005). The global functioning of the model for visual feedback control proceeds as follows. The parietal cortex extracts visual features relevant to the control of a particular goal-directed action (X, the control variable) and relays this information to the premotor cortex. The premotor cortex computes the motor signals that bring the parietal cortex output (X) to the desired neural code (Xdes) relayed by the prefrontal cortex. The ‘desired change’ generated by the premotor cortex is relayed to dynamics-related motor centers for execution (Fig. 8, upper panel). The F5 mirror neurons implement a forward prediction circuit (forward model) that estimates the sensory consequences of F5 motor output related to manipulation, thus compensating for the sensory delays involved in a visual feedback circuit. This is in contrast to the commonly suggested idea that mirror neurons serve solely to retrieve an action representation that matches the observed movement. During observation mode, these F5 mirror neurons are used to create motor imagery or mental simulation of the movement for mental state inference (see below). Although MSI does not specify the region within parietal cortex that performs the control variable computation, recent findings suggest that a more precise delineation is possible. Experiments with macaque monkeys indicate that parietal area PF may be involved in monitoring the relation of the hand to an object during grasping. Some PF neurons that do not respond to the sight of objects become active when the monkey (without any arm movement) watches movies of hands (of the experimenter or the monkey) moving to manipulate objects, suggesting that the neural responses may reflect the visual feedback during observed hand movements (Murata, 2005). It is also possible that a part of AIP is involved in monitoring grasping, as shown by transcranial magnetic stimulation (TMS) in humans (Tunik, Frey, & Grafton, 2005). As in the MNS model, area F5 (canonical) is involved in converting the parietal output (PF/AIP) into motor signals, which are used by primary motor cortex and spinal cord for actual muscle activation. In other words, area F5 non-mirror neurons implement a control policy (assumed to be learned earlier) to reduce the error represented by the area PF/AIP output.
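The delay-compensation role that MSI assigns to the F5 forward model can be illustrated with a deliberately simple Python sketch. Everything concrete here is an assumption: a scalar control variable x with trivial dynamics x(t+1) = x(t) + u(t), a proportional ‘premotor’ controller driving x toward Xdes, a visual channel that reports x with a fixed delay, and a forward model that is exact for this plant. The forward model adds the efference copies of the commands whose visual consequences have not yet arrived to the delayed observation, recovering an estimate of the current x.

```python
from collections import deque


def reach(use_forward_model, gain=0.5, delay=5, steps=60, x0=1.0, x_des=0.0):
    """Drive the control variable x toward x_des through a delayed visual channel."""
    x = x0
    seen = deque([x0] * (delay + 1), maxlen=delay + 1)  # seen[0] is what vision reports now
    pending = deque([0.0] * delay, maxlen=delay)        # efference copies not yet visible
    trajectory = [x]
    for _ in range(steps):
        estimate = seen[0]
        if use_forward_model:
            # forward model of the hypothetical plant x(t+1) = x(t) + u(t)
            estimate += sum(pending)
        u = -gain * (estimate - x_des)  # proportional control on the estimate
        x += u
        seen.append(x)
        pending.append(u)
        trajectory.append(x)
    return trajectory


with_fm = reach(use_forward_model=True)
without_fm = reach(use_forward_model=False)
print(with_fm[-1], max(abs(v) for v in without_fm))
```

With a gain of 0.5 and a five-step delay, the loop that acts on the delayed observation alone overshoots and oscillates with growing amplitude, whereas the loop that uses the prediction converges smoothly; the gain and delay were chosen only to make that contrast visible.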
11.2. Mental state inference
The ability to predict enables the feedback circuit of Fig. 8 (upper panel) to be extended into a system for inferring the intentions of others based on the kinematics of goal-directed actions (see Fig. 8, lower panel). In fact, the full MSI model involves a ‘mental simulation loop’ that is built around a forward model (Blakemore & Decety, 2001; Wolpert & Kawato, 1998), which in turn is used by a ‘mental state inference loop’ to estimate the intentions of others. The MSI model is described for generic goal-directed actions; here, however, we look at the model in relation to a tool grasping framework in which two agents can each grasp a virtual hammer with different intentions (holding, nailing or prying a nail). Depending on the planned subsequent use of the hammer, grasping requires differential alignment of the hand and the thumb. Thus, the kinematics of the action provide information about the intention of the actor. For this task, the mental state was modeled as the intention in grasping the hammer. Within this framework, an observer ‘guesses’ the target object (one of various objects in the demonstrator's workspace) and the type of grasp, and produces an appropriate F5 motor signal that is inhibited for actual muscle activation but used by the forward model (MNs) (see Fig. 8, lower panel). With the sensory outcome predicted by the MNs, the movement can be simulated as if it were executed in an online feedback mode. The match between the simulated sensations of a simulated movement and the sensation of the observed movement then signals the correctness of the guess. The simulated mental sensations and the actual perception of movement are compared in a mental state search mechanism. If the observer model ‘knows’ the possible mental states in terms of discrete items, an exhaustive search in the mental state space can be performed. However, if the mental state space is not discrete, then a gradient-based search strategy must be applied. The mental state correction (i.e. the gradient) requires the parietal output (PF) based errors (the Difference box in Fig. 8) to be converted into ‘mental state’ space adjustments, for which a stochastic gradient search can be applied (see Oztop et al. (2005) for the details).

Fig. 8. Upper panel: the MSI model is based on the illustrated visual feedback control organization. Lower panel: observer's mental state inference mechanism. Mental simulation of movement is mediated by utilizing the sensory prediction from the forward model and by inhibiting motor output. The difference module computes the difference between the visual control parameters of the simulated movement and the observed movement. The mental state estimate indicates the current guess of the observer about the mental state of the actor. The difference output is used to update the estimate or to select the best mental state. (Adapted from Oztop et al., 2005).

[Fig. 9 drawing labels: Handle Grasping (A) and Metal-head Grasping (B); knuckle vector, handle vector, palm normal, hammer normal, distance to handle, distance to metal head, via points for handle and metal-head grasping]
Fig. 9. (A) The features used for the nailing task (orientation and normalized distance) are depicted in the right two arm drawings. The path of the hand is constrained with appropriate via-points avoiding collision. The arm drawing on the left shows an example of a handle grasping for driving a nail. The prying task is the same as A except that the Handle vector points in the opposite direction (not shown). (B) The features extracted for metal-head grasping are depicted (conventions are the same as in the upper panel). (Adapted from Oztop et al., 2005).
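The two search modes described above can be sketched in Python under heavily simplified assumptions: a hypothetical two-dimensional mental state (for instance, a target location) drives a hypothetical mental simulation of a control variable, and the observer scores a guess by the summed squared difference between simulated and observed control-variable trajectories. For discrete candidates the sketch simulates each one and keeps the best; for a continuous mental state it uses a simultaneous-perturbation (SPSA-style) stochastic gradient estimate. That estimator is a stand-in chosen for illustration; Oztop et al. (2005) describe the search actually used in MSI.

```python
import random


def simulate_control_variable(target, steps=20, gain=0.3):
    """Hypothetical mental simulation: a 2D control variable approaching `target`."""
    x, y = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        x += gain * (target[0] - x)
        y += gain * (target[1] - y)
        trajectory.append((x, y))
    return trajectory


def mismatch(mental_state, observed):
    """Summed squared difference between simulated and observed control variables."""
    simulated = simulate_control_variable(mental_state, steps=len(observed))
    return sum((sx - ox) ** 2 + (sy - oy) ** 2
               for (sx, sy), (ox, oy) in zip(simulated, observed))


def exhaustive_search(candidates, observed):
    """Discrete mental states: simulate every candidate and keep the best match."""
    return min(candidates, key=lambda c: mismatch(c, observed))


def stochastic_gradient_search(observed, start=(0.0, 0.0), lr=0.01, c=0.01,
                               iters=300, seed=0):
    """Continuous mental states: SPSA-style stochastic gradient descent (assumption)."""
    rng = random.Random(seed)
    theta = list(start)
    for _ in range(iters):
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # random perturbation direction
        plus = [t + c * d for t, d in zip(theta, delta)]
        minus = [t - c * d for t, d in zip(theta, delta)]
        slope = (mismatch(plus, observed) - mismatch(minus, observed)) / (2 * c)
        # since each delta component is +/-1, dividing by it equals multiplying by it
        theta = [t - lr * slope * d for t, d in zip(theta, delta)]
    return tuple(theta)


true_state = (0.7, -0.4)
observed = simulate_control_variable(true_state)
print(exhaustive_search([(0.0, 0.0), (0.7, -0.4), (1.0, 1.0)], observed))  # (0.7, -0.4)
print(stochastic_gradient_search(observed))
```

Each SPSA step needs only two mental simulations regardless of how many dimensions the mental state has, which is one reason perturbation-based estimates are attractive when the simulation itself is the expensive part.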
11.3. Relation to MNs and Imitation
A tool-use experiment was set up in a kinematics simulation where two agents could grasp a virtual hammer. The visual parameters used to implement the feedback servo for grasping (i.e. normalized distance and the orientation difference; see Fig. 9) were object-centered and provided generalization regardless of the owner of the action (self vs. other). The time-varying (mental simulation) × (observation) matrix shown in Fig. 10 represents the dynamics of an agent observing an actor performing tasks of holding, nailing and prying (rows of the matrix). Each column of the matrix represents the belief of the observer as to whether the other is holding, nailing or prying. Each cell shows the degree of similarity between the mentally simulated movement and the observed movement from movement onset to movement end (as a belief or probability). The observer can infer the mental state of the actor before the midpoint of the observed movements, as evidenced by the convergence of the belief curves to unity along the diagonal plots. Thus, although not the focus of the study, the MSI model offers a basic imitation ability that is based on reproducing the inferred intention (mental state) of the observed actor. However, the actions that can be imitated are limited to the ones in the existing repertoire, and their reproduction may not respect the full details of the observed act. With MSI, the dual activation of MNs (forward model) is explained by the automatic engagement of mental state inference during action observation, and by the forward prediction task undertaken by the MNs for motor control during action execution.

Fig. 10. The degree of similarity between visually extracted control variables and control variables obtained by mental simulation can be used to infer the intention of an actor. Each subplot shows the probability that the observed movement (rows) is the same as the mentally simulated one (columns). The horizontal axis represents the simulation time from movement start to end. The control variables extracted for the comparison are based on the mentally simulated movement. Thus, for example, the first column inferences require the control parameters for holding (normalized distance to metal head and the angle between the palm normal and hammer plane). The convergence to unity of the belief curves on the main diagonal indicates correct mental state inference. (Adapted from Oztop et al., 2005).
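The time-varying belief curves of Fig. 10 can be mimicked with a toy Python sketch. The scalar control-variable profile assigned to each intention, the use of a cumulative squared difference, and the softmax with a fixed temperature that turns differences into beliefs are all assumptions for illustration, not the computation reported by Oztop et al. (2005). The sketch computes one row of the matrix: an observed ‘nailing’ movement compared against the simulation of each candidate intention.

```python
import math

INTENTIONS = ("holding", "nailing", "prying")
# Hypothetical final value of an orientation-like control variable for each intention.
FINAL_VALUE = {"holding": 0.0, "nailing": 1.0, "prying": -1.0}


def control_profile(intention, steps=20, gain=0.2, start=0.5):
    """Hypothetical (mentally simulated or observed) control-variable trajectory."""
    value, profile = start, []
    for _ in range(steps):
        value += gain * (FINAL_VALUE[intention] - value)
        profile.append(value)
    return profile


def belief_curves(observed, temperature=1.0):
    """Belief over intentions at every time step, from the accumulated mismatch."""
    simulated = {name: control_profile(name, steps=len(observed)) for name in INTENTIONS}
    cumulative = dict.fromkeys(INTENTIONS, 0.0)
    curves = {name: [] for name in INTENTIONS}
    for t, seen in enumerate(observed):
        for name in INTENTIONS:
            cumulative[name] += (simulated[name][t] - seen) ** 2
        scores = {name: -cumulative[name] / temperature for name in INTENTIONS}
        top = max(scores.values())  # subtract the maximum for numerical stability
        total = sum(math.exp(s - top) for s in scores.values())
        for name in INTENTIONS:
            curves[name].append(math.exp(scores[name] - top) / total)
    return curves


curves = belief_curves(control_profile("nailing"))
print([round(b, 3) for b in curves["nailing"]])
```

In this toy row the belief in ‘nailing’ starts near chance and exceeds 0.9 by the midpoint of the 20-step movement, because the mismatch for the other two intentions keeps accumulating while the correct one stays at zero.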
12. Modular selection and identification for control (MOSAIC) model
The MOSAIC model (Haruno et al., 2001; Wolpert & Kawato, 1998) was introduced initially for motor control, providing mechanisms for decentralized, automatic module selection so as to achieve the best control for the current task. In this sense, compared to the earlier models surveyed, MOSAIC is a sophisticated motor control architecture. The key ingredients of MOSAIC are modularity and the distributed cooperation and competition of the internal models. The basic functional units of the model are multiple predictor–controller (forward–inverse model) pairs, where each pair competes to contribute to the overall control (cf. Jacobs, Jordan, Nowlan, and Hinton (1991), where the emphasis is on selection of a single processor). The controllers with better predicting forward models (i.e. with higher responsibility signals) become more influential in the overall control (Fig. 11). The responsibility signals are computed from the prediction errors of the forward models via the softmax function (see Table 1, last row), which corresponds to the normalization step shown in Fig. 11. The responsibility signals are constrained to lie between 0 and 1 and to add up to 1, so that they can be considered as probabilities indicating the likelihood of the controllers being effective for the current task.
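A minimal Python sketch of the responsibility computation and the weighted control output, assuming scalar states and the Gaussian-likelihood form of the softmax commonly used with MOSAIC, in which each responsibility is proportional to exp(−(x − x̂_i)²/2σ²) for the actual state x and the prediction x̂_i of forward model i. The scale σ, the numeric values and the function names are our assumptions for illustration.

```python
import math


def responsibilities(actual_state, predicted_states, sigma=0.1):
    """Softmax over negative squared prediction errors: values in [0, 1] summing to 1."""
    scores = [-((actual_state - p) ** 2) / (2.0 * sigma ** 2) for p in predicted_states]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mosaic_command(resp, controller_commands):
    """Net motor command: controller outputs weighted by their responsibilities."""
    return sum(l * u for l, u in zip(resp, controller_commands))


# Predictor 1 matches the actual state best, so controller 1 dominates the output.
resp = responsibilities(1.0, [1.0, 1.5, 0.2])
command = mosaic_command(resp, [2.0, -1.0, 5.0])
print([round(l, 6) for l in resp], round(command, 4))
```

A smaller σ makes the selection sharper (closer to choosing a single module), while a larger σ blends the controllers more evenly.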
[Fig. 11 diagram: Controllers 1–3 paired with Predictors 1–3 acting on the Body; labels: desired trajectory, motor command, sensory feedback (state), predicted state, prediction error, normalization, responsibility signals]
Fig. 11. The functioning of the MOSAIC model in the control mode. The responsibility signals indicate how well the control modules are suited for the control task at hand. The overall control output is the sum of the output of controller modules as weighted by the responsibility signals.
The aim of motor control is to produce motor commands $\Gamma(t)$ at time $t$ such that a desired state⁹ $x_{des}(t)$ is attained by the controlled system dynamics $\vartheta$. The net motor output $\Gamma$ of the MOSAIC model is determined by a set of adaptive controller–predictor pairs $(\varphi_i, \phi_i)$ via the responsibility signals $\lambda_i$, which are computed using the predictor outputs and the current state of the system. The equations given in Table 1 describe the control mechanism more rigorously (for simplicity, we use a discrete-time representation). The adaptive nature of the controller–predictor pairs is indicated with semicolons, as in $\varphi_i(\cdot\,; w_i)$ and $\phi_i(\cdot\,; y_i)$, meaning that $\varphi_i$ and $\phi_i$ are functions determined by the parameters $w_i$ and $y_i$, which are typically the weights of a function approximator or a neural network.
Rather than present the details of how the controller–predictor pairs can be adapted (trained) for a variety of tasks, we note that MOSAIC is not tied to any particular learning method, so various learning algorithms can be derived for adapting the controller–predictor pairs. In particular, gradient descent (Wolpert & Kawato, 1998) and expectation maximization (Haruno et al., 2001) learning algorithms have been derived and applied to motor control learning.
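A gradient-descent variant can be sketched as follows. This is an illustration under stated assumptions, not the cited derivations: each predictor is a hypothetical linear forward model $\hat{x}_i = x + g_i u$ whose single scalar $g_i$ stands in for the parameters $y_i$, the plant switches between two hypothetical gains (1.0 and 2.5) in blocks, and the learning rate, $\delta$ and random seed are arbitrary. Each predictor takes a gradient step on its squared prediction error, scaled by its responsibility, so that modules specialize on the contexts they already predict best.

```python
import math
import random


def responsibilities(x_next, predictions, delta):
    """Softmax of negative squared prediction errors (the responsibility signals)."""
    scores = [-((x_next - p) ** 2) / delta ** 2 for p in predictions]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def train_predictors(gains, contexts=(1.0, 2.5), n_blocks=40, block_len=25,
                     lr=0.5, delta=0.2, seed=0):
    """Responsibility-gated gradient descent for hypothetical linear predictors
    x_hat_i = x + g_i * u; the scalar g_i plays the role of the parameter y_i."""
    rng = random.Random(seed)
    gains = list(gains)
    for block in range(n_blocks):
        true_gain = contexts[block % len(contexts)]  # hypothetical plant switching context
        for _ in range(block_len):
            x = rng.uniform(-1.0, 1.0)
            u = rng.uniform(-1.0, 1.0)
            x_next = x + true_gain * u
            predictions = [x + g * u for g in gains]
            lambdas = responsibilities(x_next, predictions, delta)
            for i, (lam, x_hat) in enumerate(zip(lambdas, predictions)):
                # d x_hat_i / d g_i = u, so (treating lam as fixed) this step
                # descends lam * (x_next - x_hat_i)**2 / 2 with respect to g_i.
                gains[i] += lr * lam * (x_next - x_hat) * u
    return gains
```

Starting from `train_predictors([0.5, 3.0])`, the first predictor should settle near the gain of 1.0 and the second near 2.5, because each one wins the responsibility competition in the context it initially predicts better.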
Table 1
The equations describing the control function of the MOSAIC model

Dynamics of the controlled system:
$$x(t+1) = \Psi\big(x(t), \Gamma(t)\big)$$

The MOSAIC (net) control output:
$$\Gamma(t) = \sum_i \lambda_i(t)\, \psi_i\big(x_d(t+1), x(t); w_i\big)$$

Individual controller (inverse model) outputs:
$$u_i(t) = \psi_i\big(x_d(t+1), x(t); w_i\big)$$

Individual predictions (forward model):
$$\hat{x}_i(t+1) = \phi_i\big(x(t), u(t); y_i\big)$$

The responsibility signals:
$$\lambda_i(t) = \frac{\exp\!\big(-\lVert x(t) - \hat{x}_i(t)\rVert^2 / \delta^2\big)}{\sum_k \exp\!\big(-\lVert x(t) - \hat{x}_k(t)\rVert^2 / \delta^2\big)}$$

12.1. Imitation and action recognition with MOSAIC
Although MOSAIC was initially proposed for motor control, it can also be used for imitation and action recognition. This dual use of the model establishes some parallels between the model and the mirror neuron system. Realizing imitation (and action recognition) with MOSAIC requires three stages. First, the visual information about the actor’s movement must be converted into a format that can be used as input to the motor system of the imitator (Wolpert et al., 2003). This requires the visual processing system to extract state-like variables (e.g. joint angles) that can be fed to the imitator’s MOSAIC as the ‘desired state’ of the demonstrator (Wolpert et al., 2003). Second, each controller generates the motor command required to achieve the observed trajectory (i.e. the desired trajectory obtained from the observation). In this ‘observation mode’, the outputs of the controllers are not used for actual movement generation, but serve as input to the predictors paired with the controllers (see Fig. 12). Thus, the next likely states (of the observer) become available as the output of the forward predictions. Third, these predictions are compared with the demonstrator’s actual next state to provide prediction errors that indicate, via responsibility signals, which of the imitator’s controller modules must be active to generate the observed movement (Wolpert et al., 2003).

9 The term ‘state’ generally denotes the vector of variables needed to encapsulate the history of the system as a basis for describing its response to external inputs, which involves specifying the current output and the updating of the state. For a point-mass physical system, the state combines the position and velocity of the mass.

NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18

[Figure 12 diagram. Labels: Actor; Desired trajectory; State trajectory; Current state; Controllers 1–3; Predictors 1–3; Predicted state; Prediction error; Normalization; Responsibility signals.]
Fig. 12. The functioning of the MOSAIC model in observation mode. For imitation, the responsibility signals indicate which controller modules must be active to generate the observed movement.

12.2. Relation to MNs and imitation
The responsibility signals are computed by (softmax) normalization of the prediction errors, as shown in Fig. 12. Notice that the responsibility signals can be treated as a symbolic representation describing the observed (continuous) action. The temporal stream of responsibility signals representing the observed action can then be used immediately or stored for later reproduction of that action. Simulations with the MOSAIC model indicate that this imitation mechanism can be used to imitate the task of swinging up a jointed stick with one degree of freedom against gravity, through observation of successful swing-ups (Doya et al., 2000).
Within the MOSAIC framework, the output of the predictors might be considered analogous to mirror neuron activity. This would be compatible with the view of the MSI model, in which it is suggested that mirror neurons may implement a motor-to-sensory forward model. It is, however, necessary to point out one difference. The MSI model deals with motor control relying only on visual input and kinematics (leaving the dynamics to the lower motor areas). In contrast, MOSAIC is a
true control system that deals with dynamics. Thus, the nature
of forward models in the two architectures is slightly different.
The output of the forward model required by MSI is in visual-like coordinates (e.g. the orientation difference between the hand axis and the target object), whereas for MOSAIC the outputs of the forward models are more closely related to the intrinsic variables of the controlled limb (e.g. joint positions and velocities). However, it is possible to envision an additional dynamics-to-visual forward model that takes the MOSAIC forward model output and converts it to extrinsic coordinates (e.g. distance to the goal). In this sense, MSI’s forward model may be such an integrated prediction circuit, implemented by several brain areas.
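The dynamics-to-visual forward model envisioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, a hypothetical planar two-link arm with made-up link lengths `L1` and `L2`, and a MOSAIC-style predicted state made of two joint angles and two joint velocities. Forward kinematics maps the intrinsic prediction to a hand position, from which an extrinsic quantity such as distance to the goal follows.

```python
import math

# Hypothetical link lengths (metres) of a planar two-link arm.
L1, L2 = 0.30, 0.25


def hand_position(q1, q2):
    """Forward kinematics: joint angles (rad) -> hand position (x, y)."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def distance_to_goal(predicted_state, goal):
    """Map an intrinsic predicted state (q1, q2, dq1, dq2) to an extrinsic quantity."""
    q1, q2, _dq1, _dq2 = predicted_state  # joint velocities are not needed here
    hand_x, hand_y = hand_position(q1, q2)
    return math.hypot(goal[0] - hand_x, goal[1] - hand_y)
```

An MSI-like visual quantity, such as the orientation difference between the hand axis and the object, could be derived the same way by adding further extrinsic computations on top of `hand_position`.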
A final note: the internal models envisioned in the neuroscience literature (e.g. Carr et al., 2003; Iacoboni et al., 1999; Miall, 2003) are usually at a much higher level than those of the MOSAIC and MSI models introduced here, and such high-level internal models are much harder to learn from a computational point of view.
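Returning to the observation mode of Section 12.1, the following minimal sketch shows how a stream of responsibility signals can label an observed movement. All of it is hypothetical: a scalar state, a body obeying $x(t+1) = x(t) + u(t)$, three modules whose controllers are proportional commands toward the observed goal with gains 0.1, 0.3 and 0.6, $\delta = 0.02$, and a demonstrator that happens to use the gain-0.3 behaviour. Controller outputs drive only the paired predictors, and each prediction is scored against the demonstrator's actual next state.

```python
import math


def responsibilities(x_next, predictions, delta):
    """Softmax of negative squared prediction errors (the responsibility signals)."""
    scores = [-((x_next - p) ** 2) / delta ** 2 for p in predictions]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def controller(x, goal, gain):
    """Hypothetical module controller: proportional command toward the observed goal."""
    return gain * (goal - x)


def predictor(x, u):
    """Hypothetical forward model of a body obeying x(t+1) = x(t) + u(t)."""
    return x + u


def observe(demo, goal, gains=(0.1, 0.3, 0.6), delta=0.02):
    """Observation mode: controller outputs feed only the paired predictors,
    whose predictions are scored against the demonstrator's actual next state."""
    stream = []
    for x_t, x_next in zip(demo, demo[1:]):
        predictions = [predictor(x_t, controller(x_t, goal, k)) for k in gains]
        stream.append(responsibilities(x_next, predictions, delta))
    return stream


# Hypothetical demonstration: the gain-0.3 behaviour moving from 0.0 toward 1.0.
demo = [0.0]
for _ in range(6):
    demo.append(predictor(demo[-1], controller(demo[-1], 1.0, 0.3)))

stream = observe(demo, goal=1.0)
labels = [max(range(len(lam)), key=lam.__getitem__) for lam in stream]
```

Every entry of `labels` points to the gain-0.3 module; that index sequence, or the full stream of responsibilities, is the kind of compact description that could be stored and later replayed through the controllers. The modules here differ in their control gains rather than in assumed body dynamics: if each controller were the exact inverse of its own predictor and both received the same desired next state, every prediction would simply reproduce that state and the errors could not discriminate between modules.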
13. A taxonomy of models based on modeling methodology
When the system to be modeled is complex, it is often necessary to focus on one or two of its features in any one model. The focus is, of course, partly determined by the modeling methodology the modeler follows. Here, we present a taxonomy of modeling methodologies and classify the models we have presented accordingly.
The utility of a model increases with its generality and its ability to explain and predict observed and unobserved behavior of the modeled system. The validity and utility of a model are enhanced when all the known facts are incorporated into it. This is the aim of data-driven modeling, in which the modeler’s task is to develop a computational mechanism (equations and computer simulations) that replicates the observed data, in the hope that interesting, non-trivial predictions can be made. This is the main modeling approach for cellular-level neuron modeling. Although one would expect the neurophysiological data collected so far to be widely used as the basis for computational modeling, we unfortunately lack sufficient quantitative data on the neurophysiology of the mirror system. Most modeling related to the mirror system assumes the generic properties of mirror neurons in order to build imitation systems, rather than addressing hard data. We have included this type of model in the taxonomy because such models have a certain utility: they raise questions about the relation between mirror neurons and imitation.
Models based on evolutionary algorithms have been used to model the behavior of organisms and to develop neural circuits that achieve a prespecified goal (e.g. central pattern generators) in a simplified simulated environment. Although this type of modeling generally does not make use of the available data, the model of Borenstein and Ruppin (2005) suggests how mirror neurons might have come to be involved in imitation or other cognitive tasks. However, as we have already noted, ‘real’ evolution may have exapted the mirror system to support imitation, rather than starting from the need to imitate and ‘discovering’ mirror neurons as a necessary tool.
An evolutionary point of view can also be adopted to build models that do not employ evolutionary algorithms; such models start by postulating a logical reason for the existence of mirror neurons. The logic can be based on the location of the mirror neurons or on known general properties of neural function. The former logic holds that, given their location in premotor cortex, mirror neurons must be involved in motor control. The latter (phenomenal) logic draws on Hebbian plasticity mechanisms and holds that representations of contingent events become associated in the cerebral cortex. Fig. 13 illustrates how the models we have presented fall into our taxonomy. Note, however, that this taxonomy should not be taken as drawing sharp borders between models. Models focusing on imitation can also be cast as following a ‘reason for existence’. For example, the DRAMA architecture (Billard & Hayes, 1999) employs a Hebbian-like learning mechanism and thus can be placed in the ASSOC category in Fig. 13. Similarly, although Demiris’s imitation system (Demiris & Johnson, 2003) does not emphasize a motor control role for mirror neurons, it employs mechanisms similar to those of the MOSAIC model.
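The Hebbian (phenomenal, ASSOC) logic can be illustrated with a minimal associative sketch; it is not the DRAMA architecture or any specific published model. It assumes hypothetical one-hot motor codes for a precision and a power grasp and made-up visual feature vectors for the sight of each grasp. During self-executed grasps the two codes are co-active, so a plain Hebbian rule links them; afterwards the visual code alone evokes the matching motor activity, a mirror-like response.

```python
def hebbian_train(pairs, n_motor, n_visual, lr=0.1, epochs=10):
    """Plain Hebbian rule: strengthen W[i][j] whenever motor unit i and visual unit j are co-active."""
    W = [[0.0] * n_visual for _ in range(n_motor)]
    for _ in range(epochs):
        for motor, visual in pairs:  # self-executed action: both codes present together
            for i in range(n_motor):
                for j in range(n_visual):
                    W[i][j] += lr * motor[i] * visual[j]
    return W


def motor_response(W, visual):
    """Motor-unit activity evoked by a visual input alone (observation)."""
    return [sum(w * v for w, v in zip(row, visual)) for row in W]


# Hypothetical codes: one-hot motor units for two grasps and made-up visual features.
PRECISION_MOTOR, POWER_MOTOR = [1.0, 0.0], [0.0, 1.0]
PRECISION_VIEW, POWER_VIEW = [1.0, 0.2, 0.0], [0.1, 0.9, 1.0]

W = hebbian_train([(PRECISION_MOTOR, PRECISION_VIEW), (POWER_MOTOR, POWER_VIEW)],
                  n_motor=2, n_visual=3)
```

After training, `motor_response(W, PRECISION_VIEW)` is larger for the precision-grasp unit and `motor_response(W, POWER_VIEW)` for the power-grasp unit. The rule has no normalization or decay, so weights grow without bound with further training; that keeps the sketch short but would need attention in any serious model.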
[Figure 13 diagram. Labels: Modeling Methodology; Assume Existence; Reason for Existence; Data Driven; Evolutionary Algorithm; Phenomenal; Motor Control; Anatomy, Physiology, Behavior; Artificial and Physical Imitation Systems; Virtual World and Agent Systems; MNS & MNS2; MOSAIC; MSI; ASSOC; Demiris; RNNPB; DRAMA; Borenstein & Ruppin (?).]
Fig. 13. A taxonomy of modeling methodologies and the relation of the models presented. Dashed arrows indicate that the linked models are similar or can be cast as similar (see text).
Table 2
A very brief summary of models presented in terms of biological relevance, architecture and the relevant results and predictions