Affective Communication Aid using Wearable Devices based on Biosignals

Yuji Takano
University of Tsukuba
1-1-1 Tennodai, Tsukuba, Japan
[email protected]

Kenji Suzuki
University of Tsukuba / JST
1-1-1 Tennodai, Tsukuba, Japan
[email protected]
ABSTRACT
We propose a novel wearable interface for sharing facial expressions between children with autism spectrum disorders (ASD) and their parents, therapists, and caregivers. The developed interface is capable of recognizing facial expressions based on physiological signal patterns taken from facial bioelectrical signals and displaying the results in real time. The physiological signals are measured from the forehead and both sides of the head. We verified that the proposed classification method is robust against facial movements, blinking, and head posture. This compact interface can support the perception of facial expressions between children with ASD and others to help improve their communication.
Categories and Subject Descriptors
I.5.5 [Implementation]: Interactive systems; K.4.2 [Social Issues]: Assistive technologies for persons with disabilities

General Terms
Measurement

Keywords
Facial expression, Smile sharing, Autism Spectrum Disorder
1. INTRODUCTION
In this paper, we propose a novel interaction method for sharing children's facial expressions with their parents in order to facilitate communication. In human communication, facial expressions carry some of the most important non-verbal information. Facial expressions include psychological information, such as emotions, which are very important aspects of communication. People can read a person's thoughts simply by observing their expression. Psychological studies have found that facial expressions can project emotions such as disgust, sadness, happiness, fear, anger, and surprise [2]. Expressing these emotions is a universal
communication skill common to all humankind and does not depend much on culture. Understanding facial expressions correctly is very important for communication with other people. Daily communication between parents and children is very important to building their relationship and has a key role in children's mental and social development. However, there are some cases where it is difficult for parents or caregivers to consistently recognize their children's facial expressions. For example, children with autism spectrum disorders (ASD) have difficulties with communicating and socially interacting through facial expressions, even with their parents. Autism comprises a wide range of neurodevelopmental disorders, and its intensity differs greatly among individuals. Therefore, setting a clear boundary between healthy and autistic people is difficult, and the mechanisms of autism have not yet been clarified. A typical example of the communication difficulties in the case of autism is the lack of facial expressions and eye contact [5, 9]. Facial expressions play an important role in communication with others, and we want to know when and how much a child's facial expression changes based on events in their daily lives. Previously, we reported on the relationship between smiles and positive social behavior [4]. The smiles of children with ASD can be quantitatively measured and analyzed by using a wearable device [6]. There are many situations where reading and understanding a child's facial expressions is desirable.

Various classification methods of facial expressions have been proposed based on different features. The facial action coding system (FACS) [3] describes facial expressions based on physical and anatomical criteria, and many researchers have embraced FACS to classify facial expressions [8]. There are also many approaches to capturing facial expressions. One method is to extract physical variations in facial features from video by means of image processing. This non-contact method is the most commonly used to recognize facial expressions; it is also easy to use, with little effort needed to install the equipment. However, it has the disadvantage of spatial limitations: it depends on the camera position and field of view, and its accuracy is affected by the head posture, so the target user has to face the camera constantly. For use in actual situations outside a controlled environment, there is little possibility of the subject staying in the same position constantly, especially for children who are moving and playing around. Thus, using image processing is difficult. Another possible approach is to use motion-capture technology to extract the three-dimensional shape of the face from the markers' coordinates and measure the physical features more properly.
Figure 1: Smile sharing: the proposed interaction paradigm (the child's smile is conveyed to the parent by an LED display and by vibration)
However, placing the markers requires preparation, which makes this approach laborious, and the markers can be easily occluded. Thus, the development of a method for capturing facial expressions that is easy to use and does not depend on spatial orientation is still a difficult challenge. We have been developing a tool to detect the facial expressions of a person who has difficulty with expressing their intent in an accurate and continuous manner through the use of a wearable device. This allows users not only to capture facial expressions but also to share them with others, even if the face is not always observable by sensors installed in the environment, such as cameras or depth sensors. In this paper, we propose the concept of smile sharing, in which a wearable device, namely an affective communication aid, that meets the above criteria is used to communicate facial expressions. We evaluated the device to verify its performance through several case studies.
2. METHODOLOGY
The proposed system provides a novel method of interaction, particularly between children and their parents, that considers use in daily life. Figure 1 shows a conceptual diagram of the proposed interaction. We first describe the method for capturing and classifying facial expressions independently of spatial orientation and then the sharing of the facial expressions.
2.1 Wearable Interface
Our proposed wearable interface can capture facial expressions independently of the spatial orientation. To realize this system, we use surface electromyography (sEMG) on the forehead and sides of the face. sEMG can be captured by using small electrodes to measure the bioelectrical signals emitted from muscles that are activated to generate facial expressions. Conventionally, the electrodes must be accurately pasted on the skin on top of facial muscles, including the orbicular muscles of the mouth and eyes, for sEMG measurement of the face. However, pasting electrodes on the skin has some disadvantages: the process takes a long time, and the electrodes are prone to interference from facial movements. A possible approach to overcoming these obstacles is measuring sEMG on the sides of the face, i.e., distal EMG [6]. We have previously shown that distal electrode locations on areas of low facial mobility yield signals with a strong amplitude that are correlated with the signals captured at the traditional positions on top of facial muscles.
Figure 2: Overview of the head-mounted interface (headband with embedded electrodes, electrode sockets, and LED)
In this study, we measured sEMG on the sides of the head and the forehead to reduce interference from physical variations of the face, and we developed an easy-to-wear interface. We used the patterns of the acquired sEMG signals to classify facial expressions. By regarding facial expressions as specific patterns of activity across several facial muscles, the interface can classify them without needing to identify individual muscle activity. A support vector machine (SVM) was used for pattern classification and to differentiate smiles from other facial expressions.
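As a rough illustration of this pattern-based approach, the sketch below trains an SVM on per-channel feature vectors extracted from windowed sEMG. The feature choice (per-channel RMS), the number of channels, and the synthetic calibration data are our own assumptions for illustration; the paper does not specify these details.

```python
# Minimal sketch (not the authors' implementation): classifying windowed
# sEMG feature vectors with an SVM, assuming one RMS feature per channel.
import numpy as np
from sklearn.svm import SVC

def rms_features(window):
    """window: (n_samples, n_channels) sEMG segment -> per-channel RMS."""
    return np.sqrt(np.mean(window ** 2, axis=0))

# Hypothetical calibration data: 150-sample windows recorded while the wearer
# smiles (label 1) or keeps a neutral face (label 0), four channels.
rng = np.random.default_rng(0)
neutral = [rng.normal(0.0, 0.05, (150, 4)) for _ in range(50)]
smiling = [rng.normal(0.0, 0.20, (150, 4)) for _ in range(50)]
X = np.array([rms_features(w) for w in neutral + smiling])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)

# At run time, each incoming window is reduced to the same feature vector
# and labeled as "smile" or "other".
new_window = rng.normal(0.0, 0.20, (150, 4))
print("smile" if clf.predict([rms_features(new_window)])[0] == 1 else "other")
```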
2.2 Smile Sharing
We propose a method for sharing facial expressions so that a child's ambiguous or hidden expressions can be perceived in real time. In the current implementation, we only classify the child's smile and communicate it to their parents through various modalities. A smile is a facial expression that represents happiness [1], and the perception of smiling facilitates communication between children and their parents. Specifically, perceiving a child's smile helps in understanding what makes the child happy. By knowing what a child is interested in, the parents can communicate with him or her more intensively and feel more encouraged in their understanding. For smile sharing, we used both light-emitting and vibration devices. Using a light-emitting device helps the parents perceive a child's smile even if the child turns his or her face away. The parents can also perceive the child's smile by using a wrist-mounted vibration device even if they are not close to the child, which can happen during play. These methods are also viable for autistic children and their parents when the parents cannot look at their child's face directly.
3. SYSTEM CONFIGURATION
The system consists of an interface unit and a signal processing unit. The interface unit measures signals and outputs the classification results. The signal processing unit classifies facial expressions based on the measured sEMG signals and sends the result to the interface unit via Bluetooth wireless communication.
3.1 Interface unit
We developed two different wearable interfaces: a head-mounted device and a wrist-mounted device. Figure 2 shows an overview of the head-mounted interface. The wrist-mounted interface is a simple vibration device that vibrates when a smile is detected by the head-mounted interface.
Figure 3: Appearance of the LED interface

Figure 4: Overview of facial expression recognition using the head-mounted interface (measuring section: electrodes and A/D conversion at fs = 1000 Hz; signal processing section: comb and band-pass filtering, ICA, rectification and smoothing, feature vector extraction, calibration, class labeling and training, and SVM classification; light-emitting section: LED; sections connected via Bluetooth)
The head-mounted interface is used both to acquire facial sEMG and to display the resulting facial expression classification. It comprises dry electrodes and an LED embedded in a headband. sEMG is acquired by the interface and sent to the signal processing unit through Bluetooth wireless communication. We decided to use dry electrodes in the interface, although they are prone to noise contamination in the case of unstable contact with the skin, because they are much easier to apply to the skin and enable fast measurement of sEMG with minimal preparation time. The headband is made of elastic material, and the position of the electrodes inside the headband is adjustable. Therefore, the interface can accommodate different head sizes and shapes, and it holds the dry electrodes steadily in place to provide better stability. Figure 3 shows the appearance of the LED; the LED colors of the interface are white and red. The LED emits a red light if the wearer is smiling and a white light for any other facial expression. The LED is fitted in a small tube, as shown in Figure 3, to make it easily noticeable by others.
The wrist-mounted interface comprises a vibration motor and presents the facial expression of the headband wearer through vibration. This interface vibrates if the person wearing the head-mounted interface is smiling. By using this interface, parents can perceive their child's smile even if they cannot look at his or her face directly. In particular, the wrist-mounted interface can help the parents of autistic children perceive their child's smile.
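The sketch below illustrates how the receiving side of such a link could map each classification result to the LED color and the wrist vibration. The one-byte message format, the serial port name, and the use of pyserial are illustrative assumptions; the paper does not describe the actual Bluetooth protocol or firmware.

```python
# Illustrative sketch only: mapping a received classification label to the
# LED and the wrist vibrator. The protocol (b"S" = smile, b"N" = other) and
# the port name are hypothetical.
import serial  # pyserial; the Bluetooth link is assumed to appear as a serial port

LED_RED, LED_WHITE = "red", "white"

def drive_outputs(label, set_led, set_vibration):
    """Red LED and vibration for a smile; white LED and no vibration otherwise."""
    if label == b"S":
        set_led(LED_RED)
        set_vibration(True)
    else:
        set_led(LED_WHITE)
        set_vibration(False)

if __name__ == "__main__":
    link = serial.Serial("/dev/rfcomm0", 115200, timeout=1.0)  # hypothetical port
    while True:
        label = link.read(1)  # one classification result per analysis window
        if label:
            drive_outputs(label,
                          set_led=lambda color: print("LED:", color),
                          set_vibration=lambda on: print("vibration:", on))
```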
3.2 Signal processing unit
The signal processing unit handles the digital filtering and pattern recognition processes. The sEMG signals acquired from the interface unit are pre-processed and then classified by the SVM based on their patterns. Figure 4 shows an overview of the signal processing. The sEMG signals are acquired every 1 ms, and the facial expression recognition is performed within a certain time window (150 ms). sEMG signals vary depending on individual differences and electrode position. The system therefore first needs to be calibrated for each user by recording some facial expressions in advance and learning the wearer's signal pattern and intensity. However, it is difficult for children with ASD to participate in this calibration session. In such cases, the system user simply indicates a period of smiling time as a reference, which is used as the basis for smile recognition.
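To make the pre-processing step concrete, the sketch below applies band-pass filtering, full-wave rectification, and moving-average smoothing to one sEMG channel sampled at 1 kHz and splits the result into analysis windows. The filter order and cut-off frequencies are illustrative assumptions, not the authors' values; Figure 4 also mentions comb filtering and ICA, which are omitted here for brevity.

```python
# Minimal sketch of sEMG pre-processing of the kind outlined in Section 3.2
# and Figure 4: band-pass filtering, rectification, and smoothing.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000           # sampling rate [Hz]: one sample every 1 ms
WINDOW = 150        # analysis window length in samples (150 ms at 1 kHz)

def preprocess(emg, fs=FS):
    """emg: (n_samples,) raw sEMG of one channel -> smoothed envelope."""
    b, a = butter(4, [20, 450], btype="bandpass", fs=fs)  # assumed pass band
    filtered = filtfilt(b, a, emg)
    rectified = np.abs(filtered)                          # full-wave rectification
    kernel = np.ones(WINDOW) / WINDOW                     # 150 ms moving average
    return np.convolve(rectified, kernel, mode="same")

def windows(envelope, length=WINDOW):
    """Split the envelope into consecutive windows for feature extraction."""
    n = len(envelope) // length
    return envelope[: n * length].reshape(n, length)
```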
4. EXPERIMENT
We conducted two experiments to evaluate the performance of the proposed system. In this section, we present the classification accuracy of the system and its robustness against head motion.
4.1 Evaluation of classification accuracy
In order to evaluate the classification accuracy of the proposed system, we compared it to the human cognitive ability to recognize smiling in an experimental setting. We recorded videos of three people (persons A-C); each alternately smiled and held a neutral expression two or three times over about 20 s while wearing the headband interface. Nine subjects (eight male, one female) in their twenties and thirties were recruited for the experiment. Informed consent was obtained from the participants in advance. Videos of the smiling/neutral faces (A-C) were shown to the subjects, and they were asked to mark the smile intervals by clicking a button to indicate the start and stop of smiles. We covered the LED in the videos to avoid influencing the subjects' judgment. We calculated the maximum, minimum, and median values of precision and recall based on the classifications by the subjects and the proposed system. Figure 5 shows the results.
As shown in the figure, the precision for each subject was above 0.95, but the recall varied among subjects depending on who created the facial expressions. The differences in recall may have been due to differences in facial features, some of which are more difficult to recognize than others. This made it more difficult to set a threshold for smiling (as for person B), which lowered the recall. However, in terms of classification accuracy, the results were positive because the average precision was sufficient for potential applications. We also calculated the intra-class correlation coefficient to evaluate the degree of coincidence between the classification by the interface and the judgment of the subjects. The average intra-class correlation coefficient was more than 0.936, which is also sufficient.
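For reference, precision and recall can be computed directly from the marked smile intervals once both the system output and the human judgments are expressed as per-frame labels. The sketch below shows one way to do this; the label rate and the example intervals are illustrative, not the experimental data.

```python
# Simple sketch: precision and recall from per-frame smile labels.
import numpy as np

def precision_recall(predicted, reference):
    """predicted/reference: boolean arrays, True where a smile is marked."""
    tp = np.sum(predicted & reference)
    fp = np.sum(predicted & ~reference)
    fn = np.sum(~predicted & reference)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 20 s of labels at 100 Hz, with the system marking slightly wider
# smile intervals than the human rater.
t = np.arange(0, 20, 0.01)
reference = ((t > 3) & (t < 7)) | ((t > 12) & (t < 16))
predicted = ((t > 2.8) & (t < 7.2)) | ((t > 12.1) & (t < 16.3))
print(precision_recall(predicted, reference))
```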
4.2 Evaluation of robustness
We then conducted an experiment to investigate the robustness of the system against head motion artifacts. In this experiment, we investigated whether the system is capable of classifying facial expressions when there are disturbances such as head motions. The classification accuracy under head nodding (forward and backward head movement), head tilting (left and right tilting), head shaking (left and right rotation), and blinking was checked to evaluate the robustness of the system.
Figure 5: Maximum, minimum, and median values of precision and recall for persons A-C (n = 9)
Figure 6: Maximum, minimum, and median values of classification accuracy [%] for neutral and smiling faces under each condition (no motion, blink, nod, tilt, shake; n = 8)
The three motions we investigated (nod, tilt, and shake) correspond to all possible head motions (roll, pitch, and yaw); therefore, a positive result means that the system is likely to be robust against any combination of head motions. We asked eight subjects to perform this experiment while wearing the headband. Each subject performed each motion for about 5 s while smiling or keeping a neutral face. Blink represents 10 blinks, and nod and tilt were each done twice. Shake represents random head shaking along the yaw axis. Figure 6 shows the maximum, minimum, and median values of the classification accuracy for each motion in the experiment. The results showed that the system classified the neutral expression with no motion with a probability of 100%. The system was able to classify the neutral expression of most subjects with an accuracy of more than 95% even when there were some disturbances. In the case of smiles, there were some cases where the smile was occasionally not detected properly. In the most prominent case, subjects reported that it was difficult to smile and blink at the same time, which probably contributed to the classification accuracy for blinking being lower than for the other motions. However, the interface was capable of classifying the smiles of the majority of the subjects with an accuracy of more than 90%.
5. DISCUSSION AND CONCLUSIONS
In this study, we considered the scenario of daily communication between children and their parents and focused on facial expressions, which are non-verbal information that is important to facilitating communication. We proposed wearable interfaces to classify facial expressions based on facial muscle activities and share them through light and vibration. We evaluated the classification accuracy and robustness of the system through experiments and verified that the signals acquired from the sides of the head and the forehead can be used for facial expression classification. Through several experiments, we verified the classification accuracy of the developed system. The results demonstrated that the interface can be used in real environments with some disturbances to classify facial expressions with high accuracy and to present smiles in real time. Further investigation will include the implementation of adaptive filtering to remove motion artifacts.
So far, we have presented the concept of a novel interaction design between children and their parents and developed interfaces that enable the realization of such interaction. We have already conducted a feasibility study with children having ASD during robot-assisted activities and confirmed that the proposed device is acceptable [7]. In the future, we plan to conduct a user study with children and families to verify that the interfaces can support the sharing and perception of facial expressions in the given scenario.
6. REFERENCES
[1] P. Ekman. An argument for basic emotions. Cognition and Emotion, 6(3):169-200, 1992.
[2] P. Ekman. Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life. Times Books, 2003.
[3] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, 1978.
[4] A. Funahashi, A. Gruebler, T. Aoki, H. Kadone, and K. Suzuki. The smiles of a child with autism spectrum disorder during an animal-assisted activity may facilitate social positive behaviors - quantitative analysis with smile-detecting interface. J Autism Dev Disord, 44(3):685-693, 2014.
[5] K. Gray and B. Tonge. Are there early features of autism in infants and preschool children? J Paediatr Child Health, 37(3):221-226, June 2001.
[6] A. Gruebler and K. Suzuki. Design of a wearable device for reading positive expressions from facial EMG signals. IEEE Trans. on Affective Comput., (in press).
[7] M. Hirokawa, A. Funahashi, and K. Suzuki. A doll-type interface for real-time humanoid teleoperation in robot-assisted activity: A case study. In ACM/IEEE Intl. Conf. on Human-Robot Interaction, pages 174-175, 2014.
[8] J. J. Lien, T. Kanade, J. F. Cohn, and C. C. Li. Automated facial expression recognition based on FACS action units. In Proceedings of FG '98 (IEEE), April 1998.
[9] F. R. Volkmar and L. C. Mayes. Gaze behavior in autism. Development and Psychopathology, 2(1):61-69, January 1990.