
VOCAL VIBRATIONS

A Multisensory Experience of the Voice

Charles Holbrow*
MIT Media Lab
E14-333B, 75 Amherst Street
Cambridge, MA 02139
[email protected]

Elena Jessop
MIT Media Lab
E14-333A, 75 Amherst Street
Cambridge, MA 02139
[email protected]

Rébecca Kleinberger
MIT Media Lab
E14-333B, 75 Amherst Street
Cambridge, MA 02139
[email protected]

ABSTRACT
Vocal Vibrations is a new project by XXX's research group that seeks to engage the public in thoughtful singing and vocalizing, while exploring the relationship between human physiology and the resonant vibrations of the voice. This paper describes the motivations, the technical implementation, and the experience design of the Vocal Vibrations public installation. The installation consists of a space for reflective listening to a vocal composition (the Chapel) and an interactive space for personal vocal exploration (the Cocoon). In the interactive experience, the musical environment is shaped live by the participant's vocal gestures. Simultaneously, the participant experiences a tangible exteriorization of his voice by holding the ORB, a handheld device that translates his voice and singing into tactile vibrations. This installation encourages visitors to explore the physicality and expressivity of their voices in a rich musical context. Through these vocal explorations, they can discover new aspects of their voices and gain a deeper appreciation of our everyday instrument.

Keywords
Voice, Vibrations, Expressive Interfaces, XXX, Signal Processing, Public Installations, Tactile Interfaces ...

1. INTRODUCTION: VOCAL VIBRATIONS
The experience that a person has of his or her voice is quite intimate. It is infinitely expressive and individually defining. However, many people today do not generally explore their vocal abilities, do not typically pay close attention to their voices, and do not feel comfortable "singing" or imagine they could participate in a rich musical experience through their voice. Our brain even responds less to our own voice than it does to other voices [16]. In the Vocal Vibrations project, we aim to guide participants in exploring a wide range of vocal sounds and vibrations. To address this, we are developing techniques to engage the public in the regular practice of thoughtful singing and vocalizing, both as an individual experience and as part of a community.

*The three co-authors are presented in alphabetical order; their contributions to the project are comparable.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME'14, June 30 – July 03, 2014, Goldsmiths, University of London, UK.
Copyright remains with the author(s).

The first experience resulting from this work will be a public interactive installation at a collaborative design studio. We are designing the installation to provide two contrasting spaces: one for a group experience, one for a private experience. Visitors will arrive in a larger room we call the Chapel, a venue for listening closely to a recorded electroacoustic composition centered on the singing voice. The inner room, the Cocoon, provides a private, isolated environment, inviting visitors to sing and participate in an interactive vocal experience. This initial Vocal Vibrations installation aims to raise awareness of the influence of the voice on our body and environment, as well as to give participants the ability to experience their voices in a new light, by enabling anyone to control a rich multi-sensory experience with only his or her voice.

One of the goals of this project is to help even novices discover the potential of their voice while providing them with direct access to the richness of a full musical experience. Indeed, with appropriate analysis and interactive systems, a user can become part of a complex vocal performance and have an active role in shaping the musical result through even the simplest use of his voice, such as exploring variations on a single pitch. While a person vocalizes alone in the Cocoon, the experience offers two components. First, the musical environment reacts to and accompanies the user's vocal gestures. Second, people experience an exteriorization of their voice through a handheld device, the Oral Resonant Ball (ORB), that maps their voice into the tactile sensation of vibration. For the interactive auditory and tactile experience, we are designing the system such that there is no "right" or "wrong" way to use it. The system built for this project processes the user's voice signal in real time to extract features that are transposed into control parameters for an interactive experience. The system accompanies the user's vocal explorations, while also guiding the user to extend those explorations. We seek to expand our group's work in technologies for sophisticated measurement and extension of the singing voice in performance to create new kinds of vocal experiences in which everybody can participate.

We also seek to bring attention to the nature of the voice as an incredibly physical instrument. The act of singing and vocalizing creates vibrations throughout the body that can be altered through modifications to vocal production. However, people are generally not aware of or focused on these vibrations. The awareness of vocally produced vibrations can be a source of meditative focus, as well as a way for everyone from novices to trained singers to understand their instrument better. Vocal coaches, singers, and voice professionals have a very rich terminology for characterizing different voices. However, because the vocal apparatus is hidden from sight, the vocabulary that is used is abstract and hard to grasp for the non-initiated [25].


The focus provided by tactile and physical feedback can help to give intimate, while still objective, access to the voice. As the ORB externalizes the vibration of a user's voice, it can help the user become more aware of the variation and range of that vibration. In exploring the relationships between human physiology and the resonant vibrations of the voice, we seek to address many questions related to the voice, its connection with the body, and its influence on mental and physical health.

2. RELATED RESEARCH ON THE VOICE

2.1 Voice, Body, Mind, and Vibration
Most of us do not pay attention to the complex physical processes involved in producing a vocal signal, particularly one that is expressive or emotional. Additionally, the use of our voice is a goal-directed activity: all the complex psychomotor sub-processes are activated without conscious separation [23]. Yet, neurological research supports the idea that the brain dissociates voice from speech when processing vocal information [45]. By comparing the auditory cortical response to self-produced voiced sounds and to tape-recorded voice, it has been shown that the brain's response to self-produced voiced sounds is weaker [16]. This result suggests that during vocal production there is an attenuation of the sensitivity of the auditory cortex, and that the brain modulates its activity as a function of the expected acoustic feedback.

Because the voice requires a perfect psychomotor synchronization between many physical processes (such as the breath, the tongue, the vocal tract muscles, the tension of the vocal folds, and the lips), the study of the voice can reveal details about a person's health and mental state [1]. Mental and emotional states are often apparent through the voice because the physical process of vocal production is closely shaped by emotion. Kenneth Stevens [40] describes those correlates in terms of vocal modification in situations of strong arousal. For example, in the case of stress, variations in muscle contraction and breathing patterns have a direct influence on the sound of the voice. Max Little implemented a biologically and mechanically based model that takes into account the nonlinear, non-Gaussian, and turbulent aspects present in vocal production [31]. This work has been used in different clinical contexts, from the evaluation of speech pathology, breath and lung problems, neuronal malfunctions, and muscular control difficulties to the detection of early stages of Parkinson's disease [30].

Not only can studying the voice reveal information about physical, mental, and emotional states, but using the voice can also affect those states. In the subclinical domain, several studies have focused on the links between singing and the physiological signs of wellbeing (heart rate, blood pressure, and stress level) [8, 14, 33]. Those studies generally agree on the fundamental importance of breath control, induced by the use of the voice, as an important connection between singing and physiology.

However, very little work has been done on the effects of the vibrations produced in the body by singing, or on the relaxation and meditation potential of the voice. Many studies have shown that meditation training (especially mindfulness meditation) may be an effective component in treating various disorders such as stress, anxiety, and chronic pain [18, 9]. Despite the voice being a major part of several meditation traditions, the effects of the voice in meditation are mostly unexplored. In one study, medical imaging showed that cerebral blood flow changes during meditation that incorporates chanting on resonant tones, in ways that cannot be explained solely by breath control [19].

2.2 Measuring Vocal Emotion and Expression
A key aspect of our research is how to recognize the affective, expressive, and personal content of an individual's voice, and to determine which features of the voice help convey that information. Particularly in the domains of speech recognition and detection of vocal dysfunction, studies have explored descriptive frameworks for vocal qualities [26, 39]. Scherer identifies a variety of vocal features that convey expression, including vocal perturbation (short-term variation), voice quality (timbre), intensity, tempo, and the range of the fundamental frequency over time [39]. Other research has focused on affective markers present in the voice. Cahn's work on generating expression in a synthesized voice [6] offers a good overview of the acoustic and prosodic aspects that correlate with emotions, including basic voice parameters that are perceptually important for conveying expressivity (pitch, timing, voice quality, and articulation). Fernandez explores the mathematical description of voice quality, highlighting the deconvolution of the speech signal into the glottal flow derivative and the vocal tract filter as a way to access the emotional quality of speech mathematically [11].

Other researchers have carried out analyses specific to the singing voice, with the goal of developing better algorithms for singing voice synthesis [21] or of using the singing voice as input to another synthesis algorithm [42, 17, 12]. Kim separates features reflecting an individual's vocal physiology (such as the configuration of the vocal folds and vocal tract) from features reflecting an individual's expressive performance (such as how those features vary over time) [21]. Features of the singing voice have also been used for synthesis models, as in Chreode, a computer-generated tape piece made with the CHANT program from IRCAM. Conceived as an interactive instrument, the CHANT synthesizer is based on physics but controlled by perceptual parameters, such as the fundamental frequency, random variations of the fundamental, vibrato, random variations of the vibrato, spectrum, and formants [38].

However, when it comes to extracting meaning or features from the voice, only a few feature extraction tools are adapted to the specific range, richness, and complexity of the human voice. Our work draws on research on extracting prosodic features [24, 44], vocal quality elements [20, 21, 35], and affective markers [2, 6, 11] from the vocal signal.

2.3 Prior Expressive Vocal and Vibrational Experiences

Prior interfaces for manipulating the voice in performance include handheld devices like Waisvisz's The Hands [4]; systems for changing vocal timbre, such as those used by Laurie Anderson [36]; and wearable systems, such as Sonami's Lady's Glove [5], the Bodycoder system [3], and the BodySynth used by Pamela Z [29]. However, the majority of these vocal manipulations are controlled by external buttons, keyboards, or a performer's movement, rather than solely by the parameters of the vocal input, as in the Vocal Vibrations installation.

Levin and Lieberman have also incorporated graphics shaped by vocal production into the public installations Messa di Voce, Hidden Worlds, and RE:MARK [28]. In these experiences, the amplitude and spectral content of visitors' voices were used to affect projected graphics. Another public installation focusing on the voice is Oliver's Singing Tree [34], with which visitors interacted by singing into a microphone. The "pitch, noisiness, brightness, volume, and formant frequencies" of their voices were measured, and these parameters were used in real time to control a music generation engine and a video generation system.


All of these installations have a strong playful component, with the goal of an interesting vocal experience. Focusing carefully on subtle variations of sound has been a component of new music compositions, such as Lucier's "I Am Sitting in a Room" and Chowning's "Phone" [32, 7], but not of an interactive vocal installation.

Prior work also explores the possibilities of rich aesthetic experiences centered around vibration. Skinscape [15] is a tool for composition in the tactile modality. Inspired by the question of whether "the skin [is] capable of understanding and ultimately appreciating complex aesthetic information," this work tackles the relatively uncharted field of tactile composition. Our work to create vibration experiences derived from and driven by the voice is inspired by this research.

3. THE VOCAL VIBRATIONS INSTALLA-TION

The Vocal Vibrations installation consists of two connected spaces that encourage participants to experience and explore the singing voice, especially their own voices, in thoughtful ways. When a participant first arrives at the Vocal Vibrations installation, she will enter a communal space designed for close listening. In this space, which we call the Chapel, the audio will be a precomposed electroacoustic composition by XXX based on recordings of voices. Each participant will then be approached individually by an assistant, who will bring her to a smaller space in preparation for the interactive vocal experience. This assistant will instruct the participant to vocalize on one pitch and will help the participant find a note that fits comfortably in her range. This step will help the user get used to the note and may also "tune" the system to the person's voice and specific note. The participant will then be brought to a structure specially designed by YYY, the Cocoon, where she will be invited to sit, given headphones and the vibrating ORB to hold, and left alone in the space. The participant will then have a solo experience, approximately five minutes long, in which both the sound played in the headphones and the behavior of the ORB are controlled and shaped by her vocal explorations. The audio will be an interactive piece inspired by the composition in the Chapel, with a fixed structure but with the flow and variations of the composition controlled by the user. At the end of the solo experience, the user will return to the Chapel, where she is free to stay and listen as long as she wishes, as well as to vocally improvise along with the music if she desires. All of the musical content in this installation is new material composed by XXX.

3.1 The Chapel: Focused Listening
When visitors first arrive at Vocal Vibrations, they will enter the outer chamber, the Chapel, intended for a quiet, meditative experience. Here, in a new composition by XXX, singing voices will surround visitors and gently envelop them in sound. Visitors can remain in this space for as long or as short as they desire, choosing to join in through humming or vocalizing or simply to listen. The composition in the Chapel will be assembled from many layers of pre-recorded solo and choral vocal material, designed such that a D is almost always present in the score. Speakers will be located around the room so that the composition can be spatialized.

An important part of this project, particularly in the Chapel, is its use of surround sound techniques to put a visitor in the midst of the musical experience.

Figure 1: Vocal Vibrations installation: User Interaction

Conventional sound reinforcement systems use loudspeakers to convey an electronic audio signal as clearly as possible. For Vocal Vibrations, we developed a conceptual model and software implementation that couples loudspeaker configuration with digital instrument design. Each loudspeaker is treated as a signal processing node in 3D space that is connected to its nearest neighbors. Each node has an identical set of sound processing capabilities, the ability to send instructions to neighboring nodes on how to apply these processing capabilities, and the ability to follow instructions received from neighboring nodes.

For example, a speaker node might receive and follow these instructions:

• Begin playing AudioSample.wav

• Ramp volume of AudioSample.wav from 0.0 to 1.0 over 300 milliseconds

• Wait 300 milliseconds

• Broadcast this instruction to all neighboring speaker nodes that have not yet received this instruction

By following instruction sequences like the one above, we create surround sound experiences that swirl around and envelop the listener.
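To make the propagation scheme concrete, the sketch below implements it as a minimal Java node graph. The class and method names (SpeakerNode, receive) are ours for illustration, not the installation's actual software, and a real implementation would schedule events on an audio clock rather than block a thread.

import java.util.*;

// Illustrative sketch of neighbor-to-neighbor instruction propagation.
class SpeakerNode {
    final String id;
    final List<SpeakerNode> neighbors = new ArrayList<>();
    private final Set<UUID> seen = new HashSet<>();

    SpeakerNode(String id) { this.id = id; }

    // Follow an instruction, then pass it to neighbors that have not yet run it.
    void receive(UUID instructionId, String sample, long rampMs) {
        if (!seen.add(instructionId)) return; // already executed: propagation stops here
        System.out.printf("%s: play %s, ramp volume 0.0 -> 1.0 over %d ms%n", id, sample, rampMs);
        try { Thread.sleep(rampMs); } catch (InterruptedException e) { return; } // the "wait" step
        for (SpeakerNode n : neighbors) n.receive(instructionId, sample, rampMs);
    }
}

public class SwirlDemo {
    public static void main(String[] args) {
        SpeakerNode a = new SpeakerNode("A"), b = new SpeakerNode("B"), c = new SpeakerNode("C");
        a.neighbors.add(b); b.neighbors.add(c); c.neighbors.add(a); // a ring of three speakers
        a.receive(UUID.randomUUID(), "AudioSample.wav", 300);       // sound travels A -> B -> C, then stops
    }
}

Because each node remembers which instructions it has already executed, the wave of sound makes exactly one trip around the ring and then dies out.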

3.2 The Cocoon: Interactive Vocal Experience
In the second portion of the installation, a private environment, the Cocoon, will allow individual visitors to have a meditative experience exploring the vibrations generated by their own voice, accented within this space through acoustic and physical stimuli. The technical systems used for this installation include: a real-time system for processing vocal signals, developed in Java; a flexible Java-based mapping system that determines output control parameters from low-level extracted features and high-level parameters describing voice quality and vocal gesture; a Max/MSP patch that controls the behavior of the sound system (including composition choices, samples, localization, and effects); and a Max/MSP patch that controls the vibration behaviors of the ORB. All systems communicate via the Open Sound Control (OSC) protocol.
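As a concrete illustration of this inter-process plumbing, the following sketch sends one frame of extracted voice features from a Java process to a Max/MSP patch over OSC. The JavaOSC library, the /vv/voice address, and port 7400 are our illustrative assumptions; the paper does not specify these details.

import com.illposed.osc.OSCMessage;
import com.illposed.osc.OSCPortOut;
import java.net.InetAddress;

// Hypothetical glue: forward one analysis frame to a Max/MSP patch on UDP 7400.
public class FeatureSender {
    public static void main(String[] args) throws Exception {
        OSCPortOut toMax = new OSCPortOut(InetAddress.getByName("127.0.0.1"), 7400);
        OSCMessage frame = new OSCMessage("/vv/voice");
        frame.addArgument(146.83f); // pitch in Hz (146.83 Hz is the D below middle C)
        frame.addArgument(0.62f);   // loudness, normalized to 0..1
        frame.addArgument(0.91f);   // harmonicity, normalized to 0..1
        toMax.send(frame);
        toMax.close();
    }
}

On the Max/MSP side, a [udpreceive 7400] object would unpack the same message and route the values to the composition and ORB patches.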

3.2.1 A Personal Experience
From the Chapel, each participant will be guided by an assistant into the interactive experience. A short "training" session will follow, in which the participant is encouraged to take the first step into producing vocal sounds.


The assistant will first assess the frequency range of the participant's voice and then give him a D in the most comfortable octave. The participant will be asked to hold the D and will be given the simple guidance to explore a range of vocal variations on a single pitch, such as different vowels, sounds, rhythms, textures, and timbres. We seek to free participants to experiment with a wide range of vocalizations, including variations on extended vocal techniques (such as Sprechgesang, inhaling, tremolo, overtones, and changing the shape of the mouth) as well as non- or semi-voiced sounds (such as breathing and whispering).

From this simple entry point, vocalizing on a single note, a participant can take control of an interactive musical piece based on the longer composition in the Chapel, as well as a corresponding vibration experience in a handheld device, the ORB. At different moments, short sentences will appear on a screen in front of the participant to invite him to explore vocally in a particular way ("Like the surrounding sound," "In an unexpected way," etc.). In this exploration, there is no "right" or "wrong" way to vocalize, as the interactive experience is constrained by composition choices and mapping decisions.

Figure 2: Prototype of the Cocoon, designed by YYY

3.2.2 Vocal Processing
For the system to be interactive, the first step is to extract a number of meaningful features, as well as the raw signal, from the voice. One of the important steps was determining which kinds of parameters we wanted to measure from the voice. We determined that there are two categories of relevant information: low-level parameters (such as frequency, amplitude, and harmonicity) and high-level parameters that can be abstracted from the voice (such as energy and complexity).

First, as the participant vocalizes into a microphone, the raw signal of his or her voice is used in real time as part of the behaviors of the ORB. The first level of control parameters we extract are pitch, loudness, linearly averaged frequency spectrum, and harmonicity, all computed by spectral analysis. Our objective in this choice of parameters was to underline the feeling of an instinctive and immediate connection from the user to the system. These elements of the voice are perceived very strongly, so they can aid in creating an obvious link between vocal variation and the resulting output of the system.
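As a rough illustration of this first analysis layer, the sketch below computes two of these low-level parameters, RMS loudness and a naive spectral pitch estimate, on a single frame of samples. It is a toy version only: the installation's actual Java code is not published here, and a production analyzer would use an FFT, windowing, and a more robust pitch tracker.

// Toy single-frame feature extraction: RMS loudness and a naive DFT-peak pitch.
public class FrameFeatures {
    static double loudness(float[] frame) {
        double sum = 0;
        for (float s : frame) sum += s * s;
        return Math.sqrt(sum / frame.length); // root mean square of the frame
    }

    // Scan DFT magnitudes between 80 and 1000 Hz and report the peak bin as pitch.
    static double pitchHz(float[] frame, float sampleRate) {
        int n = frame.length;
        int kLo = (int) (80.0 * n / sampleRate), kHi = (int) (1000.0 * n / sampleRate);
        int bestK = kLo; double bestMag = -1;
        for (int k = kLo; k <= kHi; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double phase = -2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(phase);
                im += frame[t] * Math.sin(phase);
            }
            double mag = re * re + im * im;
            if (mag > bestMag) { bestMag = mag; bestK = k; }
        }
        return bestK * sampleRate / n; // frequency of the strongest bin
    }

    public static void main(String[] args) {
        float sr = 44100f; int n = 4096;
        float[] frame = new float[n];
        for (int t = 0; t < n; t++) // synthetic test tone near D3
            frame[t] = (float) Math.sin(2 * Math.PI * 146.83 * t / sr);
        System.out.printf("loudness=%.3f, pitch=%.1f Hz%n", loudness(frame), pitchHz(frame, sr));
    }
}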

3.2.3 Interactivity and High-Level Parameters
In addition to the vocal analysis parameters listed above, we are also interested in a variety of abstract, high-level parameters describing the "quality" of the voice or vocal gestures, such as energy, complexity, fluidity, intensity, and rate.

To obtain these abstract parameters, we are incorporating the Expressive Performance Extension System (EPES), a tool designed for flexible mapping of input data streams to output control parameters through a node-based visual language [43, 10]. This system has been extended for the analysis of movement and vocal qualities [10]. It allows users to obtain raw input data, extract expressive features, define desired vocal and physical qualities, perform pattern recognition to identify those qualities, and manually map information about these high-level expressive parameter spaces to output control parameters of the interactive experience. In all cases, the outputs of the pattern recognition algorithms are continuous values, not classifications; the goal is not to label a vocal gesture "staccato" or "fast," but to find a position on a set of continuous expressive axes. Using the Expressive Performance Extension System, we can explore how best to interpret that data and turn it into expressive information that is useful for creating interactive performances and installations.

In Vocal Vibrations, the low-level vocal parameters are used as inputs to pattern recognition processes within EPES that define a participant's current vocal expressivity in terms of high-level parameters. A combination of intermediate and high-level parameters is then used to shape the way in which the sonic environment responds to the user's voice, with the goal of creating audio accompaniment that is not only immediate and clear, but also satisfyingly complex. The format of the resulting experience will be a blend of a pre-composed experiential structure with moment-to-moment variations around that structure, shaped by the vocal explorations of the solo participant.
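To give a feel for what a continuous expressive axis (rather than a classifier) looks like, here is a deliberately simple stand-in: an "energy" value computed as a smoothed, weighted blend of normalized low-level features. The weights, smoothing constant, and feature choices are our invention and do not reflect the actual EPES pipeline.

// Invented stand-in for one continuous expressive axis; not the EPES algorithm.
public class EnergyAxis {
    private double state = 0.0;
    private static final double ALPHA = 0.9; // smoothing: closer to 1 = slower response

    // Inputs are assumed normalized to 0..1 upstream. The output is a position
    // on a continuous axis, never a discrete class label.
    public double update(double loudness, double pitchVariability, double harmonicity) {
        double raw = 0.5 * loudness + 0.3 * pitchVariability + 0.2 * (1.0 - harmonicity);
        state = ALPHA * state + (1 - ALPHA) * raw; // one-pole low-pass across frames
        return state;
    }

    public static void main(String[] args) {
        EnergyAxis axis = new EnergyAxis();
        for (int i = 1; i <= 5; i++) // rising loudness drifts the axis upward smoothly
            System.out.printf("energy=%.3f%n", axis.update(0.2 * i, 0.3, 0.8));
    }
}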

3.3 The ORB
As part of this project, we have also built the Oral Resonant Ball (ORB), a voice-activated vibrating device that maps a vocal signal into tactile sensations. This device is designed to provide awareness of the physical processes involved in vocal production by giving feedback about, and enhancing, the vibrations produced in a person's body. Fingertips contain more sensory receptors than our vocal vibrating chamber [22, 27]; thus, the same vibrational signal sent into the hands will be felt differently, and in more detail, than when sent into the body. We have found that the hands can detect many variations in vibration caused by amplitude, frequency, and timbre. Additionally, research on the Tactaid (a tactile hearing aid that codes sound information via a set of vibrators resting on the forearm) has shown that vibration enhances lipreading performance in hearing-impaired individuals [13].

Holding the ORB in one's hands while vocalizing can give access, in another medium, to detailed elements of the voice that often remain latent in one's everyday experience of it. Additionally, making the vibration of the voice something that can be experienced externally is intended to connect people to their voice in a new way. We offer users a tool to exteriorize their voice and experience another form of connection with it, as well as to engage with their voice as one engages with an external instrument.

3.3.1 Hardware
The ORB is an ovaloid frosted glass shell measuring about 15 by 15 by 10 centimeters, with five transducers attached to the inside wall. The materials and precise settings were chosen to maximize the tactile feeling of vibration while minimizing any audible resonance from the device. The object can be held in the hands at any orientation and angle, with different positions varying the perceived vibrational effects on the hands and fingers.


Because glass presents no directional atomic order at the microstructural level, the material offers the beneficial property of smoothly blending the vibration from one transducer to another while keeping certain localized effects.

3.3.2 Behavior
The control system for the ORB consists of a Max/MSP patch that sends a processed signal to each of the five localized channels based on a set of control parameters. In addition to the raw vocal signal, which is sent to the top channel for intuitive tactile feedback, the ORB also vibrates with additional textures of dynamic tactile patterns on the surface of the shell. The textures are made of granular tactile signals with specific behaviors of location, speed, and scattering around the surface, creating abstract impressions like purring, hatching, and whirling. The real-time feature extraction system and the Expressive Performance Extension System described above are used to control the dynamics and localization of the vibrations in the device. The ORB control parameters also shape additional effects on the signal, such as delay, attenuation, and a light feedback. The mapping of the ORB changes at certain moments during the interactive experience.
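One way to picture a texture such as "whirling" is as a grain of vibration whose center of energy circles the shell, rendered as time-varying gains on the five channels. The ring geometry and Gaussian spreading below are our assumptions for illustration; the actual behavior lives in the Max/MSP patch.

// Sketch of a "whirling" tactile texture as per-channel gains on a 5-channel ring.
public class WhirlTexture {
    static final int CHANNELS = 5;

    // Gains for a grain centered at 'position' (0..1 around the shell);
    // 'spread' controls how widely the grain scatters across neighbors.
    static double[] gains(double position, double spread) {
        double[] g = new double[CHANNELS];
        for (int c = 0; c < CHANNELS; c++) {
            double center = (double) c / CHANNELS;
            double d = Math.abs(position - center);
            d = Math.min(d, 1.0 - d); // wrap distance around the ring
            g[c] = Math.exp(-(d * d) / (2 * spread * spread)); // Gaussian falloff
        }
        return g;
    }

    public static void main(String[] args) {
        double revPerSec = 0.25; // whirl speed
        for (int frame = 0; frame < 4; frame++) {
            double t = frame * 0.1; // 100 ms control frames
            System.out.println(java.util.Arrays.toString(gains((revPerSec * t) % 1.0, 0.15)));
        }
    }
}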

Figure 3: The ORB: System Diagram

The skin's response to stimuli is not linear. When coding the behaviors of the ORB, we had to take into account that the signal sent to the ORB is subjected to three serial, nonlinear sources of physical alteration before being perceived: the transducers, the material of the shell, and the skin of the user's fingers and palm. The nonlinearity of the transducers is resolved by tuning them, applying a different gain to each of the five signals. We will also adjust the strength of the signals in accordance with Stevens' power law [41]. Studies of sensitivity to vibration and other tactile stimuli have established that tactile sensitivity to vibration is highly dependent on the frequency of that vibration. Additionally, the range of vibrotactile frequencies to which the skin is sensitive, 20-1000 Hz, is much narrower than the auditory frequency range our ears can detect (20-20,000 Hz). Thus, the frequencies of signals sent to the ORB should differ from those of an audio signal in order to be perceived through touch.
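Stevens' power law states that perceived magnitude grows as a power of stimulus intensity, psi = k * phi^beta. The sketch below shows one way such a correction could be coded: inverting an assumed power law so that equal control steps feel roughly equal on the skin, and checking that content stays inside the vibrotactile band. The exponent value is a placeholder; in practice it varies with vibration frequency [41].

// Illustrative perceptual corrections for the tactile channel (invented constants).
public class TactileScaling {
    static final double BETA = 0.6; // assumed power-law exponent; frequency-dependent in reality

    // Invert psi = phi^beta: map a desired perceived level (0..1) to drive amplitude.
    static double driveAmplitude(double perceivedLevel) {
        return Math.pow(perceivedLevel, 1.0 / BETA);
    }

    // The skin only senses vibration in roughly the 20-1000 Hz band.
    static boolean inVibrotactileBand(double freqHz) {
        return freqHz >= 20.0 && freqHz <= 1000.0;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 4; i++) {
            double p = 0.25 * i;
            System.out.printf("perceived %.2f -> drive %.3f%n", p, driveAmplitude(p));
        }
        System.out.println("440 Hz usable on skin: " + inVibrotactileBand(440.0));
    }
}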

4. CONCLUSIONS AND FUTURE DIRECTIONS

In this paper, we have described the Vocal Vibrations project, designed to encourage people to explore and pay thoughtful attention to the range of their vocal sounds and vibrations, to have a rich musical experience centered on their voice, and to experience their voices in a new way. We have also described the first public installation being developed for the project, including a space for careful, meditative listening and a space for interactive vocal exploration. This initial Vocal Vibrations installation will premiere at ZZZ in Paris in March 2014 and remain installed for five months.

The Vocal Vibrations project is part of a larger research initiative around the human voice that our group is undertaking. This initial installation will serve as guidance as we develop our future research directions. Through examining the experience of the audience, we will observe to what extent the installation's flow, technologies, and interactions allow the public to feel engaged. The design of the installation will include preparatory stages of testing the system on peers and iterating on our design based on those tests. Our discussions and assessment of both the initial experiments and the final installation will revolve around several aspects: the quality of the overall experience; the reactivity of the system; the feeling of connection to and control by the participant's voice; and the coherence of the experience. The final development of the March installation will also reveal the weaknesses and strengths of the overall experience. Those observations will guide us as we design a second version of the Vocal Vibrations installation, intended to open in Cambridge in the fall of 2014.

We also seek to expand our explorations of the vibrations tied to the voice and of methods for transforming a participant's experience of those vibrations. In this effort, tools built for the deaf community can be an interesting source of inspiration, such as the Tadoma method of "tactile lip reading" [37], in which a deaf person uses their hand to pick up vibrations and movement from a speaker's lips, jaw, cheek, and throat. This use of alternative senses to get as close as possible to the physical process of voice production is inspirational because it also brings people closer to the emotion and liveness of the voice.

Future work tied to Vocal Vibrations will also include a series of multimedia experiences, including more individual "meditations," group singing experiences, and even personal mobile applications, all designed to help participants explore their individual, expressive voice and its effects on the body through an enveloping context of immersive, responsive music.

5. ACKNOWLEDGMENTS
Acknowledgements to be added when the paper is no longer anonymized.

6. REFERENCES
[1] C. Adnene, B. Lamia, and M. Mounir. Analysis of pathological voices by speech processing. In Signal Processing and Its Applications, 2003. Proceedings. Seventh International Symposium on, volume 1, pages 365–367, July 2003.
[2] C.-N. Anagnostopoulos, T. Iliou, and I. Giannoukos. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review, Nov. 2012.
[3] M. A. Bokowiec. V'Oct (Ritual): An interactive vocal work for Bodycoder system and 8 channel spatialization. In Proceedings of the International Conference on New Interfaces for Musical Expression, 2011.
[4] A. Bongers. Tactual display of sound properties in electronic musical instruments. Displays, 18(3):129–133, 1998.
[5] B. Bongers. Physical interfaces in the electronic arts. Trends in Gestural Control of Music, pages 41–70, 2000.
[6] J. F. Cahn. Generating expression in synthesized speech. Technical report, Speech Research Group, Media Laboratory, MIT, 1990.
[7] J. Chowning. Phone (1980–81). 1981.
[8] S. Clift, G. Hancox, I. Morrison, B. Hess, G. Kreutz, and D. Stewart. Choral singing and psychological wellbeing: Quantitative and qualitative findings from English choirs in a cross-national survey. Journal of Applied Arts and Health, 1(1):19–34, Jan. 2010.
[9] R. J. Davidson, J. Kabat-Zinn, J. Schumacher, M. Rosenkranz, D. Muller, S. F. Santorelli, F. Urbanowski, A. Harrington, K. Bonus, and J. F. Sheridan. Alterations in brain and immune function produced by mindfulness meditation. Psychosomatic Medicine, 65(4):564–570, 2003.
[10] EEE. Capturing the body live: A framework for technological recognition and extension of physical expression in performance. (in press).
[11] R. Fernandez. A Computational Model for the Automatic Recognition of Affect in Speech. PhD thesis, MIT, 2004.
[12] J. Freeman, S. Ramakrishnan, K. Varnik, M. Neuhaus, P. Burk, and D. Birchfield. The architecture of Auracle: a voice-controlled, networked sound instrument. network, 5(6):7, 2005.
[13] K. L. Galvin, G. Mavrias, A. Moore, R. S. Cowan, P. J. Blamey, and G. M. Clark. A comparison of Tactaid II and Tactaid 7 use by adults with a profound hearing impairment. Ear and Hearing, 20(6):471, 1999.
[14] C. Grape, M. Sandgren, L.-O. Hansson, M. Ericson, and T. Theorell. Does singing promote well-being?: An empirical study of professional and amateur singers during a singing lesson. Integrative Physiological and Behavioral Science, 38(1):65–74, 2003.
[15] E. Gunther. Skinscape: A Tool for Composition in the Tactile Modality. PhD thesis, MIT, 2001.
[16] J. F. Houde, S. S. Nagarajan, K. Sekihara, and M. M. Merzenich. Modulation of the auditory cortex during speech: an MEG study. Journal of Cognitive Neuroscience, 14(8):1125–38, Nov. 2002.
[17] J. Janer. Singing-driven interfaces for sound synthesizers. PhD thesis, Universitat Pompeu Fabra, Barcelona, 2008.
[18] J. Kabat-Zinn, L. Lipworth, and R. Burney. The clinical use of mindfulness meditation for the self-regulation of chronic pain. Journal of Behavioral Medicine, 8(2):163–190, 1985.
[19] B. G. Kalyani, G. Venkatasubramanian, R. Arasappa, N. P. Rao, S. V. Kalmady, R. V. Behere, H. Rao, M. K. Vasudev, and B. N. Gangadhar. Neurohemodynamic correlates of "OM" chanting: a pilot functional magnetic resonance imaging study. International Journal of Yoga, 4(1):3, 2011.
[20] A. P. Kestian and T. Smyth. Real-time estimation of the vocal tract shape for musical control. 2010.
[21] Y. E. Kim. Singing Voice Analysis/Synthesis. PhD thesis, MIT, 2003.
[22] T. Kitamura. Measurement of vibration velocity pattern of facial surface during phonation using scanning vibrometer. Acoustical Science and Technology, 33(2):126–128, 2012.
[23] P. Ladefoged. In A Figure of Speech: A Festschrift for John Laver, edited by W. J. Hardcastle and J. Mackenzie Beck, pages 1–14, 1992.
[24] J. Laroche. Autocorrelation method for high-quality time/pitch-scaling. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 131–134, 1993.
[25] M. Latinus and P. Belin. Human voice perception. Current Biology, 21(4):R143–5, Feb. 2011.
[26] J. Laver. The Phonetic Description of Voice Quality. Cambridge Studies in Linguistics, 31:1–186.
[27] S. Levanen and D. Hamdorf. Feeling vibrations: enhanced tactile sensitivity in congenitally deaf humans. Neuroscience Letters, 301(1):75–77, 2001.
[28] G. Levin and Z. Lieberman. In-situ speech visualization in real-time interactive installation and performance. In NPAR, volume 4, pages 7–14, 2004.
[29] G. Lewis. The virtual discourses of Pamela Z. Journal of the Society for American Music, 1(1):57–77, 2007.
[30] M. A. Little. Biomechanically Informed Nonlinear Speech Signal Processing. PhD thesis, University of Oxford, 2006.
[31] M. A. Little, P. E. McSharry, I. M. Moroz, and S. J. Roberts. Testing the assumptions of linear prediction analysis in normal vowels. The Journal of the Acoustical Society of America, 119(1):549, 2006.
[32] A. Lucier. I Am Sitting in a Room. 2000.
[33] V. Muller and U. Lindenberger. Cardiac and respiratory patterns synchronize between persons during choir singing. PLoS ONE, 6(9):e24893, Jan. 2011.
[34] W. D. Oliver. The Singing Tree: a novel interactive musical experience. Thesis, Massachusetts Institute of Technology, 1997.
[35] Y. Qi and R. E. Hillman. Temporal and spectral estimations of harmonics-to-noise ratio in human voice signals. The Journal of the Acoustical Society of America, 102(1):537–43, July 1997.
[36] H. Rapaport. "Can you say hello?": Laurie Anderson's United States. Theatre Journal, 38(3):339–354, 1986.
[37] C. Reed, W. Rabinowitz, N. Durlach, L. Braida, S. Conway-Fithian, and M. Schultz. Research on the Tadoma method of speech communication. The Journal of the Acoustical Society of America, 77:247, 1985.
[38] X. Rodet, Y. Potard, and J.-B. Barriere. The CHANT project: from the synthesis of the singing voice to synthesis in general. Computer Music Journal, 8(3):15–31, 1984.
[39] K. R. Scherer. Vocal affect expression: a review and a model for future research. Psychological Bulletin, 99(2):143–65, Mar. 1986.
[40] K. N. Stevens. Acoustic Phonetics, volume 30. The MIT Press, 2000.
[41] S. S. Stevens. Tactile vibration: Change of exponent with frequency. Perception & Psychophysics, 3(38), 1968.
[42] D. Stowell. Making music through real-time voice timbre analysis: machine learning and timbral control. PhD thesis, Queen Mary University of London, 2010.
[43] P. A. Torpey and E. N. Jessop. Disembodied performance. In CHI '09 Extended Abstracts on Human Factors in Computing Systems, pages 3685–3690. ACM, 2009.
[44] N. Tsakalos and E. Zigouris. Autocorrelation-based pitch determination algorithms for realtime vocoders with the TMS32020/C25. Microprocessors and Microsystems, 14(8):511–516, Oct. 1990.
[45] K. von Kriegstein, E. Eger, A. Kleinschmidt, and A. L. Giraud. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Research. Cognitive Brain Research, 17(1):48–55, June 2003.