Multilevel Auditory Displays for Mobile Eyes-Free Location-Based Interaction
Yolanda Vazquez-Alvarez, GIST, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, [email protected]
Rocio von Jungenfeld, Edinburgh College of Art, University of Edinburgh, Edinburgh EH8 9DF, [email protected]
Matthew P. Aylett, CereProc Ltd., University of Edinburgh, Edinburgh EH8 9LE, [email protected]
Antti Virolainen, Nokia Research Centre, Helsinki, [email protected]
Stephen A. Brewster, GIST, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, [email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s).

CHI 2014, Apr 26 - May 01 2014, Toronto, ON, Canada.
ACM 978-1-4503-2474-8/14/04.
http://dx.doi.org/10.1145/2559206.2581254
Abstract

This paper explores the use of multilevel auditory displays to enable eyes-free mobile interaction with location-based information in a conceptual art exhibition space. Multilevel auditory displays enable user interaction with concentrated areas of information. However, it is necessary to consider how to present the auditory streams without overloading the user. We present an initial study in which a top-level exocentric sonification layer was used to advertise information present in a gallery-like space. Then, in a secondary interactive layer, three different conditions were evaluated that varied in the presentation (sequential versus simultaneous) and spatialisation (non-spatialised versus egocentric spatialisation) of multiple auditory sources. Results show that 1) participants spent significantly more time interacting with spatialised displays, 2) there was no evidence that a switch from an exocentric to an egocentric display increased workload or lowered satisfaction, and 3) there was no evidence that simultaneous presentation of spatialised Earcons in the secondary display increased workload.
Author Keywords

Eyes-free interaction; auditory displays; spatial audio

ACM Classification Keywords

H.5.2 [User Interfaces]: Interaction styles, evaluation.
Introduction

Audio-Augmented Reality (AAR) enables users to interact with location-based information purely through sound while on the move. This is particularly useful when the user's visual attention is already being compromised by real visual objects in the surrounding environment. Consider the following scenario: there is a conceptual art exhibition in London and art lover David has arranged a visit with his friend Rocio. Before they enter the gallery, they download an application onto their mobile phones that will enable them to listen to information about the art pieces using their headphones while walking around the exhibition. As they get close to an audio-augmented location, different sounds allow them to browse the audio information available, which varies between comments left by visitors, the artist herself and an art critic. At one artifact, Rocio selects a comment left by a previous visitor that says the piece reminds him of a circulatory system. David selects a comment left by the artist, which describes how the frame squeezes wool of different colours to contrast the 2D nature of the photo frame with the 3D element of materials. David and Rocio have a lively discussion based on these comments. They agree that the comments provided by the artist helped them appreciate the ideas in the work, while the opinions left by other visitors mentioned things they would never have thought of themselves. Overall, the result is a personalised museum experience, one that has responded to each user's individual interests and encouraged them to appreciate and enjoy the art work in more depth in their own way.
As illustrated in our example, location-based information can be presented using AAR. When using such an eyes-free auditory interface, each location being augmented requires its own audio stream, which means it may be necessary to discriminate between streams, especially when locations are close enough to each other for their streams to overlap. Spatial audio has been used successfully in previous research [8, 12] to segregate multiple audio streams by placing each stream at a different location around the user's head, mirroring how humans perceive sounds in real life. When designing auditory displays for mobile audio-augmented environments, choices have to be made on both the presentation and the spatial arrangement of the audio streams. A multilevel auditory display allows concentrated areas of information to be structured in a location-based system. However, given a top-level spatial auditory display, should a secondary display also be spatialised and, if so, how? Should information be provided sequentially or simultaneously? While simultaneous presentation is important to create a rich, immersive audio environment, high workload may hinder exploration and selection between different locations, as well as exploration and selection of the information provided at each location.
Background Work

In 1993, AAR was proposed as the action of superimposing virtual sound sources upon real-world objects [4]. The key idea is that users can explore an acoustic virtual environment augmenting a physical space solely by listening as they walk [6, 12]. The majority of indoor AAR systems have been developed for museums, exhibitions or historic sites in order to replace keypad-based audio tour guides, which constrained users to linear access to information and could pull the visitor's attention away from the actual exhibits, disturbing the overall user experience. Bederson's automated tour guide [1] was an early example of an exploratory non-linear playback system. Since this early prototype, indoor AAR applications have grown in complexity, providing a much greater amount of information about the audio-augmented locations [13, 6].
Figure 1: Experimental setup. Top: 1) IR tag, 2) JAKE sensor (both mounted on headphones), 3) SHAKE SK6 sensor pack and 4) mobile device. Bottom: Detail of the SHAKE SK6 navigation switch interface.

Figure 2: Schematic of the system architecture.

User interaction with location-based information in an exploratory indoor mobile AAR application usually takes place within a proximity and an activation zone [9]. The proximity zone advertises an audio-augmented location and the activation zone presents more detailed information. The amount of information presented within the activation zone in previous systems has varied greatly, from one [1] to multiple [13, 6] audio streams. As the amount of information presented increases, more complex auditory displays are required. Spatial audio techniques enable user interaction with multiple audio streams by providing orientational information that aids segregation and attention switching between the audio streams [10]. However, how should a spatial auditory display be designed in order to support increasing amounts of information? Previous work has used spatial audio to design mobile auditory displays presenting information sequentially in an egocentric [5] or exocentric [11] display, or simultaneously in an egocentric [8] or exocentric [3] display.
However, the usability of these designs has not been compared against each other as part of an interactive audio-only environment. In an egocentric display, elements are always in a fixed position relative to the user, which can be particularly useful for mobile users as changes in orientation when moving are inevitable. In an exocentric display, on the other hand, display element positions have to be updated in real time according to the user's orientation, as they appear to be fixed to the world. This can aid user navigation but can be computationally intensive for a mobile device. Previous research has shown that interacting with an egocentric display when mobile is faster but more error prone than with an exocentric display [7], and that simultaneous presentation in an exocentric display allows for faster user interactions [3]. However, to our knowledge, no previous research has compared the use of combined ego- and exocentric designs within the same spatial hierarchical auditory display. This paper presents an initial evaluation of a complex multilevel spatial auditory display including egocentric and exocentric designs and sequential and simultaneous presentation, tested within a mobile AAR environment.
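The computational difference between the two frames of reference comes down to one extra update per source whenever the head moves: an exocentric source's rendering azimuth must be recomputed against the current head yaw, while an egocentric source's azimuth never changes. A small sketch, assuming angles in degrees with a shared clockwise convention and negative azimuths to the listener's left:

```python
def world_to_head_azimuth(source_az_world, head_yaw):
    """Re-express a world-fixed (exocentric) source direction relative
    to the listener's current head yaw, wrapped to [-180, 180)."""
    return (source_az_world - head_yaw + 180) % 360 - 180

# A source due east (90 degrees) heard while facing north-east
# (45 degrees) should appear 45 degrees to the listener's right:
assert world_to_head_azimuth(90, 45) == 45
```

This recomputation must run for every source on every orientation update from the head-mounted sensor, which is part of why exocentric rendering is the more expensive design on a mobile device.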
Audio-augmented Art Exhibition

A conceptual art exhibition was used as the setting for all conditions in this study. The virtual audio environment was run on a Nokia N95 8GB, and the built-in head-related transfer functions (HRTFs) were used to position the audio sources. User position was determined using an infrared (IR) camera tracking an IR tag powered by a 9V battery and mounted on top of a pair of headphones. Coordinate information was fed to the mobile phone over an Internet connection and was used to activate the zones associated with the art pieces.

User orientation was determined using a JAKE sensor pack (http://code.google.com/p/jake-drivers) connected to the mobile phone via Bluetooth. No visual aids were provided on the mobile device and, to ensure a fully eyes-free experience, the phone was placed on a lanyard around the user's neck (see Figure 1, top). A SHAKE SK6 sensor pack (http://code.google.com/p/shake-drivers) was held by the user. It was also connected via Bluetooth, and its navigation switch provided input. This navigation switch allowed users to activate, browse, select and deactivate audio content (see Figure 1, bottom). The audio was played over a pair of Beyerdynamic DT 431 open headphones with the aim of reducing the isolation of the listener from the surrounding environment. The IR tag and JAKE sensor were placed in the middle of the headphones' headband. Figure 2 shows the final system setup.
The augmented exhibits consisted of four different art pieces from the Weaving the City project (www.weavingthecity.eu), situated in an exhibition space measuring 3 m by 3.85 m (see Figure 3).
Figure 3: Illustration of the top-level sonification layer showing the location of the circular proximity (radius 1.25 m) and activation (radius 0.75 m) zones. Small squares with a dot at their centre identify art pieces placed on a table.
Top-level sonification layer

This layer used an exocentric design to present a chattering-voices sound within the proximity zone, advertising content about each art piece. The sound increased in volume when the user reached the activation zone, in which a secondary interactive layer could then be activated/deactivated by pressing down (long press, > 2 secs) on the SHAKE navigation switch (see Figure 3 for an illustration of the setup, and Figure 1, bottom, for the switch).
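A sketch of the two behaviours this layer combines: a zone-dependent gain for the advertising sound and long-press detection on the navigation switch. Only the 2-second threshold comes from the paper; the gain values, names and structure below are our assumptions:

```python
import time

LONG_PRESS_S = 2.0  # from the paper: a press held > 2 s toggles the layer

def advertising_gain(zone):
    """Illustrative gain schedule for the 'chattering voices' loop;
    the exact levels used in the study are not reported."""
    return {"outside": 0.0, "proximity": 0.4, "activation": 1.0}[zone]

class NavigationSwitch:
    """Distinguishes a long press (toggle the secondary interactive
    layer) from a short press (menu selection within that layer)."""
    def __init__(self):
        self._pressed_at = None

    def press(self):
        self._pressed_at = time.monotonic()

    def release(self):
        if self._pressed_at is None:
            return None
        held = time.monotonic() - self._pressed_at
        self._pressed_at = None
        return "toggle_layer" if held > LONG_PRESS_S else "short_press"
```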
Secondary interactive layer

For each art piece, different Earcons were used in an audio menu. Earcons provide an abstract, symbolic relationship between the sounds and the information they represent [2]. The Earcons identified 1-3 different audio menu items as follows: “water waves” for the artist's comments, “open crackling fire” for positive non-expert reviews, and “stormy wind” for negative (cold) non-expert reviews. To select one of the menu items, the SHAKE navigation switch was pressed down (short press, < 2 secs). Once a menu item was selected, an audio clip of approximately 25 seconds was played containing the comment or review. User interaction with the audio menu items varied across the experimental conditions:
Baseline: Each Earcon was always played sequentially at each push of the navigation switch, either right or left. There was no spatialisation of the audio items, so they seemed to originate from within the user's head. The aim was to recreate a traditional audio-guide style of interaction in which users triggered the audio content in sequential order at the press of a button (see Figure 4a).
Egocentric Sequential: Each Earcon was presented in a radial menu (virtually located around the user's head: to the right, to the left or in front of the user's nose) and played one at a time when selected. Selection was performed by pushing the navigation switch either right or left, and the Earcons were located at 0°, -90° and +90° azimuth (see Figure 4b).
Egocentric Simultaneous: Similar to the Egocentric Sequential condition, except that all of the Earcons were played simultaneously. When a menu item was selected, the volume of that item increased to bring it into focus while the volume of the rest decreased (see Figure 4c).
Study

Thirty-two participants (21 males, 11 females, aged 18 to 39 years) were recruited, all studying or working at the University. All reported normal hearing, were right-handed and were paid £6 for participation, which lasted just over an hour.
Experimental Design and Measures

Participants were split equally into two groups, sequential and simultaneous presentation, in a between-subjects design. The Baseline condition was used as a control in both groups and the order of conditions was randomised. Three dependent variables were analysed: perceived workload, overall user satisfaction and time spent interacting with the secondary display. In addition, user location coordinates and head orientation data were collected to investigate user behaviour. The following hypotheses were tested:

H1: A spatialised secondary display will increase exploration. We define increased exploration as users spending significantly more time without a significant drop in user satisfaction or a significant increase in perceived workload.
H2: Changing between the exocentric and the egocentric layers will increase perceived workload.

H3: Simultaneous presentation of Earcons in the egocentric secondary interactive layer will increase perceived workload.
Figure 4: Schematic of the interactive auditory displays tested: a) Baseline; b) Egocentric sequential; c) Egocentric simultaneous.

Figure 5: Mean time taken interacting with the secondary interactive layer per condition and presentation group. Baseline was used as a control for both egocentric conditions. Total audio content: 9 audio clips x 25 secs = 225 secs. Error bars show Standard Error of the Mean ± 2.0.
Procedure

The experiment started with a training session before the test conditions to familiarise the participant with the multilevel auditory displays around one of the art pieces in the exhibition space. For each test condition, participants were asked to explore the exhibition space and find as much information as possible about the art pieces by interacting with the different auditory displays. Participants were given a maximum of 10 minutes of exploration time for each test condition. There was no minimum time and participants could choose to stop whenever they wanted. All the participants had time to explore the art pieces in the allocated time. After each test condition, participants were asked to complete a NASA-TLX subjective workload assessment and a satisfaction questionnaire modified from that used by Wakkary and Hatala [13], and also to provide some informal feedback.
Results

To test hypotheses 2 and 3, a two-way mixed-design ANOVA was performed on the overall perceived workload and overall user satisfaction mean scores, with condition type as a within-subjects factor and presentation group as a between-subjects factor. Overall workload was calculated across all six NASA-TLX subscales, giving a maximum mean score of 120; overall user satisfaction was calculated as an average over 62 satisfaction questions rated on a five-point scale where 5 was ‘best’. No significant effects were found for condition (Baseline versus Egocentric display), for presentation group (Simultaneous versus Sequential), or for their interaction, on either perceived workload or overall satisfaction. Thus, hypotheses 2 and 3 were not supported. Across all conditions and presentations, mean overall user satisfaction was high at 3.9/5 and mean overall workload was low at 32/120.
A two-way mixed-design ANOVA showed that significantly more time was spent in the spatialised Egocentric conditions than in the Baseline condition (F(1,30) = 8.21, p = 0.008); see Figure 5. As there was no evidence that spatialisation increased perceived workload, the extra time spent can be attributed to an increase in user exploration, and therefore hypothesis 1 is supported.
The logged user behaviour data showed a much simpler pattern of exploration for participants in the Baseline condition than in the Egocentric conditions. Figure 6 (top) shows an example of one participant in the Baseline condition walking in a straighter trajectory between the art pieces and then staying mainly stationary once the secondary interactive layer was activated. Figure 6 (bottom) shows the same participant taking more time to explore around the art pieces once the secondary interactive layer was activated in the Egocentric Simultaneous condition.
Conclusions and Future Work

This paper investigated the usability of complex spatial auditory displays designed to enable user interactions with concentrated areas of information in an exploratory mobile audio-augmented reality environment. Both egocentric and exocentric designs were combined in the multilevel auditory display to test whether these 3D audio techniques would encourage exploratory behaviour. Informal feedback suggests that the egocentric secondary interactive layer allowed for a more exploratory experience. Although the Baseline condition was overall reported as “easy to use”, it was also found to be “less immersive”.
Figure 6: Route taken around the art pieces (0-3) by one participant from the simultaneous presentation group in the Baseline (top) and Egocentric (bottom) conditions. Solid red and blue lines show the path of exploration in the sonification layer and the secondary interactive layer, respectively. Short splines illustrate the participant's head direction every 0.5 seconds.

Users liked the control over the interaction with
the location-based information provided by the egocentric design and remarked on how this “spatialised interface was more fun than simply scrolling through sounds”. Performance results supported the user feedback and helped characterise the egocentric secondary display as an exploratory experience, in which interaction times increased without an increase in workload or a decrease in user satisfaction. The egocentric design performed well across presentation types, with no evidence that the transition from the top-level exocentric layer into the egocentric secondary interactive layer had a negative impact.
These results show that spatial audio can encourage an immersive experience and exploratory behaviour. Future work will investigate more closely how this sense of immersion could be further encouraged, for example with the use of an exocentric secondary display, and how such an immersive experience might be shared across multiple users interacting with the same display. We believe that a deeper understanding of the extent to which novel and more complex auditory displays can impact the user experience will allow designers to make more informed decisions when designing eyes-free auditory interfaces for mobile audio-augmented reality environments.
References

[1] Bederson, B. B. Audio augmented reality: A prototype automated tour guide. In CHI '95, vol. 2 (1995), 210-211.

[2] Blattner, M. M., Sumikawa, D. A., and Greenberg, R. M. Earcons and icons: Their structure and common design principles. Human-Computer Interaction 4, 1 (1989), 11-44.

[3] Brewster, S. A., Lumsden, J., Bell, M., Hall, M., and Tasker, S. Multimodal 'eyes-free' interaction techniques for wearable devices. In CHI '03 (2003), 463-480.

[4] Cohen, M., Aoki, S., and Koizumi, N. Augmented audio reality: Telepresence/VR hybrid acoustic environments. In 2nd IEEE Intern. Wrkshp. on Robot and Human Comm. (1993), 361-364.

[5] Dicke, C., Wolf, K., and Tal, Y. Foogue: Eyes-free interaction for smartphones. In MobileHCI '10 (2010), 455-458.

[6] Eckel, G. Immersive audio-augmented environments: The LISTEN project. In IV '01, IEEE Computer Society Press (2001), 571-573.

[7] Marentakis, G., and Brewster, S. Effects of feedback, mobility and index of difficulty on deictic spatial audio target acquisition in the horizontal plane. In CHI '06 (2006), 359-368.

[8] Sawhney, N., and Schmandt, C. Nomadic Radio: Speech and audio interaction for contextual messaging in nomadic environments. Trans. on Computer-Human Interaction 7, 3 (2000), 353-383.

[9] Stahl, C. The Roaring Navigator: A group guide for the zoo with a shared auditory landmark display. In MobileHCI '07 (2007), 282-386.

[10] Stifelman, L. The cocktail party effect in auditory interfaces: A study of simultaneous presentation. Tech. rep., MIT Media Laboratory, 1994.

[11] Terrenghi, L., and Zimmermann, A. Tailored audio augmented environments for museums. In IUI '04 (2004), 334-336.

[12] Vazquez-Alvarez, Y., Oakley, I., and Brewster, S. A. Auditory display design for exploration in mobile audio-augmented reality. Personal and Ubiquitous Computing 16, 8 (2012), 987-999.

[13] Wakkary, R., and Hatala, M. Situated play in a tangible interface and adaptive audio museum guide. Personal and Ubiquitous Computing 11, 3 (2007), 171-191.