A device for human ultrasonic echolocation

Jascha Sohl-Dickstein, Santani Teng, Benjamin M. Gaub, Chris C. Rodgers, Crystal Li, Michael R. DeWeese, and Nicol S. Harper

Abstract—Representing and interacting with the external world constitutes a major challenge for persons who are blind or visually impaired. Nonvisual sensory modalities such as audition and somatosensation assume primary importance. Some non-human species, such as bats and dolphins, sense by active echolocation, in which emitted acoustic pulses and their reflections are used to sample the environment. Active echolocation is also used by some blind humans, who use signals such as tongue clicks or cane taps as mobility aids. The ultrasonic pulses employed by echolocating animals have several advantages over those used by humans, including higher spatial resolution and fewer diffraction artifacts. However, they are unavailable to unaided humans. Here we present a device that combines principles of ultrasonic echolocation and spatial hearing to present human users with environmental cues that are i) not otherwise available to the human auditory system and ii) richer in object and spatial information than the more heavily processed sonar cues of other assistive devices. The device consists of a wearable headset with an ultrasonic emitter and stereo microphones affixed with artificial pinnae. The echoes of ultrasonic pulses are recorded and time-stretched to lower their frequencies into the human auditory range before being played back to the user. We tested performance among naive and experienced sighted volunteers using a set of localization experiments. Naive subjects were able to make laterality and distance judgments, suggesting that the echoes provide innately useful information without prior training. Naive subjects were unable to make elevation judgments. However, in localization testing one trained subject demonstrated an ability to judge elevation as well. This suggests that the device can be used to effectively examine the environment and that the human auditory system can rapidly adapt to these artificial echolocation cues. We contend that this device has the potential to provide significant aid to blind people in interacting with their environment.

Index Terms—echolocation, ultrasonic, blind, assistive device

I. INTRODUCTION

In environments where vision is ineffective, some animals have evolved echolocation — perception using reflections of self-made sounds. Remarkably, some blind humans are also able to echolocate to an extent, frequently with vocal clicks. However, animals specialized for echolocation typically use much higher sound frequencies for their echolocation, and have specialized capabilities to detect time delays in sounds.

Jascha Sohl-Dickstein is with Stanford University and Khan Academy. e-mail: [email protected]
Santani Teng is with the Massachusetts Institute of Technology. e-mail: [email protected]
Benjamin M. Gaub is with the University of California, Berkeley. e-mail: [email protected]
Chris C. Rodgers is with Columbia University. e-mail: [email protected]
Crystal Li is with the University of California, Berkeley. e-mail: [email protected]
Michael R. DeWeese is with the University of California, Berkeley. e-mail: [email protected]
Nicol S. Harper is with the University of Oxford. e-mail: [email protected]

The most sophisticated echolocation abilities are found in microchiropteran bats (microbats) and odontocetes (dolphins and toothed whales). For example, microbats catch insects on the wing in total darkness, and dolphins hunt fish in opaque water. Arguably simpler echolocation is also found in oilbirds, swiftlets, Rousettus megabats, some shrews and tenrecs, and even rats [1]. Microbats use sound frequencies ranging from 25-150 kHz in echolocation, and use several different kinds of echolocation calls [2]. One call, the broadband call or chirp, a brief tone (< 5 ms) sweeping downward over a wide frequency range, is used for localization at close range. A longer-duration call, the narrowband call, which has a narrower frequency range, is used for detection and classification of objects typically at longer range. In contrast to microbats, odontocetes use clicks, which are shorter in duration than bat calls and have sound frequencies up to 200 kHz [3]. Odontocetes may use shorter calls because sound travels ∼ 4 times faster in water, whereas bat calls may be longer to have sufficient energy for echolocation in air. Dolphins can even use echolocation to detect features that are unavailable via vision: for example, dolphins can tell visually identical hollow objects apart based on differences in thickness [4].

Humans are not typically considered among the echolocating species. Remarkably, however, some blind persons have demonstrated the use of active echolocation in their daily lives, interpreting reflections from self-generated tongue clicks for such tasks as obstacle detection [5], distance discrimination [6], and object localization [7], [8]. The underpinnings of human echolocation in blind (and sighted) people remain poorly characterized, though some informative cues [9], neural correlates [10], and models [11] have been proposed. While the practice of active echolocation via tongue clicks is not commonly taught, it is recognized as an orientation and mobility method [12], [13]. However, most evidence in the existing literature suggests that human echolocation ability, even in blind, trained experts, does not approach the precision and versatility found in organisms with highly specialized echolocation mechanisms.

An understanding of the cues underpinning human auditory spatial perception is crucial to the design of an artificial echolocation device. Lateralization of sound sources depends heavily on binaural cues in the form of timing and intensity differences between sounds arriving at the two ears. For vertical and front/back localization, the major cues are direction-dependent spectral transformations of the incoming sound induced by the convoluted shape of the pinna, the visible outer portion of the ear [14]. Auditory distance perception is less well characterized than the other dimensions, though evidence

suggests that intensity and the ratio of direct-to-reverberant energy play major roles in distance judgments [15]. Notably, the ability of humans to gauge distance using pulse-echo delays has not been well characterized, though these serve as the primary distance cues for actively echolocating animals [16].

Here we present a device, referred to as the Sonic Eye, that uses a forehead-mounted speaker to emit ultrasonic “chirps” (FM sweeps) modeled after bat echolocation calls. The echoes are recorded by bilaterally mounted ultrasonic microphones, each mounted inside an artificial pinna, also modeled after bat pinnae to produce direction-dependent spectral cues. After each chirp, the recorded chirp and reflections are played back to the user at 1/m of normal speed, where m is an adjustable magnification factor. This magnifies all temporally based cues linearly by a factor of m and lowers frequencies into the human audible range. For the empirical results reported here, m is 20 or 25, as indicated. That is, cues that are normally too high or too fast for the listener to use are brought into the usable range simply by replaying them more slowly.
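To make the scaling concrete, consider a reflector at distance d; the 1 m distance and c ≈ 343 m/s below are illustrative values, not taken from the experiments:

\[
\tau = \frac{2d}{c} \approx \frac{2 \times 1\,\mathrm{m}}{343\,\mathrm{m\,s^{-1}}} \approx 5.8\,\mathrm{ms},
\qquad
\tau' = m\,\tau \approx 146\,\mathrm{ms}\;\;(m = 25),
\qquad
f' = \frac{f}{m}:\; 25\text{--}50\,\mathrm{kHz} \rightarrow 1\text{--}2\,\mathrm{kHz}.
\]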

Although a number of electronic travel aids that utilize sonar have been developed (e.g., [17], [18], [19], [20]), very few provide information other than range-finding or a processed localization cue, and none appears to be in common use. Studies of human hearing suggest that it is very adaptable to altered auditory cues, such as those provided by different binaural spacing or altered pinna shapes. Additionally, in blind subjects the visual cortex can be recruited to also represent auditory cues [10]. Our device is designed to utilize the intrinsic cue processing and learning capacity of the human auditory system [21], [22]. Rather than providing the heavily processed, and ultimately relatively impoverished, input of other devices, we provide a minimally processed input that, while initially challenging to use, has the capacity to be much more informative about the world and to integrate better with the innate human spatial hearing system. The relatively raw echoes contain not just distance information but vertical and horizontal location, as well as texture, geometric, and material cues. Behavioral testing suggests that novice users can quickly judge the laterality and distance of objects and, with experience, can also judge elevation; the Sonic Eye thus demonstrates potential as an assistive mobility device.

A preliminary version of this work was presented in poster form at the ICME 2013 MAP4VIP workshop [23].

II. SPECIFICATIONS AND SIGNAL PROCESSING

The flow of information through the Sonic Eye is illustrated in Figure 1a, and the device is pictured in Figure 1b. Recordings of a sound waveform moving through the system are presented in Figure 2. A video including helmet-cam footage of the device experience is included in the Supplemental Material.

Fig. 1. (a) Diagram of components and information flow. (b) Photograph of the current hardware. (c) Photograph of one of the artificial pinnae used, modeled after a bat ear.

Fig. 2. Schematic of waveforms at several processing stages, from ultrasonic speaker output to stretched pulse-echo signal headphone output presented to the user. Red traces correspond to the left ear signal, and blue traces to the right ear signal.

The signal processing steps performed by the Sonic Eye, and the hardware used in each step, are as follows:

Step 1: The computer generates a chirp waveform, consisting of a 3 ms sweep from 25 kHz to 50 kHz with a constant sweep rate in log frequency. The initial and final 0.3 ms are tapered using a cosine ramp function. The computer, in a small mini-ITX case, runs Windows 7 and performs all signal processing using a custom Matlab program.
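A minimal Matlab sketch of Step 1 follows. The 192 kHz rate matches the soundcard; the exponential-sweep phase formula and the exact taper implementation are our assumptions, since the custom program itself is not listed here:

    fs  = 192e3;                          % output sample rate (Hz), matching the soundcard
    dur = 3e-3;                           % chirp duration: 3 ms
    f0  = 25e3;  f1 = 50e3;               % sweep from 25 kHz to 50 kHz
    t   = (0:1/fs:dur-1/fs)';             % time axis (column vector)
    k   = log(f1/f0)/dur;                 % constant sweep rate in log frequency
    phi = 2*pi*f0*(exp(k*t) - 1)/k;       % phase = 2*pi * integral of f0*exp(k*t)
    chirp_wave = sin(phi);
    nr   = round(0.3e-3*fs);              % 0.3 ms cosine on/off ramps
    ramp = 0.5*(1 - cos(pi*(0:nr-1)'/nr));
    chirp_wave(1:nr)         = chirp_wave(1:nr)         .* ramp;
    chirp_wave(end-nr+1:end) = chirp_wave(end-nr+1:end) .* flipud(ramp);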

Step 2: The chirp is played through the head-mounted tweeter speaker. In order to play the chirp, it is output through an ESI Juli@ soundcard with stereo 192 kHz input and output, amplified using a Lepai Tripath TA2020 12 V stereo amplifier, and finally emitted by a Fostex FT17H Realistic SuperTweeter speaker.

Step 3: The computer records audio through the helmet-mounted B&K Type 4939 microphones. For all experiments, the recording duration was 30 ms, capturing the initial chirp and echoes from objects up to 5 m away. The signal from the microphones passes through a B&K 2670 preamp followed by a B&K Nexus conditioning amplifier before being digitized by the ESI Juli@ soundcard.

Step 4: The recorded signal is bandpass-filtered between 25 and 50 kHz using Butterworth filters, and time-dilated by a factor of m. For m = 25, the recorded ultrasonic chirp and echoes now lie between 1 and 2 kHz.
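A sketch of Step 4 in Matlab (the filter order, zero-phase filtering, and the placeholder recording are assumptions; butter and filtfilt require the Signal Processing Toolbox). Because the dilation is a uniform time stretch, it can be realized simply by playing the samples back at fs/m:

    fs = 192e3;  m = 25;                                   % recording rate and dilation factor
    recorded_echoes = randn(round(0.03*fs), 2);            % placeholder for the 30 ms stereo recording
    [b, a] = butter(4, [25e3 50e3]/(fs/2), 'bandpass');    % 25-50 kHz passband (order assumed)
    echoes_filt = filtfilt(b, a, recorded_echoes);         % zero-phase bandpass filtering
    sound(echoes_filt, fs/m);                              % playback m times slower: delays scale
                                                           % by m and 25-50 kHz maps to 1-2 kHz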

Step 5: The processed signal is played to the user through AirDrives open-ear headphones, driven by a Gigaport HD USB soundcard. Critically, the open-ear design leaves the ear canal unobstructed, ensuring safety in applied situations. (Note that in Experiments 1 and 2 described below, conventional headphones were used for stimulus delivery.)

In the current version of the device, the speaker and two ultrasonic microphones housed in artificial pinnae are mounted on a bicycle helmet. The pinnae are hand-molded from clay to resemble bat ears. The rest of the components are mounted within a baby-carrier backpack, which provides portability, ventilation, and a sturdy frame. A lithium-ion wheelchair battery is used to power the equipment. We note that in its current form, the Sonic Eye prototype is a proof-of-principle device whose weight and size make it unsuited to everyday use by blind subjects and extensive open-field navigation testing. To overcome these limitations we are developing a low-cost miniaturized version that retains all the functionality, with a user interface specifically for the blind. However, user testing with the current version has provided a proof of principle of the device's capabilities, as we describe below.

A. Measurement of Transfer Functions

We measured angular transfer functions for the ultrasonic speaker and microphone in an anechoic chamber (Figure 3). The full-width half-maximum (FWHM) angle for speaker power was ∼ 50°, and for the microphone ∼ 160°. Power was measured using bandpass Gaussian noise between 25 kHz and 50 kHz.
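A sketch of the probe stimulus and the relative-power measure reported in Figure 3 (the 1 s noise duration and the filter order are assumptions):

    fs = 192e3;
    [b, a] = butter(4, [25e3 50e3]/(fs/2), 'bandpass');
    probe  = filtfilt(b, a, randn(fs, 1));                  % 1 s of 25-50 kHz Gaussian noise
    % For a recording r at angle theta and a reference r0 recorded at 0 degrees:
    rel_dB = @(r, r0) 10*log10(mean(r.^2) / mean(r0.^2));   % power in dB relative to 0 degrees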

III. EXPERIMENTAL METHODS

To explore the perceptual acuity afforded by the artificial echoes, we conducted three behavioral experiments: two in which we presented pulse-echo recordings (from the Sonic Eye) via headphones to naive sighted participants, and a practical localization test with one trained user wearing the device. In both Experiments 1 and 2, we tested spatial discrimination performance in separate two-alternative forced-choice (2AFC) tasks along three dimensions: i) laterality (left-right), ii) depth (near-far), and iii) elevation (high-low). The difference between Experiments 1 and 2 is that we provided trial-by-trial feedback in Experiment 2, but not in Experiment 1. This allowed us to assess both the intuitive discriminability of the stimuli (Experiment 1) and the benefit provided by feedback (Experiment 2).

In Experiment 3 we tested horizontal and vertical localization performance in a separate task with a single user (NH) who had approximately 6 hours of training experience over several days.

A. Methods, Experiment 1

Fig. 3. Measurement of transfer functions for the ultrasonic microphones and ultrasonic speaker as a function of angle. (a) Angular transfer function setup. (b) Angular transfer function data. For the microphone, the sensitivity relative to the sensitivity at zero degrees is plotted; for the speaker, the emission power relative to the emission power at zero degrees is plotted.

1) Stimuli: For each of the three spatial discrimination tasks (laterality, depth, and elevation), echoes were recorded from an 18-cm-diameter plastic disc placed in positions appropriate to the stimulus condition, as illustrated in Figure 4a. For laterality judgments, the disc was suspended from the testing room ceiling via a thin (< 1 cm thick) wooden rod 148 cm in front of the emitter and 23.5 cm to the left or right of the midline. The “left” and “right” conditions were thus each ∼ 9° from the midline relative to the emitter, with a center-to-center separation of ∼ 18°. For depth judgments, the disc was suspended on the midline directly facing the emitter at a distance of 117 or 164 cm, separating the “near” and “far” conditions by 47 cm. Finally, for elevation judgments, the disc was suspended 142 cm in front and 20 cm above or below the midline, such that the “high” and “low” conditions were ∼ 8° above and below the horizontal median plane, respectively, separated by ∼ 16°. In all cases, the helmet with microphones and speakers was mounted on a styrofoam dummy head.
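As a worked check, the reported angles follow from the stated geometry:

\[
\theta_{\mathrm{lat}} = \arctan\!\left(\frac{23.5\,\mathrm{cm}}{148\,\mathrm{cm}}\right) \approx 9.0^{\circ},
\qquad
\theta_{\mathrm{elev}} = \arctan\!\left(\frac{20\,\mathrm{cm}}{142\,\mathrm{cm}}\right) \approx 8.0^{\circ}.
\]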

To reduce the likelihood of artifactual cues from any single echo recording, we recorded five “chirp” pulses (3-ms rising frequency sweeps, time dilation factor m = 25, truncated to 1 s length) and the corresponding echoes from the disc for each stimulus position. Additionally, echoes from each stimulus position were recorded with and without the artificial pinnae attached to the microphones. Thus, for each of the six stimulus positions, we had 10 recorded pulse-echo exemplars, for a total of 60 stimuli.

2) Procedure: Sighted participants (N = 13, 4 female, mean age 25.5 y) underwent 20 trials for each of the three spatial discrimination tasks, for a total of 60 trials per session. The trials were shuffled such that the tasks were randomly interleaved. Sound stimuli were presented on a desktop or laptop PC using circumaural headphones (Sennheiser HD202) at a comfortable volume, ∼ 70 dB. No visual stimuli were presented; the screen remained a neutral gray during auditory stimulus presentation. On each trial, the participant listened to a set of three randomly selected 1-s exemplars (pulse-echo recordings) for each of two stimulus conditions. Depending on the spatial task, the participant then followed on-screen instructions to select from two options: whether the second exemplar represented an object to the left or right, nearer or farther, or above or below relative to the echoic object from the first exemplar. Upon the participant's response, a new trial began immediately, without feedback.

B. Methods, Experiment 2

1) Stimuli: Stimuli in Experiment 2 were nearly identical to those in Experiment 1. Because we now provided trial-by-trial feedback, we took a step to prevent participants from improving their performance based on artifactual noise that might be present in our specific stimulus set: we filtered background noise from the original recordings using the spectral noise gating function in the program Audacity (Audacity Team, http://audacity.sourceforge.net/). All other stimulus characteristics remained as in Experiment 1.

2) Procedure: Sighted volunteers (N = 12, 5 female, mean age 23.3 y) were tested on the same spatial discrimination tasks as in Experiment 1. After each response, participants were informed whether they had answered correctly or incorrectly. All other attributes of the testing remained the same as in Experiment 1.

C. Methods, Experiment 3

We conducted a psychophysical localization experiment with one blindfolded sighted user (NH) who had approximately 6 hours of self-guided practice in using the device, largely to navigate the corridors near the laboratory. The task was to localize a plate, ∼ 30 cm (17°) in diameter, held at one of 9 positions relative to the user (see Figure 5). In each of 100 trials, the plate was held at a randomly selected position at a distance of 1 m, or removed for a 10th “absent” condition. Each of the 10 conditions was selected with equal probability. The grid of positions spanned 1 m on a side, such that the horizontal and vertical offsets from the center position subtended ∼ 18°. The participant kept their head motionless during the task. Responses consisted of a verbal report of grid position. After each response the participant was given feedback on the true position. The experiment took place in a furnished seminar room, a cluttered echoic space. Before doing the task the subject had ∼ 20 minutes of informal experience of plate localization on previous days, though not with the same plate locations or device settings as used in the experiment.

Fig. 4. Two-alternative forced-choice spatial localization testing. (a) A diagram of the configurations used to generate stimuli for each of the depth, elevation, and laterality tasks. (b) The fraction of stimuli correctly classified with no feedback provided to subjects (N = 13). Light gray bars indicate results for stimuli recorded with artificial pinnae, while dark gray indicates that pinnae were absent. The dotted line indicates chance performance level. Error bars represent 95% confidence intervals, computed using Matlab's binofit function. Asterisks indicate significant differences from 50% according to a two-tailed binomial test, with Bonferroni-Holm correction for multiple comparisons. (c) The same data as in (b), but with each circle representing the performance of a single subject, and significance on a two-tailed binomial test determined after Bonferroni-Holm correction over 13 subjects. (d) and (e) The same as in (b) and (c), except that after each trial feedback was provided on whether the correct answer was given (N = 12).

Experiment 3 used an earlier hardware configuration than Experiments 1 and 2. The output of the B&K Nexus conditioning amplifier was fed into an NI USB-9201 acquisition device for digitization. Ultrasonic audio was output using an ESI Gigaport HD soundcard. The temporal magnification factor m was set to 20. The backpack used was different, and power was provided by an extension cord. NH did not participate in Experiments 1 or 2.

IV. EXPERIMENTAL RESULTS

A. Experiment 1

Laterality judgments were robustly above chance for both pinnae (mean 75.4% correct, p ≪ 0.001, n = 260, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests) and no-pinnae conditions (mean 86.2% correct, p ≪ 0.001, n = 260, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests), indicating that the binaural echo input produced reliable, intuitive cues for left-right judgments. Depth and elevation judgments, however, proved more difficult; performance on both tasks was not different from chance for the group. The presence or absence of the artificial pinnae did not significantly affect performance in any of the three tasks. Population and single-subject results are shown in Figure 4b-c.
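A sketch of the statistics used throughout the results, assuming Matlab's Statistics and Machine Learning Toolbox (binocdf, binofit); the trial counts below are placeholders, not the measured data:

    % Two-tailed exact binomial test against chance (p = 0.5).
    two_tailed = @(k, n) min(1, 2*min(binocdf(k, n, 0.5), 1 - binocdf(k-1, n, 0.5)));

    k = [196 224 130 128 120 134];        % placeholder correct-trial counts for 6 tests
    n = 260*ones(1, 6);                   % trials per test
    p = arrayfun(two_tailed, k, n);

    % Bonferroni-Holm correction: step through sorted p-values with shrinking thresholds.
    [ps, order] = sort(p);
    reject = false(1, numel(ps));
    for i = 1:numel(ps)
        if ps(i) > 0.05/(numel(ps) - i + 1), break; end
        reject(order(i)) = true;
    end

    [phat, ci] = binofit(k(1), n(1), 0.05);   % 95% CI, as in the Fig. 4 error bars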

B. Experiment 2

Results for laterality and elevation judgments replicated those from Experiment 1: strong above-chance performance for laterality in both pinnae (76.7% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests) and no-pinnae (83.3% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests) conditions. Because there appeared to be little benefit from feedback for these judgments, we conclude that it may be unnecessary for laterality judgments. Performance was still at chance for elevation, indicating that feedback over the course of a single experimental session was insufficient for this task.

However, performance on depth judgments improved markedly over Experiment 1, with group performance above chance for both pinnae (70% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests) and no-pinnae (68.3% correct, p ≪ 0.001, n = 240, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 6 tests) conditions. Performance ranges were also narrower (smaller variance) for depth judgments compared to Experiment 1, suggesting that feedback aided a more consistent interpretation of depth cues. As in Experiment 1, the presence or absence of the artificial pinnae did not significantly affect performance in any of the three tasks. Population and single-subject results are shown in Figure 4d-e.

C. Experiment 3

As illustrated in the spatially arranged confusion matrix of Figure 5b, the correct position was indicated with high probability. Overall performance was 48% correct, significantly greater than the chance performance of 10% (p ≪ 0.001, n = 100, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 4 tests). For all non-absent trials, 72% of localization judgments were within one horizontal or one vertical position of the true target position. Figure 5d shows the confusion matrix collapsed over spatial position to show only the absence or presence of the plate. The present/absent state was reported with 98% accuracy, significantly better than chance (p < 0.001, n = 100, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 4 tests). Figure 5e shows the confusion matrix collapsed over the vertical dimension (for the 93 cases where the plate was present), thus showing how well the subject estimated position in the horizontal dimension. The horizontal position of the plate was correctly reported 56% of the time, significantly above chance performance (p ≪ 0.001, n = 93, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 4 tests). Figure 5f shows the confusion matrix collapsed over the horizontal dimension, thus showing how well the subject estimated position in the vertical dimension. The vertical position of the plate was correctly reported 68% of the time, significantly above chance performance (p ≪ 0.001, n = 93, two-tailed binomial test, Bonferroni-Holm multiple comparison correction over 4 tests).
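A sketch of how the confusion matrices of Figure 5 can be assembled (labels 1-9 index the grid positions and 10 codes "absent"; the trial labels below are placeholders, not the recorded data):

    true_pos = randi(10, 100, 1);                       % placeholder true positions, 100 trials
    resp_pos = randi(10, 100, 1);                       % placeholder verbal responses
    C = accumarray([resp_pos, true_pos], 1, [10 10]);   % rows: reported position, columns: true position
    C = 100 * C ./ max(sum(C, 1), 1);                   % each column sums to 100%, as in Fig. 5c
    overall_correct = mean(resp_pos == true_pos);       % cf. 48% correct in Experiment 3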

V. DISCUSSION

In Experiments 1 and 2, we found that relatively precise spatial discrimination based on echolocation is possible with little or no practice in at least two of three spatial dimensions. Echoic laterality cues were clear and intuitive regardless of feedback, while echoic distance cues were readily discriminable with feedback. Depth judgments without feedback were characterized by very large variability compared to the other tasks: performance ranged from 0 - 100% (pinnae) and 10 - 100% (no-pinnae) across subjects. This suggests the presence of a cue that was discriminable but nonintuitive without trial-by-trial feedback. While we did not vary distances parametrically, as would be necessary to estimate psychophysical thresholds, our results permit some tentative observations about the three-dimensional spatial resolution achieved with artificial echolocation. Dufour et al. [24] tested echolocation in unaided humans. Direct comparison is difficult, as [24] tested absolute rather than relative laterality and included a third “center” condition. However, the reflecting object in that experiment was a large rectangular board subtending 29° × 30°, such that lateral center-to-center separation was 29°, whereas disc positions in the present study were separated by a smaller amount, 19.9°, and presented less than 10% of the reflecting surface. [24] reported ∼ 60 - 65% correct laterality judgments for sighted subjects, which is somewhat less than our measures (they reported ∼ 75% for blind subjects). Hence, this suggests an increase in effective sensitivity when using the artificial echo cues provided by the Sonic Eye.

Fig. 5. Ten-position localization in one trained participant. A subject was asked to identify the position of an ∼ 30 cm plastic plate held at 1 m distance. (a) Schematic illustration of the 10 possible configurations of the plate, including nine spatial locations and a tenth ‘absent’ condition. (b) Spatially arranged confusion matrix of behavioral results. Each sub-figure corresponds to a location of the plate, and the intensity map within each sub-figure indicates the fraction of trials on which the subject reported each position for that plate location. Black corresponds to a subject never indicating a location, and white corresponds to a location always being indicated. Each sub-figure sums to 100%. (c) The same confusion matrix, plotted without regard to spatial position. The x-axis shows the true position and the y-axis shows the reported position. The position labels are A: absent, UL: up-left, UC: up-center, UR: up-right, CL: center-left, CC: center-center, CR: center-right, DL: down-left, DC: down-center, DR: down-right. Grayscale intensity as in (b) indicates the percentage of times each position is reported for a particular true position — each column sums to 100%. (d) Confusion matrix grouped into plate absent and present conditions. (e) Confusion matrix grouped by horizontal position of the plate. (f) Confusion matrix grouped by vertical position of the plate.

Depth judgments were reliably made at a depth difference of 47 cm in Experiment 2, corresponding to an unadjusted echo-delay difference of ∼ 2.8 ms, or ∼ 69 ms with a dilation factor of 25. A 69-ms time delay is easily discriminable by human listeners but was only interpreted correctly with feedback, suggesting that the distance information in the echo recordings, although initially nonintuitive, became readily interpretable with practice. Our signal recordings included complex reverberations inherent in an ecological, naturalistic environment. Thus the discrimination task was more complex than a simple delay between two isolated sounds. The cues indexing auditory depth include not only variation in pulse-echo timing delays, but also differences in overall reflected energy and reverberance, which are strongly distance-dependent. In fact, as cues produced by active echoes, discrete pulse-echo delays are not typically encountered by the human auditory system. Single-subject echoic distance discrimination thresholds as low as ∼ 11 cm (∼ 0.6 ms) have been reported for natural human echolocation [6]. Thus, it is likely that training would improve depth discrimination considerably, especially with time-dilated echo information, in theory down to ∼ 0.5 cm with 25-fold dilation.
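The delay arithmetic behind these figures, assuming a speed of sound of c ≈ 340 m/s and scaling the reported ∼ 11 cm unaided threshold by the dilation factor:

\[
\Delta t = \frac{2\,\Delta d}{c} \approx \frac{2 \times 0.47\,\mathrm{m}}{340\,\mathrm{m\,s^{-1}}} \approx 2.8\,\mathrm{ms},
\qquad
m\,\Delta t \approx 69\,\mathrm{ms}\;\;(m = 25),
\qquad
\Delta d_{\min} \approx \frac{11\,\mathrm{cm}}{m} \approx 0.5\,\mathrm{cm}.
\]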

Performance was low on the elevation task in both pinnae and no-pinnae conditions. It is possible that the echo recordings do not contain the elevation information necessary for judgments at 16° precision. However, our tasks were expressly designed to assess rapid, intuitive use of the echo cues provided, while the spectral cues from a new pinna take time to learn; judgment of elevation depends on pinna shape [14], [25], and weeks of adaptation can be required to recover performance using a modified pinna [22]. Finally, the design and construction of the artificial pinnae used in the present experiment may benefit from refinement to optimize their spectral filtering properties.

In line with these observations, Experiment 3 suggests that both vertical and horizontal localization cues were available to a user with a moderate amount of training. This is qualitatively consistent with previous measures of spatial resolution in blind and sighted subjects performing unaided spatial echolocation tasks [7], [26]. While further research is needed to validate such comparisons and, more generally, to characterize the behavioral envelope of Sonic Eye-aided echolocation, we consider the results presented here encouraging. Specifically, they suggest that performance on behaviorally relevant tasks is amenable to training. Informal observations with two further participants suggest an ability to navigate through hallways, detecting walls and stairs, while using the Sonic Eye blindfolded. A sample of audio and video from the user's perspective is provided in the supplemental video to this manuscript, also available at http://youtu.be/md-VkLDwYzc.

Any practical configuration such as that tested in Experiment 3 should minimize interference between echolocation signals and environmental sounds (e.g., speech or approaching vehicles). To this end, open-ear headphones ensure that the ear remains unobstructed, as described in Section II. However, future testing should include evaluations of auditory performance with and without the device, and training designed to assess and improve artificial echolocation in a naturalistic, acoustically noisy environment.

We note that performance on the experiments reported here likely underestimates the sensitivity achievable by the Sonic Eye for several reasons. For example, all experiments featured a constraint under which the head was fixed, either virtually or in reality, relative to the target object. This would not apply to a user in a more naturalistic context. Additionally, the participants tested were sighted and relatively untrained; blind and visually impaired users may benefit from superior auditory capabilities (e.g., [27], [28], [29]). Finally, ongoing development of the prototype continues to improve the quality of the emitted, received, and processed signal.

VI. SUMMARY AND CONCLUSION

Here we present a prototype assistive device to aid in navigation and object perception via ultrasonic echolocation. The device exploits the advantages of high-frequency sonar signals and time-stretches them into human-audible frequencies. Depth information is encoded in pulse-echo time delays, made available through the time-stretching process. Azimuthal location information is encoded as interaural time and intensity differences between echoes recorded by the stereo microphones. Finally, elevation information is captured by artificial pinnae, mounted on the microphones, that act as direction-dependent spectral filters. Thus, the device presents a three-dimensional auditory scene to the user with high theoretical spatial resolution, in a form consistent with natural spatial hearing. Behavioral results from two experiments with naive sighted volunteers demonstrated that two of three spatial dimensions (depth and laterality) were readily available with no more than one session of feedback/training. Elevation information proved more difficult to judge, but a third experiment with a single moderately trained user indicated successful use of elevation information as well. Taken together, we interpret these results to suggest that while some echoic cues provided by the device are immediately and intuitively available to users, perceptual acuity is potentially highly amenable to training. Thus, the Sonic Eye may prove to be a useful assistive device for persons who are blind or visually impaired.

ACKNOWLEDGMENTS

The authors would like to thank Jeff Hawkins for support and feedback in construction and design, Aude Oliva for comments on an earlier version of this manuscript, David Whitney for providing space, equipment, and oversight for behavioral tests, Ervin Hafter for allowing use of his anechoic chamber, Alex Maki-Jokela for help with hardware debugging and software design, and Jonathan Toomim for work developing and building the next-generation prototype.

REFERENCES

[1] G. Jones, “Echolocation,” Current Biology, vol. 15, no. 13, pp. R484–8, Jul. 2005.
[2] A. Grinnell, “Hearing in bats: an overview,” in Hearing by Bats, 1995.
[3] A. Popper, “Sound emission and detection by delphinids,” in Cetacean Behavior: Mechanisms and Functions, 1980.
[4] W. Au and D. Pawloski, “Cylinder wall thickness difference discrimination by an echolocating Atlantic bottlenose dolphin,” Journal of Comparative Physiology A, 1992.
[5] M. Supa, M. Cotzin, and K. Dallenbach, “‘Facial Vision’: The Perception of Obstacles by the Blind,” The American Journal of Psychology, vol. 57, no. 2, pp. 133–183, 1944.
[6] W. N. Kellogg, “Sonar system of the blind,” Science, vol. 137, no. 3528, pp. 399–404, Aug. 1962.
[7] C. Rice, “Human Echo Perception,” Science, vol. 155, no. 3763, pp. 656–664, 1967.
[8] S. Teng, A. Puri, and D. Whitney, “Ultrafine spatial acuity of blind expert human echolocators,” Experimental Brain Research, vol. 216, no. 4, pp. 483–8, Feb. 2012.
[9] T. Papadopoulos and D. Edwards, “Identification of auditory cues utilized in human echolocation – Objective measurement results,” Biomedical Signal Processing and Control, 2011.
[10] L. Thaler, S. R. Arnott, and M. A. Goodale, “Neural correlates of natural human echolocation in early and late blind echolocation experts,” PLoS One, vol. 6, no. 5, p. e20162, 2011.
[11] O. Hoshino and K. Kuroiwa, “A neural network model of the inferior colliculus with modifiable lateral inhibitory synapses for human echolocation,” Biological Cybernetics, 2002.
[12] B. Schenkman and G. Jansson, “The detection and localization of objects by the blind with the aid of long-cane tapping sounds,” Human Factors: The Journal of the Human Factors and Ergonomics Society, 1986.
[13] J. Brazier, “The Benefits of Using Echolocation to Safely Navigate Through the Environment,” International Journal of Orientation and Mobility, vol. 1, 2008.
[14] J. Middlebrooks and D. Green, “Sound Localization by Human Listeners,” Annual Review of Psychology, 1991.
[15] P. Zahorik, “Auditory distance perception in humans: A summary of past and present research,” Acta Acustica united with Acustica, vol. 91, pp. 409–420, 2005.
[16] H. Riquimaroux, S. Gaioni, and N. Suga, “Cortical computational maps control auditory perception,” Science, 1991.
[17] T. Ifukube, T. Sasaki, and C. Peng, “A blind mobility aid modeled after echolocation of bats,” IEEE Transactions on Biomedical Engineering, 1991.
[18] L. Kay, “Auditory perception of objects by blind persons, using a bioacoustic high resolution air sonar,” The Journal of the Acoustical Society of America, 2000.
[19] P. Mihajlik and M. Guttermuth, “DSP-based ultrasonic navigation aid for the blind,” in Proceedings of the 18th IEEE Instrumentation and Measurement Technology Conference, 2001.
[20] D. Waters and H. Abulula, “Using bat-modelled sonar as a navigational tool in virtual environments,” International Journal of Human-Computer Studies, 2007.
[21] A. King and O. Kacelnik, “How plastic is spatial hearing?” Audiology and Neurotology, 2001.
[22] P. Hofman, J. V. Riswick, and A. V. Opstal, “Relearning sound localization with new ears,” Nature Neuroscience, 1998.
[23] S. Teng, J. Sohl-Dickstein, C. Rodgers, M. DeWeese, and N. Harper, “A device for human ultrasonic echolocation,” in IEEE Workshop on Multimodal and Alternative Perception for Visually Impaired People (MAP4VIP), ICME, 2013.
[24] A. Dufour, O. Despres, and V. Candas, “Enhanced sensitivity to echo cues in blind subjects,” Experimental Brain Research, 2005.
[25] G. Recanzone and M. Sutter, “The biological basis of audition,” Annual Review of Psychology, 2008.
[26] S. Teng and D. Whitney, “The Acuity of Echolocation: Spatial Resolution in Sighted Persons Compared to the Performance of an Expert Who Is Blind,” Journal of Visual Impairment & Blindness, 2011.
[27] N. Lessard, M. Pare, F. Lepore, and M. Lassonde, “Early-blind human subjects localize sound sources better than sighted subjects,” Nature, vol. 395, no. 6699, pp. 278–280, Sep. 1998.
[28] B. Roder, W. Teder-Salejarvi, and A. Sterr, “Improved auditory spatial tuning in blind humans,” Nature, vol. 400, pp. 162–166, 1999.
[29] P. Voss, F. Lepore, F. Gougoux, and R. Zatorre, “Relevance of spectral cues for auditory spatial processing in the occipital cortex of the blind,” Frontiers in Psychology, 2011.