Page 1: Cosc 6326/Psych6750X

Cosc 6326/Psych6750X

Audition and Auditory Displays

Page 2: Cosc 6326/Psych6750X

Use of auditory displays

Page 3: Cosc 6326/Psych6750X

Sound in information display

• speech provides a high bandwidth communication channel

• audition is a long distance sense without field of view restrictions

• Sound is useful for information display (Cohen & Wenzel 1995)
– when origin of message is a sound (voice, music)

Page 4: Cosc 6326/Psych6750X

– when message is simple and short (e.g. event markers)
– when message will not be referred to later (e.g. time)
– when message deals with events in time
– warnings or prompts (hearing is always on, no field of view issues)
– continuously changing information (e.g. countdown)
– when other systems (e.g. vision) are overloaded

Page 5: Cosc 6326/Psych6750X

– when verbal response is required (compatibility)

– when illumination or disability prevents vision (e.g. alarm clock, limited field of view, blindness)

– when the user moves from place to place (sound as a ubiquitous I/O channel)

Page 6: Cosc 6326/Psych6750X

Sonification

• In ‘visualization’ situations, ‘sonification’ of data can assist in the exploration of complex datasets

• In these applications ‘realism’ is typically not a major issue

• Sound can help interpret complex or multidimensional data; can provide an independent display dimension

Page 7: Cosc 6326/Psych6750X

• In addition to information display, in immersive displays sound contributes to:
– realism, situational awareness and presence
– ambience and emotive context
– cueing visual attention
– natural communication
– space perception

Page 8: Cosc 6326/Psych6750X

Realism and ambience

• High quality sound improves perceived ‘quality’ of visual displays

• Sounds in the environment provide vital information that contributes to situational awareness

• Persistence of the sounds of objects outside the field of view may help maintain object permanence

Page 9: Cosc 6326/Psych6750X

• Sound is believed to be vital for conveying emotion and ambience in movies

• Ambient sounds can be realistic or abstract (e.g. music to set mood)

• Absence of appropriate sound degrades realism

Page 10: Cosc 6326/Psych6750X

• If background sounds are not well matched to the visuals, the participant may feel detached – 'presence' may be degraded

• Relation between presence and realism is not straightforward (later lecture)

• Sound is an omni-directional sense and may help the user feel immersed in the VE

• Auditory collision cues may help navigating a VE (especially with HMDs)

Page 11: Cosc 6326/Psych6750X

Audition

Page 12: Cosc 6326/Psych6750X

Sound

• Sound is “mechanical vibrations and waves of an elastic medium, particularly in the frequency range of human hearing (16 Hz to 20 kHz)”

• Normally, the medium is air. Sound is an air pressure wave.

• Sound is usually used to describe the physical stimulus.

Page 13: Cosc 6326/Psych6750X

• Audition refers to perception.

• An auditory event is usually elicited by a sound event.

• A sinusoidal pressure wave is known as a pure tone.

Page 14: Cosc 6326/Psych6750X

[Figure: one cycle of a sinusoid x(t) versus t, showing the period T0 = 1/f0]

• Sinusoid
– x(t) = A cos(2πf0t + φ)
– A is amplitude, f0 is frequency, φ is phase
– T0 = 1/f0 is the period; φ is related to the time shift of the peak
– wavelength λ = c/f
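
To make the relationship between these parameters concrete, here is a minimal Python/NumPy sketch (not from the slides) that synthesizes such a pure tone; the sample rate, frequency and duration are arbitrary choices:

```python
import numpy as np

fs = 44100           # sample rate in Hz (arbitrary choice)
f0 = 440.0           # frequency of the pure tone in Hz
A = 0.5              # amplitude
phi = 0.0            # phase in radians (shifts the peak in time)
dur = 1.0            # duration in seconds

t = np.arange(int(fs * dur)) / fs            # time axis
x = A * np.cos(2 * np.pi * f0 * t + phi)     # x(t) = A cos(2*pi*f0*t + phi)

T0 = 1.0 / f0        # period of the tone in seconds
c = 343.0            # speed of sound in air (m/s), approximate
wavelength = c / f0  # lambda = c / f
```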

Page 15: Cosc 6326/Psych6750X
Page 16: Cosc 6326/Psych6750X

Dimensions of sound

• Harmonic content: pitch, melody, harmony, waveshape, timbre, vibrato
• Timing: duration, tempo, rhythm
• Loudness, envelope
• Spatial: azimuth, elevation, distance
• Ambience: resonance, reverberation, spaciousness
• Representation: literal, auditory icons, abstract

Page 17: Cosc 6326/Psych6750X

• Perceptual and physical dimensions are analogous but distinct
– pitch and frequency (directly related for pure tones)
– loudness and intensity
– timbre and complexity

Page 18: Cosc 6326/Psych6750X

Matlin and Foley, Sensation and Perception

Page 19: Cosc 6326/Psych6750X

Kandel et al, Principles of Neural Science

Page 20: Cosc 6326/Psych6750X

Physiology and psychophysics

• Cochlea performs mechanical spectral analysis of sound signal

• Pure tone induces a traveling wave in the basilar membrane.
– maximum mechanical displacement along the membrane is a function of frequency (place coding)

• Displacement of the basilar membrane changes with compression and rarefaction (frequency coding)

Page 21: Cosc 6326/Psych6750X

Matlin and Foley, Sensation and Perception

Kandel et al, Principles of Neural Science

Page 22: Cosc 6326/Psych6750X

Perception of pitch

• Along the basilar membrane, hair cell response is tuned to frequency
– each neuron in the auditory nerve responds to acoustic energy near its preferred frequency
– preferred frequency is place coded along the cochlea. Frequency coding is believed to play a role at lower frequencies

• Higher auditory centers maintain frequency selectivity and are ‘tonotopically mapped’

Page 23: Cosc 6326/Psych6750X

• Pitch is related to frequency for pure tones.

• For periodic or quasi-periodic sounds the pitch typically corresponds to the inverse of the period

• Some sounds have no perceptible pitch (e.g. clicks, noise)

• Sounds can have the same pitch but different spectral content, temporal envelope … timbre

Page 24: Cosc 6326/Psych6750X

Perception of loudness

• Intensity is measured on a logarithmic scale in decibels

• Range from threshold to pain is about 120 dB-SPL

• Loudness is related to intensity but also depends on many other factors (attention, frequency, harmonics, …)
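
As a short worked example (not in the slides), sound pressure level in dB SPL is 20·log10(p/p0) with reference pressure p0 = 20 µPa, so the roughly 120 dB range from threshold to pain corresponds to a millionfold range of pressure amplitude:

```python
import numpy as np

P_REF = 20e-6  # reference pressure in pascals (approximate threshold of hearing)

def spl_db(p_rms):
    """Sound pressure level in dB SPL for an RMS pressure p_rms in pascals."""
    return 20.0 * np.log10(p_rms / P_REF)

print(spl_db(20e-6))  # ~0 dB SPL, threshold of hearing
print(spl_db(20.0))   # ~120 dB SPL, around the threshold of pain
```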

Page 25: Cosc 6326/Psych6750X

Spatial hearing

• Auditory events can be perceived in all directions from observer

• Auditory events can be localized internally or externally at various distances

• Audition also supports motion perception
– change in direction
– Doppler shift
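
As a small illustration of the Doppler cue (not from the slides), the frequency heard from a source moving directly toward a stationary listener is shifted to roughly f·c/(c − v):

```python
C = 343.0  # speed of sound in air (m/s), approximate

def doppler_shift(f_source, v_source):
    """Perceived frequency for a source moving directly toward (+v) or
    away from (-v) a stationary listener at speed v_source in m/s."""
    return f_source * C / (C - v_source)

print(doppler_shift(440.0, 20.0))   # approaching: pitch rises (~467 Hz)
print(doppler_shift(440.0, -20.0))  # receding: pitch falls (~416 Hz)
```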

Page 26: Cosc 6326/Psych6750X

• Ability to localize depends on the sound source and environment
– a tone in a reverberant room is difficult to locate in time and space
– a click in an anechoic chamber, on the other hand, is precisely located and time limited

Page 27: Cosc 6326/Psych6750X

Auditory Scene Analysis

• Process of separating out the different sources present in the environment

• Detection and segregation of distinct sources

• Grouping of sounds in spatial and temporal proximity into single streams

Page 28: Cosc 6326/Psych6750X

Cocktail party effect

• In environments with many sound sources it is easier to process auditory streams if they are separated spatially

• Spatial sound techniques can help in sound discrimination, detection and speech comprehension in busy immersive environments

Page 29: Cosc 6326/Psych6750X

Spatial Auditory Cues

• Two basic types of head-centric direction cues
– binaural cues
– spectral cues

Page 30: Cosc 6326/Psych6750X

Binaural Directional Cues

• When a source is located eccentrically it is closer to one ear than the other
– sound arrives later and weaker at one ear
– head 'shadow' also weakens the sound arriving at the opposite ear

• Binaural cues are robust but ambiguous

Page 31: Cosc 6326/Psych6750X

http://headwize.com/tech/aureal1_tech.htm

Page 32: Cosc 6326/Psych6750X

• Interaural time differences (ITD)
– ITD increases with directional deviation from the median plane. It is about 600 µs for a source located directly to one side.
– Humans are sensitive to as little as 10 µs of ITD. Sensitivity decreases with increasing ITD.
– For a given ITD, phase difference is a linear function of frequency
– For pure tones, phase-based ITD is ambiguous

Page 33: Cosc 6326/Psych6750X

– At low to moderate frequencies the phase difference can be detected. At high frequencies the ITD of the signal envelope can be used.

– ITD cues appear to be integrated over a window of 100–200 ms (binaural sluggishness, Kollmeier & Gillkey, 1990)
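
As a rough illustration (not part of the slides), a rigid spherical-head approximation such as Woodworth's formula, ITD ≈ (r/c)(θ + sin θ), reproduces the ~600 µs value quoted above for a source directly to one side; the head radius used here is an assumed nominal value:

```python
import numpy as np

HEAD_RADIUS = 0.0875  # m, nominal head radius (assumption)
C = 343.0             # m/s, speed of sound in air (approximate)

def itd_woodworth(azimuth_deg):
    """Approximate ITD in seconds for a distant source at the given azimuth
    (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = np.radians(azimuth_deg)
    return (HEAD_RADIUS / C) * (theta + np.sin(theta))

print(itd_woodworth(90.0) * 1e6)  # roughly 650 microseconds
```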

Page 34: Cosc 6326/Psych6750X

• Interaural intensity differences (IID)
– With lateral sources, head shadow reduces intensity at the opposite ear
– Effect of head shadow is most pronounced for high frequencies
– IID cues are most effective above about 2000 Hz
– IIDs of less than 1 dB are detectable. At 4000 Hz a source located at 90° gives about 30 dB IID (Matlin and Foley, 1993)

Page 35: Cosc 6326/Psych6750X

Goldstein, Sensation and Perception

Ambiguity and Lateralization

Page 36: Cosc 6326/Psych6750X

Ambiguity and Lateralization

• These binaural cues are ambiguous. The same ITD/IID can arise from sources anywhere along a 'cone of confusion'

• Spectral cues and changes in ITD/IID with observer/object motion can help disambiguate

• When directional cues are used in headphone systems, sounds are lateralised left versus right but seem to emanate from inside the head (not localised)

Page 37: Cosc 6326/Psych6750X

• also for near sources (less than 1 m) there is significant IID due to differences in distance to each ear even at lower frequencies (Shinn-Cunningham et al 2000)

• Intersection of these ‘near field’ IID curves with cones of confusion constrains them to toroids of confusion

Page 38: Cosc 6326/Psych6750X

Spectral Cues

• The pinnae (outer ears) and head shadow each ear, creating frequency-dependent attenuation of sounds that depends on the direction of the source

• Because the pinnae are relatively small, spectral cues are effective predominantly at higher frequencies (i.e. above 6000 Hz)

Page 39: Cosc 6326/Psych6750X

• Direction estimation requires separation of spectrum of sound source from spectral shaping by the pinnae

• The shape of the pinnae shows large individual differences, which are reflected in differences in spectral cues

Page 40: Cosc 6326/Psych6750X

Distance Cues

• anechoic
– intensity decreases with distance
– attenuation is higher at high frequency
– confounded with the spectrum and intensity of the source

• Near-field IID

http://headwize.com/tech/aureal1_tech.htm

Page 41: Cosc 6326/Psych6750X

http://headwize.com/tech/aureal1_tech.htm

Page 42: Cosc 6326/Psych6750X

• reverberation
– ratio of direct to reverberant energy indicates distance with respect to the environment
– reverberation pattern indicates 'spaciousness' of the environment
– reverberation is more realistic but can degrade localisation, speech recognition …

Page 43: Cosc 6326/Psych6750X

Visual-Auditory Interactions

• Auditory cues associated with visual targets can cue visual attention

• Latency for audition is less than for vision

• A sound associated with a visual target
– can speed visual search
– can reduce response times
– can facilitate saccadic eye movements
– can cue attention outside the field of view

Page 44: Cosc 6326/Psych6750X

• Ventriloquism and visual capture
– When a visual and auditory source are grouped, the sound is usually perceived in the direction of the visual target

Page 45: Cosc 6326/Psych6750X

Auditory/Aural Displays

Page 46: Cosc 6326/Psych6750X

• Headphone displays
– Precise independent control of inputs to each ear.
– Individual display.
– Closed-ear type can exclude external sounds. Reduces interference from external sources; simplifies AR systems.
– Entail an encumbrance.
– Diotic, dichotic (stereo) and spatialised displays
– Head-fixed frame of reference. Display needs to be head tracked to register with the virtual world.

Page 47: Cosc 6326/Psych6750X

• Speaker systems
– Simpler, less encumbrance, multi-user
– Cannot 'occlude' real-world sounds but can sometimes mask them
– Complications with echoes and cross-coupling between channels
– Interference from/with visual displays
– World frame of reference.
– Subwoofer allows for deep bass. Could augment headphones.

Page 48: Cosc 6326/Psych6750X

Spatialised audio

• simple ITD, IID cues in a display lateralize a sound. Sound is not ‘externalized’

• spatialised audio: generate most of the spatial cues present in a real-world environment using signal processing

• with appropriate modeling of sound sources and user tracking, this can provide a compelling illusion of spatial sound in a VE
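
As a toy contrast with full spatialisation (not the course's code), applying only a fixed interaural delay and level difference to a mono signal lateralizes it toward one ear without externalizing it; the ITD and IID values below are arbitrary:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
mono = 0.3 * np.cos(2 * np.pi * 500.0 * t)   # 1 s, 500 Hz test signal

itd = 600e-6   # interaural time difference in seconds (arbitrary)
iid_db = 10.0  # interaural intensity difference in dB (arbitrary)

delay = int(round(itd * fs))                  # delay in samples at the far ear
gain = 10.0 ** (-iid_db / 20.0)               # attenuation at the far ear

left = mono
right = np.concatenate([np.zeros(delay), mono[:-delay]]) * gain
stereo = np.stack([left, right], axis=1)      # lateralized, but heard inside the head
```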

Page 50: Cosc 6326/Psych6750X

• Head-related transfer function (HRTF)
– describes how sound at a given location is transformed (by the pinnae etc.) as it travels to the ear, as a function of frequency
– a function of source direction, distance and frequency (4D)
– equivalent to the Fourier transform of the response to an impulse source at the desired position

Page 51: Cosc 6326/Psych6750X

– IID and ITD as well as spectral cues are incorporated (interaural differences in HRTF)

Page 52: Cosc 6326/Psych6750X

Shilling & Shinn-Cunningham 2001 (source distances of 0.15 m and 1.0 m)

Page 53: Cosc 6326/Psych6750X

• To simulate a source at a given location
– correct the HRTF for the response of the speaker system
– convolve the source with the impulse response corresponding to the corrected HRTF
– multiple sources are possible by adding up the HRTF-transformed signals
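
A minimal sketch of this rendering step (a hypothetical example, not the course's implementation), assuming left/right head-related impulse responses for each source position have already been measured and corrected for the playback system:

```python
import numpy as np

def render_source(mono, hrir_left, hrir_right):
    """Spatialise one mono source by convolving it with the corrected
    left- and right-ear impulse responses for its position."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=1)

def render_scene(sources):
    """Mix several sources by summing the HRTF-transformed signals per ear.
    `sources` is a list of (mono, hrir_left, hrir_right) tuples."""
    rendered = [render_source(*s) for s in sources]
    n = max(r.shape[0] for r in rendered)
    mix = np.zeros((n, 2))
    for r in rendered:
        mix[:r.shape[0]] += r
    return mix
```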

Page 54: Cosc 6326/Psych6750X

• To measure an HRTF
– place microphones in the ear canals
– measure microphone response to short clicks at various locations
– correct for the response characteristics of the microphones

• Lengthy, painstaking process.

• Storage requirements for dense sampling

Page 55: Cosc 6326/Psych6750X

Cohen and Wenzel, 1995

Page 56: Cosc 6326/Psych6750X

• Limitations in practice:
– sampling: often one distance and a limited number of directions
– interpolated for other locations
– generic versus individualized HRTFs (front/back confusion and elevation errors)
– HRTF is a characteristic of the user and does not model effects of the environment
– need to track head position. Delay can be problematic.

Page 57: Cosc 6326/Psych6750X

HRTF measurement using model head (KEMAR)

Page 58: Cosc 6326/Psych6750X

Room Modeling

• Can model the effects of reverberation, echoes etc. as a room transfer function
– varies with listener and source position
– can have a very long response
– combinatorially impractical

• There has been effort to develop efficient methods for acoustic modeling of rooms

• Improves realism and distance estimation but is difficult for real-time immersive VEs

Page 59: Cosc 6326/Psych6750X

Shilling & Shinn-Cunningham 2001

Page 60: Cosc 6326/Psych6750X

Speaker Systems

• Spatialised audio is complicated by the fact that both ears hear each speaker and that reverberation will occur

• Effectiveness is sensitive to speaker placement

• Stereo speakers: sound seems to be localised between the speakers

Page 61: Cosc 6326/Psych6750X

• increasing the number of speakers increases the ability to localise sounds (e.g. 5.1 surround sound systems)

• more complex schemes are possible using DSP but are very challenging ('ambisonics')
– cancel interaural cross-talk based on the HRTF corresponding to the speaker location
– computations are complex, not robust and must be done in real time if head tracked

Page 62: Cosc 6326/Psych6750X

Auditory Rendering

• Auditory modeling/rendering of VEs
– sampling
– synthesis of complex sounds
  • spectral
  • physical models
  • granular synthesis
– Filtering: HRTFs, reverberation, room modeling
– Object occlusion, air absorption, Doppler motion