
Jan 02, 2016

Page 1: The Evolving Quality of Telephonic Speech Richard A. Thompson Emeritus Professor Telecom Program University of Pittsburgh rthompso@pitt.edu Why VoIP's.

The Evolving Quality of Telephonic Speech

Richard A. Thompson

Emeritus Professor, Telecom Program

University of Pittsburgh

rthompso@pitt.edu

Why VoIP's speech quality is disappointing, and how it wouldn't have to be.


Outline

1. Introduction

2. Human capacity for aural quality

3. History of evolving & devolving quality

4. Network integration vs app quality

5. High-fidelity Voice-over-IP


1. Introduction

• Telecom technology has benefited the human species.
  – Thanks to Morse, Bell, Tesla, and Zworykin, we communicate over distance,
  – But their inventions greatly reduced aural & visual quality.

• During the last century, successive technologies …
  – Raised many aspects of the original audio & video quality,
  – But also lowered other aspects of app quality.

• Two examples of lowered quality:
  1. Successive technologies reduced audio bandwidth.
  2. Pixel-block “dance” after noisy or lost internet packets.

• This talk discusses the devolution of audio quality
  – And concludes that we don’t have to live with it.


Gucci Family Slogan

“Quality is remembered … long after the price is forgotten”

$895

$1950


2. Human Capacity for Aural Quality

• Anatomy, physics, physiology, & brainware
  – of human speech and hearing
  – How we discriminate phonemes & recognize speakers

• Section Outline
  1. Review of Human Speech
  2. Review of Human Hearing
  3. Review of Aural Processing


Review of Human Speech

• Speech = complex acoustic signal that humans emit & receive
  – Sequence of air compressions & rarefactions;
  – Travels at about 770 mph.

• Speaking requires a complex structure:
  – By modulating an exhaled air stream, we emit sequences of elementary sounds, called phonemes.

• If we partly close our larynx as we exhale,
  – our “vocal cords” vibrate at a fundamental pitch, f1 = 80 to 350 Hz,
  – depending on the speaker’s size, shape, gender, & age.

• Altering tension changes f1 to any value between half and double its regular pitch;
  – for singing and linguistic cues.


Variable Acoustic Filter

• Acoustic waveform at the larynx resembles a saw-tooth rich in harmonics.

• Mouth is a variable resonant cavity;
  – It acts as a tunable acoustic filter.

• By changing our mouth’s internal shape,
  – we attenuate different harmonics as they pass through.

• Our two main techniques are:
  – Change our tongue position,
  – Switch our nasal cavity in/out using our uvula.

• Each phoneme has a different “recipe”
  – of the weights of the harmonics.

[Figure: panels labeled ee, aa, ee, and nn]
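The “recipe” idea on this slide can be sketched numerically. The snippet below is only an illustration under stated assumptions: the larynx source is modeled as an ideal sawtooth (nth harmonic weighted 1/n), the mouth as one simple resonance peak per formant, and the formant values for ee and aa are approximate textbook figures, not numbers from the talk. The function names are hypothetical.

```python
def harmonic_weights(f0, formants, bandwidth=100.0, f_max=4000.0):
    """Return {harmonic frequency in Hz: relative weight} for one sustained sound.

    Source: sawtooth-like larynx wave, nth harmonic amplitude proportional to 1/n.
    Filter: one simple resonance peak per formant (an illustrative shape,
    not a measured vocal-tract response).
    """
    weights = {}
    n = 1
    while n * f0 <= f_max:
        f = n * f0
        source = 1.0 / n
        gain = sum(1.0 / (1.0 + ((f - fc) / bandwidth) ** 2) for fc in formants)
        weights[f] = source * gain
        n += 1
    return weights


def strongest_above(weights, f_min):
    """Frequency of the heaviest harmonic at or above f_min."""
    return max((f for f in weights if f >= f_min), key=weights.get)


# Assumed (approximate) formant pairs in Hz for two vowels.
EE = (270.0, 2290.0)
AA = (730.0, 1090.0)

ee = harmonic_weights(100.0, EE)
aa = harmonic_weights(100.0, AA)
print(strongest_above(ee, 1000.0), strongest_above(aa, 1000.0))
```

With the same 100-Hz source, the two filter settings emphasize different harmonics above 1 kHz, which is the sense in which each phoneme has its own recipe of weights.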


Taxonomy of English phonemes

Type                               unvoiced   voiced
Vowel-like: mouth                             vowels, ll, rr
Vowel-like: nose                              mm, nn, ng
Vowel-like: diphthongs                        ow, long-i, …
Fricatives (sustained turbulence)  hh         wh
                                   ss         zz
                                   sh         zh
                                   ff         vv
Plosives (burst turbulence)        ch         j
                                   k          g
                                   p          b
                                   t          d

• Sustained phonemes:
  – vowels, ll, rr,
  – nasals,
  – fricatives.

• Dynamic phonemes:
  – Slowly: diphthongs
  – Quickly: plosives

• Last eight rows:
  – 8 different mouth positions,
  – 2 phonemes per position,
  – by vibrating the larynx or not.


Mouth-to-Ear Spectrum

• Runs from f1 to our hearing limit of 14 - 20 kHz,
  – depending on the listener’s age, etc.

• Acoustic energy in different phonemes
  – is distributed differently over the aural spectrum.

• For example, fricatives like ss
  – have significant energy at the high end of the spectrum.

• Hearing accuracy is
  – a non-linear function of how much of this spectrum is actually heard.


Review of Human Hearing

• Ear drum, in each ear,
  – is AC-coupled (the Eustachian tube maintains DC)
  – to the cochlea by tiny linked bones.

• Cochlea is a horn, wrapped into a snail-shell,
  – filled with fluid, lined with small hairs.

• The acoustic signal
  – causes standing waves inside the cochlea to excite nerves at the base of each hair.
  – These nerves transmit a parallel signal to the brain, giving the weights of the signal’s harmonics.

• Cochlea & its driver (in brainware) compute* the
  – Fourier Series coefficients of the received acoustic signal.


*Color code for what we think happens
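As a rough illustration of what “computing the Fourier Series coefficients” means, the sketch below recovers the harmonic weights of one period of a sampled periodic signal with a plain discrete Fourier sum. It is a minimal sketch, not a model of the cochlea; the test signal and the function name `fourier_weights` are assumptions made for the example.

```python
import cmath
import math


def fourier_weights(samples, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics for one period of a sampled signal.

    Valid for harmonics below half the number of samples.
    """
    n_samples = len(samples)
    weights = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(samples)
        ) / n_samples
        # Factor 2 folds in the matching negative-frequency term.
        weights.append(2.0 * abs(coeff))
    return weights


# Hypothetical test signal: fundamental of amplitude 1.0 plus a 3rd harmonic of 0.5.
N = 64
signal = [
    1.0 * math.sin(2 * math.pi * n / N) + 0.5 * math.sin(2 * math.pi * 3 * n / N)
    for n in range(N)
]
weights = fourier_weights(signal, 4)
print([round(w, 3) for w in weights])
```

The returned list is the kind of “parallel signal” the slide describes: one weight per harmonic, independent of how the waveform looked in time.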


Hearing Brain-Ware

• Behind this driver, mid-level brainware does more processing:
  1. Calculates acoustic directionality,
  2. Selects the desired signal out of background noise,
  3. Performs phoneme discrimination (independent of the speaker),
  4. Identifies who the speaker is (independent of the phoneme).

• Last 3 tasks are supported by
  – high-level syntactic & semantic processing which,
  – at even higher levels of brainware,
  – depends on content, context, background, and emotional state.

• This paper deals only with low- and mid-level brainware,
  – which performs the last two tasks on the list above.

[Figure: block diagram with blocks labeled Ear HW, ED, AD, NF, PD, and SI]


Review of Aural Processing

• Mid-level brainware identifies speakers
  – by comparing the set of weights, received from the driver,
  – against a speaker database.

• Our accuracy at finding a best match is a
  – nonlinear function of how many weights the speaker-identifier process receives from the driver.
  – This number of coefficients depends on how much acoustic spectrum is heard by the cochlea & its driver.

• We discriminate phonemes more indirectly.
  – The spectral envelope of most phonemes has four relative maxima, called “formant frequencies,” F1 to F4.


Formant Frequencies

• F1 and F2 peaks for ee and aa can be seen
  – in the frequency domain.

• Generalized time-domain diagrams
  – of F1 and F2 for 21 phoneme-pairs,
  – each a dynamic consonant that elides into a vowel.

[Figure: spectra for ee and aa with f1, F1, and F2 marked]


Formants for Vowels

• Spectral position of these formants, especially F1 and F2,
  – is the most important cue in phoneme discrimination.
  – But it’s complex because formant positions are speaker dependent.

• Each point is an [F1, F2] value for
  – 76 speakers of 10 sustained phonemes.
  – Clusters show the intended phoneme.
  – Proximity between clusters means potential error without more spectrum.

• E.g., the upper-left cluster is ee.
  – Low F1 & high F2 give it a consistent spectrum.
  – High probability that ee is interpreted as short-i.

[Figure: scatter plot of [F1, F2] values; the ee cluster is labeled]


Phoneme Discrimination

• We discriminate phonemes in mid-level brainware by:
  1. Computing formants from weights received from driver,
  2. Comparing Fs against a database that works like …

• Our accuracy at finding the best match is a
  – nonlinear function of how many formants the phoneme-discriminator has available.
  – This # of formants depends on how much of the acoustic spectrum is heard by cochlea & driver.

• We have a mirrored set of multilevel processes
  – in the speaker’s brainware also.
  – These communicating processes translate thoughts into language,
  – then to a sequence of neural signals that control our mouth parts.
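The matching step in the first bullet above can be sketched as a nearest-match lookup. This is a hedged illustration only: the “database” is four assumed, approximate average formant pairs (not data from the talk), distance is measured on a log-frequency scale by assumption, and the names `CENTROIDS` and `classify` are hypothetical.

```python
import math

# Assumed, approximate average [F1, F2] values in Hz (illustrative only).
CENTROIDS = {
    "ee": (270.0, 2290.0),
    "short-i": (390.0, 1990.0),
    "aa": (730.0, 1090.0),
    "oo": (300.0, 870.0),
}


def classify(f1, f2=None):
    """Return the phoneme whose stored formants are closest to the input.

    If f2 is None, only F1 is compared, mimicking a channel that
    delivers fewer formants to the phoneme-discriminator.
    """
    best_name, best_dist = None, math.inf
    for name, (c1, c2) in CENTROIDS.items():
        d1 = math.log(f1 / c1)
        d2 = 0.0 if f2 is None else math.log(f2 / c2)
        dist = math.hypot(d1, d2)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


print(classify(290.0, 2250.0))  # both formants available
print(classify(290.0))          # F1 only
```

With both formants the sample matches ee; with F1 alone it falls closer to oo, a small example of accuracy depending on how many formants are available.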


3. Technology’s Impact on Quality

• After listing components of aural quality,
  – we review successive technologies and how they
  – raised some aspects of audio quality and lowered others.

• After discussing their effect on
  – speaker identification and phoneme discrimination,

• We review the history of the complaint that technology
  – should never lower any aspect of application quality.

• Section Outline
  1. Aural Quality and its Impairments
  2. Identifying Phonemes and Speakers
  3. The History of the Complaint


Aural Quality & its Impairments

• Quality of a natural acoustic signal is measured by its:
  – Intensity (loudness),
  – Purity (nothing else added),
  – Immediacy (un-delayed),
  – Clarity (undistorted), &
  – Fidelity (defined next).

• By definition, Fidelity measures an audio signal’s
  – faithfulness to its acoustic analog.
  – We’ll defer to the lay definition that it implies high bandwidth.


Natural Impairments

• Natural acoustic signals suffer 5 impairments:
  – loss,
  – noise,
  – crosstalk,
  – delay, &
  – echo.

This figure will grow downward on the following slides.


Pros & Cons of Analog Networks

• The role of any network is to eliminate natural loss.
  – Usually replaces large acoustic delay by small signal delay.
  – May also reduce crosstalk & echo.

• Analog networks add crosstalk from the loop pair
  – and echo from impedance mismatch and leaky hybrids.

• And they add new impairments, not seen in natural signals:
  – Amplitude distortion from amplifiers that clip,
  – Band-restriction & frequency distortion from wire reactance,
  – Delay distortion because frequency components have different velocities.


Fidelity in Analog Networks

• 500-sets
  – Cut f1 off at low end
  – Had 12 kHz of bandpass.
  – (modern phones have no reason to provide that much BW)

• If phones are connected in a local call,
  – loop limits end-to-end bandpass to 8-10 kHz, depending on loop length.

• In long-distance calls,
  – network further limits bandpass to 4-6 kHz, depending on distance.

• 4-kHz analog LD channel had poorest fidelity, but…
  – Bell System “spun” the term “toll grade” to imply high quality.

• Note: upper limit of all BWs is given as “3-dB frequency;”
  – There is significant audio power outside these formal limits.


Subsequent Analog improvements

• Analog technology advancements in:
  – Channels (fiber),
  – Amplifiers,
  – Echo cancellers,
  – Shielding, &
  – Noise filters;

• But, not band-restriction,
  – nor the other two forms of distortion.

• Biggest improvement comes from going digital.

Improved:
  • loss,
  • noise,
  • crosstalk,
  • delay,
  • echo, &
  • amplitude distortion.


Pros & Cons of Digital Networks

• Digitizing an audio signal greatly improves intensity.

• And, a digital PSTN is virtually noise-free.
  – Even loop noise (assume ADC in CO) is partially blocked on speaker side by ADC anti-alias filter.

• But, new noise is added by:
  – quantizing, companding, mu-to-A conversion, & bit errors.

• And, echo is worse because digital transport is 4-wire,
  – which requires many more hybrids (which can leak) in the network.

Note that a digital network is embedded inside an analog network.


Fidelity in Digital Networks

• By far, the worst impairment is that
  – anti-aliasing filters in the A-to-D converters impair fidelity,
  – So, all calls are nominally as band-limited as LD analog calls.

• Fidelity is even perceptibly lower than “nominal”
  – because blocking all audio above 4 kHz
  – requires a half-power point at 3.7 kHz &
  – high-end drop-off that is much steeper than in analog networks.

• So, digital calls have better SNR than analog calls;
  – But a local digital call has perceptibly lower fidelity than even a long-distance analog call.

• For example,

[Figure: frequency response from 0 to 4 kHz, comparing the analog and digital channels]
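The “much steeper” drop-off can be put in rough numbers. The sketch below assumes a Butterworth low-pass response and a 30-dB stop-band target, neither of which comes from the talk (real telephone filters use other designs), and asks what filter order is needed to fall from the 3.7-kHz half-power point to the target by 4 kHz. For contrast it repeats the calculation with a hypothetical, more relaxed stop-band edge at 8 kHz.

```python
import math


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2.0


def butterworth_order(f_cutoff_hz, f_stop_hz, stop_atten_db):
    """Smallest Butterworth order with -3 dB at f_cutoff_hz and at least
    stop_atten_db of attenuation at f_stop_hz.

    Uses |H(f)|^2 = 1 / (1 + (f / f_cutoff)^(2n)).
    """
    ratio = f_stop_hz / f_cutoff_hz
    n = math.log(10.0 ** (stop_atten_db / 10.0) - 1.0) / (2.0 * math.log(ratio))
    return math.ceil(n)


limit = nyquist_limit_hz(8000)                       # DS0 sampling rate
steep = butterworth_order(3700.0, 4000.0, 30.0)      # assumed 30 dB by 4 kHz
relaxed = butterworth_order(3700.0, 8000.0, 30.0)    # hypothetical relaxed edge
print(limit, steep, relaxed)
```

Under these assumptions the filter that must finish its roll-off by 4 kHz needs an order roughly nine times higher than the relaxed one, which is why the digital channel's upper edge is so abrupt compared with the gradual roll-off of an analog path.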


VoIP’s Cons

• VoIP further impairs digital audio quality.

• Audio purity is further impaired because:
  – speech compression exaggerates bit errors,
  – noticeable clunks from lost packets (packet loss: 0, 1%, 5%),
  – silence-detecting codecs’ slow-start may clip a leading plosive.

Note that a packet network is embedded inside a digital network.

[Figure: chain of transducers and nested networks between the transmitted and received acoustic signals, annotated as follows]
  – Transmitted acoustic signal: with given intensity, purity, clarity, & fidelity.
  – Natural medium: impaired by natural loss, noise, crosstalk, delay, & echo.
  – Analog network: deletes natural loss & delay; reduces natural noise, crosstalk, & echo. Further impaired by analog loss, noise, crosstalk, delay, echo, band-loss, & 3 distortions.
  – Digital network: reduces analog loss, echo, noise, crosstalk, & 3 distortions. Further impaired by quantization noise, bit-error noise, & more band-loss from the anti-aliasing filter.
  – Packet network: retains all digital network impairments. Exacerbates bit-error noise. Adds more delay, which can increase echo.
  – Received acoustic signal: with poorer intensity, purity, immediacy, clarity, & fidelity.


VoIP & Delay

• Immediacy is greatly impaired by delays caused by:
  – Packetization, jitter buffers, router processing, & multi-hop packet retransmission.

• VoIP calls often exceed
  – user acceptance of conversation interaction delay.

• User opinions below are my “compromise”
  – between Bell System standards & IETF standards.

  Round-Trip Delay   Opinion
  < 150 ms           good
  150-300 ms         noticeable
  300-450 ms         annoying
  > 450 ms           unacceptable
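The opinion table above translates directly into a lookup. In this sketch the handling of the exact boundary values (150, 300, 450 ms) is an assumption, and the one-way delay budget is made of hypothetical numbers chosen only to exercise the function.

```python
def delay_opinion(round_trip_ms):
    """Map a round-trip delay in ms to the opinion scale in the table above."""
    if round_trip_ms < 150:
        return "good"
    if round_trip_ms <= 300:
        return "noticeable"
    if round_trip_ms <= 450:
        return "annoying"
    return "unacceptable"


# Hypothetical one-way delay contributions in ms (not measured values).
one_way_budget_ms = {
    "packetization": 20,
    "jitter buffer": 40,
    "router processing": 10,
    "propagation": 30,
}
round_trip_ms = 2 * sum(one_way_budget_ms.values())
print(round_trip_ms, delay_opinion(round_trip_ms))
```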


VoIP & Echo

• Acoustic echo
  – is eliminated by wearing a headset.

• Electrical echo is much worse because:
  1. VoIP-PSTN gateways are more problematic than D-to-A gateways because the echo canceller is far from the echo source (hybrid).
  2. User sensitivity to echo depends on the individual, echo-to-signal ratio (TELR), & one-way delay.

• Since a digital conversation’s TELR ≈ 55 dB,
  – one-way delay must be < 200-500 ms; but it’s often > 200 ms.
  – Large delay reduces the effectiveness of electronic echo cancellers.

• Summarizing, VoIP-to-POTS &, especially, VoIP-to-cell calls
  – are often characterized by annoying echo.


Summarizing…

• Digitizing speech
  – Improves intensity & purity;
  – But noticeably degrades fidelity.
  – Overall, digital is perceived as “better than” analog;
  – But it could be much better.

• VoIP makes no positive contribution;
  – VoIP only lowers the quality.
  – The last section proposes how we might change this.


Identifying Phonemes and Speakers

• “Telephone voice” impairs our ability to
  – hear what a speaker says & identify who the speaker is.

• 4-kHz DS0 channel has enough BW for F1 & F2,
  – so we have little difficulty identifying vowels, ll, and rr.

• Hearing the 3rd and 4th formants would:
  – Slightly improve discrimination of these sounds,
  – Greatly improve discrimination of fricatives & plosives.

• A low F3 passes over a DS0 channel;
  – But a high F3 will not, and F4 will not.


More Bandwidth, Better Phoneme Discrimination

• We need a 7-kHz channel to receive all four formants,

• & > 7 kHz for sounds we typically struggle with:
  – nasals (distinguishing mm and nn),
  – plosives (distinguishing k and t),
  – fricatives (distinguishing ss and ff).

• Experiment: ff was spoken to many listeners over 3 channels:

                 Identified as:
  Chan BW        ff    th    p    other
  200-5000 Hz    194   35    6    9
  200-2500 Hz    186   31    6    13
  1000-5000 Hz   162   28    12   50
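To compare the three channels in the table above on one scale, the counts can be turned into the fraction of listeners who heard ff correctly. The sketch below only restates the table's numbers; the names `CONFUSIONS` and `correct_rate` are hypothetical.

```python
# Counts copied from the table above: how ff was identified on each channel.
CONFUSIONS = {
    "200-5000 Hz": {"ff": 194, "th": 35, "p": 6, "other": 9},
    "200-2500 Hz": {"ff": 186, "th": 31, "p": 6, "other": 13},
    "1000-5000 Hz": {"ff": 162, "th": 28, "p": 12, "other": 50},
}


def correct_rate(row, target="ff"):
    """Fraction of responses in one table row that named the spoken phoneme."""
    return row[target] / sum(row.values())


rates = {channel: correct_rate(row) for channel, row in CONFUSIONS.items()}
for channel, rate in rates.items():
    print(f"{channel}: {rate:.1%} identified as ff")
```

Read this way, even the widest of the three channels leaves roughly one listener in five mistaken about ff.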


More Bandwidth, Better Speaker Identification

• We identify speakers directly by their Fourier weights,
  – not their formant frequencies.
  – Success is based on the amount of data: # weights received.

• Consider three population groups:

  Type       f1-range     # harmonics < 3.7 kHz   Rank
  Men        75-150 Hz    25-50                   most
  Women      140-300 Hz   12-26                   middle
  Children   275-350 Hz   10-13                   least

• Consistent with most people’s experience on the phone:
  – Men are easily recognized, women less easily,
  – & we see why “all children sound the same on the phone.”

• A child could be recognized over a 12-kHz channel
  – as well as an average male is over a 4-kHz channel.
  – At 12 kHz, any woman would be as identifiable as any man at 4 kHz,
  – and men could be almost perfectly identified.

Section 4 discusses how audio quality is impacted by “integrated networks”.
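The harmonic counts in the table on this slide follow from dividing the channel's upper limit by the fundamental pitch. The sketch below redoes that arithmetic; it ignores the channel's lower cutoff, and the 110-Hz “average male” and 300-Hz child pitches are assumed values picked from within the slide's ranges.

```python
import math


def harmonics_below(f1_hz, upper_limit_hz):
    """Number of harmonics of f1_hz at or below upper_limit_hz."""
    return math.floor(upper_limit_hz / f1_hz)


DS0_LIMIT_HZ = 3700      # half-power point quoted earlier in the talk
WIDE_LIMIT_HZ = 12000    # the 12-kHz channel discussed on this slide

men = (harmonics_below(150, DS0_LIMIT_HZ), harmonics_below(75, DS0_LIMIT_HZ))
women = (harmonics_below(300, DS0_LIMIT_HZ), harmonics_below(140, DS0_LIMIT_HZ))
children = (harmonics_below(350, DS0_LIMIT_HZ), harmonics_below(275, DS0_LIMIT_HZ))
print(men, women, children)

# Assumed example pitches: a 300-Hz child on 12 kHz vs a 110-Hz male on the DS0.
child_wide = harmonics_below(300, WIDE_LIMIT_HZ)
male_ds0 = harmonics_below(110, DS0_LIMIT_HZ)
print(child_wide, male_ds0)
```

The computed ranges match the table to within rounding, and the last line shows the child on a 12-kHz channel delivering at least as many weights as the assumed male voice does on a DS0.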


The History of the Complaint

• When T1 was proposed in the 1960s,
  – Amos Joel objected to its 8-kHz sample rate.

• T1’s advocates stifled him by saying he was
  – a dinosaur who objected to digital voice (he did not).

• Now, some VoIP advocates
  – use this tactic to stifle their critics.

• 8-kHz sampling was standardized
  – when bandwidth was expensive;

• Now that it isn’t,
  – we’re still stuck with the DS0 channel …
  – or are we?


4. Network Integration and App-Quality

• Review historical attempts at integrating networks,
  – Generalize how integration naturally lowers app quality,
  – Ask why we have refused to learn this lesson.

• Section Outline
  1. History of Integrated Networks
  2. Why Integration Lowers App Quality
  3. Why are we Blind to this Lesson?


History of Integrated Networks

• More than 35 years ago, ISDN …
  – was proposed as a global end-to-end network for all data types.
  – Today, it’s relegated to the network edge, as an access standard.

• ISDN’s post mortem shows two reasons it failed:
  1. ISDN needed a global digital network,
    • an inexpensive users’ appliance/terminal,
    • and a collection of integrated services, all simultaneously.
    • AT&T could have done it, but focused on surviving (it didn’t).
  2. We learned that the application matters.
    • Ethernet’s stat-muxing was more efficient for bursty data,
    • especially key-strokes on a LAN, than ISDN circuit switching.
    • And, efficiency trumped integration.


The 2nd attempt

• More than 20 years ago, ATM …
  – was proposed as a global end-to-end network for all data types.
  – Cell relay & virtual circuits avoid congestion from large packets.
  – Limited success in “core,” where congestion is significant.

• Failed to achieve its main goal, again for two reasons:
  1. ATM’s success required that it also be cost-effective as a LAN.
    • But, Ethernet prevailed because of embedded base of interface cards, LAN-manager familiarity, & evolution to higher rates.
  2. We saw again that application matters.
    • ATM was compared to a duck:
      “Ducks can swim, fly, and walk, but none well.
      ATM carries voice, data, and video, but none well.”


The 3rd attempt

• Now, the Internet is proposed
  – as a global end-to-end network
  – to carry all data types.

• ISDN and ATM each failed
  – in part because application matters.

• What is different now?


Why Integration Lowers App Quality

• Let’s examine an economic explanation.
  – Box represents the cost of a basic un-optimized network.

• Consider four cases defined by:

  Networks:          Separated   Integrated
  Low app-quality    1           2
  High app-quality   3           4

[Figure: reference box labeled “$ basic network”]


Implementations with Low Quality

1. Separated & low - 2 apps, voice & data, with equal load
  – Boxes represent the cost of two separate networks,
    • each dedicated to one app.
  – App quality is barely acceptable because
    • neither network has been optimized for its app’s quality.

2. Integrated & low - 2 apps over an un-optimized integrated network.
  – This box’s area is larger than the reference square
    • because the integrated network supports twice as much load.
    • But its area is less than the sum of the areas of 2 squares
    • because of economy-of-scale & a reduced staff of network managers.
  – Since apps may interact in the integrated network,
    • each app’s quality is worse than in 2 separate networks.
    • This is the classic “duck”.

[Figure: cost boxes labeled “$ basic voice network”, “$ basic data network”, and “$ basic integrated network”]


Implementations with High Quality

3. Separated & high - Increasing the quality of both apps by optimizing each network in Case 1 raises the cost of each.
  – Squares become rectangles on different dimensions,
  – because we optimize each network differently for its respective app.

4. Integrated & high - Improve apps’ quality in the integrated network.
  – Perform the same optimizations as on the separate networks.
  – So, the “duck” is elongated, but in both dimensions.
  – Significantly larger square than in Case 2.
  – “SWAN” (Superior-service-With-All-apps Network).

[Figure: cost boxes labeled “$ good voice network”, “$ good data network”, and “$ integrated network that is good for voice & data”]


Integration vs App-Quality

• If we don’t care about app-quality,
  – Case 2 beats Case 1:
  – the integrated network is slightly more economical.

• If we do care about quality, Case 3 vs Case 4?
  – Unclear how the area of the SWAN compares against
  – the sum of the areas of the 2 separate rectangles.

• Does the cost of optimizing an integrated network,
  – so its apps have good quality,
  – cancel the small savings provided by the integration?

• If not, wouldn’t IP-based voice carriers
  – like Qwest long-distance, Skype, and Vonage
  – have dominated the telephone industry by now?

[Figure: the six cost boxes from the previous two slides: basic voice, basic data, and basic integrated networks; good voice, good data, and good integrated networks]


Why are we Blind to this Lesson?

• Prior analysis is admittedly weak,
  – But it’s not fundamentally flawed.

• Seems clear from analysis & history lesson
  – that network integration is a bad idea,
  – assuming we don’t want to further degrade app-quality.

• Alchemists, half a millennium ago,
  – had a goal that is at least easy to appreciate.

• Our determination to continue trying
  – to integrate networks is admirable, but puzzling.


5. What Can We Do?

• Ranting about how bad things are
  – has become an all-too-familiar form of discourse.
  – We want to do more than rant, & make a positive contribution.

• This section makes the transition from
  – how-bad-it-is to how-good-it-could-be by discussing
  – the market potential and proposing a solution.

• Section Outline
  1. Market Potential for High-Quality Apps
  2. High-fidelity Voice-over-IP


Market Potential for High-Quality Apps

• Significant market niche that cares about voice quality?
  – If there is a market, it’s among people who
    • appreciate music that sounds better over a high-fi channel &
    • are annoyed by, or have difficulty with, cell-phone audio quality.
  – This group is older, and growing rapidly as
    • the surge of baby boomers becomes older … and deafer.
    • Decreasing ear-bandwidth reinforces adequacy of 12-kHz channel.

• Not an accurate marketing study. But it seems likely that,
  – if the market size isn’t significant enough to justify products yet,
  – it could become large enough in just a few more years.


High-fidelity Voice-over-IP

• VoIP presents the opportunity to raise voice quality,
  – not just to toll-grade, but even beyond.

1. A 12-kHz channel would virtually eliminate “telephone voice” &
  – Improve phoneme discrimination & speaker identification.
  – Channel bandwidth = 3x the DS0’s equivalent bandwidth
    • G.711 codec 3x: anti-aliasing filter’s BW & the ADC’s sample rate
  – & must packetize the digital stream at the speaker end
    • so it’s easily separated for a G.711 at the listener end.
    ? Should be easily downward compatible: G.711 ↔ New
    ? Made to work with speech-compressing codecs
  – While this proposal needs to be built & tested,
    • two others have been implemented and tested at Pitt.

[Figure: analog signal → 12-kHz AAF → A-to-D converter (24 kHz) → Packetizer]

Note: The paper is incorrect.
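The “3x” in this proposal can be checked with simple arithmetic. The sketch below assumes the wider channel keeps G.711's 8 bits per sample, which the slide implies but does not state; the constant and function names are hypothetical.

```python
G711_SAMPLE_RATE_HZ = 8000   # standard G.711 sampling rate
G711_BITS_PER_SAMPLE = 8     # standard G.711 sample size


def channel_bit_rate(sample_rate_hz, bits_per_sample=G711_BITS_PER_SAMPLE):
    """Uncompressed bit rate in bit/s of a sampled voice channel."""
    return sample_rate_hz * bits_per_sample


ds0_bps = channel_bit_rate(G711_SAMPLE_RATE_HZ)
wide_sample_rate_hz = 2 * 12000          # Nyquist rate for a 12-kHz channel
wide_bps = channel_bit_rate(wide_sample_rate_hz)
print(ds0_bps, wide_bps, wide_bps // ds0_bps)
```

Under that assumption the 12-kHz channel costs three DS0s of raw bit rate, before any speech compression is applied.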


Minimizing Delay

2. VoIP delay,
  – & echo’s dependence on delay,
  – can be reduced by optimal packetization.

• When a network is lightly loaded,
  – packetization delay is reduced by generating small packets
  – often, perhaps every 10 ms.

• When a network is heavily loaded,
  – network queue delays are reduced by generating large packets
  – less often, perhaps every 30 ms.

• We have demonstrated this
  – & necessary signaling has been implemented in RTCP.
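The trade-off behind the 10-ms and 30-ms figures can be sketched as follows. This is not the system demonstrated at Pitt: the load threshold and the `choose_interval_ms` policy are hypothetical, a 64-kbit/s codec is assumed, and the 40 bytes per packet are the usual IPv4 + UDP + RTP header sizes.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers


def payload_bytes(codec_bps, interval_ms):
    """Voice bytes carried in one packet at the given packetization interval."""
    return codec_bps * interval_ms // 8000


def wire_bit_rate(codec_bps, interval_ms):
    """Bit rate on the wire in bit/s, including per-packet headers."""
    packet_bytes = payload_bytes(codec_bps, interval_ms) + HEADER_BYTES
    return packet_bytes * 8 * 1000 / interval_ms


def choose_interval_ms(network_load, threshold=0.7):
    """Hypothetical policy: small frequent packets when lightly loaded,
    large infrequent packets when heavily loaded (load is a 0..1 fraction)."""
    return 10 if network_load < threshold else 30


for load in (0.2, 0.9):
    interval = choose_interval_ms(load)
    print(load, interval, payload_bytes(64000, interval), round(wire_bit_rate(64000, interval)))
```

Small packets add only 10 ms of packetization delay but spend a third of the wire rate on headers; large packets cut the overhead and the packet count at the price of 30 ms of delay.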


Maximizing Quality

3. Overall audio quality,
  – as defined by the ITU, is a complicated function of
    • codec type, end-to-end delay, fidelity, etc.

• If an IP-phone has multiple codec-types,
  – We can optimize overall audio quality
    • by changing codec-type mid-stream,
    • depending on network congestion.
  – Control signaling can also use VoIP’s RTCP.

• At Pitt, we are building …
  – a prototype system, we call Ernestine,
  – in which such techniques will be built & tested.
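A mid-stream codec switch could follow a policy like the one sketched below. It is purely illustrative and is not Ernestine's design: it uses codec bit rate as a crude stand-in for the ITU quality function, assumes 20-ms packets with 40 bytes of IPv4/UDP/RTP headers, and the available-bandwidth estimate is a hypothetical input that a real system would have to derive from RTCP reports.

```python
# Codec name and nominal bit rate in bit/s, ordered from highest rate to lowest.
CODECS = [("G.711", 64000), ("G.726", 32000), ("G.729", 8000)]
HEADER_BYTES = 40  # IPv4 + UDP + RTP


def wire_bps(codec_bps, interval_ms=20):
    """Codec rate plus per-packet header overhead, in bit/s."""
    return codec_bps + HEADER_BYTES * 8 * 1000 // interval_ms


def pick_codec(available_bps, interval_ms=20):
    """Hypothetical policy: the highest-rate codec that fits the estimate,
    or None if even the lowest-rate codec does not fit."""
    for name, codec_bps in CODECS:
        if wire_bps(codec_bps, interval_ms) <= available_bps:
            return name
    return None


for estimate in (100_000, 50_000, 30_000, 10_000):
    print(estimate, pick_codec(estimate))
```

A fuller version would weigh delay and fidelity as well, since the slide notes that overall quality depends on more than the codec alone.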


6. Conclusion

• Technology has improved net audio quality
  – over the last 100 years.
  – But, some aspects of audio quality, especially fidelity, have devolved.
  – But, this devolution has an ironic solution.

• VoIP’s poor audio quality is not inherent to VoIP;
  – But is a function of design choices,
  – some of which date back to the 1960s.

• Surprisingly, VoIP gives us the opportunity
  – to provide excellent audio quality,
  – if design changes proposed here are implemented.