Slide 1
Part B
Sliding-band Dynamic Range Compression
(N. Tiwari & P. C. Pandey, NCC 2014)
P. C. Pandey, "Signal processing for improving speech perception
by persons with sensorineural hearing loss: Challenges and some
solutions", IEEE Workshop on Intelligent Computing, IIIT Allahabad,
13-15 Oct. 2014
[email protected] Dept., IIT Bombay

Overview
- Introduction
- Sliding-band Dynamic Range Compression
- Offline & Real-time Implementations
- Test Results
- Summary & Conclusion

Introduction

Dynamic range compression
To present sounds comfortably within the limited dynamic range of the listener by amplifying low-level sounds without making high-level sounds uncomfortably loud.

Processing steps
- Input level estimation
- Gain calculation based on input level
- Multiplication of input with gain function
- Output resynthesis

Classification
- On the basis of signal level calculation: single-band or multiband
- On the basis of gain control method: feedback or feed-forward

Single-band dynamic range compression

Processing
- Gain dependent on the dynamically varying signal level.

Parameters
- Compression threshold (Th)
- Compression ratio (CR)
- Attack & release times

Problems
- Does not account for the frequency-dependent loudness growth function.
- Power is mostly contributed by low-frequency components, so the amplification of high-frequency components depends on the low-frequency components.
- Result: inaudibility of high-frequency components and distortions in the temporal envelope.
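
As an illustration of how the compression threshold and compression ratio shape the static input-output relation, the following is a minimal sketch. It assumes unity gain below Th and a slope of 1/CR (in dB) above it; the function names are hypothetical, and attack and release dynamics are not modeled.

```python
def compressed_output_db(input_db, threshold_db, cr):
    """Static characteristic: unchanged below Th, slope 1/CR (in dB) above it."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / cr


def compression_gain_db(input_db, threshold_db, cr):
    """Gain in dB implied by the static characteristic."""
    return compressed_output_db(input_db, threshold_db, cr) - input_db


# Assumed example values: Th = 60 dB, CR = 2, input level 80 dB.
example_output_db = compressed_output_db(80.0, 60.0, 2.0)
example_gain_db = compression_gain_db(80.0, 60.0, 2.0)
```

With CR = 1 the characteristic stays linear at every level, which corresponds to the "CR = 1" curve label in the slide's input-output figure.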

Multiband dynamic range compression

General scheme of processing
- Spectral components of the input signal are divided into multiple bands, and the gain for each band is calculated on the basis of the signal power in that band.
- Parameters (band-specific): compression threshold Th, compression ratio CR, attack & release times for detection.

- Lippmann et al. (1980): 16-channel compression. 9% improvement in recognition score over linear amplification.
- Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit fixed-point DSP processor. Less spectral distortion due to the smoothed frequency response of the FIR filter.
- Stone et al. (1999): Comparison of single and four-channel compression schemes & effect of varying CR, Th, and attack & release times. Intelligibility & quality tests showed no specific preference for schemes.
- Li et al. (2000): Wavelet-based compression (7-octave sub-band analysis using a wavelet filter bank & resynthesis after applying logarithmic compression to the wavelet coefficients). Increase in intelligibility without introducing noticeable distortions.
- Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor. Taylor series approximation used for the compression function to reduce computations in gain calculation.

Disadvantages of multiband dynamic range compression

Spurious spectral distortions
- Reduction in spectral contrasts and modulation depth
- Distortion in spectral shape of formants lying across the band boundaries
- Distortion of formant transitions across adjacent bands
- Time-varying magnitude response without corresponding variation in the phase response, leading to quality degradation

Result: audible distortions, perceptible discontinuities, and an adverse effect on the perception of certain speech cues.

Example of distortion due to multiband dynamic range compression during spectral transition

- Swept sinusoidal input: constant amplitude, frequency swept linearly from 125 to 250 Hz, 200 ms sweep duration
- Processed output: multiband compression with 18 auditory critical bands, CR = 30, Ta = 6.4 ms, Tr = 192 ms

[Figure: swept sinusoidal input and processed output waveforms vs. Time (s)]
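
A test input of this kind can be generated as in the sketch below. The sweep limits and duration are the ones stated on the slide; the 10 kHz sampling rate is borrowed from the offline implementation parameters and is an assumption for this figure, and the function name is hypothetical.

```python
import math


def linear_sweep(f_start_hz, f_end_hz, duration_s, fs_hz, amplitude=1.0):
    """Constant-amplitude sinusoid whose frequency rises linearly with time."""
    n_samples = int(round(duration_s * fs_hz))
    rate_hz_per_s = (f_end_hz - f_start_hz) / duration_s
    samples = []
    for n in range(n_samples):
        t = n / fs_hz
        # Phase is the time integral of the instantaneous frequency
        # f_start + rate * t, multiplied by 2*pi.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * rate_hz_per_s * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


# 125 Hz to 250 Hz over 200 ms at an assumed fs of 10 kHz.
sweep = linear_sweep(125.0, 250.0, 0.2, 10000.0)
```

Because the amplitude is constant, any envelope fluctuation in the processed output is introduced by the compressor, which is what makes the sweep a convenient probe for distortions at band boundaries.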

Investigation objective

Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor.

- Low distortions
- Low computational complexity & memory requirement
- Low signal delay (algorithmic + computational)

Sliding-band compression
- Proposed for significantly reducing the temporal and spectral distortions associated with the single-band and multiband compression currently used in hearing aids.
- Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application.

Investigations using offline & real-time implementations
- Selection of processing parameters
- Evaluation of the implementations: informal listening, PESQ measure

Sliding-band Dynamic Range Compression

Processing steps
- Short-time spectral analysis: windowing, zero-padding, DFT calculation
- Spectral modification: gain calculation, output spectrum calculation
- Resynthesis: IDFT calculation, windowing, overlap-add

Processing
Applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in the auditory critical bandwidth centered at it, in accordance with the specified hearing thresholds, compression ratios, and attack and release times.
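
The level estimate that drives each spectral sample's gain can be sketched as below, using the critical-bandwidth expression from the gain-calculation slide. This is an illustration under stated assumptions, not the authors' code: f is taken in kHz and BW in Hz, the band is taken as symmetric in DFT bins around sample k and clipped at the spectrum edges, and the function names are hypothetical.

```python
def critical_bandwidth_hz(f_khz):
    """Auditory critical bandwidth in Hz for a center frequency in kHz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def sliding_band_power(power_spectrum, fs_hz, n_fft):
    """Sum |X(k)|^2 over a critical band centered at every spectral sample.

    power_spectrum holds |X(k)|^2 for k = 0 .. n_fft/2.
    """
    bin_hz = fs_hz / n_fft
    last = len(power_spectrum) - 1
    band_power = []
    for k in range(len(power_spectrum)):
        center_khz = k * bin_hz / 1000.0
        half_bins = int(round(0.5 * critical_bandwidth_hz(center_khz) / bin_hz))
        lo = max(0, k - half_bins)       # clip the band at 0 Hz
        hi = min(last, k + half_bins)    # clip the band at fs/2
        band_power.append(sum(power_spectrum[lo:hi + 1]))
    return band_power


# Flat test spectrum with the offline parameters fs = 10 kHz, N = 512.
flat = [1.0] * 257
band = sliding_band_power(flat, 10000.0, 512)
```

Since the band moves with k instead of being fixed, a spectral component never crosses a band boundary, which is the property the sliding-band scheme relies on to avoid the transition distortions of multiband compression.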

Spectral modification
- Short-time spectral analysis: windowing (length L, shift S), zero-padding, N-point DFT
- Pmc(k): power at upper comfortable listening level
- CR(k): compression ratio
- Resynthesis: N-point IDFT, overlap-add
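
The windowing and overlap-add steps can be illustrated with the sketch below. It assumes a modified Hamming window in the form given by Griffin & Lim (1984), applied at both analysis and synthesis with L = 4S (the figure labels in these slides mention a modified Hamming window, but its exact form is not stated here). The DFT, spectral modification, and IDFT are omitted, so the frames pass through unchanged; the function names are hypothetical.

```python
import math


def modified_hamming(length, shift):
    """Modified Hamming window; squared copies shifted by `shift` sum to 1
    when length == 4 * shift."""
    a, b = 0.54, -0.46
    scale = 2.0 * math.sqrt(shift) / math.sqrt((4.0 * a * a + 2.0 * b * b) * length)
    return [scale * (a + b * math.cos(2.0 * math.pi * n / length + math.pi / length))
            for n in range(length)]


def analysis_frames(signal, window, shift):
    """Windowed frames of len(window) samples, advancing by `shift`."""
    length = len(window)
    return [[signal[start + n] * window[n] for n in range(length)]
            for start in range(0, len(signal) - length + 1, shift)]


def overlap_add(frames, window, shift):
    """Window each frame again and overlap-add at the frame shift."""
    length = len(window)
    out = [0.0] * (shift * (len(frames) - 1) + length)
    for m, frame in enumerate(frames):
        for n in range(length):
            out[m * shift + n] += window[n] * frame[n]
    return out


# Slide parameters L = 256, S = 64; the test tone is an arbitrary choice.
L, S = 256, 64
window = modified_hamming(L, S)
signal = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(1024)]
rebuilt = overlap_add(analysis_frames(signal, window, S), window, S)
```

In the region covered by four overlapping frames the unmodified signal is recovered, so any change in the output is due to the spectral modification alone.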

Gain calculation

Auditory critical bandwidth
$BW(k) = 25 + 75\,(1 + 1.4 f^2)^{0.69}$, where $k$ is the frequency sample and $f$ is its frequency ($f$ in kHz, $BW$ in Hz)

Target gain calculation
- Power at upper comfortable listening level: $P_{mc}(k)$
- Compression ratio: $CR(k)$
- Input power: $P_{ic}(k)$, output power: $P_{oc}(k)$
- Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$

Compression relation
- dB scale: $[P_{oc}(k) / P_{mc}(k)]_{dB} = [P_{ic}(k) / P_{mc}(k)]_{dB} / CR(k)$
- Linear scale: $P_{oc}(k) / P_{mc}(k) = [P_{ic}(k) / P_{mc}(k)]^{1/CR(k)}$

Target gain for kth spectral sample
$[G_t(k)]_{dB} = [1 - 1/CR(k)]\,[P_{mc}(k) / P_{ic}(k)]_{dB}$

Gain calculation (contd.)

Gain changed in steps from the previous value towards the target value, with settable attack and release times
- Fast attack: to keep the output level from exceeding UCL during transients
- Slow release: to avoid the pumping effect or amplification of breathing

Definitions
- Number of steps during attack phase = $s_a$
- Number of steps during release phase = $s_r$
- Target gain corresponding to min. input level = $G_{max}$
- Target gain corresponding to max. input level = $G_{min}$
- Gain ratio for attack phase: $\gamma_a = (G_{max} / G_{min})^{1/s_a}$
- Gain ratio for release phase: $\gamma_r = (G_{max} / G_{min})^{1/s_r}$

Gain for ith window & kth spectral sample
$G(i,k) = \max[\,G(i-1,k) / \gamma_a,\ G_t(i,k)\,]$ for $G_t(i,k) < G(i-1,k)$
$G(i,k) = \min[\,G(i-1,k)\,\gamma_r,\ G_t(i,k)\,]$ for $G_t(i,k) > G(i-1,k)$

Attack time $T_a = s_a S / f_s$, release time $T_r = s_r S / f_s$ [$f_s$ = sampling freq., $S$ = window shift]
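
The target-gain relation and the stepwise gain update can be exercised with the short sketch below. The function names and the gain limits (Gmax = 100, Gmin = 1) are hypothetical; sa = 1, sr = 30, S = 64, and fs = 10 kHz are the values quoted for the offline implementation.

```python
def target_gain_db(p_ic_db, p_mc_db, cr):
    """[Gt]dB = (1 - 1/CR) * ([Pmc]dB - [Pic]dB)."""
    return (1.0 - 1.0 / cr) * (p_mc_db - p_ic_db)


def step_ratio(g_max, g_min, steps):
    """Per-window gain ratio that spans Gmin..Gmax in `steps` windows."""
    return (g_max / g_min) ** (1.0 / steps)


def update_gain(g_prev, g_target, ratio_attack, ratio_release):
    """Move the linear gain one step towards the target gain."""
    if g_target < g_prev:    # attack: input level rose, so the gain must fall
        return max(g_prev / ratio_attack, g_target)
    if g_target > g_prev:    # release: input level fell, so the gain may rise
        return min(g_prev * ratio_release, g_target)
    return g_prev


def phase_time_ms(steps, shift, fs_hz):
    """Attack or release time in ms: steps * S / fs."""
    return 1000.0 * steps * shift / fs_hz


# Assumed gain limits for illustration only.
g_max, g_min = 100.0, 1.0
ratio_a = step_ratio(g_max, g_min, 1)     # sa = 1
ratio_r = step_ratio(g_max, g_min, 30)    # sr = 30

gain = update_gain(g_max, g_min, ratio_a, ratio_r)   # full attack in one window
release = [gain]
for _ in range(30):                                   # full release in 30 windows
    release.append(update_gain(release[-1], g_max, ratio_a, ratio_r))
```

With these values `phase_time_ms(1, 64, 10000.0)` and `phase_time_ms(30, 64, 10000.0)` give the 6.4 ms attack time and 192 ms release time quoted in the slides.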

Implementation related challenges
- Modifications in the short-time magnitude spectrum without corresponding changes in the phase spectrum can cause audible distortions.
- Computational complexity: log or series approximation based gain calculations are not suitable for use in sliding-band compression.

Solution
- Analysis-synthesis using least-square error based signal estimation from modified STFT (Griffin & Lim, 1984): processing artifacts are reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
- Look-up table based gain calculation: two-dimensional look-up table relating the input power to gain as a function of frequency. Reduces computations for real-time implementation and permits the compression function most suited to compensate for the abnormal loudness growth.
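
A two-dimensional gain table of this kind might be built as in the sketch below, which replaces the per-sample log computation with an index lookup. The table size (256 frequency samples by 20 input-level intervals) and the linear dB relation follow the offline implementation described in these slides; the 0 to 100 dB input range, the 5 dB interval width, the uniform CR(k) and Pmc(k), and the function names are assumptions for illustration.

```python
def build_gain_table(cr, p_mc_db, level_min_db, level_step_db, n_levels=20):
    """Gain in dB for every (frequency sample k, input-level interval j).

    cr and p_mc_db are per-frequency-sample lists of CR(k) and Pmc(k) in dB.
    """
    table = []
    for cr_k, p_mc_k in zip(cr, p_mc_db):
        row = []
        for j in range(n_levels):
            level_db = level_min_db + j * level_step_db
            # Linear relation between input dB and output dB.
            row.append((1.0 - 1.0 / cr_k) * (p_mc_k - level_db))
        table.append(row)
    return table


def lookup_gain_db(table, k, level_db, level_min_db, level_step_db):
    """Quantize the input level to an interval index and read the gain."""
    j = int((level_db - level_min_db) // level_step_db)
    j = min(max(j, 0), len(table[k]) - 1)   # clamp to the table range
    return table[k][j]


# Assumed settings: CR(k) = 2 and Pmc(k) = 100 dB at all 256 frequency samples.
table = build_gain_table([2.0] * 256, [100.0] * 256, 0.0, 5.0)
gain_db = lookup_gain_db(table, 10, 62.0, 0.0, 5.0)
```

Because each row is filled independently, CR(k) and Pmc(k) can differ across frequency samples, and a row could hold any other compression function without changing the lookup cost.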

Offline & Real-time Implementations

Implementation for offline processing
Implementation using Matlab 7.10 for evaluating the performance of the proposed technique and the effect of processing parameters.

Processing parameters
- fs = 10 kHz
- Frame length = 25.6 ms (L = 256)
- Overlap = 75% (S = 64)
- FFT size N = 512

2D look-up table for frequency-dependent compression
- Based on a linear relation between input dB and output dB, with settable CR(k) and Pmc(k).
- Input range: 20 log intervals (trade-off: small gain increments vs. look-up table size).
- Look-up table with 256 × 20 entries.

Attack and release times
- sa = 1, Ta = 6.4 ms: fast attack to avoid uncomfortable level during transients
- sr = 30, Tr = 192 ms: slow release to avoid pumping & amplification of breathing

Implementation for real-time processing
Implementation on a 16-bit fixed-point DSP board to examine the suitability of the technique for use in hearing aids.

DSP chip: TI TMS320C5515
- 16 MB memory space (320 KB on-chip RAM with 64 KB dual access, 128 KB on-chip ROM)
- Three 32-bit programmable timers
- 4 DMA controllers, each with 4 channels
- FFT hardware accelerator (up to 1024-point FFT)
- Max. clock speed: 120 MHz

DSP board: eZdsp
- 4 MB on-board NOR flash for user program
- Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8–192 kHz sampling

Software development: C using TI's CCStudio ver. 4.0

Implementation
- Input-output operations: DMA-based I/O with cyclic buffers
- ADC and DAC: one codec channel (left) with 16-bit quantization
- Processing parameters (same as for offline processing): fs = 10 kHz, L = 256, S = 64, N = 512
- Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary

Data transfers & buffering operations (S = L/4)

DMA cyclic buffers
- 5-block S-sample input buffer
- 2-block S-sample output buffer

Pointers (incremented cyclically on DMA interrupt)
- Current input block
- Just-filled input block
- Current output block
- Write-to output block

Signal delay
- Algorithmic: 1 frame (25.6 ms)
- Computational: frame shift (6.4 ms)
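
One way to picture the input side of this buffering is the sketch below. It is an interpretation, not the implementation from the slides: it assumes that the four most recently filled S-sample blocks form the L = 4S-sample analysis frame while the fifth block is being filled, and it emulates the DMA interrupt with an ordinary method call. The class and method names are hypothetical.

```python
class InputCyclicBuffer:
    """Five-block cyclic input buffer with S samples per block."""

    def __init__(self, shift, n_blocks=5):
        self.shift = shift
        self.n_blocks = n_blocks
        self.blocks = [[0] * shift for _ in range(n_blocks)]
        self.current = 0    # block currently being filled

    def on_block_filled(self, samples):
        """Emulated DMA interrupt: store a block and advance the pointer.

        Returns the index of the just-filled block.
        """
        self.blocks[self.current] = list(samples)
        just_filled = self.current
        self.current = (self.current + 1) % self.n_blocks
        return just_filled

    def frame(self, just_filled, blocks_per_frame=4):
        """Oldest-to-newest samples of the last `blocks_per_frame` blocks."""
        out = []
        for back in range(blocks_per_frame - 1, -1, -1):
            out.extend(self.blocks[(just_filled - back) % self.n_blocks])
        return out


# Feed seven 4-sample blocks holding the sample indices 0..27.
buf = InputCyclicBuffer(shift=4)
for i in range(7):
    last = buf.on_block_filled(range(4 * i, 4 * i + 4))
latest_frame = buf.frame(last)
```

Under this reading, a new frame becomes available every S samples, and the spare block lets acquisition continue while the previous frame is processed.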

Test Results

Tests for verification and evaluation

Offline processing
- Verification of the compression technique for speech input with a large level variation, and examination of the effect of different sets of processing parameters.
- Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
- Comparison of distortions introduced by different compression techniques during spectral transitions.

Real-time processing
- Comparison of the processed outputs from offline & real-time implementations: informal listening, PESQ measure (0–4.5).
- Signal delay & computational requirement.

Results from offline processing

Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 6.4 & 192 ms.

[Figure: waveform panels vs. Time (s): input scaling factor; unprocessed waveform; processed, Tr = 6.4 ms, low Pmc; processed, Tr = 192 ms, low Pmc; processed, Tr = 6.4 ms, high Pmc; processed, Tr = 192 ms, high Pmc]

Processing of different speech materials with varying levels: no audible roughness or distortion during informal listening.

Distortions during spectral transitions: example of swept sinusoidal input

Input: constant amplitude, frequency swept linearly from 125 to 250 Hz, 200 ms sweep duration. CR = 30, Ta = 6.4 ms, Tr = 192 ms.

[Figure: waveform panels vs. Time (s): sliding-band compression output; multiband compression (18 auditory critical bands) output; single-band compression output]

Results from real-time processing
- Informal listening: real-time output perceptually similar to the offline output
- PESQ for real-time w.r.t. offline: 3.5
- Signal delay = 36 ms
- Use of processing capacity: 41% (lowest processor clock for satisfactory operation = 50 MHz, max. clock = 120 MHz)

Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 192 ms, low Pmc.

[Figure: waveform panels vs. Time (s): unprocessed waveform; offline processed waveform; real-time processed waveform]

Summary & Conclusions
- Sliding-band dynamic range compression presented to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss without introducing the distortions associated with single-band & multiband compression.
- Realized using modified fixed-frame analysis-synthesis for low computational complexity & without distortions associated with phase discontinuities.
- Suitable for speech & non-speech audio, with provision for settable attack time, release time, & compression ratios.
- Implemented using a 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay, 41% use of processing capacity, indicating scope for combination with other processing techniques.

Thank You
To be continued in the next part.

[Figure: input-output characteristic of compression. Axes: Input dB SPL, Output dB SPL. Regions: Linear, Compression, Limiting. Curves: CR = 1, CR = 2. Marker: Th.]
[Figure: multiband compression block diagram. Labels: Input Signal; BPF-1, BPF-2, ..., BPF-n; Detector, Gain Calc. (Th, CR), and Delay for each band; Output Signal.]
[Figure: sliding-band compression block diagram. Labels: Input Signal, Short-time Spectral Analysis, Spectral Modification, Resynthesis Using Overlap-add, Output Signal; Input Short-time Complex Spectrum, Band Samples, kth Sample, kth Sample-Centered Band, Level Estimation, Target Gain Calc. with CR(k) and Pmc(k), Gain Calc. with Attack Time and Release Time, Modified Short-time Complex Spectrum.]
[Figure: real-time implementation block diagram. Labels: Codec (ADC, DAC), Processor, Input Signal, Input Cyclic Buffer, Input Windowing & FFT, Spectral Modification, IFFT & Output Windowing, Overlap-Add, Output Cyclic Buffer, Output Signal.]
[Figure: data transfers and buffering. Labels: Input Samples, DMA Input Cyclic Buffer, Input Block, Just Filled Block, Input Data Buffer (L, N - L words), Mult. by Modified Hamming Window, FFT, Spectral Modification, IFFT, Mult. by Mod. Hamm. Window & Overlap-Add, Output Data Buffer (L, N - L words), DMA Output Cyclic Buffer, Output Block, Write to Block, Output Samples, L Samples, S Samples.]