Department of Precision and Microsystems Engineering
Improvements on Time-Frequency Analysis
using Time-Warping and Timbre Techniques
Name: Maarten van der Seijs
Report no: EM 11.018
Coach: dr. ir. D. de Klerk
Professor: prof. dr. D.J. Rixen
Specialisation: Engineering Mechanics
Type of report: Masters Thesis
Date: Delft, June 6, 2011
Abstract
Spectral analysis of non-stationary signals is known to be a challenging task. Classical methods
like the discrete Fourier transform are often inadequate to capture and track periodic content with
rapidly changing frequencies. There are two fundamental reasons for this. On the one hand, the Fourier transform
is intended for expressing frequency content in terms of constant-frequency contributions. On the
other hand, the simultaneous accuracy of temporal and spectral localisation is limited by the time-
frequency uncertainty principle. This thesis lays out the findings of an explorative study towards
potential improvements on time-frequency analysis.
To address the first issue, the concept of time-warping has been explored. By stretching and
contracting pieces of the signal, frequency changes may be "flattened out", resulting in improved
detection of non-stationary frequencies and much sharper spectra than possible with traditional
Fourier analysis. Both linear and non-linear time warping approaches were investigated, together
with the required non-uniform interpolation techniques.
Application of linear time-warping prior to a Fourier transformation leads to the definition of the Fan
chirp transform. This transformation is in essence closely related to the popular short-time Fourier
transform, but provides time-frequency basis functions in a fan-geometry rather than a rectangularly-
tiled grid. The skewed basis functions match the harmonic structure of a non-stationary component
with linearly increasing frequency.
The second issue is addressed by considering periodic signals in their entirety rather than by their
individual partials (or harmonics, overtones). A novel concept is proposed: timbre analysis. The
timbre representation provides means to classify a tonal signal, similar to the way the human ear
(which is in fact a remarkably sophisticated Fourier analyser) perceives and identifies sound. It is
shown that the instantaneous timbre, obtained by normalisation of the harmonic phases, tends to
remain stationary throughout a non-stationary signal.
The timbre representation is used to identify components in polyphonic problems, where the signal is
a mixture of multiple crossing tonal components. In addition, a pitch tracking technique is proposed
that tracks a periodic component based on its timbre. The component can then be isolated and
extracted using Vold-Kalman filtering.
Preface
This thesis is the result of a Master of Science project carried out from October 2010 onwards. It was
fulfilled in the group of Engineering Dynamics, which is part of the Precision and Microsystems
Engineering department at Delft University of Technology.
First, I would like to thank dr. ir. Dennis de Klerk for his enthusiastic and dedicated supervision. As
a true expert in experimental dynamics, he confronted me with a diversity of challenges and never
failed to inspire me.
Second, I greatly acknowledge prof. dr. Daniel Rixen for his support throughout my entire Masters
studies. His readiness to help and ever-constructive suggestions are exemplary. I frankly believe that
due to his involvement with students, many will eventually find the path to Engineering Dynamics.
Finally, I would like to thank my family, friends and house mates for their love, support and reflection
specially written for efficient time-frequency analysis. Only a few functions are included in appendix
A and B. The complete toolbox including the code for the examples is found on a CD-ROM.
Part I
Basic Concepts
Chapter 1
Time-Domain Concepts
This chapter introduces some basic concepts related to signals and signal processing in the time-
domain. First, the discretisation of continuous signals into digital signals is discussed. In section 1.2,
the concepts of periodicity and harmonicity are considered. Section 1.3 concludes with a discussion
of signal modelling.
A thorough discussion of the concepts is found in textbooks on signal processing, for instance [14, 13].
This chapter offers a brief recap of time-domain signal theory that is relevant for this thesis.
1.1 Signals
In a most general formulation, a time-domain signal can be any real-valued quantity that varies in
time. Mathematically, a signal may be written as y = f(t), where the quantity y and the time domain
t are not necessarily bounded. Due to the complexity of most signals, the function f can rarely be
expressed as a simple closed form, but may in some cases be approximated.
1.1.1 Continuous-time signals
By nature, all signals we encounter in the real world are continuous. Nature treats everything with
infinite smoothness; it does not require any type of discretisation, nor does it limit accuracy. Therefore both
quantity and time have an uncountable domain of real values: y, t ∈ R. These signals are called
continuous-time signals and are often referred to as analogue, in contrast to digital.
1.1.2 Discrete-time signals
Discrete-time signals on the other hand have a discretised time domain, meaning that the signal is
sampled at a finite number (N ) of instances tn: t0, t1, . . . , tN−1. The sampling is usually performed
at a constant interval (i.e. tn+1 − tn = Δt), yielding the so-called sample rate fs = 1/Δt in Hz or
ωs = 2π/Δt in rad/s. The discrete-time signal may be represented as a vector y[n] with the signal
values denoted by y[0], y[1], . . . , y[N−1]. Note that these values can still be continuous; y[n] ∈ R.
After sampling, the signal values are merely known at the specified instances tn, which would imply
that everything that happened in between of two adjacent samples is lost. What this really means
for signals in the context of frequency content is discussed later on in chapter 2. An illustration of
discrete-time sampling is shown in figure 1.1.
(a) Continuous-time signal (b) Discrete-time signal
Figure 1.1: Discrete-time signals use a finite number of values at a fixed sample rate.
1.1.3 Digital signals
Digital signals take both a discretised time and value domain. Digital devices like computers use
finite sets of values to approximate and store the values of every sample into a binary format. For this
purpose, the values are rounded or quantised to a fixed set of equally spaced values a1, a2, . . . , aM
(see figure 1.2). The amount M of possible values depends on the chosen resolution. The resolution
is usually specified in terms of a bit depth or word-length Q, related to the total amount of values by
M = 2^Q.
The bit depth is directly related to the dynamic range. Dynamic range is defined as the range
between the smallest and largest possible value in a set. For instance: compact disc audio is
formatted in 16-bit resolution, which has M = 2^16 = 65536 values, providing a dynamic range of
20 log10(65536) = 96.33 ≈ 96 dB. As an approximation, it is often said that every bit increases the
dynamic range by 6 dB.
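The relation between bit depth and dynamic range can be checked with a short computation (a minimal sketch; the function name is illustrative):

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Dynamic range of a Q-bit quantiser with M = 2**Q levels: 20*log10(M) dB."""
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 2))        # 96.33 -> the CD-audio figure above
print(round(dynamic_range_db(16) / 16, 2))   # about 6.02 dB gained per bit
```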
Typical classes of quantised values encountered in signal processing are 8, 16, 32 or 64-bit integers,
single precision (32-bit) or double precision (64-bit) floating point. The latter is used most often in
numerical computing environments.
Figure 1.2: The values of digital signals are quantised to a fixed set of values.
1.2 Periodic signals
A particular interest lies in signals that exhibit a certain amount of periodicity. Periodicity implies
that events occur repeatedly in time, with a constant period T in seconds. That may be the extension
of an oscillating spring, the water level of the ocean due to tidal change or the sound pressure created
by the vibration of a guitar string. Periodic signals often originate from clearly deterministic systems
and are therefore of particular interest to engineers. Noise, in contrast, is typically generated by more
or less stochastic processes. Many signals encountered in practice exhibit both noise and periodic
components.
(a) Periodic signal (b) Noise
Figure 1.3: Periodic signals have repetitive content with period T . Noise is fully stochastic.
1.2.1 Periodicity
Mathematically, a signal or function is said to be periodic if there is a period T that satisfies:

y(t) = y(t + kT)    k ∈ ℕ    (1.1a)

or for discrete-time signals:

y[n] = y[n + kT fs]    k ∈ ℕ, T fs ∈ ℕ    (1.1b)

which already reveals a periodicity problem for T fs ∉ ℕ, see section 2.4.2. However, equations (1.1a)
and (1.1b) are very strict definitions of periodicity and only hold for a few simple functions. In real
life, a signal consists of several components that only partly satisfy the definition.
1.2.2 Frequency and phase
Periodic components can be characterised by a frequency, f = 1/T. As a convention, frequencies expressed
in Hertz take the symbol f, while angular frequencies in radians per second are notated with ω = 2πf.
After one period T, the phase ϕ(t) of the component has advanced 2π radians. The phase refers to the
instantaneous position of a component y at time t and indicates the fraction of the period, in radians,
that has elapsed. For a sine wave, it is simply given by y(t) = sin(ϕ(t)).
For stationary signals with constant frequency f , the phase continues to increase linearly with ϕ(t) =
2πft or ϕ(t) = ωt. The phase may be biased by a constant phase shift θ in radians that determines
the phase for t = 0:
ϕ(t) = 2πft + θ    (1.2)
The concept of frequency is generalised for non-stationary signals by defining the time-dependent
instantaneous frequency (IF) f(t) as the derivative of the phase ϕ(t) with respect to time:
f(t) = (1/2π) · dϕ(t)/dt    (1.3)

Likewise, the phase of a component with time-varying frequency is found by

ϕ(t) = 2π ∫_0^t f(τ) dτ + θ    (1.4a)

or in the discrete case

ϕ[n] = 2π Σ_{m=1}^{n} f[m] Δt + θ    (1.4b)
although it is better to replace the latter summation by a proper numerical integration method. A
complete discussion of instantaneous frequency and phase is found in [3].
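As a numerical illustration of equations (1.3) and (1.4a), the phase of a linear chirp can be obtained by trapezoidal integration of the instantaneous frequency, and the IF recovered by differentiating the phase (a sketch with illustrative parameters, using the proper numerical integration suggested above):

```python
import numpy as np

fs = 1000.0                       # sample rate [Hz]
t = np.arange(0, 1.0, 1 / fs)     # one second of samples
f_inst = 50 + 100 * t             # instantaneous frequency: 50 -> 150 Hz

# phase by trapezoidal integration of the IF (eq. 1.4a with theta = 0)
phi = 2 * np.pi * np.concatenate(
    ([0.0], np.cumsum((f_inst[1:] + f_inst[:-1]) / 2) / fs))
y = np.sin(phi)                   # the resulting non-stationary component

# the IF is recovered as the derivative of the phase (eq. 1.3)
f_rec = np.gradient(phi, t) / (2 * np.pi)
print(np.allclose(f_rec[1:-1], f_inst[1:-1]))  # True (exact for a linear IF)
```

The endpoints are excluded from the comparison because `np.gradient` falls back to one-sided differences there.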
1.2.3 Periodic functions
The most basic but very important periodic functions are the sine and cosine functions (or sinusoids)
with constant frequency f :
x(t) = sin(ϕ(t)) = sin(2πft)    (1.5a)

x(t) = cos(ϕ(t)) = cos(2πft)    (1.5b)
These functions fully satisfy the definition of periodicity. A complex exponential equation is based on
Euler's formula and describes a complex-valued¹ wave x(t) with constant frequency f:

x(t) = e^{i2πft} = cos(2πft) + i sin(2πft)    (1.6a)
Multiplying with the complex amplitude scalar c = a + bi and keeping only the real part, the wave
can be given an amplitude and phase shift:

x(t) = Re(c e^{i2πft}) = a cos(2πft) − b sin(2πft) = A cos(2πft + θ)    (1.6b)
with A = ‖c‖ the absolute amplitude and θ = ∠c the phase shift in radians. An alternative notation
uses the identities cos(ϕ) = ½ (e^{iϕ} + e^{−iϕ}) and sin(ϕ) = 1/(2i) (e^{iϕ} − e^{−iϕ}):

x(t) = c₊ e^{i2πft} + c₋ e^{−i2πft}    (1.6c)

If c₋ is the complex conjugate of c₊, the resulting signal x(t) is real. If so, it follows that c₊ = ½ c
and c₋ = ½ c̄, and consequently A = 2‖c₊‖ = 2‖c₋‖ and θ = ∠c₊ = −∠c₋.
Equations (1.6a)–(1.6c) lie at the very essence of the frequency-domain representation, as discussed in
chapter 2.
1Throughout this text, the quantity i is reserved for the imaginary unit, defined by i2 = −1.
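The equivalence of the three forms in equation (1.6b) is easily verified numerically (illustrative values for c):

```python
import numpy as np

c = 3 + 4j                                   # example complex amplitude c = a + bi
A, theta = abs(c), np.angle(c)               # A = ||c||, theta = phase angle of c

f = 2.0
t = np.linspace(0, 1, 1000, endpoint=False)
x1 = np.real(c * np.exp(1j * 2 * np.pi * f * t))                    # Re(c e^{i2pi f t})
x2 = 3 * np.cos(2 * np.pi * f * t) - 4 * np.sin(2 * np.pi * f * t)  # a cos - b sin
x3 = A * np.cos(2 * np.pi * f * t + theta)                          # A cos(2pi f t + theta)
print(np.allclose(x1, x2) and np.allclose(x1, x3))                  # True
```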
1.2.4 Orthogonality of harmonic waves
An important property of the sinusoids and complex waves is the orthogonality between waves
completing full periods over a common interval. Consider the complex exponential wave s0(t) with base
frequency f0 and non-zero complex amplitude c0, given by

s0(t) = c0 e^{i2πf0t}    (1.7)

Also consider the kth harmonic wave with derived frequency fk = kf0 and complex amplitude ck:

sk(t) = ck e^{i2πkf0t}    k ∈ ℕ    (1.8)

Then the projection of the wave sk(t) onto sl(t) over a full period T0 = 1/f0 writes:

(1/T0) ∫_0^{T0} s̄k(t) sl(t) dt = (1/T0) ∫_0^{T0} c̄k cl e^{i2π(l−k)f0t} dt = (c̄k cl) δkl    k, l ∈ ℕ    (1.9)

using the Kronecker delta notation:

δkl = 1 for k = l,  0 for k ≠ l

The harmonic waves sk and sl for k ≠ l are thus orthogonal over a full period of the base frequency,
regardless of their amplitude and phase shift.
In discrete notation, vector sk is given by:

sk[n] = ck e^{i2πkn/N}    n = 0, . . . , N−1    (1.10)

and sk and sl are orthogonal as well:

(1/N) sk^H sl = (1/N) Σ_{n=0}^{N−1} s̄k[n] sl[n] = (c̄k cl) δkl    k, l ∈ ℕ    (1.11)
where (·)H denotes the complex conjugate or Hermitian vector transpose.
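The orthogonality relation (1.11) can be verified directly (hypothetical amplitudes; note that `np.vdot` conjugates its first argument, matching the Hermitian transpose):

```python
import numpy as np

N = 64
n = np.arange(N)
c1, c3 = 0.8 + 0.3j, -0.5 + 1.0j               # hypothetical complex amplitudes
s1 = c1 * np.exp(1j * 2 * np.pi * 1 * n / N)   # s_1[n]: one full period
s3 = c3 * np.exp(1j * 2 * np.pi * 3 * n / N)   # s_3[n]: three full periods

ip = lambda a, b: np.vdot(a, b) / N            # (1/N) a^H b, as in eq. (1.11)
print(np.allclose(ip(s1, s3), 0))                    # True: orthogonal for k != l
print(np.allclose(ip(s1, s1), np.conj(c1) * c1))     # True: (c1_bar c1) for k = l
```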
1.3 Signal modelling
The sinusoid wave by itself is the most pure periodic signal and is (in the audible frequency range)
perceived by the human ear as a tone with a most “mellow” quality. In reality, such signals are rarely
seen, except for the test signal on some audio devices or the swing of a pendulum clock. Still, to a
good approximation, periodic signals can be reduced to a combination of many sinusoids:

y(t) = Σ_{k=1}^{K} A_k(t) cos(ϕ_k(t))    0 ≤ t < T    (1.12)
This observation is actually the basis for the Fourier series as introduced in section 2.2.
In addition to periodic components, signals may also contain a certain amount of uncorrelated
noise. The following sections discuss means to model an arbitrary signal comprising both periodic
components and noise.
Figure 1.4: A signal can be modelled as a combination of periodic components and noise.
1.3.1 Sinusoids plus noise model
Let us consider an arbitrary signal y(t) for 0 ≤ t < T. Then the signal can be modelled as a
summation of a deterministic periodic part consisting of K sinusoids characterised by A_k(t) and
ϕ_k(t), plus a stochastic part of uncorrelated noise η(t):

y(t) = Σ_{k=1}^{K} A_k(t) cos(ϕ_k(t))  [deterministic]  +  η(t)  [stochastic]    0 ≤ t < T    (1.13)
This way of representing a signal is often referred to as the sinusoids plus noise model [17]. In case
of a stationary signal ys(t), frequencies and amplitudes remain constant and ys(t) can be written as:

ys(t) = Σ_{k=1}^{K} A_k cos(2π f_k t + θ_k) + η(t)    0 ≤ t < T    (1.14)
An example is shown in figure 1.4.
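A stationary sinusoids-plus-noise signal as in equation (1.14) can be synthesised in a few lines (illustrative amplitudes, frequencies and phase shifts):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, T = 1000.0, 1.0
t = np.arange(0, T, 1 / fs)

A = [1.0, 0.5, 0.25]          # amplitudes A_k
fk = [50.0, 120.0, 310.0]     # frequencies f_k [Hz]
th = [0.0, 1.0, -0.5]         # phase shifts theta_k [rad]

# deterministic part: sum of K sinusoids; stochastic part: white noise eta(t)
y = sum(a * np.cos(2 * np.pi * f * t + p) for a, f, p in zip(A, fk, th))
y = y + 0.1 * rng.standard_normal(t.size)
```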
1.3.2 Tonal components plus noise model
If some frequencies f_h^(1) ⊆ f_k can be related to a fundamental frequency f_0^(1) by f_h^(1) = h f_0^(1),
then the waves for h = 1, . . . , H are considered to be harmonic partials of a tonal component with
fundamental frequency f_0^(1). The stationary tonal component y^(1)(t) is then modelled as:

y_s^(1)(t) = Σ_{h=1}^{H} A_h^(1) cos(2π h f_0^(1) t + θ_h^(1))    0 ≤ t < T    (1.15)

A non-stationary tonal component may be described by:

y^(1)(t) = Σ_{h=1}^{H} A_h^(1)(t) cos(ϕ_h^(1)(t))    0 ≤ t < T    (1.16)

Defining the fundamental phase shift θ_0^(1) ≜ 0, it follows that ϕ_0^(1)(0) = 0 and the phase functions of
the partials are given by:

ϕ_h^(1)(t) = h ϕ_0^(1)(t) + θ_h^(1)    0 ≤ t < T    (1.17)
This concept will be used extensively in chapter 4.
A monophonic signal y(t) comprises only one tonal component y^(1)(t) in addition to noise. A
polyphonic signal contains multiple tonal components y^(g)(t) with different fundamental phase
functions ϕ_0^(g)(t). The model for a non-stationary signal comprising g = 1, . . . , G tonal components
plus noise finally writes:

y(t) = Σ_{g=1}^{G} y^(g)(t) + η(t)    0 ≤ t < T    (1.18)
This model is referred to as the tonal components plus noise model.
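The tonal components plus noise model of equations (1.16)–(1.18) can be sketched as follows (illustrative fundamental phases and partial amplitudes):

```python
import numpy as np

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)

def tonal(phi0, amps, thetas):
    """Tonal component: partials with phases h*phi0(t) + theta_h (eq. 1.17)."""
    return sum(A * np.cos((h + 1) * phi0 + th)
               for h, (A, th) in enumerate(zip(amps, thetas)))

phi0a = 2 * np.pi * (100 * t + 30 * t ** 2)    # fundamental sweeping 100 -> 160 Hz
phi0b = 2 * np.pi * 140 * t                    # stationary fundamental at 140 Hz

y = (tonal(phi0a, [1.0, 0.5, 0.3], [0.0, 0.2, 0.4])              # y^(1)(t)
     + tonal(phi0b, [0.8, 0.4], [0.0, -0.3])                     # y^(2)(t)
     + 0.05 * np.random.default_rng(1).standard_normal(t.size))  # eta(t)
```

The two components have crossing instantaneous frequencies at t = 2/3 s, the polyphonic situation addressed later by the timbre techniques.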
1.4 Summary
Time-domain signals are formulated in either the continuous-time or the discrete-time domain. The
value of the continuous-time signal is known at every instance in time. The values of a discrete-time
signal are solely known at a finite number of instances in time and can be obtained by sampling. A
digital signal additionally requires the values to be quantised to a finite set of values.
Signals usually comprise both periodic components and noise. Periodic components can be characterised
by a frequency, which is the inverse of the period. A stationary component has a constant frequency
and consequently a linearly increasing phase. For a non-stationary component, the instantaneous
frequency is found as the derivative of the phase with respect to time. Essential periodic functions
are the sine and cosine waves, or sinusoids, which may also be formulated using complex exponential
notation. Sinusoids exhibit orthogonality for waves completing full periods over a common interval.
In a good approximation, any time-domain signal may be considered as a deterministic part consisting
of sinusoids plus a stochastic part of uncorrelated noise. If some sinusoids can be related to a
fundamental frequency, the sinusoids are considered to be harmonic partials of a tonal component. A
monophonic signal comprises only one tonal component, while a polyphonic signal contains multiple
tonal components.
Chapter 2
Frequency-Domain Concepts
For many analysis purposes, a time-domain representation of a signal does not offer enough
information. More insight is gained from its frequency domain representation or frequency spectrum,
which can be obtained by means of a Fourier transformation.
This chapter explores the fundamentals of the frequency domain and Fourier analysis. First, the
difference between the time and frequency domain is discussed and illustrated by a basis vector
transformation. In section 2.2, the Fourier series and Fourier transformations are introduced. The
discrete Fourier transform and its properties are discussed in section 2.4. Section 2.5 addresses the
theory and application of windowing. The chapter concludes with a study of the uncertainty principle
in section 2.6 that brings up the fundamental trade-off between time and frequency localisation.
2.1 Time domain versus frequency domain
The frequency domain is the domain in which a function or signal is expressed in terms of frequency
content rather than time content. Formally, a frequency domain representation shows the spectral
distribution of a signal, whereas a time-domain representation shows its temporal distribution.
(a) Time domain (b) Frequency domain
Figure 2.1: The difference between the time and frequency domain representation of a signal. The time domain offers perfect time localisation but no frequency localisation, while the frequency domain offers excellent frequency localisation but lacks time localisation.
The concept is illustrated in figure 2.1 with time on the horizontal axis and frequency on the vertical
axis. Let us assume that all time and frequency content of a signal is contained within the square.
Then the time-domain only offers temporal localisation, while the frequency-domain merely provides
spectral localisation. Nevertheless, both domains can represent exactly the same signal as long as
some conditions are satisfied, as will be discussed in this chapter.
The frequency-domain representation or Fourier transform of a signal y(t) is denoted by ŷ(f), with
f the frequency in Hertz. The transformation y(t) ⇒ ŷ(f) is called a Fourier transform (FT) and is
discussed in section 2.2. For discrete signals, the time-domain sequence y[n] can be transformed to a
frequency-domain spectrum ŷ[k] of equal length by means of a discrete Fourier transform (DFT), as
discussed in section 2.4. First, the difference between the two domains is illustrated by a basis vector
transformation.
2.1.1 Example 1: Basis vector transformation
Let us consider the simple time-domain sequence y[n] as shown in figure 2.2 with only N = 8 points.
In a time-domain representation, the 8 × 1 vector y holds the 8 values for the 8 different instances
n = 0, 1, . . . , 7:

y = [1  2  0  −1  −1.5  −0.5  1.5  1]^T

All 8 values are independent of each other: there exists no other combination of these 8 entries in
y that represents the sampled signal. Mathematically speaking, the space of the ℝ^8 time-domain can
exactly be spanned by K = 8 orthogonal basis vectors e_k that form the basis vector matrix E:
    ⎡ e_0 ⎤   ⎡ 1 0 0 0 0 0 0 0 ⎤
    ⎢ e_1 ⎥   ⎢ 0 1 0 0 0 0 0 0 ⎥
E = ⎢  ⋮  ⎥ = ⎢        ⋱        ⎥ = I
    ⎣ e_7 ⎦   ⎣ 0 0 0 0 0 0 0 1 ⎦
Since E equals the identity matrix, the vectors e_k are also orthonormal and it simply follows that
y = Ey. It is observed that the vectors of E are perfectly independent in terms of time localisation,
but do not offer any information about the frequency content.
Figure 2.2: A simple time-domain sequence of 8 samples.
In the frequency-domain representation, the same sequence y is expressed in another set of 8 basis
vectors ẽ_k, k = 0, . . . , 7. The vectors correspond to the 8 orthogonal complex exponential waves
Figure 2.3: The orthogonal basis vectors of the 8-point frequency domain. The real part is coloured blue, the imaginary part is red. The arrows indicate the conjugate pairs: waves with similar frequencies but opposing imaginary part.
with full periods (see equation (1.10)), counting from k = 0 to 7:

ẽ_k[n] = e^{i2πkn/N}    k, n = 0, 1, . . . , 7    N = 8
The vector values and corresponding waves are shown in figure 2.3 with the real part in blue and the
imaginary part in red.
The first vector ẽ_0 corresponds to a constant level, also referred to as the DC component. Vectors 1 to
4 contain respectively 1 to 4 full periods. Note that vectors 0 and 4 are real-valued.
For vectors 5 to 7, the numbers of periods are expected to be 5 to 7, but their vector values only show
3 to 1 periods. This is in accordance with the Nyquist–Shannon sampling theorem, which states that a
sampled signal can only contain frequency content up to half the sampling frequency. As a result,
waves with frequencies > ½fs will appear as aliased waves with a lower frequency and an opposing
complex part, which yields in case of this example:

ẽ_5 = ẽ_3*,  ẽ_6 = ẽ_2*,  ẽ_7 = ẽ_1*

where * denotes the complex conjugate.
The conjugate pairs are indicated by arrows in figure 2.3. In fact, the aliased vectors correspond to
the negative frequencies of the spectrum, as will be discussed in section 2.4.
Getting back to the basis vectors, a square matrix Ẽ can be constructed from the 8 vectors ẽ_k, similar
to the time domain procedure. Just as the basis vectors e_k, the vectors ẽ_k span an orthogonal space
(see equation (1.11)). In contrast to E, Ẽ contains vectors that are independent in terms of frequency
Figure 2.4: Frequency-domain representation ŷ of the sequence y of figure 2.2.
but are evenly spread over time. With ω = e^{i2π/8} = (1 + i)/√2, the matrix writes:

    ⎡ 1  1   1   1   1   1   1   1  ⎤
    ⎢ 1  ω   ω²  ω³  ω⁴  ω⁵  ω⁶  ω⁷ ⎥
    ⎢ 1  ω²  ω⁴  ω⁶  1   ω²  ω⁴  ω⁶ ⎥
Ẽ = ⎢ 1  ω³  ω⁶  ω   ω⁴  ω⁷  ω²  ω⁵ ⎥
    ⎢ 1  ω⁴  1   ω⁴  1   ω⁴  1   ω⁴ ⎥
    ⎢ 1  ω⁵  ω²  ω⁷  ω⁴  ω   ω⁶  ω³ ⎥
    ⎢ 1  ω⁶  ω⁴  ω²  1   ω⁶  ω⁴  ω² ⎥
    ⎣ 1  ω⁷  ω⁶  ω⁵  ω⁴  ω³  ω²  ω  ⎦

with ω² = i, ω⁴ = −1 and ω⁶ = −i.
While E^H E = I, it appears that Ẽ^H Ẽ = N I with N = 8, meaning that the vectors are orthogonal
but not orthonormal. Nevertheless, Ẽ can be regarded as an orthogonal transformation matrix
that can be used to transform the representation in the frequency-domain by the basis ẽ_k to the
representation in the time-domain by the basis e_k:

y = (1/√N) Ẽ ŷ_n    (2.1a)

ŷ_n = (1/√N) Ẽ^H y    (2.1b)
Vector ŷ_n denotes the normalised Fourier transform of y. However, a more common transformation
writes:

y = Ẽ ŷ    (2.1c)

ŷ = (1/N) Ẽ^H y    (2.1d)

As such, vector ŷ corresponds to the amplitudes of the complex basis waves as present in the signal
y[n].
The values for ŷ are found:

ŷ = [0.31  0.71  −0.25  −0.09  −0.06  −0.09  −0.25  0.71]^T
  + [0.00  0.14  −0.19  −0.23  0.00  0.23  0.19  −0.14]^T i
It can be observed that ŷ[0] is real and that ŷ[1], ŷ[2], ŷ[3] form complex conjugate pairs with ŷ[7], ŷ[6], ŷ[5].
The absolute values |ŷ| are shown in figure 2.4.
It is verified that ŷ^H ŷ = (1/N) y^H y and ŷ_n^H ŷ_n = y^H y, which shows that energy is conserved
throughout the transformation. This property is known as Parseval's identity (see section 2.4.3).
The obtained vector ŷ is exactly the Fourier transform of y, as will be discussed in the following
sections.
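The 8-point example can be reproduced numerically: the basis matrix, its orthogonality and the transformations (2.1c)–(2.1d) all follow in a few lines, and the spectrum matches NumPy's FFT up to the 1/N scaling:

```python
import numpy as np

N = 8
y = np.array([1, 2, 0, -1, -1.5, -0.5, 1.5, 1], dtype=float)

n = np.arange(N)
E = np.exp(1j * 2 * np.pi * np.outer(n, n) / N)   # basis waves e^{i 2pi k n / N}

print(np.allclose(E.conj().T @ E, N * np.eye(N))) # True: orthogonal, not orthonormal
y_hat = E.conj().T @ y / N                        # forward transform, eq. (2.1d)
print(np.allclose(E @ y_hat, y))                  # True: eq. (2.1c) recovers y
print(np.allclose(y_hat, np.fft.fft(y) / N))      # True: matches the scaled FFT
print(np.round(y_hat[0].real, 2))                 # 0.31 -> the DC value listed above
```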
2.2 Fourier series
A decomposition of a periodic signal into its harmonic sinusoidal partials is called a Fourier series.
The concept is named after Joseph Fourier (1768–1830), a French mathematician who discovered
that stationary periodic signals can be expressed as a superposition of sinusoids:

y(t) = Σ_{k=1}^{K} A_k cos(2π f_k t + θ_k)    −∞ < t < ∞    (2.2)
In theory, a Fourier series can describe any periodic signal exactly as long as an infinite number of
partials is allowed.
2.2.1 Trigonometric series
Following the definition of equation (1.6b), the harmonic partials k = 1, . . . , K of a periodic signal
y(t) with fundamental frequency f0 can be found in terms of a_k and b_k by:

a_k = (2/T0) ∫_0^{T0} y(t) cos(2π k f0 t) dt    (2.3a)

b_k = (2/T0) ∫_0^{T0} y(t) sin(2π k f0 t) dt    (2.3b)

The DC offset a0 is determined by:

a0 = (1/T0) ∫_0^{T0} y(t) dt    (2.3c)
The integration is performed over 0 ≤ t < T0, although any full period can be used. The
trigonometric series is easy to interpret, but mathematically inferior to its complex equivalent.
2.2.2 Complex exponential series
The complex exponential Fourier series provides a mathematically more elegant alternative to the
trigonometric series. Considering a signal y(t) with fundamental frequency f0, the complex partials
sections only discuss a few important window functions and ways to quantify their properties. For a
complete study on windowing, refer to [12].
2.5.1 Window application
Let y[n] be an N-point signal and w[n] the window function of equal length. Then the windowed
signal is obtained by element-wise multiplication:

y^(w)[n] = w[n] y[n]    n = 0, . . . , N−1    (2.16)
The time instances t[n] are:

t[n] = (n − N/2)/fs = n/fs − T/2    n = 0, . . . , N−1    (2.17)

Hence, the window length is T = N/fs, the time domain −T/2 ≤ t < T/2 and the window is
symmetric about t[N/2] = 0 s. The DFT of the window is denoted by ŵ[k].
In general, windows have their maximum value at n = N/2 and reduce smoothly to zero towards the
boundaries. The obtained spectrum after windowing, ŷ^(w)[k], is predictable, since it can be seen as a
convolution of the DFTs of y[n] and w[n].
Since one is especially interested in the response of the window to incomplete waves, k should not be
limited to integers (k ∈ ℕ). Instead, the Fourier transform of w[n] is considered as a function of the
continuous frequency f ∈ ℝ:

ŵ(f) = (1/N) Σ_{n=0}^{N−1} w[n] e^{−i2πf n/N}    −½fs < f < ½fs    (2.18)
Equation (2.18) represents the discrete-time Fourier transform (DTFT) of w[n] for the domain of
feasible frequencies (see section 2.3.2). The frequency f corresponds to the number of frequency
bins relative to DC, as will be illustrated below.
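Equation (2.18) can be evaluated directly on any fractional frequency (f counted in bins); for the rectangular window this reproduces the unit coherent gain at DC and the zeros at nonzero integer bins:

```python
import numpy as np

def dtft(w, f):
    """w_hat(f) of eq. (2.18); f counts frequency bins relative to DC."""
    N = len(w)
    n = np.arange(N)
    return np.sum(w * np.exp(-1j * 2 * np.pi * f * n / N)) / N

w = np.ones(64)                        # rectangular window, N = 64
print(abs(dtft(w, 0.0)))               # 1.0 -> coherent gain at DC
print(round(abs(dtft(w, 1.0)), 12))    # 0.0 -> a complete period sums to zero
print(round(abs(dtft(w, 0.5)), 3))     # response exactly between two bins
```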
2.5.2 Window properties
The window properties are introduced using the rectangular window as an example. Figure 2.9 shows
(from left to right) the window w[n] on linear scale, ŵ(f) on linear scale and ŵ(f) on logarithmic
(dB) scale. The performance indicators of the window will be discussed separately.
Figure 2.9: Time and frequency domain representation of the rectangular window.
Coherent gain
Let us have a look at the frequency spectrum of figure 2.9. As already mentioned, the spectrum
of a windowed signal can be seen as the spectrum of the signal convolved with the spectrum of the
window. Let the signal be composed of broad-band noise plus a single sinusoidal component. Then
the coherent gain (CG) is defined as the DC component of the window (f = 0) and determines the
gain of the sinusoid at its true frequency in the spectrum:

CG = (1/N) Σ_{n=0}^{N−1} w[n]    (2.19)
For the rectangular window, CG = 1. However, for most windows CG < 1, meaning that the
amplitude of the principal component is reduced. Often, window functions are normalised to a
processing gain of 0 dB to make sure that the spectral amplitudes correspond to the true amplitudes,
apart from the contributions of the noise.
Equivalent noise power & bandwidth
Unfortunately, the amplitude determined by the coherent gain is biased by the neighbouring
frequency content that is accumulated according to the response of the filter for f ≠ 0. The total noise
power is defined as the integral of the square of the frequency response over the complete frequency
domain −fs/2 < f < fs/2:

NP = ∫_{−fs/2}^{fs/2} |ŵ(f)|² df = (1/N) Σ_{n=0}^{N−1} |w[n]|²    (2.20)
where Parseval’s theorem is used for the latter expression. The equivalent noise bandwidth (ENBW) is
a measure for the width of a hypothetical rectangular “filter” with coherent power CG2, that would
accumulate the same amount of noise power. In other words, it represents the width of a rectangle of
height CG² that has the same area as the area under |ŵ(f)|². It is indicated by a dashed box
in the centre plot in figure 2.9. Using the definitions of CG and NP, it is given by:

ENBW = NP / CG²    (2.21)
The rectangular window has an ENBW of 1. For most windows however, ENBW > 1. Consequently,
the ENBW quantifies the reduction of the achievable spectral resolution compared to the rectangular
window.
Main lobe width & -6dB bandwidth
The main lobe width (MLW) is the width of the centre lobe between the first points where ŵ(f) = 0.
It is a measure for the sharpness or spectral resolution of the DFT: lower values correlate with sharp
spectra, while higher values produce more blurry spectra, making it difficult to distinguish closely
spaced frequencies. The -6dB bandwidth (BW) is a similar measure, but corresponds to the bandwidth
between the points where |ŵ(f)| = 0.5 (−6 dB). Note that both bandwidths are implications of the
window itself and are not related to leakage due to incomplete periods in the signal.
Side lobe level & roll-off rate
The side lobe level (SLL) and the side lobe roll-off (SLR) quantify the amount of leakage. The side lobe
level is the maximum level of the contributions of frequencies that are not part of the main lobe and
should therefore be minimised. The side lobe roll-off is a measure for the asymptotic rate of side lobe
level decrease per frequency bin, usually specified in dB per octave. It is a direct result of the order of
discontinuity on the boundaries:

  0th order:  1/f        −6 dB/oct
  1st order:  1/f²       −12 dB/oct
  2nd order:  1/f³       −18 dB/oct
  pth order:  1/f^(p+1)  −6(p+1) dB/oct
The above relation is a result of the differentiation property of the Fourier transform: every additional
order of continuity adds a factor 1/f to the roll-off rate [18].
2.5.3 Rectangular window
The rectangular or Dirichlet window (figure 2.9) is the most trivial window, as it is often explained as
applying no windowing at all:
w[n] = 1 n = 0, . . . , N−1 (2.22)
The DTFT of the rectangular window is given in closed form by:

ŵ(f) = sin(πf)/(πf) = sinc(f)    (2.23)
The coherent gain and equivalent noise bandwidth are both 1. The main lobe width is 2 bins: except
for f = 0, all integer frequency bins yield zero amplitude. The -6dB bandwidth is only 1.21 bins. The
side lobe level is −13.3 dB and the roll-off rate is of course −6 dB/oct, since the window exhibits a 0th
order discontinuity.
The rectangular window has the lowest possible ENBW, MLW and BW, meaning that it is able to
produce a very sharp DFT. However, due to the high side lobe level and slow roll-off rate, the window
suffers from severe leakage.
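The quoted values can be checked numerically. The following Python/NumPy sketch (the window length N = 64 and the zero-padding factor are arbitrary choices, not from the text) samples the DTFT of the rectangular window on a dense grid of bin-valued frequencies:

```python
import numpy as np

# Dense sampling of the DTFT of a length-N rectangular window by zero-padding.
N = 64
w = np.ones(N)
Npad = 64 * N
W = np.abs(np.fft.fft(w, Npad)) / N          # normalised so that W(0) = 1
f_bins = np.arange(Npad) * N / Npad          # frequency axis in bins

# Integer bins other than 0 fall exactly on the zeros of the spectrum
print(W[f_bins == 1.0])                      # ~0

# Highest side lobe: search outside the main lobe (f > 1 bin)
mask = (f_bins > 1) & (f_bins < N / 2)
sll_db = 20 * np.log10(W[mask].max())
print(round(sll_db, 1))                      # close to the quoted -13.3 dB
```

The main lobe ends at f = 1 bin, so everything beyond it belongs to the side lobes; the largest of these is the first side lobe near f ≈ 1.4 bins.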
2.5.4 Hanning window
The Hanning window (or Hann, named after the Austrian meteorologist Julius von Hann) is perhaps
the most frequently applied window since it offers excellent leakage suppression and has a very
predictable response. The window is given by:
w[n] = 1/2 − 1/2 cos(2πn/N) = cos²(π(n/N − 1/2))    n = 0, . . . , N−1 (2.24)
The window is shown in figure 2.10. The red points represent the window values at the DFT points,
i.e. integer values of f. Clearly, the CG is 0.5 due to the first term in equation (2.24). The cosine term
appears as two points with amplitude 1/4 at f = ±1. The ENBW is 1.5, meaning that the spectrum
is 1.5 times less sharp than that of the rectangular window. The SLL is −31.5 dB and the SLR is −18 dB/oct,
since both the window value and its first derivative are continuous at the boundaries. This comes at
the cost of a higher bandwidth: the MLW is 4 bins and the BW is 2 bins.
The Hanning window offers much better leakage suppression than the rectangular window. In
addition, it has perfect temporal coverage when adjacent windows are observed with a spacing of T/2
in time, as will be discussed in chapter 5.
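A short numerical check (Python/NumPy sketch; N = 32 is an arbitrary choice) confirms the three-bin spectrum, the coherent gain of 0.5 and the ENBW of 1.5:

```python
import numpy as np

# The DFT of the (periodic) Hanning window of eq. (2.24) has only three
# non-zero bins: 0.5 at bin 0 and 0.25 at bins +-1.
N = 32
n = np.arange(N)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)

W = np.fft.fft(w) / N
print(np.round(np.abs(W[:3]), 4))    # bins 0, 1, 2

cg = w.mean()                        # coherent gain
enbw = N * np.sum(w**2) / np.sum(w)**2
print(cg, round(enbw, 4))
```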
Figure 2.10: The Hanning or Hann window, shown in the time domain (linear) and the frequency domain (linear and dB scale).
Figure 2.11: The Gaussian window for σ = 0.2 (blue), σ = 0.1 (green) and σ = 0.05 (red), shown in the time domain (linear) and the frequency domain (linear and dB scale).
2.5.5 Gaussian window
The Gaussian window implements the Gaussian function with standard deviation σ:
wσ[n] = e^(−½ (t[n]/σ)²)    n = 0, . . . , N−1 (2.25)
Time t[n] is defined by (2.17). Due to the absence of the normalisation term 1/(σ√2π), the function
is not the normalised Gaussian or normal distribution with unity area. Instead, it has a peak value
w[N/2] = 1, just like the other windows. The Gaussian function is the only function known in closed
form that transforms to itself:
wσ(f) = σ√(2π) e^(−½ (2πσf)²)    −fs/2 < f < fs/2 (2.26)
The proof is not trivial [1, equation 7.4.6, page 302]. The Gaussian window can thus be “tuned” by the
time-domain standard deviation σt, while the frequency-domain σf follows according to:
σf = 1/(2πσt) (2.27)
Figure 2.11 shows the Gaussian window for σ = 0.2 (blue), σ = 0.1 (green) and σ = 0.05 (red). It can
be verified that both the time-domain and frequency-domain window have the shape of a Gaussian
function. Note that the axes of the logarithmic frequency domain plot extend to ±20 bins.
The Gaussian function only reaches zero at infinity. Therefore the transformation of (2.26) only holds
for a time window of infinite length. In equation (2.25), the function is truncated at ±T/2. The
standard deviation σt determines the amount of discontinuity on the boundaries. Looking at the
logarithmic frequency domain plot, it is observed that w(f) is perfectly quadratic near the centre,
as was expected from equation (2.26). From a certain level, the response starts to show side lobes.
It is observed that the side lobe level increases with the size of the boundary discontinuity w[0] of the
time-domain window, although there is no explicit relation.
Since w(f) is quadratic near the centre on a logarithmic scale, frequency estimates can be obtained by
quadratic interpolation. Also, the Gaussian window minimises the time-frequency product, as will be
discussed in the following section.
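The quadratic interpolation mentioned above can be sketched as follows (Python/NumPy; the sample rate, the window width σ = 0.5 s and the off-bin test frequency are assumed values for illustration, not taken from the text):

```python
import numpy as np

# Because the log-magnitude of a Gaussian-windowed tone is quadratic near its
# peak, an off-bin frequency can be estimated by fitting a parabola through
# the three log-spectrum samples around the peak bin.
fs, N = 100.0, 1000
t = np.arange(N) / fs - N / (2 * fs)     # centred time axis
sigma = 0.5                              # time-domain std dev [s] (assumed)
w = np.exp(-0.5 * (t / sigma) ** 2)

f_true = 10.37                           # deliberately between DFT bins
y = np.cos(2 * np.pi * f_true * t) * w

Y = np.abs(np.fft.rfft(y))
k = np.argmax(Y)
a, b, c = np.log(Y[k - 1]), np.log(Y[k]), np.log(Y[k + 1])
delta = 0.5 * (a - c) / (a - 2 * b + c)  # parabola vertex offset in bins
f_est = (k + delta) * fs / N
print(round(f_est, 3))                   # close to the true 10.37 Hz
```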
2.5.6 Cosine and cosine-sigma window
Recall that the Hanning window can be written as a cosine function to the power of 2 (equation
(2.24)). The Hanning window is therefore part of the family of cos^α windows. The family for non-
negative α can be characterised by a MLW of exactly 2 + α bins and a SLR of −6(α + 1) dB/oct, since
every additional cosine power adds one order of derivative continuity. Furthermore, the following
dependencies were found for α ∈ [0, 10]:

SLL = −13.3 − 7.47α dB
BW = √(1.45 + 1.35α) bins
The window for α = 0 is obviously the rectangular window. It is observed that for increasing α, the
window tends to converge to a Gaussian window. This observation is justified by the mathematical
limit that shows the convergence of a power cosine function to an exponential function:
lim_(N→∞) cos(t/N)^(N²) = e^(−t²/2) (2.28)
In contrast to the Gaussian window, the boundaries of a cosine window are always zero, meaning
that the window has a much higher roll-off rate. The cosine-sigma window is therefore suggested,
combining the properties of the cosα window and the Gaussian window:
wσ,α[n] = { cos(t[n]/(σ√α))^α    for −½πσ√α < t[n] < ½πσ√α
          { 0                    elsewhere
(2.29)
The parameter σ determines the theoretical Gaussian standard deviation for the case α = ∞. The
parameter α controls the exponent of the cosine function and thereby the validity of equation (2.28).
Note that the window for α = 2 and σ = 1/(π√2) ≈ 0.225 is exactly the Hanning window.
Figure 2.12 shows a Gaussian window for σ = 0.1 (blue) and the cosine-sigma window with σ = 0.1
and α = 16 (green). On linear scale, the windows appear almost the same. On logarithmic scale, it
can be observed that the cosine-sigma window is slightly more concentrated. The measured standard
deviation is 0.097, which is close to the expected σ = 0.1. It is verified that the cosine-sigma window
converges to the Gaussian window for α ≫ 100.
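As a check of the stated Hanning equivalence, the following Python/NumPy sketch implements equation (2.29) and compares it with cos²(πt) for α = 2 and σ = 1/(π√2):

```python
import numpy as np

def cosine_sigma(t, sigma, alpha):
    # Cosine-sigma window of eq. (2.29); support |t| < pi*sigma*sqrt(alpha)/2.
    half = 0.5 * np.pi * sigma * np.sqrt(alpha)
    w = np.zeros_like(t)
    m = np.abs(t) < half
    w[m] = np.cos(t[m] / (sigma * np.sqrt(alpha))) ** alpha
    return w

N = 200
t = (np.arange(N) - N / 2) / N               # window time axis on [-1/2, 1/2)
w_cs = cosine_sigma(t, 1 / (np.pi * np.sqrt(2)), 2)
w_hann = np.cos(np.pi * t) ** 2              # Hanning written as cos^2
print(np.max(np.abs(w_cs - w_hann)))         # ~0: identical windows
```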
Figure 2.12: The Gaussian window for σ = 0.1 (blue) and the cosine-sigma window for σ = 0.1 and α = 16 (green). The cosine-sigma window has a better roll-off rate than the Gaussian window.
2.5.7 Example 4: Windowing of a simple signal
To illustrate the difference between window functions, a simple signal is considered:
y[n] = cos(2πf1 t[n]) + cos(2πf2 t[n]) + 0.01 cos(2πf3 t[n]) + 0.01 η[n]    n = 0, . . . , N−1 (2.30)
The time instances are t[n] = n/fs with fs = 100Hz. The length of the sequence is N = 100, which
corresponds to T = 1 s.
The first periodic component at f1 = 10Hz is fully periodic in the window. The second component
at f2 = 25.5Hz exhibits a discontinuity and will cause leakage. A third component at f3 = 40Hz is
periodic in the window but has a 100× smaller amplitude. Furthermore, the signal is corrupted by
white noise represented by a random vector 0.01η[n] ∈ [−0.01, 0.01].
The signal is multiplied with four different window functions and the DFT is computed. The
amplitudes of the single-sided spectra are shown in figure 2.13. The four window functions are:
1. Rectangular window. The 10 Hz component is represented by a single peak. The 25.5 Hz com-
ponent however causes a severe amount of leakage, completely masking the third component
at 40 Hz. Clearly, the rectangular window is a bad choice for signals with a so-called high dynamic
range.
2. Hanning window. The Hanning window is able to reveal all periodic components, although it
may be difficult to determine the exact frequencies from the spectrum. The amplitudes of the
peaks are a factor 0.5 (−6 dB) lower than the true amplitudes; the remaining energy is spread
over the neighbouring frequency bins. Also, the window offers enough leakage suppression to
reveal the noise floor at approximately −80 dB.
3. Gaussian window with σ = 0.2. This window yields results similar to the Hanning window.
All periodic components appear as peaks with a quadratic shape on the dB scale, regardless of
being periodic in the window or not.
Figure 2.13: Four different window functions applied to a simple signal: (a) rectangular window, (b) Hanning window, (c) Gaussian window σ = 0.2, (d) Gaussian window σ = 0.1. Single-sided amplitude spectra in dB over 0–50 Hz.
4. Gaussian window with σ = 0.1. The window offers a very poor frequency resolution, but
excellent leakage and noise suppression. If the aim is to estimate the spectral location of
the periodic components, this window may still be a good choice, since the frequencies can be
estimated accurately using quadratic interpolation.
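The masking effect can be reproduced in a few lines (Python/NumPy sketch of example 4; the noise term is omitted here for clarity):

```python
import numpy as np

# The weak 40 Hz tone is masked by leakage from the non-periodic 25.5 Hz tone
# under a rectangular window, but recovered by a Hanning window. Spectra are
# normalised to single-sided amplitudes.
fs = N = 100
n = np.arange(N)
t = n / fs
y = (np.cos(2*np.pi*10*t) + np.cos(2*np.pi*25.5*t)
     + 0.01*np.cos(2*np.pi*40*t))

hann = 0.5 - 0.5*np.cos(2*np.pi*n/N)
Y_rect = np.abs(np.fft.rfft(y)) * 2 / N
Y_hann = np.abs(np.fft.rfft(y*hann)) * 2 / np.sum(hann)

# Here bin k corresponds to k*fs/N = k Hz
print(Y_rect[40], Y_hann[40])    # leakage-dominated vs ~0.01
print(Y_rect[33], Y_hann[33])    # pure leakage bins
```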
2.6 Time-frequency uncertainty principle
Throughout the chapter, it has become clear that temporal accuracy is inversely related to spectral
accuracy. Recall from example 1 that the time-domain basis vectors have excellent time localisation
but give no frequency information, while the Fourier basis vectors offer perfect frequency localisation
without time information. Frequency localisation means the ability to clearly identify periodic
components that are concentrated at particular frequencies [16].
In effect, by applying a non-rectangular window to a signal, one centres the observation at t = 0
and accepts that events “further away” from this centre are attenuated more than events close to the
centre. Thereby, windowing introduces a certain amount of temporal localisation.
To conclude the chapter, this fundamental trade-off is formalised as the time-frequency uncertainty
principle.
2.6.1 Temporal and spectral localisation
Let us once again consider a window w[n] of length N with time t[n] given by (2.17). Using the
definition for the total noise power (2.20), the temporal and spectral centres can be found by:
μt = 1/(NP) Σ_{n=0}^{N−1} t[n] |w[n]|² (2.31a)

μf = 1/(NP) ∫_{−fs/2}^{fs/2} f |w(f)|² df (2.31b)
Next, define the second central moments of the temporal and spectral distribution:
σt² = 1/(NP) Σ_{n=0}^{N−1} (t[n] − μt)² |w[n]|² (2.32a)

σf² = 1/(NP) ∫_{−fs/2}^{fs/2} (f − μf)² |w(f)|² df (2.32b)
Note that σt² and σf² are the variances of the squared magnitudes of respectively w[n] and w(f).
If the window is well-localised in time, then the signal is concentrated at time instance μt and σt² is
small. Similarly, if the window is well-localised in frequency, then the spectrum is centred around μf
and σf² is small.
It can be verified that σt² and σf² are invariant under time and frequency shifts, by the definition of the
temporal and spectral centres (2.31a) and (2.31b). The relation for time scaling by a factor a reads:
σt(a t[n]) = 1/|a| · σt(t[n])    a ∈ ℝ (2.33a)

σf(a t[n]) = |a| · σf(t[n])    a ∈ ℝ (2.33b)
This means that a decrease of the window length in time increases the spectral width proportionally.
2.6.2 Time-frequency product
The dimensionless time-frequency product is given by:
U = σtσω = 2π σtσf (2.34)
Obviously, U is invariant under time and frequency shifts. By equation (2.33a) and (2.33b), the product
is also invariant under time scaling. The product can therefore be interpreted as a measure of how
well the window is localised in both time and frequency: a low value of U correlates with a good
localisation.
2.6.3 Uncertainty principle
The time-frequency uncertainty principle states that no window or waveform can have a time-frequency
product less than 1/2, frequency being expressed in radians:

U = 2π σt σf ≥ 1/2 (2.35)
The proof follows from the Cauchy–Schwarz inequality. It is related to Heisenberg's uncertainty
principle in quantum physics, which states that one cannot simultaneously determine the position
and momentum of a particle to an arbitrary degree of accuracy. Similarly, for time-frequency analysis,
the simultaneous accuracy of time and frequency localisation is limited by (2.35).
2.6.4 Example 5: Time-frequency product of four windows
The time-frequency products of the four windows of example 4 are determined. As stated above, the
lower limit of U is 0.5. The windows are shown in figures 2.9, 2.10 and 2.11.
1. Rectangular window. The second central moment of the window is exactly √(1/12) = 0.29 s.
The second central moment of the frequency spectrum is 8.38 Hz. The time-frequency product
is 15.2, which is extremely high compared to the lower limit.
2. Hanning window. σt = 0.14 s and σf = 0.57 Hz. The time-frequency product is 0.513, which
is close to the minimum.
3. Gaussian window with σt = 0.2. The second central moments are σt = 0.14 s and σf =
0.83Hz. The time-frequency product is 0.742. Although the window is a Gaussian, the lower
limit of 0.5 is not reached since the Gaussian is truncated, which causes a large discontinuity.
4. Gaussian window with σt = 0.1. The second central moments are σt = 0.07 s and
σf = 1.13Hz. The time-frequency product is 0.500. The window achieves the lower limit
for uncertainty. Compared to the Gaussian window with σt = 0.2, the truncation causes a
negligible discontinuity.
Concluding, the Hanning window appears to be a good all-round window. For specific purposes, the
Gaussian window or cosine-sigma may be preferred, especially if one needs control over temporal
and spectral localisation.
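The listed products can be reproduced numerically (Python/NumPy sketch; the sample rate and the FFT padding factor are arbitrary choices). The temporal moments follow (2.31a) and (2.32a) directly; the spectral moment is approximated by sampling |w(f)|² on a dense zero-padded grid:

```python
import numpy as np

def tf_product(w, fs):
    # Time-frequency product U = 2*pi*sigma_t*sigma_f of eq. (2.34).
    N = len(w)
    t = (np.arange(N) - N / 2) / fs
    P = np.sum(np.abs(w) ** 2)
    mu_t = np.sum(t * np.abs(w) ** 2) / P
    var_t = np.sum((t - mu_t) ** 2 * np.abs(w) ** 2) / P

    pad = 64 * N                              # dense sampling of |w(f)|^2
    W2 = np.abs(np.fft.fftshift(np.fft.fft(w, pad))) ** 2
    f = np.fft.fftshift(np.fft.fftfreq(pad, 1 / fs))
    mu_f = np.sum(f * W2) / np.sum(W2)
    var_f = np.sum((f - mu_f) ** 2 * W2) / np.sum(W2)
    return 2 * np.pi * np.sqrt(var_t * var_f)

fs = N = 1000                                 # window on T = 1 s
t = (np.arange(N) - N / 2) / fs
U_hann = tf_product(np.cos(np.pi * t) ** 2, fs)
U_gauss = tf_product(np.exp(-0.5 * (t / 0.1) ** 2), fs)
print(round(U_hann, 3), round(U_gauss, 3))    # ~0.513 and ~0.500
```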
2.7 Summary
A frequency-domain representation provides spectral information of a signal or function, in contrast
to the time-domain representation that only provides temporal information. The study of signals in
terms of frequency content is called a Fourier analysis. The fundamental concept is the observation
that periodic signals can be decomposed into harmonic sinusoids, the so-called Fourier series.
The continuous-time and discrete-time Fourier transform generalise this concept to non-periodic
signals and provide a continuous frequency domain. The discrete Fourier transform only examines
the frequencies that have complete periods over the signal length. As these waves are orthogonal, the
DFT functions as a linear transformation.
The relationship between the four Fourier transformations and their domains is depicted schematically
in figure 2.14. The continuous / discrete time domain is distinguished horizontally; the
continuous / discrete frequency domain vertically. Note that the Fourier series only fits in this scheme
if the window length T equals the fundamental period T0. In that case, the complex series ck are
directly related to y[k].
Figure 2.14: The four Fourier transformations and their domains: the CTFT (ℝ ⇒ ℝ) relating y(t) to y(f), the DTFT (ℕ ⇒ ℝ) relating y[n] to y(f), the Fourier series FS (ℝ ⇒ ℤ) relating y(t) to y[k], and the DFT (ℕ ⇒ ℕ) relating y[n] to y[k].
Finite-length signals usually exhibit waves that are not fully periodic over the signal length. This
causes discontinuities on the boundaries, resulting in spectral leakage. Leakage can be reduced by
applying a window function. Generally, a window function suppresses the spectral leakage, at the
cost of a reduced spectral resolution. The window function should be chosen with care, depending on
the analysis purpose.
For every finite-length signal, one can determine the localisation accuracy in time and frequency.
By choosing the window function and length, both properties can be controlled. However, the
simultaneous temporal and spectral accuracy is limited by the time-frequency uncertainty principle.
The Gaussian window is the only window that potentially reaches the minimum time-frequency
product. The trade-off between time and frequency accuracy is a fundamental issue for time-
frequency analysis.
Part II
Advanced concepts
Chapter 3
Time Warping
Time warping is a technique that applies stretching and contraction of a signal in the time domain. It
involves a modification using a non-linear time function that realises frequency scaling. The concept
of a non-linear time may sound odd, but has some well-known equivalents in real life:
– A disk-jockey manually controlling the speed of a vinyl disc on a turn-table
– The observed frequency shift of the siren signal of a passing emergency vehicle (Doppler effect)
– The vibrato or chorus effect of a guitar, realised by a series of electronic delay circuits
In all cases, frequency scaling occurs due to a change in the way time is conceived. This conceived
time or warped time is denoted by t̃ and can be expressed as t̃ = φα(t) or t̃ = ψα(t), some non-linear
functions of t. The time-warped signal then reads ỹ(t̃) = y(t).
Note that the term frequency scaling is used rather than frequency modulation or shift. Frequency
modulation is achieved by multiplication with a complex wave. If this wave has frequency fm, the
frequency content of the modulated signal is shifted entirely by −fm or +fm, depending on the
sign in the exponent. This modulation is the basis of the Fourier transformations, as discussed in
chapter 2. In contrast, time warping scales all frequencies by a constant rate, keeping the ratios
between harmonics intact.
The concept of time warping is often applied for analysis of non-stationary chirp signals1, i.e. signals
with increasing or decreasing frequency. It will be shown that time warping can be used effectively
as pre-processing operation for analysis of a more general class of non-stationary signals. It thereby
concentrates the energy of the partials on centre frequencies while preserving the harmonic structure.
3.1 Linear time warping
As an introduction to the time warping topic, the following sections discuss the theory of linear
time warping in the continuous time domain. The formulation of the linear time warping function is
adopted from [21], where it was mentioned as part of the fan Chirp transform (see chapter 6). This
section discusses linear time warping as an independent operation.
1The term chirp is adopted from the sound of chirping birds.
3.1.1 Warp function
Following the notation of [21] and [5], the linear time warping function2 is defined as:
φα(t) = (1 + ½αt) t (3.1)
and the time derivative:
φ′α(t) = 1 + αt (3.2)
Note that the time warp function is a quadratic function of t for α ≠ 0, and simply the linear time
function for α = 0.
3.1.2 Chirp wave
Let us consider a linear chirp wave y(t) with mean frequency fc, subject to the warping function of
equation (3.1). The instantaneous frequency is defined as:
f(t) = (1 + αt) fc = φ′α(t) fc (3.3a)

and the wave with constant amplitude A reads:

y(t) = A cos(2π ∫(1 + αt) fc dt) = A cos(2πfc φα(t)) (3.3b)
fc is the instantaneous frequency at t = 0. Note that f(−1/α) = 0, meaning that the instantaneous
frequency has a focal point at t = −1/α, regardless of the value of fc. Beyond the focal point,
frequencies become negative, which in this case implies reversal of time. Since this is not the aim of
warping, the time domain is limited to −1/α < t < 1/α. In this interval, fc is the mean frequency.
3.1.3 Chirp rate
The chirp rate α is defined as the frequency increase relative to the mean frequency fc and reads:

α = f′(t)/fc (3.4)

This definition yields a constant chirp rate only if f′(t) is constant. For linear chirps this is the case,
as is easily verified from equation (3.3a):

α = (φ′α(t) fc)′ / fc = φ′′α(t) = α

For higher order chirp waves, the time derivative writes:

α(t) = φ′′(t) = α + O(t) (3.5)
indicating that the chirp rate is not constant. For linear time warping, the higher order terms must be
neglected. Non-linear time warping is discussed in section 3.2.
²Note that the warp function denoted by φα(t) is not related to the phase function ϕ(t), although the Greek letter (“phi”) may appear the same.
3.1.4 Inverse warp function
Linear chirp signals that satisfy f(t) = φ′α(t) fc can be transformed back to stationary signals by
applying inverse time warping. In order to do so, the inverse expression φα⁻¹(t) of the warp function
must be found. This is not trivial, since φα(t) is a quadratic expression that may have two solutions.
For the time domain −1/α < t < 1/α, it is verified that:

0 < φ′α(t) < 2    ∀α (3.6)

Since φ′α(t) has no sign change, φα(t) is a strictly monotonically increasing function. Consequently,
the inverse has only one solution:

φα⁻¹(t) = (−1 + √(1 + 2αt))/α    for α ≠ 0
φα⁻¹(t) = t    for α = 0
(3.7)
From here on, the inverse warp function is denoted by the symbol ψα(t) = φα⁻¹(t).
Note that √(1 + 2αt) becomes imaginary for t < −1/2α. Also, the time derivative or warped-time
rate reads:

ψ′α(t) = 1 / √(1 + 2αt) (3.8)

The warped-time rate reaches infinity at t = −1/2α, implying that the warped time would have to
run infinitely fast and then becomes imaginary. The applicable time domain is therefore reduced to:

−1/2α < t < 1/2α (3.9)
which is referred to as the time support property of the warp function [21]. Figure 3.1 shows the
warp function (green) and inverse warp function (red) for −1 < t < 1 and α = 0.5, the maximum
allowable warp rate for this interval. It can be observed that the derivative of the inverse warp
function reaches infinity at t = −1 = −1/2α.
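That ψα is indeed the unique inverse can be verified numerically (Python/NumPy sketch; α = 0.5 and the test grid are arbitrary choices):

```python
import numpy as np

# Check that psi_alpha of eq. (3.7) inverts the warp function phi_alpha of
# eq. (3.1) inside the time support -1/(2*alpha) < t < 1/(2*alpha).
def phi(t, a):
    return (1 + 0.5 * a * t) * t                  # eq. (3.1)

def psi(t, a):
    return (-1 + np.sqrt(1 + 2 * a * t)) / a      # eq. (3.7), alpha != 0

a = 0.5
t = np.linspace(-0.99 / (2 * a), 0.99 / (2 * a), 1001)
err = np.max(np.abs(psi(phi(t, a), a) - t))
print(err < 1e-10)                                # True: psi inverts phi
```

Note that 1 + 2α φα(t) = (1 + αt)², so the square root in ψα(φα(t)) stays real on the whole monotonic domain.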
3.1.5 Inverse time warping
A time-warped version ỹ(t) of the linear chirp signal y(t) can now be obtained using the inverse warp
function:

ỹ(t) = y(ψα(t)) = A cos(2πfc φα(ψα(t))) (3.10)
Development of the interior time-dependent term yields:

φα(ψα(t)) = −1/α + √(1 + 2αt)/α + (α/2)·[1/α² + (1 + 2αt)/α² − 2√(1 + 2αt)/α²] = t (3.11)
which shows that the linear chirp signal y(t) after time warping has become a stationary signal
with constant frequency fc:

ỹ(t) = A cos(2πfc t)    −1/2α < t < 1/2α (3.12)
Figure 3.1: The warp function φα(t) and the inverse warp function ψα(t).
It can be concluded that any linear chirp signal can be transformed into a constant-frequency signal
by applying linear time-warping, as long as the time support −1/2α < t < 1/2α is satisfied. This is
illustrated by the following example.
3.1.6 Example 6: Linear warp
An example is shown in figure 3.2 for both the time and frequency domain. The chirp signal is
composed from a cosine wave plus a sine wave with half the amplitude and twice the frequency:

f(t) = 4 + t
ϕ(t) = 2π (4t + ½t²)
y(t) = ½ cos(ϕ(t)) + ¼ sin(2ϕ(t))

The mean frequency is fc = 4 Hz; the chirp rate is α = 0.25. The time interval of interest is
t = (−1, 1), which lies safely in the supported time domain. The warped time interval is found from
(3.7):

t̃ = ψα(t) = (−4 + 2√2, −4 + 2√6) = (−1.1716, 0.8990)
The original signal y(t) is shown in green. The warped signal ỹ(t̃) = y(t) is shown in red.
Clearly, ỹ now has a constant frequency and will appear as two distinct pulses in a Fourier
representation (provided that leakage is absent by proper choice of the window length, see section
2.5). This will be illustrated in example 5.
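For the first component of the example, the de-chirping can be verified numerically (Python/NumPy sketch; the sample rate fs = 256 Hz is an arbitrary choice). No interpolation is needed here because the analytical y(t) can be evaluated at arbitrary times:

```python
import numpy as np

# Evaluating the chirp y(t) = cos(phi(t)) at the inversely warped instances
# psi_alpha(t) must give exactly the stationary tone of eq. (3.12).
fs, a, fc = 256.0, 0.25, 4.0
t = np.arange(-1, 1, 1 / fs)
y = lambda tt: np.cos(2 * np.pi * (4 * tt + 0.5 * tt**2))   # cos(phi(t))

psi = (-1 + np.sqrt(1 + 2 * a * t)) / a      # inverse warp, eq. (3.7)
y_warp = y(psi)

err = np.max(np.abs(y_warp - np.cos(2 * np.pi * fc * t)))
print(err < 1e-10)                           # True: exactly de-chirped

# The de-chirped tone is fully periodic over T = 2 s, so its DFT is a single
# sharp line at 4 Hz (bin 8).
Y = np.abs(np.fft.rfft(y_warp))
print(np.argmax(Y) * fs / len(t))            # 4.0
```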
Figure 3.2: A warped signal in time and frequency domain.
3.2 Non-linear time warping
The previous discussion merely applied to linear time warping functions with constant chirp rate α.
It was seen that the inverse function of φα(t) only exists if φα(t) is strictly monotonically increasing,
i.e. φ′α(t) > 0. However, this requirement does not limit the applicability to linear warp functions.
Consider the instantaneous frequency f(t) > 0 on the time domain −T < t < T. Let us define
fc = f(0). According to (3.3a), the time derivative of the warp function writes:

φ′(t) = f(t)/fc (3.13)

such that the non-linear warp function can be found by integration:

φ(t) = ∫ φ′(t) dt (3.14)
A “quick and dirty” inverse warp function can be found by time-integration of the inverse of the warp
rate (3.13):

ψ(t) ≈ ∫ 1/φ′(t) dt = ∫ f(0)/f(t) dt (3.15)
The basic idea is that the inverse warp rate ψ′(t) at some point t should be roughly 1/φ′(t) to stretch
out frequency modulations in f(t). An approximation of the inverse warp function is then found by
time-integration, which can be performed either numerically or symbolically. Note that it may be
necessary to bias ψ(t) by some value, to make sure that ψ(0) = 0.
Equation (3.15) applied to the linear chirp function yields:

ψ(t) ≈ ∫ 1/(1 + αt) dt = log(1 + αt)/α
For small chirp rates, (3.15) yields a good approximation of the exact inverse warp function (3.7).
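The approximation quality can be illustrated numerically (Python/NumPy sketch; the chirp rate α = 0.05 is an assumed small value):

```python
import numpy as np

# Approximate the inverse warp by numerically integrating the inverse warp
# rate f(0)/f(t) for a linear chirp, and compare with the closed-form
# log(1 + a*t)/a and the exact inverse of eq. (3.7).
a = 0.05
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]

g = 1.0 / (1.0 + a * t)                       # inverse warp rate f(0)/f(t)
psi_num = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) / 2) * dt))
psi_log = np.log(1.0 + a * t) / a             # symbolic integration
psi_exact = (-1.0 + np.sqrt(1.0 + 2.0 * a * t)) / a   # exact, eq. (3.7)

print(np.max(np.abs(psi_num - psi_log)))      # tiny: quadrature error only
print(np.max(np.abs(psi_log - psi_exact)))    # small for small chirp rates
```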
3.3 Discrete implementation & interpolation
The example above describes an analytical signal in a rather ideal situation, since the signal value at
every time point ψα(t) is available from the function. As discrete signals are sampled at a finite number
of instances t[n], n = 1, . . . , N, the instances ψα(t[n]) generally do not correspond to existing samples.
Hence, non-uniform interpolation is required to obtain the warped signal at the instances t[n]. The
quality of the interpolation process is crucial: a badly performed interpolation increases the spectral
leakage dramatically. The issue of interpolation is addressed in the following sections.
3.3.1 Interpolation approaches
Two interpolation approaches are distinguished:
1. Upsampling followed by cheap interpolation
2. Direct, expensive interpolation
Method 1 first upsamples the signal by a factor r to increase the number of time points. Next, a
numerically cheap interpolation algorithm is employed to obtain the samples for t[n]. Cheap methods
include nearest neighbour (0th order) and linear (1st order) interpolation, or polynomial methods of
low degree, e.g. cubic spline interpolation. Since resampling can be performed quickly, this method
does not necessarily take more time than the second approach. Reference [5] suggests upsampling by
a factor 2 and linear interpolation.
Method 2 skips the resampling step and performs the interpolation right away. To obtain similar
or better results than the first approach, a numerically more expensive interpolation method should
be employed. Theoretically, sinc interpolation achieves the best result: according to the Nyquist-
Shannon sampling theorem, a band-limited signal can be recovered exactly from its samples by:
y(t) = Σ_{n=1}^{N} y[n] sinc(fs(t − t[n])) (3.16)
The sinc function is defined as:

sinc x = sin(πx) / (πx)
For every point t, the complete sequence is evaluated and weighted by the sinc function. This
implementation is computationally far too expensive and is thus seldom used. Luckily, similar results
can be obtained using higher order spline interpolation [15].
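Equation (3.16) translates directly into code (Python/NumPy sketch; NumPy's `sinc` already follows the normalised definition sin(πx)/(πx)):

```python
import numpy as np

# Whittaker-Shannon sinc interpolation at one arbitrary (non-uniform) query
# point. Every sample contributes, so the cost per point is O(N).
fs = 64.0
n = np.arange(256)
t_n = n / fs
y_n = np.sin(2 * np.pi * 5.0 * t_n)          # band-limited (5 Hz << fs/2)

def sinc_interp(t, t_n, y_n, fs):
    return np.sum(y_n * np.sinc(fs * (t - t_n)))

# At an existing sample, the sinc weights reduce to a Kronecker delta:
print(sinc_interp(t_n[100], t_n, y_n, fs) - y_n[100])   # ~0

# At an interior off-grid point, the truncated series is accurate:
t_q = 1.2345
print(sinc_interp(t_q, t_n, y_n, fs), np.sin(2 * np.pi * 5.0 * t_q))
```

For query points near the signal boundaries, the truncation of the infinite sum degrades the accuracy; the error is smallest in the interior of the record.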
3.3.2 Spline interpolation
The B-spline or basis spline interpolation method generalises a large class of interpolation techniques.
The interpolation algorithm for order q writes:
y(t) = Σ_{n=1}^{N} c[n] βq(d[n]) (3.17a)

d[n] = fs(t − t[n]) (3.17b)
Figure 3.3: Spline basis functions of order 0 to 8, obtained by convolution of the 0th order function.
d[n] is the sample deviation of t to every point t[n]; for instance, a value of 0.5 corresponds to a time
instance exactly in the middle of two existing time points in t[n]. βq(d) is the basis function of order
q as a function of the deviation d. The basis functions for 0th to 8th order are shown in figure 3.3. They
are the result of repeated convolution of the 0th order basis function:
βq(d) = ∫_{−∞}^{∞} βq−1(δ) β0(d − δ) dδ (3.18a)

with

β0(d) = { 1 for |d| ≤ 0.5
        { 0 for |d| > 0.5
(3.18b)
The coefficients c[n] are found by solving the linear system of N equations:

y[n] = Σ_{m=1}^{N} c[m] βq(fs(t[n] − t[m]))    n = 1, . . . , N (3.19)
Looking at the basis functions, it can be observed that the support width of the function βq is limited
to q + 1, which is a direct result of the convolution. Therefore both (3.17a) and (3.19) can be performed
with sparse linear arithmetic [15].
It may be observed that spline interpolation of order q = 0, 1, 2, 3 is equivalent to respectively
nearest neighbour, linear, quadratic and cubic interpolation.
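The construction of figure 3.3 can be reproduced by discretising the convolution (3.18a) (Python/NumPy sketch; the resolution is an arbitrary choice):

```python
import numpy as np

# Build the B-spline basis functions beta_q by repeated numerical convolution
# of the 0th-order box function. Each convolution widens the support by one
# unit of deviation and preserves the unit area.
res = 1000                                    # samples per unit of deviation d
beta = [np.ones(res)]                         # beta_0: box of height 1, |d| <= 0.5
for q in range(1, 4):
    beta.append(np.convolve(beta[-1], beta[0]) / res)   # eq. (3.18a)

for q, b in enumerate(beta):
    width = len(b) / res                      # support width -> q + 1
    area = b.sum() / res                      # -> 1
    print(q, round(width, 2), round(area, 6))
```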
3.3.3 Interpolation performance
To test the accuracy of the interpolation methods, four different test signals yref[n], −2 < t[n] < 2,
are considered at a sample rate of fs = 1024 Hz:
Figure 3.4: Interpolation errors for (a) a cosine wave with f0 = 10 Hz, (b) a cosine wave with f0 = 100 Hz, (c) a block wave with f0 = 10 Hz and (d) white noise band-limited to fs/4 = 256 Hz. Blue, green and red correspond to respectively nearest neighbour, linear and spline interpolation. The x-axis represents the resampling ratio r for the nearest neighbour and linear interpolation methods and the interpolation order q for the spline interpolation.
1. Cosine wave with f0 = 10Hz
2. Cosine wave with f0 = 100Hz
3. Block wave with f0 = 10Hz
4. White noise band-limited to fs/4 = 256Hz
All signals are first warped using equation (3.1) to yw1[n] and then warped back to yw2[n] according
to (3.7). The chirp rate is chosen as α = 0.25 for all signals. For evaluation of yw2[n], the time domain
−1 ≤ t < 1 s is considered. The error is determined as the norm of the difference between the original
signal yref[n] and the twice-warped signal yw2[n]:
εt = ‖yw2[n] − yref[n]‖ / ‖yref[n]‖

A similar error εf is determined for the spectra; the latter are obtained from the DFT of the time signals.
The results are shown in figure 3.4. Blue, green and red correspond to respectively nearest neighbour,
linear and spline interpolation. For the nearest neighbour and linear interpolation methods, the x-axis
indicates the ratio of resampling r. For the spline interpolation, the x-axis corresponds to the order
of the spline algorithm q.
Some observations:
– The nearest neighbour interpolation performs worst, followed by the linear interpolation. The
spline interpolation with q > 3 outperforms both methods. However, the spline method takes
considerably more time to compute, since it first needs to solve the system of equations (3.19)
and then compute the desired points by (3.17a).
– The nearest neighbour interpolation errors decrease with approximately 1/r. The linear
interpolation errors decrease in some cases with approximately 1/r2 and in some cases remain
equal.
– The spline interpolation errors for q = 1 equal the linear interpolation errors. This confirms
that the methods are the same for q = 1 without resampling. The quadratic spline interpolation
with q = 2 appears to be a bad choice.
– It was observed that the interpolation methods that use resampling (nearest neighbour and
linear interpolation) suffer from a small time shift. This might explain the high error compared
to spline interpolation.
– All methods have difficulties with warping of the block wave. The block wave can be considered
a worst-case signal, as it comprises large discontinuities that are difficult to fit with any of the
interpolation methods. Also, since the frequency content of the block wave extends to fs/2,
aliasing occurs due to warping. Neither resampling nor the use of higher order spline methods
yields any improvement.
– The band-limited noise signal can be warped to reasonable accuracy. If the noise were not band-
limited, the results would be almost as bad as for the block wave.
Generally, it can be concluded that spline interpolation is a good choice for a wide range of signals. A
basic implementation of multi-order spline interpolation is included in MATLAB's Curve Fitting Toolbox™.
It is observed that the normalised phase functions reduce to the constant normalised phase shifts:
ϕ_k^n(t) = θ_k^n.
4.1.2 Complex normalisation
The normalisation steps can be performed efficiently using the complex notation of (4.1). Multiplication
of ck with a complex scalar r^p is equivalent to scaling with ‖r‖^p and rotation by p∠r. If r has
unit length, only the rotation is effective. This property can be used for normalisation of the partials:

c_k^n = (ck/‖c1‖) (c1/‖c1‖)^(−k) (4.6)
Figure 4.1: Three phase functions φ1(t), φ2(t), φ3(t) (continuous lines) together with their normalised phase functions φ1n(t), φ2n(t), φ3n(t) (dashed lines). [Plot: phase in degrees versus time in seconds.]
The first term normalises the amplitude, the second term rotates with angle −k∠c1. Depending on
the purpose, one can choose to leave out the amplitude normalisation and only normalise the phases:
c_k^\theta = c_k \left( \frac{c_1}{\|c_1\|} \right)^{-k} \qquad (4.7)
4.1.3 Timbre vector
The normalised partials from equation (4.6) can be combined into a K-dimensional complex timbre
vector:
c^n = \begin{pmatrix} c_1^n \\ c_2^n \\ \vdots \\ c_K^n \end{pmatrix} \quad \text{or} \quad c^\theta = \begin{pmatrix} c_1^\theta \\ c_2^\theta \\ \vdots \\ c_K^\theta \end{pmatrix} \qquad (4.8)
In the ideal case of stationary partials, this vector remains constant and uniquely defines the timbre
of the tonal component.
4.2 Instantaneous timbre
The definition of timbre (4.5) applies to harmonically stationary signals. In other words, (4.5) only
holds when the frequencies of the partials are kept at an exact ratio throughout the time interval. It
will not come as a surprise that this concerns a rather hypothetical situation.
Consider for example the harmonics of a vibrating piano string. If the dynamics of the string
were fully governed by linear dynamics, one would find the harmonics at exact multiples of the
fundamental frequency. Due to their mutual orthogonality, there would be no interaction between
the harmonics, hence the timbre would remain stationary throughout a free vibration.
However, few dynamic systems behave perfectly linearly. The string, for example, may exhibit a
certain amount of inharmonicity: a small discrepancy between the actual harmonic frequencies and
their ideally expected values. This can for instance be caused by bending stiffness or inelasticity of
the material; effects that cannot be fully described by linear dynamics. As a result, the timbre is no
longer exactly stationary.
Thus, for real-life signals, it can be interesting to see to what extent the timbre remains stationary
over time. This can be studied on the basis of the instantaneous timbre (IT). For an arbitrary
non-stationary signal, the instantaneous timbre at time t can be interpreted as a “cross-section” of the
analytical signal for a certain instant in time. That is: the instantaneous amplitude and phase per
partial, normalised to the fundamental partial.
4.2.1 Definition
Consider a non-stationary signal y(t) consisting of a tonal component with instantaneous fundamental frequency f_0(t). Then the instantaneous amplitude and phase of partial k are determined by:
c_k(t) = \frac{2}{T} \int_{-T/2}^{T/2} w(\tau)\, y(t + \tau)\, e^{-i 2\pi k f_0 \tau}\, d\tau \qquad k = 1, \dots, K \qquad (4.9)
Equation (4.9) is in fact a Fourier transform for a single frequency component f_k = k f_0, centred
around time instance t. The factor 2 compensates for the absence of the negative frequencies. T is the
length of the window and w(τ) is a window function that is symmetric about τ = 0 (see section 2.5). The
complex exponential is the modulator that shifts all frequency content of y(t) by −k f_0.
Using equation (4.6) and (4.8), the instantaneous timbre reads:
c^n(t) = \begin{pmatrix} c_1^n(t) \\ c_2^n(t) \\ \vdots \\ c_K^n(t) \end{pmatrix} \quad \text{or} \quad c^\theta(t) = \begin{pmatrix} c_1^\theta(t) \\ c_2^\theta(t) \\ \vdots \\ c_K^\theta(t) \end{pmatrix} \qquad (4.10)
4.2.2 Discrete implementation
The discrete-time implementation of (4.9) uses the DTFT to compute the partials. Consider the signal
y[n] of length N with corresponding time vector t[n] = n/f_s, n = 0, …, N−1. In order to obtain the
instantaneous timbre for a certain instant in time t_b, a smaller block y_b[m] of size M < N is analysed.
Block y_b[m] has length T_b = M/f_s and is obtained from y[n] by:

y_b[m] = y[m - M/2 + n_b] \qquad m = 0, \dots, M-1
n_b = t_b f_s, \quad n_b \in n \qquad (4.11)
with nb the sample index corresponding to the time instance tb of interest.
Figure 4.2: Timbre of a stationary trumpet note, normalised to the phase of the fundamental. [Two panels: magnitude and phase in degrees of partials f1–f6 against time in seconds.]
The K partials for f_0 at the particular point in time t_b are then determined by:

c_k(t_b) = \frac{2}{M} \sum_{m=0}^{M-1} w[m]\, y_b[m]\, e^{-i 2\pi k f_0 m / f_s} \qquad k = 1, \dots, K \qquad (4.12)
and the timbre vector c^n(t_b) or c^θ(t_b) is constructed using (4.10). Equation (4.12) can be evaluated
efficiently by a small piece of C code, which can be found in appendix A.1.
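As a concrete illustration of (4.12) and the phase normalisation (4.7), the following Python sketch estimates three partials of a synthetic harmonic signal and normalises their phases to the fundamental. This is an illustrative reimplementation with invented parameters, not the C code of appendix A.1; a Hanning window is used here.

```python
import numpy as np

def instantaneous_timbre(y, fs, f0, K, nb, M):
    """Single-frequency DTFTs, eq. (4.12), for K partials of a block of
    size M centred at sample index nb, with a Hanning window."""
    yb = y[nb - M // 2 : nb - M // 2 + M]      # block extraction, eq. (4.11)
    w = np.hanning(M)
    m = np.arange(M)
    c = np.array([2.0 / M * np.sum(w * yb * np.exp(-2j * np.pi * k * f0 * m / fs))
                  for k in range(1, K + 1)])
    # phase normalisation, eq. (4.7): rotate partial k by -k * angle(c1)
    c_theta = c * (c[0] / abs(c[0])) ** -np.arange(1, K + 1)
    return c, c_theta

# synthetic tonal component: 3 exactly harmonic partials with known phases
fs, f0 = 8000, 200.0
t = np.arange(4000) / fs
phases = [0.7, 1.7, 2.4]                       # phase of partial k = 1, 2, 3
y = sum(np.cos(2 * np.pi * (k + 1) * f0 * t + phases[k]) / (k + 1)
        for k in range(3))

c, c_theta = instantaneous_timbre(y, fs, f0, K=3, nb=2000, M=2000)
# Hanning coherent gain 0.5: |c_k| is roughly 0.5/k; the normalised phase
# of partial 1 is zero and that of partial k is phi_k - k*phi_1
```

Note that the normalised phases recover the constant shifts θ_k = φ_k − kφ_1 (here 0.3 rad for partials 2 and 3), independently of the fundamental's own phase.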
4.2.3 Example 8: Trumpet timbre
The timbre of a stationary trumpet note is determined from an f_s = 16000 Hz recording of a B♭4 with
fundamental frequency f_0 = 466 Hz. Six partials are estimated using (4.12). A Gaussian window is
used with length M = 4000, equal to T_b = 0.25 s.
Figure 4.2 shows the timbre cθ(t) of the six partials. The normalised phase of the first partial is zero
throughout the interval, as expected from the definition. The other partials are quite stable throughout
the stationary part of the signal. From t = 1.8 s, the amplitudes drop and the phases change. This is
understandable, since the signal is no longer stationary.
4.2.4 Bandwidth considerations
Example 8 concerns a monophonic signal, i.e. a signal with only one tonal component. For polyphonic
signals or signals with disturbances, the bandwidth of the timbre estimation by equation (4.12) can be
an important issue, as will be discussed next.
Figure 4.3: Timbre of a disturbed signal for M = f_s/4. The bandwidth is too high to neglect the disturbance. [Two panels: magnitude and phase in degrees of partials f1–f4 against time in seconds.]
Consider the following signal sampled at f_s = 1024 Hz with a tonal component at f_0 = 10 Hz, built
up from 4 partials:

y[n] = \sum_{k=1}^{4} \frac{1}{k} \cos\!\left( 2\pi k f_0 \frac{n}{f_s} \right) \qquad n = 0, \dots, N-1
Hence, the frequencies present in the signal are 10, 20, 30, 40 Hz. Let the signal be disturbed by an
enharmonic wave at f_d = 22 Hz:

y_d[n] = y[n] + \cos\!\left( 2\pi f_d \frac{n}{f_s} \right) \qquad n = 0, \dots, N-1
First, the timbre of the tonal component at f_0 is estimated using a window size of M = 256,
T_b = 0.25 s and a Hanning window. The results are shown in figure 4.3; note that the amplitudes
are attenuated by a factor 1/2 due to the coherent gain of the Hanning window (see section 2.5.4).
Clearly, the disturbance has influence on the second partial. It follows from DFT theory that the
frequency spacing between two orthogonal waves is f_s/M = 1/T_b = 4 Hz. Hence, the spectral
resolution or bandwidth is 4 Hz. In addition, the Hanning window has a −6 dB bandwidth of 2 bins,
thereby increasing the effective bandwidth to 8 Hz.
In fact, the algorithm “feels” the two waves at 20 Hz and 22 Hz as one wave at 20 Hz with some
amplitude modulation. From a continuous-time approximation, partial c_2(t) reads:

c_2(t) = \int_{T_b} \left( \tfrac{1}{2} \cos(2\pi\, 20\, t) + \cos(2\pi\, 22\, t) \right) e^{-i 2\pi\, 20 t}\, dt = a + b \cos(2\pi\, 2\, t)

The scalars a and b follow from the window characteristics. Indeed, the wave at 22 Hz is enharmonic
to the fundamental, which explains the mismatch in the normalised phase (equation 4.5). The
Figure 4.4: Timbre of a disturbed signal for M = f_s. The bandwidth is small enough to suppress the disturbance. [Two panels: magnitude and phase in degrees of partials f1–f4 against time in seconds.]
normalised phase seems to travel 720 degrees per second upwards, suggesting that the frequency
is +2Hz off. The amplitude modulation confirms this observation.
A complete suppression of the disturbance requires an effective bandwidth of 2 Hz. This is achieved
by increasing the block size toM = 1024 samples, or Tb = 1 s. The newly obtained timbre is shown in
figure 4.4. The disturbance is completely suppressed and the partials appear with the correct amplitude
and phase.
Summarising: the disturbance of figure 4.3 was removed by increasing the window size and thereby
decreasing the effective bandwidth. In this perspective, the single-frequency Fourier transform of (4.9)
can be considered as a band-pass filter around f = k f_0 with a certain effective bandwidth, realised
by the spectral resolution of the window 1/T_b and the −6 dB bandwidth. Note that an increase of
the spectral resolution always compromises the temporal localisation, by the uncertainty principle
(section 2.6).
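This bandwidth reasoning can be checked numerically. The sketch below (illustrative, mirroring the 22 Hz example above with the same made-up parameters) measures the leakage of the disturbance into the estimate of the second partial by comparing the single-frequency DTFT of the clean and the disturbed signal, for both window sizes:

```python
import numpy as np

fs, f0, fd = 1024, 10.0, 22.0
n = np.arange(4 * fs)
y = sum(np.cos(2 * np.pi * k * f0 * n / fs) / k for k in range(1, 5))
yd = y + np.cos(2 * np.pi * fd * n / fs)           # enharmonic disturbance

def partial2(x, nb, M):
    """Hanning-windowed single-frequency DTFT, eq. (4.12), of partial 2."""
    xb = x[nb - M // 2 : nb - M // 2 + M]
    m = np.arange(M)
    return 2.0 / M * np.sum(np.hanning(M) * xb
                            * np.exp(-2j * np.pi * 2 * f0 * m / fs))

nb = 2 * fs                                        # estimate at tb = 2 s
err = {M: abs(partial2(yd, nb, M) - partial2(y, nb, M)) for M in (256, 1024)}
# M = 256  (Tb = 0.25 s, 8 Hz effective bandwidth): large leakage from 22 Hz
# M = 1024 (Tb = 1 s,    2 Hz effective bandwidth): disturbance suppressed
```

Because the difference of the two estimates isolates the disturbance exactly, `err` is a direct measure of its leakage into c_2.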
4.2.5 Example 9: Helicopter timbre
A 5 second microphone recording of a distant helicopter is analysed. The signal is sampled at
fs = 4096Hz. It is known that the main rotor has fundamental frequency f0 = 29.4Hz. The
timbre is first analysed using window size M = 512, Tb = 0.125 s. Again, the Hanning window is
applied, which causes the effective bandwidth to be 16Hz.
The result is shown in figure 4.5. The third partial, estimated at 3f0 = 88.2Hz exhibits amplitude
modulation, suggesting the presence of another wave. From the figure, the amplitude modulation is
approximated at 8.5Hz. Furthermore, the phase seems to be running downwards. This suggests the
Figure 4.5: Timbre of a helicopter for M = 512. The amplitude modulation on the third partial suggests the presence of another wave with similar frequency. [Two panels: magnitude and phase in degrees of partials f1–f4 against time in seconds.]
Figure 4.6: Timbre of a helicopter for M = 1024. The disturbing source at 80 Hz is suppressed. [Two panels: magnitude and phase in degrees of partials f1–f4 against time in seconds.]
presence of a disturbing wave at approximately 80Hz. This is verified from a DFT; the 80Hz wave is
in fact the fundamental frequency of the tail rotor.
To suppress the disturbing wave from the timbre, the effective bandwidth (including the 2 bins
window bandwidth) is decreased to 8Hz by increasing the window size to M = 1024, Tb = 0.25 s.
The new timbre plot is shown in figure 4.6. The amplitude modulation is suppressed and the phase
has become stable.
4.3 Warped timbre
Let us get back to the discrete-time definition of instantaneous timbre (4.12). The equation accepts
blocks y_b[m] of a time signal y[n] and returns the instantaneous timbre centred around time instances
t_b. However, any block of correct size M may be inserted into the equation, including blocks obtained
by time-warping. Let y{t} denote a continuous-time interpolant function that was obtained by
interpolation of y[n]. Then the warped signal block can be found by:

y_b[m] = y\{\, \tilde{t}[m] + t_b \,\} \qquad (4.13)

\tilde{t}[m] represents the warped local time vector centred around 0, as defined in section 3.3; t_b is the
block centre time instance. The required time-warp, interpolation, windowing and DTFT operations
are combined in an implementation which can be found in appendix B.4. The function returns the
instantaneous timbre for a given set of time instances, frequencies and linear chirp rates.
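The block extraction of (4.13) can be sketched as follows. This is an illustrative fragment, not the appendix B.4 code: it uses linear interpolation (`np.interp`) as the continuous-time interpolant y{t}, assumes a linear warp with unit slope at τ = 0, and the quadratic test signal and all parameters are invented.

```python
import numpy as np

fs, M = 1000, 400
n = np.arange(4 * fs)
y = (n / fs) ** 2                          # known analytic signal y(t) = t^2

tb = 2.0                                   # block centre time
alpha = 0.4                                # linear chirp rate
tau = (np.arange(M) - M // 2) / fs         # local time, centred around 0
# warped local time vector: inverse of the warp phi_a(t) = t (1 + a t / 2)
t_warp = (np.sqrt(1 + 2 * alpha * tau) - 1) / alpha
# eq. (4.13): sample the interpolant at the warped instants around tb
yb = np.interp(t_warp + tb, n / fs, y)
# the warped block should match the analytic value (t~[m] + tb)^2
err = np.max(np.abs(yb - (t_warp + tb) ** 2))
```

The resulting block `yb` can be fed directly into the timbre estimate (4.12), after which windowing and the single-frequency DTFTs proceed as before.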
4.4 Summary
The timbre of a tonal component is in this work defined as the amplitude and phase of the harmonic
partials, relative to the fundamental partial. It is observed that the normalised phase function of
a partial reduces to a constant phase shift, as long as the partials remain exactly harmonic. The
timbre, formed by the normalised DTFT coefficients of the partials, characterises a tonal component
independently from fundamental frequency and total amplitude.
The assumption of exact harmonicity of the partials is grounded in theory from linear dynamics.
As most real-life dynamic systems are not perfectly linear (or even highly non-linear), it can be
interesting to observe the development of the timbre over time. The instantaneous timbre makes
use of the DTFT, which brings up time-frequency considerations as discussed in chapter 2. For a
good estimate of the timbre, one must make sure that the effective bandwidth (the result of spectral
resolution and window bandwidth) is small enough to suppress contributions of enharmonic waves.
The timbre may also be determined from time-warped blocks. An implementation of the required
time-warp, interpolation, windowing and DTFT operations is found in appendix B.4.
Timbre provides a useful and intuitive approach to representing a periodic signal that bears a much
closer resemblance to the human perception of sound.
Part III
Short-time Spectral Analysis
Chapter 5
Short-Time Fourier Transform
The Fourier analysis techniques discussed in the previous chapters were predominantly applied to
signals in their entirety. This method is useful if the signal is short or if the signal is reasonably
stationary throughout the time domain. In practice, most signals obtained from experiments are
non-stationary and can be minutes long. It would be impractical to analyse these signals as a whole.
Besides, one is often interested in how the signal changes over time.
Short-time analysis divides a longer signal into many shorter blocks that usually have some overlap
in time. Every block is centred at a certain point in time and can be analysed using standard Fourier
techniques as discussed in chapter 2.
The short-time Fourier transform (STFT) (or formally: short-time discrete Fourier transform) is
perhaps the most popular and generally applicable method for short-time spectral analysis. This
chapter will discuss the basic aspects and time-frequency considerations.
5.1 Short-time blocks
Consider an arbitrary N-point signal y[n] sampled at f_s. The time domain of the signal is 0 ≤ t < T
with total duration T = N/f_s. It was already seen that an N-point DFT offers excellent frequency
information, but does not give any temporal information.
Therefore, signal y[n] is subdivided into B blocks y_b[m]. Similar to n, m is the sample index of the
signal blocks: m = 0, …, M−1. The blocks are counted b = 1, …, B and have a smaller size
M < N, which corresponds to length T_b = M/f_s.

y_b[m] = y[m - M/2 + n_b] \qquad m = 0, \dots, M-1
n_b = t_b f_s, \quad n_b \in n \qquad (5.1)
The blocks are centred in time at the instances t_b. The shift between two adjacent blocks, i.e. t_{b+1} − t_b,
is called the shift length T_l. Similarly, the shift size is L = T_l f_s. To make sure that no samples of y[n]
are skipped, L < M and consequently T_l < T_b.

The first block b_1 lies at t_1 = \tfrac{1}{2} T_b. The time instances for the other blocks b are given by:

t_b = \tfrac{1}{2} T_b + (b - 1) T_l \qquad b = 1, \dots, B \qquad (5.2a)
Figure 5.1: The short-time Fourier transform divides a signal sequence into many shorter blocks of equal length. [Schematic: signal y[n] of length T, blocks y_b[n] of length T_b centred at t_b, shifted by T_l.]
The block centre index is given by n_b:

n_b = \tfrac{1}{2} M + (b - 1) L \qquad b = 1, \dots, B \qquad (5.2b)
The short-time blocks are shown schematically in figure 5.1. The symbols involved in equations (5.1),
(5.2a) and (5.2b) are listed in table 5.1.
symbol   domain        name
y[n]     R             signal vector
t[n]     [0, T⟩        time vector
f_s      R             sample rate
n        0, …, N−1     signal sample index
N        ℕ             signal size
T        R             signal length
y_b[m]   R             block vector
m        0, …, M−1     block sample index
M        ℕ, < N        block size
T_b      R, < T        block length
b        1, …, B       block number
B        ℕ             total number of blocks
t_b      [0, T⟩        block centre time
n_b      0, …, N−1     block centre sample index
L        ℕ, < M        shift size
T_l      R, < T_b      shift length
5.1.1 Shift size & overlap
Figure 5.1 shows a certain amount of overlap between two blocks. The overlap is determined by the
block size and the shift size: M/L. Theoretically, the shift size can be anything from 1 sample to the
block size. A too low number for L results in a large number of blocks and consequently many DFT
computations. For instance, if L is much smaller than M , the DFTs of successive blocks will not be
much different. If however L is too large, some events in the signal may not be detected.
When a non-rectangular window is applied, an overlap of at least 2× is necessary to make sure every
sample of y[n] is contained in the spectrum. In particular, for the Hanning window an overlap of
exactly 2× ensures that every sample in y[n] is covered equally over the successive blocks.
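The bookkeeping of equations (5.1), (5.2a) and (5.2b) can be sketched directly. This is an illustrative Python fragment; the helper name and the parameters are made up:

```python
import numpy as np

def short_time_blocks(y, fs, M, L):
    """Divide y[n] into B blocks of size M, shifted by L samples
    (eqs. 5.1, 5.2a, 5.2b)."""
    N = len(y)
    B = (N - M) // L + 1                    # number of full blocks that fit
    nb = M // 2 + np.arange(B) * L          # block centre indices, eq. (5.2b)
    tb = nb / fs                            # block centre times,   eq. (5.2a)
    blocks = np.stack([y[n - M // 2 : n - M // 2 + M] for n in nb])
    return blocks, tb

fs, M, L = 1024, 256, 128                   # 2x overlap: M / L = 2
y = np.arange(4 * fs, dtype=float)
blocks, tb = short_time_blocks(y, fs, M, L)
# first block centred at tb[0] = (M/2)/fs = Tb/2, adjacent centres Tl = L/fs apart
```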
5.2 Short-time DFT
The distribution of y[n] over short-time blocks yields B blocks y_b[m] of size M. The DFTs of the
blocks can be obtained by the fast Fourier transform (section 2.4.4), which performs the following
transformation:

y_b[k] = \frac{1}{M} \sum_{m=0}^{M-1} y_b^{(w)}[m]\, e^{-i 2\pi k m / M} \qquad k = 0, \dots, M-1, \quad b = 1, \dots, B \qquad (5.3)

The blocks y_b^{(w)}[m] are windowed by a window function w[m] according to (2.16).
After FFT computation, an array y_b[k] is obtained that consists of B × M complex elements. B is the
number of DFT blocks for the time instances t_b. M is the number of frequencies f_k that follow from
(2.10b). The frequency resolution is determined by the block size and the sample rate:

\Delta f = \frac{f_s}{M} \qquad (5.4)
A popular representation of y_b[k] is the waterfall plot or spectrogram, which shows the amplitudes of
the single-sided spectrum as colours on a 2D time-frequency grid. Examples of the spectrogram will be
shown in the following sections.
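Putting (5.1) and (5.3) together gives a minimal STFT. The sketch below is illustrative (NumPy's FFT, a Hanning window, invented test parameters); it checks that a stationary tone appears in the same frequency bin of every block:

```python
import numpy as np

def stft(y, fs, M, L):
    """Windowed DFTs of overlapping blocks, eq. (5.3); rows are blocks b."""
    w = np.hanning(M)
    B = (len(y) - M) // L + 1
    Y = np.stack([np.fft.fft(w * y[b * L : b * L + M]) / M for b in range(B)])
    fk = np.arange(M) * fs / M              # DFT frequencies of each block
    return Y, fk

fs, M, L = 1024, 256, 128
t = np.arange(4 * fs) / fs
y = np.sin(2 * np.pi * 64.0 * t)            # stationary 64 Hz test tone
Y, fk = stft(y, fs, M, L)
# every block should show its single-sided spectral peak in the 64 Hz bin
peaks = fk[np.argmax(np.abs(Y[:, : M // 2]), axis=1)]
```

Plotting `np.abs(Y)` against `fk` and the block centre times would give exactly the spectrogram representation described above.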
5.3 Time-frequency considerations
Although (5.1) — (5.3) involve many different symbols, only three independent choices are left for the
spectral representation:
1. Block size M. The block length follows from T_b = M/f_s.
2. Shift size L. The shift length follows from T_l = L/f_s.
3. Window type.
The meaning of these properties for the spectrogram is discussed on the basis of an STFT of a fly-by
helicopter, sampled at f_s = 4096 Hz.
5.3.1 Spectral/temporal resolution
Figure 5.2(a) shows the STFT for block size M = f_s/1 = 4096 and shift size L = M/2 = 2048. The
overlap ratio is M/L = 2×. Figure 5.2(b) has a 4× smaller block size: M = f_s/4 = 1024. The shift
size is reduced to L = M/2 = 512, keeping the same overlap ratio of 2×. Both STFTs use a Hanning
window.
Clearly, the first STFT yields more spectral detail, whereas the second STFT has better temporal
localisation.
[Figure 5.2, six spectrogram panels (frequency [Hz] versus time [s], magnitude in dB):
(a) STFT for Δf = 1 Hz and T_l = 0.5 s
(b) STFT for Δf = 4 Hz and T_l = 0.125 s
(c) STFT for Δf = 1 Hz and T_l = 0.125 s
(d) STFT for (b) with 3× zero-padding
(e) STFT for (c) using a rectangular window
(f) STFT for (c) using a Gaussian window, σ = 0.1 s]
Figure 5.2: Six different STFTs for the same signal. Figures (a) and (b) show the effect of a different spectral/temporal resolution. Figure (c) increases the overlap ratio. Figure (d) applies 3× zero-padding to (b). Figures (e) and (f) use different window functions for the settings of (c).
5.3.2 Overlap
An overlap ratio of 2× ensures that every point of the original signal is contained in the STFT.
Following this reasoning, a higher ratio will undoubtedly bring up some redundancy in the STFT.
Nevertheless, a higher ratio can yield a better time-frequency localisation. Figure 5.2(c) shows the
STFT for M = f_s/1 = 4096 and L = M/8 = 512; the overlap ratio is 8×. Compared to figure
5.2(a), the STFT is better localised in time.
5.3.3 Zero-padding
Zero-padding is an elegant trick to reach a somewhat higher frequency resolution while keeping
the same block size. By padding a block of M samples with (for instance) 3M zeros before it is
transformed by the FFT, the spectral resolution becomes 4 times higher. Although no new information
is added to the block, the increased resolution can make it easier to recognise closely spaced frequency
peaks. The effect of 3× zero-padding is shown in figure 5.2(d).
5.3.4 Windowing
The quality of an STFT heavily depends on the chosen window function (section 2.5). Two STFTs are
shown as a comparison with figure 5.2(c). Figure 5.2(e) uses a rectangular window, resulting in sharp
frequency lines but a high level of leakage. Figure 5.2(f) uses a Gaussian window with σ = 0.1 s,
resulting in more “blurry” frequency lines.
Time and frequency localisation was formalised in section 2.6.1 by the temporal and spectral second
moments: σt and σf . These values objectively quantify the performance of the window:
figure    window        σ_t       σ_f
5.2(c)    Hanning       0.14 s    0.58 Hz
5.2(e)    Rectangular   0.29 s    8.38 Hz
5.2(f)    Gaussian      0.07 s    1.13 Hz
Looking at figures 5.2(c), 5.2(e) and 5.2(f), it is verified that the Gaussian window has the best temporal
localisation of the three windows. Also, the Hanning window yields the best spectral localisation.
5.4 Summary
The short-time Fourier transform is a popular method for short-time spectral analysis. The STFT
divides a signal into shorter segments and applies the DFT to the individual blocks. The obtained
spectrum can be visualised by a waterfall diagram, with time on one axis and frequency on the other.
The temporal and spectral localisation can be controlled by proper choice of the block size, shift size
and window type. As a last resort, zero-padding can be applied to increase the number of frequency
points. Yet, the STFT is subject to the time-frequency uncertainty principle, limiting the simultaneous
temporal and spectral localisation.
Chapter 6
Fan Chirp Transform
Many real-life signals are non-stationary by nature. Examples are countless and include recordings
or measurements of speech, music, passing vehicles, engines, etc. The short-time Fourier transform
(STFT) as discussed in chapter 5 can be applied to any signal. However, if the signal comprises rapidly
changing frequency content, the results can be cumbersome: the DFT tries to “project” the changing
frequencies on a rectangular-tiled time-frequency grid, resulting in undesired frequency spreading.
For highly instationary signals, the question arises whether the rectangular grid provides the best basis
for analysis. The answer is “not really” and an ingenious alternative is provided by the Fan Chirp
Transform (FChT). The fan chirp transform was proposed in 2007 by Luis Weruaga and Marián
Képesi [21]. It effectively provides basis vectors in a fan geometry by pre-processing the signal with
the time-warping technique as discussed in chapter 3. The Short-Time Fan Chirp Transform (STFChT)
implements this time-warping and operates per block, similar to the STFT.
Figure 6.1 illustrates the difference between the DFT and the FChT bases schematically. Since the
harmonic structure of a tonal component keeps the partials at constant ratios to the fundamental
frequency, the skewness of the grid lines must increase proportionally with frequency.
[Figure 6.1, two panels (time versus frequency): (a) DFT basis grid, (b) FChT basis grid.]
Figure 6.1: Schematic representation of a non-stationary tonal component against a rectangular-tiled and a fan-tiled basis grid.
6.1 Formulation of the Fan Chirp Transform
The Fan Chirp Transform¹ for the continuous-time domain as formulated in [21] reads:

y_\alpha(f) = \int_{-\infty}^{\infty} y(t)\, \sqrt{|\phi'_\alpha(t)|}\, e^{-i 2\pi f \phi_\alpha(t)}\, dt \qquad -\infty < f < \infty \qquad (6.1)

The linear time warp function \phi_\alpha(t) is defined by equation (3.1). The term \sqrt{|\phi'_\alpha(t)|} is a normalisation
function that preserves the unitarity of the transformation². Equation (6.1) can be interpreted as a
projection of y(t) onto a set of chirping basis functions e^{-i 2\pi f \phi_\alpha(t)}.
By applying the change of variable \tau = \phi_\alpha(t) and inversely t = \phi_\alpha^{-1}(\tau) = \psi_\alpha(\tau), equation (6.1) is
placed on the warped-time axis and becomes:

y_\alpha(f) = \int_{-\infty}^{\infty} y(\psi_\alpha(\tau))\, \sqrt{\left| \phi'_\alpha(\psi_\alpha(\tau)) \right|}\, e^{-i 2\pi f \tau}\, d\tau
           = \int_{-\infty}^{\infty} \tilde{y}(\tau)\, \rho_\alpha(\tau)\, e^{-i 2\pi f \tau}\, d\tau \qquad -\infty < f < \infty \qquad (6.2)
The latter equation uses two substitutions:
1. The warped signal \tilde{y}(\tau) obtained by the procedure of linear time-warping, as discussed in
section 3.1.
2. A normalisation function \rho_\alpha(\tau) which can be shown to be [21, 9]:

\rho_\alpha(\tau) = \frac{1}{\sqrt[4]{|1 + 2\alpha\tau|}} \qquad (6.3)
Equation (6.2) is easily recognised as a Fourier transform of the product of y(τ) and ρα(τ). It can
therefore be computed efficiently by the fast Fourier transform.
6.2 Short-time Fan Chirp Transform
The short-time fan chirp transform (STFChT) combines the STFT and FChT and computes the
following transformation:
y_b[k] = \frac{1}{M} \sum_{m=0}^{M-1} y_b[m]\, \rho_{\alpha b}[m]\, w[m]\, e^{-i 2\pi k m / M} \qquad k = 0, \dots, M-1, \quad b = 1, \dots, B \qquad (6.4)
Vector y_b[m] denotes the warped signal block b centred at time instance t_b and obtained by linear
time-warping with chirp rate α_b. Note that the warped block is obtained by non-uniform interpolation
(see section 3.3) and as a consequence, the time instances do not fully correspond with those of the
STFT blocks. This was illustrated by example 6.
The vector ραb[m] is the normalisation term of (6.3) for block b. w[m] represents a symmetric window
function. Note that windowing is applied on the warped-time axis, while it can also be applied on the
linear-time axis, i.e. prior to the time-warping. [5] suggests the first method, motivated by the fact that
the main function of the window is improving the spectral representation, rather than distributing the
signal evenly over time. For both cases however, the peak of the window will always correspond to
the centre time instance tb.
¹ The term “fan” points out that all frequencies focus in the same focal point, see section 3.1.2.
² It can be shown that due to this term, the transformation is still unitary and therefore Parseval's theorem also applies to the FChT [21].
6.2.1 Block chirp rate
In contrast to the STFT, the frequencies of the STFChT are instationary, as was illustrated by the
skewed grid of figure 6.1(b):
f_{kb}(t) = f_k \left( 1 + \alpha_b (t - t_b) \right) \qquad -\tfrac{1}{2} T_b \leq (t - t_b) < \tfrac{1}{2} T_b \qquad (6.5)
The chirp rates can be set individually for each block, within the time support limit of (3.9):

-\frac{1}{2\alpha_b} < (t - t_b) < \frac{1}{2\alpha_b}

The block chirp rate is therefore limited by the block length T_b:

\alpha_b < \frac{1}{T_b} \qquad (6.6)
Considering a tonal component with instantaneous fundamental frequency f_0(t), the ideal linear
chirp rate for block b is simply estimated by:

\alpha_b = \frac{f'_0(t_b)}{f_0(t_b)} \qquad b = 1, \dots, B \qquad (6.7)
As the fundamental frequency is often given or approximated as a discrete-time vector f_0[n], the chirp
rate can be found by finite differences, for instance:

\alpha_b = \frac{f_0[n_b + 1] - f_0[n_b - 1]}{2\, \Delta t\, f_0[n_b]} \qquad b = 1, \dots, B \qquad (6.8)
If the fundamental frequency is not (precisely) known, one can vary the chirp rate and choose the rate
that yields the sharpest spectrum / highest peaks. This approach is used in [5].
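The finite-difference estimate (6.8) can be sketched in a few lines. This is an illustrative fragment using an invented quadratic frequency track f_0(t) = 10 + 10t², for which the exact rate α(t) = 20t/(10 + 10t²) is known:

```python
import numpy as np

fs = 1024
t = np.arange(5 * fs) / fs
f0 = 10 + 10 * t ** 2                       # known fundamental frequency track

def block_chirp_rates(f0, fs, nb):
    """Central finite differences, eq. (6.8), at block centre indices nb."""
    dt = 1.0 / fs
    return (f0[nb + 1] - f0[nb - 1]) / (2 * dt * f0[nb])

nb = np.arange(1, 10) * 512                 # block centres every 0.5 s
alpha = block_chirp_rates(f0, fs, nb)
alpha_exact = 20 * t[nb] / (10 + 10 * t[nb] ** 2)
err = np.max(np.abs(alpha - alpha_exact))
```

For a quadratic frequency track, the central difference of f_0 is exact, so the estimate agrees with the analytic chirp rate up to floating-point precision.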
6.2.2 Example 10: Short-time fan chirp transform of a chirp wave
Consider a T = 5 s chirp signal y[n] sampled at f_s = 1024 Hz. The instantaneous fundamental
frequency is given by:

f_0(t) = 10 + 10 t^2 \qquad 0 \leq t < T

The signal consists of two partials with frequencies f_0(t) and 2 f_0(t). Using equation (6.7), the chirp
rate for block b at time t_b reads:

\alpha_b = \frac{20 t_b}{10 + 10 t_b^2} \qquad b = 1, \dots, B
The maximum of αb is 1 at tb = 1 s. Therefore, the block length Tb is chosen to be 1 s to satisfy the
time-support of (6.6).
An STFT and STFChT are computed for y[n] and shown in figure 6.2. Both transformations use block
length Tb = 1 s, shift length Tl = 1/32 s and a Hanning window. The time-warping for the FChT is
performed by 8-point spline interpolation.
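The effect of the pre-warp on a single block can be reproduced in a few lines. This is an illustrative sketch with invented parameters: it assumes the linear warp φ_α(t) = t(1 + ½αt) with inverse ψ_α(τ) = (√(1 + 2ατ) − 1)/α, consistent with the normalisation (6.3), and uses linear interpolation instead of the 8-point spline.

```python
import numpy as np

fs, M, f0, alpha = 1024, 1024, 100.0, 0.5
t_full = np.arange(-M, M) / fs                      # extended support for interpolation
y_full = np.cos(2 * np.pi * f0 * (t_full + 0.5 * alpha * t_full ** 2))

tau = (np.arange(M) - M // 2) / fs                  # local (warped) time axis
psi = (np.sqrt(1 + 2 * alpha * tau) - 1) / alpha    # inverse warp psi_a(tau)
y_block = y_full[M - M // 2 : M - M // 2 + M]       # plain block around t = 0
y_warp = np.interp(psi, t_full, y_full)             # warped block, eq. (6.2)
rho = np.abs(1 + 2 * alpha * tau) ** -0.25          # normalisation, eq. (6.3)

w = np.hanning(M)
Y_fft = np.abs(np.fft.rfft(w * y_block)) / M        # plain DFT of the chirp
Y_fcht = np.abs(np.fft.rfft(w * rho * y_warp)) / M  # fan chirp transform
# the chirp (75..125 Hz over the block) smears in Y_fft but collapses
# onto a single sharp line near f0 = 100 Hz in Y_fcht
```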
[Figure 6.2, two spectrogram panels (frequency [Hz] versus time [s], magnitude in dB): (a) STFT, (b) STFChT.]
Figure 6.2: An instationary wave represented by an STFT and an STFChT.
As expected, the chirp waves appear as widely spread frequency bands in the STFT of figure 6.2(a).
This can be regarded as a very bad analysis, since the chirp signal covers a considerable part of the
spectrum.
The STFChT of figure 6.2(b) shows much more concentrated lines. However, the warping operation
introduces some leakage. Most leakage is found around t = 1 and t = 4.5. The leakage around t = 1
can be explained by the fact that the chirp reaches the time-support limit as defined by (6.6). The
leakage around t = 4.5 is due to aliasing: the frequencies at t = 4.5 are warped into the vicinity of the
Nyquist sampling limit \tfrac{1}{2} f_s = 512 Hz. It was verified that both types of leakage are absent for a signal
with fundamental frequency f_0(t) = 10 + 5 t^2, with the maximum chirp rate 0.707 and frequencies
extending to 270 Hz.
6.2.3 Example 11: Short-time fan chirp transform of an engine run-up
The possibilities of the STFChT are illustrated on the basis of a typical signal from dynamic
experiments: a microphone measurement of a car engine during a run-up. The signal and
transformation are characterised by:
– Sample rate: fs = 8192Hz.
– Signal length: T = 30 s.
– Block length: Tb = 1 s.
– Shift length: Tl = 1/8 s.
– Window: Hanning.
The chirp rate was estimated from the tacho vector (the engine speed in RPM) by finite differences
using equation (6.8). Since the engine speed increases slowly, the chirp rate is rather low: α stays
below 0.05. Still, the STFChT offers better spectral information than the STFT, as can be observed from
figure 6.3.
[Figure 6.3, two spectrogram panels of the engine run-up (frequency 0–1500 Hz versus time 10–20 s, magnitude in dB): (a) STFT, (b) STFChT.]
Figure 6.3: An STFT and STFChT of an engine run-up.
[Figure 6.4, two spectrogram panels of the engine run-up (frequency 600–750 Hz versus time 20–24 s): (a) STFT, (b) STFChT.]
Figure 6.4: Detail of the STFT and STFChT.
Figure 6.4 shows a detail of the STFT and STFChT. The linear warp operation successfully transforms
the instationary signal blocks into stationary blocks.
6.3 Summary
A well-known shortcoming of the DFT is its inability to detect and localise instationary signals. As
the basis vectors of the DFT are constant (rectangular), an instationary wave always projects on a
group of frequencies. The fan chirp transform (FChT) effectively provides basis functions in a fan-
geometry, matching the harmonic structure of an instationary tonal component. This is realised by
applying time-warping to the original signal, prior to being processed by the DFT.
The short-time fan chirp transform (STFChT) is in essence the STFT of time-warped signal blocks.
The improvement depends on the quality of the time-warp process. If the fundamental frequency of a
dominant tonal component is known, the required linear block chirp rates can be estimated by finite
differences. If however no fundamental frequency information is available, one can vary the chirp
rate per block and choose the chirp rate that yields the sharpest spectrum.
Part IV
Pitch Tracking & Order Extraction
Chapter 7
Pitch Tracking Techniques
Pitch tracking can be an important task in the analysis of dynamic measurements. Let us for example
consider a measurement of the vibration of the exhaust pipe of a combustion engine during a run-up
from 1000 RPM to 3000 RPM. Such measurements are performed to characterise the response of the
mechanical parts to their rotational inputs. After the measurement, the acquired data is stored in
time-domain signals. Short-time spectral analysis can then be performed right away, for example with
the techniques described in part III. However, one often wants to replace the time axis by a linear
scale of RPM and the frequencies by equivalent engine orders. The first implementation is called a
Campbell diagram, the latter an order plot.
There are several ways to achieve this [2]. Regardless of the method, one will need exact information
of the speed (RPM) as a function of time. In most cases this information is obtained during the
measurement by so-called tacho pulses: a pulse is released for every full or partial revolution of
the engine shaft. From these pulses, a vector can be constructed that accurately describes the
instantaneous speed, which is related to the fundamental frequency of the spectrum by some ratio
depending on the engine configuration (number of cylinders, 2- or 4-stroke).
Additionally, one may want to extract orders from the time-domain signal in order to analyse them
separately. As long as a tacho vector is supplied with the measurements, the diagrams and orders
are obtained rather easily. However, the availability of such data is not obvious, for example in the
following cases:
– Acoustic (microphone) measurements of a car while driving
– Asynchronous components in measurements
– Acoustic measurements on drive-by or fly-by vehicles
– Any other dynamic system for which no a-priori fundamental frequency data is available
For these situations, a pitch tracking algorithm can be employed to determine the instantaneous
frequency of instationary periodic components in a signal.
Note:
In most research, the term pitch is used instead of fundamental frequency. Although strongly related,
the fundamental frequency is an objective property, while the pitch is subjective to a listener and
should actually be reserved for a perceptual context, as stated in a.o. [4, 11]. Nevertheless, both terms
are used interchangeably in this chapter.
7.1 Pitch detection
Pitch detection is somewhat different from pitch tracking. Pitch detection (or fundamental frequency
estimation) is the procedure by which the fundamental frequency of a stationary periodic component
is sought within a signal. This is usually done without any knowledge of the expected location of
the fundamental. Pitch detection algorithms (PDAs) exist for both the time and the frequency domain.
Most applications are the detection of pitch and automatic transcription of musical or speech signals.
Often, PDAs are limited to monophonic signals, i.e. signals with only one tonal component.
Popular time-domain techniques are often limited to detection of monophonic signals and include: