Introduction to Ambisonics

A tutorial for beginners in 3D audio

Francesca Ortolani [email protected]

Ironbridge Electronics

(Excerpt from the technical papers written during the development of Ambisonic Auralizer)

1.1 Introduction to Surround and 3D audio techniques

Throughout the development of audio techniques and technologies, engineers have tried since the early years of the twentieth century to reproduce recorded sources or live takes in a realistic manner, with the aim of giving spaciousness to sound creations.

Research is divided into several branches. Even today, most sound reproduction systems sold, whether consumer or professional, are based on two audio channels. The main reason is the high cost of amplifiers, signal processors and speakers, which often limited musical productions, television, radio, etc. to two channels.

In cinemas and theaters the widespread use of multi-channel systems started earlier than at home, since it is easier to sell sound systems of medium-to-low quality at relatively low prices for home-theatre applications.

However, only a few commercial post-production studios are suitable for multi-channel mixing. The vast majority of post-production control rooms are equipped with 2 (at most 3) speakers for stereo playback and a subwoofer.

Multi-channel audio has spread especially in film, theater and video games/virtual reality, whereas about 98% of music production is still stereophonic.

This is not only a matter of cost, but also of the spread of audio formats, among which the CD (2-channel, 44100 Hz, 16-bit), and before it tape and vinyl, prevailed.

Engineers also tried, with very little success, to carry 4-channel information on stereo media. Among these attempts, a remarkable solution was the use of subcarriers on vinyl records. Despite the introduction of new formats such as DVD-Audio, Super Audio CD (SACD) and multichannel wave files (Wave-Ex), music is still stereo almost in its entirety.

Another issue that discouraged sound engineers from working on multichannel mixes has been the need to keep the finalized audio compatible when the number of channels is scaled down. For example, it is good practice to check how a stereo mix sounds when its channels are summed to mono. Compatibility with monaural listening is a crucial problem and should not be underestimated: when the left and right channels are summed, phase cancellations may occur that undo countless hours of fine-tuning done during mixing.


Initially, for example, sound engineers were asked to preserve sound quality when down-mixing from CD (stereo) to TV or radio (in the past these devices were mono only). However, the problem still exists today for live music where, because of the size of venues and stadiums, most listeners do not benefit from stereophonic listening.

Obviously, passing from surround to stereophonic or monophonic systems is even more critical.

Over the past few decades, starting in the second half of the twentieth century, sound artists have tried to give more and more spatial dimension to sound and to their own artistic creations. This has led to the development of techniques aimed at rendering 3D sound, intended as alternatives to classic surround - in its several standards imposed upon the market by Dolby - and to stereo techniques.

These include binaural audio, Wave Field Synthesis, OPSI and Ambisonics.

In particular, Ambisonics can coexist with stereo and surround sound systems such as 5.1, 7.1, etc. This 3D audio technique was introduced in the 1970s by the team led by Gerzon, Fellgett and Barton, supported by the National Research and Development Council and the British Technology Group. It is compatible with a wide variety of speaker array configurations (either regular-symmetrical or irregular-asymmetrical, with various shapes). An in-depth explanation of the physical principles on which Ambisonics is founded is given later on; for the moment it is useful to know that this technique is in part an extension of the basic principles of the Mid-Side miking technique [1] patented by Blumlein. This technique uses sum and difference signals between a microphone (from the family of cardioids) with its axis pointing at 0° (MID) and a figure-of-8 microphone (SIDE) with its axis rotated by 90° with respect to the mid. Figure 1.2 shows an example of Mid-Side configuration.

Figure 1.1 In order to have a correct perception of the stereophonic sound, the speaker pair and the listener must be located at the vertices of an equilateral triangle.


Ambisonics should not be confused with "traditional" surround. First of all, Ambisonics allows the inclusion of height information (classic surround techniques are instead 2D). The acoustic principles on which this technique is based will be explained in detail in the next section. Furthermore, considering a classic quadraphonic system, while the phase difference between the signals received at the front speakers is processed quite effectively by the auditory system (at least at low frequencies), this is not the case for the rear speaker pair, so classic surround systems, quadraphonic or larger, do not allow good source localization. This is due to the fact that sources in classic surround are recorded on "discrete" channels, that is, channels independent of each other, and the level differences between channel pairs are used [2], [3].

Hence the layout of the loudspeakers relative to the listener becomes crucial: you can experience this even in a simple stereo system where, if the listener and the speakers are not placed exactly at the vertices of an equilateral triangle, exact source localization is lost. Listening tests have shown that the quality of the ghost images¹ between speaker pairs is poor if the speakers are spaced by an angle greater than 60 degrees (i.e. beyond the equilateral triangle mentioned above).

In quadraphonic sound, for example, speakers are spaced by 90 degrees, causing a feeling of a "hole in the middle". A homogeneous sound reproduction system is defined as a system in which no direction is treated with any particular preference. Typical cinema surround systems are not homogeneous: the sound coming from the front stage (screen) is usually controlled more accurately than the rear channels, since a solid match between sound and image is sought so as not to distract the audience. We can say, however, that surround systems are coherent, within certain limits, in the sense that the sound image remains stable, that is, not subject to significant discontinuities, if the listener changes position [4]. The consistency of the front image is guaranteed by keeping the front sounds uncorrelated from the rest of the system. This can be achieved, for example, by delaying and spreading the signals sent to the surround channels. What we ideally want is a reproduced sound field (recorded or synthesized) that is homogeneous and coherent at the same time.

¹ A ghost image is a sound source apparently coming from the middle of the stereo soundscape between two speakers.

Figure 1.2 Example of Mid-Side configuration - polar diagram (MID: cardioid microphone, SIDE: figure-of-8 microphone). LEFT = (MID+SIDE)/2, RIGHT = (MID-SIDE)/2 [14]
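As a minimal illustration of the sum and difference relations in the caption above, the following sketch (plain NumPy; signal names and values are hypothetical) derives the left and right channels from a Mid-Side pair:

```python
import numpy as np

def ms_to_lr(mid, side):
    """Decode a Mid-Side pair into Left/Right, as in the caption of Figure 1.2."""
    left = 0.5 * (mid + side)
    right = 0.5 * (mid - side)
    return left, right

# Hypothetical test signals: one second at 48 kHz
fs = 48000
t = np.arange(fs) / fs
mid = np.sin(2 * np.pi * 440 * t)          # cardioid pointing at the source (MID)
side = 0.3 * np.sin(2 * np.pi * 440 * t)   # figure-of-8 rotated by 90 degrees (SIDE)
left, right = ms_to_lr(mid, side)
```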

Figure 1.3 Quadraphonic system


In Ambisonics, on the other hand, the signals sent to the speakers contain information from each microphone capsule used in the recording, combined in different proportions by a decoding matrix. The spatialization effect here is much more robust than in traditional surround techniques, in the sense that the sweet spot, i.e. the optimum listening position, is wider. Ambisonics is not limited to a precise number of speakers: the higher the number, the better the directional resolution that can be obtained. The reason will be explained next, by introducing the concept of order in Ambisonics.

1.2 The Physics in Ambisonics

A comparison with other techniques

In sound field description, source characterization is one of the most important tasks of auralization. Auralization is the creation of audible sound files from simulated, measured or synthesized numerical data [5].

For example, it is possible to represent multipole or extended sources by summing a certain number of monopoles², i.e. point sources whose dimensions are much smaller than the wavelength of the incident sound wave, or by integrating over a distribution of monopoles or infinitesimally small surface elements. Each source contributes to the acoustic field in terms of sound pressure.

According to the source distribution, a specific spatial radiation pattern is created, depending on the position and distance of the sources. In other words, this is expressed by Huygens' principle, which states that a wavefront can be considered as a distribution of secondary sources. For example, the 3D audio technique Wave Field Synthesis is based on this principle and works as the acoustic equivalent of holography.

In practice, the sound field can be considered as emitted either by the original source or by secondary sources belonging to the wavefront. In mathematical terms, this is equivalent to saying that we can obtain the sound pressure over the area $A$, knowing the sound pressure $p_0$ and its gradient $\nabla p_0$ on the boundary of $A$, by calculating the Kirchhoff-Helmholtz integral:

$$p(\vec r) = \frac{1}{4\pi} \oiint_{\partial A} \left[ \nabla p_0 \cdot \hat n\, \frac{e^{-jkR}}{R} \;-\; p_0\, \hat n \cdot \frac{\vec R}{R} \left( jk + \frac{1}{R} \right) \frac{e^{-jkR}}{R} \right] dS, \qquad \forall\, \vec r \in A \tag{1.1}$$

where $k$ is the wavenumber and $\vec R$ is the vector connecting the source with the listening point [6].

A detailed analysis of integral (1.1) shows that each secondary source is composed of a monopole (driven by the pressure gradient signal) and a dipole (driven by the pressure signal). However, there are slight conceptual differences between the formulations of Kirchhoff-Helmholtz and Huygens. The former is more general: the shape of the boundary does not depend on the wavefront, and the Kirchhoff-Helmholtz integral itself carries information about both the amplitude and the phase of the acoustic signal, whereas Huygens' principle assumes that the secondary sources are located on equiphase surfaces. In practice, we can conclude that the Kirchhoff-Helmholtz integral generalizes Huygens' principle.

² Monopole: a point source that can be represented as a pulsating sphere with infinitesimal radius. For such sources, the emitted acoustic waves are a function of the radial distance r from the source only. Dipole: a sound source composed of 2 equal monopoles with opposite faces (rotated by 180°). The sound field produced by a dipole has directional characteristics.


In practice, the Kirchhoff-Helmholtz integral is used as represented in Figure 1.4 (Wave Field Synthesis):

The listening area is surrounded by pairs of transducers, each composed of a pressure microphone and a velocity (pressure gradient) microphone. Some basics on these types of microphones are given in Section 1.6. The recorded field is thus due to the sources external to the microphone array.

Then, the Kirchhoff-Helmholtz integral can be interpreted by splitting each secondary source into two elemental sources:

- DIPOLE SOURCE: fed by the pressure signal $p_0$
- MONOPOLE SOURCE: fed by the pressure gradient signal $\nabla p_0$

During playback the mirror operation is performed: speakers with corresponding physical characteristics are arranged as in Figure 1.4 in place of the microphones; that is, the pressure mics are replaced with acoustic dipole speakers (these speakers radiate both forwards and backwards) and the pressure gradient mics are replaced with monopole speakers (closed speakers radiating only forwards, thus having a directional characteristic). The geometrical layout of the microphone array and of the speaker array has to be the same. Each speaker is fed with the signal picked up by the respective microphone.

Similarly, we can surround the source with the microphone array instead of the listener [6].

Such a system guarantees (ideally) the exact reproduction of the field within the listening area and, if the array of transducers is continuous (not possible in reality), no processing is needed to reconstruct the field: it is sufficient to feed each speaker with the respective microphone signal. What happens in reality, where a continuous array (of good quality, if possible) covering the whole surface is not available, is that the acoustic signal incident on the array may have too short a wavelength compared with the distance between two transducers, so that it cannot be sampled correctly.

Figure 1.4 Application of the Kirchhoff-Helmholtz integral in holophony/WFS


In that case we may encounter spatial aliasing. As in the time domain, spatial aliasing occurs when the signal is sampled in space with an insufficient number of points. Aliasing reveals itself through the appearance of fictitious sources. The maximum frequency above which spatial aliasing occurs is calculated as (Nyquist theorem):

$$f_{max} = \frac{c}{2\,d_{trans}} \tag{1.2}$$

where $d_{trans}$ is the distance between two transducers and $c$ is the speed of sound.

A signal with frequency above $f_{max}$ produces a time difference of arrival at two transducers greater than the signal period, while for a signal with frequency below $f_{max}$ the time difference stays within the signal period, so the phase difference at the transducers allows an unambiguous evaluation of the time difference.
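As a quick numeric illustration of Equation (1.2) (the transducer spacings below are arbitrary examples):

```python
# Spatial-aliasing limit of a transducer array, Eq. (1.2): f_max = c / (2 * d_trans)
C = 343.0  # speed of sound in air, m/s (approximate)

def spatial_aliasing_limit(d_trans_m: float, c: float = C) -> float:
    """Return the frequency (Hz) above which spatial aliasing appears
    for a transducer spacing d_trans_m (metres)."""
    return c / (2.0 * d_trans_m)

for d in (0.05, 0.10, 0.20):  # example spacings: 5, 10, 20 cm
    print(f"d = {d*100:4.0f} cm  ->  f_max = {spatial_aliasing_limit(d):7.1f} Hz")
```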

In practice, some simplifications of the Kirchhoff-Helmholtz integral and of its use are made. We try to minimize the number of transducers needed to represent the most important secondary sources, and we try not to use both monopole and dipole transducers: what is normally done is to send signals recorded with cardioid or figure-of-8 microphones to monopole speakers. Note that the superposition of a monopole and a dipole results in a cardioid polar characteristic. Finally, with the aim of limiting the number of spaced microphones used, it is preferred to build "virtual microphones" by processing the recorded signals, weighting the amplitudes and the arrival-time delays appropriately, in order to improve the resolution of the system.

A constraint of the techniques based on the Kirchhoff-Helmholtz integral (with any simplifying changes) is that primary sources should be absent from the listening area surrounded by the speaker array, i.e. the array is able to reproduce only external sources. This is a false problem, since it is nevertheless possible to reverse the phase of the signals feeding the array relative to the secondary sources and thereby reproduce internal sources too, creating a concave wavefront instead of a convex one [7].

Another way to describe a sound field, especially in the case of sources with spherical symmetry, is based on the decomposition of the sound field into spherical harmonics. Ambisonics is founded on this second descriptive approach. Spherical harmonics are also used in issues concerning quantum mechanics, gravitational fields and can be found in 3D graphics applications and lighting engineering.

The starting point is to express the acoustic wave equation in spherical coordinates $(r, \theta, \varphi)$, where $r$ is the radius, $\theta$ is the azimuth and $\varphi$ is the elevation.

The acoustic wave equation in the time domain is:

$$\nabla^2 p(r,\theta,\varphi,t) \;-\; \frac{1}{c^2}\,\frac{\partial^2 p(r,\theta,\varphi,t)}{\partial t^2} \;=\; 0 \tag{1.3}$$

where c is the speed of sound.


The acoustic pressure field due to external sources can be developed into a Fourier-Bessel series, whose terms are weighted products of the directional functions $Y_{mn}^{\sigma}(\theta,\varphi)$ - the spherical harmonics - with the radial functions $j_m(kr)$ - the spherical Bessel functions of the first kind:

$$p(\vec r) = \sum_{m=0}^{\infty} (2m+1)\, j^m\, j_m(kr) \sum_{0\le n\le m,\ \sigma=\pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta,\varphi) \tag{1.4}$$

with $m$ = degree and $n$ = order (the meaning of $\sigma$ is the spin, and it will become obvious looking at the pictures further on); $k$ is the wavenumber $k = \frac{2\pi f}{c}$. Equation (1.4) represents the solution of the wave equation in the special case of a plane wave.

As shown later, the ambisonic signals in the transform domain [7] are represented by the coefficients $B_{mn}^{\sigma}$ and behave like Fourier coefficients in a Fourier series.

Note that, unlike WFS or holophony, the sampling and the reconstruction of the sound field in Ambisonics are performed pointwise rather than over an area. It follows that the number of channels needed to reconstruct the field is much smaller than in the other techniques mentioned above. The information relative to sound direction is encoded precisely in the coefficients $B_{mn}^{\sigma}$ just introduced. Ambisonics produces - in theory - a coherent and homogeneous reconstruction of the field for all frequencies and directions in the sweet spot, the optimal listening point. We will see that the area affected by incoherence gets smaller as the order of Ambisonics increases. Similarly, there exists a frequency limit beyond which the error exceeds a certain level, and this limit grows with the order. In other words, Ambisonics performs well in terms of coherence and homogeneity only in the sweet spot and only at low frequencies. A way to counter this problem is to use different decoding strategies according to the listener's position within the speaker array (see Chapter 3 of the Ambisonic Auralizer technical papers).

Figure 1.5 Spherical Bessel functions of the first kind.
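Curves like those in Figure 1.5 can be reproduced with a few lines of SciPy (a small sketch; the plotted orders and axis range are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import spherical_jn

kr = np.linspace(0.0, 20.0, 1000)   # dimensionless argument kr
for m in range(4):                  # spherical Bessel functions j_0 .. j_3
    plt.plot(kr, spherical_jn(m, kr), label=f"j_{m}(kr)")
plt.xlabel("kr")
plt.ylabel("amplitude")
plt.legend()
plt.title("Spherical Bessel functions of the first kind")
plt.show()
```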


Let us now look at the spherical harmonic functions in detail, analysing how the ambisonic signals are obtained from them. Spherical harmonics are defined as:

$$Y_{mn}^{\sigma}(\theta,\varphi) = \sqrt{(2m+1)\,(2-\delta_{0,n})\,\frac{(m-n)!}{(m+n)!}}\; P_{mn}(\sin\varphi)\times\begin{cases}\cos n\theta & \text{if } \sigma = +1\\ \sin n\theta & \text{if } \sigma = -1\ (\text{ignore if } n = 0)\end{cases} \tag{1.5}$$

where $P_{mn}(\xi)$ are the associated Legendre functions of degree $m$ and order $n$, and $\delta_{pq}$ is the Kronecker delta, equal to 1 if $p = q$ and 0 otherwise. The associated Legendre function is defined as:

$$P_{mn}(\xi) = \left(1-\xi^2\right)^{n/2}\frac{d^{\,n}}{d\xi^{\,n}}P_m(\xi) = \frac{\left(1-\xi^2\right)^{n/2}}{2^m\,m!}\,\frac{d^{\,m+n}}{d\xi^{\,m+n}}\left(\xi^2-1\right)^m \tag{1.6}$$

where $\xi = \sin\varphi$ (the argument of $P_{mn}$ in (1.5), $\varphi$ being the elevation).

In Ambisonics some kind of normalization of Legendre functions often takes place [8].

Schmidt Semi-Normalization is defined by:

$$N_{mn} = \sqrt{(2-\delta_{0,n})\,\frac{(m-n)!}{(m+n)!}} = \sqrt{e_n\,\frac{(m-n)!}{(m+n)!}}\,,\qquad e_n = \begin{cases}1 & \text{if } n = 0\\ 2 & \text{if } n \ge 1\end{cases} \tag{1.7}$$

The harmonic functions can be rewritten in Schmidt semi-normalized form (SN3D) by substituting (1.7) into (1.5):

$$\tilde Y_{mn}^{\sigma}(\theta,\varphi) = \tilde P_{mn}(\sin\varphi)\times\begin{cases}\cos n\theta & \text{if } \sigma = +1\\ \sin n\theta & \text{if } \sigma = -1\ (\text{ignore if } n = 0)\end{cases} \tag{1.8}$$

where $\tilde P_{mn} = N_{mn}\,P_{mn}$ are the semi-normalized associated Legendre functions.
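Equations (1.7)-(1.8) can be evaluated numerically with SciPy's associated Legendre routine; the sketch below is an illustration under the convention just given (note that `scipy.special.lpmv` includes the Condon-Shortley phase, which the ambisonic convention omits, hence the sign correction; the function and variable names are mine):

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sn3d_harmonic(m, n, sigma, azimuth, elevation):
    """Evaluate the SN3D spherical harmonic of Eq. (1.8).
    m: degree, n: order (0 <= n <= m), sigma: +1 (cos) or -1 (sin).
    Angles in radians; elevation measured from the horizontal plane."""
    # Schmidt semi-normalization factor N_mn of Eq. (1.7)
    e_n = 1.0 if n == 0 else 2.0
    N = np.sqrt(e_n * factorial(m - n) / factorial(m + n))
    # Remove the Condon-Shortley phase (-1)^n included by scipy's lpmv
    P = (-1.0) ** n * lpmv(n, m, np.sin(elevation))
    trig = np.cos(n * azimuth) if sigma >= 0 else np.sin(n * azimuth)
    return N * P * trig

# Example: first-order components for a source at 45 deg azimuth, 0 deg elevation,
# reproducing the SN3D definitions listed later in Table 1.3
az, el = np.radians(45.0), 0.0
W = sn3d_harmonic(0, 0, +1, az, el)   # = 1
X = sn3d_harmonic(1, 1, +1, az, el)   # = cos(az) * cos(el)
Y = sn3d_harmonic(1, 1, -1, az, el)   # = sin(az) * cos(el)
Z = sn3d_harmonic(1, 0, +1, az, el)   # = sin(el)
```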

The set of spherical harmonics forms an orthonormal basis in the sense of the spherical scalar product, that is:

$$\langle f \mid g\rangle = \frac{1}{4\pi}\iint_{4\pi} f(\theta,\varphi)\,g(\theta,\varphi)\,d\Omega \tag{1.9}$$


So, they can be linearly combined in order to define functions on the surface of a sphere.

In this context, Equation (1.4) has to be truncated at a certain order M (for manageability), also known as the order of Ambisonics. Writing it again for convenience:

$$p(\vec r) = \sum_{m=0}^{M} (2m+1)\, j^m\, j_m(kr) \sum_{0\le n\le m,\ \sigma=\pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta,\varphi) \tag{1.10}$$

We have seen that the components $B_{mn}^{\sigma}$ are tightly tied to the acoustic pressure field and its higher-order derivatives about the origin.

In vector form we have:

$$\begin{aligned}
\mathbf B &= \big[\,B_{00}^{+1}\;\;B_{11}^{+1}\;\;B_{11}^{-1}\;\;B_{10}^{+1}\;\;\cdots\;\;B_{mm}^{+1}\;\;B_{mm}^{-1}\;\;\cdots\;\;B_{mn}^{+1}\;\;B_{mn}^{-1}\;\;\cdots\;\;B_{m0}^{+1}\;\;\cdots\;\;B_{MM}^{+1}\;\;B_{MM}^{-1}\;\;\cdots\;\;B_{M0}^{+1}\,\big]^{T}_{3D}\\
\mathbf Y &= \big[\,Y_{00}^{+1}\;\;Y_{11}^{+1}\;\;Y_{11}^{-1}\;\;Y_{10}^{+1}\;\;\cdots\;\;Y_{mm}^{+1}\;\;Y_{mm}^{-1}\;\;\cdots\;\;Y_{mn}^{+1}\;\;Y_{mn}^{-1}\;\;\cdots\;\;Y_{m0}^{+1}\;\;\cdots\;\;Y_{MM}^{+1}\;\;Y_{MM}^{-1}\;\;\cdots\;\;Y_{M0}^{+1}\,\big]^{T}_{3D}
\end{aligned} \tag{1.11}$$

(Note: m is increasing from 0 to M, n is decreasing).

So, the components $B_{mn}^{\sigma}$ are grouped into the vectors of order $m$ (overall, $2m+1$ components for order $m$), giving a total of $K = (M+1)^2$ ambisonic CHANNELS.
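A two-line numerical check of this channel count (illustrative only):

```python
# Number of ambisonic channels for a 3D system truncated at order M: K = (M + 1)^2,
# built from 2m + 1 spherical-harmonic components for each order m = 0 .. M.
for M in range(5):
    per_order = [2 * m + 1 for m in range(M + 1)]
    print(f"order M = {M}: components per order {per_order}, total K = {sum(per_order)} = {(M + 1) ** 2}")
```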

Figure 1.6 High Order Ambisonics (up to 3rd order) - 3D view [© D. Courville]


The following example shows how the ambisonic signals are obtained from the coefficients $B_{mn}^{\sigma}$.

For the time being, we stop the Fourier-Bessel series at order $M = 1$, obtaining the signals called W, X, Y, Z, which we will define better in the next sections:

$$\mathbf B_{M=1} = \big[\,W\;\;X\;\;Y\;\;Z\,\big]^{T}_{3D}$$

$B_{00}^{+1} = W$ : relative to the PRESSURE signal

$B_{11}^{+1} = X$

$B_{11}^{-1} = Y$ : relative to the PRESSURE GRADIENT signals (or to the acoustic velocity)

$B_{10}^{+1} = Z$

As can be immediately seen from the 3D illustration of the spherical harmonics (Figure 1.6), in order to achieve a higher directional resolution, the order of Ambisonics must increase.

Attention! To avoid confusion, it should be noted that the order M of Ambisonics is different from the order n defined for the Legendre functions. Rather, the ambisonic order corresponds to the degree m of the Legendre functions.

Ambisonics is not only a 3D audio technique. Sound field representation can be specialized in 2D environments. For this purpose, the sound field should be decomposed according to a system of cylindrical coordinates for a horizontal-only reproduction system.

One has:

$$\begin{aligned}
p(r,\theta) &= B_{00}^{+1}\,J_0(kr) + \sum_{m=1}^{\infty}\big(B_{mm}^{+1}\,\sqrt2\cos m\theta + B_{mm}^{-1}\,\sqrt2\sin m\theta\big)\,j^m J_m(kr)\\
&= B_{00}^{+1}\,J_0(kr) + \sum_{m=1}^{\infty}\big(B_{mm}^{+1}\,Y_{mm}^{+1(2D)}(\theta,0) + B_{mm}^{-1}\,Y_{mm}^{-1(2D)}(\theta,0)\big)\,j^m J_m(kr)
\end{aligned} \tag{1.12}$$

Also in this case we get an orthonormal basis, as for the 3D equations. The functions denoted by $J_m(kr)$ are the Bessel functions of the first kind.

Formal unification is achieved by noting that the cylindrical (horizontal) harmonics are a subset of the spherical harmonics, with:

$$Y_{mn}^{\sigma(2D)}(\theta,\varphi) = \sqrt{\frac{2^{2m}\,(m!)^2}{(2m+1)!}}\; Y_{mn}^{\sigma(3D)}(\theta,\varphi) \tag{1.13}$$

where $\varphi = 0$.

2D representation in cylindrical coordinates is very useful to understand how the sound field is decomposed and what the recorded signals actually mean.


The sound field represented by Equation (1.10) and unpacked in 2D form in Equation (1.12) can be re-written in the explicit form for a plane wave as (referring to Figure 1.7 and [9])

$$\begin{aligned}
p(r,\varphi) &= P\, e^{jkr\cos(\varphi-\theta)} = P\Big[J_0(kr) + \sum_{m=1}^{\infty} 2\, j^m J_m(kr)\cos\big(m(\varphi-\theta)\big)\Big]\\
&= P\Big[J_0(kr) + \sum_{m=1}^{\infty} 2\, j^m J_m(kr)\cos(m\varphi)\cos(m\theta) + \sum_{m=1}^{\infty} 2\, j^m J_m(kr)\sin(m\varphi)\sin(m\theta)\Big]
\end{aligned} \tag{1.14}$$
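The expansion in (1.14) can be checked numerically against the complex exponential it represents; the small sketch below truncates the series at an arbitrary order and compares the two (all values are placeholders):

```python
import numpy as np
from scipy.special import jv  # cylindrical Bessel functions J_m

c = 343.0                      # speed of sound, m/s
f = 1000.0                     # frequency, Hz
k = 2 * np.pi * f / c          # wavenumber
r, obs_angle = 0.2, np.radians(30.0)   # observation point (r, phi)
theta = np.radians(70.0)               # plane-wave incidence angle
M = 25                                 # truncation order

exact = np.exp(1j * k * r * np.cos(obs_angle - theta))
series = jv(0, k * r) + sum(
    2 * (1j ** m) * jv(m, k * r) * np.cos(m * (obs_angle - theta))
    for m in range(1, M + 1)
)
print(abs(exact - series))   # tiny residual for a sufficiently high M
```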

Equation (1.14) can be transformed into matrix form:

$$p(\vec r) = P\,\mathbf B^{T}\,\mathbf h \tag{1.15}$$

where

$$\mathbf B = \big[\,B_{mn}^{\sigma}\,\big] = \big[\,1\;\;\sqrt2\cos\theta\;\;\sqrt2\sin\theta\;\;\cdots\;\;\sqrt2\cos m\theta\;\;\sqrt2\sin m\theta\;\;\cdots\,\big]^{T}$$

$$\mathbf h = \big[\,J_0(kr)\;\;\sqrt2\,j J_1(kr)\cos\varphi\;\;\sqrt2\,j J_1(kr)\sin\varphi\;\;\cdots\;\;\sqrt2\,j^m J_m(kr)\cos m\varphi\;\;\sqrt2\,j^m J_m(kr)\sin m\varphi\;\;\cdots\,\big]^{T}$$

The information about the spatial distribution of the plane wave is contained in the vector $\mathbf B^T$ (which depends on the angle of incidence $\theta$ only). This means that recording in Ambisonics consists of identifying the coefficients $\mathbf B^T$. The expression of these coefficients reveals that microphones with directivity patterns of the form $\cos m\theta$, $\sin m\theta$ are required. That is why finding or building microphones fit for the purpose becomes problematic for $m \ge 2$, whereas for $m = 0$ (omnidirectional microphone) and $m = 1$ (bidirectional microphone) an extensive variety of microphones and capsules is available on the market.

Figure 1.7 Sound wave impinging on listening position [9]


On the other hand, the information about the sound field variations relative to the listening point is contained in the vector $\mathbf h$. Source directivity changes with the emitted frequency (directivity increases with increasing frequency).

Furthermore, the use of microphones from the family of cardioids simplifies expressions (1.10) and (1.14), so that, separating the spatial dependence from the frequency dependence, one has:

$$p(\theta) = \sum_{m=0}^{\infty} W_m(\omega) \sum_{0\le n\le m,\ \sigma=\pm1} B_{mn}^{\sigma}\,Y_{mn}^{\sigma}(\theta) \tag{1.16}$$

where $W_m(\omega)$ is the weighting factor:

$$W_m(\omega) = j^m\big[\,\alpha\, j_m(kr_{MIC}) - j\,(1-\alpha)\, j_m'(kr_{MIC})\,\big] \tag{1.17}$$

Equation (1.17) highlights how the recording field depends on frequency (in fact, $k$ is the wavenumber $k = 2\pi/\lambda$, where $\lambda$ is the wavelength $\lambda = c/f$, $c$ is the speed of sound and $f$ is the frequency).

Basically, when miking a source, besides considering source directivity, we must consider the microphone polar characteristic with respect to the frequency.

In light of the above, the recorded ambisonic signals correspond to the coefficients $\mathbf B^T$ weighted by the frequency-dependent function $W_m(\omega)$. Equation (1.17) was obtained by weighting Equation (1.10) by a cardioid-family characteristic function of the kind $G(\theta) = \alpha + (1-\alpha)\cos\theta$. Remember that a cardioid microphone is generated by the superposition of an omnidirectional microphone (responsive to pressure) and a figure-of-8 microphone (responsive to the pressure gradient, i.e. the derivative of pressure).

In Chapter 3 of the technical papers we get into the details of encoding and decoding. It will be assumed that the listener is at a distance from the source such that the sound front can be approximated as plane.


1.3 Ambisonic Formats

As we have seen, in Ambisonics the directional components of sound are encoded vectorially in a set of spherical harmonics. This paragraph shows how audio signals are recorded and processed in Ambisonics. Ambisonics is not limited to a particular number of channels: a greater number of channels provides a higher directional resolution.

In Ambisonics several formats exist for microphone recording, broadcasting and reproduction of recorded signals.

- A-Format: suitable for miking with a specific microphone (e.g. the Soundfield mic);
- B-Format: suitable for miking and processing with studio equipment;
- C-Format/UHJ: suitable for mono, stereo, 3-channel systems and broadcasting;
- D-Format: suitable for decoding and playback through an array of speakers;
- G-Format: like D, but a decoder is not required.

A-Format

A-Format is achieved from the recording of four signals using a microphone equipped with four sub-cardioid capsules mounted on the faces of a tetrahedron and oriented as shown in Figure 1.8a:

The four signals picked up by the capsules correspond to the directions left-front (LF), right-front (RF), left-back (LB) and right-back (RB). For reasons of physical dimension of the capsules, these will not be perfectly coincident. The same problem occurs when using other microphones in B-Format. In Chapter 3 of the technical papers, dedicated to ambisonic decoding, phase equalization of the recorded signals will be discussed. Phase equalization is required in order to make the capsules coincident and represent the sound field as if the capsules were virtually placed exactly at the centre of the tetrahedron.

Figure 1.8 Tetrahedral Soundfield mic for A-Format


A sub-cardioid capsule is characterized by a polar diagram of the type shown in Figure 1.9 and is described by the equation:

$$\rho(\theta) = 0.7 + 0.3\cos\theta$$

where $\theta$ is the angle of incidence of the acoustic wave.

The summary table below includes microphones from the family of cardioids, their polar characteristics and the equations describing them:

TYPE OF MICROPHONE                     | EQUATION
Family of cardioids (general equation) | ρ(θ) = α + (1 - α) cos θ
OMNIDIRECTIONAL                        | ρ(θ) = 1
SUB-CARDIOID                           | ρ(θ) = 0.7 + 0.3 cos θ
CARDIOID                               | ρ(θ) = 0.5 + 0.5 cos θ
SUPERCARDIOID                          | ρ(θ) = 0.37 + 0.63 cos θ
HYPERCARDIOID                          | ρ(θ) = 0.25 + 0.75 cos θ
FIGURE-OF-8                            | ρ(θ) = cos θ

Table 1.1 Polar diagrams and equations for the microphones of the family of cardioids. (The polar diagram column of the original table shows the corresponding plots.)
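All the rows of Table 1.1 come from the single expression ρ(θ) = α + (1 - α) cos θ; the short sketch below evaluates it for the listed values of α (illustration only):

```python
import numpy as np

def polar_pattern(alpha: float, theta) -> float:
    """Family-of-cardioids pickup pattern: rho(theta) = alpha + (1 - alpha) * cos(theta)."""
    return alpha + (1.0 - alpha) * np.cos(theta)

patterns = {           # alpha values from Table 1.1
    "omnidirectional": 1.00,
    "sub-cardioid":    0.70,
    "cardioid":        0.50,
    "supercardioid":   0.37,
    "hypercardioid":   0.25,
    "figure-of-8":     0.00,
}
for name, alpha in patterns.items():
    side = polar_pattern(alpha, np.pi / 2)   # pickup at 90 degrees
    rear = polar_pattern(alpha, np.pi)       # pickup at 180 degrees
    print(f"{name:16s} rho(90) = {side:+.2f}   rho(180) = {rear:+.2f}")
```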

Figure 1.9 Polar diagram of a sub-cardioid capsule.


B-Format

B-Format consists of four signals called W, X, Y, Z. As already mentioned, signal W corresponds to the pressure component of the sound field, picked up equally from all directions, while X and Y correspond to the components of the acoustic velocity on the horizontal plane and Z to its vertical component. Microphone takes in B-Format are achieved using three figure-of-8 microphones for signals X, Y, Z and an omnidirectional microphone for signal W.

The 0° axis of microphone X points at the source (it is equivalent to MID in MS); microphone Y is rotated by 90° with respect to X, with its 0° axis pointing leftwards (it is equivalent to SIDE in MS). Microphone Z is oriented orthogonally to the plane described by the X and Y axes, with its 0° axis pointing upwards. Figure 1.10 shows the microphone layout just described:

However, once the B-Format signals are recorded, it is possible to rotate the array of microphones virtually through a rotation matrix.

The four B-Format polar patterns are obtained from Equation (1.8) and described as follows (in a normalized form: see Chapter 3):

Figure 1.10 W, X, Y, Z components in B-Format


$$\begin{aligned}
W &= S\\
X &= \sqrt2\, S\,\cos\theta\cos\varphi\\
Y &= \sqrt2\, S\,\sin\theta\cos\varphi\\
Z &= \sqrt2\, S\,\sin\varphi
\end{aligned} \tag{1.18}$$

where S is the recorded source [1]. The reader is referred to Chapter 3 for further explanations about the factor $\sqrt2$.
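Following the conventions of Equation (1.18), a minimal first-order encoder can be sketched as below (the source signal and angles are placeholders):

```python
import numpy as np

def encode_bformat(s, azimuth, elevation):
    """Encode a mono signal s into first-order B-Format, Eq. (1.18).
    azimuth/elevation in radians; returns (W, X, Y, Z)."""
    w = s
    x = np.sqrt(2.0) * s * np.cos(azimuth) * np.cos(elevation)
    y = np.sqrt(2.0) * s * np.sin(azimuth) * np.cos(elevation)
    z = np.sqrt(2.0) * s * np.sin(elevation)
    return w, x, y, z

# Hypothetical source: a 1 kHz tone placed 30 degrees to the left, 10 degrees up
fs = 48000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 1000 * t)
W, X, Y, Z = encode_bformat(s, np.radians(30), np.radians(10))
```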

B-Format can be derived from A-Format through the following transformation:

$$\begin{aligned}
X &= 0.5\,\big[(LF - LB) + (RF - RB)\big]\\
Y &= 0.5\,\big[(LF - RB) - (RF - LB)\big]\\
Z &= 0.5\,\big[(LF - LB) + (RB - RF)\big]\\
W &= 0.5\,\big[LF + LB + RF + RB\big]
\end{aligned} \tag{1.19}$$

Signal W, being omnidirectional, is given by the sum of the contributions from the four capsules.

Moreover, the recorded signal W can be used to reinforce the lower frequencies, since other types of microphones do not perform as well in low-frequency response as omnidirectional microphones do.
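A direct transcription of the A-to-B conversion (1.19) might read as follows (a sketch; the capsule signal names follow the text above):

```python
def a_to_b_format(lf, rf, lb, rb):
    """Convert the four A-Format capsule signals to B-Format, Eq. (1.19).
    lf, rf, lb, rb: NumPy arrays (or floats) from the tetrahedral capsules."""
    x = 0.5 * ((lf - lb) + (rf - rb))
    y = 0.5 * ((lf - rb) - (rf - lb))
    z = 0.5 * ((lf - lb) + (rb - rf))
    w = 0.5 * (lf + lb + rf + rb)
    return w, x, y, z
```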

Extensions of B-Format were introduced for high-definition TV: the BF and BEF formats, which include the additional channels E and F, redundant in content with respect to the channels W, X, Y, Z and used to bolster the stability of the front image and to sharpen the front/rear separation.

C-Format

Recordings in A and B-Format are not naturally compatible with mono and stereo systems. To ensure the portability of songs recorded in formats A and B into 2-channel media such as CD or radio and television broadcasting, a new format called C-Format or UHJ was introduced. The initials, UHJ, stand for three systems developed by the team that introduced this format, that is, U: Universal (from Nippon Columbia UD-4/UMX, quadraphonic system), H: Matrix H (quadraphonic system from BBC), J: System 45J (ambisonic system in use at that time).

C-Format is a hierarchical encoding/decoding system for ambisonic signals. Depending on the number of available channels, this system is capable of reproducing, with a certain degree of accuracy, the recorded sound field.

C-Format consists of 4 signals (L, R, T, Q) and, although it allows using up to 4 channels, it is typically used in the 2-channel UHJ format. Left signal L is compatible with a 2-channel system, R denotes the right signal, T is a third channel introduced for a more accurate horizontal decoding and Q contains the information relative to the height.

We define $\Sigma = L + R$ (like signal M in Mid-Side) and $\Delta = L - R$ (like signal S in Mid-Side), and we use the following relations to change from B-Format to C-Format:


$$\begin{aligned}
\Sigma &= 0.9397\,W + 0.1856\,X\\
\Delta &= j\,(-0.3420\,W + 0.5099\,X) + 0.6555\,Y\\
T &= j\,(-0.1432\,W + 0.6512\,X) - 0.7071\,Y\\
Q &= 0.9772\,Z
\end{aligned} \tag{1.20}$$

$$\begin{aligned}
W &= 0.982\,\Sigma + 0.197\,j\,(0.828\,\Delta + 0.768\,T)\\
X &= 0.419\,\Sigma - j\,(0.828\,\Delta + 0.768\,T)\\
Y &= 0.187\,j\,\Sigma + 0.796\,\Delta - 0.676\,T\\
Z &= 1.023\,Q
\end{aligned} \tag{1.21}$$

where j denotes a 90-degree phase advance.

As it has been said, only signals L and R are exploited in stereo-compatible systems:

$$\begin{aligned}
L &= 0.5\,(\Sigma + \Delta)\\
R &= 0.5\,(\Sigma - \Delta)
\end{aligned} \tag{1.22}$$
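In practice the 90-degree phase advance (the "j" operator in (1.20)-(1.21)) is implemented with a wideband phase-difference network; the sketch below approximates it with SciPy's analytic-signal routine to build a 2-channel (BHJ) encoder from Equations (1.20) and (1.22). This is an illustration only, not the original hardware implementation:

```python
import numpy as np
from scipy.signal import hilbert

def phase_advance_90(x):
    """Approximate j{x}: a +90-degree phase shift of a real signal.
    hilbert() returns x + j*H{x}, and H{x} lags x by 90 degrees,
    so the advanced signal is -imag(hilbert(x))."""
    return -np.imag(hilbert(x))

def bformat_to_uhj_stereo(w, x, y):
    """2-channel UHJ (BHJ) from horizontal B-Format, Eqs. (1.20) and (1.22)."""
    sigma = 0.9397 * w + 0.1856 * x
    delta = phase_advance_90(-0.3420 * w + 0.5099 * x) + 0.6555 * y
    left = 0.5 * (sigma + delta)
    right = 0.5 * (sigma - delta)
    return left, right
```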

The third channel, T, used in systems named "2½-channel systems", does not contain the whole audio band but is limited to 5 kHz. This third channel can be transmitted via radio using quadrature phase modulation. The UHJ system including 2½ or 3 channels is theoretically as accurate as horizontal B-Format (WXY). It is possible to achieve the same accuracy as full WXYZ B-Format by adding a fourth channel Q.

The format using only channels L and R is called BHJ. There exist some variants: THJ, which includes channel T, and PHJ, which includes channel Q.

UHJ has also been successful in its un-decoded stereo version. Playing back the L and R signals without a decoder yields a much wider stereophonic image compared to the soundscape obtained from a pair of conventional stereo signals. This result came about by chance, but it was very successful among listeners and was given the name Super Stereo.

Finally, in order to achieve compatibility with monophonic systems, signals L and R are summed.

Table 1.2 summarizes the UHJ hierarchical system:

Number of Channels | Decoder | Capacity                 | Typical applications           | Signals              | Equivalent in B-Format | Original designation
4                  | Yes     | Full-sphere surround     | DVD, HD disc, SACD             | LRTQ                 | WXYZ                   | PHJ
3                  | Yes     | Full horizontal surround | DVD, HD disc, SACD             | LRT                  | WXY                    | THJ
2 ½                | Yes     | Full horizontal surround | FM Radio                       | LR, T (band-limited) | WXY                    | SHJ
2                  | Yes     | Horizontal surround      | CD, Stereo Radio, 2-Ch systems | LR                   | -                      | BHJ
2                  | No      | Stereo                   | CD, Stereo Radio, 2-Ch systems | LR                   | -                      | -
1                  | No      | Mono                     | Radio                          | LR (summed)          | -                      | -

Table 1.2 Summary table of the hierarchical UHJ system, C-Format [10].

D-Format

D-Format is the format that makes Ambisonics compatible with common surround speaker systems, such as 5.1 or 7.1, but also with arrays of different sizes and geometries (regular or irregular). Signals in D-Format can be derived from either B-Format or C-Format with the use of a decoder. The number of speakers is not limited in theory. The minimum requirement, however, is 4 speakers for adequate surround playback; 6 is better, and full periphony (and therefore height information) can be obtained with 8 speakers.

For example, in a periphonic (i.e. 3D) system, with B-Format input signals, the i-th loudspeaker will be fed by signal:

$$S_i = \frac{1}{L}\big[\,W + X\cos\theta_i\cos\varphi_i + Y\sin\theta_i\cos\varphi_i + Z\sin\varphi_i\,\big]\qquad\text{(SN3D)} \tag{1.23}$$

Or, similarly, in the case of higher-order Ambisonics:

$$\begin{aligned}
S_i = \frac{1}{L}\Big[\, & W + X\cos\theta_i\cos\varphi_i + Y\sin\theta_i\cos\varphi_i + Z\sin\varphi_i\;+\\
& U\,\frac{\sqrt3}{2}\cos 2\theta_i\cos^2\varphi_i + V\,\frac{\sqrt3}{2}\sin 2\theta_i\cos^2\varphi_i + \cdots\Big]\qquad\text{(SN3D)}
\end{aligned} \tag{1.24}$$

Higher-order Ambisonics in a 2D system produces a D-Format signal for the i-th loudspeaker as follows:

$$S_i = \frac{1}{L}\Big[\,W + \sqrt2\,\big(X\cos\theta_i + Y\sin\theta_i\big) + \sqrt{\tfrac{8}{3}}\,\big(U\cos 2\theta_i + V\sin 2\theta_i\big) + \cdots\Big]\qquad\text{(N2D)} \tag{1.25}$$
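A naive first-order decoder following Equation (1.23) could be sketched as below (equal-weight "sampling" decoding; the speaker directions are placeholders and L is taken to be the number of loudspeakers):

```python
import numpy as np

def decode_first_order(w, x, y, z, speaker_dirs):
    """Feed each loudspeaker with Eq. (1.23):
    S_i = (1/L) * [W + X cos(az)cos(el) + Y sin(az)cos(el) + Z sin(el)].
    speaker_dirs: list of (azimuth, elevation) pairs in radians."""
    L = len(speaker_dirs)
    feeds = []
    for az, el in speaker_dirs:
        s_i = (w + x * np.cos(az) * np.cos(el)
                 + y * np.sin(az) * np.cos(el)
                 + z * np.sin(el)) / L
        feeds.append(s_i)
    return feeds

# Hypothetical cubic array: 8 speakers at 90-degree azimuth steps, +/-35.26 deg elevation
cube = [(np.radians(a), np.radians(e)) for e in (35.26, -35.26)
        for a in (45, 135, 225, 315)]
# feeds = decode_first_order(W, X, Y, Z, cube)   # W, X, Y, Z from an encoder or recording
```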

G-Format

G-Format has the same purpose as D-Format. The difference lies in the fact that this format is pre-decoded, i.e. the signals are already decoded and stored in multichannel audio formats such as Wave-Ex (the multi-channel version of the Wave file format), DVD-Audio or SACD. In this way the listener does not need a decoder: one only has to play the DVD or the Wave-Ex file.

Obviously, having a pre-decoded ambisonic track prevents custom adaptation of the signals to different speaker arrays. A track decoded for a 5.1 system will only play correctly on that type of surround system.


1.4 Higher Order Ambisonics

B-Format breaks off at first order. The reproduction accuracy of the sound field increases with increasing order. Table 1.3 includes Furse-Malham and Schmidt (SN3D) coefficients, used to encode the ambisonic channels of order higher than 1.

Spherical harmonics are represented in this form [8]:

$$\tilde Y_{mn}^{\sigma}(\theta,\varphi) = \tilde P_{mn}(\sin\varphi)\times\begin{cases}\cos n\theta & \text{if } \sigma = +1\\ \sin n\theta & \text{if } \sigma = -1\ (\text{ignore if } n = 0)\end{cases} \tag{1.26}$$

where $\tilde P_{mn}$ are the Schmidt semi-normalized Legendre functions of degree $m$ and order $n$. This formulation is called SN3D encoding (SN2D in its 2D variant) and it corresponds to 1st-order Ambisonics (see Paragraph 1.1), with the exception of the weight 0.707 applied to signal W. Daniel's modification, called MaxNormalization (MaxN), is also followed by the Furse-Malham (FuMa) coefficients, with the inclusion of the 0.707 weight on W (see Table 1.3 below).

The mathematical formulations of the spherical harmonics given here include weighting factors that ensure the integration of each harmonic over the sphere returns 1. The maximum value each harmonic assumes increases with the order. This may cause a problem for nearby sources, for which managing the dynamics of the higher-order channel signals becomes an issue.

The problem persists from miking to recording.

MaxN representations apply a weighting factor to each component above order zero (i.e., above W), so that the maximum value each component can assume is limited to 1.

Above third order it gets difficult to determine the maxima of each polynomial, which is why Table 1.3 reports the FuMa coefficients and SN3D representations only up to third order.

Order | m, n, σ  | Channel | SN3D Definition                  | FuMa Weight
0     | 0, 0, +1 | W       | 1                                | 1/√2
1     | 1, 1, +1 | X       | cos θ cos φ                      | 1
1     | 1, 1, −1 | Y       | sin θ cos φ                      | 1
1     | 1, 0, +1 | Z       | sin φ                            | 1
2     | 2, 0, +1 | R       | (3 sin²φ − 1)/2                  | 1
2     | 2, 1, +1 | S       | (√3/2) cos θ sin 2φ              | 2/√3
2     | 2, 1, −1 | T       | (√3/2) sin θ sin 2φ              | 2/√3
2     | 2, 2, +1 | U       | (√3/2) cos 2θ cos²φ              | 2/√3
2     | 2, 2, −1 | V       | (√3/2) sin 2θ cos²φ              | 2/√3
3     | 3, 0, +1 | K       | sin φ (5 sin²φ − 3)/2            | 1
3     | 3, 1, +1 | L       | √(3/8) cos θ cos φ (5 sin²φ − 1) | √(45/32)
3     | 3, 1, −1 | M       | √(3/8) sin θ cos φ (5 sin²φ − 1) | √(45/32)
3     | 3, 2, +1 | N       | (√15/2) cos 2θ sin φ cos²φ       | 3/√5
3     | 3, 2, −1 | O       | (√15/2) sin 2θ sin φ cos²φ       | 3/√5
3     | 3, 3, +1 | P       | √(5/8) cos 3θ cos³φ              | √(8/5)
3     | 3, 3, −1 | Q       | √(5/8) sin 3θ cos³φ              | √(8/5)

Table 1.3 SN3D definitions and FuMa weights for ambisonic signals up to third order.
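As an illustration of how Table 1.3 is used, the sketch below computes the second-order SN3D encoding gains for a given direction and then applies the FuMa weights (gain computation only; in practice each gain multiplies the source signal):

```python
import numpy as np

def sn3d_gains_2nd_order(theta, phi):
    """Second-order SN3D encoding gains (W..V) from Table 1.3.
    theta: azimuth, phi: elevation, both in radians."""
    return {
        "W": 1.0,
        "X": np.cos(theta) * np.cos(phi),
        "Y": np.sin(theta) * np.cos(phi),
        "Z": np.sin(phi),
        "R": (3.0 * np.sin(phi) ** 2 - 1.0) / 2.0,
        "S": (np.sqrt(3) / 2) * np.cos(theta) * np.sin(2 * phi),
        "T": (np.sqrt(3) / 2) * np.sin(theta) * np.sin(2 * phi),
        "U": (np.sqrt(3) / 2) * np.cos(2 * theta) * np.cos(phi) ** 2,
        "V": (np.sqrt(3) / 2) * np.sin(2 * theta) * np.cos(phi) ** 2,
    }

FUMA_WEIGHTS = {"W": 1 / np.sqrt(2), "X": 1, "Y": 1, "Z": 1, "R": 1,
                "S": 2 / np.sqrt(3), "T": 2 / np.sqrt(3),
                "U": 2 / np.sqrt(3), "V": 2 / np.sqrt(3)}

sn3d = sn3d_gains_2nd_order(np.radians(60), np.radians(15))
fuma = {ch: FUMA_WEIGHTS[ch] * g for ch, g in sn3d.items()}
```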

1.5 Near sources

In 2003 Daniel, Nicol and Moreau proposed a new formulation of B-Format with the aim of removing the limitation of the current formulation which allows reconstruction of plane waves only [7].

This restriction implies that the system cannot handle nearby sources well, especially when these are inside the array of loudspeakers. The Fourier-Bessel expression for the sound pressure on a spherical surface around a point (indicated by the radius vector $\vec r$) is reported below:

$$p(\vec r) = \sum_{m=0}^{\infty} j^m\, j_m(kr) \sum_{0\le n\le m,\ \sigma=\pm1} B_{mn}^{\sigma}\,Y_{mn}^{\sigma}(\theta,\varphi) \;+\; \sum_{m=0}^{\infty} j^m\, h_m(kr) \sum_{0\le n\le m,\ \sigma=\pm1} A_{mn}^{\sigma}\,Y_{mn}^{\sigma}(\theta,\varphi) \tag{1.27}$$

where the first addend of the right-hand side is equivalent to the current formulation of Ambisonics for sources external to the speaker array, expressed in the frequency domain.

The coefficients $B_{mn}^{\sigma}$ are the gains of the spherical harmonic components under the assumption that the sources produce plane wavefronts. The second addend, on the other hand, describes the wavefronts of internal sources, which are curved and frequency dependent.

Daniel et al. derived a formula which describes near field sources at distance R from the centre of the sphere:

$$B_{mn}^{\sigma} = S \cdot F_{m}^{(R/c)}(\omega)\; Y_{mn}^{\sigma}(\theta,\varphi) \tag{1.28}$$

where $S$ is the pressure field at the centre and

$$F_{m}^{(R/c)}(\omega) = \sum_{n=0}^{m} \frac{(m+n)!}{(m-n)!\; n!} \left( \frac{-jc}{\omega R} \right)^{n} \tag{1.29}$$

where $c$ is the speed of sound and $\omega = 2\pi f$.

It is deduced that $F_{m}^{(R/c)}(\omega)$ has a gain that tends to infinity at low frequencies.

One can see that compensation through the weighting factors mentioned in Paragraph 1.4 also helps to mitigate this problem (the gain is no longer infinite).

In this manner, with this formulation, it is possible to reproduce sources internal to the speaker array, since it allows the reconstruction of concave, plane and convex wavefronts. It is important, however, to know the dimensions of the array beforehand, at encoding time.


1.6 Pressure Microphones and Pressure Gradient Microphones

We conclude this section with a few useful basics about the microphones that are commonly used in ambisonic arrays [11].

Pressure Microphones

These microphones expose only the front face of the diaphragm to the sound field and respond in the same way to changes in the acoustic pressure for all directions of the incident sound. In effect, pressure microphones have no directional characteristic and are also known as omnidirectional microphones. In reality, the microphone body causes the response to become directional with increasing frequency, because its size becomes comparable with the wavelength of the sound incident on the diaphragm (this is true, in general, for all microphones, which effectively tend to become hypercardioid with increasing frequency).

Pressure Gradient Microphones

These microphones have a figure-of-8 polar diagram (see note at the end of paragraph) along the longitudinal axis. They respond to the pressure difference between two points A and B, shown in Figure 1.11, close together and immersed in the sound field.

The greatest pressure difference occurs at 0° and 180°, whereas a sound coming from 90° with respect to the axis of the microphone is received with the same sensitivity at both points A and B. In fact, calling $T_F$ the transmission factor of the sound field (or sensitivity), the following relation holds:

$$T_F = T_{F0}\cos\vartheta \tag{1.30}$$

where $T_{F0}$ is the transmission factor we have when the sound impinges from the 0° direction (the microphone axis) and $\vartheta$ is the angle of incidence of the acoustic wave. If $\vartheta = 90°$, then $T_F = 0$.

The incident pressure is compared at points A and B. This can be achieved electrically, using two identical adjacent capsules facing opposite directions and measuring their output voltages connected with reverse polarity. Alternatively, the comparison is done mechanically, in microphones having both the front and the rear side of the diaphragm exposed to the sound field. In this second case, only the instantaneous differences of the forces acting on the front and the rear result in a movement of the diaphragm. The pressure difference is due to the velocity of the particles of the medium in which sound propagates. Since the microphone output voltage is proportional to the pressure difference, it is also proportional to the particle velocity, hence the name velocity microphones.

Figure 1.11 Polar pattern of a pressure gradient microphone.


Mic behaviour in the presence of plane waves

Figure 1.12 shows how, in the presence of a plane wavefront, the sound reaches points A and B with the same strength but with a phase difference. At constant sound pressure, the phase angle swept by the sound, and hence the pressure gradient, increases with frequency. In Figure 1.12b the acoustic wave has approximately twice the frequency of Figure 1.12a and the same pressure; as can be seen, the pressure gradient approximately doubles. Reading the technical specifications of the microphones normally found on the market, specifically the frequency response graph, we note that all microphones have - sooner or later - a "hole" next to the so-called characteristic frequency of the microphone, as shown in Figure 1.13. Of course, all the technical specifications should always be taken into consideration, especially when building a microphone array for such a delicate system as Ambisonics, which exhibits most of its problems at high frequencies.

Usually, the distance between points A and B in a microphone is very small. There exists a limit beyond which the microphone does not respond efficiently to very high frequencies: this limit is reached when the distance A-B equals half the wavelength we want to reproduce, and the corresponding frequency is called the characteristic frequency $f_t$. At such a limit distance the phase difference is $\varphi = 180°$. Beyond the characteristic frequency the pressure gradient diminishes abruptly.

Mic behaviour in the presence of spherical waves

In the case of spherical waves (Figure 1.14), the pressure gradient between points A and B depends not only on the phase difference, but also on the distance between source and microphone. For a point source radiating a spherical front, the pressure decreases with increasing distance from the source (that is, the pressure is proportional to $1/r$).

Figure 1.12 Microphone behaviour with plane waves.

Figure 1.13 Frequency response of a pressure gradient microphone: characteristic frequency effect.


Students at singing schools are taught that moving the microphone closer to the mouth enhances the low frequencies. This is called the proximity effect, and it is noticeable especially at low frequencies, for which the forces acting on the diaphragm are weaker because the phase shift is smaller than at high frequencies.

The boost at frequency f is calculated as follows:

$$\frac{v_8}{v_0} = \frac{1}{\cos\alpha}\,,\qquad \alpha = \arctan\!\left(\frac{\lambda}{2\pi r}\right) = \arctan\!\left(\frac{54.14}{f\cdot r}\right) \tag{1.31}$$

where $r$ is the distance between the microphone and the source, $v_8/v_0$ represents the boost at frequency $f$ (wavelength $\lambda$), $v_8$ is the output voltage of a pressure gradient microphone with a figure-of-8 directivity pattern, and $v_0$ is the output voltage of an omnidirectional microphone with the same sensitivity at 0° [30].

Note
At the very beginning of this paragraph we said that pressure gradient microphones have a figure-of-8 polar diagram. This is worth pointing out because microphones belonging to the family of cardioids are pressure gradient microphones as well. Cardioid polar characteristics can be achieved in different ways:

- Superimposition of a figure-of-8 and an omnidirectional mic;
- Microphone composed of a part of the diaphragm having only the front side exposed to the sound field and another part having both sides exposed to the field;
- Microphone in which the sound gets to its rear side passing through a delay element.

Figure 1.14 Pressure gradient microphone behaviour with spherical front.


Bibliography

[1] F. Rumsey, Spatial Audio, Focal Press, 2001.

[2] M. A. Gerzon, "A year of surround sound," Hi-Fi News, August 1971.

[3] M. A. Gerzon, "A year of surround sound," Wireless World, December 1974.

[4] D. Malham, "Homogeneous and nonhomogeneous surround sound systems," in AES Second Century of Audio, London, 7-8th June 1999.

[5] M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality, Springer, 2008.

[6] R. Nicol and M. Emerit, "3D-sound reproduction over an extensive listening area: a hybrid method derived from holophony and ambisonic," in AES 16th International Conference.

[7] J. Daniel, R. Nicol and S. Moreau, "Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging," in AES 114th Convention, Amsterdam, The Netherlands, March 22–25th, 2003 .

[8] J. Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimedia," 2001.

[9] R. Nicol and M. Emerit, "Reproduction of 3D sound for videoconferencing: a comparison between holophony and ambisonics," in Proc. Workshop on Digital Audio Effects (DAFx-98), Barcelona, Spain, November 19-21, 1998.

[10] M. A. Gerzon, "Ambisonics in Multichannel Broadcasting and Video," Journal of the Audio Engineering Society, vol. 33, no. 11, 1985.

[11] G. Boré and S. Peus, Microphones for Studio and Home-Recording Applications - Operation Principles and Type Examples, Georg Neumann GmbH, 1999.