Project 2 (Time Frequency Analysis using Windowed Fourier Transform)
Time Frequency Analysis of sounds using
Windowed Fourier Transform in MATLAB
Author
Jimin Kim
Abstract
The Fourier transform is one of the most effective and powerful methods of analyzing
signals. However, it has a severe drawback: it cannot capture the moment in time at which
the various frequencies are present in the signal. The Windowed Fourier transform offers a
solution to this problem. By adding a translational kernel to the original Fourier transform
equation, the Windowed Fourier transform can localize a signal in both time and frequency,
each with limited accuracy. With this method, signals whose frequency content depends on
time can be analyzed through spectrograms, and different types of translational kernel
function can be used to improve the result. This paper discusses the implementation of the
Windowed Fourier transform in MATLAB and its application to realistic sound signals.
Introduction/Overview
The Windowed Fourier transform technique will be applied to two realistic sound
samples: a 9 second portion of Handel’s ‘Messiah’, and ‘Mary had a little lamb’ recorded
on both piano and recorder. The ‘Messiah’ sample will be used to investigate the effect of
applying different types of translational kernel function to the signal. The ideas of over
sampling, under sampling and the width of the kernel will also be explored. The ‘Mary had a
little lamb’ sample will be used to analyze the difference between piano and recorder as
seen in the spectrogram of the piece. Its music score will also be reconstructed through
the spectrogram analysis. By carrying out these applications in MATLAB, the goal is not only
to learn the usefulness of the Windowed Fourier transform in sound analysis, but also to
understand the limits of the technique in attaining accuracy in both the time and frequency
domains.
Theoretical Background
Mathematically, the Windowed Fourier transform is the Fourier transform with a slight
modification. Recall that the Fourier transform is defined as
$$F(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx \qquad (1)$$
where k is the frequency variable and x is the position (or time) variable. The Windowed
Fourier transform, also known as the Gabor transform, introduces a time translation kernel
$$g_{\tau,\omega}(t) = e^{i\omega t}\, g(t-\tau) \qquad (2)$$
into the Fourier transform, which then becomes
$$\tilde{f}_g(\tau,\omega) = \int_{-\infty}^{\infty} f(t)\, \bar{g}(t-\tau)\, e^{-i\omega t}\, dt \qquad (3)$$
Here, the term $g(t-\tau)$ localizes the Fourier integral in time around $t = \tau$.
Therefore, as $\tau$ varies over the given time interval, g sweeps through the signal and
picks up the frequency information around each point in time, as shown in the top picture of
Figure 1. This makes it possible to investigate both the time and the frequency information
in the signal. However, analyzing the time and frequency domains simultaneously comes at a
price in accuracy.
Figure 2 illustrates the principle behind the Windowed Fourier transform technique. A pure
time series analysis gives excellent resolution in the time domain but no resolution in the
frequency domain. A pure frequency analysis gives excellent resolution in the frequency
domain but, in return, no resolution in the time domain. By introducing the time
translational kernel, the Windowed Fourier transform achieves moderate resolution in both
domains by trading resolution in one for resolution in the other. That is, if one attempts
to improve the time resolution by decreasing the window size of the kernel, the resolution
in the frequency domain becomes poorer. In contrast, if one attempts to improve the
frequency resolution by increasing the window size, the resolution in the time domain
becomes poorer. Therefore, understanding this principle and selecting a reasonable window
size is crucial in time frequency analysis. A map created by the Windowed Fourier transform
that holds both time and frequency information is called a ‘spectrogram’.
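The trade-off described above can be stated quantitatively. As a sketch in notation that is not used elsewhere in this paper (the symbols $\sigma_t$ and $\sigma_\omega$ are assumptions introduced here), let $\sigma_t$ be the standard deviation of $|g(t)|^2$ in time and $\sigma_\omega$ the standard deviation of the squared magnitude of its Fourier transform in angular frequency. Then

```latex
\sigma_t \, \sigma_\omega \;\ge\; \frac{1}{2}
```

with equality attained by a Gaussian window. Shrinking the window in time by some factor therefore widens its spectrum by at least the same factor.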
Figure 1. This figure describes how the Windowed Fourier transform is performed. The top
picture shows the overlap between the translational kernel function (red) and the signal.
The middle picture shows the filtered signal at the given timestamp. The bottom picture
shows the FFT of the filtered signal.
Many different types of functions can be used as the translational kernel in
Windowed Fourier transform. In this paper, three special functions will be discussed:
Gaussian, Mexican Hat and Shannon function.
Gaussian wavelet
The Gaussian function is the most commonly used wavelet in time frequency analysis. Its
equation follows
(4)
Here, the constant ‘a’ determines the width of the window and ‘τ’ determines the center
location of the function. Hence, this function produces a normal curve whose width is set
by ‘a’ and which is symmetric about τ.
Mexican Hat wavelet
The Mexican Hat function is another type of wavelet. It is similar to the Gaussian but has a
trough on each side of the central peak, resembling a sombrero. It is defined as
(5)
Here the window size parameter sets the width of the hat and the time translational
parameter τ sets its center.
Shannon wavelet
The Shannon function is essentially a step function that only has two discrete values
throughout the domain. The function is defined as
Figure 2. The left picture shows how signal is sampled in time series domain. The center picture shows
the sampling in Frequency domain. The right picture shows the sampling in both time and frequency
domains in Windowed Fourier transform technique.
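For reference, the three kernels described above are commonly written in forms like the following. This is a sketch under assumed notation, not necessarily the exact expressions behind equations (4) and (5): the sign convention for a (taken negative here, to match the window sizes quoted in the results), the symbols $\sigma$ and $w$, and the omission of normalization constants are all assumptions.

```latex
\begin{align*}
g_{\text{Gaussian}}(t) &= e^{\,a\,(t-\tau)^2}, \qquad a < 0 \\
\psi_{\text{Mexican Hat}}(t) &= \left(1 - \frac{(t-\tau)^2}{\sigma^2}\right)
    e^{-\frac{(t-\tau)^2}{2\sigma^2}} \\
g_{\text{Shannon}}(t) &=
    \begin{cases}
        1, & |t-\tau| \le w/2 \\
        0, & \text{otherwise}
    \end{cases}
\end{align*}
```

In each case $\tau$ moves the kernel along the time axis, while a, $\sigma$ or $w$ controls how much of the signal the kernel covers.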
Algorithm implementation/development
The MATLAB implementation follows the sequence of steps below. By following this general
procedure, one can produce spectrograms for both Handel’s ‘Messiah’ and ‘Mary had a little
lamb’ played on piano and recorder.
1. Construct the time domain (linspace) and frequency domain that span the sound sample.
First, construct the framework on which all of the time frequency analysis will be
based. Both a time domain and a frequency domain are needed to create a
spectrogram. For example, since the portion of ‘Messiah’ to be analyzed is about 9 seconds
long at 8192 samples per second, create a linspace with L=9 and n=73112. Notice that n is
not a power of 2 in this case. It is generally a good idea to use a number of modes that is
a power of 2, but the FFT still does the job when n is not a power of 2, at the price of
decreased efficiency. After creating the time domain, define the frequency domain k and
rescale it by 2pi/L, since the FFT algorithm assumes 2pi periodic signals. Don’t forget to
fftshift the wave number k so that the plot comes out correctly.
2. Load the sound file.
Once both the time and frequency domains have been defined, load the music sample
(in this case, ‘Handel’) that will be analyzed. The sample and the time vector must have
the same orientation, so transpose the sample vector so that the dimensions match when the
sample is multiplied by the translational kernel function. Also, the sample has been
divided by 2 to scale it to the right size for filtering.
Figure 3. The translational kernel functions used in this paper. From the top: the Gaussian
wavelet, the Mexican Hat wavelet and the Shannon wavelet.
3. Filter the sound signal (Optional)
If the original sound sample is too noisy (for example, the signal has a series
of overtones and noise around the signature frequencies), then filtering the signal prior
to sampling can help produce a cleaner spectrogram. Depending on the ultimate
goal of the time frequency analysis, different types of filter can be applied. In this
paper, a low pass filter has been applied to clean up the overtones and obtain a better
music score. Procedures for designing a filter will not be discussed in this paper, but one
can easily filter a signal by using one of MATLAB’s built in filters.
4. Define the sampling rate.
Before the signal can be analyzed, one should define how often the signal will be
sampled, that is, where the window will be centered. First, create an empty matrix in which
all the time frequency information will be stored during the loop. Next, define the sampling
points by creating a row vector with the desired increment. In this case, the starting
point will be 0 and the end point will be 9. To begin with, a 0.1 second increment is a good
choice since it samples the signal 91 times, which is a reasonable number. However, this
value will be changed when we explore the ideas of over sampling and under sampling.
5. Define the time translational kernel function.
Now that the signal is ready to be analyzed, create a ‘for’ loop that carries out the
short time Fourier transform. The loop uses the row vector defined in step 4 as the
collection of timestamps at which the kernel will be centered. Once the loop
parameter is set, define a translational kernel function. As mentioned earlier, this
function can be chosen freely, but in this paper Gaussian, Mexican Hat and Shannon
functions were used. See the ‘Theoretical Background’ section for the mathematical
descriptions of these functions. Make sure the kernel includes both the translational
parameter τ and a window width parameter.
6. Implement Windowed Fourier transform
Once the function is defined, multiply the signal by the function at each sampling
point. Simply define another vector that holds the product of the signal and the kernel.
Then create a vector that holds the Fourier transform of the result. Recall that an empty
matrix was defined in step 4 to store all the time frequency information.
Fill this matrix with the absolute value of the transformed data, with fftshift applied.
Each pass of the loop stores its time frequency information in one row of this matrix.
By the end of the loop, the matrix should have dimensions 91 x 73112, which is
(number of sampling points) x (number of samples in the signal). The loop can end at this
point, since this matrix holds all the information needed for creating a spectrogram.
7. Create a spectrogram.
Once the time frequency matrix has been created, one can use it to create a spectrogram.
Make sure to rescale the frequency domain by dividing it by 2pi. This is because, for sound
analysis, the wave number that was originally defined in terms of angular frequency must be
converted into Hz, which describes the sound frequency. Set an appropriate frequency range
to analyze different portions of the sound range.
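The seven steps above can be sketched in a few lines of MATLAB. This is a minimal illustration rather than the code used for the figures: it replaces the sound file with a hypothetical two-tone test signal so that it runs without any data, it skips the optional filtering step, and the names Fs, tslide, a and vgt_spec, as well as the kernel form exp(a*(t - tau).^2) with a negative a, are assumptions.

```matlab
% Steps 1-2: time domain, frequency domain and a stand-in signal.
% The two-tone signal is a hypothetical substitute for the sound sample:
% 200 Hz during the first second, 400 Hz during the second.
Fs = 2048;                          % samples per second (assumed)
L  = 2;                             % signal length in seconds
n  = L*Fs;                          % number of samples (even)
t  = (0:n-1)/Fs;                    % time vector (row)
v  = sin(2*pi*200*t).*(t < 1) + sin(2*pi*400*t).*(t >= 1);

k  = (2*pi/L)*[0:n/2-1, -n/2:-1];   % wave numbers in FFT order
ks = fftshift(k);                   % reordered from negative to positive

% Step 4: window centres and an empty matrix for the results.
tslide   = 0:0.1:L;
vgt_spec = zeros(length(tslide), n);

% Steps 5-6: slide a Gaussian kernel along the signal and transform.
a = -15;                            % width parameter (assumed sign convention)
for j = 1:length(tslide)
    g   = exp(a*(t - tslide(j)).^2);     % kernel centred at tslide(j)
    vg  = g.*v;                          % windowed signal
    vgt = fft(vg);                       % its Fourier transform
    vgt_spec(j,:) = fftshift(abs(vgt));  % one row per window position
end

% Step 7: spectrogram with the frequency axis converted to Hz.
pcolor(tslide, ks/(2*pi), vgt_spec.'), shading interp
set(gca, 'Ylim', [0 600])
xlabel('Time (s)'), ylabel('Frequency (Hz)')
colormap(hot)
```

To analyze a real clip, v would be the loaded sample and L and n would take the values chosen in step 1. The construction of k shown here assumes that n is even.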
Computation results/Analysis
This section will be divided into two parts: analysis of Handel’s ‘Messiah’ and
analysis of ‘Mary had a little lamb’ piece played by piano and recorder.
Handel’s Messiah
Spectrogram analysis
After following the procedures in the previous section, one obtains the following
spectrogram of the piece. The spectrogram used the Gaussian wavelet with window size -15
and a sampling increment of 0.1 seconds. Notice that the frequency ranges from about 250Hz
to 4000Hz, and that overtones are also visible within the piece. Overtones are related to
the ‘timbre’ of the instrument: when one plays a certain note at frequency x, the
instrument will also generate overtones at 2x, 3x, 4x and so forth.
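The overtone pattern is simple arithmetic, and it also shows why nothing appears above roughly 4000Hz: at 8192 samples per second, frequencies above half the sampling rate (4096Hz) cannot be represented. A small sketch, in which the fundamental f0 is a hypothetical value and not a note taken from the piece:

```matlab
Fs = 8192;                               % sampling rate of the clip (samples/s)
f0 = 440;                                % hypothetical fundamental in Hz
harmonics = f0*(1:10);                   % fundamental and overtones 2x ... 10x
visible   = harmonics(harmonics < Fs/2); % those below the Nyquist frequency
disp(visible)
```

For this choice of f0, the fundamental and its overtones up to 9x fall inside the frequency range of the spectrogram.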
Figure 4. The spectrograms of Handel’s Messiah piece using the Gaussian kernel. One can see the
existence of overtones by closely inspecting the spectrogram.
Window size investigation
One can also use the spectrogram to investigate the effect of modifying the window size of
the kernel. Figure 5 demonstrates the ‘uncertainty principle’ of the Windowed Fourier
transform technique when it comes to attaining resolution in both the time and frequency
domains. The left figure was obtained by setting the window size of the Gaussian wavelet to
-5. Notice that it has good frequency resolution but poor time resolution. The right figure
was obtained by setting the window size of the Gaussian wavelet to -25. In this figure,
excellent resolution is achieved in time, but the resolution in frequency is relatively
poor. By experimenting with different window sizes, one should aim to pick the window
size that gives reasonable resolution in both time and frequency.
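The trade-off between the three window sizes can be estimated before producing any spectrogram. The sketch below assumes the Gaussian kernel has the form exp(a*(t - tau)^2) with negative a, which is an assumption about the notation, and computes the full width at half maximum of the window in time and of its Fourier transform in Hz.

```matlab
a = [-5 -15 -25];                      % window size parameters from the text
fwhm_t  = 2*sqrt(log(2)./abs(a));      % width of the window in seconds
fwhm_f  = (2/pi)*sqrt(abs(a)*log(2));  % width of its spectrum in Hz
product = fwhm_t.*fwhm_f;              % equals 4*log(2)/pi for every a
disp([a; fwhm_t; fwhm_f; product].')   % one row per window size
```

Under this assumption the product of the two widths is the same for every a, so any gain in time resolution is paid for with an equal loss in frequency resolution.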
Over sampling and under sampling
While the window size can be modified by varying the window size parameter ‘a’, the
rate of sampling can be modified by varying the increment of the translational parameter
‘τ’. Figure 6 shows the effects of over sampling and under sampling on the spectrogram. The
left figure was produced by setting the sampling increment to 0.01, which corresponds to a
total of 901 samplings within the signal. The right figure was produced by setting the
sampling increment to 1, which corresponds to only 10 samplings within the signal. Notice
from the left picture that when the window size is kept constant and the signal is over
sampled, the spectrogram is well resolved in both the time and frequency domains. But when
the signal is under sampled, the result is poorly resolved in both domains. However, one
should be aware that the rate of sampling is directly related to the efficiency of the
code. Therefore, even if over sampling produces a high resolution spectrogram, one should
expect the code to run much more slowly than code that under samples. The key idea is to
find a sampling rate that gives both reasonable efficiency and a spectrogram of reasonable
quality.
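The sampling counts quoted above, and the way the work grows with them, can be checked with a short calculation. This is a sketch: the variable names are assumptions, and the cost is measured only as the number of samples passed through the FFT, since each window position requires one transform of the full signal.

```matlab
L  = 9;                      % clip length in seconds
n  = 73112;                  % samples per FFT (from step 1)
dt = [0.01 0.1 1];           % sampling increments compared in the text
counts = round(L./dt) + 1;   % number of window positions from 0 to L
work   = counts*n;           % samples transformed over the whole loop
disp([dt; counts; work].')
```

Going from an increment of 0.1 to 0.01 multiplies the number of transforms, and the size of the stored matrix, by roughly ten.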
Figure 5. The spectrograms of Handel’s Messiah piece using the large window size (left) and
the small window size (right). Notice that the large window size gives great frequency
resolution but misses out on the time resolution. In contrast, the small window size gives
excellent time resolution but poor frequency resolution.
Other types of translational kernel: Mexican Hat and Shannon wavelets
By defining different types of function as the translational kernel, one can explore the
spectrograms produced by different types of wavelet. In this paper, Mexican Hat and Shannon
wavelets have been applied to the signal. Figures 7 and 8 show the spectrograms produced by
the Mexican Hat wavelet and by the Shannon wavelet. Both wavelets were scaled so that they
have a window size of about 1 second. The sampling increment was kept at 0.1 second. Both
wavelets produce spectrograms similar to the one generated using the Gaussian, but they
differ in resolution. The Shannon window picks up more information in the frequency domain
than the Gaussian does since, unlike the Gaussian, which weights the signal most heavily at
the center of the window, the Shannon window weights the signal equally throughout the
window. A similar principle seems to apply to the Mexican Hat wavelet. By adding a trough
on each side of the Gaussian wavelet, it picks up more frequency information at each
sampling than the Gaussian does.
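The three kernels can be compared directly by evaluating them on the same time axis. The sketch below uses commonly quoted forms of each function. The parameter names sigma and w, their values, the sign convention for a and the time spacing are assumptions chosen only to give windows of roughly one second, and are not the values used for the figures.

```matlab
t     = 0:0.001:9;        % time axis in seconds (assumed spacing)
tau   = 4.5;              % window centre
a     = -15;              % Gaussian width parameter (assumed sign convention)
sigma = 0.25;             % Mexican Hat width parameter (assumed)
w     = 1;                % Shannon window length in seconds

gauss   = exp(a*(t - tau).^2);
mexhat  = (1 - ((t - tau)/sigma).^2).*exp(-(t - tau).^2/(2*sigma^2));
shannon = double(abs(t - tau) <= w/2);   % 1 inside the window, 0 outside

subplot(3,1,1), plot(t, gauss),   title('Gaussian')
subplot(3,1,2), plot(t, mexhat),  title('Mexican Hat')
subplot(3,1,3), plot(t, shannon), title('Shannon')
xlabel('Time (s)')
```

Swapping one of these vectors in for the kernel inside the sliding loop is all that is needed to change the type of window.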
Figure 6. The spectrograms of Handel’s Messiah piece by over sampling the piece (left) and under
sampling the piece (right). Notice that when window size is kept constant, over sampling results in
great resolutions in both domains while under sampling produces poor resolutions in both time and
frequency.
Figure 7. The spectrogram of Handel’s Messiah piece using the Mexican Hat wavelet
Mary Had a Little Lamb
Filtering the signal
To obtain a clean music score from the spectrograms, it is important to get rid of the
overtones beforehand. This can be done by using the built in MATLAB low pass filter.
Figure 9 shows the comparison of the unfiltered signal and the filtered signal.
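The built in filter used here is not described further, so the following sketch uses a different and simpler technique that needs no toolbox: an ideal low pass mask applied in the frequency domain. The test signal, the cutoff value and all variable names are hypothetical.

```matlab
Fs = 8192;                                    % samples per second
n  = Fs;                                      % one second of signal
t  = (0:n-1)/Fs;
v  = sin(2*pi*300*t) + 0.5*sin(2*pi*1500*t);  % a note plus a stand-in overtone

f      = (Fs/n)*[0:n/2-1, -n/2:-1];           % frequencies in FFT order (Hz)
cutoff = 600;                                 % hypothetical cutoff in Hz
mask   = abs(f) <= cutoff;                    % keep only the low frequencies
vf     = real(ifft(fft(v).*mask));            % filtered signal

plot(t, v, t, vf), xlim([0 0.02])
legend('unfiltered', 'filtered'), xlabel('Time (s)')
```

A sharp mask like this removes everything above the cutoff, but on real recordings it can introduce ringing. A smoother mask, such as a Gaussian in frequency, is a common alternative.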
Figure 8. The spectrogram of Handel’s Messiah piece using the Shannon wavelet
Figure 9. The comparison of unfiltered and filtered signals of piano (left) and recorder
(right). One can notice that the overall amplitude of the frequencies is reduced after
applying the low pass filter.
Reproduction of the music scores
After filtering the initial signals to remove the overtones, one can produce
spectrograms for both the piano and the recorder sample by following procedures similar to
those used for Handel’s Messiah. The right picture of figure 9 shows the spectrogram of the
piece played on piano and the left picture shows the spectrogram of the piece played on
recorder. Using these spectrograms, one can reproduce the music score for both instruments
by converting the center frequency of each note into the corresponding musical note.
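Converting a center frequency into a note name can be done with the standard equal temperament relation, in which each semitone is a factor of 2^(1/12) and A4 is tuned to 440Hz. The sketch below assumes that tuning. The frequencies in freqs are the textbook values for E4, D4 and C4, used as hypothetical inputs and not as readings from the spectrograms.

```matlab
freqs  = [329.63 293.66 261.63];   % hypothetical centre frequencies in Hz
names  = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
labels = cell(size(freqs));
for i = 1:numel(freqs)
    midi   = round(69 + 12*log2(freqs(i)/440));  % nearest semitone, A4 = 69
    octave = floor(midi/12) - 1;                 % octave number of that note
    labels{i} = sprintf('%s%d', names{mod(midi, 12) + 1}, octave);
end
disp(labels)
```

Rounding to the nearest semitone absorbs small errors in the measured center frequency, which matters because the frequency resolution of the spectrogram is limited by the window size.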
The music score of each instrument reconstructed from the information in