Chapter 3. The Solution
Section 3.1. A Hybrid ASK/FSK Approach
Section 3.1.1. Keying on Sinusoids
After considering various approaches, including options based on audio frequency
spread spectrum and differential phase shift keying, we knew a technique was required
that did not depend on the preservation of exact timing, phase, frequency, or amplitude.
Eventually these requirements led us to develop a hybrid scheme similar to amplitude-
shift keying (ASK) and frequency-shift keying (FSK) [14]. The digital signature is
encoded using 167 sinusoids added to a filtered version of the audio component of the
television signal. A patent disclosure has been filed covering the newly developed
communications method in applications other than interactive television [15].
The 167 sinusoids consist of five groups that perform three distinct functions.
Three groups contain the digital data and the error detection coding bits. Another group
contains the sinusoids used in the “control function” for synchronization and for
weighting estimates of received data. The final group contains a pattern of sinusoids
used to detect and quantify the amount of frequency distortion experienced in the current
signal block. These groups and functions will be described individually in more detail
shortly. But first a succinct general description of the coding scheme will be presented.
In the hybrid method the presence or absence of sinusoids in specific frequency
locations conveys data. Sinusoidal frequencies are chosen to correspond to the bins of a
4096-point FFT with a sampling rate of 16.0 kHz. This allows fast and simple detection
and decoding on the receiving end, where all computations must be performed
continuously and in real time. With a 16.0 kHz sampling rate, a 4096 sample block of
signal has a duration of 256 ms. After a block has been processed, the oldest 205
samples are shifted out and 205 samples of new signal are shifted in on the other end (i.e.,
the signal blocks are overlapped by 95 percent). The refreshed block is then processed,
and the cycle is repeated. All processing of a refreshed block is therefore performed
within 12.8125 ms.
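The block timing just described can be sketched with a few lines of arithmetic (a minimal illustration; the constant names are ours, not from the AudioLink implementation):

```python
# Block-processing parameters from the text.
FS = 16_000        # sampling rate, Hz
N_FFT = 4096       # samples per FFT block
SHIFT = 205        # new samples shifted in per refreshed block

block_duration_ms = 1000.0 * N_FFT / FS            # duration of one block
hop_duration_ms = 1000.0 * SHIFT / FS              # time budget per refresh
overlap_percent = 100.0 * (N_FFT - SHIFT) / N_FFT  # block overlap

print(block_duration_ms, hop_duration_ms, round(overlap_percent, 1))
# -> 256.0 12.8125 95.0
```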
Since multiple sinusoids will be packed into a relatively small frequency range,
windowing should be applied at the receiver to minimize leakage interference between
the sinusoids. If no window (i.e., a rectangular or “boxcar” window) is applied to the
data at the receiver, each sinusoid will show up as a sinc function in the frequency
domain. Each sinc function will have a main lobe width of almost two FFT bins, a first
sidelobe level thirteen dB below the main lobe peak, and sidelobes spaced one FFT bin
apart that roll off at a rate of six dB per octave. Table 3.1 below shows the peak sidelobe
levels and the rolloff rates for several commonly used windows [16]. In the table M is
the number of data points available; here M would be 4096.
Table 3.1 Window Characteristics

Window Type    Peak Sidelobe Amplitude (Relative, dB)    Approximate Main Lobe Width    Rolloff Rate (dB / Octave)
Rectangular        -13         4π / M          6
Bartlett           -25         8π / (M-1)     12
Hanning            -31         8π / (M-1)     18
Hamming            -41         8π / (M-1)      6
Blackman           -57        12π / (M-1)     18
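One row of Table 3.1 can be spot-checked numerically by sampling a window's spectrum on a dense frequency grid. The sketch below does this for the Hanning window, using a much shorter length than the M = 4096 of the actual system (purely illustrative; all names are ours):

```python
import cmath
import math

M = 64       # window length (illustrative; the system uses M = 4096)
GRID = 4096  # dense frequency grid for locating sidelobe peaks

# Hanning window, zero at both endpoints.
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]

# Magnitude of the window's spectrum over half the frequency axis.
spec = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / GRID)
               for n in range(M)))
        for k in range(GRID // 2)]

peak = max(spec)
i = spec.index(peak)
while i + 1 < len(spec) and spec[i + 1] < spec[i]:
    i += 1  # walk down the main lobe to its first null
sidelobe_db = 20 * math.log10(max(spec[i:]) / peak)
print(round(sidelobe_db))  # close to the -31 dB listed for Hanning
```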
The on or off (binary one or zero) condition of the sinusoid is determined by
comparing the magnitude of the FFT at a signal bin with the magnitudes of its neighbor
bins some distance away to the left and right. If the magnitude at the signal bin exceeds
both its reference neighbors’ magnitudes by more than a pre-set threshold, a sinusoid is
considered to be present and a digital one is detected. If the condition is not satisfied, the
sinusoid is considered to be absent and a digital zero is received. Figure 3.1 illustrates
the sinusoid detection process. Note that although the sinusoids are chosen to nominally
correspond to the bin locations of the FFT, they will be moved in frequency by the tape
wow and flutter. Thus the sample points in the frequency domain will in general not line
up perfectly with the main lobe peaks and the zero-crossings of the sincs. Such a shifted
case is represented in the figure.
Figure 3.1 Sinusoid Detection Process (candidate sinusoid bin, left and right neighbor test bins, and the sinusoid detection threshold)
The spacing of the sinusoids, the distance to the neighbor reference bins, and the
detection threshold must all be set based on the window applied at the receiver. The
sinusoids must be spaced far enough apart so that their mainlobes are distinct and are able
to decay enough before the next mainlobe begins. Similarly, since the mainlobe widths
vary with the window applied, the distance to the reference neighbors should be changed
accordingly. If a rectangular window is used in a clean signal environment, the neighbor
bins could be chosen as close as a single bin away, since the sinc has its first zero
crossing a single FFT bin away from the peak of the main lobe. However, since the
sidelobes are relatively high and roll off very slowly for the rectangular window, the
sinusoids would have to be spaced far apart to ward off interference between sinusoids.
Also, for any window choice, the frequency variation due to wow and flutter must be
considered. Although the sinusoids are chosen to nominally fall on the FFT bin
frequencies, for any 4096-sample block of data the wow and flutter will cause a change in
frequency for all sinusoids. Once a sinusoid moves from its nominal location, it is
unlikely that it will be centered on an FFT bin. Instead, as mentioned above and depicted
in Figure 3.1, the FFT bin “samples” can fall anywhere on the main and side lobes.
Therefore, the reference neighbors should be spaced at least two bins away from the
center signal bin, even in the rectangular window case.
Figure 3.2 below shows how a 32-point FFT with a rectangular data window
would sample the spectrum of a complex sinusoid located at a normalized frequency of
0.25. The dotted lines show the underlying sinc function (the true spectrum), and the
circles represent the samples that are computed by the FFT. Figure 3.3 shows where the
FFT sample points fall if the sinusoid moves slightly, to a normalized frequency of
0.2656. Note that there is no sample point at the peak of the main lobe for the shifted
case, and the remaining sample points occur near the peaks of the sidelobes instead of at
the zero-crossings of the sinc. In the un-shifted case there is over 250 dB separation
between the main lobe peak sample and the other samples at the zero-crossings.
(Theoretically there is infinite separation; the values below –250 dB show up here
because of finite word-length effects in the computations.) For the case with the shifted
sinusoid, the separation has been reduced to between ten and thirty dB.
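The on-bin and shifted cases above are easy to reproduce with a small DFT. The sketch below (pure Python; names are ours) builds the 32-point spectra of Figures 3.2 and 3.3 and measures the peak-to-neighbor separation for the shifted sinusoid:

```python
import cmath
import math

N = 32  # FFT length used in Figures 3.2 and 3.3

def dft_mags(freq):
    """DFT magnitudes of a complex sinusoid at normalized frequency `freq`."""
    x = [cmath.exp(2j * math.pi * freq * n) for n in range(N)]
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]

on_bin = dft_mags(0.25)     # exactly on bin 8 (0.25 * 32)
shifted = dft_mags(0.2656)  # about half a bin higher (0.2656 * 32 = 8.4992)

# On-bin: all energy lands in bin 8; every other sample sits at a sinc zero.
assert on_bin.index(max(on_bin)) == 8
assert all(m < 1e-9 * max(on_bin) for k, m in enumerate(on_bin) if k != 8)

# Shifted: samples fall on the lobes instead, and the separation between the
# largest sample and a bin two away from it shrinks dramatically.
sep_db = 20 * math.log10(max(shifted) / shifted[6])
print(round(sep_db, 1))  # about 14 dB, within the 10-30 dB range noted above
```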
Figure 3.2 FFT of Sinusoid, Rectangular Window
Figure 3.3 FFT of Shifted Sinusoid, Rectangular Window
If a Hanning window is applied to the data before the FFT is computed, the main
lobe width will increase by slightly more than a factor of two (see Figure 3.4). The peak
sidelobe level will be reduced, and the sidelobe rolloff rate will increase. Figure 3.5
shows the Hanning window result for the shifted sinusoid case. A Blackman window
furthers the effects on the main lobe width, the peak sidelobe level, and the sidelobe
rolloff rates. Results for the Blackman window applied to the two sinusoids are shown in
Figure 3.6 and Figure 3.7. Note that for the windowed cases (as opposed to the
rectangular or “no-window” case), the sample points will not all fall on the zero-crossings
regardless of where the sinusoid is located. However, since the sidelobes are
significantly reduced, the separation in value between the “peak” sample and its
neighbors a few bins away is sufficient for our purposes, and will allow proper detection
of sinusoids.
Figure 3.4 FFT of Sinusoid, Hanning Window
Figure 3.5 FFT of Shifted Sinusoid, Hanning Window
Figure 3.6 FFT of Sinusoid, Blackman Window
Figure 3.7 FFT of Shifted Sinusoid, Blackman Window
After testing with various window options (see Section 4.2 later in the thesis), the
Hanning window was chosen as a compromise between mainlobe width and sidelobe
reduction. The reference neighbors were then defined as the bins two away to either side
of a signal bin, and the detection threshold was set to four dB. In a clean environment the
four dB threshold is easily surpassed, and could be set much higher. Figure 3.8 depicts
an actual case of the sinusoidal detection process using the FFT magnitude spectrum of
an audio signal containing an embedded code. The asterisks mark the candidate FFT
bins, and the circles mark the bins two away on either side of the centers. The bit
sequence [1 0 1 0 0 1 1 1] is represented in the example. Note that in this particular
example the signal-to-noise-ratio was relatively high, and so the sinusoid peaks are
clearly evident. However, after signal degradation by tape wow and flutter and by
transmission through various channels, the sinusoidal peak bins are often only about four
or five dB above their reference neighbors. (Of course at times they can also be
completely attenuated or overwhelmed by interference, and their detection is impossible.)
The four dB value for the threshold was obtained after testing in various signal
environments. If set lower, noise variance easily satisfies the detection criteria and false
detections result. If set much higher, actual sinusoids are unable to satisfy the condition
in low SNR environments. The four dB threshold is a compromise between the two
extremes that seems to work well under a wide range of signal conditions.
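With the Hanning window, two-bin reference neighbors, and the four dB threshold in place, the detection rule reduces to a pair of comparisons. A minimal sketch (function and variable names are ours, not the AudioLink's):

```python
NEIGHBOR_OFFSET = 2  # reference bins two away on either side
THRESHOLD_DB = 4.0   # detection threshold chosen in the text

def detect_bit(mags, k):
    """Return 1 if a sinusoid is judged present at FFT magnitude bin k.

    The bin wins only if it exceeds BOTH reference neighbors by more than
    the threshold, per the process illustrated in Figure 3.1.
    """
    margin = 10 ** (THRESHOLD_DB / 20)  # 4 dB as an amplitude ratio
    return int(mags[k] > margin * mags[k - NEIGHBOR_OFFSET]
               and mags[k] > margin * mags[k + NEIGHBOR_OFFSET])

# Toy magnitude spectrum: a clear peak at bin 4, nothing at bin 10.
mags = [1.0] * 16
mags[4] = 8.0
print(detect_bit(mags, 4), detect_bit(mags, 10))  # -> 1 0
```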
Figure 3.8 Sinusoid Detection by FFT Magnitude Results
Section 3.1.2. Control Function
As mentioned in the previous sub-section, one group of sinusoids is used in the
“control function” for synchronization and weighting of data estimates. Figure 3.9 below
demonstrates self-synchronization with the control function. As the FFTs are performed
on refreshed blocks of signal, the 21 control sinusoid locations are examined. If
sinusoids are detected at fourteen or more of these locations (a two-thirds majority), valid
data is assumed to be present in that FFT block, and the data decoding process is initiated
by polling the data sinusoid locations. In the figure the asterisks indicate when a two-
thirds majority of control sinusoids is present, and hence when the digital data is
available.
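The synchronization test is a simple majority count over the 21 control locations in each block. A sketch of that rule (names and the example sequence are ours):

```python
MAJORITY = 14  # two-thirds of the 21 control sinusoid locations

def control_value(control_bits):
    """Number of control sinusoids detected in one FFT block (0..21)."""
    return sum(control_bits)

def data_valid(control_bits):
    """True when valid data is assumed present, triggering data decoding."""
    return control_value(control_bits) >= MAJORITY

# Hypothetical run of blocks: the code becomes detectable mid-stream.
blocks = [[0] * 21, [1] * 10 + [0] * 11, [1] * 15 + [0] * 6, [1] * 21]
print([data_valid(b) for b in blocks])  # -> [False, False, True, True]
```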
Figure 3.9 Control Function for Synchronization
The control sinusoids are uniformly interspersed with the data sinusoids
throughout the entire four kHz band being used (from 2.4 to 6.4 kHz). Thus the control
sinusoids not only serve a synchronization purpose, they also provide an indication of the
quality of the received data in that particular signal block. Since a triplication code is
used for the data (to be discussed in Section 3.1.3), a two-thirds majority requirement for
the control function is commensurate. Furthermore, when the data is being tabulated
over successive FFT blocks, the results of each block are weighted according to the value
of the control function in that block. For example, we have more confidence in the data
when the control function is 21/21 versus when it is 14/21. The data associated with such
blocks should be weighted accordingly. Thus digital ones are represented by positive
control function values and digital zeros by negative control function values.
Section 3.1.3. Data Subbands and Triplication Codes
Since the audio transmission channel is noisy and corruptive, the error rate for
individual sinusoidal detection is quite large. Tape media not only distort the signal with
wow and flutter, but also add noise. The acoustic paths from the television speakers to
the AudioLink microphone introduce multipath interference as well as additional noise.
Extraneous room noises such as human voices or music can also compete with the
desired audio signal upon reception at the AudioLink. When these multiple corruptions
are combined in the received audio signal, the resulting signal to noise ratio is often quite
low, and therefore the detection rate for a given sinusoid is likewise often low.
Most communication systems employ coding schemes to remove inefficient
redundancy in a signal source, so that fewer bits and less bandwidth are necessary for
transmission. However, efficient redundancy is usually inserted so that transmission can
be successful despite some percentage of individual bit errors [17], [13]. If the bit error
rate is relatively small for a channel, one of many efficient error correction codes can be
employed to correct bit errors at the receiver. However, as the rate of individual bit
errors rises and/or if they occur in large bursts, more redundancy is necessary, and many
of the commonly used schemes cannot effectively handle the situation. Such is the case
in our application. The individual bit error rate is often on the order of 33 percent for a
given signal block. (Note that if it rises much above 33 percent, the data block is
discarded due to the control function, which would presumably have a similar reception
statistic.) Furthermore, bit errors commonly occur in bursts since many signal
corruptions are frequency dependent and destroy a succession of data bits.
In an (n,k) block code with error-correcting capability t, a code word n bits long is
used to transmit k data bits (n > k). The number of check digits in a block code is
m = n - k, and the number of bit errors it can correct is given by t. Different (n,k) block codes will be capable of
correcting different numbers of bit errors (t), in general. But the error correcting
capability of all block codes (linear and non-linear) is limited by the Hamming bound
[18]:
\[
2^{m} = 2^{\,n-k} \;\ge\; \sum_{j=0}^{t} \binom{n}{j}
\tag{3.1}
\]
where \(\binom{n}{j}\) is the binomial coefficient
\[
\binom{n}{j} = \frac{n!}{j!\,(n-j)!}
\tag{3.2}
\]
Note that linear codes, by definition, are those that form vector subspaces over finite
Galois fields [19].
Because of the frequency range available for our transmission, a maximum of 141
data sinusoids can be accommodated. The IVDS system requires 35 signature bits plus
twelve error detecting bits (to be described shortly), for a total of 47 data bits. Therefore,
in our application n = 141, k = 47, and m = n – k = 94. Using a computer to solve for the
maximum number of bit errors that can be corrected, we obtain t_max = 25 bits. It should
be noted that the Hamming bound is a necessary but not a sufficient condition, and it may
not be possible to construct codes with certain parameters even if the bound is satisfied.
Given the high error rate of our channel, we can often expect more than 25 bit errors out
of the 141 available bits. This bound shows that no block code can perfectly handle our
situation.
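The t_max = 25 figure quoted above is straightforward to reproduce: accumulate binomial coefficients until the budget given by the Hamming bound of Eq. (3.1) is exhausted. A sketch (function name ours):

```python
import math

def hamming_bound_tmax(n, k):
    """Largest t with 2**(n-k) >= sum_{j=0}^{t} C(n, j), per Eq. (3.1)."""
    budget = 2 ** (n - k)
    total, t = 0, -1
    for j in range(n + 1):
        total += math.comb(n, j)
        if total > budget:
            break
        t = j
    return t

t_max = hamming_bound_tmax(141, 47)
print(t_max)  # -> 25, as quoted in the text
```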
Since many signal corruptions encountered in our application are frequency
dependent and destroy a succession of data bits, a triplication code has a chance of
succeeding where other block codes would fail. The 141 sinusoids are divided into three
groups: one in the lower frequency region, one in the upper frequency region, and one in
the middle. The 47 data bits are then coded into each of the three regions. Since signal
degradation most often occurs within a given frequency band for our channel, it is likely
that other bands will be preserved well enough for successful transmission. In an
extreme case an entire group of 47 bits can be destroyed and the transmission can still be
successful due to the triplication code. An example would be where human voice
interference corrupts a large part of the lower frequency region. As long as the
corresponding replicated bits in the middle and high frequency regions are received
correctly, the triplication code will yield the correct result by assigning a bit value based
on a two-thirds majority.
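The triplication scheme itself amounts to repeating the data bits in three subbands and majority-voting on reception. A minimal sketch (names ours), using a short bit vector in place of the real 47 bits:

```python
def triplicate(bits):
    """Repeat the data bits into three frequency subbands (low, mid, high)."""
    return bits * 3  # 47 bits -> 141 bits in the real system

def decode_triplication(received):
    """Majority-vote each data bit across the three subbands."""
    k = len(received) // 3
    low, mid, high = received[:k], received[k:2 * k], received[2 * k:]
    return [int(a + b + c >= 2) for a, b, c in zip(low, mid, high)]

data = [1, 0, 1, 1, 0]
code = triplicate(data)
# Destroy the entire low-frequency subband, as voice interference might:
corrupted = [b ^ 1 for b in code[:len(data)]] + code[len(data):]
print(decode_triplication(corrupted) == data)  # -> True
```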
Although block codes cannot handle the large number of errors we would
encounter if they occurred randomly, several coding techniques have been devised for the
burst error cases. One non-binary possibility is Reed-Solomon (RS) codes [19]. RS
codes, however, would be too taxing on the IVDS hardware in terms of memory and
computational requirements, and so they could not be used. However, BCH codes [19]
(named for their inventors, Bose, Chaudhuri, and Hocquenghem) could be used in
groups, with the bits interleaved in the frequency domain [19]. BCH codes are
considered among the most powerful linear codes because they possess the best
combination of high rate and high error correction for a given block length. One (15,5)
BCH code can correct t = 3 or fewer bit errors [9]. If ten blocks of these codes were used
in our application (fifty data bits enclosed in 150 code bits), and the bits were interleaved
in the frequency domain, an error burst up to thirty bits long could be corrected.
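The burst-spreading property of the interleaving can be checked by counting how many bits of a contiguous burst land in each codeword. With ten codewords interleaved bit-by-bit, a 30-bit burst deposits at most three errors per codeword, while a 31-bit burst must overload one of them (a sketch; names ours):

```python
N_CODES, N_BITS = 10, 15       # ten (15,5) BCH codewords
STREAM_LEN = N_CODES * N_BITS  # 150 transmitted bits

def errors_per_codeword(burst_start, burst_len):
    """Bit-interleaved stream: transmitted bit i belongs to codeword i % 10."""
    counts = [0] * N_CODES
    for i in range(burst_start, burst_start + burst_len):
        counts[i % N_CODES] += 1
    return counts

worst30 = max(max(errors_per_codeword(s, 30)) for s in range(STREAM_LEN - 30))
worst31 = max(max(errors_per_codeword(s, 31)) for s in range(STREAM_LEN - 31))
print(worst30, worst31)  # -> 3 4: t=3 BCH survives any 30-bit burst, not 31
```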
Figure 3.10 below provides a representation of the probability of successful
transmission with the code options discussed. The Hamming bound (the dashed line)
shows that for a (141,47) code, the maximum error correcting capability is 25 bits, after
which any block code will fail. In practice the cutoff is likely to be far to the left of
where it is shown in the figure, since it is probably not possible to create a code that
achieves the performance at the bound. If ten groups of (15,5) t=3 BCH codes were
interleaved in the frequency domain as mentioned above, the performance would be
similar to that shown by the dotted line. Since any group by definition (t=3) can correct
up to three bit errors, successful transmission is guaranteed up to that point. However, if
four errors are present, there is a small but non-zero probability that they will all occur in
the same group, and thus the probability of success drops slightly below unity at four bit
errors. At 31 errors at least one of the ten groups will have at least four errors, and the
code will fail. Therefore the probability of success is zero if more than thirty errors are
present. The triplication code is guaranteed to be able to correct only a single error. At
two errors there is a small but non-zero probability that they will occur for the same data
bit, and the transmission will fail. However, success is still possible until 48 bits are in
error, since at that point at least one data bit will be lost. As the figure shows, the
triplication code provides an extended range of possible success at the expense of a
higher rate of failure for smaller rates of bit errors. It should be noted that the figure is
not to scale, and the exact curves depend strongly on the statistics of the bit errors and on
the bursts in which they occur. These statistics are difficult if not impossible to compute
for our application. Since the AudioLink often operates in the region where many (20-
50) burst errors are present, the triplication code is a reasonable choice.
Figure 3.10 Code Comparisons (probability of successful transmission versus number of bit errors, comparing the Hamming bound for (141,47) t=25 block codes, ten interleaved (15,5) t=3 BCH codes, and the (141,47) triplication code)
Because of deadlines imposed by the project sponsors, the coding method had to
be chosen and implemented in a very short time period. The digital signal processor
(DSP) used in the receiver unit was already heavily burdened, both in memory and in
computational requirements, so complicated or demanding coding methods were not
feasible. Therefore, given the large bit error rate on our channel, the constraints imposed
on hardware implementation, and the lack of time available to investigate alternative
strategies, we chose the simple triplication code.
Section 3.1.4. Bit Voting in Time
Once the detection process is initiated by the control function, valid data received
in the current and subsequent FFT blocks are tabulated. The end of data transmission is
detected by the level of the control function dropping below 14/21 and remaining low for
a specified period of time (say one second). (Because of implementation issues, this
specification was later changed. The AudioLink now listens for approximately 500 ms
after the control function first turns on, and the control function is not used for
determining the end of the data transmission. In later versions of the AudioLink, it may
be desirable to return to the original method of triggering on the descent of the control
function.) When it is determined in this manner that the data transmission is complete,
final decisions are made regarding the individual data bits. Since the bit votes from each
FFT block (weighted by the control function from each block) have been summed over
time, a final decision can be made regarding each bit’s status by a simple threshold test.
If a bit’s value is positive it is considered to be a digital one, and if it is negative it is
considered to be a zero. The triplication code is then decoded by a two-thirds majority
vote among the three frequency subbands. Finally the cyclic-redundancy-check (CRC) is
confirmed to verify error-free reception (to be discussed in Section 3.1.5), and the 35 bit
digital signature results.
Table 3.2 below demonstrates the data decoding process (without the CRC)
through an example. Suppose we desire to transmit a digital signature of two bits [1 0],
and on the decoding end the control function is detected as shown in the table. When the
control function is below fourteen no data is present. When the control function is
fourteen or larger the data bit locations are analyzed to see if sinusoids are present. When
a sinusoid is present, the value of the control function is added to the corresponding data
bit location. Likewise, the lack of a sinusoid represents a digital zero and the value of the
control function is subtracted from the corresponding data bit location. Once the control
function drops below fourteen and stays there, the data collection process terminates.
Any bit locations containing positive values are considered to be digital ones, and any
locations containing negative values are zeros.
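The accumulation just described can be sketched end-to-end for a two-bit signature like the [1 0] example above (a minimal illustration; the names and the particular control values are ours):

```python
MAJORITY = 14  # control-function threshold, out of 21 control sinusoids

def decode_blocks(blocks, n_bits):
    """Accumulate control-weighted bit votes over successive FFT blocks.

    blocks: list of (control_value, detected_bits) pairs, where
    detected_bits[i] is 1 if a sinusoid was seen at data location i.
    """
    votes = [0] * n_bits
    for control, bits in blocks:
        if control < MAJORITY:
            continue  # no valid data assumed in this block
        for i, b in enumerate(bits):
            votes[i] += control if b else -control
    return [int(v > 0) for v in votes]

# Transmitting [1, 0]: one weak block misses the first sinusoid, but the
# control-function weighting outvotes it.
blocks = [
    (10, [0, 0]),  # control below 14: block ignored
    (21, [1, 0]),
    (14, [0, 0]),  # weak block, first sinusoid missed
    (18, [1, 0]),
]
print(decode_blocks(blocks, 2))  # -> [1, 0]
```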
Table 3.2 Example of the Data Decoding Process

FFT Block    Control Function    Bit 1-1    Bit 2-1    Bit 1-2    Bit 2-2    Bit 1-3    Bit 2-3