Mesleh A, Skopin D, Baglikov S et al. Heart rate extraction from
vowel speech signals. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
27(6): 1243-1251 Nov. 2012. DOI 10.1007/s11390-012-1300-6
Heart Rate Extraction from Vowel Speech Signals
Abdelwadood Mesleh1, Dmitriy Skopin1, Sergey Baglikov2, and Anas
Quteishat1
1 Computer Engineering Department, Faculty of Engineering
Technology, Al-Balqa Applied University, Amman, Jordan
2 Help MediCom Group, Kursk, Russia
E-mail: [email protected]; [email protected]; [email protected];
[email protected]
Received November 17, 2011; revised May 14, 2012.
Abstract This paper presents a novel non-contact method for
extracting heart rate from vowel speech signals. The proposed method
is based on modeling the relationship between the production of
vowel speech signals and heart activity in humans: it is observed
that each heartbeat causes a short increment (evolution) of the
vowel speech formants. The short-time Fourier transform (STFT) is
used to detect the formant maximum peaks so as to accurately
estimate the heart rate. Compared with a traditional contact pulse
oximeter, the average accuracy of the proposed non-contact heart
rate extraction method exceeds 95%. The proposed method is expected
to play an important role in modern medical applications.
Keywords electrocardiogram, feature extraction, heart rate,
short-time Fourier transform, vowel speech signal
1 Introduction
It is known that the number of heart patients is growing. This
growth motivates researchers to develop tools that monitor the heart
rate. During athletic activities, it is also desirable to monitor
the heart rate, to achieve optimal results and to ensure personal
safety[1]. From a medical point of view, the measurement of heart
rate varies from investigations of central regulation of autonomic
state, to studies of fundamental links between psychological
processes and physiological functions, to evaluations of cognitive
development and clinical risk[2]. Heart rate is traditionally
measured by detecting arterial pulsation. The electrical activity of
the heart is measured by electrocardiogram (ECG). ECG[1] is an
important non-invasive diagnostic tool for assessing the condition
of the human heart. Each beat is made up of a series of waves:
P-wave, QRS complex, T-wave and occasionally a U-wave (see Fig.1).
The signal morphology and timing are indicative of different
clinical conditions: for example, changes in the ST segment suggest
a poor blood supply to the heart muscle, while multiple P-waves
indicate low cardiac output and often cause clots in the atria.
Recently, many algorithms have been developed to analyze ECG signals
using support vector machines[3], self-organizing maps[4], etc.
Generally speaking, the ECG features can be extracted in the time
domain[5-7] or in the frequency domain[8-9] using many feature
extraction methods such as the discrete wavelet transform[5-6], the
Karhunen-Loève transform[10], Hermitian basis and other
methods[11]. All the mentioned ECG feature extraction methods are
based on ECG signals and are noninvasive contact methods of
recording the variations of the bio-potential signals acquired from
the human skin surface.
Fig.1. Schematic representation of a normal ECG.
Regular Paper
In noninvasive methods, medical experts use electrodes that are
placed on the patient's skin to detect bioelectrical signals such as
ECG signals.
©2012 Springer Science+Business Media, LLC & Science Press, China
It is known that the demand for contactless heart monitoring has
increased lately, especially for long-duration monitoring and for
patients with particular conditions, such as burn victims or infants
at risk of sudden infant death syndrome. In this paper, a novel
non-contact method of heart rate extraction is proposed. The
proposed method is based on modeling the relationship between the
production of vowel speech signals and heart activity in humans. As
human speech signals can be recorded using standard microphones and
transmitted via mobile communication media, the proposed method
opens new fields of application such as heartbeat detection and
monitoring of remotely located patients. A relevant heart rate
detection method[12] is based on two-dimensional (2-D) spectrum
representation; however, it does not provide an automatic
measurement of heart rate parameters. In this work, on the other
hand, the relationship between vowel speech production and heart
activity is modeled to automatically estimate heart rate parameters
after recording a vowel speech signal. Moreover, this work handles
noise and achieves better results.
The rest of this paper is organized as follows. Section 2
introduces the relationship between human vowel speech production
and heart activities. Section 3 describes the heart rate extraction
from vowel speech signals. Experimental results and discussion are
presented in Section 4, and conclusions are given in Section 5.
2 Human Vowel Speech Production and Heart Activities
Human speech signals[13] contain linguistic, expressive, organic and
biological information. The source-filter theory of speech
production considers the human acoustic speech output as the
combination of a source of sound energy (e.g., the larynx) modulated
by a transfer function (filter) determined by the shape of the
supralaryngeal vocal tract. The result of this combination is a
shaped spectrum with broadband energy peaks. The supralaryngeal
vocal tract, which consists of both the oral and nasal airways,
serves as a time-varying acoustic filter that suppresses the passage
of sound energy at certain frequencies and allows its passage at
other frequencies[14]. Formants are those frequencies at which local
energy maxima are sustained by the supralaryngeal vocal tract and
are determined by the overall shape, length and volume of the vocal
tract[15]. Taking into account the fact that the larynx contains
muscles covered by many blood vessels that are connected to the
human circulatory system, it is concluded that human heart rates are
dynamically related to the variations of vocal cord parameters and
directly related to the acoustic properties of human speech. As a
result, it should be possible to detect changes in speech properties
that are related to human heart activity by obtaining the
corresponding frequency characteristics of the vowel speech signal
and the raw ECG data of the same person. Fig.2 shows the time domain
of the vowel speech signal and the corresponding ECG signal after
suppressing P and T waves using a high pass filter with a 40 Hz
cut-off frequency. The two signals belong to the same patient but
have different spectra (the ECG signal is extremely oversampled);
however, they can be represented together in the time domain, on one
scale.
Fig.2. Time domain vowel speech and ECG signals of the same male
patient. (a) Vowel speech signal (vowel /i:/ as in the word
"email"). (b) ECG signal recorded at the same time as the vowel
speech signal is pronounced (after suppressing P and T waves using a
high pass filter with a 40 Hz cut-off frequency).
STFT (short-time Fourier transform)[16-17] is used to study the
frequency characteristics of the vowel speech signal; the STFT of
the sequence x(m) is defined as:
$$X_n(e^{j\omega_i}) = \sum_{m} x(m)\, w(n-m)\, e^{-j\omega_i m}. \tag{1}$$
Taking into consideration that $X_n(e^{j\omega_i})$ is evaluated for
a fixed n, the STFT is the conventional Fourier transform of the
windowed signal x(m)w(n−m), evaluated at frequency
$\omega = \omega_i$. Since w(m) is an FIR (finite impulse response)
filter of a finite size, if the size of w(m) is large relative to
the signal periodicity, then $X_n(e^{j\omega_i})$ gives good
frequency resolution. On the other hand, if the size of w(m) is
small, then $X_n(e^{j\omega_i})$ gives poor frequency resolution. To
extract heart rates from the vowel speech signal, the size of w(m)
should be less than the RR-interval of the ECG signal of the same
patient. The STFT spectrogram of vowel speech (vowel /i:/) recorded
immediately after sit-up exercise, with the corresponding ECG signal
of the same patient, is shown in Fig.3 (horizontal lines are speech
formants). It can be seen that the heart activity (the moment an R
wave appears on the ECG signal) produces a frequency modulation of
the vowel speech signal for all formants located within a 16 kHz
frequency band (see the blue vertical lines in Fig.3). Accordingly,
it is possible to extract the relevant heart rate information (the
RR-interval, defined as the time between two sequential R waves of
the ECG signal) directly from the spectrogram.
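The framing behind (1) can be illustrated with a minimal sketch. It is not the authors' Matlab code: the function names `hamming` and `stft_magnitude`, the toy signal and the hop size are assumptions made for illustration, and each frame is taken from its starting sample rather than through the reversed window w(n−m), which for a symmetric window changes only the phase.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def stft_magnitude(x, window_len, hop):
    """Return a list of frames; each frame holds the magnitudes of the
    windowed Fourier transform for bins i = 0 .. window_len // 2."""
    w = hamming(window_len)
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        segment = [x[start + m] * w[m] for m in range(window_len)]
        frame = []
        for i in range(window_len // 2 + 1):
            omega = 2 * math.pi * i / window_len
            acc = sum(segment[m] * cmath.exp(-1j * omega * m)
                      for m in range(window_len))
            frame.append(abs(acc))
        frames.append(frame)
    return frames


# Toy check: a tone that completes 8 cycles per 64-sample window.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram = stft_magnitude(tone, window_len=64, hop=32)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
```

Every frame of the toy tone peaks in bin 8, which is the kind of horizontal line a steady formant draws in the spectrogram.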
Fig.3. Spectrogram of a vowel speech signal and the corresponding
ECG signal (the vertical blue lines indicate the frequency
modulation of the speech vowel caused by heart activities).
3 Heart Rate Extraction from Vowel Speech Signals
Ideally, the STFT spectrogram includes speech formants without any
noise. In practice, however, heart activity distorts all formants,
and each distortion lasts approximately 0.2 seconds with a magnitude
of 100 Hz.
Fig.4 shows an ideal STFT spectrogram for a vowel speech signal. It
represents a heart activity of 40 beats per minute (bpm), which is
the lowest heart rate in real situations[15]. In order to extract
the relevant heart rate information from the corresponding STFT
spectrogram, a searching algorithm is proposed that scans the STFT
spectrogram horizontally, starting from the top (Nyquist frequency)
and moving to the bottom (DC) of the original speech signal. Each
time the algorithm scans the spectrogram of the speech signal
horizontally, a one-dimensional (1-D) signal is obtained. Generally,
there are two possible cases of the horizontal scanning (see Fig.4):
• When a scanning line passes through a part of the spectrogram
beyond the bounds of the formants, it contains background
information only and cannot yield a useful 1-D signal (see the
scanning line labeled a in Fig.4).
• When a scanning line passes through any part of a speech formant,
it yields a useful 1-D signal (see the scanning lines labeled b and
c in Fig.4).
Fig.4. Ideal STFT spectrogram of the vowel speech signal and the
allocation of scanning lines.
Each extracted 1-D signal (useful 1-D signal) passes through a
5th-order FIR low-pass filter to suppress high frequency components.
Finally, a discrete Fourier transform (DFT) is applied to the
filtered 1-D signals.
Fig.5 shows the extracted useful 1-D signals in the time and
frequency domains for the scanning lines b and c. It is clear that
the amplitude of the 4th harmonic is the maximum. Based on Fourier
transform properties[17], the fourth harmonic of a six-second speech
signal corresponds to a frequency of 0.67 Hz and thus to a heart
rate of 40 beats per minute (which is exactly the heart rate of the
original signal).
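The arithmetic behind this statement is short enough to check directly. The helper name `harmonic_to_bpm` is introduced here for illustration only.

```python
def harmonic_to_bpm(k, duration_s):
    """Heart rate in bpm for DFT harmonic k of a signal lasting duration_s seconds."""
    frequency_hz = k / duration_s      # harmonic k sits at k / T hertz
    return 60.0 * frequency_hz         # beats per second -> beats per minute


freq_hz = 4 / 6.0                      # about 0.67 Hz
bpm = harmonic_to_bpm(4, 6.0)          # 40 bpm
```

Neighbouring harmonics of a six-second recording are therefore 10 bpm apart.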
Fig.5. Time and frequency domains of the 1-D signals extracted by
the scanning lines b and c. Note that the amplitude of the 4th
harmonic is the maximum, which corresponds to a 40 bpm heart rate.
(a) Time domain of the 1-D signal extracted by the scanning line b.
(b) Frequency domain of the 1-D signal extracted by the scanning
line b. (c) Time domain of the 1-D signal extracted by the scanning
line c. (d) Frequency domain of the 1-D signal extracted by the
scanning line c.
As a result, it is concluded that heart rates can be extracted using
the harmonics with maximum magnitudes, i.e., using an order
statistics filter[18] of the STFT spectrum:
$$R = \arg\max_{k} \{X_k \mid k = 1, 2, \ldots, N/2\}, \tag{2}$$
where N is the length of the extracted 1-D signal x(n), arg max
denotes the "index of" operation, and $X_k$ is the vector of
magnitudes of the DFT of the extracted 1-D signal x(n). The DFT
spectrum X(k) is defined as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \tag{3}$$
where k = 0, ..., N/2 indexes the harmonics of the one-sided
spectrum. Applying the order statistics filter (2) and the DFT (3)
to each scanning line of the spectrogram (see Fig.4) estimates the
2-D spectrum for the heart rate (see Figs. 6-8).
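A minimal sketch of (2) and (3) applied to one scanning line is given below. The synthetic signal, the frame rate of 20 samples per second and the function names `dft_magnitudes` and `dominant_harmonic` are assumptions made for illustration; they are not taken from the paper.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes |X(k)| for k = 0 .. N/2, following the DFT definition in (3)."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        mags.append(abs(acc))
    return mags


def dominant_harmonic(x):
    """Index k in 1 .. N/2 with the largest DFT magnitude (DC is skipped)."""
    mags = dft_magnitudes(x)
    return max(range(1, len(mags)), key=lambda k: mags[k])


# Synthetic 1-D signal: 6 s sampled at an assumed 20 frames per second,
# with a 40 bpm (2/3 Hz) modulation on top of a constant offset.
FRAME_RATE = 20
DURATION_S = 6
signal = [1.0 + 0.5 * math.cos(2 * math.pi * (40 / 60) * t / FRAME_RATE)
          for t in range(FRAME_RATE * DURATION_S)]

k_max = dominant_harmonic(signal)
heart_rate_bpm = 60.0 * k_max / DURATION_S
```

Skipping k = 0 matters here: the constant offset would otherwise dominate the spectrum, which is consistent with (2) starting its index at 1.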
In Fig.6, the position on the x axis of the 2-D spectrum (the heart
rate frequency, given in bpm) is estimated with respect to the
Nyquist and sampling theorems. Applying a typical STFT (see (1)) to
a vowel speech signal x(m) with a Hamming window of length L samples
produces L/2 coefficients, and these coefficients represent the
frequencies from 0 (DC) to fs/2 on the y (vertical) axis of the 2-D
spectrum, where fs is the sampling rate of the originally recorded
vowel speech signal. On the other hand, applying the Fourier
transform (see (3)) produces ts/2L frequency harmonics from 0 (DC)
to fs/2L on the x (horizontal) axis of the 2-D spectrum, where ts is
the duration of the signal x(m) in seconds. Accordingly, the heart
frequency is evaluated using two relations, tr/2L : fs/2L and
xp : fhr, where tr is the recording period of the input vowel speech
signal, xp denotes the position of a candidate heart rate frequency
in the 2-D spectrum, fhr is the estimated heart rate frequency and
the operator : denotes a relation. Solving the above relations for
fhr gives the heart rate (heart rate = 60 × (xp/ts)). It is known
that the heart rate varies from 40 to 200 bpm, and the formants of
human vowel speech vary from 1 to 6 kHz. As a result, the search
region (region of interest) is the area bounded from 40 to 200 bpm
on the x axis and from 1 to 6 kHz on the y axis. Since heart rate is
usually given in bpm while the frequency of harmonics is given in
Hertz, the horizontal axis of the 2-D spectrum is organized in bpm
units using the relation[2] bpm = 60 × fhr (see Fig.6).
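As an illustration of the horizontal bounds of the search region, the sketch below lists the harmonic positions that fall between 40 and 200 bpm for a six-second recording. It reads the heart rate relation as 60 times xp/ts; the function names `roi_harmonics` and `position_to_bpm` are hypothetical.

```python
import math


def roi_harmonics(duration_s, min_bpm=40, max_bpm=200):
    """Harmonic positions x_p whose heart rate 60 * x_p / t_s lies in the ROI."""
    first = math.ceil(min_bpm * duration_s / 60.0)
    last = math.floor(max_bpm * duration_s / 60.0)
    return list(range(first, last + 1))


def position_to_bpm(x_p, duration_s):
    """Heart rate in bpm for position x_p: 60 * (x_p / t_s)."""
    return 60.0 * x_p / duration_s


positions = roi_harmonics(6.0)
rates = [position_to_bpm(p, 6.0) for p in positions]
```

For ts = 6 s the region of interest spans harmonics 4 to 20, that is, 40 to 200 bpm in steps of 10 bpm.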
4 Experimental Results and Discussion
4.1 Testing the Robustness of Proposed Heart Rate Extraction Method
To test the robustness and accuracy of the proposed heart rate
extraction method using human vowel speech, a heart rate detection
system (HRDS) is implemented. The HRDS is able to capture and
analyze the frequency characteristics of human vowel speech and ECG
signals. The HRDS uses a standard personal computer with a
two-channel sound card to perform analog-to-digital conversion. The
left channel of the sound card is connected to a standard microphone
whose frequency response ranges from 100 Hz to 16 kHz, while the
right channel is connected to a portable ECG recorder (Cardiette
AR600). The vowel speech and the ECG signals are recorded
concurrently in six-second periods with a 44 kHz sampling frequency.
Matlab code processes the recorded vowel speech and ECG signals. In
our experiments, the P and T waves of the ECG signal are suppressed
by the microphone amplifier, which contains a high pass filter with
a 40 Hz cut-off frequency.
Fig.6. Heart rate estimation for an ideal spectrogram. (a) Ideal
spectrogram noised by machine noise and side talk. (b) Histogram of
the spectrogram. (c) Same as (a) after filtering by (4). (d) 2-D
spectrum estimation with 75 bpm heart rate for the noisy spectrogram
in (a).
Fig.7. Heart rate estimation for ideal and noisy spectrograms.
(a) 2-D spectrum extraction for the heart rate estimation with
40 bpm heart rate modulations in Fig.4. (b) 2-D spectrum extraction
for the heart rate estimation with 100 bpm heart rate modulations.
To study the frequency characteristics of the vowel speech signal,
the STFT window w(m) is set to 2048 samples, which, at a 44 kHz
sampling frequency, gives a 21 Hz spectrum resolution. Moreover, the
overlap between windows is set to 1800 samples, which produces a
41-millisecond time resolution.
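The quoted figures can be reproduced with a small calculation. Two points are assumptions here: the sampling frequency is taken as exactly 44 000 Hz, and the 41 ms value is read as the duration of the 1800-sample overlap, since the text does not state how it was derived. The hop between consecutive windows is shown alongside for comparison.

```python
FS_HZ = 44_000          # "44 kHz" as stated; 44 100 Hz gives nearly the same values
WINDOW = 2048           # STFT window length in samples
OVERLAP = 1800          # overlap between consecutive windows in samples

freq_resolution_hz = FS_HZ / WINDOW            # about 21.5 Hz
overlap_ms = 1000.0 * OVERLAP / FS_HZ          # about 40.9 ms
window_ms = 1000.0 * WINDOW / FS_HZ            # about 46.5 ms
hop_samples = WINDOW - OVERLAP                 # 248 samples
hop_ms = 1000.0 * hop_samples / FS_HZ          # about 5.6 ms
```

With these settings a six-second recording yields on the order of a thousand spectrogram columns, so each scanning line is a densely sampled 1-D signal.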
Fig.3 illustrates an example of a real spectrogram for a vowel
speech signal; it plots frequency against time, with color
indicating the relative strengths of the frequency components (color
varies from dark red, indicating low power components, to orange,
indicating high power components). It is clear that the spectrogram
contains speech formants (the formants are the observed high power
spectral density values in orange). In our heart rate extraction
method, the order statistics filter (see (2)) deals with these
high-power spectral values and ignores background speech
information. Unfortunately, the spectrogram may contain noise
(machine noise or quiet side talk). Fig.6(a) illustrates a noisy
ideal spectrogram with 75 bpm heart rate modulations. In general,
the formants can be affected by the following noise sources (see
Fig.6(a)):
• The variation of the vowel speech tone during the vowel speech
recording: some volunteers (patients) are not able to keep the same
tone of vowel speech during the six-second recording. This problem
is common for patients with insufficient respiratory lung volume.
• Machine noise is a high-amplitude noise that has certain
allocations of harmonics in the frequency domain (see the horizontal
lines in the 2-D spectrogram in Fig.6(a)). Machine noise is
converted to low frequency noise in the 2-D spectrum; it appears on
the left side of the x-scale in the 2-D spectrum, but it is located
outside the bounded search area of the spectrum (outside the region
of interest). As a result, it is ignored during analysis.
• Side talk appears as the 6 kHz flashes along the time axis in the
2-D spectrogram in Fig.6(a). Side talk noise may potentially
generate high frequency noise in the 2-D spectrum, located on the
right side of the region of interest. It is completely suppressed
using the threshold filter (see (4)).
Fig.8. Heart rate extraction using different vowel speech signals.
(a) A vowel speech signal /i:/. (b) Heart rate estimation of a
patient using vowel /i:/ as in the word "email". (c) Vowel speech
signal /ɔ:/. (d) Heart rate estimation of a patient using vowel
/ɔ:/ as in the word "four".
Our heart rate extraction method ignores backgrounds during analysis
and treats them as outliers: they appear to be inconsistent with the
remaining useful part of the 2-D spectrogram, and their existence
may lead to wrong heart rate extraction results. Outliers are
commonly detected by methods based on statistical data
distributions, prior knowledge of the nature of the distributions,
the expected number of outliers, and the nature of the expected
outliers, and they are treated by histogram shape, clustering,
entropy, and attribute similarity threshold methods[19-20]. In this
work, the histogram of the noisy spectrogram is analyzed (see
Fig.6(b)) and a one-sided threshold image filter is implemented to
reduce the side talk noise using (4) (see Fig.6(c)).
$$X'(m,n) = \begin{cases} X(m,n), & \text{if } X(m,n) > 0.1\max(X), \\ 0, & \text{if } X(m,n) < 0.1\max(X), \end{cases} \tag{4}$$
where X(m,n) is the pixel value of the original spectrogram image of
the vowel speech signal located in the m-th column and n-th row;
X'(m,n) is the corresponding filtered spectrogram image, and max(X)
is the maximum brightness of the original spectrogram image. The
filter suppresses spectrogram pixels with brightness less than 10%
of the maximum brightness. Fig.6(d) shows the 2-D spectrum
estimation with 75 bpm heart rate for the noisy spectrogram in
Fig.6(a). The heart rate extraction result confirms the accuracy of
the proposed heart rate extraction method: there is an excellent
agreement with the heart activity, which corresponds to a 75 bpm
heart rate.
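The threshold filter in (4) can be sketched in a few lines. The function name `threshold_filter` and the toy image are illustrative; pixels exactly at the threshold are kept in this sketch, a case that (4) leaves open.

```python
def threshold_filter(image, fraction=0.1):
    """Zero every pixel below fraction * max(image); keep the rest unchanged.

    image is a list of rows of non-negative brightness values.
    Pixels exactly at the threshold are kept here; (4) leaves that case open.
    """
    peak = max(max(row) for row in image)
    limit = fraction * peak
    return [[value if value >= limit else 0.0 for value in row] for row in image]


noisy = [
    [0.5, 9.0, 0.8],
    [10.0, 0.2, 4.0],
]
clean = threshold_filter(noisy)
```

With a maximum brightness of 10, every pixel below 1 is set to zero while the bright formant pixels pass through unchanged.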
Fig.7(a) shows the 2-D spectrum estimation with 40 bpm heart rate
for the ideal spectrogram in Fig.4. Note that the horizontal axis of
the 2-D spectrum represents heart rate frequency graded in bpm,
while the vertical axis represents the frequency of speech formants
graded in kHz. There is an excellent agreement between the heart
rate of the original signal and the heart rate evaluated from the
2-D spectrum.
This agreement is confirmed by the aforementioned fourth harmonic of
the six-second speech signal, which corresponds to a frequency of
0.67 Hz and to a 40 bpm heart rate. As a matter of fact, it conforms
with our conclusion that human heart rates are extracted using the
harmonics of maximum magnitude. Fig.7(b) shows the 2-D spectrum
extraction for the heart rate estimation with a 100 bpm heart rate
for a real patient; the two parallel lines at the top of the 2-D
spectrogram represent some high frequency noise which is generated
by the signal itself, and they are discarded by the proposed heart
rate extraction algorithm. As mentioned before, the search region
(the region of interest) is bounded from 40 to 200 bpm. As a result,
the proposed method searches the spectrogram from 1 to 6 kHz to
extract the heart rate. Consequently, the proposed heart rate
extraction method evaluates 100 bpm as the average heart rate.
4.2 Testing the Accuracy of Proposed Heart Rate Extraction Method
With reference to the properties of the DFT[17], it is known that
the first harmonic frequency of a signal is related to its duration
and all other harmonics are multiples of its first harmonic
frequency; in our proposed heart rate extraction method, vowel
speech signals are six seconds in length and the corresponding first
harmonic frequency is 0.17 Hz. Consequently, the error of heart rate
estimation in bpm can be evaluated by E = 60/ts, where ts is the
time duration of the original vowel speech signal in seconds. Given
that ts does not exceed the six-second limit, the heart rate error
is always acceptable (5 bpm). Experimentally, the error does not
exceed 9%, as shown in Table 1.
Table 1. Heart Rate Extraction Using the Proposed Heart Rate
Extraction Method
ID   Age      Oximeter   Our Heart Rate     Percentage
     (Year)   (bpm)      Extraction (bpm)   Error (%)
1    22       105        98                 6.67
2    25       89         85                 4.49
3    27       131        120                8.40
4    27       95         90                 5.26
5    23       105        100                4.76
6    25       103        100                2.91
7    23       98         90                 8.16
8    27       135        130                3.70
9    26       119        124                4.20
10   27       123        120                2.44
11   23       113        120                6.19
12   23       82         80                 2.44
13   22       115        120                4.35
14   23       112        120                7.14
15   23       105        98                 6.67
16   15       129        125                3.10
17   8        142        150                5.63
18   7        134        128                4.48
19   38       111        105                5.41
20   39       92         90                 2.17
21   37       105        100                4.76
Average percentage error (Accuracy = 95.08): 4.92
To address the accuracy of the proposed method using different
English vowels, a pilot study (see Fig.8) is conducted for randomly
selected speakers; each of them is asked to pronounce different
English vowels. Results reveal that the vowel /i:/ (as in the word
"email") is more applicable for our proposed algorithm. Fig.8(a)
shows the spectrum of the vowel speech signal for a patient with a
115 bpm heart rate who pronounced the vowel /i:/ (as in the word
"email"); the heart rate is estimated as 110 bpm according to the
position of points inside the region of interest (Fig.8(b)).
Fig.8(c) shows the spectrum of the vowel speech signal /ɔ:/ (as in
the word "four") for a patient with a 150 bpm heart rate; the
estimated heart rate is 140 bpm according to the position of points
inside the region of interest (see Fig.8(d)).
It is known[21] that physical activity increases heart rate, cardiac
output, and pulse amplitude. Immediately before the vowel speech
signal recording, volunteers and patients are requested to perform a
number of sit-up exercises to intensify the influence of heart
activity (in our experiments, heart rate exceeded 120 beats per
minute). 21 volunteers (7-39 years old) are requested to pronounce
an English vowel (vowel /i:/). Each six-second vowel speech signal
is recorded by the aforementioned microphone, filtered by the 40 Hz
cut-off high pass filter, sampled at a 44 kHz sampling frequency,
transformed by the STFT with the aforementioned parameters, and
horizontally scanned by the scanning lines to produce the 1-D
signals. Each extracted 1-D signal is filtered by a 5th-order FIR
low-pass filter, and a fast Fourier transform is then applied to the
filtered 1-D signal. Noise is suppressed using the threshold filter.
Finally, the application of the order statistics filter and the
Fourier transform to each scanning line of the spectrogram yields
the 2-D spectrum; a bounded region of the 2-D spectrum (the region
of interest described in Section 3) is searched to extract the heart
rate, and the heart rate is estimated.
The 21 volunteers are patients and students. They were randomly
selected and, when they agreed, they recorded their English vowel
speech signals. Among them, there are 11 males and 10 females; their
ages vary from 7 to 39 years old (see Table 1).
Table 1 summarizes the results of applying our non-contact heart
rate extraction method and traditional contact pulse oximeters to
the 21 volunteers. The correlation coefficient is 0.953, while the
average percentage error is 4.92 and the root mean square error is
5.936. The heart rate estimation results of the proposed method are
analyzed by a paired t-test. All analyses and tests are conducted in
an explorative manner at a 5% level of significance. The
computations are performed with the statistical software tool
EasyFit 5.5. The mean (M) and the standard deviation (SD) of the
oximeter results are 111.5714 and 16.3266 respectively. On the other
hand, the M and SD of the proposed method are 109.1905 and 18.1235
respectively. The hypothesized mean difference is set as a null
hypothesis (i.e., M1 − M2, where M1 is the mean of the oximeter
results and M2 is the mean of the proposed method's results), α is
set to 0.05, the total degree of freedom is 39.5717, the difference
in sample means is 2.3810, the t-test statistic is 0, the two-tailed
test lower and upper critical values are −2.0227 and 2.0227, the
p-value is 1, the confidence interval varies from −8.3858 to 13.1477
and the error margin is 10.7668.
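The summary statistics can be recomputed from the two heart rate columns of Table 1. The sketch below is a plain-Python check rather than the EasyFit computation used in the paper; the helper names are illustrative, and the sample (n − 1) form of the standard deviation is an assumption that happens to match the reported values.

```python
import math

OXIMETER = [105, 89, 131, 95, 105, 103, 98, 135, 119, 123, 113,
            82, 115, 112, 105, 129, 142, 134, 111, 92, 105]
PROPOSED = [98, 85, 120, 90, 100, 100, 90, 130, 124, 120, 120,
            80, 120, 120, 98, 125, 150, 128, 105, 90, 100]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Percentage error of each volunteer relative to the oximeter reading.
pct_errors = [100.0 * abs(o - p) / o for o, p in zip(OXIMETER, PROPOSED)]
avg_pct_error = mean(pct_errors)
rmse = math.sqrt(mean([(o - p) ** 2 for o, p in zip(OXIMETER, PROPOSED)]))
r = pearson(OXIMETER, PROPOSED)
```

The means, standard deviations, average percentage error, root mean square error and correlation coefficient obtained this way agree with the values quoted above to the printed precision.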
It is clear that our proposed heart rate extraction method works
better than the non-automatic heart rate extraction method[12] in
terms of accuracy (95.08% vs 92%). Moreover, our proposed heart rate
extraction method handles noise. As a result, the average error is
much less than that of [12] (4.92% vs 15%).
4.3 Discussion
It is known that the error of heart rate evaluation using
traditional pulse oximeters (contact-based methods) does not exceed
2%; on the other hand, using our proposed heart rate extraction
method (a non-contact method), the average percentage error does not
exceed 5% (the best accuracy of the proposed heart rate extraction
is 97.82%) unless the vowel speech recording is shorter than six
seconds. However, the error does not exceed 9% (the lowest accuracy
of the proposed heart rate extraction is 91.60%) for patients with
insufficient respiratory lung volume who are not able to keep the
same tone of vowel speech during the six-second recording, or when
the recording is shorter than six seconds. Finally, it should be
noted that contact methods (traditional pulse oximeters and ECG
evaluation methods) and non-contact methods (our proposed heart rate
extraction method) are not directly comparable. Nevertheless, our
proposed heart rate extraction method is applicable especially in
situations where contact-based heart rate extraction methods are
unavailable or inapplicable, for example, if patients are located in
remote regions and recording their speech signal using their mobile
phones is the only convenient option.
The proposed heart rate extraction method is not sensitive to the
amplitude of the formants, nor to their slope, and it is able to
accurately extract the heart rate in the presence of noise. The
proposed approach is robust and is able to work in noisy
environments; it discards noise (machine noise and side talk noise).
However, the average error of the proposed heart rate extraction
method does not exceed 5%, whereas the error of the oximeter is 2%.
Generally speaking, the proposed heart rate extraction method is
promising. It is known that heartbeats are directly proportional to
the level of activity of a person; more blood is needed when a
person is exercising than when he or she is at rest. To some extent,
the heartbeats of transplanted hearts are also proportional to the
level of activity of a person. On the other hand, the heartbeats of
artificial hearts are fixed unless they are adapted to the patient's
activity. We have not tested the proposed heart rate extraction
method on persons with artificial or transplanted hearts (we could
not record vowel speech signals for such patients).
5 Conclusions
In the modern mobile communication era, we believe that contactless
heart rate monitoring methods are required, especially for heart
patients. In this work, a non-contact method for extracting heart
rate from vowel speech signals is proposed. The proposed method is
based on modeling the relationship between the production of vowel
speech signals and heart activity in humans. It uses the STFT to
estimate heart rates and can successfully handle machine noise and
side talk. Experimental results suggest that the proposed method can
play an important role in modern medical applications. Although it
does not outperform traditional pulse oximeters, the accuracy of the
proposed heart rate extraction method is acceptable in practice. We
do not claim that the proposed method works for persons with
artificial or transplanted hearts; dealing with such patients is
left for future work. Heart pathology using vowel speech signals is
also left for future work.
References
[1] Nelson M, Rejeski W, Blair S et al. Physical activity and public
health in older adults: Recommendation from the American College of
Sports Medicine and the American Heart Association. Medicine &
Science in Sports & Exercise, 2007, 39(8): 1435-1445.
[2] Berntson G, Bigger J, Eckberg D et al. Heart rate variability:
Origins, methods, and interpretive caveats. Psychophysiology, 1997,
34(6): 623-648.
[3] Georgoulas G, Stylios C, Groumpos P. Predicting the risk of
metabolic acidosis for newborns based on fetal heart rate signal
classification using support vector machines. IEEE Trans. Biomedical
Engineering, 2006, 53(5): 875-884.
[4] Vasios G, Prentza A, Blana D et al. Classification of fetal
heart rate tracings based on wavelet-transform and
self-organizing-map neural networks. In Proc. the 23rd Annual Int.
Conf. IEEE Engineering in Medicine and Biology Society, October
2001, Vol.2, pp.1633-1636.
[5] Linh T, Osowski S, Stodolski M. On-line heart beat recognition
using Hermite polynomials and neuro-fuzzy network. IEEE Trans.
Instrum. Meas., 2003, 52(4): 1224-1231.
[6] Li S, Ji Y, Liu G. Optimal wavelet basis selection of wavelet
shrinkage for ECG de-noising. In Proc. Int. Conf. Management and
Service Science, September 2009, pp.1-4.
[7] Hu Y, Palreddy S, Tompkins W. A patient-adaptable ECG beat
classifier using a mixture of experts approach. IEEE Trans.
Biomedical Engineering, 1997, 44(9): 891-900.
[8] Moraes J, Seixas M, Vilani F, Costa E. A real time QRS complex
classification method using Mahalanobis distance. In Proc. Computers
in Cardiology, Sept. 2002, pp.201-204.
[9] Papaloukas C, Fotiadis D, Likas A, Michalis L. Automated methods
for ischemia detection in long duration ECGs. Cardiovascular Reviews
& Reports, 2003, 24(6): 313-319.
[10] Jager F. Feature extraction and shape representation of
ambulatory electrocardiogram using the Karhunen-Loève transform.
Electrotechnical Review, 2002, 69(2): 83-89.
[11] Cuesta-Frau D, Perez-Cortes J, Andreu-García G, Novak D.
Feature extraction methods applied to the clustering of
electrocardiographic signals: A comparative study. In Proc. the 16th
Int. Conf. Pattern Recognition, August 2002, Vol.3, pp.961-964.
[12] Skopin D, Baglikov S. Heartbeat feature extraction from vowel
speech signal using 2D spectrum representation. In Proc. the 4th
Int. Conf. Information Technology, June 2009.
[13] Pickett J. The Acoustics of Speech Communication: Fundamentals,
Speech Perception Theory, and Technology. Allyn & Bacon, 1998.
[14] Browman C, Goldstein L. Representation and reality: Physical
systems and phonological structure. Journal of Phonetics, 1990, 18:
411-424.
[15] Maton A, Hopkins J, McLaughlin C et al. Human Biology and
Health. New Jersey, USA: Prentice Hall, 1993.
[16] Allen J, Rabiner L. A unified approach to short-time Fourier
analysis and synthesis. Proceedings of IEEE, 1977, 65(11):
1558-1564.
[17] Cohen L. Time-Frequency Analysis: Theory and Applications. New
Jersey, USA: Prentice Hall, 1994.
[18] Gonzales R, Woods R. Digital Image Processing (3rd edition),
Prentice Hall, 2007.
[19] Sezgin M, Sankur B. Survey over image thresholding techniques
and quantitative performance evaluation. Journal of Electronic
Imaging, 2004, 13(1): 146-168.
[20] James A, Dimitrijev S. Inter-image outliers and their
application to image classification. Pattern Recognition, 2010,
43(12): 4101-4112.
[21] Turkbey E, Jorgensen N, Johnson W et al. Physical activity and
physiological cardiac remodelling in a community setting: The
Multi-Ethnic Study of Atherosclerosis (MESA). Heart and Education in
Heart, 2010, 96(1): 42-48.
Abdelwadood Mesleh received his B.Eng. and M.Sc. degrees in computer
engineering from Shanghai University, China, in 1995 and 1998
respectively. He worked as a research and teaching assistant in the
Electrical Engineering Department, Hong Kong University of Science
and Technology, China, from 2004 to 2005. He received his Ph.D.
degree in feature selection using ant colony optimization (ACO) for
Arabic text articles from the Arab Academy for Banking and Financial
Sciences, Jordan, in 2008. Since 2008, Dr. Mesleh has been an
assistant professor in the Computer Engineering Department, Faculty
of Engineering Technology, at Al-Balqa Applied University. His
research interests include optimization, fuzzy logic, genetic
algorithms, ACO, Arabic natural language processing, feature subset
selection, Arabic speech recognition, MANETs, parallel processing,
cryptanalysis, medical image and signal processing, operating
systems, etc.
Dmitriy Skopin received his M.Sc. and Ph.D. degrees in computer
engineering from Kursk State Technical University, Russia, in 1995
and 1998, respectively. From September 2003 until August 2005 he was
an associate professor with the Kursk State Technical University.
Since September 2005 he has been a staff member of Al-Balqa Applied
University, Jordan. His research interests focus on signal and image
processing, advanced programming, and computer graphics.
Sergey Baglikov received his Ph.D. degree in computer engineering
from the Kursk State Technical University in 1998. He is currently
the president of Help Medicom Group, which specializes in novel
medical equipment and high-technology industries. His research
interests are condition monitoring using infrared sensors and the
control of power electronics, nanotechnologies and digital signal
processing.
Anas Quteishat received the B.Eng. degree in electronics from
Princess Sumaya University of Technology, Jordan, in 2003. He
received his M.Sc. degree in electronic systems design and Ph.D.
degree in computational intelligence from University of Science
Malaysia in 2005 and 2008 respectively. Currently he is an assistant
professor at Al-Balqa Applied University in Jordan. His research
interests include neural networks, multi-agent systems, pattern
classification and rule extraction.