    Mesleh A, Skopin D, Baglikov S et al. Heart rate extraction from vowel speech signals. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(6): 1243–1251 Nov. 2012. DOI 10.1007/s11390-012-1300-6

    Heart Rate Extraction from Vowel Speech Signals

    Abdelwadood Mesleh 1, Dmitriy Skopin 1, Sergey Baglikov 2, and Anas Quteishat 1

    1 Computer Engineering Department, Faculty of Engineering Technology, Al-Balqa' Applied University, Amman, Jordan
    2 Help MediCom Group, Kursk, Russia

    E-mail: [email protected]; [email protected]; [email protected]; [email protected]

    Received November 17, 2011; revised May 14, 2012.

    Abstract This paper presents a novel non-contact heart rate extraction method from vowel speech signals. The proposed method is based on modeling the relationship between the production of vowel speech signals and heart activities in humans, where it is observed that the moment of a heart beat causes a short increment (evolution) of the vowel speech formants. The short-time Fourier transform (STFT) is used to detect the formant maximum peaks so as to accurately estimate the heart rate. Compared with a traditional contact pulse oximeter, the average accuracy of the proposed non-contact heart rate extraction method exceeds 95%. The proposed non-contact heart rate extraction method is expected to play an important role in modern medical applications.

    Keywords electrocardiogram, feature extraction, heart rate, short-time Fourier transform, vowel speech signal

    1 Introduction

    The number of heart patients keeps growing. This growth motivates researchers to develop tools that monitor the heart rate. During athletic activities, it is also desirable to monitor the heart rate, to achieve optimal results and to ensure personal safety[1]. From a medical point of view, the measurement of heart rate ranges from investigations of central regulation of autonomic state, to studies of fundamental links between psychological processes and physiological functions, to evaluations of cognitive development and clinical risk[2]. Heart rate is traditionally measured by detecting arterial pulsation. The electrical activity of the heart is measured by electrocardiogram (ECG). ECG[1] is an important non-invasive diagnostic tool for assessing the condition of the human heart. Each beat is made up of a series of waves: P-wave, QRS complex, T-wave and occasionally a U-wave (see Fig.1). The signal morphology and timing are indicative of different clinical conditions: for example, changes in the ST segment suggest a poor blood supply to the heart muscle, while multiple P-waves indicate low cardiac output and often cause clots in the atria. Recently, many algorithms have been developed to analyze ECG signals using support vector machines[3], self-organizing maps[4], etc. Generally speaking, ECG features can be extracted in the time domain[5-7] or in the frequency domain[8-9] using many feature extraction methods such as the discrete wavelet transform[5-6], the Karhunen-Loeve transform[10], Hermitian basis and other methods[11]. All the mentioned ECG feature extraction methods are based on ECG signals and are noninvasive contact methods of recording the variations of the bio-potential signals acquired from the human skin surface.

    Fig.1. Schematic representation of a normal ECG.

    Regular Paper
    Footnote: In noninvasive methods, medical experts use electrodes that are placed on the patient's skin to detect bioelectrical signals such as ECG signals.
    2012 Springer Science + Business Media, LLC & Science Press, China


    The demand for contactless heart monitoring has increased lately, especially for long-duration monitoring and for patients with particular conditions, such as burn victims or infants at risk of sudden infant death syndrome. In this paper, a novel non-contact method of heart rate extraction is proposed. The proposed method is based on modeling the relationship between the production of vowel speech signals and heart activities in humans. As human speech signals can be recorded using standard microphones and transmitted via mobile communication media, the proposed method opens new fields of application, such as heartbeat detection and monitoring of remotely located patients. A relevant heart rate detection method[12] is based on a two-dimensional (2-D) spectrum representation; however, it does not provide an automatic measurement of heart rate parameters. In this work, by contrast, the relationship between vowel speech production and heart activities is modeled to automatically estimate heart rate parameters after recording a vowel speech signal. Moreover, this work handles noise and achieves better results.

    The rest of this paper is organized as follows. Section 2 introduces the relationship between human vowel speech production and heart activities. Section 3 describes the heart rate extraction from vowel speech signals. Experimental results with discussion, and conclusions, are presented in Sections 4 and 5, respectively.

    2 Human Vowel Speech Production and Heart Activities

    Human speech signals[13] contain linguistic, expressive, organic and biological information. The source-filter theory of speech production considers the human acoustic speech output as the combination of a source of sound energy (e.g., the larynx) modulated by a transfer function (filter) determined by the shape of the supralaryngeal vocal tract. The result of this combination is a shaped spectrum with broadband energy peaks. The supralaryngeal vocal tract, which consists of both the oral and nasal airways, serves as a time-varying acoustic filter that suppresses the passage of sound energy at certain frequencies and allows its passage at other frequencies[14]. Formants are those frequencies at which local energy maxima are sustained by the supralaryngeal vocal tract and are determined by the overall shape, length and volume of the vocal tract[15]. Taking into account the fact that the larynx contains muscles covered by many blood vessels that are connected to the human circulatory system, it is concluded that human heart rates are dynamically related to the variations of vocal cord parameters and directly related to the acoustic properties of human speech. As a result, it should be possible to detect changes of speech properties that are related to human heart activities by obtaining the corresponding frequency characteristics of the vowel speech signal and the raw ECG data of the same person. Fig.2 shows the time domain of the vowel speech signal and the corresponding ECG signal after suppressing the P and T waves using a high-pass filter with a 40 Hz cut-off frequency. The two signals belong to the same patient but have different spectra (the ECG signal is extremely oversampled); nevertheless, they can be represented together in the time domain, on one scale.

    Fig.2. Time domain vowel speech and ECG signals of the same male patient. (a) Vowel speech signal (vowel /i:/ as in the word "email"). (b) ECG signal recorded at the same time as the vowel speech signal is pronounced (after suppressing P and T waves using a high-pass filter with a 40 Hz cut-off frequency).

    The STFT (short-time Fourier transform)[16-17] is used to study the frequency characteristics of the vowel speech signal; the STFT of the sequence $x(m)$ is defined as:

    $$X_n(e^{j\omega_i}) = \sum_{m} x(m)\, w(n-m)\, e^{-j\omega_i m}. \qquad (1)$$

    Taking into consideration that $X_n(e^{j\omega_i})$ is evaluated for a fixed $n$, the STFT is the conventional Fourier transform of the windowed signal $x(m)w(n-m)$, evaluated at frequency $\omega = \omega_i$. Since $w(m)$ is an FIR (finite impulse response) filter of finite size, if the size of $w(m)$ is large relative to the signal periodicity, then $X_n(e^{j\omega_i})$ gives good frequency resolution. On the other hand, if the size of $w(m)$ is small, then $X_n(e^{j\omega_i})$ gives poor frequency resolution. To extract heart rates from the vowel speech signal, the size of $w(m)$ should be less than the RR-interval of the ECG signal of the same patient. The STFT spectrogram of vowel speech (vowel /i:/) recorded immediately after sit-up exercise, with the corresponding ECG signal of the same patient, is shown in Fig.3 (horizontal lines are speech formants). It can be seen that the heart activity (the moment an R wave appears on the ECG signal) produces a frequency modulation of the vowel speech signal for all formants located within a 16 kHz frequency band (see the blue vertical lines in Fig.3). Accordingly, it is possible to extract the relevant heart rate information (the RR-interval, defined as the time between two sequential R waves of the ECG signal) directly from the spectrogram.
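    As an illustration of (1), the following minimal sketch evaluates the STFT magnitude frame by frame in plain Python. The function names, the Hamming window choice for this toy case and the test tone are assumptions made for the example; they are not taken from the paper's Matlab code. Because the window is symmetric, sliding it forward over the signal gives the same magnitudes as the $w(n-m)$ form in (1).

```python
import cmath
import math


def hamming(length):
    """Hamming window of the given length (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def stft_magnitude(x, win_len, hop):
    """Return one list per frame holding |X_n(e^{j w_i})| for
    w_i = 2*pi*i/win_len, i = 0..win_len//2 (one-sided spectrum)."""
    window = hamming(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        # Windowed segment x(m) * w(.) for this frame position.
        seg = [x[start + m] * window[m] for m in range(win_len)]
        mags = []
        for i in range(win_len // 2 + 1):
            w_i = 2 * math.pi * i / win_len
            acc = sum(seg[m] * cmath.exp(-1j * w_i * m) for m in range(win_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 100 Hz tone sampled at 800 Hz.
fs = 800
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(256)]
spec = stft_magnitude(tone, win_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda i: spec[0][i])
peak_hz = peak_bin * fs / 64
```

    A longer window narrows the spacing between frequency bins (`fs / win_len`) at the cost of coarser time localisation, which is the trade-off described above.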

    Fig.3. Spectrogram of a vowel speech signal and the corresponding ECG signal (the vertical blue lines indicate the frequency modulation of the speech vowel caused by heart activities).

    3 Heart Rate Extraction from Vowel Speech Signals

    Ideally, the STFT spectrogram includes speech formants without any noise. In practice, however, heart activity distorts all formants; each distortion lasts approximately 0.2 seconds and has a magnitude of about 100 Hz.

    Fig.4 shows an ideal STFT spectrogram for a vowel speech signal. It represents a heart activity of 40 beats per minute (bpm), which is the lowest heart rate in real situations[15]. In order to extract the relevant heart rate information from the corresponding STFT spectrogram, a searching algorithm is proposed that horizontally scans the STFT spectrogram of the original speech signal from the top (Nyquist frequency) to the bottom (DC). Each time the algorithm scans the spectrogram of the speech signal horizontally, a one-dimensional (1-D) signal is obtained. Generally, there are two possible cases of the horizontal scanning (see Fig.4):

      • When a scanning line passes through a part of the spectrogram beyond the bounds of the formants, it contains background information only and cannot yield a useful 1-D signal (see the scanning line labeled a in Fig.4).

      • When a scanning line passes through any part of a speech formant, it yields a useful 1-D signal (see the scanning lines labeled b and c in Fig.4).

    Fig.4. Ideal STFT spectrogram of the vowel speech signal and the allocation of scanning lines.

    Each extracted useful 1-D signal passes through a 5th-order FIR low-pass filter to suppress high frequency components. Finally, a discrete Fourier transform (DFT) is applied to the filtered 1-D signals.
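    The scanning and filtering steps above can be sketched as follows. The spectrogram layout (a list of rows, row 0 being DC), the usefulness test against a background level, and the moving-average coefficients are all assumptions for illustration; the paper states only that the filter is a 5th-order FIR low-pass filter (six coefficients) and does not give its taps.

```python
def scan_lines(spectrogram):
    """Yield (row_index, 1-D signal) from the top row (highest frequency)
    down to the bottom row (DC). spectrogram[r][t]: r = frequency row,
    t = time frame, with row 0 assumed to be DC."""
    for r in range(len(spectrogram) - 1, -1, -1):
        yield r, list(spectrogram[r])


def is_useful(line, background=0.0):
    """A line is treated as useful if it rises above the background level
    (hypothetical criterion standing in for 'passes through a formant')."""
    return max(line) > background


def fir_lowpass(signal, taps=None):
    """Causal FIR filter. The default is a 6-tap moving average, i.e. one
    possible 5th-order low-pass FIR (the actual taps are not in the source)."""
    if taps is None:
        taps = [1.0 / 6.0] * 6
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Toy spectrogram: only the middle row crosses a "formant".
toy = [[0, 0, 0, 0], [1, 3, 1, 3], [0, 0, 0, 0]]
useful_rows = [r for r, line in scan_lines(toy) if is_useful(line)]
smoothed = fir_lowpass([6.0] * 8)
```

    Each filtered line would then be passed to a DFT, as described in the text.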

    Fig.5 shows the extracted useful 1-D signals in the time and frequency domains for the scanning lines b and c. It is clear that the amplitude of the 4th harmonic is the maximum. Based on Fourier transform properties[17], harmonic number four of a six-second speech signal corresponds to a frequency of 0.67 Hz and to a heart rate of 40 beats per minute (which is exactly the heart rate of the original signal).
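    The arithmetic behind this correspondence is short enough to check directly. The helper name below is hypothetical; it only encodes the fact that harmonic k of a signal lasting T seconds lies at k/T Hz.

```python
def harmonic_to_bpm(k, duration_s):
    """Map DFT harmonic number k of a signal lasting duration_s seconds
    to its frequency in Hz and the equivalent rate in beats per minute."""
    freq_hz = k / duration_s
    return freq_hz, 60.0 * freq_hz


# Harmonic 4 of a six-second recording, as in Fig.5.
freq, bpm = harmonic_to_bpm(4, 6.0)
```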

    Fig.5. Time and frequency domains of the 1-D signal extracted by the scanning lines b and c. The amplitude of the 4th harmonic is the maximum, which corresponds to a 40 bpm heart rate. (a) Time domain of the 1-D signal extracted by the scanning line b. (b) Frequency domain of the 1-D signal extracted by the scanning line b. (c) Time domain of the 1-D signal extracted by the scanning line c. (d) Frequency domain of the 1-D signal extracted by the scanning line c.

    As a result, it is concluded that heart rates can be extracted using a number of harmonics with maximum magnitudes, i.e., using an order statistics filter[18] of the STFT spectrum:

    $$R = \arg\max_{k} \{X_k \mid k = 1, 2, \ldots, N/2\}, \qquad (2)$$

    where $N$ is the length of the extracted 1-D signal $x(n)$, $\arg\max$ denotes the "index of" operation, and $X_k$ is the vector of magnitudes of the DFT of the extracted 1-D signal $x(n)$. The spectrum of the DFT $X(k)$ is defined as:

    $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad (3)$$

    where $k = 0, \ldots, N/2$ is the harmonic number of the one-sided spectrum. Applying the order statistics filter (2) and the DFT (3) to each scanning line of the spectrogram (see Fig.4) estimates the 2-D spectrum for the heart rate (see Figs. 6–8).
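    A minimal sketch of (2) and (3) in plain Python is given below. It uses the standard DFT convention with n running from 0 to N-1 and skips k = 0 when searching for the peak, as (2) does. The function names and the synthetic test signal are assumptions for the example.

```python
import cmath
import math


def dft_magnitudes(x):
    """One-sided DFT magnitudes |X(k)| for k = 0..N/2, following (3)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]


def peak_harmonic(x):
    """Index of the largest-magnitude harmonic over k = 1..N/2, as in (2)."""
    mags = dft_magnitudes(x)
    return max(range(1, len(mags)), key=lambda k: mags[k])


# Synthetic scan line: 6 s at 20 samples per second, modulated 4 times
# over the recording (i.e. 40 bpm), riding on a constant offset.
N = 120
line = [1.0 + math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
R = peak_harmonic(line)
```

    Excluding k = 0 matters here: the constant offset makes the DC term the largest magnitude overall, yet it carries no heart rate information.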

    In Fig.6, the position on the x axis of the 2-D spectrum (the heart rate frequency, expressed in bpm) is estimated with respect to the Nyquist and sampling theorems. Applying a typical STFT (see (1)) to a vowel speech signal $x(m)$ with a Hamming window of length $L$ samples produces $L/2$ coefficients, and these coefficients represent the frequencies from 0 (DC) to $f_s/2$ on the y (vertical) axis of the 2-D spectrum, where $f_s$ is the sampling rate of the recorded vowel speech signal. On the other hand, applying the Fourier transform (see (3)) produces $t_s/2L$ frequency harmonics from 0 (DC) to $f_s/2L$ on the x (horizontal) axis of the 2-D spectrum, where $t_s$ is the duration of the signal $x(m)$ in seconds. Accordingly, the heart frequency is evaluated using two relations: $t_r/2L : f_s/2L$ and $x_p : f_{hr}$, where $t_r$ is the recording period of the input vowel speech signal, $x_p$ denotes the position of a candidate heart rate frequency in the 2-D spectrum, $f_{hr}$ is the estimated heart rate frequency and the operator ":" denotes a relation. Solving the above relations for $f_{hr}$ gives the heart rate: heart rate $= 60 \times (x_p/t_s)$. The heart rate varies from 40 to 200 bpm, and the formants of human vowel speech vary from 1 to 6 kHz. As a result, the search region (region of interest) is the area bounded from 40 to 200 bpm on the x axis and from 1 to 6 kHz on the y axis. Because heart rate is usually expressed in bpm while the frequency of harmonics is expressed in hertz, the horizontal axis of the 2-D spectrum is graded in bpm using the relation[2] bpm $= 60 \times f_{hr}$ (see Fig.6).
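    The restriction to the 40 to 200 bpm band can be sketched for a single scan line as follows. The helper names and the toy magnitude vector are assumptions; the conversion uses the relation heart rate = 60 × (x_p / t_s) given above.

```python
import math


def roi_bins(duration_s, lo_bpm=40.0, hi_bpm=200.0):
    """Harmonic indices whose rate 60*k/duration_s lies in [lo_bpm, hi_bpm]."""
    k_lo = math.ceil(lo_bpm * duration_s / 60.0)
    k_hi = math.floor(hi_bpm * duration_s / 60.0)
    return k_lo, k_hi


def estimate_bpm(mags, duration_s):
    """Pick the strongest harmonic inside the region of interest and
    convert its position x_p to beats per minute."""
    k_lo, k_hi = roi_bins(duration_s)
    k_hi = min(k_hi, len(mags) - 1)
    x_p = max(range(k_lo, k_hi + 1), key=lambda k: mags[k])
    return 60.0 * x_p / duration_s


# Toy magnitudes for a six-second line: a strong low-frequency component
# at k = 1 (outside the region of interest) and a heart peak at k = 10.
mags = [0.0] * 31
mags[1] = 9.0
mags[10] = 5.0
bounds = roi_bins(6.0)
rate = estimate_bpm(mags, 6.0)
```

    The component at k = 1 is larger than the heart peak but is ignored because it falls below 40 bpm.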

    4 Experimental Results and Discussion

    4.1 Testing the Robustness of Proposed Heart Rate Extraction Method

    To test the robustness and accuracy of the proposed heart rate extraction method using human vowel speech, a heart rate detection system (HRDS) is implemented. The HRDS is able to capture and analyze the frequency characteristics of human vowel speech and ECG signals. The HRDS uses a standard personal computer with a two-channel sound card for analog-to-digital conversion. The left channel of the sound card is connected to a standard microphone whose frequency response ranges from 100 Hz to 16 kHz, while the right channel is connected to a portable ECG recorder (Cardiette AR600). The vowel speech and ECG signals are recorded concurrently within six-second periods at a 44 kHz sampling frequency. Matlab code processes the recorded vowel speech and ECG signals. In our experiments, the P and T waves of the ECG signal are suppressed by the microphone amplifier, which contains a high-pass filter with a 40 Hz cut-off frequency.

    Fig.6. Heart rate estimation for an ideal spectrogram. (a) Ideal spectrogram corrupted by machine noise and side talk. (b) Histogram of the spectrogram. (c) Same as (a) after filtering by (4). (d) 2-D spectrum estimation with 75 bpm heart rate for the noisy spectrogram in (a).

    Fig.7. Heart rate estimation for ideal and noisy spectrograms. (a) 2-D spectrum extraction for the heart rate estimation with 40 bpm heart rate modulations in Fig.4. (b) 2-D spectrum extraction for the heart rate estimation with 100 bpm heart rate modulations.

    To study the frequency characteristics of the vowel speech signal, the STFT window $w(m)$ is set to 2048 samples, which at a 44 kHz sampling frequency gives a 21 Hz spectrum resolution. Moreover, the overlap between windows is set to 1800 samples, which produces a 41-millisecond time resolution.
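    The quoted figures can be reproduced from the stated parameters. The sketch below takes the sampling frequency as exactly 44000 Hz, which is an assumption (the text says 44 kHz). Under that assumption, the 41 ms figure matches the duration of the 1800-sample overlap, while the step between successive frames is the window length minus the overlap.

```python
fs = 44000          # sampling frequency in Hz (assumed exact value)
win_len = 2048      # STFT window length in samples
overlap = 1800      # overlap between consecutive windows in samples

freq_res_hz = fs / win_len            # spacing between STFT frequency bins
hop = win_len - overlap               # samples between successive frames
window_ms = 1000.0 * win_len / fs     # duration of one window
overlap_ms = 1000.0 * overlap / fs    # duration of the overlapped part
hop_ms = 1000.0 * hop / fs            # time step between frames
```

    With these values the frequency spacing is about 21.5 Hz, the overlap lasts about 40.9 ms and consecutive frames are 248 samples (about 5.6 ms) apart.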

    Fig.3 illustrates an example of a real spectrogram for a vowel speech signal; it plots frequency against time, with color indicating the relative strengths of the frequency components (color varies from dark red, indicating low power components, to orange, indicating high power components). It is clear that the spectrogram contains speech formants (formants are the observed high power spectral density values in orange). In our heart rate extraction method, the order statistics filter (see (2)) deals with these high power spectral values and ignores background speech information. Unfortunately, the spectrogram may contain noise (machine noise or quiet side talk). Fig.6(a) illustrates a noisy ideal spectrogram with 75 bpm heart rate modulations. In general, the formants can be affected by the following noise sources (see Fig.6(a)):

      • The variation of the vowel speech tone during the vowel speech recording: some volunteers (patients) are not able to keep the same tone of vowel speech during the six-second recording. This problem is common for patients with insufficient respiratory lung volume.

      • Machine noise is a high-amplitude noise that has certain allocations of harmonics in the frequency domain (see the horizontal lines in the 2-D spectrogram in Fig.6(a)). Machine noise is converted to low frequency noise in the 2-D spectrum; it appears on the left side of the x-scale in the 2-D spectrum, but it is located outside the bounded search area of the spectrum (outside the region of interest). As a result, it is ignored during analysis.

      • Side talk is the 6 kHz flashes along the time axis in the 2-D spectrogram in Fig.6(a). Side talk noise may potentially generate high frequency noise in the 2-D spectrum, located on the right side of the region of interest. It is completely suppressed using the threshold filter (see (4)).

    Fig.8. Heart rate extraction using different vowel speech signals. (a) A vowel speech signal /i:/. (b) Heart rate estimation of a patient using vowel /i:/ as in the word "email". (c) Vowel speech signal /ɔ:/. (d) Heart rate estimation of a patient using vowel /ɔ:/ as in the word "four".

    Our heart rate extraction method ignores backgrounds during analysis and treats them as outliers that appear to be inconsistent with the remaining useful part of the 2-D spectrogram and whose presence may lead to wrong heart rate extraction results. Outliers can be detected using methods based on statistical data distributions, prior knowledge of the nature of the distributions, the expected number of outliers, and the nature of the expected outliers, and they can be treated by histogram shape, clustering, entropy, and attribute similarity threshold methods[19-20]. In this work, the histogram of the noisy spectrogram is analyzed (see Fig.6(b)) and a one-sided threshold image filter is implemented to reduce the side talk noise using (4) (see Fig.6(c)).

    $$X'(m,n) = \begin{cases} X(m,n), & \text{if } X(m,n) > 0.1\max(X), \\ 0, & \text{if } X(m,n) < 0.1\max(X), \end{cases} \qquad (4)$$

    where $X(m,n)$ is the pixel value of the original spectrogram image of the vowel speech signal located in the $m$-th column and $n$-th row; $X'(m,n)$ is the corresponding filtered spectrogram image and $\max(X)$ is the maximum brightness of the original spectrogram image. The filter suppresses spectrogram pixels with brightness less than 10% of the maximum brightness. Fig.6(d) shows the 2-D spectrum estimation with 75 bpm heart rate for the noisy spectrogram in Fig.6(a). The heart rate extraction result confirms the accuracy of the proposed heart rate extraction method: there is an excellent agreement with the heart activity, which corresponds to a 75 bpm heart rate.
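    A minimal sketch of the threshold filter (4) on a spectrogram stored as a list of rows is shown below. The function name is hypothetical, and sending pixels exactly equal to the cutoff to zero is an assumption, since (4) leaves the equality case unspecified.

```python
def threshold_filter(image, fraction=0.1):
    """Keep pixels brighter than fraction * max(image); set the rest to 0.
    Pixels exactly at the cutoff are zeroed (assumed tie handling)."""
    peak = max(max(row) for row in image)
    cutoff = fraction * peak
    return [[v if v > cutoff else 0 for v in row] for row in image]


# Toy spectrogram image with a maximum brightness of 100 (cutoff = 10).
noisy = [[100, 5, 40],
         [9, 10, 11]]
cleaned = threshold_filter(noisy)
```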

    Fig.7(a) shows the 2-D spectrum estimation with 40 bpm heart rate for the ideal spectrogram in Fig.4. The horizontal axis of the 2-D spectrum represents heart rate frequency graded in bpm, while the vertical axis represents the frequency of speech formants graded in kHz. There is an excellent agreement between the heart rate of the original signal and the heart rate evaluated by the 2-D spectrum.

    This agreement is confirmed by the previously mentioned harmonic number four of the six-second speech signal, which corresponds to a 0.67 Hz frequency and to a 40 bpm heart rate. As a matter of fact, it conforms with our conclusion that human heart rates can be extracted using a number of maximum magnitude harmonics. Fig.7(b) shows the 2-D spectrum extraction for the heart rate estimation with 100 bpm heart rate for a real patient; the two parallel lines at the top of the 2-D spectrogram represent high frequency noise generated by the signal itself, and they are discarded by the proposed heart rate extraction algorithm. As mentioned before, the search region (the region of interest) is bounded from 40 to 200 bpm. Within it, the proposed method searches the spectrogram from 1 to 6 kHz to extract the heart rate. Consequently, the proposed heart rate extraction method evaluates 100 bpm as the average heart rate.

    4.2 Testing the Accuracy of Proposed Heart Rate Extraction Method

    With reference to the properties of the DFT[17], the first harmonic frequency of a signal is determined by its duration and all other harmonics are multiples of its first harmonic frequency; in our proposed heart rate extraction method, vowel speech signals are six seconds in length and the corresponding first harmonic frequency is 0.17 Hz. Consequently, the error of heart rate estimation in bpm can be evaluated by $E = 60/t_s$, where $t_s$ is the duration of the original vowel speech signal in seconds. For six-second recordings, adjacent harmonics are 10 bpm apart, so the heart rate error is acceptable (±5 bpm). Experimentally, the error does not exceed 9%, as shown in Table 1.
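    The effect of the recording length on the attainable resolution can be illustrated with a short sketch. Snapping a rate to the nearest harmonic is a simplifying assumption used here to show the ±E/2 bound; it is not presented in the paper as the estimation procedure itself.

```python
import math


def bpm_resolution(duration_s):
    """Spacing between adjacent harmonics in bpm: E = 60 / t_s."""
    return 60.0 / duration_s


def nearest_harmonic_bpm(true_bpm, duration_s):
    """Rate of the harmonic closest to true_bpm (ties round upward)."""
    step = bpm_resolution(duration_s)
    k = math.floor(true_bpm / step + 0.5)
    return k * step


step_6s = bpm_resolution(6.0)               # spacing for a 6 s recording
first_harmonic_hz = 1.0 / 6.0               # first harmonic of a 6 s signal
snapped_98 = nearest_harmonic_bpm(98, 6.0)
snapped_131 = nearest_harmonic_bpm(131, 6.0)
```

    Shorter recordings widen the spacing (for example, 20 bpm for a three-second recording), which is why the recording length bounds the achievable accuracy.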

    Table 1. Heart Rate Extraction Using the Proposed Heart Rate Extraction Method

    ID    Age (Year)    Oximeter (bpm)    Our Heart Rate Extraction (bpm)    Percentage Error (%)
    1     22            105               98                                 6.67
    2     25            89                85                                 4.49
    3     27            131               120                                8.40
    4     27            95                90                                 5.26
    5     23            105               100                                4.76
    6     25            103               100                                2.91
    7     23            98                90                                 8.16
    8     27            135               130                                3.70
    9     26            119               124                                4.20
    10    27            123               120                                2.44
    11    23            113               120                                6.19
    12    23            82                80                                 2.44
    13    22            115               120                                4.35
    14    23            112               120                                7.14
    15    23            105               98                                 6.67
    16    15            129               125                                3.10
    17    8             142               150                                5.63
    18    7             134               128                                4.48
    19    38            111               105                                5.41
    20    39            92                90                                 2.17
    21    37            105               100                                4.76

    Average percentage error (Accuracy = 95.08): 4.92
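    The summary figures can be recomputed from the rows of Table 1. The sketch below is an illustrative check in plain Python; the function names are hypothetical and the two lists simply transcribe the oximeter and extracted heart rate columns.

```python
import math

oximeter = [105, 89, 131, 95, 105, 103, 98, 135, 119, 123, 113,
            82, 115, 112, 105, 129, 142, 134, 111, 92, 105]
proposed = [98, 85, 120, 90, 100, 100, 90, 130, 124, 120, 120,
            80, 120, 120, 98, 125, 150, 128, 105, 90, 100]


def mean(values):
    return sum(values) / len(values)


def percentage_errors(reference, estimate):
    """Absolute error of each estimate relative to the reference, in percent."""
    return [100.0 * abs(r - e) / r for r, e in zip(reference, estimate)]


def rmse(reference, estimate):
    """Root mean square error between the two series."""
    return math.sqrt(mean([(r - e) ** 2 for r, e in zip(reference, estimate)]))


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


errors = percentage_errors(oximeter, proposed)
avg_error = mean(errors)          # average percentage error
accuracy = 100.0 - avg_error      # accuracy as reported in Table 1
rms = rmse(oximeter, proposed)
r = pearson(oximeter, proposed)
```

    The largest and smallest entries of `errors` correspond to rows 3 and 20 of the table.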

    To address the accuracy of the proposed method using different English vowels, a pilot study (see Fig.8) is conducted with randomly selected speakers; each of them is asked to pronounce different English vowels. Results reveal that the vowel /i:/ (as in the word "email") is more suitable for our proposed algorithm. Fig.8(a) shows the spectrum of the vowel speech signal for a patient with a 115 bpm heart rate who pronounced the vowel /i:/ (as in the word "email"); the heart rate is estimated as 110 bpm according to the position of points inside the region of interest (Fig.8(b)). Fig.8(c) shows the spectrum of the vowel speech signal /ɔ:/ (as in the word "four") for a patient with a 150 bpm heart rate; the estimated heart rate is 140 bpm according to the position of points inside the region of interest (see Fig.8(d)).

    It is known[21] that physical activity increases heart rate, cardiac output, and pulse amplitude. Immediately before the vowel speech recording, volunteers and patients are requested to perform a number of sit-up exercises to intensify the influence of heart activity (in our experiments, heart rate exceeded 120 beats per minute). 21 volunteers (7–39 years old) are requested to pronounce an English vowel (vowel /i:/). Each six-second vowel speech signal is recorded by the mentioned microphone, filtered by the high-pass filter with a 40 Hz cut-off frequency, sampled at a 44 kHz sampling frequency, transformed by the STFT with the mentioned parameters, and horizontally scanned by the scanning lines to produce the 1-D signals. Each extracted 1-D signal is filtered by a 5th-order FIR low-pass filter and, finally, a fast Fourier transform is applied to the filtered 1-D signal. Noise is suppressed using the threshold filter. The application of the order statistics filter and the Fourier transform to each scanning line of the spectrogram yields the 2-D spectrum; a bounded region of the 2-D spectrum (the region of interest described in Section 3) is then searched to extract the heart rate, and finally the heart rate is estimated.

    Table 1 summarizes the results of applying our non-contact heart rate extraction method and traditional contact pulse oximeters to the 21 volunteers. The correlation coefficient is 0.953, while the average percentage error is 4.92 and the root mean square error is 5.936. The heart rate estimation results of the proposed method are analyzed by a paired t-test. All analyses and tests are conducted in an explorative manner at a 5% level of significance. The computations are performed with the statistical software tool EasyFit 5.5. The mean (M) and the standard deviation (SD) of the oximeter results are 111.5714 and 16.3266 respectively. On the other hand, the M and SD of the proposed method are 109.1905 and 18.1235 respectively. The hypothesized mean difference is set as a null hypothesis (i.e., $M_1 - M_2$, where $M_1$ is the mean of the oximeter's results and $M_2$ is the mean of the proposed method's results), $\alpha$ is set to 0.05, the total degree of freedom is 39.5717, the difference in sample means is 2.3810, the t-test statistic is 0, the two-tailed test lower and upper critical values are −2.0227 and 2.0227, the p-value is 1, the confidence interval ranges from −8.3858 to 13.1477 and the error margin is 10.7668.

    Footnote: The 21 volunteers are patients and students. They were randomly selected and, when they agreed, they recorded their English vowel speech signals. Among them are 11 males and 10 females; their ages vary from 7 to 39 years (see Table 1).
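    The reported degrees of freedom, error margin and confidence interval can be reproduced from the quoted means and standard deviations if one assumes an unequal-variance (Welch) two-sample computation. This is an inference from the numbers rather than something the text states, so the sketch below should be read as a consistency check only. The critical value 2.0227 is taken from the text, since the Python standard library has no t-distribution.

```python
import math


def welch_summary(m1, s1, n1, m2, s2, n2, t_crit):
    """Welch-Satterthwaite degrees of freedom, margin of error and
    confidence interval for the difference of two sample means."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)                      # standard error of m1 - m2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = m1 - m2
    margin = t_crit * se
    return {"diff": diff, "df": df, "margin": margin,
            "ci": (diff - margin, diff + margin)}


# Values quoted in the text: oximeter vs. proposed method, 21 volunteers each.
summary = welch_summary(111.5714, 16.3266, 21,
                        109.1905, 18.1235, 21,
                        t_crit=2.0227)
```

    Because the hypothesized difference is set equal to the observed difference of the means, a t statistic of 0 and a p-value of 1 follow directly.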

    It is clear that our proposed heart rate extraction method works better than the non-automatic heart rate extraction method[12] in terms of accuracy (95.08% vs 92%). Moreover, our proposed heart rate extraction method handles noise. As a result, the average error is much less than that of [12] (4.92% vs 15%).

    4.3 Discussion

    The error of heart rate evaluation using traditional pulse oximeters (contact-based methods) does not exceed 2%; using our proposed heart rate extraction method (a non-contact method), the average percentage error does not exceed 5% (the best accuracy of the proposed heart rate extraction is 97.82%) unless the vowel speech recording is shorter than six seconds. The error does not exceed 9% (the lowest accuracy of the proposed heart rate extraction is 91.60%) for patients with insufficient respiratory lung volume who are not able to keep the same tone of vowel speech during the six-second recording, or when the recording is shorter than six seconds. Finally, it should be noted that contact methods (traditional pulse oximeters and ECG evaluation methods) and non-contact methods (our proposed heart rate extraction method) are not directly comparable. Nevertheless, our proposed heart rate extraction method is applicable especially in situations where contact-based heart rate extraction methods are unavailable or inapplicable, for example, when patients are located in remote regions and recording their speech signal with their mobile phones is the only convenient option.

    The proposed heart rate extraction method is not sensitive to the amplitude of formants, nor to the slope of formants, and it is able to accurately extract the heart rate in the presence of noise. The proposed approach is robust and is able to work in noisy environments; it discards noise (machine noise and side talk noise). However, the average error of the proposed heart rate extraction method does not exceed 5%, whereas the error of the oximeter is 2%. Generally speaking, the proposed heart rate extraction method is promising. Heart rate rises with the level of activity of a person; more blood is needed when a person is exercising than when he or she is at rest. To some extent, the heart rate of a transplanted heart also rises with the level of activity of the person. On the other hand, the heart rate of an artificial heart is fixed unless it is adapted to the patient's activity. We have not tested the proposed heart rate extraction method on persons with artificial or transplanted hearts (we could not record vowel speech signals for such patients).

    5 Conclusions

    In the modern mobile communication era, we believe that contactless heart rate monitoring methods are required, especially for heart patients. In this work, a non-contact heart rate extraction method from vowel speech signals is proposed. The proposed method is based on modeling the relationship between speech production of vowel speech signals and heart activities for humans. It uses the STFT to estimate heart rates and can successfully handle machine noise and side talk. Experimental results suggest that the proposed method could play an important role in modern medical applications. Although it does not outperform traditional pulse oximeters, the accuracy of the proposed heart rate extraction method is acceptable in practice. We do not claim that the proposed method works for persons with artificial or transplanted hearts; dealing with such patients is left for future work. The study of heart pathology using vowel speech signals is also left for future work.

    References

    [1] Nelson M, Rejeski W, Blair S et al. Physical activity and public health in older adults: Recommendation from the American College of Sports Medicine and the American Heart Association. Medicine & Science in Sports & Exercise, 2007, 39(8): 1435-1445.

    [2] Berntson G, Bigger J, Eckberg D et al. Heart rate variability: Origins, methods, and interpretive caveats. Psychophysiology, 1997, 34(6): 623-648.

    [3] Georgoulas G, Stylios C, Groumpos P. Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines. IEEE Trans. Biomedical Engineering, 2006, 53(5): 875-884.

    [4] Vasios G, Prentza A, Blana D et al. Classification of fetal heart rate tracings based on wavelet-transform and self-organizing-map neural networks. In Proc. the 23rd Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, October 2001, Vol.2, pp.1633-1636.

    [5] Linh T, Osowski S, Stodolski M. On-line heart beat recognition using Hermite polynomials and neuro-fuzzy network. IEEE Trans. Instrum. Meas., 2003, 52(4): 1224-1231.

    [6] Li S, Ji Y, Liu G. Optimal wavelet basis selection of wavelet shrinkage for ECG de-noising. In Proc. Int. Conf. Management and Service Science, September 2009, pp.1-4.

    [7] Hu Y, Palreddy S, Tompkins W. A patient-adaptable ECG beat classifier using a mixture of experts approach. IEEE Trans. Biomedical Engineering, 1997, 44(9): 891-900.


    [8] Moraes J, Seixas M, Vilani F, Costa E. A real time QRS complex classification method using Mahalanobis distance. In Proc. Computers in Cardiology, Sept. 2002, pp.201-204.

    [9] Papaloukas C, Fotiadis D, Likas A, Michalis L. Automated methods for ischemia detection in long duration ECGs. Cardiovascular Reviews & Reports, 2003, 24(6): 313-319.

    [10] Jager F. Feature extraction and shape representation of ambulatory electrocardiogram using the Karhunen-Loève transform. Electrotechnical Review, 2002, 69(2): 83-89.

    [11] Cuesta-Frau D, Perez-Cortes J, Andreu-García G, Novak D. Feature extraction methods applied to the clustering of electrocardiographic signals: A comparative study. In Proc. the 16th Int. Conf. Pattern Recognition, August 2002, Vol.3, pp.961-964.

    [12] Skopin D, Baglikov S. Heartbeat feature extraction from vowel speech signal using 2D spectrum representation. In Proc. the 4th Int. Conf. Information Technology, June 2009.

    [13] Pickett J. The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Allyn & Bacon, 1998.

    [14] Browman C, Goldstein L. Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 1990, 18: 411-424.

    [15] Maton A, Hopkins J, McLaughlin C et al. Human Biology and Health. New Jersey, USA: Prentice Hall, 1993.

    [16] Allen J, Rabiner L. A unified approach to short-time Fourier analysis and synthesis. Proceedings of IEEE, 1977, 65(11): 1558-1564.

    [17] Cohen L. Time-Frequency Analysis: Theory and Applications. New Jersey, USA: Prentice Hall, 1994.

    [18] Gonzales R, Woods R. Digital Image Processing (3rd edition), Prentice Hall, 2007.

    [19] Sezgin M, Sankur B. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 2004, 13(1): 146-168.

    [20] James A, Dimitrijev S. Inter-image outliers and their application to image classification. Pattern Recognition, 2010, 43(12): 4101-4112.

    [21] Turkbey E, Jorgensen N, Johnson W et al. Physical activity and physiological cardiac remodelling in a community setting: The Multi-Ethnic Study of Atherosclerosis (MESA). Heart and Education in Heart, 2010, 96(1): 42-48.

    Abdelwadood Mesleh received his B.Eng. and M.Sc. degrees in computer engineering from Shanghai University, China, in 1995 and 1998, respectively. He worked as a research and teaching assistant in the Electrical Engineering Department, Hong Kong University of Science and Technology, China, from 2004 to 2005. He received his Ph.D. degree in feature selection using ant colony optimization (ACO) for Arabic text articles from the Arab Academy for Banking and Financial Sciences, Jordan, in 2008. Since 2008, Dr. Mesleh has been an assistant professor in the Computer Engineering Department, Faculty of Engineering Technology, at Al-Balqa Applied University. His research interests include optimization, fuzzy logic, genetic algorithms, ACO, Arabic natural language processing, feature subset selection, Arabic speech recognition, MANETs, parallel processing, cryptanalysis, medical image and signal processing, and operating systems.

    Dmitriy Skopin received his M.Sc. and Ph.D. degrees in computer engineering from Kursk State Technical University, Russia, in 1995 and 1998, respectively. From September 2003 to August 2005, he was an associate professor at Kursk State Technical University. Since September 2005, he has been a staff member of Al-Balqa Applied University, Jordan. His research interests focus on signal and image processing, advanced programming, and computer graphics.

    Sergey Baglikov received his Ph.D. degree in computer engineering from Kursk State Technical University in 1998. He is currently the president of Help Medicom Group, which specializes in novel medical equipment and high-technology industries. His research interests are condition monitoring using infrared sensors, the control of power electronics, nanotechnologies, and digital signal processing.

    Anas Quteishat received his B.Eng. degree in electronics from Princess Sumaya University of Technology, Jordan, in 2003. He received his M.Sc. degree in electronic systems design and his Ph.D. degree in computational intelligence from University of Science Malaysia in 2005 and 2008, respectively. Currently, he is an assistant professor at Al-Balqa Applied University in Jordan. His research interests include neural networks, multi-agent systems, pattern classification, and rule extraction.
