CHAPTER IV
PREPROCESSING & FEATURE EXTRACTION IN ECG SIGNALS
The proposed ECG classification approach consists of three phases:
• Preprocessing
• Feature Extraction and Selection
• Classification
The complete process of the proposed approach is shown in Figure 4.1.
[Block diagram: MIT-BIH Arrhythmia Database → Noise Removal using Morphology Filter → Feature Extraction and Selection using DWT and AR Modeling → Classification using Machine Learning Techniques]
Figure 4.1: Block Diagram of the Complete Process of ECG Signal Classification
4.1. DATASET DESCRIPTION
The experiments were conducted on ECG data from the MIT–BIH arrhythmia database [107]. This database was the first commonly available set of standard test material for the evaluation of arrhythmia detectors, and it has been used for that purpose, as well as for basic research into cardiac dynamics, at more than 500 sites worldwide.
The MIT–BIH Arrhythmia Database (Figure 4.2) includes 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979.
Figure 4.2: MIT–BIH Arrhythmia Database
Twenty-three recordings were selected arbitrarily from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were chosen from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range. A sample ECG wave form is shown in Figure 4.3.
Figure 4.3: Sample ECG Wave from Physionet 2011 (MIT-BIH)
In particular, the considered beats refer to the following classes: Normal sinus rhythm (N), Atrial premature beat (A), Ventricular premature beat (V), Right bundle branch block (RB), Left bundle branch block (LB) and paced beat (/). The beats were selected from the recordings of 20 patients, which correspond to the following files: 100, 102, 104, 105, 106, 107, 118, 119, 200, 201, 202, 203, 205, 208, 209, 212, 213, 214, 215 and 217. The notes and statistics shown in Figure 4.4 describe the content of record 100.
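The digitization figures quoted for the database can be turned into two working numbers: the nominal length of a half-hour excerpt in samples, and the ideal quantization step. The short sketch below does this arithmetic. It assumes that the 2^11 levels span the 10 mV range evenly; the variable names are illustrative, and the gain stored with each record should be preferred when converting real data.

```matlab
% Figures quoted in the text for the MIT-BIH Arrhythmia Database.
fs        = 360;       % samples per second per channel
nBits     = 11;        % ADC resolution in bits
rangeMV   = 10;        % input range in millivolts
durationS = 30 * 60;   % nominal half-hour excerpt, in seconds

% Nominal number of samples per channel in one excerpt.
nSamples = fs * durationS;

% Ideal quantization step, assuming the 2^11 levels span the range evenly.
stepMV = rangeMV / 2^nBits;

fprintf('Nominal samples per channel: %d\n', nSamples);
fprintf('Ideal quantization step: %.4f microvolts\n', 1000 * stepMV);
```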
Figure 4.4: Notes and Statistics of Record 100 in MIT–BIH Arrhythmia Database
The wave forms of different diseases are shown in the following figure.
(a) Wave form of Normal Beat for Patient ID: 100
(b) Wave form of Atrial Fibrillation Disease for Patient ID: 201
(c) Wave form of Ventricular Premature Beat for Patient ID: 106
(d) Wave form of Right Bundle Branch Block Disease for Patient ID: 118
(e) Wave form of Paced Heart Beat for Patient ID: 102
(f) Wave form of Left Bundle Branch Block for Patient ID: 214
Figure 4.5: Six Types of ECG Signal Wave forms
4.2. PREPROCESSING
The performance of the classification depends not only on the classifier but also on the features and on enhanced ECG signal processing. The Morphology Filter (MF), a built-in function in MATLAB, is used to remove the noise component while preserving the ECG morphology and time-domain features.
ECG signals taken from the MIT-BIH arrhythmia database are used. MF-based pre-processing removes high-frequency noise components and baseline drift while preserving the ECG morphology. MF has the good quality of preserving the sharpness of the QRS complex. MF filters the baseline drift and high-frequency ECG noise with low distortion of the original ECG signal and with less computational burden [118].
The following command is used for preprocessing the ECG signal using the Morphology Filter.
Rsig=bwmorph(sig, 'clean');
Figure 4.6: Preprocessed ECG Signal of Record 100
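For illustration, the idea of morphological filtering on a one-dimensional signal can be sketched with flat structuring elements. Note that `bwmorph` is documented as a binary-image operation of the Image Processing Toolbox, so the sketch below is not the command used in this work. It is a minimal example of opening and closing built from the base functions `movmin` and `movmax`. The synthetic test signal, the window lengths (about 0.2 s and 0.3 s for the baseline, 5 samples for noise) and all variable names are assumptions chosen for the example.

```matlab
fs = 360;                          % sampling rate, as in Section 4.1
t  = (0:5*fs-1) / fs;              % 5 s of synthetic data

% Hypothetical test signal: narrow pulses as stand-in beats, slow drift, 60 Hz hum.
beats = zeros(size(t));
for c = 0.5 + (0:4)                % one pulse per second
    beats = beats + exp(-((t - c) / 0.010).^2);
end
drift = 0.5  * sin(2*pi*0.2*t);
hum   = 0.02 * sin(2*pi*60*t);
sig   = beats + drift + hum;

% Flat structuring elements: erosion = moving minimum, dilation = moving maximum.
erode   = @(x, k) movmin(x, k);
dilate  = @(x, k) movmax(x, k);
opening = @(x, k) dilate(erode(x, k), k);   % removes peaks narrower than k samples
closing = @(x, k) erode(dilate(x, k), k);   % fills pits narrower than k samples

% Baseline estimate: opening suppresses the beats, closing then fills the pits.
kOpen    = 2*round(0.10*fs) + 1;            % about 0.2 s
kClose   = 2*round(0.15*fs) + 1;            % about 0.3 s
baseline = closing(opening(sig, kOpen), kClose);
detrended = sig - baseline;

% Noise suppression: average of open-close and close-open with a short element.
kNoise = 5;
Rsig = 0.5 * (closing(opening(detrended, kNoise), kNoise) + ...
              opening(closing(detrended, kNoise), kNoise));
```

In this sketch the long windows must exceed the width of the widest wave to be kept out of the baseline estimate, while the short window must stay narrower than the QRS complex so that its sharpness is largely preserved.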
4.3. FEATURE EXTRACTION AND SELECTION
ECG beat recognition and classification is performed with temporal and morphological features. Since these features are very susceptible to variations in ECG morphology and in the temporal characteristics of the ECG, it is difficult to distinguish one beat from another on the basis of the time waveform or frequency representation [66, 68]. In this phase, two different classes of feature set belonging to the isolated ECG beats are used: auto-regressive model parameters and the variance of discrete wavelet transform detail coefficients for the different scales (1–6 scales) [97].
A. Wavelet Transformation
In this research work, the feature extraction was done by applying the Discrete Wavelet Transform. The benefit of the wavelet transformation lies in its capacity to highlight the significant amount of information about the ECG signal.
Physiological signals used for diagnosis are frequently characterized by non-stationary time behavior. For such patterns, time and frequency representations are desirable. The frequency characteristics, in addition to the temporal behavior, can be described subject to the uncertainty principle. The wavelet transform can represent signals at different resolutions by dilating and compressing its basis functions [72]. While the dilated functions adapt to slow wave activity, the compressed functions capture fast activity and sharp spikes. The most favorable choice of wavelet function for pre-processing is problem dependent. In this phase, the Daubechies wavelet function (Db5), a compactly supported orthonormal wavelet, is used [101]. The DWT is obtained by discretizing the scaling factor and the position factor. For the orthonormal wavelet transform, the discrete signal x(n) can be expanded into the scaling function at level j as follows:
$$x(n) = \sum_{k \in Z} a_{j,k}\,\phi_{j,k}(n) + \sum_{k \in Z} d_{j,k}\,\psi_{j,k}(n) \tag{4.1}$$
where $a_{j,k}$ and $d_{j,k}$ represent the approximated and the detailed signal at level j, respectively. Note that j controls the dilation or contraction of the scale function $\phi(t)$, k denotes the position of the wavelet function $\psi(t)$, and n represents the sample number of $x(n)$. Here $k \in Z$, where Z represents the set of integers. The frequency spectrum of the signal is divided into high-frequency and low-frequency bands for wavelet decomposition as the level increases $(j = 1, \ldots, 6)$. The wavelet transform is a two-dimensional time-scale processing method for non-stationary signals with adequate scale values and shifting in time [102].
Multiresolution decomposition can efficiently provide simultaneous characteristics, in terms of a representation of the signal at multiple resolutions corresponding to different time scales. Feature vectors are constructed from the normalized variances of the detail coefficients and the P-QRS-T coefficients of the DWT that belong to the related scales.
The wavelet decomposition of the ECG signal was done by the following commands.
% Six-level decomposition with the Db5 wavelet. Each level splits the previous
% approximation (a) into a new approximation and a detail (d); every component
% is plotted on its own GUI axes.
axes(handles.axes1);
plot(sig);
ylabel('Signal');
axes(handles.axes8);
[a1,d1]=dwt(sig,'db5');   % level 1
plot(d1);
ylabel('d1');
axes(handles.axes7);
[a2,d2]=dwt(a1,'db5');    % level 2
plot(d2);
ylabel('d2');
axes(handles.axes6);
[a3,d3]=dwt(a2,'db5');    % level 3
plot(d3);
ylabel('d3');
axes(handles.axes5);
[a4,d4]=dwt(a3,'db5');    % level 4
plot(d4);
ylabel('d4');
axes(handles.axes4);
[a5,d5]=dwt(a4,'db5');    % level 5
plot(d5);
ylabel('d5');
axes(handles.axes3);
[a6,d6]=dwt(a5,'db5');    % level 6
plot(d6);
ylabel('d6');
axes(handles.axes2);
plot(a6);                 % final approximation
ylabel('a6');
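The same six-level decomposition can be written as a loop that also collects the variance of the detail coefficients at each scale, which is the quantity the feature vector is built from. The following is a minimal sketch. It uses the Wavelet Toolbox function `dwt` as in the listing above, the placeholder input signal is an assumption made so that the example runs on its own, and dividing by the sum of the variances is only one possible reading of "normalized variances".

```matlab
% Placeholder input so the sketch runs on its own; replace with an ECG beat or record.
sig = sin(2*pi*(0:1023)/64) + 0.1*cos(2*pi*(0:1023)/5);

nLevels   = 6;                  % scales 1-6, as in the text
approx    = sig;
detailVar = zeros(1, nLevels);
for j = 1:nLevels
    % Split the current approximation into the next approximation and detail.
    [approx, detail] = dwt(approx, 'db5');
    detailVar(j) = var(detail); % variance of the detail coefficients at scale j
end

% Normalization by the total is an assumption; other normalizations are possible.
features = detailVar / sum(detailVar);
```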
The following figures show the output wave forms of preprocessing and
morphology feature extraction.
Figure 4.7: Output Screen Demanding the Signal Number from User
Figure 4.8: Resulted Signal of Record 100 after Preprocessing & DWT
Then the P-QRS-T points are constructed from the normalized variances of the detail coefficients of the DWT that belong to the related scales. The points are constructed according to the following procedure.
In order to detect the peaks, specific details of the signal were selected. R peaks are the largest-amplitude points: the maxima that exceed a threshold are located in the wave. These maxima are stored and the R-R intervals are determined. Their mean value is found and is used to delimit the portion of the signal that holds a single wave. The Q and S points are found as the local minimum points before and after the R wave. The Q peak is located by calculating the distance from the zero point, or the point closest to zero, on the left side of the R peak within the threshold limit. The onset is the beginning of the Q wave and the offset is the end of the S wave. Normally, the onset of the QRS complex contains the high-frequency components, which are detected at finer scales. The S peak is located in the same way from the zero point, or the point closest to zero, on the right side of the R peak within the threshold limit. Based on the PR interval and the QT interval, the P and T points are determined respectively.
Figure 4.9: Detected P-QRS-T Features for Signal 100
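The R, Q and S steps of this procedure can be sketched in a few lines. The example below is only an illustration under stated assumptions: it works directly on a synthetic time-domain beat train rather than on selected DWT details, the threshold is taken as half of the largest amplitude, and the Q and S points are searched in a 60 ms window on either side of each R peak. None of these values come from the source.

```matlab
fs = 360;
t  = (0:4*fs-1) / fs;

% Hypothetical beat train: a positive spike (R) flanked by small dips (Q, S).
sig = zeros(size(t));
for c = 0.5 + 0.8*(0:4)
    sig = sig + exp(-((t - c) / 0.010).^2) ...
              - 0.2 * exp(-((t - c + 0.03) / 0.008).^2) ...
              - 0.3 * exp(-((t - c - 0.03) / 0.008).^2);
end

% R peaks: local maxima above the threshold.
thr    = 0.5 * max(sig);                       % assumed threshold
mid    = sig(2:end-1);
isPeak = mid > sig(1:end-2) & mid >= sig(3:end) & mid > thr;
rLoc   = find(isPeak) + 1;                     % sample indices of the R peaks

% R-R intervals and their mean, in seconds.
rr     = diff(rLoc) / fs;
meanRR = mean(rr);

% Q and S: minima in a short window before and after each R peak.
w    = round(0.06 * fs);                       % assumed 60 ms search window
qLoc = zeros(size(rLoc));
sLoc = zeros(size(rLoc));
for i = 1:numel(rLoc)
    lo = max(rLoc(i) - w, 1);
    hi = min(rLoc(i) + w, numel(sig));
    [~, k] = min(sig(lo:rLoc(i)));  qLoc(i) = lo + k - 1;
    [~, k] = min(sig(rLoc(i):hi));  sLoc(i) = rLoc(i) + k - 1;
end
```

On real recordings a refractory period between accepted R peaks is usually needed as well, since noise can produce several local maxima above the threshold within one QRS complex.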
B. Higher-order Statistics and AR Modeling
Additional statistical data are utilized for ECG signal feature detection. For this purpose, this research work proposes a complete procedure to extract temporal features using third-order cumulant based AR modeling.
The main problem in automatic ECG beat recognition and classification is that the related features are very susceptible to variations in ECG morphology and in the temporal characteristics of the ECG. In the study [96], the set of original QRS complexes typical of six types of arrhythmia taken from the MIT/BIH arrhythmia database shows great variation of the signal among beats belonging to the same type of arrhythmia. Therefore, in order to solve this problem, this approach relies on the statistical features of the ECG beats. For this purpose, the third-order cumulant has been taken into account. For zero-mean signals, the second-, third- and fourth-order cumulants are determined as follows:
$$c_{2x}(k) = E\{x(n)\,x(n+k)\} \tag{4.2}$$
$$c_{3x}(k,l) = E\{x(n)\,x(n+k)\,x(n+l)\} \tag{4.3}$$
$$c_{4x}(k,l,m) = E\{x(n)\,x(n+k)\,x(n+l)\,x(n+m)\} - c_{2x}(k)\,c_{2x}(m-l) - c_{2x}(l)\,c_{2x}(m-k) - c_{2x}(m)\,c_{2x}(l-k) \tag{4.4}$$
where E represents the expectation operator, and k, l, and m are the time lags. In this phase, the third-order cumulant of selected ECG beats is used. Ten normalized points, evenly distributed within the range of 25 lags, represent the cumulant. Linear prediction models each successive sample of a signal as a linear combination of previous samples, that is, as the output of an all-pole IIR filter. This process finds the coefficients of an nth-order auto-regressive linear process that models the time series x as
$$\hat{x}(k) = -a(2)\,x(k-1) - a(3)\,x(k-2) - \cdots - a(n+1)\,x(k-n) \tag{4.5}$$
where x represents the real input time series (a vector) and n is the order of the denominator polynomial a(z). In block processing, the autocorrelation method is one of the all-pole modeling methods used to find the linear prediction coefficients. This method is also called the Maximum Entropy Method (MEM) of spectral analysis.
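The two ingredients of this subsection, the sample cumulants of (4.2) and (4.3) and the autocorrelation method behind (4.5), can be illustrated on a synthetic signal. The sketch below is not the procedure used in this work. The test signal (an AR(2) process driven by skewed noise), the use of the diagonal slice of the third-order cumulant, the normalization by the largest absolute value and all names are assumptions made for the example; only base MATLAB functions are used.

```matlab
rng(1);                                  % reproducible example
N = 4096;
w = -log(rand(1, N)) - 1;                % zero-mean, skewed (exponential) noise
aTrue = [1 -0.9 0.5];                    % assumed all-pole model 1/a(z)
x = filter(1, aTrue, w);
x = x - mean(x);                         % the definitions assume a zero-mean signal

% Sample estimates of (4.2) and of the diagonal slice k = l of (4.3).
maxLag = 25;
c2     = zeros(1, maxLag + 1);
c3diag = zeros(1, maxLag + 1);
for k = 0:maxLag
    c2(k+1)     = sum(x(1:N-k) .* x(1+k:N))    / N;
    c3diag(k+1) = sum(x(1:N-k) .* x(1+k:N).^2) / N;
end

% Ten evenly spaced points within the 25 lags, normalized (assumed scheme).
idx    = round(linspace(1, maxLag + 1, 10));
c3feat = c3diag(idx) / max(abs(c3diag(idx)));

% Autocorrelation (Yule-Walker) method: solve R*a = -r for a(2)...a(n+1).
n = 2;
R = toeplitz(c2(1:n));
a = [1, (-(R \ c2(2:n+1).')).'];
errVar = c2(1) + a(2:end) * c2(2:n+1).'; % prediction-error variance
```

With enough samples the estimated vector `a` should be close to `aTrue`, which gives a simple check that the sign convention of (4.5) has been followed.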
The following commands are used for temporal feature extraction from the preprocessed ECG signal using AR modeling and the third-order cumulant. The inputs to the function are x, the input signal vector; p, the optimal AR model order; and Fs, the sampling frequency. This part of the code determines the AR parameters.
function [A,E,K]=AR(x,p,Fs)
% Estimates the AR parameters from forward and backward prediction errors.
% Fs is not used in this part of the code.
A=zeros(p+1,p+1);
K=zeros(1,p);
E=zeros(1,p);
y=x(:).';   % y is a row vector (the source listing used y without defining it)
N=length(y);
% initialization
Rxx=(y*y')/N;
ef=y; % ef(n)=y(n)
eb=y; % eb(n)=y(n)
DEN=y(2:N)*y(2:N)'+y(1:N-1)*y(1:N-1)';
Num=y(2:N)*y(1:N-1)';
K(1)=2*Num/DEN; %K(L)=-R(L)
A(1,1)=-K(1);
E(1)=Rxx*(1-K(1)^2);
ef(2:N)=y(2:N)-K(1)*y(1:N-1);
eb(1:N-1)=y(1:N-1)-K(1)*y(2:N);
% Calculation
for L=2:p
    Num=ef(L+1:N)*eb(1:N-L)';
    Den=ef(L+1:N)*ef(L+1:N)'+eb(1:N-L)*eb(1:N-L)';
    K(L)=2*Num/Den;
    E(L)=E(L-1)*(1-K(L)^2);
    A(L,L)=-K(L);
    for j=1:L-1
        A(L,j)=A(L-1,j)-K(L)*A(L-1,L-j);
    end
    efm=ef;
    ebm=eb;
    ef(L+1:N)=efm(L+1:N)-K(L)*ebm(1:N-L);
    eb(1:N-L)=ebm(1:N-L)-K(L)*efm(L+1:N);
end
% Prepend the leading 1 and keep the final row: A=[1 A1 A2...Ap]
B=zeros(p+1,p+1);
B(:,1)=ones(p+1,1);
B(2:p+1,2:p+1)=A(1:p,1:p);
A=B(p+1,:);
% End
The spectrum is evaluated from the AR parameters by the following fragment.
% Spectrum(f)=e(L)/|1+A(L,1)*exp(-j*2*pi*f/Fs)+
%             ...+A(L,L)*exp(-j*2*pi*f*L/Fs)|^2
% a, e, order, fsamp and Nfreq come from the calling code, which the source
% does not show.
for i=1:Nfreq
    den=0;
    for k=2:order+1
        den=a(k)*exp(-1i*2*pi*(i-1)*(k-1)/Nfreq)+den;
    end
    power(i)=e(order)/abs(1+den)^2;
end
freq=0:fsamp/Nfreq:(Nfreq/2-1)*fsamp/Nfreq;
The outputs are:
A: AR parameters; the final set of p AR parameters is returned as the row vector A=[1 A1 A2...Ap], so that
e(n)=y(n)+A1*y(n-1)+A2*y(n-2)+... +Ap*y(n-p);
E: error variance vector=[E(1),E(2),...,E(p)];
K: a row vector of reflection coefficients at each calculating
step (from 1 to order p)
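Given a parameter vector of the form A=[1 A1 ... Ap] and a final error variance, the AR power spectrum can be evaluated on a frequency grid without explicit loops. The following sketch shows one way to do this. The parameter values, the number of frequency points `Nfreq` and the variable names are hypothetical and are chosen only so that the example runs on its own.

```matlab
A     = [1 -0.9 0.5];   % hypothetical AR parameters [1 A1 A2]
Ep    = 1;              % hypothetical final prediction-error variance
Fs    = 360;            % sampling frequency in Hz
Nfreq = 512;            % assumed number of frequency points over [0, Fs)

k = 0:numel(A)-1;                          % delay of each coefficient
f = (0:Nfreq/2-1) * Fs / Nfreq;            % frequencies from 0 up to Fs/2

% Denominator polynomial evaluated on the unit circle, one row per frequency.
den = exp(-1i * 2*pi * (f.' / Fs) * k) * A.';
P   = Ep ./ abs(den).^2;                   % AR power spectrum estimate

[~, iMax] = max(P);
fPeak = f(iMax);                           % frequency of the spectral peak
```

For these hypothetical parameters the peak falls near 47.5 Hz, which follows from setting the derivative of the denominator magnitude to zero for a second-order model.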
Hence, the noise components in the ECG signal are removed in the preprocessing phase using the morphology filter. From the preprocessed signal, the P-QRS-T points are constructed using the DWT. Temporal features of the preprocessed ECG signal are extracted using third-order cumulant based AR modeling. These two feature sets form the input vectors for the classifiers.