Characterization and Segmentation of Acoustic Swallowing Signals
Collected Concurrently with Dual-axis Accelerometry
by
Navid Zohouri Haghian
A thesis submitted in conformity with the requirements for the degree of Master of Health Science
Graduate Department of Institute of Biomaterials and Biomedical Engineering, University of Toronto
© Copyright 2014 by Navid Zohouri Haghian
Abstract
Characterization and Segmentation of Acoustic Swallowing Signals Collected Concurrently with
Dual-axis Accelerometry
Navid Zohouri Haghian
Master of Health Science
Graduate Department of Institute of Biomaterials and Biomedical Engineering
University of Toronto
2014
Dysphagia, a physiological impairment that involves difficulty swallowing, can arise due to a variety of
different disease and injury processes. The current gold standard in assessment for dysphagia is the use
of videofluoroscopy (VFS). Due to the need for radiation exposure in VFS, the development of valid,
noninvasive screening technologies is desirable. In this thesis, acoustics were explored as a potential
means of screening for dysphagia. The foundation of this work requires pinpointing swallow segments in
acoustic signals, which was done through the implementation of a novel acoustic swallow segmentation
algorithm. The application of this algorithm to data from 44 healthy participants swallowing water
samples resulted in sensitivities of 86%, 94% and 92% for detecting 10 mL, 5 mL and saliva swallows,
respectively. Moreover, this work analyzed the suitability of different signal features for characterizing
acoustic swallows and artifacts.
Acknowledgements
I would like to dedicate this work to my mother, father and sister who provided the best possible support
for the continuation of my education. I would also like to thank my supervisor, Dr. Catriona M. Steele,
without whom this work would not be possible. One could not ask for better guidance and supervision.
Lastly, I would like to express my gratitude to all of my colleagues at the Swallowing Rehabilitation
Research Lab who supported my research towards the completion of my thesis.
Contents
1 Introduction 1
1.1 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Deglutition Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Oral Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Pharyngeal Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Esophageal Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Dysphagia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Swallowing Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.1 Videofluoroscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.2 Cervical Auscultation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.3 Standardized Swallow Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.4 Accelerometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Acoustic Swallowing Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Automatic Swallow Segmentation 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Signal Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Stationary Wavelet Transform Decomposition (SWT) . . . . . . . . . . . . . . 11
2.2.5 Signal Envelope Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.6 Signal Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Swallow Signal Characterization 22
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Overview of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.1 Signal Segment Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.3 Fisher Feature Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.4 Logistic Regression - Binary Classification . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Acoustics and Swallow Screening 47
4.1 Acoustics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1.1 Acoustic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
List of Tables
2.1 Statistics comparing acoustic and nasal cannula segmentation . . . . . . . . . . . . . . . . 18
2.2 Statistics comparing acoustic and Accel-1 segmentation . . . . . . . . . . . . . . . . . . . 18
2.3 Statistics comparing acoustic and Accel-2 segmentation . . . . . . . . . . . . . . . . . . . 18
2.4 Performance comparisons between accelerometry and acoustic automatic segmentation . 19
3.1 Automatic Segmentation Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Descriptive statistics of features for 5 mL, 10 mL and saliva sample types . . . . 32
3.3 Fisher’s projection and feature weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Sensitivity and Specificity of binary classification . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Algorithm Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 Descriptive statistics of features from 5 mL and 10 mL swallow samples . . . . . . . 49
4.2 Descriptive statistics of features from 5 mL and 10 mL artifact samples . . . . . . . 50
List of Figures
1.1 A) A liquid bolus is taken into the mouth and held between the tongue and hard palate.
B) The bolus is squeezed backwards in the mouth. C) The bolus enters the upper pharynx.
D) The bolus travels down through the pharynx and the upper esophageal sphincter. The
opening to the airway is closed during this phase. E) The bolus has entered the esophagus
and the upper esophageal sphincter closes behind the bolus tail. F) The pharynx returns
to a rest position and the airway opens to allow breathing to resume. . . . . . . . . . . . . 2
1.2 Accelerometry screening setup [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Participant task flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Stationary Wavelet Transform Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 3 level approximation of Stationary Wavelet Decomposition . . . . . . . . . . . . . . . . . 12
2.4 3 level details of Stationary Wavelet Decomposition . . . . . . . . . . . . . . . . . . . . . . 13
2.5 original (top) and modified (bottom) signal envelope . . . . . . . . . . . . . . . . . . . . . 14
2.6 Envelope signal (blue) and Binary signal (red) superimposed . . . . . . . . . . . . . . . . 16
2.7 Segmentation signal superimposed on acoustic signal (top) and nasal cannula signal (bottom) 16
3.1 Signal Characterization Flow Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Sigmoid Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Amplitude Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Amplitude Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Amplitude Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Amplitude Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7 Dominant Frequency (Hz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.8 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.9 Centroid Frequency (Hz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.10 Average Wavelet Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.11 DWT Energy Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.12 Fractal Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.13 Wavelet Filtered Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.14 Dominant Frequency - PSD (Hz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.15 Signal Energy - 75 to 100 (Hz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.16 Amplitude Distribution Width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.17 Wavelet Energy Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.18 Variance - Signal Squared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.19 Feature weights of Fisher projections . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.20 Binary classification using logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 AKG C411 PP transducer frequency response and polarity . . . . . . . . . . . . . 48
Chapter 1
Introduction
1.1 Rationale
The work presented in this thesis builds on previous research in the field of automated detection and clas-
sification of swallowing events. It explores the realm of acoustics through analysis of acoustic swallowing
signals, which were collected concurrently with swallowing accelerometry signals. This chapter presents
the background knowledge required to understand the anatomy, physiology and pathology of swallow-
ing. It further explains dysphagia (swallowing impairment) and the incentive to develop non-invasive
screening tools to detect dysphagia.
1.2 Deglutition Overview
Ingestion of food or liquid is typically done through the oral cavity via the act of mastication and
deglutition (swallowing). Mastication consists of the mechanical breakdown of food via the use of the
teeth and the tongue. Once this process is complete, a bolus is formed through the mixture of the
fragmented food and saliva. This is followed by transfer of the bolus to the pharynx and swallowing.
The entire act requires the coordination of many different neuromuscular components, and involves both
voluntary and involuntary responses. Swallowing involves three main phases [2]:
• Oral phase
• Pharyngeal phase
• Esophageal phase
Figure 1.1: A) A liquid bolus is taken into the mouth and held between the tongue and hard palate. B) The bolus is squeezed backwards in the mouth. C) The bolus enters the upper pharynx. D) The bolus travels down through the pharynx and the upper esophageal sphincter. The opening to the airway is closed during this phase. E) The bolus has entered the esophagus and the upper esophageal sphincter closes behind the bolus tail. F) The pharynx returns to a rest position and the airway opens to allow breathing to resume.
1.2.1 Oral Phase
The oral phase involves mastication (chewing) of food in order to mechanically break it down. Further-
more, it involves the addition of saliva in order to produce a bolus, which possesses a suitable consistency
for swallowing. Concurrently, the lips are sealed and the soft palate is lowered to prevent spillage of
bolus material into the pharynx (see Figures 1.1A and B). The tongue aids this action through forming
a cup shape against the hard palate. This stage occurs while the vocal cords remain open to allow
breathing via the nasal passages. Subsequent to mastication, the tongue pushes the bolus to the back
of the mouth via a squeezing movement [2].
1.2.2 Pharyngeal Phase
During the pharyngeal phase, the bolus enters the pharynx (throat) as the soft palate moves up to seal
the entrance to the nasal cavity [2]. The contraction of many of the pharyngeal and suprahyoid muscles
causes the distance between the thyroid cartilage and the hyoid (hyolaryngeal complex) to shorten. This
phenomenon is referred to as the hyolaryngeal excursion or the hyoid burst. As the bolus enters the
pharynx and is carried down towards the esophagus, the epiglottis deflects to cover the entrance to the
airway and the larynx. Moreover, both the true and false vocal folds close to protect the airway from
aspiration (entry of foreign material into the airway). This involuntary process is followed by the opening
of the Upper Esophageal Sphincter (UES) to allow the entrance of the bolus into the esophagus. These
components are illustrated in Figures 1.1C and D.
1.2.3 Esophageal Phase
As the bolus enters the esophagus the nasal and oral cavities open to allow respiration to resume. As
the vocal folds open, a small period of exhalation occurs. Moreover, the larynx and the epiglottis return
to their original resting position. This process is followed by peristalsis, which is a continual contraction
of the esophageal muscles to propel the bolus down to the stomach.
1.3 Dysphagia
Dysphagia is a term that broadly describes difficulty swallowing; this disorder can occur as part of
many different disease and injury processes. A patient with dysphagia has difficulty in implementing
one or more of the phases described above. Two main concerns in dysphagia are aspiration and penetration.
Aspiration describes the entrance of foreign matter (food/liquids) into the airway below the level of the
true vocal folds. Penetration entails diminished severity, where foreign matter enters the supraglottic
space but does not pass below the true vocal folds [3],[2]. Aspiration can contribute to negative outcomes
including pneumonia. Dysphagia may also lead to other negative sequelae including malnutrition, weight
loss, immune system deficiencies and psychological burden. Etiologies of dysphagia include, but are not
limited to, stroke, neurological disorders such as Multiple Sclerosis, trauma, head/neck surgeries, and
head/neck cancer [2]. Different clinical signs point to the specific component/process at fault during
a swallow. In addition to penetration and aspiration, residue is a serious concern in dysphagia, which
occurs when materials are left behind in the oral/pharyngeal cavities after a swallow.
1.4 Swallowing Assessment
1.4.1 Videofluoroscopy
The gold standard diagnostic tool for the assessment of swallowing is videofluoroscopy (VFS) [4]. During
this x-ray procedure a participant who is suspected of potential dysphagia is given a variety of different
food and liquid samples to eat/drink, with specific consistencies. The samples are typically mixed
with barium contrast, allowing visualization on the x-ray video. Multiple indicators are continuously
monitored on the x-ray by the speech-language pathologist (SLP) and/or other clinicians. This includes
observation for indicators of aspiration, penetration or residue.
VFS is of great value because it provides detailed, high resolution imaging of the anatomical structures
while providing continuous monitoring of anatomical landmarks and functions [4]. However, the VFS
has some disadvantages. The device is bulky and requires large hospital/clinical space allocation. It
cannot be used for regular or repeated bed-side assessment and requires appointments and long waiting
lists. Furthermore, although the x-ray exposure falls within a safe range, the patients are still exposed
to radiation and dose deposition.
1.4.2 Cervical Auscultation
Cervical Auscultation (CA) is the practice of listening to swallows via a device such as a stethoscope
in order to screen for signs of dysphagia. The main phase that is analyzed is the pharyngeal phase
[5]. The device is typically placed laterally on the neck, over the cricoid cartilage [6]. Distinguishing
features, which may aid in identifying dysphagia, are: 1) the time of deglutition, obtained from the onset
and offset of the swallow; 2) delay, based on the period of time from initiation of deglutition apnea to the
first acoustic burst; and 3) post-swallow changes in breath sounds [5]. Moreover, dysphagic individuals tend to
have a higher number of swallows per bolus. Cervical auscultation is considered a subjective method
of swallowing assessment and currently lacks validation. Sources of possible error and artifact include
variability between dysphagia patients, inconsistent methods of sound amplification, and silent aspiration.
1.4.3 Standardized Swallow Screening
Many best practice guidelines recommend the use of screening tests to identify signs of dysphagia early
in a patient’s healthcare episode [7]. The standardized protocols described for swallow screening include:
V-VST (volume-viscosity swallowing test)
During this examination, the participant is fed three different sample viscosities in the following
sequence: nectar, water, and pudding. The bolus volumes range from 5 mL to 10 mL to 20 mL. The
samples are presented to the participant using a syringe. In addition to observing for signs of aspiration,
peripheral blood oxygen saturation is measured using pulse oximetry. The baseline value is obtained
prior to the examination and is then compared to the values obtained after swallowing. Drops in oxygen
saturation are interpreted as a sign of possible aspiration [8],[9],[10].
TOR-BSST (Toronto Bedside Swallowing Screening Test)
Designed specifically for stroke patients, this screening test starts by evaluating the cognitive state of the
participant, and performance of basic oral motor function and vocalization tasks. This is then followed
by feeding of 10 x 1 tsp. sips of water [9]. The test outcome is a simple binary fail or pass and stops at
the first failed item. A failed response flags the subject for further assessment.
Yale Swallow Protocol/3 oz. Water Swallow Test
The main principle in this approach involves presentation of a large bolus size to stimulate potential
aspiration/penetration in suspected participants [11]. Potential signs of difficulty include a wet voice,
choking or coughing [12],[9].
1.4.4 Accelerometry
The use of accelerometry for swallow screening is based on cervical auscultation, which implies a distinc-
tion between healthy and unhealthy swallows on the basis of the sound and vibrations produced during
a swallow. The use of dual-axis accelerometry provides a systematic, non-invasive screening tool that
can be used at the bedside with minimal training. Figure 1.2 demonstrates the general setup for using
this technology.
Using this tool requires the attachment of the accelerometry sensor to the person's neck in midline over
the cricoid cartilage. The signals are then collected while the individual performs different tasks such as
drinking/eating different samples. As the method is still under development, many different papers
have been published on the use of different signal processing, pattern classification and machine learning
techniques towards improving performance [13].
Figure 1.2: Accelerometry screening setup [1]
1.5 Acoustic Swallowing Signals
The practice of cervical auscultation revolves around the analysis of swallowing acoustics for screening
patients for dysphagia. This concept has been under study and is the principal idea behind the use
of accelerometry and acoustics. The work by Morinière has suggested that swallow sounds are
comprised of three main components: 1) the ascension of the larynx; 2) the opening of the
upper esophageal sphincter; and 3) the laryngeal release [14].
The laryngeal ascension causes sound production as the hyoid bone moves up, while the bolus is in
the oropharynx. The opening of the upper esophageal sphincter and the passage of the bolus through
the sphincter then produce a second sound component. Lastly, opening of the larynx and the pharynx
produces sound while the bolus is passing through the esophagus [14].
The supporting work builds the foundation for the evaluation of acoustics as a means of non-invasive
screening of swallows. Prior to the analysis of swallow acoustics, the localization of acoustic swallow
segments within time (temporal domain) is required. The work in Chapter 2 demonstrates a signal
processing technique which segments swallow signals as a basis for further analysis of swallow acoustics.
Chapter 2
Automatic Swallow Segmentation
2.1 Introduction
Dysphagia is a term that describes discomfort and difficulty with the act of swallowing. Dysphagia
encompasses a large spectrum of patient populations and can arise due to factors such as age, trauma,
cancer, surgery, psychological and/or neurological conditions. The act of swallowing itself involves the
contraction of a series of both voluntary and involuntary muscles. Thus, damage to either the somatic
or autonomic nervous systems can result in dysphagia [3].
One of the primary concerns for people with dysphagia is aspiration (the entry of foreign matter into
the airway). This may happen during swallowing of solid/liquid food or saliva. Aspiration is a serious
concern, and may lead to severe coughing, choking, airway obstruction, or lower respiratory infections
(i.e., aspiration pneumonia) [3]. A second primary concern for those living with dysphagia is swallowing
e�ciency; many people with dysphagia collect food or liquid residues in the pockets of the pharynx or
require multiple swallows to move a single food bolus from the mouth to the esophagus. Malnutrition as
a result of swallowing ine�ciency can contribute to constant fatigue and other medical anomalies such
as weakening of the immune system. In addition to these concerns, dysphagia can negatively impact
one’s quality of life by restricting a person’s ability to engage in the many social rituals that involve
eating and drinking [3].
The current gold standard for evaluating swallowing is the videofluoroscopic swallowing study (VFS).
During this process, the patient is placed within an x-ray imaging system. However, rather than taking
a single x-ray image, a dynamic x-ray video is obtained while the participant is provided with different
samples to swallow. This provides a high-resolution display of the anatomy and physiology of the
oro-pharyngeal region. All samples are barium coated to provide high contrast on the x-ray. The test
itself requires large fluoroscopy equipment, exposure to radiation and it is not available at the point
of care. Due to these constraints, the development of valid and reliable non-invasive approaches to
swallowing assessment is considered desirable. Methods that have been explored include the analysis of
parameters such as peripheral blood oxygen saturation (using pulse oximetry) [15], nasal airflow (using
nasal cannula), and either sounds or vibrations monitored from the neck (using either microphones or
accelerometers). Several recent studies explore the use of dual-axis accelerometers to measure neck
vibrations during different tasks such as coughing and swallowing [16]. Advancements in the field of digital
signal processing and pattern recognition have permitted the extraction of valuable information from
these biomedical signals towards segmenting swallow portions and eventually distinguishing individuals
with either healthy or impaired swallowing.
The analysis of swallowing signals first requires the temporal localization of swallow segments (i.e.,
segmentation). The goal of the present study was to explore the utility of acoustic swallowing signals for
segmenting swallowing signals in comparison to dual-axis accelerometry signals. As the acoustic signal
is obtained in the time domain, proper segmentation serves as an important first step in ensuring the
characterization of the signal using the correct time segments. An automatic segmentation algorithm
was developed towards locating these segments on the basis of signal energy distribution in acoustic
swallowing signals. Nasal airflow signals contain information identifying the periods of swallowing apnea
(SA) that occurs during swallowing. This feature was used as the reference standard for identifying the
temporal location of swallows.
2.2 Methods
2.2.1 Participants
The study involved 44 participants (22 males and 22 females) with a mean age of 35 (standard deviation
of 13). Two participants were excluded as they did not fully complete the required tasks. Participants
were all healthy individuals who reported no symptoms of dysphagia or prior neurological dysfunction.
The local institutional research ethics committee approved the study. Each participant was formally
consented prior to data collection.
2.2.2 Signal Acquisition
The data collection protocol required the attachment of 4 sensors:
• A dual-axis accelerometer, attached to the neck in midline, right below the thyroid cartilage by a
licensed speech-language pathologist.
• A contact microphone (model AKG C411 PP), attached to the neck via double-sided tape, 1-2 cm
laterally to the right of the accelerometer.
• A nasal cannula attached to the nares to measure the breathing of the participant during each
task.
• A headset microphone to measure ambient sound in the room. This was used to validate sources
of artifacts picked up by the contact microphone and to confirm any protocol deviations.
Figure 2.1: Participant task flowchart
The protocol was comprised of different tasks including: counting out loud from 1-5; coughing;
throat clearing; humming; breathing quietly; swallowing saliva; drinking 5 mL/10 mL of water by cup;
and drinking water from a straw. For the purpose of developing the segmentation algorithm, the following
three tasks were chosen: 1) swallowing 10 mL of water by cup; 2) swallowing 5 mL of water by cup; 3)
swallowing saliva. The water samples were prepared before the data collection session in appropriate
portions and placed in front of the participant during the session.
The tasks were performed in a randomized sequence with each task performed twice for a total of 20
tasks. During each task, a total of three realizations/repetitions of that particular task were obtained
subsequent to a visual cue. A sample protocol schematic is presented in Figure 2.1.
2.2.3 Pre-processing
Prior to the decomposition of the acoustic signal, the raw data were preprocessed to remove
unwanted artifacts and noise. The data collection sessions required the use of an isolation
transformer as part of the equipment setup as a safety measure. As a result, it was necessary to remove a
dominant 60 Hz frequency component from the raw data, which was accomplished using a notch filter with
a center frequency of Fc = 60 Hz.
The acoustic signals were subsequently low-pass filtered using an IIR filter. The filter was designed
with a corner frequency set to remove components above the threshold τ, i.e., the high frequencies
contributing to the last 5% of the energy spectrum. In doing so, each signal was filtered with a unique
corner frequency, Fc.
\[ E_s(j\omega) = \frac{1}{N}\sum_{i=0}^{N-1} \lvert X(j\omega_i)\rvert^2 \tag{2.1} \]
\[ \tau = (0.95)\,E_s(j\omega) = E_s(j\omega_c) \tag{2.2} \]
\[ F_c = \frac{\omega_c}{2\pi} \tag{2.3} \]
Here E_s(jω) is the signal energy and N is the length of the Fourier spectrum X(jω). The corner
frequency, Fc, at the energy value τ was then used to design an IIR low-pass filter. After low-pass filtering
the acoustic data, the signals were downsampled by a factor of 75. This reduced the sampling
frequency from the original 22.5 kHz to 300 Hz, which was well above the minimum Nyquist rate of
~200 Hz. The low-pass filtering allowed for proper downsampling of the signal, reducing processing
complexity and mitigating potential signal aliasing. Lastly, each sample was normalized by dividing
the signal by its maximum absolute value, giving an amplitude range of -1 to 1 for all samples.
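The pre-processing chain described above can be sketched as follows. This is an illustrative reconstruction, not the thesis implementation: the notch Q factor and the Butterworth order are assumptions, since the text specifies only "IIR" filters.

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt, decimate

def preprocess(x, fs=22500):
    """Notch 60 Hz, low-pass at the 95%-energy corner, downsample to 300 Hz, normalize."""
    # 1) Remove the dominant 60 Hz mains component (Q is an assumed value).
    b, a = iirnotch(60.0, Q=30.0, fs=fs)
    x = filtfilt(b, a, x)

    # 2) Corner frequency Fc below which 95% of spectral energy lies (eqs. 2.1-2.3).
    spec = np.abs(np.fft.rfft(x)) ** 2
    cum = np.cumsum(spec) / np.sum(spec)
    fc = np.fft.rfftfreq(len(x), d=1.0 / fs)[np.searchsorted(cum, 0.95)]

    # 3) IIR low-pass at Fc (4th-order Butterworth chosen here for illustration).
    b, a = butter(4, fc, btype="low", fs=fs)
    x = filtfilt(b, a, x)

    # 4) Downsample by 75 (= 5 * 5 * 3) in stages, as recommended for large factors;
    #    decimate() applies its own anti-aliasing filter at each stage.
    for q in (5, 5, 3):
        x = decimate(x, q)

    # 5) Normalize by the maximum absolute value to the range [-1, 1].
    return x / np.max(np.abs(x))
```

Running a one-second 40 Hz test tone through this chain yields 300 output samples with unit peak amplitude and an unchanged dominant frequency.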
2.2.4 Stationary Wavelet Transform Decomposition (SWT)
The first step after signal pre-processing was the decomposition of the acoustic signals into different
energy levels. As the acoustic biomedical signals in this study were non-stationary, the use of the
frequency spectrum alone is not adequate. The Discrete Wavelet Transform (DWT) allows for multi-
resolution analysis in both the time and frequency domains. However, the DWT possesses the property
of time/shift variance, which causes the loss of valuable time information, as each decomposition level
reduces the time resolution by a factor of two. Thus, the Stationary Wavelet Transform was utilized
instead, where the high-pass, H, and the low-pass, G, filters are zero-padded (upsampled) at each
decomposition level.
Figure 2.2: Stationary Wavelet Transform Flowchart
\[ y_{\text{approx}}[k] = \sum_{n=-\infty}^{\infty} x[n]\,G[k-n] \tag{2.4} \]
\[ y_{\text{detail}}[k] = \sum_{n=-\infty}^{\infty} x[n]\,H[k-n] \tag{2.5} \]
The signals were each decomposed using a 3-level SWT, resulting in a decomposition matrix
containing 6 rows of N sampled coefficients. These were comprised of three approximations,
output from the low-pass filter, and three details, output from the high-pass filter. Figure 2.2
provides a schematic of the SWT workflow. The mother wavelet used for the decomposition was the
Daubechies 6 wavelet. This kernel was chosen on the basis of its morphological properties: it allowed the
capture of desirable components, such as the transient aspects of the swallows. The 3 levels were chosen
on the basis of the energy distribution of the signals. The second decomposition level proved to contain
the most valuable and useful information and was used for further analysis. Figures 2.3 and 2.4 present
sample 3-level approximation and detail wavelet decompositions, respectively.
Figure 2.3: 3 level approximation of Stationary Wavelet Decomposition
Figure 2.4: 3 level details of Stationary Wavelet Decomposition
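The filter-upsampling scheme of Figure 2.2 can be illustrated with a minimal NumPy sketch. For brevity this uses the 2-tap Haar filter pair rather than the Daubechies 6 wavelet used in the thesis, and circular convolution via the FFT; it demonstrates only the à-trous idea, not a production SWT:

```python
import numpy as np

# Haar analysis filters (the thesis uses Daubechies 6; Haar keeps the sketch short).
G = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass  -> approximations
H = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass -> details

def swt(x, levels=3):
    """Undecimated (stationary) wavelet transform: the filters are zero-padded
    (upsampled) by 2**(level-1) at each stage instead of downsampling the
    signal, so every level keeps the full N-sample time resolution."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out, approx = [], x
    for level in range(1, levels + 1):
        up = 2 ** (level - 1)
        g = np.zeros(len(G) * up); g[::up] = G   # insert zeros between taps
        h = np.zeros(len(H) * up); h[::up] = H
        # circular convolution via the FFT keeps the output length at N
        A = np.real(np.fft.ifft(np.fft.fft(approx) * np.fft.fft(g, n)))
        D = np.real(np.fft.ifft(np.fft.fft(approx) * np.fft.fft(h, n)))
        out.append((A, D))
        approx = A
    return out  # 3 levels -> 6 rows of N coefficients each
```

For a constant input the details vanish at every level and the approximations scale by √2 per level, which is a quick sanity check on the filter upsampling.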
2.2.5 Signal Envelope Extraction
A signal envelope in the time domain is a very useful and informative representation of the energy
distribution. The envelope of the acoustic signal was determined using the logical AND (mathematical
product) of the envelopes obtained from the approximation and detail coe�cients of the 2nd decompo-
sition level described above. This was done to maintain similar features across signals, and to diminish
the magnitude of features that di↵ered.
The envelopes y[n]_{1,2} were obtained using a stepwise process, beginning with filtering each signal using the Savitzky-Golay filter, S. This is a polynomial-based operation that fits a p-order polynomial function to a dataset of length k. As the polynomial order increases, the filter output retains more high-frequency components. Following this logic, a 3rd-order polynomial filter was used to contour the overall shape of the signal, limiting higher frequency components while maintaining lower frequency components in the obtained envelope. This was executed using a window length of k = 401 samples, chosen to optimize the effect of the filter in obtaining a smooth envelope. The advantage of this operator is that it provides a real and flat frequency response. On this basis, the absolute values of the approximations and details were smoothed using the filter, normalized and multiplied together. The closed form of these steps can be seen in equations 2.6, 2.7 and 2.8.
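The smoothing step can be sketched with a numpy-only least-squares Savitzky-Golay filter. The defaults follow the values stated above (k = 401, 3rd-order polynomial); this is an illustrative sketch, not the thesis implementation.

```python
import numpy as np

def savgol_smooth(x, window=401, order=3):
    """Least-squares Savitzky-Golay smoothing (numpy only).
    Fits an `order`-degree polynomial over each centred window and
    evaluates it at the window centre."""
    half = window // 2
    # polynomial design matrix over one centred window
    t = np.arange(-half, half + 1)
    A = np.vander(t, order + 1, increasing=True)
    # row 0 of the pseudo-inverse gives the fitted value at the centre,
    # i.e. the Savitzky-Golay convolution coefficients
    coeffs = np.linalg.pinv(A)[0]
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")
```

Because the fit is exact for polynomials up to the chosen order, a cubic segment passes through the filter unchanged while higher-frequency ripple is attenuated.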
The resulting envelope was then further processed using a Non-linear Energy Operator (NEO). The main goal of this step was to further accentuate potential swallows within the signal, as this operator is very sensitive to transient portions [17]. Equation 2.9 presents the NEO in discrete form. Lastly, to obtain the final signal envelope used for segmentation, the NEO output was filtered using the smoothing filter, and negative values were set to zero (equation 2.11). Figure 2.5 compares the original and final envelope signals.
y_{\text{level}}[n] = |x[n]| \otimes S   (2.6)

y_{\text{level}}[n] = y_{\text{level}}[n] - \frac{1}{N}\sum_{n=0}^{N} y_{\text{level}}[n]   (2.7)

\psi[n] = y_{2,\text{approx}}[n] \cdot y_{2,\text{detail}}[n]   (2.8)

\Phi[n] = \psi^2[n] - \psi[n+1]\,\psi[n-1]   (2.9)

\psi_{\text{final}}[n] = |\Phi[n]| \otimes S   (2.10)

\psi_{\text{final}}[n] = \begin{cases} 0 & \psi_{\text{final}}[n] < 0 \\ \psi_{\text{final}}[n] & \psi_{\text{final}}[n] \ge 0 \end{cases}   (2.11)
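The NEO of equation 2.9 is a one-liner in practice. A small sketch (illustrative, not the thesis code), using the known property that the operator turns a pure tone into a constant proportional to its energy:

```python
import numpy as np

def neo(psi):
    """Discrete Non-linear Energy Operator (eq. 2.9):
    Phi[n] = psi[n]^2 - psi[n+1] * psi[n-1], endpoints set to zero."""
    psi = np.asarray(psi, dtype=float)
    out = np.zeros_like(psi)
    out[1:-1] = psi[1:-1] ** 2 - psi[2:] * psi[:-2]
    return out
```

For a sinusoid A·cos(ωn) the output is the constant A² sin²(ω), which is what makes the operator sensitive to both the amplitude and the frequency of transient components.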
Figure 2.5: Original (top) and modified (bottom) signal envelopes
2.2.6 Signal Segmentation
Segmentation of the acoustic signal requires a time-domain binary signal \Delta, where a magnitude of 1 corresponds to a potential swallow portion and a magnitude of 0 corresponds to non-swallow portions. Using the signal envelope, the binary segmentation signal was generated in the following manner:
a) A global threshold \alpha was obtained as the mean value of the envelope signal.

\alpha = \frac{1}{N}\sum_{n=0}^{N-1} \psi_{\text{final}}[n]   (2.12)
b) The index m of the maximum value of \psi_{\text{final}}[n] was obtained. The past and future values of \psi_{\text{final}}[m] were recursively compared to \alpha to obtain \psi'_{\text{final}}[n].

m = \arg\max_n \psi_{\text{final}}[n]   (2.13)

C = \psi_{\text{final}}[m \pm k] \quad \text{for } k = 1, 2, \ldots, (N-1)-m   (2.14)

\psi'_{\text{final}}[n] = \begin{cases} 0 & C > \alpha \\ \psi_{\text{final}}[n] & C \le \alpha \end{cases}   (2.15)

Here the future and past samples of point m were compared to \alpha in order to obtain the signal \psi'_{\text{final}}[n]. In this recursive process, the values that no longer meet the condition of being greater than the threshold are used to obtain the time indices that correspond to the pre- and post-swallow boundaries. These indices are placed in matrix I, where row 1 contains pre-swallow and row 2 contains post-swallow indices for N potential swallows.

I_{i,j} = \begin{bmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,N} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,N} \end{bmatrix}   (2.16)
c) This process was repeated until \max_n \psi_{\text{final}}[n] < \alpha. Figure 2.6 shows the binary signal superimposed on the envelope signal, and figure 2.7 demonstrates the binary segmentation signal (red) superimposed on the acoustic signal (blue). For the envelope signal \psi_{\text{final}}[n], a binary signal \Delta[n] is obtained using matrix I:
Figure 2.6: Envelope signal (blue) and Binary signal (red) superimposed
Figure 2.7: Segmentation signal superimposed on acoustic signal (top) and nasal cannula signal (bottom)
\Delta[n] = \begin{cases} 1 & I_{1,j} \le n \le I_{2,j} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } j = 1, 2, \ldots, N   (2.17)
As is apparent in the top signal of figure 2.7, there are other transient portions that were not segmented by the algorithm. This is due to two processing steps: the application of the Savitzky-Golay filter and the Non-linear Energy Operator. The former diminishes the magnitude of transients with very short durations, while the latter emphasizes this effect as a function of the energy of the transient component.
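The complete segmentation loop of steps a) through c) can be sketched as follows. This is an illustrative reimplementation under the assumption that "walking outward from the peak until the envelope drops below \alpha" realizes the recursive comparison of equations 2.13–2.15; it is not the thesis code.

```python
import numpy as np

def segment_envelope(envelope):
    """Thresholded segmentation sketch (section 2.2.6): repeatedly find
    the envelope maximum (eq. 2.13), walk outward until the envelope
    drops below the global mean threshold alpha (eq. 2.12), record the
    onset/offset pair (matrix I, eq. 2.16), and zero the region
    (eq. 2.15) before the next pass."""
    env = np.asarray(envelope, dtype=float).copy()
    alpha = env.mean()
    segments = []
    while env.max() > alpha:
        m = int(np.argmax(env))
        lo, hi = m, m
        while lo > 0 and env[lo - 1] > alpha:
            lo -= 1
        while hi < len(env) - 1 and env[hi + 1] > alpha:
            hi += 1
        segments.append((lo, hi))
        env[lo:hi + 1] = 0.0
    # binary segmentation signal Delta[n] (eq. 2.17)
    delta = np.zeros(len(env))
    for lo, hi in segments:
        delta[lo:hi + 1] = 1.0
    return sorted(segments), delta
```

Zeroing each detected region before the next pass is what guarantees the loop terminates once every remaining sample falls below \alpha.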
2.3 Validation
The results of this signal segmentation algorithm were compared with the nasal airflow reference signals,
in which the temporal locations of swallowing apnea were identified by two trained SLPs. Each nasal
airflow signal was used to obtain a pre-apnea and post-apnea time index, which together represented
the total period of a potential swallow apnea event. In addition, the results of the acoustic signal
segmentation algorithm were compared to two automatic segmentation algorithms, accel-1 and accel-
2, applied to the concurrently collected accelerometry signals. Accel-1 and accel-2 were designed to
segment swallows from temporal accelerometry signals. Based on this process, comparison of the four
segmentation methods was performed by identifying the mid-point of the segmented portion, M , from
each signal and determining time di↵erences in the locations of this mid-point.
M = \frac{\text{Swallow Onset} + \text{Swallow Offset}}{2}   (2.18)
Furthermore, if the signal segment identified through the segmentation algorithm did not overlap by at least 50% with the reference swallow apnea event, segmentation was considered to have failed. The validation of the acoustic segmentation algorithm was quantified through a "sensitivity" parameter. The sensitivity of the algorithm was obtained with reference to all the other sources, such as the accelerometry signal, the acoustic signal and the nasal cannula signal, to validate swallow presence. Thus, the number of true swallows detected (true positives, A) and the number of missed swallows (false negatives, B) were used to quantify sensitivity.
\text{Sensitivity} = \frac{A}{A + B} \times 100   (2.19)
Similar to sensitivity, specificity quantifies the ability of the algorithm to correctly identify non-swallow events. However, as true-negative events cannot be distinguished from other non-swallowing artifacts in our data, this parameter was not used as a validation tool.
2.4 Results
Descriptive statistics for the time differences in segment midpoint between the acoustic and accelerometry segmentation results and the nasal cannula reference signal are shown in tables 2.1, 2.2 and 2.3 by task.
Table 2.1: Statistics comparing acoustic and nasal cannula segmentation

Sample Type    M_Airflow − M_Acoustic    95% Confidence Interval
10 mL          0.05 ± 0.375 (s)          0.004 to 0.104
5 mL           0.07 ± 0.381 (s)          0.019 to 0.127
Saliva         −0.09 ± 0.327 (s)         −0.139 to −0.051

Table 2.2: Statistics comparing acoustic and Accel-1 segmentation

Sample Type    M_Accel-1 − M_Acoustic    95% Confidence Interval
10 mL          −0.22 ± 0.382 (s)         −0.274 to −0.167
5 mL           −0.24 ± 0.391 (s)         −0.296 to −0.185
Saliva         −0.27 ± 0.343 (s)         −0.331 to −0.221

Table 2.3: Statistics comparing acoustic and Accel-2 segmentation

Sample Type    M_Accel-2 − M_Acoustic    95% Confidence Interval
10 mL          −0.21 ± 0.354 (s)         −0.265 to −0.165
5 mL           −0.255 ± 0.385 (s)        −0.310 to −0.201
Saliva         −0.27 ± 0.329 (s)         −0.332 to −0.226
The performance of the acoustic segmentation algorithm was quantified through the sensitivity parameter. True swallows and acoustic artifacts were first manually distinguished using all four signals (i.e., contact acoustics, accelerometry, nasal cannula and ambient sound). All four modalities were thus used to evaluate each signal individually and classify it as a true swallow. Given a contradiction between the modalities, the signals were individually analyzed and a decision was made on the basis of majority agreement. These data were used as the reference for evaluating the performance of the algorithm. Table 2.4 shows the different sensitivity values obtained for both acoustic and accelerometry segmentation.
Table 2.4: Performance comparisons between accelerometry and acoustic automatic segmentation
Sample Type    Accel-1 Sensitivity    Acoustic Sensitivity
10 mL          88%                    86%
5 mL           92%                    94%
Saliva         71%                    92%
2.5 Discussion
The results demonstrate similar performance between the acoustic segmentation and the accel-1/2 algorithms. The sensitivity values obtained for the acoustics are very similar for the 5 mL and 10 mL samples. However, acoustic segmentation out-performed the accel-1 algorithm by 21% for the saliva swallows. This indicates a potential difference in the source of the signal, as saliva samples display a more prominent acoustic signal during a swallow.
Based on the average values of the segment midpoints, M, the difference between the accel-1 and acoustic segmentation suggests that the accel-1 window centers are located at an earlier time index relative to the acoustic segment midpoints. This results in a negative time difference between the acoustic and accel-1 M values. In contrast, the difference in segment midpoint timing between the nasal cannula and acoustic signals was much smaller for all three tasks. The correspondence between nasal cannula and acoustic segmentation is further emphasized by the statistics in table 2.1, demonstrating very close M values.
Given the difference between transducing vibrations through an accelerometer and a microphone, the frequency content of acoustic signals is expected to be lower than that of accelerometry signals. This follows from the relationship between acceleration and displacement: because acceleration is the second derivative of displacement, differentiation accentuates higher frequency components of the signal.
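A quick numerical check of this claim (illustrative only; the 5 Hz and 50 Hz tones and the sampling rate are arbitrary choices, not thesis data):

```python
import numpy as np

fs = 1000.0                                    # sampling rate (arbitrary)
t = np.arange(0, 1.0, 1.0 / fs)
# "displacement": two tones of equal amplitude
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 50 * t)
# second derivative ~ "acceleration": each Fourier component is scaled
# by omega^2, so the 50 Hz tone gains roughly (50/5)^2 = 100x relative
# to the 5 Hz tone
a = np.gradient(np.gradient(x, 1.0 / fs), 1.0 / fs)
Xf, Af = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(a))
boost = (Af[50] / Af[5]) / (Xf[50] / Xf[5])    # ~100 up to discretization
```

The relative boost comes out close to the theoretical (ω₂/ω₁)² factor, slightly reduced by the finite-difference approximation of the derivative.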
2.6 Conclusion
One of the crucial steps towards the advancement of automatic dysphagia diagnosis is the temporal localization of different swallowing phenomena in swallowing signals. This work demonstrates a new method of acoustic signal segmentation on the basis of signal energy distribution within transient components. Acoustics provide potential benefits and novel information that is not redundant with accelerometry signals. This was demonstrated through the comparison of accelerometry-based and acoustic-based segmentation algorithms.
The segmentation of acoustic swallowing signals has proven comparable to that of accelerometry signals. Furthermore, the developed automatic acoustic segmentation outperformed the accelerometry-based algorithms for saliva swallow detection. The method proved efficacious in automatically segmenting acoustic signals of 5 mL, 10 mL and saliva sample swallows. This paves the way for further analysis of acoustic signals to evaluate their feasibility for use in classification algorithms for discriminating impaired versus healthy swallows.
Chapter 3
Swallow Signal Characterization
3.1 Introduction
The current practice of dysphagia screening involves limitations that provide the motivation to develop more intelligent, non-invasive tools for bedside screening [18], [19]. Current efforts to develop such technologies include the use of accelerometry [20], [21]. Similar to accelerometry, acoustics have demonstrated potential for use in swallow screening. In either case, the acquired signal must be properly segmented; a viable technique for segmentation was described in chapter 2.
Given the current performance and the desire to improve the sensitivity of the segmentation algorithm, a characterization module was developed with the goal of making the segmentation algorithm intelligent. This module allows each segmented portion to be evaluated in order to ensure that only true swallow portions have been detected, so that artifacts such as coughs, speech, heavy breathing sounds and similar non-swallowing portions are not incorrectly identified as swallows. This chapter describes the characterization technique used to differentiate real swallows from artifacts. The outcome was compared to results which were manually categorized as "swallow" or "non-swallow" on the basis of all four collected signals: acoustics, accelerometry, nasal cannula and ambient sound.
3.1.1 Overview of Algorithm
The decision of whether a signal segment is an actual swallow or an artifact is made using the following machine learning algorithm. The segmented portions output by the algorithm in chapter 2 are first fed into a feature extraction subroutine where different signal features are extracted. These features are then collapsed into a smaller feature set using Fisher discriminant analysis. This is followed by the application of a logistic regression model, which learns from a 40-sample training set. The output is a binary classification into "swallows" and "non-swallows". Figure 3.1 presents a flowchart of the algorithm and the individual steps.
[Flowchart: Segmented Acoustic Signal → Feature Extraction → Fisher Discriminant Analysis → Logistic Regression Modeling → Binary Classification (True Swallow / Artifact)]
Figure 3.1: Signal Characterization Flow Chart
3.2 Methods
3.2.1 Signal Segment Extraction
Using the algorithm explained in chapter 2, signal segmentation was implemented to obtain time indices of potential swallow onsets and offsets. All of the segmented portions were extracted and compared to the other signal sources: the acoustic signals, accelerometry, the nasal cannula signal and the ambient sound. This was executed manually to obtain a gold standard categorizing true swallows and artifacts, in order to validate the performance of the binary classifier. Table 3.1 shows the number of true swallows and falsely detected artifacts for all three sample types. It must be noted that these numbers do not represent the entire number of swallows, as the algorithm might not pick up every single swallow. Therefore, the true swallows and falsely detected artifacts do not add to the same total for all three sample types.
Table 3.1: Automatic Segmentation Output
Sample Type    Swallows    Artifacts
5 mL           249         23
10 mL          256         29
Saliva         233         11
3.2.2 Feature Extraction
Due to large variability within the signals, representative features were extracted to aid differentiation between real swallows and artifacts. Features were chosen on the basis of the time-, frequency- and time-frequency-domain parameters that best emphasized distinct attributes within signal samples.
• The mean amplitude represents the average of the amplitudes; it is obtained in the following manner:

\mu = \frac{1}{N}\sum_{n=1}^{N} x[n]   (3.1)
• The variance of the signal quantifies the average squared deviation of samples from the mean value:

\sigma^2 = \frac{1}{N}\sum_{n=1}^{N} (x[n] - \mu)^2   (3.2)
• Skewness is a measure of the asymmetry of a distribution; it indicates whether the majority of the amplitudes are lower or higher in magnitude:

\text{Skewness} = \frac{\frac{1}{N}\sum_{n=1}^{N}(x[n]-\mu)^3}{\left\{\frac{1}{N}\sum_{n=1}^{N}(x[n]-\mu)^2\right\}^{3/2}}   (3.3)
• Kurtosis is a way of quantifying the peakedness of the amplitude distribution, and it is quantified in the following manner:

\text{Kurtosis} = \frac{\frac{1}{N}\sum_{n=1}^{N}(x[n]-\mu)^4}{\left\{\frac{1}{N}\sum_{n=1}^{N}(x[n]-\mu)^2\right\}^{2}}   (3.4)
• The dominant frequency corresponds to the frequency component of the Fourier transform with the highest weight. It is quantified in the following manner:

X(j\omega) = \sum_{n=0}^{N-1} x[n]\,e^{-j\omega n}   (3.5)

\omega_{\max} = \arg\max_\omega |X(j\omega)|^2, \quad F_{\text{dominant}} = \omega_{\max}/2\pi   (3.6)
• Entropy is a measure of the signal's randomness; it is quantified in the following manner [22]:

H(x[n]) = \sum_{-\infty}^{+\infty} FT(x[n])\,\log(FT(x[n]))   (3.7)
• The centroid frequency quantifies the weighted frequency distribution in the frequency domain; it is quantified in the following manner:

\bar{f} = \frac{\sum_{f=0}^{F_{\max}} f\,|X(j\omega)|^2}{\sum_{f=0}^{F_{\max}} |X(j\omega)|^2}   (3.8)
• The average wavelet energy represents the energy distribution in the scale domain, where a 3-level continuous wavelet decomposition is implemented using the db6 mother wavelet.

y_{1,2,3}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_0^\infty x(t)\,\psi^*\!\left(\frac{t-\tau}{s}\right) dt   (3.9)

\bar{s} = \frac{y_1 + y_2 + y_3}{3}   (3.10)

E = \sum_{n=1}^{\infty} |FT(\bar{s})|^2   (3.11)
• The wavelet energy ratio builds on the previous feature. It takes the energy ratio between the 2nd decomposition level and the original signal. This is obtained by first decomposing the signal using a 2-level discrete wavelet decomposition via the db6 mother wavelet, chosen for its filter characteristics and the useful morphologies it extracts from the signal.

y_{1,2}[n] = x[n] \otimes H[n] = \sum_{k=-\infty}^{\infty} x[k]\,H[n-k]

X_1(j\omega) = \sum_{n=-\infty}^{\infty} y_1[n]\,e^{-j\omega n}, \quad E_1 = \sum_{n=0}^{\infty} |X_1(j\omega)|^2

X_2(j\omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \quad E_2 = \sum_{n=0}^{\infty} |X_2(j\omega)|^2

\text{Ratio} = \frac{E_1}{E_2}   (3.12)
• The wavelet filtered fractal parameter captures the fractal complexity of a signal that has been filtered through a wavelet decomposition. The parameter is useful as it removes noisy components and evaluates the complexity of the signal. The filtering is done through a 3-level stationary wavelet decomposition using the db6 mother wavelet. The 2nd energy level was used to obtain the fractal feature in the following manner:

FD = \frac{\log(L/a)}{\log(d/a)}   (3.13)

Here, L is the sum of the distances between successive time indices, and d is the distance between the first sample (n = 1) and the sample at the largest distance from it [23]. Furthermore, as different time series exist on a variety of different time scales, the basic time unit a is used to normalize the series [23].
• Wavelet filtered entropy quantifies the randomness of the signal after it has been filtered using a stationary continuous wavelet transform. The decomposition was done using a db6 mother wavelet, breaking the signal down into 3 energy levels. The second energy level is used to obtain the signal entropy. This is obtained in the following manner:

y(t) = \frac{1}{\sqrt{|s|}} \int_0^\infty x(t)\,\psi^*\!\left(\frac{t-\tau}{s}\right) dt   (3.14)

H(y_2) = \sum_{-\infty}^{+\infty} y_2(t)\,\log(y_2(t))   (3.15)
• Dominant frequency from the Power Spectral Density (PSD). The power spectral density estimate is derived by taking the Fourier transform of the autocorrelation signal. This places more emphasis on the dominant components in the signal. It is obtained in the following manner:

R_{xx}[n] = \sum_{\tau=-\infty}^{\infty} x(\tau)\,x(\tau - n)   (3.16)

S_{xx}(j\omega) = \sum_{n=-\infty}^{\infty} R_{xx}[n]\,e^{-j\omega n}   (3.17)

\omega_{\max} = \arg\max_\omega (S_{xx}[\omega]^2), \quad F'_{\text{dominant}} = \omega_{\max}/2\pi   (3.18)
• Signal energy of the top 25% of the frequency distribution. This quantifies the distribution corresponding to higher frequency components. It is obtained through the Fourier transform of the signal:

X(j\omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \quad E = |X(\omega_s)|^2, \quad \omega_s = 2\pi f \text{ where } f = 75\text{--}100\ \text{Hz}   (3.19)
• The amplitude distribution width, W, quantifies the range of different amplitudes that exist within the signal. It is obtained by taking the absolute value of the signal, filtering it using an 8th-order Savitzky-Golay filter S, and taking the width of the amplitude distribution at half the maximum point of the distribution. The application of S has a normalizing effect on the signal, hence allowing for a normal distribution. This method is another way of quantifying the organization/disorganization of the signal [24]. Both c1 and c2 are points on the amplitude distribution at half the value of its maximum point; taking the difference between these two points gives the width at half maximum.

y = x[n] \otimes S   (3.20)

Y \sim N(\mu, \sigma^2), \quad c_{1,2} = \frac{\max(Y)}{2}   (3.21)

W = Y(c_2) - Y(c_1)   (3.22)
• The continuous wavelet energy ratio takes advantage of the filtering property of wavelet decomposition. The signal is decomposed using a continuous stationary wavelet transform via the db6 mother wavelet. The second-level decomposition, y2, is chosen for analysis and taken into the frequency domain using the Fourier transform. The ratio of its energy between f = 50–100 Hz and f = 0–49 Hz is taken as a feature, representing the relative high-frequency content in the transient portions of sample signals. The Stationary Wavelet Transform is explained in detail in subsection 2.2.4 of chapter 2.

y_{1,2,3}(t) = \frac{1}{\sqrt{|s|}} \int_0^\infty x(t)\,\psi^*\!\left(\frac{t-\tau}{s}\right) dt   (3.23)

X(j\omega) = \int_{-\infty}^{\infty} y_2(t)\,e^{-j\omega t} dt   (3.24)

E_1 = \int_0^{f_{\max}/2} |X(j\omega)|^2   (3.25)

E_2 = \int_{f_{\max}/2}^{f_{\max}} |X(j\omega)|^2   (3.26)

\text{Ratio} = \frac{E_2}{E_1}   (3.27)
• The variance of the signal squared is obtained by multiplying the signal by itself in the time domain. This emphasizes large transient portions while diminishing low-frequency portions of the signal. The variance of this new signal is then obtained in the following manner:

y = x[n] \cdot x[n]   (3.28)

\mu = \frac{1}{N}\sum_{n=0}^{N-1} y[n], \quad \sigma^2 = \frac{1}{N}\sum_{n=0}^{N-1} (y[n] - \mu)^2   (3.29)
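Several of the statistical and spectral features above can be sketched in a few lines of numpy. This is an illustrative sketch, not the thesis code; the sampling rate fs is a placeholder assumption, not a value stated in this section.

```python
import numpy as np

def extract_features(x, fs=10000.0):
    """Compute a handful of the listed features (eqs. 3.1-3.8).
    fs is an assumed sampling rate used only to label frequency bins."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = x.mean()                                    # mean, eq. 3.1
    m2, m3, m4 = [((x - mu) ** k).mean() for k in (2, 3, 4)]
    spec = np.abs(np.fft.rfft(x)) ** 2               # |X(jw)|^2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return {
        "mean": mu,
        "variance": m2,                              # eq. 3.2
        "skewness": m3 / m2 ** 1.5,                  # eq. 3.3
        "kurtosis": m4 / m2 ** 2,                    # eq. 3.4
        "dominant_freq": freqs[np.argmax(spec)],     # eq. 3.6
        "centroid_freq": (freqs * spec).sum() / spec.sum(),  # eq. 3.8
    }
```

Each segmented portion from chapter 2 would yield one such feature vector, later stacked into the 16-dimensional feature set used for Fisher's projection.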
3.2.3 Fisher Feature Projection
The application of the Fisher’s Projection is to collapse all the obtained features into a few or single
quantity, which is the optimum representation of all features through a weighted projection on a line.
This allows for dimensionality reduction [25]. In cases where multiple features exists, a higher number
of features does not necessarily benefit the decision making process. In using the Fisher’s projection, we
aim to reduce the dimensionality without losing vital information. The purpose of this step is to prepare
for a binary logistic classification using features that provide high discriminatory characteristics.
The direction of the projection line is obtained through an optimization of the criterion function
J(v). This function is obtained through a set of training data. In our case, the criterion function was
optimized towards obtaining the weights, v, for a set of 16 di↵erent signal features.
J(v) =vTS
B
v
vTSw
v(3.30)
This function is derived by first obtaining the per-class scatter matrices (equation 3.31) and the within-class scatter matrix (equation 3.32) [25]. These matrices are obtained on the basis of the two class means \mu_1 and \mu_2.

S_{1,2} = \sum_{x \in D_i} (x - \mu_i)(x - \mu_i)^T   (3.31)

S_W = S_1 + S_2   (3.32)
Therefore, the vector v that maximizes the criterion function J(v) is obtained in the following manner, where \lambda is an eigenvalue in equation 3.33. Thus, we can reduce the dimensions of the d-dimensional samples.

S_W^{-1} S_B v = \lambda v   (3.33)

v = S_W^{-1}(\mu_1 - \mu_2)   (3.34)
The actual magnitude of the vector v does not play a role in Fisher's projection; rather, its direction provides the optimum weights for each feature. The projection aims to maximize the difference between the means of the two classes (\mu_1 and \mu_2) [25]. It is implemented as the dot product of the weight vector and the d-dimensional feature vector, shown in equation 3.35 [26].

y = v^T x   (3.35)
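Equations 3.31–3.35 can be sketched directly in numpy. This is an illustrative implementation of the standard two-class Fisher discriminant, not the thesis code:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher discriminant direction v = Sw^{-1} (mu1 - mu2), eq. 3.34.
    X1, X2 are (samples x features) arrays for the two classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # per-class scatter matrices (eq. 3.31) summed into the
    # within-class scatter Sw (eq. 3.32)
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    return np.linalg.solve(S1 + S2, mu1 - mu2)

def project(X, v):
    """One-dimensional Fisher projection y = v^T x (eq. 3.35)."""
    return X @ v
```

As noted above, only the direction of v matters: scaling v rescales every projected sample identically, leaving class separability unchanged.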
3.2.4 Logistic Regression - Binary Classification
The logistic regression model was applied as a binary classifier to categorize sample signals as real swallows or falsely detected artifacts. This was done by quantifying the correlation between dependent and independent variables through probabilistic measures on the basis of a binomial distribution. The classifier outputs a value of 0, corresponding to an artifact, or a value of 1, corresponding to an actual swallow.

As the ratio of real swallow samples to artifacts is very large, simple bootstrapping was applied to the original dataset to mitigate this imbalance. This was done to obtain a sample of n = 10000, where each sample was replaced at each iteration. The large number of samples was obtained through bootstrapping in order to mimic a uniform distribution of the original features, which is desirable to maintain consistency through each run of the algorithm. This step was therefore implemented prior to the Fisher's feature projection. Execution of the classifier requires the use of a training set, T. The training set consisted of an evenly divided set of 20000 samples (i.e. 10000 swallows and 10000 artifacts) obtained through the aforementioned bootstrapping method. This training set was used to obtain the classification boundaries.
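The class-balancing step can be sketched as sampling each class with replacement up to the target size. This is an illustration of simple bootstrapping, not the thesis code; the seed is an assumption added for reproducibility.

```python
import numpy as np

def balance_by_bootstrap(swallows, artifacts, n=10000, seed=0):
    """Resample each class with replacement to n samples each
    (n = 10000 in the text), yielding a balanced 2n-sample training set."""
    rng = np.random.default_rng(seed)

    def boot(X):
        X = np.asarray(X)
        # draw n row indices uniformly, with replacement
        return X[rng.integers(0, len(X), size=n)]

    return boot(swallows), boot(artifacts)
```

Because sampling is with replacement, the minority class (artifacts) can be inflated to the same size as the majority class without discarding any swallow samples.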
Three main parameters were used for logistic regression classification: the Fisher's projection (obtained using equation 3.36), the squared distance of the projection to the nearest projected class mean (obtained using equation 3.37) and the difference between the class variance and the projected mean (obtained using equation 3.39), where \tilde{\mu} is the projected mean.
y = v^T T   (3.36)

M_1 = \arg\min_i (y - \tilde{\mu}_i)^2 \quad \text{for } i = 1, 2   (3.37)

\sigma_i^2 = \frac{1}{N} \sum_{y \in \text{Class}_i} (y - \tilde{\mu}_i)^2 \quad \text{for } i = 1, 2   (3.38)

M_2 = \arg\min_i (\sigma_i^2 - \tilde{\mu}_i)   (3.39)
The three main parameters y, M_1 and M_2 were subsequently used to obtain the coefficients of the \beta matrix. Obtaining the \beta matrix is an optimization problem over the "cost" function shown in equation 3.41; the function J(\beta) in equation 3.43 is minimized through an optimization process to obtain the coefficients.
J(\beta) = \frac{1}{m}\sum_{n=1}^{m} \text{Cost}(h_\beta(T_n), y') \quad y' \in \{0, 1\}   (3.40)

\text{Cost}(h_\beta(T_n), y') = \begin{cases} -\log(h_\beta(T)) & y' = 1 \\ -\log(1 - h_\beta(T)) & y' = 0 \end{cases}   (3.41)

\text{Cost}(h_\beta(T_n), y') = -y'\log(h_\beta(T)) - (1 - y')\log(1 - h_\beta(T))   (3.42)

J(\beta) = -\frac{1}{m}\sum_{n=1}^{m} \left[ y'^{(n)}\log(h_\beta(T^{(n)})) + (1 - y'^{(n)})\log(1 - h_\beta(T^{(n)})) \right]   (3.43)
The logistic regression model uses the sigmoid function as the basis for obtaining a value that falls between 0 and 1. The sigmoid function is applied to our feature vector h_\beta(T) using equation 3.45.

h_\beta(T) = g(\beta^T T)   (3.44)

g(z) = \frac{1}{1 + e^{-z}}, \quad g(\beta^T T) = \frac{1}{1 + e^{-\beta^T T}}   (3.45)
The cost function is the penalty for wrongly identifying a sample; it is therefore desirable to minimize it, which is done using gradient descent. Using the obtained coefficients, we obtain a probability value, which can be used to categorize the samples into the two sets. Given a probability from the sigmoid function greater than or equal to 0.5, we predict a value of 1, corresponding to a true swallow. Similarly, a probability value smaller than 0.5 is predicted as a falsely detected artifact. This converts a spectrum of probabilities into a binary output. Equations 3.46 and 3.47 demonstrate this binary conversion, illustrated in figure 3.2.
Figure 3.2: Sigmoid Function
z = \beta^T T   (3.46)

\beta^T T \ge 0 \iff h_\beta(T) \ge 0.5   (3.47)
3.3 Results
The characterization of acoustic signals was implemented through a feature extraction protocol. Table 3.2 presents the descriptive statistics of the obtained features. The features were then fed through a Fisher's projection subroutine; the obtained weights can be seen in figure 3.19 with the corresponding table 3.3. Figures 3.3 to 3.18 are forest plots of the features, showing the mean feature value with a 95% confidence interval for the three sample types and non-swallows.
Table 3.2: Descriptive statistics of features for 5 mL, 10 mL and saliva sample types (Mean ± SD, with ρ95% in parentheses)

Feature                        5 mL                     10 mL                    Saliva                   Non-swallows
Mean                           −0.006 ± 0.08 (0.002)    −0.001 ± 0.021 (0.002)   −0.001 ± 0.003 (0.002)   0.0001 ± 0.003 (0.002)
Variance                       0.09 ± 0.53 (0.008)      0.06 ± 0.059 (0.002)     0.06 ± 0.058 (0.002)     0.029 ± 0.021 (0.01)
Skewness                       0.123 ± 0.873 (0.014)    0.152 ± 0.91 (0.026)     −0.008 ± 0.63 (0.02)     0.136 ± 0.819 (0.404)
Kurtosis                       13.53 ± 13.12 (0.22)     13.69 ± 14.04 (0.422)    10.17 ± 10.34 (0.342)    10.14 ± 9.65 (4.76)
Dominant Frequency             27.32 ± 16.79 (0.28)     26.41 ± 15.82 (0.48)     22.94 ± 11.00 (0.362)    21.91 ± 12.13 (5.98)
Entropy                        0.819 ± 0.082 (0.002)    0.817 ± 0.062 (0.002)    0.817 ± 0.058 (0.002)    0.783 ± 0.078 (0.038)
Centroid Frequency             52.51 ± 18.49 (0.308)    51.40 ± 17.71 (0.532)    53.04 ± 16.65 (0.548)    44.29 ± 21.07 (10.404)
Average Wavelet Energy         34.95 ± 41.04 (0.684)    31.92 ± 39.11 (1.178)    39.71 ± 41.96 (1.384)    41.88 ± 40.79 (20.14)
Wavelet Energy Ratio           0.052 ± 0.104 (0.002)    0.05 ± 0.043 (0.002)     0.041 ± 0.032 (0.002)    0.033 ± 0.035 (0.016)
Fractal Dimensions             2.007 ± 0.321 (0.006)    1.98 ± 0.299 (0.01)      2.075 ± 0.315 (0.01)     2.072 ± 0.382 (0.188)
Wavelet Filtered Entropy       0.767 ± 0.111 (0.002)    0.773 ± 0.104 (0.004)    0.791 ± 0.071 (0.002)    0.795 ± 0.065 (0.032)
Dominant Frequency - PSD       27.02 ± 16.86 (0.282)    26.39 ± 15.40 (0.464)    22.66 ± 10.98 (0.362)    21.31 ± 12.24 (6.048)
75-100 Hz Signal Energy        2.518 ± 7.160 (0.118)    1.581 ± 2.122 (0.064)    1.04 ± 1.587 (0.052)     5.029 ± 15.48 (7.646)
Amplitude Distribution Width   0.103 ± 0.095 (0.002)    0.101 ± 0.051 (0.002)    0.111 ± 0.058 (0.002)    0.0849 ± 0.042 (0.022)
CWT Energy Ratio               0.995 ± 0.810 (0.012)    0.913 ± 0.862 (0.024)    0.996 ± 0.611 (0.02)     1.502 ± 1.674 (0.826)
Variance of Signal Squared     0.006 ± 0.005 (0.012)    0.006 ± 0.006 (0.0002)   0.007 ± 0.006 (0.002)    0.006 ± 0.006 (0.002)
Figure 3.3: Amplitude Mean
Figure 3.4: Amplitude Variance
Figure 3.5: Amplitude Skewness
Figure 3.6: Amplitude Kurtosis
Figure 3.7: Dominant Frequency (Hz)
Figure 3.8: Entropy
Figure 3.9: Centroid Frequency (Hz)
Figure 3.10: Average Wavelet Energy
Figure 3.11: DWT Energy Ratio
Figure 3.12: Fractal Dimension
Figure 3.13: Wavelet Filtered Entropy
Figure 3.14: Dominant Frequency - PSD (Hz)
Figure 3.15: Signal Energy - 75 to 100 (Hz)
Figure 3.16: Amplitude Distribution Width
Figure 3.17: Wavelet Energy Ratio
Figure 3.18: Variance - Signal Squared
Table 3.3: Fisher’s projection and feature weights
Feature ID    Feature                             Weight
A             Mean                                −129.57
B             Variance                            36.7
C             Skewness                            0.0125
D             Kurtosis                            1.03
E             Dominant Frequency                  0.002
F             Entropy                             0.0136
G             Centroid Frequency                  −0.59
H             Average Wavelet Energy              −0.028
I             Wavelet Energy Ratio                −17.56
J             Fractal Dimensions                  1.702
K             Wavelet Filtered Entropy            −1.285
L             Dominant Frequency from PSD         −0.016
M             75-100 Hz Signal Energy             −0.035
N             Amplitude Distribution Width        31.04
O             Continuous Wavelet Energy Ratio     −1.07
P             Variance of Squared Signal          −10.577
Figure 3.19: Feature weights of Fisher's projection
The application of the binary classifier resulted in 96.77% sensitivity and 14.51% specificity. Table
3.4 shows the results of the binary classification. It can be seen that the algorithm is biased towards
classifying samples in the “True Swallow” category. This can be further quantified using Equations
3.50 and 3.51, which indicate how well the classifier identifies a sample as a true swallow or an artifact using positive
and negative predictive values.
Figure 3.20 demonstrates the binary classification of a set of 124 signal samples evenly distributed
between swallows (samples 1–62) and artifacts (samples 63–124). A value of 1 indicates classification as a “true
swallow” and a value of 0 indicates classification as an “artifact”.
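A minimal sketch of such a logistic-regression classifier on a single projected feature (plain batch gradient descent; the thesis’s actual features and training details are not reproduced here, and all names are illustrative):

```python
import math

def train_logistic(xs, ys, lr=1.0, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            gw += (p - y) * x / n
            gb += (p - y) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def classify(x, w, b):
    """1 = 'true swallow', 0 = 'artifact' (threshold at p = 0.5)."""
    return 1 if w * x + b > 0 else 0
```

Training on labeled segments and thresholding the predicted probability at 0.5 produces exactly the 0/1 output plotted in Figure 3.20.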
[Scatter plot: classifier output (0 or 1) versus sample number, 1–124]
Figure 3.20: Binary classification using logistic regression
Table 3.4: Sensitivity and specificity of binary classification

|                   | Real Swallow       | Artifacts           | Total |
|-------------------|--------------------|---------------------|-------|
| Swallow Detected  | True Positive = 60 | False Positive = 53 | 113   |
| Artifact Detected | False Negative = 2 | True Negative = 9   | 11    |
$$\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \times 100 = \frac{60}{60 + 2} \times 100 = 96.77\% \quad (3.48)$$

$$\text{Specificity} = \frac{\text{True Negative}}{\text{False Positive} + \text{True Negative}} \times 100 = \frac{9}{53 + 9} \times 100 = 14.51\% \quad (3.49)$$

$$\text{PPV} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \times 100 = \frac{60}{60 + 53} \times 100 = 53.1\% \quad (3.50)$$

$$\text{NPV} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Negative}} \times 100 = \frac{9}{9 + 2} \times 100 = 82\% \quad (3.51)$$
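These four metrics follow directly from the confusion-matrix counts in Table 3.4; a minimal sketch (function and variable names are illustrative, not from the thesis):

```python
def classifier_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV, all in percent."""
    sensitivity = tp / (tp + fn) * 100  # Eq. 3.48
    specificity = tn / (fp + tn) * 100  # Eq. 3.49
    ppv = tp / (tp + fp) * 100          # Eq. 3.50
    npv = tn / (tn + fn) * 100          # Eq. 3.51
    return sensitivity, specificity, ppv, npv

# Counts from Table 3.4: TP = 60, FP = 53, FN = 2, TN = 9
sens, spec, ppv, npv = classifier_metrics(60, 53, 2, 9)
```

With the Table 3.4 counts this reproduces the values in Equations 3.48–3.51 (96.77%, 14.51%, 53.1%, and 82%).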
Table 3.5: Algorithm Performance

|                 | Acoustic Segmentation | Acoustic Segmentation + Characterization |
|-----------------|-----------------------|------------------------------------------|
| Sensitivity (%) | 90.49                 | 96.77                                    |
| Specificity (%) | 66                    | 14.51                                    |
3.4 Discussion
While a large array of features was extracted from each sample, the Fisher’s projection clearly suggests
that the majority of the features are not highly discriminatory between the two classes. This
is because one class (true swallows) contains distinct attributes, while the other exhibits an enormous
amount of variability. The work by Youmans and Stierwalt in [27] reports dominant frequencies for
acoustic swallowing signals in healthy individuals ranging from 2-3 kHz. Our finding in Section 3.3
shows a much lower dominant frequency, ranging from 22 to 27 Hz. The discrepancy could be explained
by the different sensor modalities used for data collection. While Youmans and Stierwalt [27] used an
accelerometry sensor, our data were transduced through a true acoustic transducer (contact microphone
model: AKG C411). As the contact microphone is not prone to picking up motion, it is less
sensitive to artifacts produced by non-swallowing tasks. As the source of these artifacts can be
anything, such as breathing, movement, and unintentional sounds such as speech or other physiological
acoustics, finding a specific discriminatory attribute of the artifacts is not feasible.
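The dominant frequency discussed here is simply the frequency of the largest-magnitude spectral bin. A small sketch of that computation (a naive direct DFT for clarity; the thesis’s PSD estimator may differ):

```python
import math

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest-magnitude DFT bin.

    Naive O(n^2) DFT over positive frequencies, skipping the DC bin;
    in practice an FFT-based PSD estimate (e.g. Welch's method) would
    be preferred for long recordings.
    """
    n = len(signal)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        mag = re * re + im * im  # squared magnitude of bin k
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n  # convert bin index to Hz
```

For example, a pure 25 Hz tone sampled at 200 Hz yields a dominant frequency of 25 Hz, in the same range as the 22–27 Hz values reported in Section 3.3.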
Forest plots of all extracted features, shown in Figures 3.3 to 3.18, were also used to assess the
suitability of each feature for classification. These demonstrate the mean feature value for all sample
data and its respective 95% confidence interval, categorized by 5 mL, 10 mL, and saliva swallows
and non-swallow artifacts. Based on the results in Figure 3.19, the
discriminatory features that maximize the difference between the projection means, $\tilde{\mu}$, are amplitude
variance and amplitude distribution width. This was further evaluated by running the algorithm to
assess the discriminatory power of different feature combinations.
Due to the large variability of artifacts and the overlap of their features with real-swallow features,
the binary classification tends to over-detect true swallows. Hence, the number of false
positives is larger than the number of false negatives. Comparing the segmentation performance before and
after the addition of the binary classifier reveals an overall improvement in sensitivity. The
performance of segmentation alone and of segmentation with the classifier was obtained through
comparison with the manually segmented samples. As seen in Table 3.5, the results show an overall
sensitivity of 90.49% for segmentation and 96.77% for segmentation with binary classification.
Similarly, the negative predictive value (NPV) improved from 70.76% to 82%, meaning
the algorithm became better at identifying a sample as an artifact, given that the sample is
an actual artifact. Contrary to this advancement, the specificity of the algorithm dropped from 66% to
14.51%. This denotes a process that is skewed towards identifying samples as true swallows. Moreover,
the positive predictive value, which is the probability that a sample identified as a swallow is a
true swallow, changed from 88.54% to 53.1%.
3.5 Conclusion
The addition of the logistic regression binary classifier to the previously designed segmentation subroutine
has yielded an overall performance improvement for true swallow detection. The segmentation algorithm
operates on the basis of energy distributions within the acoustic signals, and many of the candidate features
used for the classifier revolved around signal energy parameters. Moreover, due to the large variability of
the samples, different filtering methodologies, such as discrete wavelet, continuous wavelet, and Savitzky-
Golay filters, were applied to remove noisy, undesirable components.
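As an illustration of the last of these, Savitzky-Golay smoothing replaces each sample with the central value of a local least-squares polynomial fit; for a window of 5 and a quadratic fit, this reduces to the fixed convolution kernel (−3, 12, 17, 12, −3)/35. The sketch below uses that standard textbook kernel, not necessarily the filter parameters used in this thesis:

```python
def savgol5(signal):
    """Savitzky-Golay smoothing, window 5, polynomial order 2.

    Each interior sample is replaced by the central value of a local
    quadratic least-squares fit, which reduces to a fixed kernel;
    the first and last two samples are left unchanged for simplicity.
    """
    kernel = (-3.0, 12.0, 17.0, 12.0, -3.0)
    out = list(signal)
    for i in range(2, len(signal) - 2):
        out[i] = sum(k * signal[i + j - 2] for j, k in enumerate(kernel)) / 35.0
    return out
```

A useful property of this filter is that it passes any locally quadratic trend through unchanged while attenuating high-frequency noise, which is why it preserves the shape of swallow bursts better than a plain moving average.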
Furthermore, the 16 chosen features provided a large array of descriptive information from the set
of training data. However, based on the Fisher projection, it was evident that only a few features were
discriminatory enough to aid in classification. Features with a relatively high projection weight stem
from quantifying signal complexity. Table 3.3 shows positive values for fractal dimensions, variance, and
amplitude distribution width; these features quantify signal complexity in different ways.
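Of these complexity measures, the fractal dimension is the least standard; one common waveform estimator is Katz’s, which compares the curve’s total length to its maximal extent from the first sample. The sketch below assumes that estimator for illustration — the exact estimator used in this thesis may differ:

```python
import math

def katz_fd(signal):
    """Katz fractal dimension of a 1-D waveform.

    L: total curve length (unit abscissa spacing between samples);
    d: maximal distance from the first sample to any other sample;
    FD = log10(n) / (log10(n) + log10(d / L)), with n steps.
    """
    n = len(signal) - 1
    L = sum(math.hypot(1.0, signal[i + 1] - signal[i]) for i in range(n))
    d = max(math.hypot(i, signal[i] - signal[0]) for i in range(1, n + 1))
    return math.log10(n) / (math.log10(n) + math.log10(d / L))
```

A straight line gives FD = 1, and increasingly irregular waveforms give values above 1, consistent with the roughly 2.0 values reported for swallow samples in Tables 4.1 and 4.2.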
The sensitivity of the segmentation algorithm improved by 6.28 percentage points, from an original 90.49% to
96.77%. This improved ability to pick out true swallows was the incentive for developing the
binary classifier. However, the added benefits of the classifier are accompanied by a loss in correctly
identifying artifacts. This can be seen in the dramatic drop in specificity, from an original 66%
to 14.51%. These figures describe a system that is prone to falsely classifying artifacts as true
swallows. This work demonstrates both the benefits and pitfalls of using machine learning for
signal segmentation. Moreover, it has demonstrated the discriminatory power of different signal features
obtained in the time, frequency, and time-frequency domains.
Work towards a better classification of these two classes could focus on developing a better training
set for the logistic regression step, for example through intentional artifact recordings such as coughs,
throat clears, vocalization, quiet breathing, and other potential artifacts. With a better training set, a
more efficacious model can be developed, yielding a better classifier.
Chapter 4
Acoustics and Swallow Screening
4.1 Acoustics
The current use of accelerometry, which shares many attributes with our sensor, has proven
the feasibility of monitoring surface vibrations for screening. This work evaluated the suitability
of acoustic signals for the development of non-invasive tools, and our analysis has shown that the
realm of acoustics has clear potential for providing novel information. Moreover, the use of a contact
microphone as a transducer has demonstrated added benefit, as it overcomes some of the deficiencies of
the accelerometer; these include a flat frequency response and insensitivity to participant movement.
Similarly, the acoustic signals have been shown to be useful for temporal localization of swallow
segments. Our work has demonstrated swallow segmentation performance comparable to that of
accelerometry signals for the 5 mL and 10 mL sample types. Surprisingly, the acoustic segmentation
algorithm outperformed the accelerometry segmentation, with a 21% difference in sensitivity on saliva
samples.
Since the long-term goal is to have a device that is used at the bedside, it is important to be
able to reduce any potential recording of unwanted information. This may include other sounds in
the environment. As accelerometry does not record sound, it does not have this problem. On the
acoustic side, the chosen transducer has a figure-8 polar pattern, which makes it bidirectionally sensitive.
Figure 4.1 demonstrates the frequency response and polar pattern of the transducer. Moreover, only sound
that is transmitted through the participant’s tissue will be picked up. This property permits the use of a
contact microphone in a noisy environment. Therefore, given the aforementioned attributes, acoustics
presents great potential for designing a standalone or hybrid accelerometry-acoustic screening tool.
Figure 4.1: AKG C411 PP transducer frequency response and polar pattern
4.1.1 Acoustic Features
The features used in this work were chosen on the basis of the characteristics of the acoustic signals.
In-depth analysis of different samples in the time and frequency domains led to the chosen list in Table
3.3. Our analysis revealed a short list of features that successfully contributed
to discriminating swallows from artifacts. It must be noted that these features still possess potential
for classification by other means. As the non-swallow class possesses low discriminatory characteristics
and a large variability, the discriminatory power of these features was only evaluated in the context of
discriminating artifacts from true swallows.
The analysis of the features through the descriptive statistics shown in Table 3.2 demonstrates that the
top four features contributing to discriminating the two classes are 1) amplitude variance,
2) amplitude distribution width, 3) fractal dimensions, and 4) amplitude kurtosis, with Fisher’s
projection weights of 36.7, 31.04, 1.70, and 1.03, respectively. The commonality between these features is that
each quantifies signal complexity in a different way. Contrary to the original hypothesis, the use of
frequency and time-frequency features such as dominant frequency, centroid frequency, and wavelets did
not outperform these features. Tables 4.1 and 4.2 provide detailed statistics of the features for each sample
type and each class.
Table 4.1: Descriptive statistics of features from 5 mL, 10 mL, and saliva swallow samples

| Features                  | 5 mL Mean ± SD   | 5 mL ρ95%     | 10 mL Mean ± SD  | 10 mL ρ95%    | Saliva Mean ± SD | Saliva ρ95%   |
|---------------------------|------------------|---------------|------------------|---------------|------------------|---------------|
| Mean                      | -0.006 ± 0.080   | -0.016—0.004  | -0.001 ± 0.021   | -0.003—0.002  | -0.001 ± 0.022   | -0.004—0.001  |
| Variance                  | 0.091 ± 0.525    | 0.026—0.156   | 0.062 ± 0.059    | 0.054—0.069   | 0.066 ± 0.058    | 0.058—0.073   |
| Skewness                  | 0.123 ± 0.873    | 0.014—0.232   | 0.152 ± 0.906    | 0.041—0.263   | -0.008 ± 0.634   | -0.090—0.073  |
| Kurtosis                  | 13.539 ± 13.123  | 11.909—15.169 | 13.695 ± 14.047  | 11.971—15.419 | 10.171 ± 10.337  | 8.843—11.498  |
| Dominant Frequency        | 27.311 ± 16.790  | 25.235—29.406 | 26.409 ± 15.821  | 24.467—28.352 | 22.935 ± 11.005  | 21.522—24.348 |
| Entropy                   | 0.819 ± 0.082    | 0.808—0.829   | 0.817 ± 0.062    | 0.811—0.826   | 0.816 ± 0.058    | 0.809—0.824   |
| Centroid Frequency        | 52.508 ± 18.491  | 50.212—54.806 | 51.403 ± 17.707  | 49.230—53.576 | 53.045 ± 16.658  | 50.906—55.185 |
| Average Wavelet Energy    | 34.950 ± 41.045  | 29.852—40.048 | 31.924 ± 39.109  | 27.123—36.724 | 39.709 ± 41.966  | 34.321—45.098 |
| Wavelet Energy Ratio      | 0.052 ± 0.104    | 0.038—0.064   | 0.051 ± 0.043    | 0.045—0.055   | 0.041 ± 0.032    | 0.036—0.045   |
| Fractal Dimensions        | 2.007 ± 0.321    | 1.967—2.047   | 1.982 ± 0.299    | 1.946—2.019   | 2.075 ± 0.315    | 2.035—2.116   |
| Wavelet Filtered Entropy  | 0.767 ± 0.111    | 0.753—0.781   | 0.773 ± 0.104    | 0.760—0.786   | 0.791 ± 0.071    | 0.781—0.799   |
| Dominant Frequency - PSD  | 27.021 ± 16.861  | 24.927—29.116 | 26.391 ± 15.401  | 24.501—28.282 | 22.655 ± 10.978  | 21.246—24.065 |
| Signal Energy 70-100 Hz   | 2.518 ± 7.160    | 1.628—3.407   | 1.581 ± 2.122    | 1.321—1.842   | 1.043 ± 1.587    | 0.839—1.247   |
| Half Distribution Width   | 0.103 ± 0.095    | 0.092—0.115   | 0.102 ± 0.051    | 0.095—0.108   | 0.111 ± 0.058    | 0.103—0.119   |
| DWT Energy Ratio          | 0.995 ± 0.811    | 0.894—1.096   | 0.913 ± 0.862    | 0.807—1.019   | 0.997 ± 0.611    | 0.919—1.075   |
| Signal Squared Variance   | 0.006 ± 0.005    | 0.005—0.006   | 0.006 ± 0.006    | 0.005—0.006   | 0.007 ± 0.006    | 0.006—0.007   |
Table 4.2: Descriptive statistics of features from 5 mL, 10 mL, and saliva artifact samples

| Features                  | 5 mL Mean ± SD     | 5 mL ρ95%       | 10 mL Mean ± SD        | 10 mL ρ95%    | Saliva Mean ± SD | Saliva ρ95%   |
|---------------------------|--------------------|-----------------|------------------------|---------------|------------------|---------------|
| Mean                      | 0.0002 ± 0.001     | -0.0002—0.0007  | -8.437×10⁻⁵ ± 0.004    | -0.002—0.001  | 0.0003 ± 0.002   | -0.001—0.002  |
| Variance                  | 0.027 ± 0.021      | 0.019—0.035     | 0.027 ± 0.021          | 0.019—0.036   | 0.035 ± 0.021    | 0.023—0.049   |
| Skewness                  | 0.391 ± 1.063      | -0.043—0.825    | 0.046 ± 0.479          | -0.136—0.228  | -0.159 ± 0.797   | -0.654—0.334  |
| Kurtosis                  | 12.413 ± 12.294    | 7.388—17.438    | 7.622 ± 5.898          | 5.378—9.865   | 12.022 ± 9.716   | 6.001—18.044  |
| Dominant Frequency        | 20.425 ± 7.362     | 17.416—23.434   | 22.681 ± 15.574        | 16.758—28.604 | 22.962 ± 9.139   | 17.297—28.627 |
| Entropy                   | 0.796 ± 0.072      | 0.766—0.825     | 0.773 ± 0.084          | 0.742—0.806   | 0.782 ± 0.073    | 0.736—0.828   |
| Centroid Frequency        | 52.478 ± 15.479    | 46.152—58.804   | 37.451 ± 19.511        | 30.031—44.871 | 45.217 ± 27.800  | 27.986—62.448 |
| Average Wavelet Energy    | 40.856 ± 42.196    | 23.611—58.102   | 38.425 ± 36.184        | 24.664—52.186 | 53.155 ± 46.860  | 24.111—82.199 |
| Wavelet Energy Ratio      | 0.032 ± 0.023      | 0.023—0.042     | 0.032 ± 0.043          | 0.016—0.049   | 0.035 ± 0.027    | 0.018—0.053   |
| Fractal Dimensions        | 2.039 ± 0.419      | 1.868—2.211     | 2.118 ± 0.323          | 1.995—2.242   | 2.017 ± 0.427    | 1.752—2.282   |
| Wavelet Filtered Entropy  | 0.804 ± 0.062      | 0.778—0.829     | 0.794 ± 0.062          | 0.771—0.818   | 0.778 ± 0.072    | 0.734—0.823   |
| Dominant Frequency - PSD  | 20.267 ± 7.548     | 17.182—23.352   | 21.765 ± 15.741        | 15.779—27.752 | 22.300 ± 9.114   | 16.651—27.949 |
| Signal Energy 70-100 Hz   | 12.458 ± 23.837    | 2.716—22.201    | 0.677 ± 0.922          | 0.326—1.028   | 0.972 ± 1.067    | 0.309—1.634   |
| Half Distribution Width   | 0.102 ± 0.032      | 0.088—0.114     | 0.077 ± 0.046          | 0.059—0.095   | 0.071 ± 0.037    | 0.047—0.094   |
| DWT Energy Ratio          | 1.787 ± 2.374      | 0.817—2.758     | 1.281 ± 1.042          | 0.885—1.678   | 1.486 ± 1.054    | 0.832—2.139   |
| Signal Squared Variance   | 0.006 ± 0.006      | 0.004—0.009     | 0.006 ± 0.006          | 0.003—0.008   | 0.009 ± 0.005    | 0.005—0.012   |
As seen through the comparison of different sample types in Table 4.1, the features of the 5 mL
and 10 mL tasks possess much closer values than those of the saliva task. This raises the possibility that a
portion of the liquid swallow sound arises from the liquid bolus itself, and is missing in the context of
dry (saliva) swallows. Moreover, it is hypothesized that the presence of a bolus contributes additional
artifacts, as dry swallows have the fewest artifacts and the lowest number of falsely detected
swallows. Therefore, it can be stated that saliva swallows do differ from liquid swallows.
4.2 Machine Learning
The use of a machine learning subroutine has enhanced our segmentation algorithm.
The classification of signals into “true swallows” or “artifacts” was needed in order to add intelligence to
the segmentation algorithm. This was required because the acoustic signals exhibit high levels of variability.
The machine learning algorithm, using logistic regression classification complemented with Fisher’s
discriminant analysis, proved to perform well. However, the majority of the chosen features proved
insignificant in discriminating the two classes. As the characteristics of artifacts can vary without
bound, the performance of this algorithm in distinguishing artifacts from true swallows was not
highly efficacious.
4.3 Future Work
The developed method of Fisher’s projection and logistic regression classification is a good candidate
for the classification of healthy and unhealthy participants. Future work can look into the complete
evaluation of the designed algorithm via the provision of two distinct classes. As only the “true
swallow” class possesses distinct characteristics, while the artifact class possesses larger variability in
its attributes, this algorithm’s full potential has not been fully evaluated. Potential work could revolve around
classifying, through this protocol, healthy and unhealthy subjects, groups of different sex, groups of
different age, and participants with different etiologies.
Moreover, analysis of intentional artifacts such as coughs, throat clears, and speech/vocalizations
can contribute to a better classification between real swallows and artifacts. Such analysis
would also allow quantitatively answering the question of whether an intentional
artifact, such as an intentional cough, differs from a naturally induced cough. Lastly, further analysis
of saliva swallows would prove beneficial, as current results have shown saliva swallows to differ from
liquid sample swallows (i.e., 5 mL and 10 mL).
4.4 Conclusion
In this work we have evaluated the merits and drawbacks of an acoustic transducer in the application of
non-invasive swallow screening. The analysis of the selected features in Chapter 3 has been justified, and
it is highly recommended that future work further explore features that quantify signal complexity.
The analysis of individual sample types also demonstrated that saliva swallows differ from liquid sample
swallows.
Bibliography
[1] E. Sejdic, T. H. Falk, C. M. Steele, and T. Chau, “Vocalization removal for improved automatic
segmentation of dual-axis swallowing accelerometry signals,” Medical Engineering & Physics, vol. 32,
no. 6, pp. 668–672, 2010.
[2] D. C. Gleeson, “Oropharyngeal swallowing and aging: A review,” Journal of Communication Dis-
orders, vol. 32, no. 6, pp. 373 – 396, 1999.
[3] G. Malandraki and J. Robbins, “Chapter 21 - dysphagia,” in Neurological Rehabilitation, ser. Hand-
book of Clinical Neurology, M. P. Barnes and D. C. Good, Eds. Elsevier, 2013, vol. 110, pp. 255
– 271.
[4] R. Bulat and R. Orlando, “Oropharyngeal dysphagia,” Current Treatment Options in Gastroen-
terology, vol. 8, no. 4, pp. 269–274, 2005.
[5] C. Borr, M. Hielscher-Fastabend, and A. Lücking, “Reliability and validity of cervical auscultation,”
Dysphagia, vol. 22, no. 3, pp. 225–234, 2007. [Online]. Available: http://dx.doi.org/10.1007/s00455-
007-9078-3
[6] K. Takahashi, M. Groher, and K.-i. Michi, “Methodology for detecting swallowing sounds,” Dys-
phagia, vol. 9, no. 1, pp. 54–62, 1994.
[7] C. M. Steele, S. M. Molfenter, G. L. Bailey, R. C. Polacco, A. A. Waito, D. C. B. H. Zoratto,
and T. Chau, “Exploration of the utility of a brief swallow screening protocol with comparison to
concurrent videofluoroscopy,” Canadian Journal of Speech-Language Pathology and Audiology, vol. 35,
2011.
[8] E. Marco, E. Duarte, et al., “Usefulness of the volume-viscosity swallow test for screening
dysphagia in subacute stroke patients in rehabilitation,” NeuroRehabilitation, pp. 631–638.
[9] B. Kertscher, R. Speyer, M. Palmieri, and C. Plant, “Bedside screening to detect oropharyngeal
dysphagia in patients with neurological disorders: An updated systematic review,” Dysphagia,
vol. 29, no. 2, pp. 204–212, 2014. [Online]. Available: http://dx.doi.org/10.1007/s00455-013-9490-9
[10] P. Clavé, V. Arreola, M. Romea, L. Medina, E. Palomera, and M. Serra-Prat, “Accuracy of the
volume-viscosity swallow test for clinical screening of oropharyngeal dysphagia and aspiration,”
Clinical Nutrition, vol. 27, no. 6, pp. 806 – 815, 2008.
[11] D. Suiter, J. Sloggy, and S. Leder, “Validation of the yale swallow protocol: A prospective double-
blinded videofluoroscopic study,” Dysphagia, vol. 29, no. 2, pp. 199–203, 2014.
[12] A. Osawa, S. Maeshima, and N. Tanahashi, “Water-swallowing test: Screening for aspiration in
stroke patients,” Cerebrovascular Diseases, vol. 35, no. 3, pp. 276–81, 04 2013.
[13] C. Steele, E. Sejdic, and T. Chau, “Noninvasive detection of thin-liquid aspiration using dual-axis
swallowing accelerometry,” Dysphagia, vol. 28, no. 1, pp. 105–112, 2013.
[14] S. Morinière, M. Boiron, D. Alison, P. Makris, and P. Beutter, “Origin of the sound components
during pharyngeal swallowing in normal subjects,” Dysphagia, vol. 23, no. 3, pp. 267–273, 2008.
[15] B. Sherman, J. M. Nisenboum, B. L. Jesberger, C. A. Morrow, and J. A. Jesberger, “Assessment
of dysphagia with the use of pulse oximetry,” Dysphagia, vol. 14, no. 3, pp. 152–156, 1999.
[16] J. Lee, C. M. Steele, and T. Chau, “Classification of healthy and abnormal swallows based on
accelerometry and nasal airflow signals,” Artif Intell Med, vol. 52, no. 1, pp. 17–25, May 2011.
[17] S. Mukhopadhyay and G. C. Ray, “A new interpretation of nonlinear energy operator and its efficacy
in spike detection,” Biomedical Engineering, IEEE Transactions on, vol. 45, no. 2, pp. 180–187, Feb
1998.
[18] C. Steele, E. Sejdic, and T. Chau, “Noninvasive detection of thin-liquid aspiration using dual-axis
swallowing accelerometry,” Dysphagia, vol. 28, no. 1, pp. 105–112, 2013.
[19] J. Lee, E. Sejdic, C. Steele, and T. Chau, “Effects of liquid stimuli on dual-axis swallowing accelerom-
etry signals in a healthy population,” BioMedical Engineering OnLine, vol. 9, no. 1, 2010.
[20] I. Orovic, S. Stankovic, T. Chau, C. M. Steele, and E. Sejdic, “Time-frequency analysis and hermite
projection method applied to swallowing accelerometry signals,” EURASIP J. Adv. Sig. Proc., 2010.
[21] E. Sejdic, T. H. Falk, C. M. Steele, and T. Chau, “Vocalization removal for improved
automatic segmentation of dual-axis swallowing accelerometry signals,” Medical Engineering and
Physics, vol. 32, pp. 668–672, 2010.
[22] J. F. Bercher and C. Vignat, “Estimating the entropy of a signal with applications,” Signal Pro-
cessing, IEEE Transactions on, vol. 48, no. 6, pp. 1687–1694, Jun 2000.
[23] S. Anisheh and H. Hassanpour, “Designing an adaptive approach for segmenting non-stationary
signals,” International Journal of Electronics, vol. 98, no. 8, pp. 1091–1102, 2011.
[24] K. Umapathy, F. H. Foomany, P. Dorian, T. Farid, G. Sivagangabalan, K. Nair, S. Masse, S. Kr-
ishnan, and K. Nanthakumar, “Real-time electrogram analysis for monitoring coronary blood flow
during human ventricular fibrillation: Implications for CPR,” Heart Rhythm, vol. 8, no. 5, pp.
740 – 749, 2011.
[25] H. Wang, X. Lu, Z. Hu, and W. Zheng, “Fisher discriminant analysis with l1-norm,” Cybernetics,
IEEE Transactions on, vol. 44, no. 6, pp. 828–842, June 2014.
[26] A. Yadollahi and Z. Moussavi, “Feature selection for swallowing sounds classification,” in Engineer-
ing in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of
the IEEE, Aug 2007, pp. 3172–3175.
[27] S. Youmans and J. Stierwalt, “Normal swallowing acoustics across age, gender, bolus viscosity, and
bolus volume,” Dysphagia, vol. 26, no. 4, pp. 374–384, 2011.