Received September 17, 2019, accepted October 5, 2019, date of publication November 4, 2019, date of current version November 18, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2951145

A Wearable, Extensible, Open-Source Platform for Hearing Healthcare Research

LOUIS PISHA 1, (Student Member, IEEE), JULIAN WARCHALL 2, (Student Member, IEEE), TAMARA ZUBATIY 3, SEAN HAMILTON 4, CHING-HUA LEE 1, (Student Member, IEEE), GANZ CHOCKALINGAM 5, PATRICK P. MERCIER 1, (Senior Member, IEEE), RAJESH GUPTA 4,6, (Fellow, IEEE), BHASKAR D. RAO 1, (Fellow, IEEE), AND HARINATH GARUDADRI 5, (Member, IEEE)

1 Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093, USA
2 Booz Allen Hamilton, Arlington, VA 22203, USA
3 School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
4 Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
5 Qualcomm Institute, University of California at San Diego, La Jolla, CA 92093, USA
6 Halıcıoğlu Data Science Institute, La Jolla, CA 92093, USA

Corresponding author: Louis Pisha ([email protected])

This work was supported in part by the National Institutes of Health/National Institute on Deafness and Other Communication Disorders (NIH/NIDCD) under Grant R21DC015046 and Grant R33DC015046, ‘‘Self-fitting of Amplification: Methodology and Candidacy,’’ in part by the NIH/NIDCD under Grant R01DC015436, ‘‘A Real-time, Open, Portable, Extensible, Speech Lab,’’ in part by the U.S. Army Research Laboratory under Contract W911QX-16-C-0003, in part by the National Science Foundation Graduate Research Fellowship under Grant DGE-114086, and in part by the National Science Foundation Division of Information & Intelligent Systems under Grant IIS-1838830, ‘‘A Framework for Optimizing Hearing Aids In Situ Based on Patient Feedback, Auditory Context, and Audiologist Input.’’

ABSTRACT Hearing loss is one of the most common conditions affecting older adults worldwide. Frequent complaints from the users of modern hearing aids include poor speech intelligibility in noisy environments and high cost, among other issues. However, the signal processing and audiological research needed to address these problems has long been hampered by proprietary development systems, underpowered embedded processors, and the difficulty of performing tests in real-world acoustical environments. To facilitate existing research in hearing healthcare and enable new investigations beyond what is currently possible, we have developed a modern, open-source hearing research platform, Open Speech Platform (OSP). This paper presents the system design of the complete OSP wearable platform, from hardware through firmware and software to user applications. The platform provides a complete suite of basic and advanced hearing aid features which can be adapted by researchers. It serves web apps directly from a hotspot on the wearable hardware, enabling users and researchers to control the system in real time. In addition, it can simultaneously acquire high-quality electroencephalography (EEG) or other electrophysiological signals closely synchronized to the audio. All of these features are provided in a wearable form factor with enough battery life for hours of operation in the field.

INDEX TERMS Hearing aids (HAs), wearable computers, speech processing, field programmable gate arrays (FPGAs), electrophysiology (EEG), system-level design, open source hardware, embedded software, Internet of Things, research initiatives.

I. INTRODUCTION
Hearing is essential for communication, navigation, and quality of life. The healthy ear is able to operate in a wide variety of environments over a huge dynamic range due to its highly complex nonlinear, time-varying, and attention-controlled characteristics. As a result, when hearing impairments occur, they can rarely be corrected by simply amplifying the input sound. Hearing aids (HAs) have been under development from this starting point for the last forty years, and now incorporate multi-band processing, dynamic range compression, feedback and noise management, and other advanced features.

The associate editor coordinating the review of this manuscript and approving it for publication was Li He.

Unfortunately, there is substantial dissatisfaction with many aspects of HAs among the user community [1]. Key factors underlying this dissatisfaction include the following:

1) Clinical challenges: One example is that the current best practices in hearing loss (HL) diagnosis and intervention rely mostly on pure tone audiometry (PTA) [2], which characterizes only the spectral aspects of HL, in clean conditions; the temporal dynamics in human perception of speech and music in clean and noisy environments are largely ignored. A different type of challenge is the typical need for users to see an audiologist to have fitting parameters adjusted; as an alternative, many researchers are investigating self-fitting procedures, environment-dependent profiles, and other ways to give the user control over their experience.

2) Technical constraints: HAs must provide sufficient battery power for processing and communication, in an acceptably small form factor, while introducing no more than 10 ms of latency [3], [4]. The overall latency requirement presents a significant challenge for noise mitigation algorithms and other advanced functions such as frequency lowering. Furthermore, binaural processing in HAs to take advantage of spatial information in noisy environments is a major challenge, because of the power requirements for wireless communication of full-band audio signals between the HAs and additional processing for adaptive beamforming.

3) Research accessibility: There are five major HA manufacturers: Phonak, Oticon, ReSound, Starkey, and WS Audiology. All of these manufacturers provide audiologists with tools for HA fitting, which can be used for certain kinds of clinical research. The manufacturers also sometimes provide their internal platforms for academic research in specific topics, such as directional microphones, noise management, programs for multiple listening environments, etc. However, each of these platforms is proprietary and unique, meaning that it is difficult to generalize research across the platforms, and infeasible to modify or experiment with the algorithms in ways not intended by the manufacturers.

4) Cost: There is an average 8.9-year delay between HA candidacy and HA adoption, with the biggest predictor of adoption delay being socioeconomic status [5]. This implies that the cost of HAs—which is often several thousand dollars—is a significant obstacle to many users. This high cost is partly due to the technology, but also largely due to the closed ecosystem of medical-grade hearing instruments. In response, a new market in off-the-shelf hearing assisted devices has emerged [6], [7].

The National Institutes of Health (NIH) conducted a workshop in 2014 on open-source HA research platforms and published recommendations about their capabilities and features [8]. Our system, Open Speech Platform (OSP) [9], is designed to meet these recommendations, including the vision of ‘‘new types of basic psychophysical research studies beyond what is widely done today’’. OSP is a suite of comprehensive, open-source hardware and software tools for multidisciplinary research in hearing healthcare. The goals of OSP are to address the underlying causes behind the challenges described above, to facilitate existing research by audiologists and DSP engineers, and to enable new kinds of investigations between hearing and related disciplines.

The OSP hardware is comprised of:
1) a Processing and Communication Device (‘‘PCD’’), which is a small wearable box containing a smartphone chipset performing all the signal processing and wireless communication functions, plus the battery and supporting hardware
2) ‘‘hearing aid’’-style audio transducer devices in behind-the-ear receiver-in-canal (‘‘BTE-RIC’’) form factor, which connect to the PCD via a 4-wire cable. They support 4 microphones and one receiver (loudspeaker) per ear, plus an accelerometer/gyroscope (IMU) for measuring look direction and researching mobility disorders
3) an optional set of active biopotential electrodes for acquiring EEG or other electrophysiological signals, daisy-chained together and connected to acquisition hardware on the PCD via another 4-wire cable (together called ‘‘FM-ExG’’)

The OSP software components include:
1) Firmware for FPGAs in the PCD and BTE-RICs
2) An embedded Linux distribution running on the CPU within the PCD, including kernel modifications and custom drivers for the BTE-RICs
3) The OSP real-time master hearing aid (RT-MHA), which is a library of signal processing modules and a reference C++ program that performs basic and advanced HA signal processing in real time
4) The Embedded Web Server (EWS), which:
   a) hosts a WiFi hotspot on the PCD
   b) serves web apps to any browser-enabled device which connects to it, such as the user's smartphone
   c) controls the RT-MHA parameters live based on user actions in the web apps

Taken together, OSP is a powerful research tool, in which all aspects of the assisted hearing experience—from the ear-level hardware to the signal processing algorithms to the way the user interacts with and controls their device—may be customized and used for research in the lab and in the field. The target audience of OSP is not just audiologists and speech DSP engineers, but also researchers in neuroscience, healthy aging, human-computer interaction, networking and edge/cloud processing, wearable electronics, and many other disciplines. Because OSP is open-source—all the software and hardware design files are released on our website [9]—researchers may modify and enhance whatever part of the system is relevant to their work, while leveraging past contributions made by other researchers.

Our development of OSP has resulted in novel contributions in embedded systems design [10], portable electrophysiology [11], [12], adaptive filtering [13], [14], and other areas not yet published. Yet the primary novelty of OSP—and its primary value to the community—is in its system design as a whole, and the capabilities it offers to researchers and users as a result of this design. As such, this paper describes the engineering design of all portions of the OSP platform, with an emphasis on how the design choices provide useful and advanced functionality. In particular, we focus on aspects of the hardware that have not been reported on in previous publications, and we provide updates on continued development of other parts of OSP. Sec. II discusses the PCD, the software from FPGA through kernel level, the BTE-RICs, and other included sensors. Sec. III covers the FM-ExG. Sec. IV reviews the RT-MHA and discusses new academic research on adaptive filters which has already been enabled by OSP. Sec. V describes the software architecture of the embedded web server (EWS) and the current set of provided web apps for audiologist and user engagement. Finally, Sec. VI gives objective performance results for the hardware and software, showing its capacity for real-time, low-latency audio processing, the quality of the recorded electrophysiological signals, and the platform's usability for multidisciplinary clinical research.

A. RELATED WORK
OSP intersects most aspects of the vast field of research on hearing healthcare. Thus, we will restrict our discussion in this section to systems for hearing research that perform real-time audio processing and have a portable or wearable component, as this is what OSP is at its core. The five major HA manufacturers each have their own proprietary systems of this kind, which they use for research on new clinical and technical challenges as they develop their advanced digital HAs. However, these systems are difficult to access for the research community at large, and difficult to modify and to obtain generalizable results from, as discussed above.

As of 2014, no non-proprietary HA research system existed which met the needs of the HA research community, according to the aforementioned NIH workshop on this topic: ‘‘The NIDCD-supported research community has a critical need for an open, extensible, and portable device that supports acoustic signal processing in real time’’ [8]. As a result, in 2016 the NIH awarded six grants for development of open-source hearing aid research tools [15], [16]. Of these six, four—including OSP—are complete master hearing aid tools for research. The three tools other than OSP are:

1) TYMPAN
[17] includes a wearable processing unit based on Arduino Teensy [18] and a basic software library for HA processing. The strengths of this platform include flexibility with the transducers (the unit simply features standard 1/8'' jacks) and battery (the user selects their own portable battery pack), low cost and use of readily available components, small size, easy development for beginners with the Arduino platform, and fast time-to-market. Its disadvantages include low audio quality, severely limited processing power, and support for only one input channel (microphone).

2) OPEN-MHA
[19] features an audio expansion board for BeagleBone Black, a Linux-based OS, and an extensive real-time and offline HA software suite. The advantages of this platform include good-quality audio, support for six-channel input, the well-documented nature of both BeagleBone Black and Linux, and the powerful master hearing aid DSP algorithms. Its downsides include somewhat limited processing power, the fact that its form factor is portable but not wearable, and the lack of ear-level transducers for users in the field. However, the open-source nature of these platforms allows the strengths of each to be combined: for instance, the Open-MHA DSP algorithms could in the future be ported to OSP hardware.

3) UT DALLAS PROJECT
[20] is comprised of a cross-platform smartphone app for processing and commercial Bluetooth-enabled hearing aid transceivers. The advantages of this platform include its advanced speech enhancement algorithms, the complete absence of special-purpose hardware, the accessibility of smartphone development, and the use of industry-standard ear-level transducers (which are proven designs and ultimately the target hardware). Its weaknesses include its inability to process audio in real time (defined as a total microphone-to-loudspeaker delay of less than 10 ms while HA processing is occurring), the proprietary nature of the ear-level transducers, and the semi- or fully-closed smartphone operating systems and driver stack which make it difficult to guarantee performance.

II. WEARABLE HARDWARE
A. FORM FACTOR
As reported in [21], the software portions of OSP were first implemented on a laptop, with a studio audio interface and custom analog hardware for interfacing with the ear-level transducers. The OSP RT-MHA can still run on any Mac or Linux computer using any audio hardware supported by the respective OS. However, the potential of OSP is much more fully realized in its new wearable form factor, which we initially discussed in [10].

As discussed in the Introduction, the battery size, available processing power, and communication abilities in commercial HAs are severely limited by the behind-the-ear or in-ear form factor they typically are available in. These factors in turn contribute to the cost and the difficulty of development (e.g. fixed-point embedded processors). For a research platform, we need much higher processing power, substantially improved wireless communication, relatively low cost, and easy development. These factors are much more important than the entire system fitting behind the ear, so we compromised on the form factor: we created a design which is still easily wearable but which is not limited to the space around the ear (Fig. 1). The processing, wireless communication, and battery for the OSP wearable system are housed in the Processing and Communication Device (PCD), which is a small box that may be worn around the neck or on a belt. The PCD is attached by wires to the BTE-RICs, which contain the audio transducers, codecs and interface hardware, and other sensors. Since the PCD processes the audio from both ears, it can use beamforming and other algorithms to take advantage of binaural information in the audio, something BTE or in-ear HAs would have to use wireless transmission to achieve. The aforementioned NIH workshop suggested that the form factor of BTE-RICs wired to a processing unit would be appropriate for a research system [8].

FIGURE 1. A user wearing the OSP wearable platform. The two hardware components shown are the behind-the-ear receiver-in-canal (BTE-RIC) transducers and the Processing and Communication Device (PCD).

B. CHOICE OF EMBEDDED PLATFORM
Smartphone chipsets provide best-in-class computational performance per watt, diverse peripherals, and advanced wireless connectivity, so they are a natural choice for the embedded platform in the OSP wearable design. However, many smartphone chipsets are difficult to work with, due to the high degree of proprietary technology in modern smartphones. Furthermore, embedded systems development for hard-real-time, low-latency applications is typically done at a very low level. Low-level audio processing would be contrary to the goals of extensibility and controllability of OSP, but low latency and stability are still mandatory. Thus, the design task was primarily to (1) select a platform which is capable of high-level real-time processing and has all the necessary features, and then (2) adapt its hardware and software to the needs of OSP.

We selected the single-board computer system DragonBoard 410c from Arrow, based on the Snapdragon 410c chipset from Qualcomm. Because of the hobbyist-oriented nature of this product—it is intended to compete with platforms like Raspberry Pi and BeagleBone Black—a large support network for this chipset exists, including a well-maintained Debian branch. Moreover, several companies supply systems-on-module (SoMs) featuring the same chipset, which allow developers to move to an application-specific design without having to design a PCB hosting a complex modern system-on-chip (SoC), while maintaining software compatibility and most hardware compatibility with the DragonBoard. We chose the DART-SD410 from Variscite [22] as our SoM because it breaks out all the multichannel inter-IC sound (MI2S) peripheral lines from the SoC, unlike the DragonBoard and most other SoMs.

FIGURE 2. Block diagram of the OSP PCD (Processing and Communication Device).

The Snapdragon 410c SoC (APQ8016) has four 64-bit ARM A53 cores at 1.2 GHz, plus DSP and GPU. Not only does a multicore CPU provide more processing power than a single-core CPU, it allows us to assign real-time portions of the HA processing to dedicated cores where they will not be interrupted, while the OS and EWS run on a different core (see Sec. II-D). Key SoC peripherals include two multichannel inter-IC sound (MI2S) ports for audio I/O to the behind-the-ear receiver-in-canal (BTE-RIC) transducers; several SPI ports for peripheral control and communication; a microSD card for data logging; and a UART for the Linux terminal interface. Crucially, the MI2S ports are directly connected to the CPU, unlike in many smartphone chipsets where they are connected to the DSP. The latter would require at least some processing to be done on the DSP, which would substantially complicate the development process compared to running ordinary usermode code on the CPU, or add the additional latency of transfers in each direction. The associated power management IC, PM8916, includes a separate lower-performance codec which is used to provide two microphones on the PCD. The SoC and associated wireless chips provide 2.4 GHz WiFi, Bluetooth, and GPS. Paired with the industry-standard networking software available for Linux, the WiFi can act as an access point and the system can serve web pages to clients which connect to it (Sec. V).

We designed a carrier board to host the SoM (Fig. 3). This board also includes power supplies, the FPGA (Sec. II-F), the other interface hardware and ports for the BTE-RICs and the FM-ExG, the microSD card slot and USB ports, and other basic system features. Adjacent to the carrier board is a 2000 mAh smartphone-type Li-Ion battery, which can be charged from a microUSB port or swapped out by the user. The carrier board, battery, and WiFi antenna are enclosed in a plastic case (Fig. 4) to form the PCD, which may be worn around the neck or on a belt. The PCD is roughly 73 × 55 × 20 mm and has a mass of roughly 83 grams, representing a savings of 67% in weight and 72% in volume over the previous ‘‘portable’’ OSP hardware design [23].

FIGURE 3. Components of the OSP PCD (Processing and Communication Device): the carrier board hosting the Snapdragon 410c SoM.

FIGURE 4. The OSP PCD disassembled, showing the battery, the back of the carrier board, and the plastic shell.

C. ADAPTING SMARTPHONE SOC AUDIO HARDWARE
As discussed above, the 410c platform was chosen for its power efficiency, high performance, wireless capabilities, and product ecosystem. However, the audio subsystem in the Snapdragon 410c was designed to support the needs of low-cost smartphones; it was neither designed nor documented for general-purpose use. Our needs for audio I/O to the BTE-RICs in the HA application are substantially different from those of the smartphone applications the SoC's audio subsystem was designed for. Nevertheless, we were able to adapt this subsystem to the needs of OSP through a combination of reverse engineering and analysis of its partial documentation. Although some of the implementation details discussed here are specific to the 410c SoC, many of them would apply to a variety of single board computers and SoMs based on ARM processors running Linux. The OSP software comprising the RT-MHA and EWS is hardware-agnostic, and can run on Linux and OS X systems in addition to the embedded systems mentioned above.

Specifically, each BTE-RIC has one MI2S data line for microphone data and one for speaker data. The same speaker data line can be sent to both BTE-RICs, with the left and right receiver signals in the left and right time-division slots respectively. However, each of the two microphone data lines must be received by the SoC on separate MI2S data input pins, since they each already contain two mics' data. This means a total of two MI2S data input lines and one MI2S data output line are needed. Due to the design of the MI2S peripheral units in the SoC and the undocumented multiplexer block which connects them to the SoC's I/O pins, the only configuration which provides two MI2S data input lines is using two data lines of one MI2S unit in ‘‘receive’’ mode, and using a different unit in ‘‘transmit’’ mode for the data output line. Unfortunately, the Advanced Linux Sound Architecture (ALSA) kernel subsystem assumes that each codec has a unique data (I2S) port; in our case, two MI2S ports are being shared by two codecs. So, we had to build a custom ALSA driver for the BTE-RICs which registers two ‘‘virtual’’ audio devices—one for mics, and one for speakers—connected to the respective MI2S peripherals. Each virtual device has its own ‘‘memory map’’ with registers controlling the appropriate functions; writes to and reads from these registers are rerouted in the driver to both codecs' SPI control ports as necessary. The result is that usermode software sees two devices, one with only audio inputs and one with only outputs, both of which function on their own or simultaneously.
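
As an illustration of what this looks like from user space, the following minimal sketch opens one capture-only and one playback-only PCM device through the standard ALSA API. The device names and channel counts shown are assumptions for illustration, not the identifiers actually registered by the OSP driver.

    // Hypothetical usermode sketch: open the two "virtual" ALSA devices exposed
    // by a driver like the one described above -- one capture-only device for
    // the microphones and one playback-only device for the receivers.
    // Device names and channel counts are placeholders, not OSP's actual values.
    #include <alsa/asoundlib.h>
    #include <cstdio>

    int main() {
        snd_pcm_t *mics = nullptr, *receivers = nullptr;

        // Capture device: assume 4 channels (2 mics per BTE-RIC) at 48 kHz, 32-bit slots.
        if (snd_pcm_open(&mics, "hw:0,0", SND_PCM_STREAM_CAPTURE, 0) < 0 ||
            snd_pcm_set_params(mics, SND_PCM_FORMAT_S32_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                               4, 48000, 0 /* no resampling */, 10000 /* 10 ms */) < 0) {
            std::fprintf(stderr, "failed to configure capture device\n");
            return 1;
        }

        // Playback device: 2 channels (left/right receivers) at 48 kHz.
        if (snd_pcm_open(&receivers, "hw:0,1", SND_PCM_STREAM_PLAYBACK, 0) < 0 ||
            snd_pcm_set_params(receivers, SND_PCM_FORMAT_S32_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                               2, 48000, 0, 10000) < 0) {
            std::fprintf(stderr, "failed to configure playback device\n");
            return 1;
        }

        // ... capture with snd_pcm_readi(), process, play back with snd_pcm_writei() ...

        snd_pcm_close(mics);
        snd_pcm_close(receivers);
        return 0;
    }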

D. EMBEDDED OPERATING SYSTEM
The embedded operating system used in OSP is based on the Debian 9 (‘‘stretch’’) distribution of Linux for Snapdragon 410c (ARM64) provided by Linaro. Besides the custom audio driver mentioned above, we have tailor-built the kernel and configured the environment to meet the following goals:

1) Stable real-time performance
2) Low power consumption
3) Fast bootup
4) Small memory footprint
5) Security

These goals will be referenced by number in the following paragraphs.

1) KERNEL
The kernel is configured with all core facilities and most drivers as built-in. Building as much code into the kernel binary rather than into modules improves the bootup time (3). In the current configuration, a few drivers remain modular because they cannot initialize until after the firmware is initialized and the device is powered up. Future work by our team and the community will be needed to modify these drivers so they can be compiled as built-in, which will allow module loading/unloading to be completely disabled. Disabling dynamic loading of kernel-mode code is desirable for security (5), since it makes the kernel harder to modify at run time and helps thwart specific threats, e.g., rootkits.

Any kernel facility that is not used by the system, and any driver whose hardware is not present, is excluded. This optimization helps to achieve both (3) and (4), with the additional benefit of decreasing the build time of the kernel. Short build time is not a design goal, but it is a desirable metric that is crucial in reducing development and test time. Furthermore, we expect that removing these additional kernel features improves the security posture of the system (5) by removing any attack vectors associated with those features.

To address (1), the PREEMPT_RT option has been enabled, which makes the kernel preemptible and shortens the critical sections within kernel code.

2) ENVIRONMENT
systemd has replaced the old sysvinit-style init process that becomes PID 1 when the kernel finishes its boot process, and handles the remaining portions of system boot. systemd is configured to only run on CPU core 0 through a configuration setting in /etc/systemd/system.conf. As a result of this configuration, the init process and subsequent processes spawned by it will only run on core 0. This allows CPU cores 1-3 to be reserved for all real-time processing (1); user code handles their assignment to these cores when they are executed. Similarly, all interrupt handling is bound to core 0 to avoid interrupting the real-time processes running on cores 1-3.
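
For concreteness, the core-0 pinning described above amounts to a single setting in /etc/systemd/system.conf. The kernel parameters in the second fragment are shown only as one common way to keep interrupts and general scheduling off cores 1-3; they are illustrative and not necessarily the exact configuration OSP ships.

    # /etc/systemd/system.conf -- run PID 1 and everything it spawns on core 0
    [Manager]
    CPUAffinity=0

    # Illustrative kernel command-line options (one possible mechanism, not
    # necessarily OSP's) for reserving cores 1-3 and routing IRQs to core 0:
    #   isolcpus=1-3 irqaffinity=0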

Unnecessary and unused services are disabled to reduce power consumption (2) and enhance system security (5). The Bluetooth radio is also disabled by default for the same two reasons but can be enabled by a user if so desired. As seen in Table 3 below in Sec. VI, the idle power consumption is more than half of the total power consumption during full operation, so it is extremely important to eliminate unnecessary power sinks to improve battery life.

The system configures the WiFi interface as a hotspot after boot to allow for remote connectivity to the PCD, for the embedded web server (EWS) and for SSH for development. In conjunction with the hotspot, multicast DNS Service Discovery (mDNS-SD, a.k.a. Bonjour) is enabled and configured to allow a user connected to the hotspot to easily access the EWS or SSH into the board using the hostname ospboard.local, without needing to know the IP address of the board. As a fallback for systems that do not support mDNS, e.g. Android, the IP address of the board is always the same when connected through the hotspot.

E. HIGH-PERFORMANCE BTE-RICS
Along with the PCD, the other key hardware component of OSP is a pair of novel ear-level transducers in a behind-the-ear receiver-in-canal 1 (BTE-RIC) form factor (Fig. 5). These units are each connected to the PCD via a four-wire cable, and serve as the primary input and output for the system. They are composed of a rigid PCB for the electronics, a flex PCB for the microphones, a custom 3D-printed plastic shell, and a rugged 3D-printed strain relief [9].

FIGURE 5. The OSP BTE-RICs, together and disassembled.

Unlike in previous versions, the communication between the BTE-RICs and the PCD is digital—the codecs are within the BTE-RICs. The low-level digital interface is transparently facilitated by FPGAs in both the BTE-RICs and the PCD (Sec. II-F). The decision to have digital communication with the BTE-RICs was made for several reasons. First, analog communication with the BTE-RICs would require at least six wires—a differential pair each for the microphone and receiver, plus power and ground—plus even more wires for multiple microphones per ear. As discussed below, multiple audio inputs per ear are crucial for expansion of the hearing-related research OSP supports. Second, having the codec physically close to the transducers reduces the opportunity for noise and interference. Finally, the digital interface allows for additional sensors at the ear—starting with the IMU (Sec. II-G.1)—without the need for any additional wires, thanks to the FPGAs.

The codec in each BTE-RIC is the high-performance but consumer-grade Analog Devices ADAU1372 [24], which provides a differential headphone driver for the receiver and four analog inputs per ear. By default, these are a front microphone, a rear microphone, an in-ear microphone, and a voice pick-up (VPU) transducer (Fig. 6); while the former two are common on hearing aids, the latter two are for specialized purposes, and are explained below. The I2S standard only supports two channels of audio per data line, so currently only two of these four inputs may be transmitted to the PCD at a time. However, the application may select via ALSA commands which two inputs these are, and future work will enable simultaneous capture of all four microphones (Sec. II-F). All inputs and outputs are sampled at 48 kHz, 24 bit; the codec also supports 96 kHz sampling, which will be supported by a future version of OSP for improved accuracy in beamforming.

1 The output transducer, i.e. the loudspeaker, is called the ‘‘receiver’’ in the telephony and HA communities. This is typically a small speaker in a long, slender package that is in or just outside the ear canal.

FIGURE 6. Block diagram of the OSP BTE-RICs.

Several types of audiological studies require measurement of the sound within the ear canal while a hearing assisted device is being worn. Purposes include calibration of the acoustics, Real Ear Insertion Gain (REIG) measurements during HA fitting [25], compensation for occlusion effects [26], and studying otoacoustic emissions [27], [28]. Typically, this measurement is performed with a probe placed into the ear canal as the HA is inserted; unfortunately, this method is time-consuming and precise positioning of the probe can be difficult [25]. To facilitate such studies, the BTE-RICs support a special receiver in development at Sonion [29] which has a microphone in the same package, facing into the ear canal. This allows the sound within the ear canal to be measured and monitored as a normal part of work with the platform—including in the field, which would normally be prohibitively difficult. The current design uses a CS44 connector for the receivers, with a pinout that is compatible with a variety of regular receivers as well as with the embedded-mic receiver, thus not increasing costs for users who do not need this feature.

A VPU (voice pick-up) is a bone conduction transducer that picks up the user's voice, while being highly immune to background noise (40-50 dB loss to ambient sound compared to conducted sound [30]). When mounted to a device which is in robust contact with the head, such as an in-ear hearing assisted device, it picks up the vibrations of the skull—that is, the user's voice—without any outside sound. While bone conduction microphones have made impressive advances, they still have reduced frequency range compared to air microphones, and their response to vibration is noticeably nonlinear. Thus, the VPU in this system effectively provides a measurement of the user's voice which is somewhat distorted but almost completely free of interference. This signal can be useful in several ways. First, adaptive systems such as beamforming and speech enhancement rely on accurate estimates of when the user is speaking (speech presence probability or SPP) in order to estimate the interfering noise. The VPU signal can provide an improved estimate of the SPP, so that the adaptation can be temporarily disabled while the user is speaking [31]. Second, other algorithms can be developed to improve the experience of listening to one's own voice, which is known to be adversely affected by HAs [26]. These may include reducing the gain while the user is speaking, DSP approaches to correct for the presence of the HA in the canal, etc. Finally, algorithms—especially ones involving deep neural networks—can be developed to reconstruct an improved signal of the user's speech from the VPU signal [32], for purposes like telephony or virtual meeting settings.

In addition to the codec and audio hardware, each BTE-RIC also provides an inertial measurement unit (IMU), which is discussed in Sec. II-G.1; separate analog and digital power supplies for additional noise suppression; and an FPGA, which is discussed next.

F. CUSTOM DIGITAL INTERFACE
Both the PCD and each BTE-RIC contain an FPGA (Lattice MachXO3 series [33]). As discussed below, the form factor of the BTE-RICs containing the codecs with processing in the PCD would not be feasible without the FPGAs. Once they were present, they enabled additional features, including the FM-ExG (Sec. III), so they have become a key component of the platform.

The original need for the FPGA came from the observation that the communication between the BTE-RICs and the PCD would require a large number of signal wires: bit clock, word clock, microphone data, and receiver data for I2S, and at least two lines for control signals to and from the codec and IMU (clock and data of I2C). Combined with power and ground wires, the cable to the BTE-RICs would have to have eight conductors. On top of this, neither I2S nor I2C are designed for transmission over wires of any significant length; while they would be likely to work in controlled conditions in the lab, they might not be robust in varying electromagnetic environments in the field. So, we decided to add an FPGA at each end, and transmit all the signals with a custom protocol over a single bidirectional twisted pair, reducing the number of conductors in the cable to four. The physical layer chosen is bus low-voltage differential signaling (BLVDS) [34], [35], a bidirectional version of the popular LVDS standard [36] used in many modern serial interfaces such as USB, SATA, and PCI Express. This interface uses standard CMOS drivers to transmit and analog differential amplifiers to receive; the FPGAs support this interface natively, only needing a few external resistors at each end to match the impedance of the cable. Because the signal transmitted is differential, it is nearly immune to common-mode noise and interference; and since the cable is shielded and the conductors are twisted, there is very little opportunity for differential interference. As a result, this interface is well suited to high-speed communication over the roughly 1 m cable between the BTE-RIC and the PCD.

We created a custom communication protocol over BLVDS, designed to allow the SoC to transparently interact with the codec and IMU within each BTE-RIC (Fig. 7). There are three categories of signals which are multiplexed and packetized for transmission over LVDS: high-speed data, low-speed control, and clock. The microphone and receiver I2S data is the high-speed data; this is transmitted 8 bits at a time in each direction within each communication packet. The SPI control lines for the codec and IMU are the low-speed signals; the states of these signals are transmitted once per packet. Finally, the protocol allows the I2S bit clock in the BTE-RIC to be synchronized with that in the PCD, to correct for drift between the oscillators in the two devices. The FPGA in the BTE-RIC adjusts the sub-cycle timing of its I2S bit clock to match a known rising edge in the data stream from the PCD. Since the BTE-RIC sends back its own rising edge in its half of the packet, each FPGA can determine if the other is connected and properly responding, which allows for deterministic behavior at startup or any time communication is interrupted.

FIGURE 7. BLVDS protocol for communication between the PCD and BTE-RICs. 8 bits of I2S audio data are transmitted in each direction, plus a number of control signals, during the same time as 8 I2S bit clocks occur.

In future work, the FPGAs will also enable simultaneous capture from all four microphones on each BTE-RIC. The codec supports an extension to I2S called TDM which allows for four channels per data line. The SoC's I2S subsystem does not support this, but it does support two channels at twice the sample rate, which is the same data bitrate. For this mode, the FPGA in the PCD will send a ‘‘fake’’ word clock signal to the SoC which matches its expectations and ‘‘trick’’ it into accepting the data. The FPGA will also annotate the channel numbers in the lower, unused bits of the audio data—each sample is 32 bits but the ADC is only 24 bit—so that the application can distinguish them.

G. SIMULTANEOUS ANCILLARY SENSORS
As described below, the OSP hardware platform currently supports three additional types of sensing capabilities not traditionally associated with hearing aid research. Since OSP is designed to be a tool for new kinds of research beyond what is currently possible, these sensors may be used in conjunction with the audio transducers for new work in fields related to hearing, or on their own with OSP acting as a wearable acquisition and processing system. Furthermore, OSP can serve as a baseline open-source wearable hardware design, which can be modified by researchers who would like to add their own sensors for investigations into lifestyle, healthy aging, and many other health-related fields.

1) IMUs IN BTE-RICs AND PCD
Both the BTE-RICs and the PCD contain a Bosch BMI160 inertial measurement unit (IMU), which is a three-axis accelerometer plus three-axis gyroscope. The gyroscope data from the BTE-RICs provides reasonably accurate information about changes in head orientation. Assuming that target sound sources and interferers move much more slowly or rarely than the user's head, this allows changes in the user's look direction to be corrected for in algorithms which model the spatial positions of audio sources such as beamforming-based source separation or noise suppression. This has the potential to dramatically improve their convergence speed and reduce their error rate, providing a better user experience.
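
As a simple illustration of how the gyroscope data could be used for this purpose (a sketch only, with placeholder names, units, and rates, not the RT-MHA implementation):

    // Hypothetical sketch: keep a beamformer's steering direction fixed in the
    // world frame by subtracting the integrated head rotation (yaw) reported by
    // the BTE-RIC gyroscope. Names, units, and rates are placeholders.
    #include <cmath>

    struct LookDirectionTracker {
        double steer_deg = 0.0;  // steering angle relative to the head, in degrees

        // gyro_yaw_dps: yaw rate from the IMU in degrees/second
        // dt_s: time elapsed since the previous IMU sample, in seconds
        void onImuSample(double gyro_yaw_dps, double dt_s) {
            // If the head turns right by X degrees, a world-fixed target is now
            // X degrees further to the left in the head frame.
            steer_deg -= gyro_yaw_dps * dt_s;
            steer_deg = std::remainder(steer_deg, 360.0);  // wrap to (-180, 180]
        }

        // The beamformer reads this each frame instead of re-converging from scratch.
        double currentSteeringDeg() const { return steer_deg; }
    };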

In addition, there is another related healthcare application for the IMU data. The ability to maintain mobility—broadly defined as movement within one's environment—is an essential component of healthy aging, because it underlies many of the functions necessary for independence [37], [38]. In this context, gait disturbances are usually due to a combination of decreased physiological reserves and increased multisystem dysfunction [39]. The IMUs allow researchers to assess gait speed and monitor for unexplained gait disturbances during activities of daily living. Physical activity monitoring software could be developed to run in parallel with the hearing aid software and provide appropriate feedback to the user or researchers.

2) GPS
The SoM includes the radio hardware to support GPS-based location acquisition. Future work will focus on enabling GPS in software and acquiring useful data from it without disrupting real-time audio processing or consuming too much power.

3) FM-EXG HARDWARE IN PCD
The PCD's carrier board also includes a hardware subsystem for simultaneous biopotential acquisition. This consists of a fast-sampling ADC controlled by the on-board FPGA, which relays the data to the Snapdragon SoC via SPI. This system is discussed in the section below.

III. SIMULTANEOUS MULTICHANNEL BIOPOTENTIAL SIGNAL ACQUISITION
A. BACKGROUND
Acquisition and processing of biopotential or electrophysiological signals—which we call ‘‘ExG’’, for EEG (electroencephalography), ECG/EKG (electrocardiography), EMG (electromyography), etc.—is a major field of study in emerging healthcare research. Simultaneous EEG and HA audio processing is of particular interest in pre-lingual pediatric hearing loss management, as it could assist clinicians in fitting hearing aids to infants who are unable to self-report the efficacy of their hearing aid prescription, leading to a dramatic improvement in their quality of life [40]. Furthermore, in the future the process of HA tuning could be done adaptively via machine learning systems, which would monitor the experience of the user as measured by their EEG patterns. Unfortunately, EEG typically requires many electrodes with an independent wire for each, making acquisition systems large, expensive, and difficult to use, especially in pediatric applications. While devices capable of concurrent hearing aid tuning and EEG do exist [41], to our knowledge no wearable or easily-portable devices of this kind are available to the research community. Other applications of wearable biopotential acquisition systems include monitoring conditions of concern such as heart ailments (ECG), muscle degeneration (EMG), or the progression of neurological disorders (EEG) such as Alzheimer's disease and Parkinson's [42]. In addition, there is emerging evidence that neurofeedback from EEG can be helpful as an intervention in many disease conditions [43] including epilepsy [44] and ADHD [45].

B. SYSTEM DESIGN
OSP incorporates a wearable biopotential acquisition system, which can run alongside the HA processing, and which only requires one small four-wire cable from the electrodes to the PCD. The design of this system is based upon the distributed FM-ADC architecture in [12]. The active electrodes feature high input dynamic range of around 100 dB and no input gain stage. This allows them to support wet or dry electrodes, and they can be used for ECG, EMG, and EEG simply by changing the position of the electrodes on the body. In each active electrode, the biopotential signal at baseband is bandwidth-expanded into a frequency-modulated (FM) band centered at a unique carrier frequency. This upconversion is performed in an application-specific integrated circuit (ASIC) and the resultant FM signals are all driven onto a single signal wire, each FM signal occupying a distinct area of spectrum for frequency domain multiplexing (FDM). The electrodes are daisy-chained in any order and connected to the PCD via a 4-wire cable (the remaining three wires being power, ground, and a reference voltage). The aggregate signal content of the single composite FM-FDM wire is sampled by an analog-to-digital converter (ADC) in the PCD. The data can then be streamed using WiFi for off-body processing or processed locally in multi-modal signal processing applications. In either case, after demodulation, the original biopotential signals can be recovered.

The benefits of such a biopotential acquisition system strategy include: power efficiency intrinsic to the distributed FM-ADC architecture, ruggedization against inertial motion artifacts, reduced system weight due to reduced wiring burden, and frequency up-conversion which eliminates baseband coupling artifacts in the signal wire. Its high input dynamic range ensures that the acquisition hardware does not saturate and lose signal for large motion artifacts at the input; combined with the IMUs in the BTE-RICs, OSP could in the future support IMU-based motion artifact removal as demonstrated in [46].

As presented in [11], the FM modulation allows for an increased effective signal-to-noise ratio (SNR_FM) compared to the SNR of the ADC at the carrier frequency, called the carrier-to-noise ratio (CNR). The overall SNR of the system depends on the bandwidth expansion ratio D [47] as follows:

SNR_FM = 10 log10((3/2) D^2) + CNR

The CNR of an ideal 12-bit ADC (i.e. 12 effective number of bits or ENoB) is 72 dB, so we chose D = 20 to give a theoretical 28 + 72 = 100 dB SNR for each FM channel. Assuming EEG signals have a maximum frequency of 500 Hz, D = 20 leads to a 10 kHz FM frequency deviation. The actual FM bandwidth may be computed two different ways: by the empirical Carson's Rule, giving 2 × (10 kHz + 500 Hz) = 21 kHz, or by including all side tones with greater than 1% of the unmodulated carrier amplitude, giving 3.2 × 10 kHz = 32 kHz [47]. Based on these two estimates of the bandwidth and the desire for ≈ 10 kHz guard bands between channels, we space the channels 40 kHz apart. With a sampling frequency of 1 MHz, 12 ExG channels can be supported.
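
The channel-budget arithmetic above can be summarized in a few lines (a sketch of the calculation only, not OSP code):

    // Sketch of the FM-ExG channel-budget arithmetic described in the text above.
    #include <cmath>
    #include <cstdio>

    int main() {
        const double D = 20.0;       // bandwidth expansion ratio
        const double CNR_dB = 72.0;  // ideal 12-bit (12 ENoB) ADC at the carrier
        const double f_max = 500.0;  // assumed maximum EEG frequency, Hz

        double snr_fm = 10.0 * std::log10(1.5 * D * D) + CNR_dB;  // ~28 + 72 = 100 dB
        double deviation = D * f_max;                             // 10 kHz FM deviation
        double bw_carson = 2.0 * (deviation + f_max);             // 21 kHz (Carson's Rule)
        double bw_sidetones = 3.2 * deviation;                    // 32 kHz (1% side tones)
        double spacing = 40e3;                                    // chosen channel spacing
        int channels = static_cast<int>((1e6 / 2.0) / spacing);   // 500 kHz Nyquist / 40 kHz = 12

        std::printf("SNR_FM = %.1f dB, BW = %.0f or %.0f Hz, channels = %d\n",
                    snr_fm, bw_carson, bw_sidetones, channels);
        return 0;
    }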

FIGURE 8. Block diagram of FM-ExG hardware in the OSP PCD. The FPGA converts between parallel and SPI data formats and stores samples in a FIFO queue for batched access by the SoC. Note that the FM sample clock is derived from the same MEMS oscillator as the I2S audio is, so the ExG and audio streams remain permanently synchronized.

FIGURE 9. RT-MHA software block diagram with signal flows. Audio I/O operates at 48 kHz and all HA processing is carried out at 32 kHz. The baseline HA functions provided include adaptive beamforming (BF), subband decomposition, speech enhancement (SE), wide dynamic range compression (WDRC), and adaptive feedback cancellation (AFC). See Fig. 10 for an enlarged picture of the beamforming block.

An overview of the hardware included on the PCD to realize this is shown in Fig. 8. The Analog Devices AD9235 [48] was chosen for its parallel interface, 12-bit resolution, and supported sampling rates up to 60 MHz. The ADC is clocked by the FPGA with a 1.024 MHz clock signal generated by dividing the 12.288 MHz clock from the MEMS oscillator driving the I2S by 12. The ADC's parallel data interface connects to the FPGA, which contains a simple FIFO queue to store the samples until they are ready to be retrieved by the SoC via SPI. A level-based signal is sent to the SoC when more than 1024 samples (1 ms of data) are available; the SoC polls this signal and then performs an SPI transfer of 1536 bytes, which covers the 1024 12-bit samples. Since the SPI clock runs at 50 MHz—which could theoretically transfer 6250 bytes per ms if the clock ran continuously—there is sufficient timing slack for transfers to be stable.
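
For reference, 1024 samples × 12 bits = 12288 bits = 1536 bytes. A minimal sketch of unpacking such a transfer is shown below; the byte ordering is an assumption, and the actual packing produced by the FPGA may differ.

    // Hypothetical sketch: unpack 1024 12-bit ADC samples from one 1536-byte SPI
    // transfer. Assumes two samples are packed big-endian into every 3 bytes;
    // the actual bit packing produced by the FPGA may differ.
    #include <array>
    #include <cstdint>

    void unpack12(const uint8_t (&raw)[1536], std::array<uint16_t, 1024> &out) {
        for (int i = 0, j = 0; i < 1536; i += 3, j += 2) {
            out[j]     = static_cast<uint16_t>((raw[i] << 4) | (raw[i + 1] >> 4));
            out[j + 1] = static_cast<uint16_t>(((raw[i + 1] & 0x0F) << 8) | raw[i + 2]);
        }
    }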

When FM-ExG streaming is running, CPU core 3 is dedicated to the FM-ExG thread. It runs at the highest real-time priority and is the only thread permitted to run on this core. It polls the ‘‘data ready’’ signal described above, performs the SPI transfers, and executes a callback to user code for each 1 ms (1024 samples) of data received. Any processing or transmission of the data for any research application would occur during this callback. We created two programs which implement this callback to collect results as described in Sec. VI-C: one which measures the time between rising edges of a pulse wave for the sync measurements, and one which saves 10 seconds of data to RAM and then to disk. In the latter case, we performed digital demodulation offline using MATLAB.
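
A bare-bones sketch of how an acquisition thread like this can be pinned to core 3 and given the highest real-time priority on Linux is shown below. The polling and SPI steps are placeholders standing in for the real driver interaction; only the affinity and scheduling calls are standard.

    // Hypothetical sketch of the FM-ExG acquisition thread setup: pin the thread
    // to CPU core 3 and run it at the highest SCHED_FIFO priority. The poll and
    // SPI steps are placeholders, not the actual OSP driver interaction.
    #include <pthread.h>
    #include <sched.h>
    #include <cstdint>

    // User-provided callback: invoked once per 1 ms with 1024 unpacked samples
    // (processing, logging, or streaming would happen here).
    static void exgCallback(const uint16_t * /*samples*/, int /*count*/) {}

    void *exgThread(void *) {
        // Pin this thread to core 3 (cores 1-3 are reserved for real-time work).
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        // Highest real-time priority under SCHED_FIFO.
        sched_param sp{};
        sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
        pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

        uint16_t samples[1024];
        for (;;) {
            // 1) poll the FPGA "data ready" line        (placeholder)
            // 2) SPI transfer of 1536 bytes and unpack  (placeholder)
            exgCallback(samples, 1024);
        }
        return nullptr;
    }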

C. FUTURE WORK
Our first goal for future work with FM-ExG is to enable demodulated data to be streamed via WiFi from the PCD. This will require creating a real-time implementation of the demodulator, ensuring its performance is high enough to run in the callback without disrupting the data capture, and implementing both the local and remote sides of the WiFi streaming system. Once this is accomplished, we are excited to begin exploring clinical uses of FM-ExG, particularly in pediatric hearing loss research.

IV. REAL-TIME MASTER HEARING AID (RT-MHA)
A. BASELINE ALGORITHMS
We provide a full set of baseline implementations of common HA algorithms in the RT-MHA, to facilitate basic HA research with the platform and to provide a reference implementation for engineers to build from. An overview of the RT-MHA signal flow is shown in Fig. 9. These algorithms are essential components of any HA, and can be categorized into ‘‘basic’’ and ‘‘advanced’’ functions. The basic HA functions necessary for amplification are:

1) Subband decomposition
2) Wide dynamic range compression (WDRC)
3) Adaptive feedback cancellation (AFC)

Many commercial HAs include advanced features to improve speech perception in realistic situations such as in a noisy environment. The RT-MHA implements two advanced functions for improving conversation in noise:

4) Speech enhancement (SE)
5) Microphone array processing (or beamforming)

In the following we briefly describe the role and our baseline implementation of each of these five algorithms.

1) SUBBAND DECOMPOSITION
Hearing loss is typically highly frequency dependent; it is common for loss to be worse at high frequencies, but loss curves vary widely among individuals. Hence, gain and other processing must be applied differently at different frequencies, motivating the decomposition of the input signal into frequency bands. In the RT-MHA, this decomposition is implemented as a bank of 6 finite impulse response (FIR) filters, where the bandwidths and upper and lower cutoff frequencies of these filters are based on Kates's MATLAB master hearing aid implementation [49].
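
To make the structure concrete, a 6-band FIR analysis bank applied to one frame of input might look like the sketch below. The filter design, lengths, and buffering are placeholders; the RT-MHA's actual band edges follow [49].

    // Hypothetical sketch of a 6-band FIR analysis filterbank. The coefficients
    // would be designed offline (band edges per [49]); lengths, frame size, and
    // the simple direct-form convolution here are placeholders, not RT-MHA code.
    #include <array>
    #include <vector>

    constexpr int kBands = 6;

    struct FirFilter {
        std::vector<float> taps;     // designed offline for one band (non-empty)
        std::vector<float> history;  // last taps.size() - 1 input samples

        void process(const float *in, float *out, int n) {
            if (history.size() != taps.size() - 1)
                history.assign(taps.size() - 1, 0.0f);   // first call: zero state
            std::vector<float> buf(history);
            buf.insert(buf.end(), in, in + n);
            for (int i = 0; i < n; ++i) {
                float acc = 0.0f;
                for (size_t k = 0; k < taps.size(); ++k)
                    acc += taps[k] * buf[i + taps.size() - 1 - k];
                out[i] = acc;
            }
            history.assign(buf.end() - (taps.size() - 1), buf.end());
        }
    };

    // One input frame is split into kBands subband frames; each subband is then
    // processed independently (SE, WDRC) and the results are summed at the end
    // to form the HA output.
    void analyze(std::array<FirFilter, kBands> &bank, const float *frame, int n,
                 std::array<std::vector<float>, kBands> &subbands) {
        for (int b = 0; b < kBands; ++b) {
            subbands[b].resize(n);
            bank[b].process(frame, subbands[b].data(), n);
        }
    }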

2) WDRC
Both healthy hearing and hearing loss are known to be nonlinear in amplitude, with these nonlinearities varying over frequency. Therefore, a gain control mechanism that enables a frequency-dependent, nonlinear gain adjustment is needed for modern HAs. This is carried out by the wide dynamic range compressor (WDRC), which is one of the essential building blocks of a HA [50]. The WDRC amplifies soft sounds while limiting the gain of loud sounds, with the aim of improving audibility without introducing discomfort. Typically, WDRC amplifies quiet sounds (40-50 dB SPL), attenuates loud sounds (85-100 dB SPL), and applies a variable gain for everything in between. The basic WDRC system described in [51] comprises an envelope detector for estimating the input signal power and a compression rule to realize nonlinear amplification based on the estimated power level. Primary control parameters of the basic WDRC system are: attack time (AT), release time (RT), compression ratio (CR), gain at 65 dB input (G65), and upper and lower kneepoints (Kup and Klow) [51]. The AT or RT is the time the envelope detector takes to recover the output signal level to its steady state when a sudden rise or drop takes place in the input signal level, respectively. The amount of gain to apply is then determined by a compression rule as a function of the estimated input power level given by the envelope detector. The CR, G65, AT, RT, Kup, and Klow are the control parameters characterizing the compression rule. In the RT-MHA, the above WDRC is implemented in a 6-channel system [51], where gain control is realized independently in each subband, enabled by selecting different parameters to specify the compression rule. The outputs of all the subbands after applying the WDRC are combined together to produce the HA output signal.
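The Python sketch below illustrates one common formulation of these two pieces: a peak-type envelope detector with separate attack and release time constants, and a static gain rule anchored at G65, with constant gain below Klow, compression ratio CR between the kneepoints, and limiting above Kup. It illustrates the roles of the parameters and is not necessarily the exact RT-MHA rule from [51].

import numpy as np

def envelope_db(x, fs, attack_ms, release_ms):
    """One-pole peak detector with separate attack/release time constants."""
    a_att = 1.0 - np.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = 1.0 - np.exp(-1.0 / (fs * release_ms * 1e-3))
    env = np.zeros_like(x)
    e = 0.0
    for n, v in enumerate(np.abs(x)):
        a = a_att if v > e else a_rel
        e += a * (v - e)
        env[n] = e
    return 20.0 * np.log10(np.maximum(env, 1e-10))

def wdrc_gain_db(level_db, g65, cr, k_low, k_up):
    """Static compression rule: constant gain below k_low, ratio cr between
    the kneepoints, limiting above k_up, anchored so a 65 dB input gets g65."""
    comp = lambda L: g65 - (L - 65.0) * (1.0 - 1.0 / cr)
    level_db = np.asarray(level_db, dtype=float)
    gain = comp(np.clip(level_db, k_low, k_up))
    gain = np.where(level_db > k_up, comp(k_up) - (level_db - k_up), gain)
    return gain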

3) AFC
Feedback due to acoustic coupling between the microphone and receiver is a very well-known problem in HAs [51]. There are many methods to alleviate this phenomenon [52]. Among them, adaptive feedback cancellation (AFC) has become the most common technique because of its ability to track the variations in the acoustic feedback path and cancel the feedback signal accordingly. The AFC generates an estimate of the feedback path using an adaptive finite impulse response (FIR) filter that continuously adjusts its filter coefficients to emulate the feedback path impulse response. Typically the AFC can provide 5-12 dB added stable gain (ASG) [14] depending on the adaptive filtering algorithms used. The RT-MHA implements the least mean square (LMS) based algorithms and features the sparsity promoting LMS (SLMS) [13], an advanced adaptive filtering algorithm developed by the OSP team and discussed below (Sec. IV-B).
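A minimal normalized-LMS version of this structure is sketched below in Python: an FIR filter driven by the receiver (loudspeaker) signal estimates the feedback picked up by the microphone, and the estimate is subtracted before further processing. The filter length and step size are illustrative only; the RT-MHA additionally offers the SLMS adaptation discussed in Sec. IV-B.

import numpy as np

def afc_nlms(mic, spk, num_taps=64, mu=0.005, eps=1e-8):
    """Sketch of adaptive feedback cancellation with an NLMS-adapted FIR
    estimate of the feedback path from loudspeaker (spk) to microphone (mic)."""
    w = np.zeros(num_taps)           # feedback-path estimate
    buf = np.zeros(num_taps)         # most recent loudspeaker samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = spk[n]
        fb_est = np.dot(w, buf)
        e = mic[n] - fb_est          # feedback-compensated signal
        w += mu * e * buf / (np.dot(buf, buf) + eps)   # NLMS update
        out[n] = e
    return out, w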

4) SE
In a quiet environment, the above features of HAs are enough to help the user better understand speech. However, in a noisy environment such as a cafeteria or a restaurant, the HA might not be able to improve conversations without any noise reduction mechanism—for example, WDRC may amplify noise components along with soft sounds. It is therefore essential to have reliable and robust speech enhancement (SE) systems implemented in the HA. A baseline SE module, based on a version of the SE systems investigated in [53], has been added to the RT-MHA. The SE module performs denoising in the subband domain, between the subband decomposition and the WDRC blocks.

5) MICROPHONE ARRAY PROCESSING
To improve speech intelligibility in noisy environments, RT-MHA implements a baseline left/right two-microphone adaptive beamforming (BF) system. This baseline system, described in [54], realizes the generalized sidelobe canceller (GSC) implementation [55] of the linearly constrained minimum variance (LCMV) beamformer [56]. Fig. 10 depicts the BF block diagram. For the adaptation, an adaptive filter using the (modified) LMS [57] is used to continuously estimate the interference signal components. In addition, adaptation-mode-control and norm-constrained adaptation schemes have also been incorporated to improve robustness [58], i.e., to mitigate misadjustment of the BF due to array misalignment, head movement and shadow effect, room reverberation, etc. Based on simulation with one target and one interference speech signal, the baseline 2-mic beamformer improves the Signal-to-Interference Ratio (SIR) from 1.6 dB to 15.8 dB, and the Hearing-Aid Speech Quality Index (HASQI) from 0.21 to 0.43 over the system with only one microphone (i.e., no beamformer). In informal subjective assessments, listeners were given a web app for turning the beamforming on/off. All listeners reported a perceived reduction in the interfering speech and background noise with beamforming enabled.

FIGURE 10. The two-microphone adaptive beamforming system in the RT-MHA. Adaptive filtering algorithms are utilized to generate interference estimates based on the left and right channel inputs, which are used to enhance the target signal.
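A stripped-down Python sketch of the GSC idea for two microphones is shown below: a fixed beamformer passes the (assumed broadside) target, a blocking branch cancels it, and an adaptive filter removes the interference that leaks into the fixed-beamformer output. The fixed-path delay, adaptation-mode control, and norm constraints used in the actual RT-MHA implementation [54], [58] are omitted here.

import numpy as np

def gsc_two_mic(left, right, num_taps=32, mu=0.01, eps=1e-8):
    """Simplified two-microphone GSC: fixed beamformer plus adaptive
    interference canceller driven by the blocking-branch output."""
    fixed = 0.5 * (left + right)     # fixed beamformer (target + interference)
    block = 0.5 * (left - right)     # blocking branch (mostly interference)
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)
    out = np.zeros_like(fixed)
    for n in range(len(fixed)):
        buf = np.roll(buf, 1)
        buf[0] = block[n]
        interference_est = np.dot(w, buf)
        e = fixed[n] - interference_est
        w += mu * e * buf / (np.dot(buf, buf) + eps)   # normalized LMS step
        out[n] = e
    return out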

B. CASE STUDY: SLMS
One of the purposes of OSP is to provide a platform for academic research in DSP with easy prototyping, high-quality real-time I/O, and a strong connection to the clinical research community. As an example of such research already performed with this platform, we briefly describe the sparsity promoting LMS (SLMS) algorithm [13] used in several of the adaptive filters on the platform. The SLMS is an adaptive filtering algorithm that takes advantage of the sparsity of the underlying system response—which is present in many HA DSP applications—for improved convergence behavior when adapting the filter coefficients. In testing on early versions of OSP, we have found the SLMS to be useful in the AFC and the adaptive beamforming subsystems. In the AFC, typical feedback path impulse responses are (quasi) sparse in nature, which means they contain many zero or near-zero coefficients and few large ones. It has been shown in [13] that a proper value of the SLMS parameter p leads to a performance improvement. We reported 5 dB improvement in added stable gain with a p of 1.5 for the SLMS over the conventional methods. For adaptive beamforming, the two-microphone GSC system of [54] also benefits from using the SLMS for the filter coefficient adaptation. We have found that improvement in signal-to-interference ratio (SIR) can be achieved for a p of 1.3 ∼ 1.5. For reference, p = 1 in SLMS results in the ℓ1 norm similarly used in the well-known proportionate normalized LMS (PNLMS) [59], and p = 2 results in the ℓ2 norm which yields the standard LMS.
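For intuition only, the following Python fragment shows one way an ℓp-weighted, proportionate-style LMS step can be written, in which each tap's effective step size grows with its magnitude for p < 2 (giving PNLMS-like behavior near p = 1 and uniform, standard-LMS-like behavior at p = 2). This is a rough illustration of the idea; the exact SLMS recursion and its analysis are given in [13].

import numpy as np

def lp_weighted_lms_step(w, x_buf, e, mu, p=1.5, eps=1e-6):
    """One illustrative lp-weighted (proportionate-style) LMS update."""
    g = (np.abs(w) + eps) ** (2.0 - p)   # p=2 -> uniform weights (standard LMS-like)
    g /= np.sum(g)                        # p=1 -> |w|-proportionate (PNLMS-like)
    return w + mu * e * g * x_buf / (np.dot(x_buf, g * x_buf) + eps)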

V. EMBEDDED WEB SERVER
Most commercial HAs provide smartphone apps for the user to control various aspects of their HA. Recent evidence suggests that adults with hearing loss who have access to smartphone-based tools feel more empowered, autonomous, and in control of their hearing loss [60]. While smartphone apps hold much promise for both professionals and patients, a significant amount of research is needed in terms of assessment and guidance for informed, aware, and safe adoption of such apps by the community [61]. In order to fulfill the visions of the NIH workshop [8], we undertook development of multiple classes of such apps aimed at users (people with HL controlling their HAs), researchers (clinicians engaged in hearing healthcare research and translation), and engineers (those contributing to OSP and the open source initiative).

Most modern mobile-oriented applications fall into two categories: native apps and web apps. Web apps would typically require a remote server and guaranteed availability of an Internet connection, and thus be unsuitable for a wearable system to be used in the field. However, due to the processing power and wireless connectivity of the Snapdragon 410c SoC and the well-developed web software infrastructure on Linux, we are able to host a WiFi hotspot and a web server directly on the PCD. Thus, any browser-enabled device (such as a smartphone or a tablet) can connect to the PCD without the need for any external hardware or connection. As a result, the design decision of native apps versus web apps remained. Native apps can have better hardware integration and certain aspects of user experience, while web apps have the benefits that they do not require installation, they are operating system and form-factor agnostic, and they are easier for programmers to modify and extend [62]. For these reasons, and especially due to the ability to rapidly prototype with web apps, we adopted web apps and developed the Embedded Web Server (EWS) subsystem of OSP to support them. Altogether, the EWS comprises (i) a WiFi hotspot for browser-enabled devices to connect to, (ii) a web server running on the PCD, (iii) bidirectional communication between the web server and the RT-MHA for monitoring and control, and (iv) a suite of web apps hosted on the web server. Researchers can customize these apps to enable a broader range of research in hearing healthcare.

A. EWS ARCHITECTURE / SOFTWARE STACK
The EWS on OSP is modeled on the LAMP stack (Linux OS, Apache web server, MySQL database, and PHP scripting language) [63]. The web apps themselves are coded using HTML, CSS, and JavaScript. However, we have chosen SQLite as the database and the test server provided by the PHP framework as the web server. This choice was guided by the fact that SQLite and the PHP test server do not require complex configuration steps like Apache and MySQL do. In addition, they are very lightweight from processing load and memory footprint perspectives. In the context of real-time monitoring and control of the RT-MHA from a browser-enabled device, we have a limited number of connections, and many of the features of Apache and MySQL are not relevant.

The RT-MHA serializes OSP parameters between a binary representation in memory and a JSON string format for communication with the EWS over a TCP/IP socket. All the RT-MHA parameter states are stored in the SQLite database for persistence and use by the web apps.
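A minimal Python sketch of a client for this kind of interface is shown below. The port number, message schema, and parameter names are hypothetical placeholders; the actual socket address and JSON format are defined by the RT-MHA and EWS code in the OSP release being used.

import json
import socket

# Hypothetical host/port; the real values are set by the OSP configuration.
OSP_HOST, OSP_PORT = "127.0.0.1", 8001

def send_params(params: dict) -> dict:
    """Serialize a parameter update as JSON, send it over TCP, return the reply."""
    msg = json.dumps({"method": "set", "data": params}).encode("utf-8")
    with socket.create_connection((OSP_HOST, OSP_PORT)) as sock:
        sock.sendall(msg)
        reply = sock.recv(65536)
    return json.loads(reply.decode("utf-8"))

# Example (hypothetical parameter name): raise G65 in the left-ear subbands.
# send_params({"left": {"g65": [20, 20, 25, 25, 30, 30]}})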

B. WEB APPS
In order to expedite web app development, OSP provides Laravel and Node.js frameworks. Web apps in OSP present a graphical user interface (GUI) to the user via their device's browser. Based on the user's interactions with the GUI, the apps' control logic may modify the RT-MHA parameters, play back audio to the user through the BTE-RICs, record audio from the microphones, store information in the SQLite database or in logs, or take other actions. In this section, we describe the current suite of web apps, which showcase the functionality of the EWS and OSP as a whole and which serve as templates to be modified and extended for specific investigations.

1) RESEARCHER APP
The ''Researcher App'' is used to manipulate any of the exposed RT-MHA parameters. The main tab of this app includes all the WDRC parameters in each subband. Researchers can save different configurations in named files and load them from the GUI. A Transmit button sets the RT-MHA to the parameters displayed in the GUI. The researcher can individually control the right ear channel or the left ear channel, or both at the same time. The Noise Management tab has the parameters associated with the noise management algorithms described in Secs. IV-A.4 and IV-A.5. It enables researchers to experiment with various parameters and provide configurations such as aggressive, mild, and no noise suppression in studies with human subjects. Similarly, the Feedback Management tab allows the researcher to optimize AFC parameters for specific investigations. This app is suitable for ''audiologist fit'' research: the researcher enters the user's initial prescription from a fitting procedure such as NAL-NL2 or DSL [64] and then optimizes the HA for user comfort. The researcher app, like the other apps, requires a researcher ID and user ID to access, allowing user profiles to be easily loaded for clinical studies in which one system is used sequentially by multiple users.

2) SELF-FITTING APPS
There has been considerable interest in self-fitting research, wherein the user is able to choose the HA parameters with the help of apps. The recent passage of the Over-the-Counter (OTC) Hearing Aid Act of 2017 was aimed at easing the financial burden of owning HAs, at least for some users with mild to moderate hearing loss. The use of OTC HAs will require users to be able to independently control the HAs in multiple listening environments without professional assistance.

We have implemented baseline web apps for two self-fitting paradigms. First, for the lab-based OSP system [21], we initially implemented a native Android version of the Goldilocks explore-and-select self-fitting protocol proposed in [65], [66]. For the wearable system [10] aimed at field studies, we transitioned to web apps for ease of rapid prototyping and ported Goldilocks as a web app.

Second, we created an AB app in which the user can switch between hearing an A or B set of RT-MHA parameters for the same stimulus, and then select the one that they prefer. For the baseline implementation, the app performs a binary search over the overall gain parameter, allowing the user to narrow in on the gain they most prefer. This is intended as a proof-of-concept for researchers to incorporate other HA parameters in their self-fitting research.

This AB app, like several others described below, relies on the audio file I/O module included in OSP. This module, under control of the EWS, can play audio files (typically speech content) stored on the PCD to the user, with or without the RT-MHA processing. This capability allows researchers to provide stimuli to the user in a repeatable and reproducible manner. The file I/O module can also record the raw or processed microphone audio to audio files on the PCD, as described below.

3) MONITORING USER AND ENVIRONMENT STATE
We have created an Ecological Momentary Assessment (EMA) web app, which is designed to help researchers understand more about the user's actions in the context of an experiment or a self-fitting adjustment. It does so by collecting information about the environmental state that elicited the user's behavior along with the user's behavior itself. The EMA web app has two components. First, it displays a brief survey through the GUI which asks the user qualitative questions about their experience and environment. Researchers can edit the survey questions by changing the contents of the JSON file associated with the EMA web app. Second, the app records microphone audio in order to characterize the user's auditory environment. This works with a circular buffer that temporarily keeps the last few seconds of microphone audio. When the EMA is started, the previous buffer is saved, and the audio continues to be saved while the user completes the survey and for a certain time after leaving the app. In the future, the information gathered from the EMA web app could be used to create machine learning models that dynamically update their parameters depending on environmental factors.
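A minimal Python sketch of this pre-roll mechanism, using a ring buffer of 1 ms frames, is shown below. The retention length and frame size are illustrative rather than the exact values used by the EMA app.

import collections
import numpy as np

FRAME_SAMPLES = 48          # 1 ms of audio at 48 kHz
PRE_ROLL_SECONDS = 5        # hypothetical retention window

class AudioPreRoll:
    """Keeps the last few seconds of microphone audio in a ring buffer so
    that, when an EMA survey is triggered, the audio preceding the trigger
    can be saved along with what follows."""
    def __init__(self, fs=48_000, seconds=PRE_ROLL_SECONDS):
        self.frames = collections.deque(maxlen=int(fs * seconds / FRAME_SAMPLES))

    def push(self, frame):
        """Append the latest 1 ms frame; the oldest frame is discarded."""
        self.frames.append(np.asarray(frame, dtype=np.float32))

    def snapshot(self):
        """Return the buffered pre-trigger audio as a single array."""
        return np.concatenate(list(self.frames)) if self.frames else np.empty(0)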

4) OUTCOMES ASSESSMENT
This class of apps is aimed at assessing the benefits to the user of a proposed hearing loss intervention (such as a particular fitting or an entire self-fitting paradigm). In these apps, researchers define a series of questions in which the user hears pre-recorded sound stimuli (typically speech) and indicates their preference among them or attempts to distinguish between them. The stimuli are processed through specific HA parameter sets during playback, so they can be used to assess the effectiveness of these fitting parameters for the user. The environment audio recording described above may also optionally be enabled in these apps.

In the 4-Alternative Forced Choice (4AFC) app (Fig. 11), each question has a playable prompt stimulus and four written words, one of which matches the stimulus. The words are themselves also playable, and any errors in the user's choices can inform the researcher about what improvements may be needed in the user's HA fitting. The app can easily be modified to create N-alternative forced choice tests.

FIGURE 11. Screenshots of the main EWS page and the 4AFC task, taken from a smartphone connected to the WiFi hotspot of an OSP PCD. After powering on the PCD and connecting the smartphone to the new ''ospboard-*'' WiFi hotspot, the user simply enters ''ospboard.local'' or ''192.168.8.1:8000'' into a web browser and receives the page on the left. Clicking on the ''4AFC'' button and logging in returns the page on the right, which is a fully-functional web app that interfaces with the RT-MHA state in real time.

In the outcomes assessment AB app, the user hears two different stimuli A and B, and rates their preference for B relative to A on a Likert scale. At the researcher's option, A and B may be different audio files played through the same set of RT-MHA parameters, or the same audio played through different parameter sets. In the latter case, the audio may be from a file or it may be the live real-world sound from the user's environment.

Finally, in the ABX app, the user is presented with a target stimulus X, and then two stimuli A and B where one is identical to X and the other is typically very similar. The user selects the one they believe is identical; errors imply that the user could not hear the difference between A and B. This approach has strong discriminative power; its uses include optimizing signal processing (for example, whether the user can detect distortions introduced by approximate computations to save battery power), determining just noticeable differences between parameter settings, etc.

C. WEB APP CUSTOMIZATION
The current suite of web apps is meant to function as baseline, reference implementations for the development of new web apps. Some web apps can be reconfigured for new users and new trials by the researchers without modifying the software. For example, in the case of the outcomes assessment web apps (Sec. V-B.4), the researchers can specify the contents of the questions which will be shown to the user. In the case of 4AFC, for one question, the researcher needs to specify the audio file for the prompt, as well as the text and audio files of the four choices. The researcher can encode these choices by editing the text-based JSON file that accompanies the app. The audio files themselves are stored in a specific hierarchical file structure, so that a researcher can easily track which files are associated with which question, and have a consistent scheme to document the files referenced in the JSON file. Similarly, the AB and ABX web apps also have JSON files that specify which sound files should be played for which question; these can also be edited with a text editor.
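For illustration, a hypothetical 4AFC question entry might be encoded as in the Python snippet below, which writes the JSON file. The field names and file layout are invented for this example; the schema shipped with the app in the OSP release should be consulted.

import json

# Hypothetical field names and paths for one 4AFC question.
question = {
    "prompt_audio": "4afc/q01/prompt.wav",
    "choices": [
        {"text": "beat", "audio": "4afc/q01/beat.wav"},
        {"text": "bead", "audio": "4afc/q01/bead.wav"},
        {"text": "bees", "audio": "4afc/q01/bees.wav"},
        {"text": "beef", "audio": "4afc/q01/beef.wav"},
    ],
    "correct": 0,
}

with open("4afc_questions.json", "w") as f:
    json.dump({"questions": [question]}, f, indent=2)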

It is also possible to combine aspects of different web apps to create new apps for novel investigations. This requires familiarity with HTML, JavaScript, and PHP. When new HA parameters are exposed by the RT-MHA signal processing, they can also be easily integrated into the web apps with appropriate changes to the HTML and JavaScript (for the modified GUI) and the PHP (for the HA parameter control logic).

VI. RESULTS
Initial results about the performance of the wearable OSP system were reported in [10]. This section summarizes those results and includes updated results based on the current internal development versions of the OSP hardware and software (a version of which will become Release 2019b). In addition, the results relating to the FM-ExG are reported here for the first time.

A. HA PERFORMANCE
1) LATENCY
Latency plays an important role in users' comfort with their devices [3], [4], and most commercial HAs have under 10 ms latency [67]. In the OSP wearable system, with the RT-MHA algorithms disabled and the software set to simply pass through an amplified copy of the front microphone input signal to each receiver, the microphone-to-loudspeaker latency is about 2.4 ms. This delay is caused by input and output buffers of 1 ms each allocated by the audio subsystem (ALSA / PortAudio), plus additional delays due to resampling filters within the codec. With the RT-MHA enabled without beamforming, the latency measures about 4.6 ms, with this 2.2 ms difference being due to the FIR filters within the audio processing. With beamforming enabled and set to 5 ms delay, the latency is measured to be 9.6 ms as expected. Thus, our full-featured baseline implementation meets the 10 ms target maximum latency for a HA system. While the latency due to hardware and firmware (2.4 ms) is not user-adjustable, the latencies due to the steps of HA processing are determined by the parameters of that processing (e.g., the length of the FIR filters), which will vary as researchers tweak the baseline algorithms and implement their own. The latency ''budget'' of 7.6 ms for all HA processing allows for a wide range of experimentation and research.

2) ANSI 3.22 TEST RESULTS
ANSI 3.22 [68] is a standard test protocol for HAs, the results of which are available for commercial HAs. We measured the OSP wearable system, as well as the previous OSP laptop-based system, with the Audioscan Verifit 2 test unit [69], for comparison with four anonymous commercial HAs.

The OSP wearable system meets or exceeds the performance of the commercial HAs on most metrics. With the high-power (bandwidth-limited) receiver, it provides higher OSPL90 (loudness) with gain, bandwidth, noise, and distortion figures which are comparable to the best of the commercial HAs. With the high-bandwidth receiver, it has similar performance with slightly reduced gain, but with higher distortion. Reducing the gain from 35 to 25 dB (not shown in the table) did reduce the distortion to 1% or less in all bands. We believe this distortion is due to impedance differences between the two receivers: the codec's output voltage swing is limited by its 3.3 V supply rail, which will lead to distortion at a lower power with the higher-impedance (high-bandwidth) [70] receiver than with the lower-impedance (high-power) [71] receiver. Future BTE-RIC designs could add a boost regulator and additional power amplifier to increase the gain with the high-bandwidth receiver; nevertheless, the OSP wearable system meets its performance goals with the high-power receiver.

B. EMBEDDED SOFTWARE PERFORMANCE
1) CPU USAGE
Each audio channel (left and right ear) is processed by a separate thread so most of the computation can be done simultaneously on two CPU cores. Three of the four cores are assigned to the RT-MHA process at the OS level, with the remaining core left for all OS functions and other non-realtime processes. The RT-MHA process is also given maximum CPU and I/O priority. The RT-MHA processes audio in 1 ms frames (48 samples), which means the system has less than 1 ms of real time to complete the processing of each frame. Thus, we report the real time required for each step of the RT-MHA processing.

As shown in Table 2, on average the processing completes with some time to spare. In addition, while most of the processing is being done by two cores (one per ear), there are three cores available for the RT-MHA as long as the FM-ExG is not in use. In this case, a substantial amount of additional processing could be added on a third thread provided that it could be done in parallel. Between 2018c and the current version, the subband and AFC filter lengths were reduced somewhat to free up CPU and latency budget for the addition of beamforming. In addition, the ''maximum time'' values, which effectively measure the algorithms' stability in terms of CPU usage, have decreased dramatically. This is partly due to stability and performance improvements in the OS, and partly due to improved initialization in the RT-MHA. However, it is also partly an artifact of measurement changes: in 2018c, we measured average and maximum times beginning as soon as the RT-MHA started, whereas now we begin timing measurements after the RT-MHA has initialized and run for about a second. Thus, previously the ''maximum times'' included initialization, whereas now they only measure timing variation in steady state.

TABLE 1. ANSI 3.22 test results for OSP system configurations as measured by Audioscan Verifit 2, as compared to results from four commercial HAs.

TABLE 2. RT-MHA real-time processing performance statistics for Release 2018c and the current test version. Wall-clock time taken to perform each processing step on 1 ms audio buffers.

2) BATTERY LIFE
We measured the current draw of the wearable system in several conditions, and computed the battery life from these measurements assuming a 2000 mAh battery:

TABLE 3. Current draw (at nominal 3.7 VDC) and battery life (assuming a 2000 mAh Li-ion battery) for common system use cases.

As seen in Table 3, both system and RT-MHA efficiency have improved since last reported. Note that because a battery's rated capacity cannot be fully exhausted in practice, and due to other factors that may make the usable energy of a battery less than its rated capacity, actual usage times may be lower than reported here. Still, these results indicate that the system should provide at least 4 hours of full-featured operation per charge.
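For reference, the battery life figures in Table 3 follow from the simple ratio below (Python); the 450 mA draw used in the comment is a made-up example, not a measured value.

# Battery life estimate: hours = capacity (mAh) / average current draw (mA).
CAPACITY_MAH = 2000

def battery_life_hours(current_ma: float) -> float:
    return CAPACITY_MAH / current_ma

# e.g., a hypothetical 450 mA average draw would give about 4.4 hours.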

C. FM-EXG PERFORMANCE
1) SIMULTANEOUS ACQUISITION
Several metrics are important in characterizing the performance of simultaneous audio and FM-ExG capture on the OSP hardware/software platform. First is the stability of the simultaneous capture—how frequently data is lost in either stream while the system is streaming them both. We tested this by running simultaneous capture from both streams into a simple utility which validated whether and when samples were lost. Over a 90-minute test, no samples were lost on FM-ExG (about 5.5 billion consecutive samples received correctly), and only one incident occurred where a few ms of audio samples were lost. Second is the long-term drift or relative inaccuracy in the sample rate of both streams. This is guaranteed to be zero by design: both the 1.024 MHz FM sample clock and all the audio clocks are derived from the same 12.288 MHz MEMS oscillator which drives the FPGA, so any drift or inaccuracy in this oscillator will be reflected uniformly in the two data streams.

Finally, a key metric for simultaneous capture is how closely the two streams can be synchronized in time. We use the term skew to refer to the time difference between the audio and EEG sampled data streams. Since any known skew can be corrected for by simply re-aligning the two data streams, the metric of interest is the variability of the skew over different runs. To help synchronize the two streams, we created a ''sync'' feature in the OSP FM-ExG API, which signals the FPGA to insert about 1 ms worth of zeros into both the FM-ExG data stream and all microphone audio data streams. Then, a utility detects this period of exactly zero data (which is virtually impossible to occur naturally due to system noise) and marks the end of this period as corresponding to the same time in both streams. To determine the remaining skew and the skew variability after this offset was corrected for, we used a signal generator to input a pulse wave into both the FM-ExG and microphone inputs, and in software measured the timing of the rising edges relative to the sync zeros period. Fig. 12 shows the resulting skew over 32 trials.
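A simplified Python sketch of how such a utility can locate the sync-zeros marker in each stream and compute the residual skew is given below; the minimum run length used here is illustrative.

import numpy as np

def end_of_zero_run(x, min_len):
    """Index of the first sample after a run of at least min_len exact zeros
    (the last such run if several exist), or None if no run is found."""
    zero = (np.asarray(x) == 0)
    end, run = None, 0
    for i, z in enumerate(zero):
        run = run + 1 if z else 0
        if run >= min_len:
            end = i + 1      # keeps extending while the run continues
    return end

def sync_skew_seconds(exg, audio, fs_exg=1_024_000, fs_audio=48_000):
    """Residual skew (seconds) between the sync-zero markers in the streams."""
    t_exg = end_of_zero_run(exg, int(0.5e-3 * fs_exg)) / fs_exg
    t_audio = end_of_zero_run(audio, int(0.5e-3 * fs_audio)) / fs_audio
    return t_exg - t_audio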


FIGURE 12. Comparing 32 trials of the measured skew between OSP's FM-ExG and audio streams with the audio sample period. Since the measurements only vary over about two audio sample periods, OSP can perform simultaneous FM-ExG and audio streaming synchronized to within about 2 audio sample periods, or about 40 µs.

The FM-ExG signal path should theoretically have a delay of about 4-5 samples due to the pipelined ADC, which accounts for about 4-5 µs; the audio signal path should have a delay of about 4-8 samples due to the resampling filters in the codec, which accounts for 80-160 µs. The measured average skew of 135 µs meets our expectations. More importantly, the standard deviation of the skew is about half the audio sample period; of course, it is not possible to identify the timing of a step signal from a sampled representation with better precision than the sample period. Since this uncertainty holds for both the sync zeros period and the pulse wave edges, and the subsample positioning of each of these is presumably uncorrelated, we expect a spread of about √2 times the audio sample period, which closely matches the data. Hence we claim that OSP allows for simultaneous FM-ExG and audio streaming synchronized to within about 2 audio sample periods, or about 40 µs.

2) ANALOG PERFORMANCE
To evaluate the analog performance of the FM-ExG signal acquisition system, a 100 Hz test sine wave modulating a 250 kHz center frequency FM carrier with a bandwidth expansion of 20 (to fit the FM bandplan outlined in Sec. III-B) was generated by MATLAB's fmmod() function and driven into the FM-ExG ADC introduced in Sec. III-B by a National Instruments USB-6361 DAQ multifunction analog/digital I/O device. After being sampled and recorded by the PCD, the data was copied to a computer where it was demodulated using MATLAB's fmdemod() to recover the original test signal. Fig. 13 depicts the frequency domain representation of the result of this process, which demonstrates 94 dB of signal-to-noise ratio (SNR). Compare this to the theoretical 100 dB described in Sec. III-B. While this is more than enough SNR for the target application, we believe this measurement may have been partially limited by the test equipment. The DAQ test device mentioned above has a timing resolution of 10 ns, which limits the precision with which the FM carrier wave's instantaneous frequency could be generated, and thus the resolution with which the message signal could be encoded on the carrier wave.
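The actual test used MATLAB's fmmod()/fmdemod(); the Python sketch below shows the equivalent offline operations, with the frequency deviation approximated as the bandwidth expansion factor times the message frequency. The parameter choices are illustrative of the procedure rather than an exact reproduction of it.

import numpy as np
from scipy.signal import hilbert

FS = 1_024_000          # FM-ExG sample rate
FC = 250_000            # carrier center frequency
BW_EXP = 20             # bandwidth expansion factor (see Sec. III-B)
F_MSG = 100             # test tone frequency

def fm_modulate(msg, fs=FS, fc=FC, fdev=BW_EXP * F_MSG):
    """Frequency-modulate a unit-amplitude message onto the carrier."""
    phase = 2 * np.pi * np.cumsum(fc + fdev * msg) / fs
    return np.cos(phase)

def fm_demodulate(x, fs=FS, fc=FC, fdev=BW_EXP * F_MSG):
    """Recover the message from the deviation of the instantaneous frequency."""
    inst_phase = np.unwrap(np.angle(hilbert(x)))
    inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)
    return (inst_freq - fc) / fdev

t = np.arange(int(0.1 * FS)) / FS
tone = np.sin(2 * np.pi * F_MSG * t)
recovered = fm_demodulate(fm_modulate(tone))   # approximates the original tone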

FIGURE 13. Example spectrum of demodulated FM-ExG output, for a 100 Hz sinusoidal signal as data. The FM waveform (250 kHz carrier, 20× bandwidth expansion) was generated in software and played into the OSP PCD's analog FM-ExG input via an NI DAQ test device. The sampled signal was recorded on the PCD and demodulated in MATLAB.

D. RESULTS SUMMARY
OSP meets or exceeds the performance of four representative commercial HAs on the ANSI 3.22 test protocol with an appropriate receiver. OSP also matches the latency of commercial systems with its baseline algorithms (< 10 ms), although its latency will vary as researchers reconfigure it with optimized or additional algorithms. Its capabilities for wireless control, monitoring, and user interaction via the EWS enable rapid prototyping for clinical investigations that may not be possible with most commercial systems. The CPU occupancy reported in Table 2 and current draw reported in Table 3 are only partially optimized, and may be improved further by the open-source community. The addition of 6-DOF IMUs at ear level and the capability of acquiring multi-channel EEG synchronized with auditory stimuli to within about 40 µs are expected to facilitate psychophysical investigations beyond what is currently possible. In conclusion, OSP meets the requirements of the community as a HA research platform; it is not a form-factor-accurate HA in the sense of commercial HAs.

VII. CONCLUSION
Open Speech Platform (OSP) is a comprehensive hardware and software platform for research in hearing healthcare and related fields. It is designed to facilitate lab and field studies in speech processing algorithms, human sound perception, HA fitting procedures, and much more, while also enabling new kinds of research which were never before possible.

The OSP PCD hardware contains the quad-core Snapdragon 410c smartphone chipset running a custom-optimized Debian Linux OS. The PCD software comprises basic and baseline advanced binaural HA audio processing algorithms, which run in real time with CPU resources to spare. The total microphone-to-loudspeaker latency due to hardware and OS is about 2.4 ms. Currently, basic HA processing adds 2.2 ms of latency and beamforming adds an additional 5 ms, for a total latency of 9.6 ms. The PCD is packaged in a small, light plastic case, roughly 73 × 55 × 20 mm with a mass of roughly 83 grams. It contains enough battery power for at least 4 hours of operation with all features enabled.

OSP includes custom ear-level transducers in a BTE-RIC form factor. They support up to four microphones per ear, including special-purpose in-ear and VPU microphones, and sample all inputs and outputs at 48 kHz / 24 bit with hardware support for 96 kHz. They also contain a six-axis IMU for measuring look direction, assessing balance, and other physical activity research. The BTE-RICs communicate with the PCD via a custom packetized protocol over LVDS facilitated by FPGAs at either end, which transmits high-speed audio, control, and clock information over a single differential pair in a thin four-wire cable.

The OSP PCD is also the gateway for FM-ExG, a low-power wearable biopotential signal acquisition system for collecting EEG, ECG/EKG, and EMG signals. The PCD includes a high-speed ADC and interface logic in the FPGA to enable acquisition of 12 channels of biopotential signals with a measured SNR of 94 dB. FM-ExG can run while the HA processing is occurring, for simultaneous acquisition of audio and EEG synchronized to within 40 µs and with no long-term drift.

Finally, the PCD hosts a WiFi hotspot and web server which users and researchers can connect to with any browser-enabled device. The OSP software framework serves web apps from the PCD which allow users to interact with the parameters of the HA processing in real time. The web apps provided with the current release of OSP include apps for direct monitoring and control of all HA parameters, self-fitting, collecting data about the user's environment, and assessing HA performance. The web apps use a popular software stack and are easy to modify and extend, so that researchers can adapt them or design new web apps to conduct novel studies and field trials.

OSP has been architected to fulfill the vision set out by the NIH workshop [8] for an open, extensible research tool for hearing healthcare and related fields. OSP meets all of the basic requirements presented there—portable hardware, real-time signal processing, advanced processing power, wireless controllability, a reference HA implementation, and open-source hardware and software releases. It further meets many of the advanced or optional suggestions: wearability, use of an FPGA in the signal chain, binaural processing, and incorporation of sensing paradigms not traditionally associated with hearing aids, such as FM-ExG and the IMUs. OSP is a powerful set of tools which promotes the open initiative for collaborative work on research hardware and software, towards new discoveries in hearing-related healthcare research.

ACKNOWLEDGMENTS
Support from Sonion for providing emerging ear-level transducers (an in-ear microphone integrated in the speaker module and the ''VPU'' bone conduction microphone), traditional BTE-RIC transducers, and other electromechanical components is greatly appreciated.

REFERENCES
[1] R. J. Bennett, A. Laplante-Lévesque, C. J. Meyer, and R. H. Eikelboom, ''Exploring hearing aid problems: Perspectives of hearing aid owners and clinicians,'' Ear Hearing, vol. 39, no. 1, pp. 172–187, 2018.
[2] Guidelines for Manual Pure-Tone Threshold Audiometry, Amer. Speech-Lang.-Hearing Assoc., Rockville, MD, USA, 2005.
[3] J. Agnew and J. M. Thornton, ''Just noticeable and objectionable group delays in digital hearing aids,'' J. Amer. Acad. Audiol., vol. 11, no. 6, pp. 330–336, 2000.
[4] M. A. Stone and B. C. Moore, ''Tolerable hearing aid delays. I. Estimation of limits imposed by the auditory path alone using simulated hearing losses,'' Ear Hearing, vol. 20, no. 3, pp. 182–192, 1999.
[5] A. N. Simpson, L. J. Matthews, C. Cassarly, and J. R. Dubno, ''Time from hearing aid candidacy to hearing aid adoption: A longitudinal cohort study,'' Ear Hearing, vol. 40, no. 3, pp. 468–476, 2019.
[6] S. Kochkin, ''MarkeTrak VIII: Utilization of PSAPs and direct-mail hearing aids by people with hearing impairment,'' Hearing Rev., vol. 17, no. 6, pp. 12–16, 2010.
[7] L. Brody, Y.-H. Wu, and E. Stangl, ''A comparison of personal sound amplification products and hearing aids in ecologically relevant test environments,'' Amer. J. Audiol., vol. 27, no. 4, pp. 581–593, 2018.
[8] R. L. Miller and A. Donahue, ''Open speech signal processing platform workshop,'' Nat. Inst. Health, Bethesda, MD, USA, Tech. Rep., Oct. 2014. [Online]. Available: https://www.nidcd.nih.gov/research/workshops/open-speech-signal-processing-platform/2014
[9] Lab at UC San Diego. (2019). Open Speech Platform. [Online]. Available: http://openspeechplatform.ucsd.edu/
[10] L. Pisha, S. Hamilton, D. Sengupta, C.-H. Lee, and K. C. Vastare, ''A wearable platform for research in augmented hearing,'' in Proc. 52nd Asilomar Conf. Signals, Syst., Comput., no. 52, Oct. 2018, pp. 223–227.
[11] J. Warchall, S. Kaleru, N. Jayapalan, B. Nayak, H. Garudadri, and P. P. Mercier, ''A 678-µW frequency-modulation-based ADC with 104-dB dynamic range in 44-kHz bandwidth,'' IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 10, pp. 1370–1374, Oct. 2018.
[12] J. Warchall, P. Theilmann, Y. Ouyang, H. Garudadri, and P. P. Mercier, ''A rugged wearable modular ExG platform employing a distributed scalable multi-channel FM-ADC achieving 101 dB input dynamic range and motion-artifact resilience,'' in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 362–363.
[13] C.-H. Lee, B. D. Rao, and H. Garudadri, ''Sparsity promoting LMS for adaptive feedback cancellation,'' in Proc. 25th Eur. Signal Process. Conf. (EUSIPCO), Aug./Sep. 2017, pp. 226–230.
[14] C.-H. Lee, J. M. Kates, B. D. Rao, and H. Garudadri, ''Speech quality and stable gain trade-offs in adaptive feedback cancellation for hearing aids,'' J. Acoust. Soc. Amer., vol. 142, no. 4, pp. EL388–EL394, 2017.
[15] R. L. Miller, ''Open design tools for speech signal processing (R01),'' Nat. Inst. Health Nat. Inst. Deafness Other Commun. Disorders, Bethesda, MD, USA, Tech. Rep., 2015. [Online]. Available: https://grants.nih.gov/grants/guide/rfa-files/RFA-DC-16-001.html
[16] R. L. Miller, ''Open design tools for speech signal processing (R43/R44),'' Nat. Inst. Health Nat. Inst. Deafness Other Commun. Disorders, Bethesda, MD, USA, Tech. Rep., 2015. [Online]. Available: https://grants.nih.gov/grants/guide/rfa-files/RFA-DC-16-002.html
[17] O. Clavier, C. Audette, D. Rasetshwane, and S. Neely. (2019). Tympan. [Online]. Available: https://tympan.org/
[18] P. Stoffregen. (2019). Teensy USB Development Board. [Online]. Available: https://www.pjrc.com/teensy/
[19] C. Obbard, D. James, T. Herzke, and H. Kayser. (2019). Open Community Platform for Hearing Aid Algorithm Research. [Online]. Available: http://www.openmha.org/
[20] I. Panahi, N. Kehtarnavaz, and L. Thibodeau. (2019). Smartphone-Based Open Research Platform for Hearing Improvement Studies. [Online]. Available: https://www.utdallas.edu/ssprl/hearing-aid-project/
[21] H. Garudadri, A. Boothroyd, C.-H. Lee, S. Gadiyaram, J. Bell, D. Sengupta, S. Hamilton, K. C. Vastare, R. Gupta, and B. D. Rao, ''A real-time, open-source speech-processing platform for research in hearing loss compensation,'' in Proc. 51st Asilomar Conf. Signals, Syst., Comput., Oct./Nov. 2018, pp. 1900–1904.


[22] Variscite. (2019). DART-SD410: Qualcomm Snapdragon 410. [Online]. Available: https://www.variscite.com/product/system-on-module-som/cortex-a53-krait/dart-sd410-qualcomm-snapdragon-410/
[23] L. Pisha, S. Hamilton, D. Sengupta, C.-H. Lee, K. C. Vastare, S. Luna, T. Zubatiy, C. Yalcin, A. Grant, M. Stambaugh, A. Boothroyd, G. Chockalingam, R. Gupta, B. Rao, and H. Garudadri, ''A wearable platform for hearing aids research,'' in Proc. Int. Hearing Aid Res. Conf. (IHCON), 2018. [Online]. Available: https://ihcon.org/files/ihcon/files/final_ihcon_2018_program.pdf
[24] Analog Devices. (2014). ADAU1372 Data Sheet. [Online]. Available: https://www.analog.com/media/en/technical-documentation/data-sheet/ADAU1372.pdf
[25] R. Bentler, H. G. Mueller, and T. A. Ricketts, Modern Hearing Aids: Verification, Outcome Measures, and Follow-Up. San Diego, CA, USA: Plural Publishing, 2016.
[26] F. Kuk, D. Keenan, and C.-C. Lau, ''Vent configurations on subjective and objective occlusion effect,'' J. Amer. Acad. Audiol., vol. 16, no. 9, pp. 747–762, 2005.
[27] D. T. Kemp, ''Stimulated acoustic emissions from within the human auditory system,'' J. Acoust. Soc. Amer., vol. 64, no. 5, pp. 1386–1391, 1978.
[28] P. Sergi, G. Pastorino, P. Ravazzani, G. Tognola, and F. Grandori, ''A hospital based universal neonatal hearing screening programme using click-evoked otoacoustic emissions,'' Scand. Audiol., vol. 30, no. 1, pp. 18–20, 2001.
[29] Sonion. (2019). Sonion. [Online]. Available: https://www.sonion.com/
[30] VPU14AA01 Tentative Data Sheet, Sonion, Private Commun., Plymouth, MN, USA, 2018.
[31] C.-H. Lee, B. D. Rao, and H. Garudadri, ''Bone-conduction sensor assisted noise estimation for improved speech enhancement,'' in Proc. Interspeech, 2018, pp. 1180–1184, doi: 10.21437/Interspeech.2018-1046.
[32] V. Kuleshov, S. Z. Enam, and S. Ermon, ''Audio super resolution using neural networks,'' 2017, arXiv:1708.00853. [Online]. Available: https://arxiv.org/abs/1708.00853
[33] Lattice Semiconductor. (2019). MachXO3: Futureproof Your Control PLD and Bridging Designs. [Online]. Available: https://www.latticesemi.com/Products/FPGAandCPLD/MachXO3
[34] J. Goldie, ''The many flavors of LVDS,'' Texas Instrum., Dallas, TX, USA, Tech. Rep. SNLA184, 2011.
[35] M. Peffers, ''Introduction to M-LVDS,'' Texas Instrum., Dallas, TX, USA, Tech. Rep. TIA/EIA-899, Feb. 2002.
[36] S. B. Huq and J. Goldie, ''An overview of LVDS technology,'' Nat. Semicond., Santa Clara, CA, USA, Appl. Note 971, Jul. 1998.
[37] S. E. Lord, M. Weatherall, and L. Rochester, ''Community ambulation in older adults: Which internal characteristics are important?'' Arch. Phys. Med. Rehabil., vol. 91, no. 3, pp. 378–383, 2010.
[38] T. R. Prohaska, L. A. Anderson, S. P. Hooker, S. L. Hughes, and B. Belza, ''Mobility and aging: Transference to transportation,'' J. Aging Res., vol. 2011, May 2011, Art. no. 392751.
[39] N. M. Peel, S. S. Kuys, and K. Klein, ''Gait speed as a measure in geriatric assessment in clinical settings: A systematic review,'' J. Gerontol. A, Biol. Sci. Med. Sci., vol. 68, no. 1, pp. 39–46, 2012.
[40] J. Meinzen-Derr, L. H. Y. Lim, D. I. Choo, S. Buyniski, and S. Wiley, ''Pediatric hearing impairment caregiver experience: Impact of duration of hearing loss on parental stress,'' Int. J. Pediatric Otorhinolaryngol., vol. 72, no. 11, pp. 1693–1703, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165587608003819
[41] C. B. Christensen, R. K. Hietkamp, J. M. Harte, T. Lunner, and P. Kidmose, ''Toward EEG-assisted hearing aids: Objective threshold estimation based on ear-EEG in subjects with sensorineural hearing loss,'' Trends Hearing, vol. 22, pp. 1–13, Dec. 2018.
[42] V. Mihajlovic, B. Grundlehner, R. Vullers, and J. Penders, ''Wearable, wireless EEG solutions in daily life applications: What are we missing?'' IEEE J. Biomed. Health Inform., vol. 19, no. 1, pp. 6–21, Jan. 2015.
[43] K. Kondo, K. M. Noonan, M. Freeman, C. Ayers, B. J. Morasco, and D. Kansagara, ''Efficacy of biofeedback for medical conditions: An evidence map,'' J. Gen. Internal Med., pp. 1–11, Aug. 2019, doi: 10.1007/s11606-019-05215-z.
[44] M. B. Sterman and T. Egner, ''Foundation and practice of neurofeedback for the treatment of epilepsy,'' Appl. Psychophysiol. Biofeedback, vol. 31, no. 1, p. 21, Mar. 2006, doi: 10.1007/s10484-006-9002-x.
[45] S. Enriquez-Geppert, D. Smit, M. G. Pimenta, and M. Arns, ''Neurofeedback as a treatment intervention in ADHD: Current evidence and practice,'' Current Psychiatry Rep., vol. 21, no. 6, p. 46, 2019.
[46] B. H. Kim, J. Chun, and S. Jo, ''Dynamic motion artifact removal using inertial sensors for mobile BCI,'' in Proc. 7th Int. IEEE/EMBS Conf. Neural Eng. (NER), Apr. 2015, pp. 37–40.
[47] S. Haykin, Communication Systems. Hoboken, NJ, USA: Wiley, 2008.
[48] Analog Devices. (2012). AD9235 12-Bit, 20/40/65 MSPS 3 V A/D Converter. [Online]. Available: https://www.analog.com/media/en/technical-documentation/data-sheets/AD9235.pdf
[49] J. M. Kates, ''Master hearing aid implementation in MATLAB,'' Private Commun., Tech. Rep., 2016.
[50] J. M. Kates, ''Principles of digital dynamic-range compression,'' Trends Amplification, vol. 9, no. 2, pp. 45–76, 2005.
[51] Digital Hearing Aids. San Diego, CA, USA: Plural Publishing, 2008.
[52] T. van Waterschoot and M. Moonen, ''Fifty years of acoustic feedback control: State of the art and future challenges,'' Proc. IEEE, vol. 99, no. 2, pp. 288–327, Feb. 2011.
[53] J. M. Kates, ''Modeling the effects of single-microphone noise-suppression,'' Speech Commun., vol. 90, pp. 15–25, Jun. 2017.
[54] J. E. Greenberg and P. M. Zurek, ''Evaluation of an adaptive beamforming method for hearing aids,'' J. Acoust. Soc. Amer., vol. 91, no. 3, pp. 1662–1676, 1992.
[55] L. J. Griffiths and C. W. Jim, ''An alternative approach to linearly constrained adaptive beamforming,'' IEEE Trans. Antennas Propag., vol. AP-30, no. 1, pp. 27–34, Jan. 1982.
[56] O. L. Frost, III, ''An algorithm for linearly constrained adaptive array processing,'' Proc. IEEE, vol. 60, no. 8, pp. 926–935, Aug. 1972.
[57] J. E. Greenberg, ''Modified LMS algorithms for speech processing with an adaptive noise canceller,'' IEEE Trans. Speech Audio Process., vol. 6, no. 4, pp. 338–351, Jul. 1998.
[58] O. Hoshuyama and A. Sugiyama, ''Robust adaptive beamforming,'' in Microphone Arrays: Signal Processing Techniques and Applications. Berlin, Germany: Springer-Verlag, 2001, pp. 87–109.
[59] D. L. Duttweiler, ''Proportionate normalized least-mean-squares adaptation in echo cancelers,'' IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 508–518, Sep. 2000.
[60] D. W. Maidment, Y. H. Ali, and M. A. Ferguson, ''Applying the COM-B model to assess the usability of smartphone-connected listening devices in adults with hearing loss,'' J. Amer. Acad. Audiol., vol. 30, no. 5, pp. 417–430, 2019.
[61] A. Paglialonga, G. Tognola, and F. Pinciroli, ''Apps for hearing science and care,'' Amer. J. Audiol., vol. 24, no. 3, pp. 293–298, 2015.
[62] A. Charland and B. Leroux, ''Mobile application development: Web vs. native,'' Commun. ACM, vol. 54, no. 5, p. 49, 2011.
[63] J. Lee and B. Ware, Open Source Web Development with LAMP: Using Linux, Apache, MySQL, Perl, and PHP. Boston, MA, USA: Addison-Wesley, 2002.
[64] G. Keidser, H. Dillon, M. Flax, T. Ching, and S. Brewer, ''The NAL-NL2 prescription procedure,'' Audiol. Res., vol. 1, no. 1, p. e24, 2011.
[65] A. Boothroyd and C. Mackersie, ''A 'goldilocks' approach to hearing-aid self-fitting: User interactions,'' Amer. J. Audiol., vol. 26, no. 3S, pp. 430–435, 2017.
[66] C. Mackersie, A. Boothroyd, and A. Lithgow, ''A 'Goldilocks' approach to hearing aid self-fitting: Ear-canal output and speech intelligibility index,'' Ear Hearing, vol. 40, no. 1, pp. 107–115, 2019.
[67] H. Dillon, G. Keidser, A. O'Brien, and H. Silberstein, ''Sound quality comparisons of advanced hearing aids,'' Hearing J., vol. 56, no. 4, pp. 30–32, 2003.
[68] Specification of Hearing Aid Characteristics, document ANSI 3.22-2014, American National Standards Institute, New York, NY, USA, 2014.
[69] Audioscan/Etymonic Design Inc. (2014). Audioscan Verifit 2. [Online]. Available: https://www.audioscan.com/verifit2
[70] RVA-90020-NXX Datasheet, Knowles Electron., Bengaluru, Karnataka, 2009.
[71] RVA-90080-NXX Datasheet, Knowles Electron., Bengaluru, Karnataka, 2014.

LOUIS PISHA received the B.A. degree in liberal arts from St. John's College, Annapolis, MD, USA, in 2013, and the M.S. degree in electrical engineering/signal and image processing from the University of California at San Diego (UC San Diego), in 2018, where he is currently pursuing the Ph.D. degree in electrical engineering/signal and image processing. His research interests include systems design for real-time audio signal processing applications and related areas, including software and hardware architecture and software/hardware co-optimization.


JULIAN WARCHALL received the B.S. degree in electrical and computer engineering from the University of Virginia, in 2013, and the M.S. and Ph.D. degrees in electrical engineering from UC San Diego, in 2015 and 2019, respectively. His research interest includes low-power signal acquisition systems for medical applications. He is a member of the Tau Beta Pi Engineering Honor Society. He received the National Science Foundation Graduate Research Fellowship (NSF GRFP), in 2014.

TAMARA ZUBATIY received the B.S. degree in cognitive science from UC San Diego. She is currently pursuing the Ph.D. degree in human-centered computing with the Georgia Institute of Technology. Her research interests include human–computer interaction and mobile health in the healthcare setting.

SEAN HAMILTON is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, UC San Diego. He is currently researching in the fields of embedded systems and programming languages at the Microelectronic Embedded Systems Laboratory (MESL), under the advice of Dr. R. Gupta. On the Open Speech Platform, he works under the advice of Dr. H. Garudadri on the embedded operating system on the platform as well as the real-time performance of the software development.

CHING-HUA LEE received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2013. He is currently pursuing the Ph.D. degree in electrical and computer engineering with UC San Diego. His main research interests include speech and audio signal processing, signal processing for hearing aids, sparse signal processing, adaptive filtering, speech enhancement, machine learning, and optimization.

GANZ CHOCKALINGAM received the Ph.D. degree in electrical and computer engineering from the University of Iowa. He is currently a Principal Engineer with the Qualcomm Institute, UC San Diego. His research interests are in the areas of mHealth, telematics, and the IoT.

PATRICK P. MERCIER (S'04–M'12–SM'17) received the B.S. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2006, and the S.M. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2008 and 2012, respectively. He is currently an Associate Professor in electrical and computer engineering with UC San Diego, where he is also the Co-Director of the Center for Wearable Sensors. His research interests include the design of energy-efficient microsystems, focusing on the design of RF circuits, power converters, and sensor interfaces for miniaturized systems and biomedical applications.

RAJESH GUPTA serves as a Founding Director of the Halıcıoğlu Data Science Institute and also as a Professor of computer science and engineering at UC San Diego. His research is in embedded and cyber-physical systems with a focus on sensor data organization and its use in optimization and analytics. He currently leads the NSF Project MetroInsight and is a co-PI on the DARPA/SRC Center on Computing on Network Infrastructure (CONIX), with the goal to build a new generation of distributed cyber-physical systems that use city-scale sensing data for improved services and autonomy. He is a Fellow of the ACM and the American Association for the Advancement of Science (AAAS). He holds the Qualcomm Endowed Chair in Embedded Microsystems at UC San Diego and the INRIA International Chair at the French International Research Institute in Rennes, Bretagne Atlantique. He currently serves as the Editor-in-Chief of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS.

BHASKAR D. RAO (S'80–M'83–SM'91–F'00) is currently a Distinguished Professor with the Department of Electrical and Computer Engineering and the holder of the Ericsson Endowed Chair in Wireless Access Networks with UC San Diego. His research interests include digital signal processing, estimation theory, and optimization theory, with applications to digital communications, speech signal processing, and biomedical signal processing. He was a recipient of the 2016 IEEE Signal Processing Society Technical Achievement Award.

HARINATH GARUDADRI received the Ph.D. degree in electrical engineering from The University of British Columbia, Vancouver, BC, Canada, in 1988, where he spent half his time in ECE and the other half in the School of Audiology and Speech Sciences, Faculty of Medicine. He is currently an Associate Research Scientist with the Qualcomm Institute, UC San Diego. He moved to academia in November 2013, after 26 years in industry, to work on technologies that will improve healthcare delivery beyond hospital walls. His area of expertise is signal processing applications in diverse fields, such as speech recognition, machine learning, speech, audio, and video compression, multimedia delivery in 3G/4G networks, low-power sensing and telemetry of physiological data, reliable body area networks (BAN), noise cancellation, and artifacts mitigation, among other areas. He holds more than 50 granted patents and over 20 pending patents in these areas.
