UPTEC E 19003

Examensarbete 30 hp
April 2019

Application of LabVIEW and myRIO to voice controlled home automation

Tim Lindstål
Daniel Marklund

Masterprogram i förnybar elgenerering
Master Programme in Renewable Electricity Production


Teknisk-naturvetenskaplig fakultet, UTH-enheten
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Application of LabVIEW and myRIO to voice controlled home automation

Tim Lindstål & Daniel Marklund

The aim of this project is to use NI myRIO and LabVIEW for voice controlled home automation. The NI myRIO is an embedded device which has a Xilinx FPGA and a dual-core ARM Cortex-A9 processor as well as analog and digital input/output, and is programmed with LabVIEW, a graphical programming language. The voice control is implemented in two different systems. The first system is based on an Amazon Echo Dot for voice recognition, a commercial smart speaker developed by Amazon Lab126. The Echo Dot devices are connected via the Internet to the voice-controlled intelligent personal assistant service known as Alexa (developed by Amazon), which is capable of voice interaction, music playback, and controlling smart devices for home automation. In this system, the thesis project focuses on the myRIO used for the wireless control of smart home devices, where smart lamps, sensors, speakers and an LCD display were implemented.

The other system focuses on the myRIO itself for speech recognition and was built on a myRIO with a microphone connected. The speech recognition was implemented using mel frequency cepstral coefficients and dynamic time warping. A few commands could be recognized, including a wake word, "Bosse", as well as four other commands for controlling the colors of a smart lamp.

The thesis project is shown to be successful, having demonstrated that the implementation of home automation using the NI myRIO with two voice-controlled systems can correctly control home devices such as smart lamps, sensors, speakers and an LCD display.

UPTEC E 19003
Examiner: Tomas Nyberg
Subject reader (Ämnesgranskare): Ping Wu
Supervisor (Handledare): Payman Tehrani


Populärvetenskaplig sammanfattning (Popular Science Summary)

Voice recognition and voice control have become very popular in recent years, with several large companies investing enormous resources in developing systems for this purpose. Amazon, Google and Apple all have their own cloud-based voice recognition systems, which improve continuously as more users' voices can be recorded and then used for comparison against a library. On top of these voice recognition systems there is also a range of third-party applications that can be used for specific purposes. One example is the Philips Hue system, which together with Amazon Alexa or Google Home can be used to voice control smart lamps. Using Amazon's home application system (Amazon Alexa) or Google's (Google Home) does, however, come with certain limitations: they require a bridge that converts code into a protocol that the smart lamps understand, and it is difficult to build something fully customized. To build something customized, some form of microcontroller is needed that can handle and interpret code.

The NI myRIO is a relatively powerful microcontroller specially adapted for student projects. It is built around two processors, an FPGA and a real-time processor. The programming language used to write code for this microcontroller is LabVIEW, a graphical programming language based on dataflow programming.

The goal of this project has been to create two voice-controlled home automation systems using the NI myRIO microcontroller and the LabVIEW programming language. The first system uses Amazon Alexa for voice recognition, and its focus has been on using different communication protocols to control smart lamps, sensors, an LCD screen and a pair of computer speakers, and to make these interact with each other on given commands.

For the second system, the focus has been more on the theory behind voice recognition, where the goal has been to build a system that can interpret a few voice commands and also control a smart lamp with them.

Both systems meet the project specifications. The first system, with Amazon Alexa's voice recognition, has been implemented with 28 different voice commands; examples of implemented functions are a light show and a control system for the brightness in a room. The second system can recognize five different commands, one of which is a 'wake word' that must be said before the other commands. The remaining commands are the phrases 'Red light', 'Blue light', 'Green light' and 'Yellow light'. The system uses mel frequency cepstral coefficients as an identity for a phrase and dynamic time warping to compare it with prerecorded phrases in a library.


Acknowledgements

The authors are very grateful for all the help and support provided by the supervisors Ping Wu and Payman Tehrani during the project, as well as the support from family and friends. Gratitude is also directed toward colleagues in the Department of Signals and Systems at Uppsala University for various support. A special gratitude is directed to Daniel's girlfriend, Mimi Riblom, for all her patience and support while the apartment has been used as a base for the project.


Contents

Abstract i

Acknowledgements ii

Contents iii

List of Figures vi

List of Tables ix

Abbreviations x

1 Introduction 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Purpose and project specifications . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Tasks and scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.4 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 LabVIEW & NI myRIO 5

2.1 LabVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 NI myRIO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Theory 10

3.1 Speech recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1.2 Type of speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1.2.1 Isolated word . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.2.2 Connected word . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.2.3 Continuous speech . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.2.4 Spontaneous speech . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.3 Speech recognition techniques . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.3.1 Mel Frequency Cepstral Coefficients (MFCC) . . . . . . . . . . . 11

3.1.3.2 Hidden Markov Model (HMM) . . . . . . . . . . . . . . . . . . 13

3.1.3.3 Deep Neural Network (DNN) . . . . . . . . . . . . . . . . . . . . 16

3.1.3.4 Dynamic Time Warping (DTW) . . . . . . . . . . . . . . . . . . 19

3.2 Communication protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


3.2.1 Open System Interconnection Model (OSI) . . . . . . . . . . . . . . . . . 22

3.2.2 IEEE 802.11 & WIFI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.2.1 IEEE 802.11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.2.2 WIFI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.3 ZigBee & IEEE 802.15.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.3.1 IEEE 802.15.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.3.2 ZigBee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.3.3 Zigbee cluster library . . . . . . . . . . . . . . . . . . . . . . . . 26

3.2.4 Inter-Integrated Circuit (I2C) . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.5 Universal Asynchronous Receiver/Transmitter (UART) . . . . . . . . . . 30

4 Implementation 32

4.1 Voice controlled system using Alexa . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.1.1 Amazon Alexa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.1.2 IFTTT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.1.3 Webserver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.1.3.1 Network variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.3.2 Port Forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.3.3 Debug & Public server . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.4 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1.4.1 LCD Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.1.4.2 Temperature sensor . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.1.4.3 Light sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.1.4.4 Radio module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.1.4.5 Philips hue bulbs . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.1.4.6 Computer speakers . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.1.5 System setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.1.5.1 Parallel processes . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.1.5.2 Queue system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.1.6 Command & functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.6.1 On/Off & Dim Lights . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.6.2 Temperature display on/off . . . . . . . . . . . . . . . . . . . . . 46

4.1.6.3 Light sensor display on/off . . . . . . . . . . . . . . . . . . . . . 47

4.1.6.4 Speaker feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.6.5 Light Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.6.6 Dim level control . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1.6.7 Colour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1.6.8 Light show . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.2 Customized voice controlled system directly in LabVIEW . . . . . . . . . . . . . 50

4.2.1 Components and setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.2 FPGA configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2.3 Wake-word and LED lights . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2.4 Audio configuration and input . . . . . . . . . . . . . . . . . . . . . . . . 52

4.2.5 Decoding the audio signal . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.5.1 Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.5.2 Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.5.3 Start- and end time . . . . . . . . . . . . . . . . . . . . . . . . . 54


4.2.5.4 Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.2.5.5 Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Results & Discussions 59

6 Conclusions & Future work 63

A 65

B 70

Bibliography 71


List of Figures

2.1 Dataflow programming example [4] . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Different data type and their respective wire appearance [4] . . . . . . . . . . . . 6

2.3 The front panel and block diagram of a LabVIEW VI . . . . . . . . . . . . . . . . 6

2.4 Mathscript node inside a While loop (grey square) . . . . . . . . . . . . . . . . . 6

2.5 The front of NI myRIO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.6 An overview of the two processor chips on NI myRIO [11]. . . . . . . . . . . . . . 8

2.7 Primary/Secondary Signals on MXP Connectors A and B at NI myRIO [11]. . . 8

2.8 An overview of NI myRIO [11]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 A block diagram describing the MFCC process. . . . . . . . . . . . . . . . . . . . 12

3.2 Principle of Mel scale filter bank [20]. . . . . . . . . . . . . . . . . . . . . . . . . 13

3.3 A graphical Hidden Markov model, where the circles indicate the states and the arrows indicate probabilistic dependencies between states. . . . . . . . . . . . . . 14

3.4 A graphical overview of the Hidden Markov model parameters for one coin, where the possible outcome can be either heads or tails. . . . . . . . . . . . . . . . . . . 14

3.5 A graphical overview of the Hidden Markov model parameters for three coins, where the probability that a certain coin will be used is shown. . . . . . . . . . . 15

3.6 A hidden Markov Model showing the three phonetic letters of the word nine. . . 15

3.7 A hidden Markov Model showing the three different states, beginning, middle & end, for the phoneme ay. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.8 A hidden Markov Model showing the probability for a word as well as a specific phoneme and a specific part of a phoneme to occur. . . . . . . . . . . . . . . . . 16

3.9 A spectrum of the first 20 ms frame of the word ”Hello” where it is possible to see more low frequency energies than high frequency energies, typical for male voices [9]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.10 A full spectrogram for the word ”Hello” with all 20 ms frames added up together [9]. 18

3.11 A simplified model of a recurrent neural network. . . . . . . . . . . . . . . . . . . 18

3.12 A Deep neural network with input-, middle- & output layers. . . . . . . . . . . . 19

3.13 Euclidean & DTW matching of two sequences [7]. . . . . . . . . . . . . . . . . . . 20

3.14 An empty 10 × 10 cost matrix, D. . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.15 A 10 × 10 cost matrix, D, where the two first columns of values are calculated. . 21

3.16 A 10 × 10 cost matrix, D, where all columns of values are calculated, as well as the warp path. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.17 The ISO model with its layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.18 Example of a WIFI network [24]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.19 The IEEE 802.15.4 standard uses two layers as well as the LCC- and SSCS-layers for communication, with all layers above defined by additional standards. . . . . 24

3.20 OSI model and the ZigBee model . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


3.21 Different types of Zigbee network topology . . . . . . . . . . . . . . . . . . . . . . 26

3.22 Example of an I2C-bus with one master device and three slave devices . . . . . . 28

3.23 Example of START and STOP conditions in an I2C circuit. [23] . . . . . . . . . 29

3.24 An example of a single byte I2C data transfer. [23] . . . . . . . . . . . . . . . . . 29

3.25 Example of an I2C write register. [23] . . . . . . . . . . . . . . . . . . . . . . . . 30

3.26 Example of an I2C read register. [23] . . . . . . . . . . . . . . . . . . . . . . . . 30

3.27 UART connections between two devices. . . . . . . . . . . . . . . . . . . . . . . 31

3.28 Example of a one byte UART communication. [21] . . . . . . . . . . . . . . . . . 31

4.1 A model of Alexa voice service [14]. . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2 Block diagram code of the initialization of the LCD Display . . . . . . . . . . . . 37

4.3 Block diagram code of the first sequence window in the Write bytes VI . . . . . . 38

4.4 Block diagram code of the Print text VI . . . . . . . . . . . . . . . . . . . . . . . 38

4.5 I2C Write/Read express VI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.6 Illustration of the X-CTU software, where ’Discover Radio Modules’ is marked. . 40

4.7 X-CTU discover radio devices menu. The settings should be left at their defaults . 41

4.8 X-CTU searching for radio modules. . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.9 Discovered radio modules in X-CTU. . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.10 Conversion of a .wav file to an array in LabVIEW. . . . . . . . . . . . . . . . . . 43

4.11 An overview of the voice controlled system with Alexa. . . . . . . . . . . . . . . . 44

4.12 Block diagram of process 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.13 Block diagram of process 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.14 Block diagram of the Lightcontrol part in process 4 . . . . . . . . . . . . . . . . . 48

4.15 Block diagram of the Dim level control VI . . . . . . . . . . . . . . . . . . . . . . 49

4.16 Color and saturation scale in decimal values (0-255) . . . . . . . . . . . . . . . . 49

4.17 Process 3 Block diagram with the code for the command Lightshow . . . . . . . 50

4.18 FPGA configuration VI with controls and inputs for LED diodes and audio. . . . 51

4.19 LabVIEW code for audio configuration . . . . . . . . . . . . . . . . . . . . . . . . 52

4.20 LabVIEW code for retrieving elements from Audio IN FIFO queue . . . . . . . . 53

4.21 LabVIEW code for setting up frame length . . . . . . . . . . . . . . . . . . . . . 53

4.22 LabVIEW code detecting the start- and end time of the Utterance . . . . . . . . 54

4.23 Calculation of the Mel filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.24 The Mel Filter Bank used in the project . . . . . . . . . . . . . . . . . . . . . . . 55

4.25 Calculation of Mel frequency cepstral coefficients. . . . . . . . . . . . . . . . . . . 55

4.26 Calculation of delta coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.27 Calculation of delta-delta coefficients. . . . . . . . . . . . . . . . . . . . . . . . . 56

4.28 Block diagram of the code where the set of words is tested against the uttered word in Match input, creating the Distance matrix and Word index array . . . . . 57

4.29 Calculation of the cost matrix used in DTW . . . . . . . . . . . . . . . . . . . . . 57

4.30 Block diagram of the code for the requirements of the best match . . . . . . . . . 58

5.1 An overview of the system. 1. The radio module, which transmits Zigbee signals to the smart lamps. 2. LED diodes, which indicate when a word can be said to the customized system. 3. The LCD screen, which can display temperature and lux values on command with the system based on Amazon Alexa. 4. A light sensor used for light level display on the LCD screen as well as in the light control system. 5. A temperature sensor used for temperature display on the LCD screen and as a spoken temperature response from a speaker. . . . . . . . . . . . . . . . 59


5.2 A chart of the PID regulator in the light control system, which changes the light intensity between different lux values in the range 200-800 lux. The Y-axis represents the lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.3 A chart of the PID regulator in the light control system, which changes the light intensity between different lux values in the range 500-1500 lux. The Y-axis represents the lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.4 Unfiltered & filtered audio input signals and utterance sequence of the command”Blue Light” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.5 Matching results and data from a run when the command ”Blue Light” was said 62

A.1 A flow chart of process 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

A.2 A flow chart of process 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

A.3 A flow chart of process 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

A.4 A flow chart of process 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

A.5 A flow chart of the whole system . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


List of Tables

3.1 Dictionary with numbers and the corresponding phonetic numbers . . . . . . . . 17

3.2 Zigbee public profile IDs and Profile name . . . . . . . . . . . . . . . . . . . . . . 27

3.3 Payload data packet in Hex values and their corresponding function and scale . . 28

4.1 Created applets and their function . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2 LCD initialize commands and their function . . . . . . . . . . . . . . . . . . . . . 36

4.3 Processes and which commands they each handle . . . . . . . . . . . . . . . . . . 45

4.4 Queues and where they are enqueued and dequeued, as well as which components they affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.5 Color and saturation values for the different colors used . . . . . . . . . . . . . . 49

5.1 Success rate for the customized system when Daniel is speaking . . . . . . . . . . 61

5.2 Success rate for the customized system when Tim is speaking . . . . . . . . . . . 61

B.1 The phonetic alphabet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


Abbreviations

API Application Programming Interface

ASR Automatic Speech Recognition

CCA Clear Channel Assessment

DLL Dynamic Link Library

DNN Deep Neural Network

DCT Discrete Cosine Transform

DTW Dynamic Time Warping

ED Energy Detection

FFT Fast Fourier Transform

FIR Finite Impulse Response

FPGA Field Programmable Gate Array

HMM Hidden Markov Model

IEEE The Institute of Electrical and Electronics Engineers

IoT Internet of Things

LabVIEW Laboratory Virtual Instrument Engineering Workbench

LAN Local Area Network

LLC Logical Link Control

LQI Link Quality Indication

LR-WPAN Low Rate Wireless Personal Area Network

LSB Least Significant Bit

MAC Medium Access Control

MAN Metropolitan Area Network

MATLAB Matrix Laboratory

MFCC Mel Frequency Cepstral Coefficients

MLME MAC Sublayer Management Entity


MLME-SAP MAC Sublayer Management Entity Service Access Point

MPDU MAC Protocol Data Unit

MSB Most Significant Bit

NI National Instruments

OSI Open System Interconnection

PHY Physical Layer

PPDU Physical Protocol Data Units

RT Real Time

RTOS Real Time Operating System

SAP Service Access Point

SPI Serial Peripheral Interface

STFT Short Time Fourier Transform

WLAN Wireless Local Area Network

WPAN Wireless Personal Area Network

ZCL Zigbee Cluster Library

ZLL Zigbee Light Link


Chapter 1

Introduction

This chapter presents the background of the thesis, the specifications and objectives of the project, the tasks required to fulfill the specifications, the planned work procedure, and an outline of the report.

1.1 Background

Speech is a basic form of communication between people and originated in its most primitive form at least 100,000 years ago [16]. No other communication method is faster or more natural for humans. This is one of the reasons modern speech recognition systems have been increasingly developed in recent decades. These systems have enabled humans to communicate with machines and computers by voice commands and can make daily life easier. Today there are many options for speech recognition systems on the market, and huge companies like Amazon and Google have their own systems which are compatible with a number of interesting smart home products via WIFI connection. The use of these systems, however, requires extra tools called bridges to communicate wirelessly with home automation products, as well as an app, and they cannot be fully customized.

The Internet connects people all around the globe, but it is not limited to personal computers and mobile devices; it can also be used to control home automation devices such as lights and fans. The technology of controlling home appliances over the Internet is called the Internet of Things (IoT) and has seen a great rise in popularity in recent years. With the use of the Internet, home appliances can communicate with each other, making it possible to integrate them in an automation system. Another example of IoT is the smart mirror: an ordinary mirror with an LED screen, making it possible for the mirror to both show the reflection and communicate with the Internet to display, for example, the local time and weather.

1.2 Purpose and project specifications

The purpose of this project is to use NI myRIO, an embedded device developed by National Instruments Inc., and the graphical programming language LabVIEW to create two wireless home automation systems, where both systems should be usable as stand-alone applications.

The first system should use Amazon Alexa for voice recognition, and all commands sent to smart lamps, speakers, sensors and an LCD screen should be controlled by the NI myRIO. The system should implement several communication protocols: WIFI, Inter-Integrated Circuit (I2C) and Zigbee. The system should meet the following criteria.

• The system should take in commands recognized by Amazon Alexa into the NI myRIO as strings, which can be handled further by the system.

• The system should handle several communication protocols.

• Implementation of at least three external devices like smart lamps, sensors and LCD screens.

The second system is more focused on speech recognition and uses Mel frequency cepstral coefficients (MFCC) and dynamic time warping (DTW) to recognize a few commands controlling the colors of a smart lamp. The following criteria should be fulfilled for this system.

• The system should recognize at least four different commands.

• A ’wake word’ which needs to be said before an executing command should be implemented.
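The DTW matching at the heart of the second system can be illustrated with a minimal sketch. The thesis applies DTW to sequences of MFCC feature vectors; in this toy version, assumed purely for illustration, plain numbers stand in for feature frames and the function name is invented rather than taken from the project code:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Builds the cumulative cost matrix D, where D[i][j] holds the cost
    of the cheapest warp path aligning a[:i] with b[:j]."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

ref = [0, 1, 2, 3, 2, 1, 0]          # reference template
shifted = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, shifted in time
different = [3, 3, 3, 3, 3, 3, 3]    # unrelated shape
# dtw_distance(ref, shifted) is 0.0; dtw_distance(ref, different) is 12.0
```

A time-shifted copy of the template warps onto the original at zero cost, while an unrelated sequence accumulates a large cost, which is exactly the property the system uses to pick the best-matching prerecorded phrase.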

1.3 Tasks and scope

The work procedure of this thesis can be divided into four parts: literature study, hardware work, software work and, finally, evaluation. Security issues related to the communication protocols are not handled in this thesis, nor is the theory behind the cloud-based communication associated with Amazon Alexa. A couple of other common speech recognition techniques are covered in the theory, but not implemented in the system.

The literature study aims at presenting relevant theory from existing projects related to speech recognition and voice controlled systems, as well as the theory behind different communication protocols. The study's tasks include:

• Studying theory related to NI myRIO and LabVIEW.

• Studying WIFI-, I2C-, and Zigbee protocols.

• Studying theory of speech recognition techniques, and in particular the theory behind mel frequency cepstral coefficients and dynamic time warping.

• Investigating related projects and extract ideas from these.

To create the systems, the NI myRIO has to be wired to the devices used in the project, such as sensors, LCD screens, a radio module and LED diodes. The project's hardware part contains the following tasks:


• Connect a temperature sensor to the SDA and SCL inputs on the myRIO.

• Connect a light sensor to the SDA and SCL inputs on the myRIO.

• Connect an LCD screen to the SDA and SCL inputs on the myRIO.

• Connect a pair of computer speakers to the audio output on the myRIO.

• Connect a radio module via UART to the transmitter and receiver ports on the myRIO.

• Connect 9 LED diodes to digital outputs on the myRIO.

• Connect a microphone to audio in on the myRIO.
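The temperature-sensor wiring above pairs with a small amount of decoding in software: the two raw bytes read back over I2C must be converted to degrees Celsius. The thesis does not state the exact sensor model, so the sketch below assumes a TMP102-style 12-bit register format purely as an illustration of the kind of conversion involved:

```python
def raw_to_celsius(msb, lsb):
    """Convert the two bytes of a TMP102-style 12-bit temperature
    register into degrees Celsius (0.0625 degC per LSB, two's
    complement for temperatures below zero). The register layout is
    an assumption for this example, not taken from the thesis."""
    raw = (msb << 4) | (lsb >> 4)  # 12-bit left-justified value
    if raw & 0x800:                # sign bit set: negative reading
        raw -= 1 << 12
    return raw * 0.0625

# Bytes 0x19 0x00 correspond to 25.0 degC; 0xFF 0xC0 to -0.25 degC.
```

On the myRIO, the two bytes would come from an I2C read transaction of the kind described in Section 3.2.4; only the pure conversion is shown here so the sketch runs without hardware.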

The biggest workload lies in the implementation of the software built in LabVIEW. After connecting the hardware components, the software is responsible for all functionality. The following tasks are included in this part:

• Implement communication between hardware components and the NI myRIO.

• Set up a web server on the NI myRIO and use its URL address to communicate with Amazon Alexa.

• Implement a voice controlled system which uses two-way communication between the hardware components.

• Implement a speech recognition system based on MFCC and DTW.

The last part is the evaluation, which measures the performance of the systems and assesses the testing results.

1.4 Method

The literature study was the first part of this thesis project. By investigating the concepts and theory behind speech recognition and voice controlled systems, including communication protocols, an overview of the intended systems could be obtained. Further, a closer look at the theory of the NI myRIO was taken, together with tutorials and a short web-based course in LabVIEW. By studying related projects, many common problems could be avoided, or solved later in the process.

Following the literature study, the focus was on implementing the chosen hardware components in the first system with Amazon Alexa. This work was done in close conjunction with programming the software, to verify that all connections were established correctly. The Amazon Alexa was at the same time prepared and its software installed as an app. When all hardware components were connected correctly, the work began with programming the software in LabVIEW for the first system. A web server was established on the myRIO and the commands to use were defined in IFTTT. The program was then expanded continuously and extended with more functions.

When the first system was nearly finished, work began on the second system, based on MFCC and DTW. The work procedure was similar to that of the first system, but the theory of the FPGA needed to be studied carefully, since it was used for sampling the signals.

When both programs were finished, the two systems were evaluated based on performance.


1.5 Outline

Chapter 1 presents the background, purpose and project specifications, the tasks, as well as the workflow of this thesis. The objective of Chapter 2 is to give an introduction to the microcontroller NI myRIO and the graphical programming language LabVIEW. Chapter 3 presents history, relevant theory and concepts related to speech recognition and the communication protocols used in the thesis. Chapter 4 presents the implementation of the software for both systems. Chapter 5 presents the results of the project and discussions. Chapter 6 contains conclusions and suggestions for further work related to the results.


Chapter 2

LabVIEW & NI myRIO

2.1 LabVIEW

LabVIEW is a graphical programming language that creates applications using icons instead of lines of text. In contrast to text-based programming languages, where the execution order is determined by the order of instructions, LabVIEW uses dataflow programming. Code and functions are written in block diagrams belonging to a VI, and the flow of data through nodes determines the order of execution. A node will in other words execute when it has received all its inputs, and when executed it produces output data that is passed to the next node in the dataflow path. A simple example is shown in Figure 2.1, where two numbers are added and 50 is then subtracted from the sum. In this example the VI will execute from left to right, since the subtract function cannot execute before the add function has executed and sent its output data to the input of the subtract function. One of the benefits of dataflow programming is that different

Figure 2.1: Dataflow programming example [4]

tasks are allowed to be processed concurrently, which makes it easy to design multitasking block diagrams, for example parallel tasks managed in multiple while loops [18]. Wires are used to transfer data between objects in the block diagram, and the appearance of a wire depends on the type of data being transferred. Wires must also be connected to inputs and outputs that are compatible with the data being transferred; for example, an array output cannot be connected to a numeric input. A couple of different data types and their respective wire appearances are shown in Figure 2.2.

The VI (virtual instrument) shown in Figure 2.3, built in LabVIEW, consists of the block diagram, where the source code is, but also of something called a front panel. The front panel is a GUI where controls, indicators, buttons, graphs etc. can be added by selecting them from a drop-down menu. This makes it easy to create an interactive and helpful UI for your applications. As the name implies, the VI is an imitation of a physical instrument [13]. LabVIEW is used in many industries around the world; for example, LabVIEW is used



Figure 2.2: Different data types and their respective wire appearances [4]

Figure 2.3: The front panel and block diagram of a LabVIEW VI

in the Large Hadron Collider at CERN. LabVIEW can be used for data acquisition as well as equipment control, and although it might be different from traditional programming it is very user friendly for first-time users. It is also very flexible, as you can import C/C++ code by making a dynamic link library (DLL) in C/C++ and then calling that DLL from LabVIEW. For users familiar with MATLAB, LabVIEW has something called a MathScript node, where it is possible to write your own text-based computations or cut and paste your MATLAB code into the node. A MathScript node is shown in Figure 2.4. It is also possible to call the MATLAB software to execute scripts, provided the user has a valid licence.

Figure 2.4: Mathscript node inside a While loop (grey square)


2.2 NI myRIO

Figure 2.5: The front of NI myRIO.

The myRIO shown in Figure 2.5 is an embedded device developed by National Instruments for student projects, with an architecture based on the configurable multiprocessor Xilinx Zynq Z-7010. The two processors in the device (one fixed and one configurable) can be programmed independently, since each has its own peripherals and memory. The fixed processor is a dual-core processor called ARM Cortex-A9 MPCore, which includes a fixed set of peripherals and implements the ARMv7 instruction set architecture (ISA). The myRIO ARM Cortex-A9 is pre-configured in the factory with a real-time Linux distribution. This is a Linux distribution with real-time extensions, giving more deterministic scheduling and behavior suitable for operating systems in a wide range of embedded applications. An overview of the FPGA processor as well as the RT processor is shown in Figure 2.6. For NI Linux Real-Time, Eclipse Edition, C and C++ development tools are used for downloading, compiling and executing as well as debugging C applications on the myRIO. For structured dataflow applications, compiling, downloading and executing code on the myRIO are instead performed in LabVIEW [11].

The myRIO has a plethora of functions and is a great choice for standalone systems. It has an inbuilt accelerometer and also four LEDs that can be controlled, and the FPGA chip makes advanced robotics control possible. Since it has inbuilt WiFi, it is also possible to use the myRIO for IoT systems and wireless control. Besides the many digital inputs and outputs, as well as analog inputs/outputs and audio in and out, the myRIO has 3.3 V, 5 V and +15/-15 V power outputs that can be connected to your own devices and applications. As seen in Figure 2.7, the myRIO has two MXP connectors where inputs and outputs for different communication protocols can be connected. The myRIO supports both I2C (Inter-Integrated Circuit), where the SDA and SCL ports are on pins 34 and 32 respectively, and SPI (Serial Peripheral Interface). It also has a Universal Asynchronous Receiver/Transmitter (UART) with Rx and Tx pins, making the myRIO able to communicate with many different devices directly without having to add an adapter of some sort.


Figure 2.6: An overview of the two processor chips on NI myRIO [11].

Figure 2.7: Primary/Secondary Signals on MXP Connectors A and B at NI myRIO [11].


The Xilinx Artix-7 field-programmable gate array (FPGA) is the reconfigurable processor on the myRIO. The FPGA consists of logic units, memory and other key building blocks that can be reconfigured at the hardware level. An FPGA can implement peripheral hardware such as PWM generators, communication buses, quadrature encoder interfaces, video rendering and decoding, algorithms for signal processing and other processor architectures [11]. The FPGA is therefore especially useful for systems requiring very fast calculations and responses. An overview of NI myRIO including expansion boards is shown in Figure 2.8. Specific information, such as inputs and outputs, can be found in the user guide for NI myRIO-1900 [22].

Figure 2.8: An overview of NI myRIO [11].


Chapter 3

Theory

3.1 Speech recognition

3.1.1 History

The first voice recognition technology was developed in 1952 by Bell Labs and called the Audrey system. This early system could only recognize ten digits spoken by a single voice [15]. A following step was taken in 1962 by IBM with their Shoebox machine, which besides the ten digits also could recognize 16 English words and six arithmetic commands. Greater steps were taken in 1971-1976 by the U.S. Department of Defense, which funded DARPA SUR, a research program for speech recognition within which Carnegie Mellon developed Harpy, a program that could understand 1011 words. During the same period the first commercial voice recognition company, Threshold Technology, was founded, and a system that could handle multiple voices was also introduced by Bell Labs. A new milestone was reached in 1978 by Texas Instruments when they introduced Speak & Spell, since it used a speech chip. This made it possible to make synthesized sound more human-like. The major breakthrough in the subject came, however, when the probability of unknown sounds and statistics were combined in the so-called Hidden Markov Model, introduced in the 1980s. After this, voice recognition started entering homes, with the first consumer system, called Dragon Dictate, developed in the beginning of the 1990s. This system was further improved in 1997 and could at that point recognize 100 words per minute. BellSouth made the first voice-activated portal (VAL) in 1996. However, for many people this system was inaccurate and caused nuisance. By 2001, the development of speech recognition technology had hit a plateau, until the arrival of Google. Google created an app called Google Voice Search for the iPhone that used data centers for the huge amount of data analysis needed to match user queries with samples of actual human speech. In 2010, Google introduced personalized recognition on Android devices, where recorded voice queries from different users were used to develop an enhanced speech model.
This system's library consisted of 230 billion words. Eventually the modern systems developed by Apple, Google and Amazon came to rely on cloud-based calculations. These systems also integrate third-party applications and can both be entertaining and behave more like an assistant.

3.1.2 Type of speech

There are four separate speech recognition classes, which recognize different types of utterances.



3.1.2.1 Isolated word

Isolated word systems typically recognize a single word inside an utterance window. They require silence both before and after the recorded word and also have a ”listen & non-listen state”.

3.1.2.2 Connected word

Connected word systems are closely related to isolated word systems, but they also allow separate utterances to be concatenated with minimal pauses in between.

3.1.2.3 Continuous speech

Continuous speech systems allow a user to speak naturally while a computer analyzes the content. The utterance boundaries can vary in a complex way, which makes continuous speech systems among the most difficult to create.

3.1.2.4 Spontaneous speech

Spontaneous speech systems can analyze unprepared speech containing disfluencies, such as filled pauses, repeated words or false starts. Systems of this type that can understand spoken material accurately, as well as understand the context of the words, are still beyond existing technology, but would enable new features like summarizing conversations, taking notes at business meetings and eventually even translating between languages.

3.1.3 Speech recognition techniques

3.1.3.1 Mel Frequency Cepstral Coefficients (MFCC)

Sounds generated by humans all differ depending on the shape of the vocal tract, including the tongue and the teeth. If this shape could be determined correctly, any sound could be represented accurately. The vocal tract can be represented by MFCCs, which are the coefficients of the Mel Frequency Cepstrum, a representation of the short-term power spectrum of a speech signal [6].

The MFCC computation consists of six steps, as presented by the block diagram in Figure 3.1. Every step represents either a function or a mathematical operation, and they are briefly discussed below.

The pre-emphasis step lets the input signal pass through a filter which emphasizes the high frequencies. The energy in the higher frequencies thereby increases, as described by equation 3.1, where a is the pre-emphasis coefficient.

Y (n) = X(n)− aX(n− 1) (3.1)

If we for example say a = 0.90, then 90 % of any one sample is presumed to originate from the previous sample.
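The filter in equation 3.1 can be sketched in a few lines of Python (an illustration only; the thesis implementation is built in LabVIEW):

```python
def pre_emphasis(x, a=0.90):
    """Equation 3.1: Y(n) = X(n) - a*X(n-1); the first sample passes through."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]
```

Applied to a buffer of samples, this boosts the high-frequency content before framing and windowing.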


[Block diagram: Voice input → Pre-Emphasis → Framing → Windowing → FFT (magnitude spectrum) → Mel Filter Bank (Mel spectrum) → Discrete Cosine Transform → MFCCs]

Figure 3.1: A block diagram describing the MFCC process.

The next step is to divide the speech signal into frames in the range of 20-40 ms, where the total number of samples in a frame is defined as N.

Further, a windowing technique called the Hamming window is used, which helps reduce the discontinuity at the start and end of each frame. The Hamming window is defined by equation 3.2, where the window W(n) is defined in the range 0 ≤ n ≤ N − 1.

W(n) = 0.54 − 0.46 cos(2πn / (N − 1))    (3.2)

The windowed output signal Y(n) is then given by equation 3.3, where X(n) represents the input signal.

Y (n) = X(n) ·W (n) (3.3)
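Framing and windowing (equations 3.2 and 3.3) can be sketched as below, with frame length and hop size as free parameters (a plain-Python illustration, not the LabVIEW implementation):

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(N):
    """Equation 3.2: W(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """Equation 3.3: element-wise product of a frame and the Hamming window."""
    w = hamming(len(frame))
    return [xi * wi for xi, wi in zip(frame, w)]
```

Note that the window equals 0.08 at its edges and 1.0 at its centre, which is what tapers the frame boundaries.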

Since all samples in each frame are in the time domain, a transform is needed to take the samples to the frequency domain. This is achieved by the Fast Fourier Transform, which turns the convolution of the vocal tract impulse response with the glottal pulse excitation into a product [12]. The output in the frequency domain can then be described as in equation 3.4.

Y(w) = FFT[h(t) ∗ X(t)] = H(w) · X(w)    (3.4)

The range of the frequencies in the FFT spectrum is wide and is not perceived on a linear scale, which is why a logarithmic scale, called the Mel scale, is used for the filter bank output. A representation of this scale is shown in Figure 3.2, where a series of triangular filters is used to compute a weighted sum of spectral components, so that the filter output approximates the Mel scale. The magnitude of each filter's frequency response is triangular, equal to unity at the centre and decreasing linearly to zero at the centres of the two adjacent filters [17]. The sum of the spectral components weighted by a filter then represents the output of that filter. An equation for calculating the Mel frequency is given in equation 3.5. The maximum frequency should be selected below the Nyquist frequency (f_nyquist = f_sample / 2) and the minimum frequency should be selected above 100 Hz. Typical values for a sample rate of 11025 Hz are f_max = 5400 Hz and f_min = 130 Hz [3].

Page 27: Application of LabVIEW and myRIO to voice controlled home ...1301398/FULLTEXT01.pdf · service known as Alexa (developed by Amazon), ... The other system is more focusing on myRIO

Application of LabVIEW and myRIO to voice controlled home automation 13

Figure 3.2: Principle of Mel scale filter bank [20].

f_Mel = 2595 log10(1 + f_Hz / 700)    (3.5)

The final step is to transform the Mel spectrum back to the time domain, which is achieved by the Discrete Cosine Transform. The result of the transform is the Mel Frequency Cepstral Coefficients, where the series of coefficients is called an acoustic vector. Each input utterance is therefore transformed into a sequence of acoustic vectors. The Discrete Cosine Transform is described by equation 3.6, where i = (1, 2, ..., K) and K is the number of frequency bands in the Mel scale. Further, n = (1, 2, ..., N), where N is the number of extracted MFCCs and S_i is obtained from the Short-Time Fourier Transform (STFT) of the discrete input signal [20].

C_n = ∑_{i=1}^{K} log10(S_i) cos[n(i − 1/2)π / K]    (3.6)
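Assuming S holds the (positive) Mel filter-bank energies of one frame, equation 3.6 can be sketched as a direct sum; with zero-based indexing the term (i − 1/2) becomes (i + 1/2):

```python
import math

def mfcc_from_mel(S, num_coeffs):
    """Equation 3.6: DCT of the log Mel energies S[0..K-1].

    Returns C_1 .. C_num_coeffs. Assumes every S_i > 0.
    """
    K = len(S)
    return [sum(math.log10(S[i]) * math.cos(n * (i + 0.5) * math.pi / K)
                for i in range(K))
            for n in range(1, num_coeffs + 1)]
```

A flat Mel spectrum yields all-zero coefficients, since the cosine basis functions sum to zero over a constant input.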

Deltas and delta-deltas represent the first and second order derivatives, respectively, of the MFCC feature vector and are also known as differential and acceleration coefficients. The use of these coefficients adds information about the dynamics of the MFCCs over time, since the MFCCs by themselves only describe the power spectral envelope of a single frame. ASR performance increases significantly if the MFCC trajectories are calculated and appended to the original feature vector. If 12 MFCC coefficients are calculated, we also get 12 delta and 12 delta-delta coefficients, giving a total feature vector length of 36 [2].

The delta coefficients d_t are calculated according to equation 3.7, where this project uses N = 1. The delta-delta coefficients are calculated in the same way, but from the deltas instead of the static coefficients.

d_t = ∑_{n=1}^{N} n(c_{t+n} − c_{t−n}) / (2 ∑_{n=1}^{N} n²)    (3.7)
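Equation 3.7 can be sketched as below; clamping the frame index at the sequence edges is a common convention and an assumption here, not taken from the thesis:

```python
def deltas(c, N=1):
    """Equation 3.7: delta coefficients of a per-frame coefficient sequence c.

    Frames outside the sequence are clamped to the nearest edge frame.
    """
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = len(c)
    out = []
    for t in range(T):
        num = sum(n * (c[min(t + n, T - 1)] - c[max(t - n, 0)])
                  for n in range(1, N + 1))
        out.append(num / denom)
    return out
```

Delta-deltas are obtained by calling the same function on the delta sequence.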

3.1.3.2 Hidden Markov Model (HMM)

The Hidden Markov model is the most commonly used modelling technique in modern speech recognition systems and depends on probabilistic functions of observable outputs generated by unknown (hidden) state sequences.


Figure 3.3: A graphical Hidden Markov model, where the circles indicate the states and the arrows indicate probabilistic dependencies between states.

Let's for example say we have a person tossing three coins in a closed room, where each outcome can be either heads (H) or tails (T), resulting in the following sequence: THTTHHTH. This sequence will be called the observation sequence O. Someone outside the room will only know the outcomes, but not in which sequence the different coins were tossed, nor the bias of the different coins. To estimate to what extent the outcome depends on the order in which the coins are tossed, or on their individual biases, we set up a probabilistic model which explains the sequence of observations O = (o1, o2, o3, o4, o5, o6, o7, o8) = (T, H, T, T, H, H, T, H). The coins here represent the hidden states, since it is unknown which coin was tossed each time. It is possible to estimate the likelihood of a state from the observations, but this sequence will not be unique. To simplify the idea we can first look at one coin, where the model parameter can be described as P(H). In this case the hidden states will be the actually observed states, and P(H) will then be the ratio of heads in the sequence, i.e. the probability of both heads and tails will be 0.5 [5].

O = T,H, T, T,H,H, T,H

S = 2, 1, 2, 2, 1, 1, 2, 1

P(T) = 1 − P(H) = 0.5


Figure 3.4: A graphical overview of the Hidden Markov model parameters for one coin, where the possible outcome can be either heads or tails.
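For the single observable coin, estimating the model parameter reduces to counting; a minimal sketch (not from the thesis):

```python
def estimate_p_heads(observations):
    """Maximum-likelihood estimate of P(H) for the single-coin case:
    the fraction of heads in the observation sequence."""
    return observations.count("H") / len(observations)
```

For the sequence THTTHHTH this gives P(H) = 0.5, matching the value above.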

If we go back to the example with three coins, the model parameters will look different, since the hidden states are unknown, but probability parameters extracted from relevant side information can give an idea of the model parameters for which coins are tossed [5]. An overview of the possible outcomes is shown in Figure 3.5. Let's say we have the known observation sequence, and that the hidden states S for this sequence are as below.

O = T,H, T, T,H,H, T,H

S = 3, 3, 2, 1, 1, 3, 2, 3


For someone not seeing the hidden states the model parameters can then only be described as

P (H) : P1, P2, P3

P (T ) : 1− P1, 1− P2, 1− P3


Figure 3.5: A graphical overview of the Hidden Markov model parameters for three coins, where the probabilities that a certain coin will be used are shown.

The concept can be directly transferred to speech recognition, where analyzed text transformed to a phonetic alphabet can give a probability for the next phoneme to occur in a word (without analyzed text, nothing can of course be said). Instead of just searching through all phonemes in a fixed order of the phonetic alphabet to find a matching one, the HMM will reorder the list and look for the most probable one first and, if it does not match, the one with second best probability, and so on. The method thereby increases the speed of finding the correct phoneme.

Figure 3.6: A hidden Markov Model showing the three phonetic letters of the word nine.

When windowing techniques are used, each phoneme will be repeated several times in a row, since each window represents a very short time period. The phoneme will then sound different at the beginning, in the middle and at the end, which the HMM has to take into consideration [25]. A more representative illustration of how a specific phoneme could be determined is therefore shown in Figure 3.7.


Figure 3.7: A hidden Markov Model showing the three different states, beginning, middle & end, for the phoneme ay.

As seen in Figures 3.6 & 3.7, there are two different probabilities to take into account when speech recognition is used to determine a word: the one between phonemes and the one within phonemes. When these parameters are known, it is possible to take the last HMM parameter into consideration: the one that describes the probability for a certain word to be used. A model of the concept for a few numbers is shown in Figure 3.8. A dictionary of the phonetic letters for numbers can be seen in Table 3.1, while a table for the whole phonetic alphabet can be found in Appendix A. Between words there is always a silent moment, which is why there also needs to be a probability for silence to occur. This will most often lead to the end state, where all determined phonemes are compared to a dictionary and the final word is resolved. After this the procedure repeats.


Figure 3.8: A hidden Markov Model showing the probability for a word as well as a specificphoneme and a specific part of a phoneme to occur.

3.1.3.3 Deep Neural Network (DNN)

The most accurate way to recognize speech is by the use of deep neural networks, especially for a large vocabulary system like a whole language [9]. The description of the concept first needs an introduction to how audio samples are recorded. Raw audio is typically sampled at 44.1 kHz, but human speech does not normally exceed 4 kHz, which is why speech recognition systems use a sample rate of 8 or 11.025 kHz (one quarter of 44.1 kHz) to catch all fundamental speech signals according to the Nyquist frequency (f_nyquist = f_sample / 2). These samples can be broken down into smaller parts called frames, where a typical frame is 20 ms long and all frames are analyzed separately. The Fast Fourier Transform (FFT), an algorithm going from the time domain to the frequency domain, makes it possible to extract the signal energies in the frequency


Numbers    Phonetic numbers

one        w ah n
two        t uw
three      th r iy
four       f ao r
five       f ay v
six        s ih k s
seven      s eh v eh n
eight      ey t
nine       n ay n
zero       s iy r ow

Table 3.1: Dictionary with numbers and the corresponding phonetic numbers

band. The equation is shown in 3.8, where N is the number of samples and X_n is the resulting sequence.

X_n = ∑_{k=0}^{N−1} x_k e^{−2πjkn/N},  n = 0, 1, 2, ..., N − 1    (3.8)

The result X_n can then be illustrated as a spectrum, as shown in Figure 3.9, where it is easy to see in which frequency range the energies lie. For this particular frame, the spectrum can be seen as a ”fingerprint” for this part of the speech.
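Equation 3.8 can be sketched as a naive DFT (an FFT computes the same result, only faster):

```python
import cmath

def dft(x):
    """Equation 3.8: discrete Fourier transform of a sequence x, computed directly."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N)
                for k in range(N))
            for n in range(N)]

def magnitude_spectrum(x):
    """Per-bin magnitudes |X_n|, the 'fingerprint' of one frame."""
    return [abs(X) for X in dft(x)]
```

For a pure sinusoid whose frequency sits exactly on bin n, all the energy lands in bins n and N − n.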

Figure 3.9: A spectrum of the first 20 ms frame of the word ”Hello”, where more low-frequency than high-frequency energy can be seen, typical for male voices [9].

If all frames in a sample are lined up as columns after each other, it is possible to see the whole spectrogram for a word. An example of this is shown in Figure 3.10 for the word ”Hello”. This spectrogram will then be the ”fingerprint” for the whole word, which can be further analyzed with a DNN.

The full spectrogram can be interpreted as an n × m image of pixels, where a DNN can be used to analyze which letter each frame of sound corresponds to. The most time-effective type is a recurrent neural network, which has a memory that can help predict the future outcome. If for example ”Gard” is said in the beginning of a word, it is more likely that ”en” or ”ener” is coming at the end rather than something irrelevant like ”zxl”. Saving previous predictions in a memory in the neural network will make the system more accurate and let it predict the correct letter faster next time. A simplified model of this type is shown in Figure 3.11.

The technique behind it is however far more complicated and consists of an input layer, several middle layers and an output layer, as seen in Figure 3.12, where each layer consists of many neurons. Look at the spectrum of the 20 ms frame in Figure 3.9 and imagine that each spectral value corresponds to an input neuron. These neurons can then be seen as a total activation a, where


Figure 3.10: A full spectrogram for the word ”Hello” with all 20 ms frames added up together[9].


Figure 3.11: A simplified model of a recurrent neural network.

a = a1 + a2 + a3 + a4 + ... + an−1 + an

Each input neuron will be connected to every neuron in the first middle layer, where the analytic process begins. In this layer, small fractions of the speech are broken down into pieces which are easier to put together into something more understandable. Not all parts of the spectrum are equally interesting for the determination of speech, which is why weights w are used to express the importance of each parameter. The input parameters together with the weights will then look like

w1a1 + w2a2 + w3a3 + w4a4 + ... + wn−1an−1 + wnan

To squeeze the sum of the weighted inputs into a value between 0 and 1, a sigmoid function can be used.

Page 33: Application of LabVIEW and myRIO to voice controlled home ...1301398/FULLTEXT01.pdf · service known as Alexa (developed by Amazon), ... The other system is more focusing on myRIO

Application of LabVIEW and myRIO to voice controlled home automation 19

σ(x) = 1 / (1 + e^{−x})    (3.9)

Applying the sigmoid to the weighted sum gives a measure of how positive the sum is (equation 3.10), and together with a bias value set to some specific threshold this determines whether the neuron is activated.

p = σ(w1a1 + w2a2 + w3a3 + w4a4 + ... + wn−1an−1 + wnan)    (3.10)

The same procedure is repeated a few times depending on how many middle layers there are, and the weights will of course differ between layers. Finally only one neuron in the output layer will be activated, representing a letter.
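Equations 3.9 and 3.10 can be sketched as a single sigmoid neuron and a fully connected layer; treating the bias as an additive term inside the sigmoid is one common convention and an assumption here:

```python
import math

def sigmoid(x):
    """Equation 3.9: squash any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(a, w, bias):
    """Equation 3.10 with an additive bias: sigmoid of the weighted input sum."""
    return sigmoid(sum(wi * ai for wi, ai in zip(w, a)) + bias)

def layer(a, weights, biases):
    """One fully connected layer: every input activation feeds every neuron."""
    return [neuron(a, w, b) for w, b in zip(weights, biases)]
```

Stacking several such layers, with different weights per layer, gives the middle layers of Figure 3.12.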


Figure 3.12: A Deep neural network with input-, middle- & output layers.

3.1.3.4 Dynamic Time Warping (DTW)

Dynamic time warping is an algorithm for time series alignment. The method was originally developed for speech recognition; it takes two sequences of feature vectors and warps the time axis iteratively until an optimal path between the two sequences is found [19]. Since a word can be said in many different ways (slow, fast, high pitch, low pitch etc.), this method still allows such variants to be recognized.

Let's say we have two speech sequences of the same word, as in Figure 3.13, that we want to compare. Using the Euclidean distance, matching element by element, will most often lead to poor recognition, which is why DTW is used instead.


Figure 3.13: Euclidean & DTW matching of two sequences [7].

The first step in DTW is to create the cost matrix D : m × n, where the two compared sequences x & y represent the two axes.

x = [x1, x2, x3, ..., xi, ..., xn]

y = [y1, y2, y3, ..., yj , ..., ym]

The cost matrix can then be represented as in equation 3.11, or as an illustration with empty values in figure 3.14.

D(i, j) = Dist(i, j) + min{D(i − 1, j), D(i, j − 1), D(i − 1, j − 1)}    (3.11)

If the values of the compared sequences x & y are given as below

x = [1, 2, 4, 3, 5, 3, 2, 3, 2, 5]

y = [1, 1, 2, 4, 3, 5, 3, 2, 3, 2]

Then equation 3.11 is used to calculate each element. The calculation of the elements begins in the bottom-left corner, i.e. D(1, 1), where the absolute value of the distance is |1 − 1| = 0. Since no value exists below or to the left of this element, the result is D(1, 1) = 0. If we instead look in


Figure 3.14: An empty 10 × 10 cost matrix, D.

the second column for the value of D(2, 5), we have |2 − 3| = 1 and min(6, 4, 2) = 2, which yields D(2, 5) = 1 + 2 = 3. In this way the matrix can be filled up column by column. The first and second columns in this example will then look as in figure 3.15.

Figure 3.15: A 10 × 10 cost matrix, D, where the two first columns of values are calculated.

When all elements in the matrix are calculated, the process of creating a warp path, W = (w1, w2, w3, ..., wk), begins. This is done by backtracking with a greedy search to minimize the distance as in equation 3.12.

Dist(W) = Σ(k=1 to L) Dist(wki, wkj)    (3.12)

The search begins in the upper-right corner of the matrix and looks for the minimal value among the neighbouring left, bottom and bottom-left elements. The process is repeated until a path is found all the way to the bottom-left corner, as in figure 3.16. Since Dist(W) is the sum


of all values in the path, it is possible to see that the warp-path distance in this example is just equal to the value in the upper-right corner (Dist(W) = 3). This is a very low value, which indicates a good match.

Figure 3.16: A 10 × 10 cost matrix, D, where all columns of values are calculated as well as the warp path.
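The column-by-column fill of equation 3.11 can be sketched in Python (a sketch, not the thesis' LabVIEW implementation). Applied to the example sequences above, the upper-right corner comes out as 3, matching the warp-path distance in figure 3.16:

```python
def dtw_cost(x, y):
    """Fill the DTW cost matrix of equation 3.11:
    D[i][j] = |x[i] - y[j]| + min(left, below, diagonal neighbours).
    The total warp-path distance is the value in the opposite corner."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            dist = abs(x[i] - y[j])
            if i == 0 and j == 0:
                D[i][j] = dist                    # no neighbours yet
            else:
                best = min(D[i-1][j] if i > 0 else INF,
                           D[i][j-1] if j > 0 else INF,
                           D[i-1][j-1] if i > 0 and j > 0 else INF)
                D[i][j] = dist + best
    return D[n-1][m-1]

x = [1, 2, 4, 3, 5, 3, 2, 3, 2, 5]
y = [1, 1, 2, 4, 3, 5, 3, 2, 3, 2]
print(dtw_cost(x, y))  # 3
```

Two identical sequences give a distance of 0, the best possible match.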

3.2 Communication protocols

3.2.1 Open System Interconnection Model (OSI)

To describe how information is transferred from one networked device to another over a transmission medium, the OSI model (Open Systems Interconnection Model) is used. This model has a total of seven layers, each with its own function. Starting from the top is the Application layer (layer 7), which is the layer where the user interacts with high-level APIs. Next is the Presentation layer, where the operating system handles the data; this can involve functions such as encryption/decryption, translation or data compression. The fifth layer is the Session layer, whose function is to handle multiple continuous back-and-forth transmissions of information between two nodes, in other words a session; a session between a computer and a web service is for example created whenever a website is visited. The following layer is the Transport layer, which handles the reliability of the data segments being sent, with functions such as segmentation (dividing data packets into smaller parts), acknowledgement (a signal to specify that data has been sent/received) and multiplexing (combining multiple streams of information/signals into one complex signal). Following these is the Network layer, which handles the structuring of a network, i.e. addressing, routing and traffic control. Layer 2 is called the Data link layer and consists of two sublayers named Logical link control (LLC) and Medium access control (MAC). The LLC manages flow control and multiplexing for the logical link, and the MAC sublayer provides flow control and multiplexing for the transmission medium. The last layer is the Physical layer, whose function is the transmission and reception of raw bit streams over a physical medium. The OSI model can be divided into two parts: one part is mostly software and can be called the host layers (layers 7-4), while the other layers (layers 3-1) are mostly hardware and are sometimes called the media layers, as illustrated in figure 3.17.


Figure 3.17: The OSI model with its layers: host layers (7-4) and media layers (3-1).

3.2.2 IEEE 802.11 & WIFI

3.2.2.1 IEEE 802.11

IEEE 802 is a set of standardized protocols developed by the IEEE (Institute of Electrical and Electronics Engineers) dealing with LANs (local area networks) and MANs (metropolitan area networks). There are many protocols in the IEEE 802 family with different working groups; for example, IEEE 802.3 deals with Ethernet, the IEEE 802.19 working group develops standards for coexistence between standards of unlicensed devices, and the standard for WLAN (wireless local area network) is IEEE 802.11. The 802.11 protocol only operates in the lower layers of the OSI model, i.e. the data link and physical layers.

3.2.2.2 WIFI

WIFI is a wireless networking communication technology that uses radio waves to transmit and receive information. WIFI represents a wireless local area network (WLAN), which is part of the IEEE 802.11 standards. The most widely used operating frequency is 2.4 GHz, but newer routers can also operate at 5 GHz, a technology called dual band. This technology offers more channels and a data transfer rate of up to 600 Mbit/s.

The WLAN consists of a gateway, typically a router, which receives and transmits signals from an internet service provider. The router then forwards signals to a receiver within range, which could be a computer, cell phone or other WIFI-enabled device. The range of the WLAN can be extended by a WIFI bridge, but in case of weak signals, or if a device is out of range, Ethernet cables can be used. A wireless local area network is exemplified in figure 3.18.


Figure 3.18: Example of a WIFI network [24].

3.2.3 ZigBee & IEEE 802.15.4

3.2.3.1 IEEE 802.15.4

IEEE 802.15.4 is the architecture on which the ZigBee protocol is based. To simplify the standard, IEEE 802.15.4 is defined in terms of a number of blocks, referred to as layers. Each layer is responsible for one part of the standard and provides the higher layers with services, and between the layers there are interfaces which serve to define the logical links described by the standard [10].

A low rate wireless personal area network (LR-WPAN) device consists of at least one PHY layer, containing the radio frequency (RF) transceiver together with its low-level control mechanism, and a MAC sublayer providing access to the physical channel for all types of transfers. In a graphic representation, figure 3.19 shows these blocks, which are further described below [10].

Figure 3.19: The IEEE 802.15.4 standard uses two layers as well as the LLC and SSCS layers for communication with all layers above, defined by additional standards.


Two services are provided by the PHY: the PHY data service and the PHY management service. The PHY data service enables PHY protocol data units (PPDUs) to be transmitted and received across the physical radio channel. The PHY's features are radio transceiver activation and deactivation, energy detection (ED), link quality indication (LQI), channel selection, clear channel assessment (CCA), and transmission and reception of packets across the physical medium. The ultra wide band PHY also has a precision ranging feature [10].

Two services are provided by the MAC sublayer: the MAC data service and the MAC management service, interfacing with the MAC sublayer management entity (MLME) service access point (SAP) (MLME-SAP). The MAC data service enables MAC protocol data units (MPDUs) to be transmitted and received through the PHY data service. MAC sublayer features include beacon management, channel access, guaranteed time slot management, frame validation, acknowledged frame delivery, association, and disassociation. The MAC sublayer also provides hooks for appropriate security mechanisms to be implemented [10].

The IEEE 802.15.4 standard also includes a logical link control (LLC) and a service specific convergence sublayer (SSCS), which are added to communicate with the layers above, defined by additional standards [8].

The intention of the standard is to have a format which can be used by other protocols and features (layers three to seven). There are three different frequency bands used by the protocol (902-928 MHz in America & 868 MHz in Europe), but the 2.4 GHz band is most widely used worldwide [8].

3.2.3.2 ZigBee

ZigBee represents the enhancement layers three to seven of the OSI model, where the representation of each layer is shown in figure 3.20.

Layers three and four define the additional communication features. Examples of enhancements in the remaining layers are checking for valid nodes, encryption for security, and forwarding capability and data routing to enable mesh networking. The most prominent use of ZigBee is remote sensor systems utilizing the mesh topology. A nice benefit of mesh topology is that all nodes in the system can communicate with any other node [8], leading to a larger network which can be spread over a wider area. The functionality and reliability will also be better, since nodes can be bypassed in case of a disabled node. Examples of network topologies are shown in figure 3.21, where a coordinator (the black dot) can communicate with other nodes in different ways. There is also another version of ZigBee available supporting energy saving; this version needs neither a battery nor AC power to be maintained. A key benefit of ZigBee is however the fact that a huge amount of pre-developed applications is available. The application used in this paper is called Light Link and is specifically used to control smart LED lights.


Figure 3.20: OSI model and the ZigBee model

Figure 3.21: Different types of ZigBee network topology: star, mesh and cluster tree.

3.2.3.3 ZigBee Cluster Library

The ZigBee Cluster Library (ZCL) is a library of standardized commands and functions which are grouped together as clusters: "A cluster is a related collection of commands and attributes, which together define an interface to specific functionality" [1]. The different clusters have different IDs, but for this project the focus is on the clusters "On/Off", "Level Control" and "Color Control" (cluster IDs 0x0006, 0x0008 and 0x0300 respectively). Besides cluster IDs there are also profile IDs, which are IDs for related applications and devices. There are a number of public profiles, and they are designed so that products from different manufacturers can work together. For this project the Home Automation (HA) profile is used, and a table


of other profiles and their respective IDs can be seen in Table 3.2. The ZCL uses frames to

Profile ID  Profile Name
0101        Industrial Plant Monitoring (IPM)
0104        Home Automation (HA)
0105        Commercial Building Automation (CBA)
0107        Telecom Applications (TA)
0108        Personal Home & Hospital Care (PHHC)
0109        Advanced Metering Initiative (AMI)

Table 3.2: ZigBee public profile IDs and profile names

transmit information, where the cluster ID and profile ID are specified. The frames are created using a software called XCTU and consist of the following parameters:

• Delimiter (DL)

• Length (L)

• Frame type (FT)

• Frame ID (FI)

• 64-bit destination address (64bit)

• 16-bit destination address (16bit)

• Source endpoint (SE)

• Destination endpoint (DE)

• Cluster ID (CI)

• Profile ID (PI)

• Broadcast radius (BR)

• Options (Opt)

• Data payload (DP)

• Checksum (CS)

The delimiter is the first byte of a frame, which indicates the beginning of a data frame; for this project it is always 0x7E. The length specifies the total number of bytes excluding the delimiter, length and checksum. The frame type specifies the API type that is used; for this project the API ID is 0x11, which is "Explicit Addressing Command Frame" and allows endpoint and cluster ID to be specified for a wireless transmission. To receive a response to the transmission, the frame ID was chosen as 0x01 (setting it to 0 disables the response frame). The 64-bit address represents the destination address; for this project Philips Hue light bulbs were used, so the MAC address of those bulbs is the 64-bit address, and it is described in section 4.1.4.4 how these addresses are obtained. A device that joins a ZigBee network receives a 16-bit address, also called the network address; when the address is unknown, or when sending a broadcast, the value is 0xFFFE, which is why the 64-bit address is included in the frame to ensure that the


data is being transmitted to the correct device. The source endpoint and destination endpoint were set to default values (0xE8 and 0xB). The cluster ID is the specific ID for the function that is to be accessed (for example 0x0300 for the color function). The profile ID is 0104, the Home Automation (HA) profile. The broadcast radius is set to 0x00, which sets the number of broadcast hops to the maximum value. The data payload is the command and will be different depending on the function that is to be accessed; for example, the payload "01 00 01 00 10" corresponds to the command "turn on" and the payload "01 00 00 00 10" to "turn off". The payloads used in this project can be seen in Table 3.3. The checksum is the control of data integrity, i.e. a check for errors during transmission. The equation to calculate the checksum from a frame is shown in equation 3.13: add all frame parameters except the delimiter and length, take the lowest 8 bits of this sum (a bitwise AND with FF) and subtract the result from FF. To check that a checksum is valid, add all parameters including the checksum; the last two hexadecimal digits of the sum will then be FF. All digits are in hexadecimal.

Checksum = FF − ((FT + FI + (64bit) + (16bit) + SE + DE + CI + PI + BR + Opt + DP) & FF)    (3.13)

Payload [Hexadecimal]     Command                                        Scale [Decimal/Hex]
01 00 01 00 10            Turn on                                        -
01 00 00 00 10            Turn off                                       -
01 00 04 XX 10 00 10      Level control, XX = specific light intensity   0-255 / 0x00-FF
01 00 06 XX YY 10 00 10   Color, XX = specific color, YY = saturation    0-255 / 0x00-FF

Table 3.3: Payload data packets in hex values and their corresponding function and scale
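Equation 3.13 can be sketched in Python. The frame fields below are illustrative only: the 64-bit address is a made-up placeholder, not a real bulb MAC; the endpoints, cluster and profile IDs and the "turn on" payload follow the values given in the text.

```python
def xbee_checksum(frame_fields: bytes) -> int:
    """Checksum per equation 3.13: sum all fields after the length,
    keep the lowest 8 bits, and subtract from 0xFF."""
    return 0xFF - (sum(frame_fields) & 0xFF)

# Illustrative field bytes (placeholder 64-bit address): frame type 0x11,
# frame ID 0x01, 64-bit destination, 16-bit destination 0xFFFE, source/
# destination endpoints 0xE8/0x0B, cluster ID 0x0006 (On/Off), profile ID
# 0x0104 (HA), broadcast radius, options, and the "turn on" payload.
fields = bytes([0x11, 0x01,
                0x00, 0x17, 0x88, 0x01, 0x02, 0x03, 0x04, 0x05,
                0xFF, 0xFE,
                0xE8, 0x0B,
                0x00, 0x06,
                0x01, 0x04,
                0x00, 0x00,
                0x01, 0x00, 0x01, 0x00, 0x10])

cs = xbee_checksum(fields)
# Validity check from the text: sum of all fields plus checksum ends in 0xFF.
assert (sum(fields) + cs) & 0xFF == 0xFF
```

The validity property holds for any field contents, since the checksum is exactly the complement of the masked sum.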

3.2.4 Inter-Integrated Circuit (I2C)

The I2C bus is a very popular communication protocol developed for communication between master and slave devices. It has the benefit that a single bus can be used for multiple devices, as shown in figure 3.22, where typically a microcontroller is used as the master device and sensors, DACs/ADCs, LCD screens and controls are slave devices. All the devices are connected to just two lines controlled by the master device [23].

Figure 3.22: Example of an I2C bus (SDA and SCL lines) with one master device and three slave devices.

The communication begins with a START condition sent by the master, which happens when the SDA line goes from high to low while the SCL line is high. The communication then terminates when a STOP condition is sent from the master, defined by a low-to-high transition on the SDA line while SCL is high, as seen in figure 3.23.

The SCL clock defines the transfer speed, where one data bit is transferred during each clock pulse. One byte then comprises eight bits on the SDA line. A byte can represent either a


Figure 3.23: Example of START and STOP conditions in an I2C circuit. [23]

register address, a device address or data read from or written to a slave, where the data bits are ordered with the most significant bit (MSB) first. Between the START and STOP conditions, data bytes of any length can be transferred between the master and slave. It is important that data on the SDA line is stable during the high phase of the clock period, since changes of data while SCL is high will be interpreted as a START or STOP condition [23].

When a byte has been sent, it is followed by an ACK bit from the receiver. This bit tells the transmitter that the byte was received successfully and that another byte can be sent. The ACK bit cannot be sent by the receiver before the SDA line has been released, which is why the receiver pulls the SDA line from high to low. This is done during the whole ninth clock period (the ACK period), so that the SDA line is guaranteed stable low while the clock pulse is high [23]. An illustration of this is shown in figure 3.24.

Figure 3.24: An example of a single byte I2C data transfer. [23]

When writing to a slave on the I2C bus, the first byte sent by the master after the START condition is the address of the slave. The following bit is the read/write bit, which is set to 0, representing write. When the slave has sent an ACK bit, the master sends a byte representing the register address of the specific register it wants to write to. Another ACK bit is then sent from the slave, which tells the master it is ready. Afterwards, the master begins sending data to the register until all data has been sent, and finally a STOP condition terminates the transmission. An example of a write register is presented in figure 3.25.

Reading from a slave device is similar to the writing process but needs some more steps. The first step is to send an instruction from the master to the slave telling it which register to read from. This is done in the same way as in the writing process, by sending the slave


Figure 3.25: Example of an I2C write register. [23]

address followed by the read/write bit set to 0 (meaning write), and then the register address it wants to read from. When the slave has sent the ACK bit, the master once again sends a START condition, followed by the slave's address, but this time the read/write bit is set to 1 (representing read). The slave then sends an ACK bit for the read request, which makes the master release the SDA bus. The master will however still continue to supply the clock, even though the slave has control of the data line. At this point, the master becomes a master-receiver while the slave acts as a slave-transmitter.

Clock pulses are still sent out by the master, but at the same time the SDA line is kept released, which makes it possible for the slave to transmit data. After every byte, an ACK bit is sent by the master to the slave to make the slave ready to send more data. When all expected bytes have been received by the master, it sends a NACK, which tells the slave to release the bus and halt the communication. Finally, a STOP condition is sent out by the master [23]. An example of an I2C read command is shown in figure 3.26.

Figure 3.26: Example of an I2C read register. [23]
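The address-plus-read/write-bit framing described above can be sketched in Python. This is only an illustration of the byte sequences, not a bus driver; the slave address 0x48 and register 0x01 below are hypothetical examples, not devices used in the project.

```python
def addr_byte(slave_addr: int, read: bool) -> int:
    """7-bit slave address shifted left one bit, with the R/W bit as LSB (1 = read)."""
    return (slave_addr << 1) | (1 if read else 0)

def write_register_bytes(slave_addr, reg_addr, data):
    """Bytes the master puts on SDA for a register write
    (START/STOP conditions and ACK bits are signalled separately)."""
    return [addr_byte(slave_addr, read=False), reg_addr, *data]

def read_register_bytes(slave_addr, reg_addr):
    """Bytes sent by the master for a register read: a write of the register
    address, then a repeated START and the address again with R/W = 1."""
    return [addr_byte(slave_addr, read=False), reg_addr,
            addr_byte(slave_addr, read=True)]

# Hypothetical 7-bit slave address 0x48, register 0x01:
print([hex(b) for b in write_register_bytes(0x48, 0x01, [0xAB])])  # ['0x90', '0x1', '0xab']
```

Note how the same 7-bit address becomes 0x90 for a write and 0x91 for a read, differing only in the R/W bit.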

3.2.5 Universal Asynchronous Receiver/Transmitter (UART)

UART is a serial communication protocol which requires only two wires for communicating data between source and destination. In serial communication, a single bit is transmitted at a time, sequentially, over a single wire. By serial transmission of digital information through a single wire, the number of wires and overall complexity can be reduced compared to parallel communication through multiple wires. Since UART communicates asynchronously, no clock signal is required by the receiver to synchronize or validate data sent from the transmitter. This is in contrast to synchronous serial communication, where a clock signal is shared between the transmitter and the receiver in order to synchronize data. In UART, data flows from the Tx pin of the transmitter to the Rx pin of the receiver and vice versa, as seen in figure 3.27. Also, to achieve a common reference, both devices should be connected to the same ground.

UART achieves data synchronization by two mechanisms. The two communicating devices need to share the same timing reference, which is achieved by setting the baud rate; in addition, a start bit and a stop bit are used at the beginning and end of each data byte. The baud rate is a rate of


Figure 3.27: UART connections between two devices (Tx to Rx, Rx to Tx, GND to GND).

data transfer in serial communication and is expressed in bits per second (bps). There are some standard baud rates defined (2400, 4800, 9600, ...) which can be configured in both devices.

The data frame of UART (figure 3.28) begins with an idle state where the logic level is high. This shows that the line and the transmitter are not damaged. Each frame then consists of a start bit, data bits, possibly a parity bit, and finally a stop bit. The start bit signals that a new character is coming. The next 5-9 bits, depending on the configuration, represent the data. If a parity bit is configured, it is placed directly after the last data bit. This bit is used by the receiver to detect whether any data was changed during transmission. The stop bit (or two stop bits) is always in the logic-high state, which signals that the transmission of the character is completed. All characters are sent from the transmitter one by one in the same format.

Figure 3.28: Example of a one byte UART communication. [21]
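The framing above can be sketched in Python, assuming LSB-first data order (standard for UART, though not stated in the text), 8 data bits, even parity and one stop bit:

```python
def uart_frame(byte: int, parity: str = "even") -> list:
    """Bit sequence on the line for one 8-data-bit UART character:
    start bit (0), data bits LSB first, optional parity bit, stop bit (1)."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first, as UART transmits
    frame = [0] + data                          # start bit, then data
    if parity == "even":
        frame.append(sum(data) % 2)             # makes the total ones count even
    frame.append(1)                             # stop bit
    return frame

print(uart_frame(0x55))  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
```

For 0x55 the four data ones already make an even count, so the parity bit is 0.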


Chapter 4

Implementation

4.1 Voice controlled system using Alexa

A voice controlled system using an Alexa Echo Dot and a myRIO was created. Alexa was used for the speech recognition part, and the myRIO was used as the microcontroller acting as the central control unit, which sends and receives commands to/from various devices. The system makes use of both WIFI and ZigBee communication as well as the UART and I2C protocols.

4.1.1 Amazon Alexa

Amazon Alexa is a cloud based voice recognition system supported by Alexa devices. There are currently three different kinds of Alexa devices available on the market: Amazon Echo, Echo Dot and Amazon Tap. The first device released was the Amazon Echo, which stays in listening mode until the wake-up word "Alexa" is said. When it wakes up, a single voice command can be served, and the device then returns to listening mode. The Amazon Echo is the largest in the series at 9.25 inches in height and appears as a cylinder-shaped speaker with an array of 7 small microphones. The mid-size option is the 6.2-inch-tall Amazon Tap, which is also portable and runs on batteries; otherwise, its functions are similar to those of the Amazon Echo. The latest generation of Echo devices is the Echo Dot, which is the smallest at only 1.6 inches in height, but also shares most functions with the Amazon Echo. Both the Amazon Echo and the Echo Dot require a plug-in power supply, which is why they fit best in fixed locations, like a kitchen or living room [14].

The voice recognition of specific voice commands is handled by the Alexa voice service. A concept of the voice service in connection with an Alexa device controlling smart home devices is illustrated in figure 4.1. Sending a control command to a smart device begins with waking Alexa up by saying "Alexa". Afterwards, the specific command can be said. The sound of this voice command is then sent via the connected WIFI network to the voice processing cloud. If the cloud validates the sound as a known command, the command is sent on to a Smart Home Skill Adapter, which enables cooperation with third party providers. The command is then sent to the third party provider's cloud, which can communicate remotely with the specific smart home device [14].


Figure 4.1: A model of Alexa voice service [14].

4.1.2 IFTTT

IFTTT (If This Then That) is a web-based platform that connects different services and devices together and makes it easy to create and customize your own apps, called applets. It supports a wide variety of services such as YouTube, Google, Dropbox and Amazon Alexa, to name a few. The principle is that if a specific action occurs, then another action is performed, hence the name IFTTT. The concept can be broken down into 4 steps:

• Choose a service

• Choose a trigger (the If statement)

• Choose an action service (which service the If statement should trigger)

• Choose action fields (the That statement)

For this project, Amazon Alexa and Webhooks were used as the service and the action service respectively. For the If statement, a specific phrase said to Alexa was chosen as the trigger, and Webhooks was used to send a web request to a specific URL. The only requirement for the trigger is that the user has to say "Alexa trigger" before the actual command; for example, to turn on lamp 1 the user would utter the phrase "Alexa trigger Light 1 on". A full list of all the applets created and their functions can be seen in Table 4.1.


Trigger phrase            Function
Light 1 on                Turns on lamp 1
Light 1 off               Turns off lamp 1
Dim Light 1               Decrease the light intensity on lamp 1
Full dim light 1          Maximum light intensity on lamp 1
Light 2 on                Turns on lamp 2
Light 2 off               Turns off lamp 2
Dim Light 2               Decrease the light intensity on lamp 2
Full dim light 2          Maximum light intensity on lamp 2
Normal light              Set normal (white) light on lamp 2
Red light                 Set red light on lamp 2
Blue light                Set blue light on lamp 2
Green light               Set green light on lamp 2
Yellow light              Set yellow light on lamp 2
Lightshow                 Lamp 2 fades between red, blue, green and yellow light every two seconds
Light control on          Automatically changes the intensity of lamp 1 to a specific setpoint
Light control off         Turns light control off
Light sensor display on   Display the current light sensor value on a display
Light sensor display off  Remove the light sensor value from the display
Temperature display on    Display the temperature sensor value on a display
Temperature display off   Remove the temperature value from the display
Voice temperature         Get the current temperature value as an audio response
Setpoint 200              Set the light control value at 200 Lux
Setpoint 300              Set the light control value at 300 Lux
Setpoint 500              Set the light control value at 500 Lux
Setpoint 800              Set the light control value at 800 Lux
Setpoint 1000             Set the light control value at 1000 Lux
Setpoint 1200             Set the light control value at 1200 Lux
Setpoint 1500             Set the light control value at 1500 Lux

Table 4.1: Created applets and their functions

4.1.3 Webserver

With the help of LabVIEW, a webserver is set up to receive the commands from IFTTT. A simple HTTP request is sent from IFTTT to a specific URL, which in turn is connected to an HTTP method VI on the myRIO whose function is to update a network variable. The web service VI is in other words invoked each time an HTTP request is sent from IFTTT, and it updates a network variable. With LabVIEW it is fairly straightforward and user friendly to create a webserver with HTTP method VIs: from the project explorer, choose the target (myRIO), select New > Web Service, and then create a new VI under "Web Resources", and an HTTP method VI will be created.

Once the VI is invoked by a request, a network variable is updated. This is done by mapping the URL query string to the network variable. For each voice command (IFTTT applet) a different query string is used, and the network variable is updated according to the command the user said to Alexa.


4.1.3.1 Network variable

A shared network variable is used in this project to communicate between two different VIs and share data. The VI described in section 4.1.3 has only one function, namely to receive the command the user has said. The process of actually performing the task specified by the user is handled in a separate VI, and a shared variable is used to share the data between the VIs.

4.1.3.2 Port Forwarding

The myRIO has Wi-Fi communication and is easy to set up on the user's home network, giving it its own local IP address. But since the information from Alexa and IFTTT passes through the Internet, a way is needed to reroute information arriving at the home network router to the specific machine (the myRIO) on the network. This is why port forwarding is used: to make computers/clients on the Internet able to reach a machine behind a router. By "opening" or forwarding a specific port to a specific local IP address, all information coming to that port is rerouted to the machine in question. This is done in the settings of the router, and for LabVIEW the ports forwarded to the local address of the myRIO are 8080 and 8001, for the public and debugging servers, which are explained in more detail in section 4.1.3.3. The URL address that receives a web request from IFTTT will be on the format https://www.1234:8080/Webservice/?VIname/Command=lightshow, where "1234" is the home network router address, "8080" the opened/forwarded port, "/Webservice/?VIname" the reference to the HTTP method VI, "Command" the name of the network variable, and "lightshow" the string value the network variable will get.
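As a rough illustration of the request IFTTT issues, the sketch below builds a URL of this general shape. The address 203.0.113.7 and the path /Webservice/VIname are hypothetical placeholders, and the path layout is a simplified version of the format above:

```python
from urllib.parse import quote

def build_command_url(router_address, port, command):
    """Build an IFTTT-style command URL; '/Webservice/VIname' is a
    placeholder for the reference to the HTTP method VI."""
    return (f"http://{router_address}:{port}/Webservice/VIname"
            f"?Command={quote(command)}")

# A request to this URL reaches the myRIO only if the router forwards
# the chosen port to the myRIO's local IP address.
url = build_command_url("203.0.113.7", 8080, "lightshow")
# urllib.request.urlopen(url) would issue the request on a live network.
```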

4.1.3.3 Debug & Public server

Before publishing a web server to a target, it is tested with the help of a debug server to see whether the HTTP method VI communicates correctly with web clients. The debug server provides a sandbox-like environment, and the web server can be placed on it directly from the LabVIEW project. With the debug server, virtual tests were done to see whether the information from Alexa and IFTTT was passed to the myRIO correctly; for example, the command "turn on light 1" initially lit up a virtual LED on the LabVIEW front panel instead of an actual physical lamp. When all debugging was completed, a public server was published on the myRIO. The port for the debug server was 8001 and for the public server 8080, so both these ports were "opened".

4.1.4 Components

For this system the following hardware components were used:

• 1 Amazon Echo smart speaker (Alexa)

• 1 National Instruments myRIO

• 1 Philips Hue White smart LED lamp E27 806

• 1 Philips Hue smart LED lamp RGB E27 806

• 1 XB24CAWIT-001 XBee® through-hole module, 2.4 GHz, 6.3 mW, wire antenna (Digi)


• 1 XBee Explorer USB dongle

• 1 Luxorparts LCD display, 2x16, serial I2C

• 1 LM75A Digital Temperature Sensor

• 1 TSL2561 luminosity sensor

• 1 Pair of speakers

• 1 Circuit breadboard

• Wires

4.1.4.1 LCD Display

The LCD display was used to show the temperature values and light intensity levels when prompted by the user. The principle was that if the user issues a command to display the temperature or light intensity, the value is displayed, and if the user issues the corresponding off command (temperature/light display off), the display shows a blank space, i.e. nothing. Since the display has two lines with 16 character slots each, both the temperature and the light intensity could be displayed at the same time: the temperature on the first line and the light intensity on the second.

An Initialize VI was created to make the display operate in a specific mode. For this project the mode was chosen to be 4-bit, 1/16 duty, 5x8 font and 2 lines. The commands sent to the display over I2C to initialize this mode, and their respective functions, can be seen in Table 4.2; the block diagram is shown in Figure 4.2.

Command [Binary/Decimal]    Function

11/3, 11/3                  Reset the LCD
11/3, 10/2                  Initialize 4-bit mode
101000/40                   Set the mode to 4-bit, 1/16 duty, 5x8 font and 2 lines
1/1                         Clear the display
110/6                       Move cursor after every entry
1100/12                     Turn off cursor
11/3                        Return to home position

Table 4.2: LCD initialize commands and their function

To write commands to the LCD, a Write bytes VI was created. Writing bytes to the LCD requires a certain structure. The specific register has to be selected by making the RS signal either low/0 (instruction register) or high/1 (data register), meaning that setting the bit low sends a command and setting the bit high means the data being sent represents an ASCII character. For the initialization commands the instruction register is used. Then it has to be specified whether the mode is to read from the LCD or write to the LCD (R/W): for write mode a 0/low is selected and for read mode a 1/high. For the initialization commands only write mode is used (R/W = 0). Optionally, the backlight can also be enabled or disabled by setting its bit high (1) or low (0). Once these steps have been made, the high half of the command byte is extracted


Figure 4.2: Block diagram code of the initialization of the LCD Display

and the enable signal (E) is strobed to signal that the data is ready to be sent. Then the low half of the command byte is extracted and the enable signal is strobed again. In LabVIEW this is done by creating a Boolean array with all the settings, converting it to a number (byte), and sending that byte to the LCD slave address. For the first part of the initialization commands (the decimal command 3), the procedure would be:

1. Create a boolean array

2. Convert the command byte to a boolean array

3. Extract the high half of the command byte boolean array

4. Set index 0 to the RS value (0), index 1 to the value of R/W (0), index 2 to the value of E (0), index 3 to the value of the backlight, and indices 4 onwards to the high half of the command byte.

5. Convert the Boolean array to a byte and send it to the slave address.

6. Repeat the steps with E=1

7. Repeat the steps with the low half of the command byte

The procedure is then repeated for all the commands, and snippets of the code are shown in Figure 4.3.
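The nibble-framing steps above can be sketched in Python. This is a minimal illustration assuming the bit layout implied by the step list (bit 0 = RS, bit 1 = R/W, bit 2 = E, bit 3 = backlight, bits 4-7 = data nibble), not the actual LabVIEW code:

```python
def lcd_nibble_frames(command, backlight=True, rs=0):
    """Build the I2C bytes for one 8-bit LCD command sent as two 4-bit
    halves. Assumed bit layout: bit 0 = RS, bit 1 = R/W (0 = write),
    bit 2 = E, bit 3 = backlight, bits 4-7 = command nibble."""
    frames = []
    base = rs | (int(backlight) << 3)
    for nibble in (command >> 4, command & 0x0F):  # high half, then low half
        data = base | (nibble << 4)
        frames.append(data)            # present the nibble with E = 0
        frames.append(data | 0b0100)   # repeat with E = 1 to strobe it in
    return frames
```

For example, the function-set command 0x28 (decimal 40) expands to four I2C bytes, two per nibble, matching steps 1-7 above.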

A VI "Print text" was created to display the current temperature and light intensity values when prompted by the user. The same VI (Write bytes) as in the Initialize VI was used, but instead of a command an ASCII character was sent, i.e. RS was set to high (1). The ASCII characters were represented by a string and then converted to a byte array. For the temperature the string displayed was on the format "Temperature: [value] °C" and for the light intensity on the format "Light: [value] Lux". The block diagram code is illustrated in Figure 4.4.


Figure 4.3: Block diagram code of the first sequence window in the Write bytes VI

Figure 4.4: Block diagram code of the Print text VI

4.1.4.2 Temperature sensor

The temperature sensor (LM75A) has an inbuilt slave address and needs to be initialized. A sub-VI was created where the initialization is done using a series of I2C commands, written in the following order: 8, 3. These commands mean that the user first accesses the resolution register (decimal command "8", or "1000" in binary) and then chooses a resolution of 0.0625 °C with the binary command "11" (decimal "3"). The default value of the slave address is 24, which is the value when the three address pins are floating. The slave address can be changed by connecting one or several of the address pins to 3.3 V instead,


giving a maximum of 8 different slave addresses. This means that up to 8 temperature sensors of this type can be used in a single system.

A VI for reading the temperature values and converting them to Celsius was also made. Using LabVIEW's I2C Write/Read express VI, illustrated in Figure 4.5, the command "101" (decimal "5") was written to the temperature sensor to access the temperature registers, and a byte count of 2 was specified to read the most significant byte (MSB) and least significant byte (LSB) from the sensor. These commands were executed continuously every 100 milliseconds, i.e. the temperature value was updated every 100 milliseconds. The temperature value was read as a 16-bit binary number, where the MSB represents bits 15-8 and the LSB bits 7-0. Bits 15-13 are three boundary bits (ambient temperature (Ta) vs critical temperature (Tc), Ta vs upper limit (TUpper), Ta vs lower limit (TLower)), and bit 12 indicates whether the temperature is positive or negative; these bits are masked out when converting to Celsius. When the values have been read from the sensor, equation 4.1 and equation 4.2 were used to convert the value to Celsius.

Figure 4.5: I2C Write/Read express VI

Ta > 0 °C : Ta = MSB × 2^4 + LSB × 2^(−4) (4.1)

Ta < 0 °C : Ta = 256 − (MSB × 2^4 + LSB × 2^(−4)) (4.2)
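Equations 4.1 and 4.2 can be expressed as a short conversion routine; the masking of the flag bits follows the description above (a sketch, not the LabVIEW implementation):

```python
def raw_to_celsius(msb, lsb):
    """Convert the raw MSB/LSB temperature bytes to °C using
    equations 4.1 and 4.2. Bits 15-13 (boundary flags) are masked out;
    bit 12 indicates a negative temperature."""
    negative = bool(msb & 0x10)          # bit 12 of the 16-bit reading
    msb &= 0x0F                          # mask the flag and sign bits
    ta = msb * 2**4 + lsb * 2**-4        # equation 4.1
    return 256 - ta if negative else ta  # equation 4.2 for Ta < 0 °C
```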

4.1.4.3 Light sensor

The light sensor TSL2561 was used to measure the ambient light level in lux. The I2C protocol was used to communicate with the device, and three slave addresses could be chosen from, depending on how the Addr pin was connected (GND = 41, float = 57, Vdd = 73). The light sensor was initialized by creating an Initialize VI similar to the one for the temperature sensor. In the Initialize VI the commands "80" and "3" were written to the sensor. Since the user has to "activate" the device before reading the lux values, these commands select the control register and power-on mode.

A VI was also created to read the lux values. The sensor operates by reading the infrared values and storing them in two registers called DATALOW1 and DATAHIGH1, where DATALOW1 represents the lower byte (8-bit value) and DATAHIGH1 the upper byte. The full-spectrum values, i.e. infrared + visible light, are stored in the same way in the registers DATALOW0 and DATAHIGH0. To read these values from the sensor, the read word protocol was used, meaning that the value read from the sensor is a 16-bit value consisting of both the lower byte and the upper byte. For CH0 (DATALOW0 and DATAHIGH0) the command "172" was written to the sensor, indicating that the read word protocol is used, and 2 bytes (low and high) are read


with the Write/Read express VI. The same procedure is done for the CH1 (DATALOW1 and DATAHIGH1) bytes by writing the command "174" instead of "172". The values ch0 (total light) and ch1 (infrared light) are then obtained by shifting the upper byte into DATAHIGH, as shown in equation 4.3 and equation 4.4. The visible light is then obtained by subtracting ch1 from ch0.

ch0 = 256 × DATAHIGH0 + DATALOW0 (4.3)

ch1 = 256 × DATAHIGH1 + DATALOW1 (4.4)
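Equations 4.3 and 4.4, together with the final subtraction, amount to the following (an illustrative sketch):

```python
def read_word(low_byte, high_byte):
    """Combine the low and high data register bytes into one 16-bit
    value (equations 4.3 and 4.4)."""
    return 256 * high_byte + low_byte

def visible_light(ch0, ch1):
    """Full spectrum (ch0) minus infrared (ch1) leaves visible light."""
    return ch0 - ch1
```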

4.1.4.4 Radio module

Smart lamps like Philips Hue bulbs use the Zigbee (IEEE 802.15.4) communication protocol, which is why a radio module that can send this type of signal is needed. Philips has its own Hue bridge, which works fine together with its app, but this will not work for a customized system. The Digi XBee is then a better option for this purpose. There are several different models of this radio module, with variations in power range and sensitivity, but also in memory capacity. When using this radio module for transmitting signals to Philips Hue bulbs it is best to use the Digi XBee S2C, which has a larger memory capacity and therefore fits better for use as a coordinator.

For Zigbee signals to work properly, the radio module needs to be paired with the Philips Hue bulbs. This can be done in a separate program called X-CTU. When Philips Hue bulbs are bought together with a Philips Hue bridge they are always already paired, and since a Hue bulb cannot be paired with several radio modules at the same time, they need to be unpaired first. This can be done using the Philips Hue remote control that comes with the package: pressing both the ON and OFF buttons at the same time near a lamp for a few seconds will make the lamp flash, which indicates that the lamp is unpaired.

To make it possible to connect the radio module to the USB port on a computer, an XBee Explorer USB dongle can be used. The pairing procedure can then begin by opening X-CTU and clicking 'Discover Radio Modules'.

Figure 4.6: Illustration of the X-CTU software, where 'Discover Radio Modules' is marked.

A window will then show up where the COM port should be chosen. If it is uncertain which port the radio module is connected to, all ports can be selected. Then click Next.

Port parameters like data bits and baud rate should then be chosen, but these can be left at their defaults. Then click Finish.


Figure 4.7: X-CTU discover radio devices menu. The settings should be left at their defaults.

X-CTU will then search for all available radio modules through the COM ports chosen earlier.

The result will be displayed. To go further, click 'Select all' and then 'Add selected devices'.

Now flash the Zigbee coordinator API firmware, while also writing the following parameters:

• ZS=2, Zigbee-PRO Stack Profile

• EE=1, Enable Encryption

• EO=1, Encryption Options

• AP=1, Enable API

• A=1, Explicit API Output

After this, the Philips Hue light bulb should be turned on by connecting it to power. Choose the Network working mode in X-CTU and hit Scan. After some seconds the light bulb will show up. It can be identified by an R, which stands for router (while C stands for coordinator). It is now also possible to see the 64-bit address. This address is the one to use when sending commands to the lamp. The frames used to send the commands are explained in section 3.2.3.3.


Figure 4.8: X-CTU searching for radio modules.

Figure 4.9: Discovered radio modules in X-CTU.


4.1.4.5 Philips hue bulbs

Philips Hue bulbs are smart LED lamps found in a variety of different appearances. Some are just white, others can change color, and a few are designed for outdoor use. What they all have in common is that they can be controlled wirelessly by voice, for example with Amazon Alexa or Google Home. This can either be done with the Philips Hue bridge, which often comes in a package together with the bulbs, or with some other radio module that can handle the Zigbee lighting protocols for communication.

4.1.4.6 Computer speakers

Common computer speakers can be connected directly to the myRIO through the analog output. A drawback with RT targets like the myRIO is, however, that prerecorded audio files cannot be saved directly on the hard drive, which is why these need to be converted to, and saved as, arrays inside the LabVIEW program.

When several short feedback messages are to be recorded, an easy way to do this is to use a text-to-speech program. There are numerous free web pages that can handle this, with varying sound quality and with options to choose different languages and dialects as well as female and male voices. Many of them, however, charge a fee to download the audio files, but SoundofText.com is a good option with both good sound quality and the possibility to download the audio files for free. The audio files are downloaded in .mp3 format and need to be converted to .wav format before use in LabVIEW.

The conversion to arrays can then be done, as in Figure 4.10, on the host computer with a Sound File Read VI, where the Y value of the waveform component can be extracted and saved as a constant.

Figure 4.10: Conversion of a .wav file to an array in LabVIEW.

4.1.5 System setup

The system is set up so that the myRIO can receive voice commands from Alexa, and it uses both parallel processes and queues, explained in more detail in sections 4.1.5.1-4.1.5.2, to communicate between the devices listed in section 4.1.4. The communication between the devices is illustrated in Figure 4.11.


[Figure: block diagram showing the myRIO connected to Amazon Alexa, the LCD, the temperature sensor, the speakers, the radio module and the light sensor]

Figure 4.11: An overview of the voice controlled system with Alexa.

4.1.5.1 Parallel processes

The system has multiple functions that are independent of each other and have to be able to run in parallel. For example, if the command "Lightshow" (a function for light 2) has been given, that function has to continue to run even if a command for light 1 is given afterwards. This is achieved by grouping several functions into different processes and running them simultaneously. For this system a total of four processes were used, named Process 1, Process 2, Process 3 and Process 4.

Process 1 is used to send the command variable to Process 2. To avoid sending the command over and over, the process checks whether the value of the command is the same as in the last iteration: if not, it puts the command in the State queue; otherwise it does nothing. This is shown in Figure 4.12. The second process, Process 2, extracts the value from the State queue and decodes what type of command has been said. If it is a non-continuous command, the corresponding command string is sent to the radio module; if it is a continuous command, or a command that uses the sensors or speakers, the process puts the command variable value in the corresponding queue. Process 3 is entirely devoted to the command "Lightshow", while Process 4 handles commands that use the temperature or light sensor, the display or the speakers. Flowcharts for the whole system as well as the individual processes are shown in Figures A.1-A.5, and which process handles which command can be seen in Table 4.3.

Figure 4.12: Block diagram of process 1
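The change-detection logic of Process 1 can be sketched as follows; the queue and function names are illustrative only, not taken from the LabVIEW code:

```python
import queue

state_q = queue.Queue()  # stands in for the "State" queue shared between processes

def process1(command, last_command):
    """Process 1: forward the command variable to Process 2, but only
    when it differs from the previous iteration (the role of the
    LabVIEW shift register)."""
    if command != last_command:
        state_q.put(command)
    return command  # fed back as last_command in the next iteration

last = None
for cmd in ["Light 1 on", "Light 1 on", "Lightshow"]:
    last = process1(cmd, last)
```

Running the loop enqueues "Light 1 on" once and "Lightshow" once; the repeated command is filtered out.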


Process 2: Turn on light 1, Turn off light 1, Full dim light 1, Full dim light 2, Turn on light 2, Turn off light 2, Dim light 1, Dim light 2, Normal light, Red light, Blue light, Green light, Yellow light

Process 3: Lightshow

Process 4: Light control on, Light control off, Light sensor display on, Light sensor display off, Temperature display on, Temperature display off, Voice temperature, Setpoint 200, Setpoint 300, Setpoint 500, Setpoint 800, Setpoint 1000, Setpoint 1200, Setpoint 1500

Table 4.3: Processes and the commands each one handles

4.1.5.2 Queue system

A queue system is used to communicate between the different processes, for example to send the command variable value from Process 1 to Process 2. A queue is a FIFO (first in, first out) data structure, meaning the first element added to the queue will be the first one removed. In other words, when an element is enqueued (added) it is put at the back of the queue, and when the queue is dequeued (an element is removed) it is the first element in the queue that is removed, which for this system ensures that the first command is handled before any other commands.

For this system multiple queues were used. The "State" queue contains the string values from the command variable; it is enqueued at all processes when the state is updated, i.e. when the command is changed, and dequeued at Process 2. Another queue that is used is the "Lightshow" queue, which is the queue for Process 3, i.e. the function "Lightshow"; it is enqueued at Process 2 and dequeued at Process 3. Both the temperature sensor and the light sensor have their own queues, and a queue for light control mode is also used. All queues, and where each one is enqueued and dequeued, can be seen in Table 4.4.

Queue name      Enqueued at        Dequeued at   Contains commands for component

State           Process 1, 3 & 4   Process 2     All components
LightShow       Process 2 & 3      Process 3     Light 2
LightControl    Process 2 & 4      Process 4     Light 1 & light sensor
Temperature     Process 2 & 4      Process 4     Temperature sensor, speakers & display
Light sensor    Process 2 & 4      Process 4     Light sensor & display

Table 4.4: Queues, where they are enqueued and dequeued, and which components they affect


4.1.6 Command & functions

A total of 28 voice commands were made; they can be seen in Table 4.1. All of these commands follow the same flowchart, seen in Figure A.5, which guides the command to the correct process and queue. Which process and queue handles each command in Table 4.1 is described in section 4.1.5, and this section describes how each command is handled.

4.1.6.1 On/Off & Dim Lights

For the commands "Turn on/off light 1/2", "Dim light 1/2" and "Full dim light 1/2", Process 2 is used, as described in section 4.1.5, and the code is shown in Figure 4.13. A simple case structure is used, and each case contains a cluster of strings. These clusters contain the hex code that the radio module sends to the Philips Hue lamp via the Zigbee protocol. To write these commands to the radio module, LabVIEW VISA VIs are used. These VIs make it possible to open a serial communication channel and write/transmit information to the device connected to the myRIO hardware. The theory behind serial communication is described in section 3.2.5.

Figure 4.13: Block diagram of process 2

4.1.6.2 Temperature display on/off

For these commands, the VIs used are Temperature Initialize, Temperature Read, LCD Initialize and LCD Write, which are described in more detail in sections 4.1.4.1, 4.1.4.2 and 4.1.4.3. As explained in section 4.1.5.2 and Table 4.4, the Temperature queue is used and is dequeued at Process 4. If the dequeued element is "Temperature display on", the Temperature Read VI is used and the value from that VI is sent as a string to the LCD Write VI; the queue is also enqueued with the string element "Temperature on" to make sure that the temperature value continues to be displayed on the LCD. If, however, the dequeued element is "Temperature off", the queue is flushed (all elements in the queue are deleted), an empty string is sent to the LCD Write VI, and the queue is enqueued with the command variable value so that an empty string continues to be sent until the command "Temperature on" is issued.


4.1.6.3 Light sensor display on/off

Since these commands need the information from the light sensor, they use the Light sensor queue and are dequeued at Process 4. The same principle as for the temperature display commands is used, except that the light sensor is always collecting the lux value; depending on which element is dequeued, either an empty string (if the dequeued command is "Light sensor display off") or the lux value (if the command is "Light sensor display on") is sent to the LCD VI. The light sensor queue is also updated with either "Light sensor display on" or "Light sensor display off" to continuously display the lux value or an empty string on the display, depending on the command.

4.1.6.4 Speaker feedback

The command "Voice temperature" is used when the user wants an audio response with the current temperature. The same queue as for the temperature display is used, since the VI that reads the temperature has to be accessed. It then uses the speaker VI described in section 4.1.4.6 to play a prerecorded audio file through a pair of speakers.

4.1.6.5 Light Control

The commands "Light control on/off" use the LightControl queue and Process 4, as seen in Table 4.3 and Table 4.4. Once the element is dequeued from the queue, it goes into a case structure which flushes the queue and enqueues the command variable value if the dequeued element is "Light control off". For the on command, a PID regulator is used to determine which light intensity (dim level) is needed to reach the user-specified setpoint. The PID regulator has four inputs and one output. The input variables are: the output range, which goes from 0 to 255, where 0 represents a brightness of 0% and 255 a brightness of 100%; the network variable "Setpoint", which is the lux value the user has specified, i.e. the setpoint; the process value, which is the current lux value read from the light sensor VI; and the PID gains (proportional (P), integral (I) and derivative (D) gain constants).

The PID gains were tuned experimentally until a satisfying overshoot and response time were obtained. In this case the values were P = 0.1, I = 0.01 and D = 0. The setpoint variable is updated as described in section 4.1.3.1. A shift register is used to keep track of the lux value from the light sensor for every iteration, i.e. how the process value has changed and how close the current lux value is to the setpoint. The output represents the brightness value (0-255) that the PID regulator has determined the light should have for the setpoint to be reached, and it is fed into the Dim level control VI, which is explained in section 4.1.6.6.
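A textbook PID iteration with these gains might be sketched as below. This illustrates the regulator's role, not the LabVIEW PID VI itself; the state tuple plays the part of the shift register:

```python
def pid_step(setpoint, process_value, state, kp=0.1, ki=0.01, kd=0.0,
             out_min=0, out_max=255):
    """One iteration of a basic PID controller with the experimentally
    tuned gains from the text (P=0.1, I=0.01, D=0). `state` carries the
    integral sum and the previous error between iterations."""
    integral, prev_error = state
    error = setpoint - process_value
    integral += error
    output = kp * error + ki * integral + kd * (error - prev_error)
    output = max(out_min, min(out_max, output))  # clamp to the 0-255 dim range
    return output, (integral, error)
```

Calling `pid_step` repeatedly with fresh lux readings drives the brightness toward the setpoint, mirroring the loop in Process 4.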

The process handles the commands "Light control on/off" in the same manner as for the temperature sensor in section 4.1.6.2, i.e. if the command is "Light control off" the queue is flushed and the current command variable value is enqueued in the LightControl queue. If the command is "Light control on", a case structure is used to check the value of the command variable: if the variable is a command with a function for light 2, the light control process should continue, so in that case the LightControl queue is updated by enqueuing the string element "LightControl on" to continue the function. On the other hand, if the command variable is something with a function for light 1, i.e. the function for light 1 should change from light control to something else, the LightControl queue is flushed and then enqueued with the command variable element; the


State queue is also enqueued with the command variable element to ensure that the LightControl function is stopped. This is illustrated in Figure 4.14.

Figure 4.14: Block diagram of the Lightcontrol part in process 4

4.1.6.6 Dim level control

The value from the regulator needs to be converted to the correct hex code command that the radio module writes to the Hue lamp, but since the value from the regulator is an integer, a way to convert it to hex is needed. Since the value is dynamic, the checksum will not be a constant value; it needs to be calculated to obtain the correct string command that the radio module writes to the Hue light, and the checksum value is obtained with equation 3.13. The packet, or Zigbee command, sent to the light has some constant values: for example, the address of the light does not change and the cluster ID stays the same. Only the data payload and the checksum change when the dim level is changed. The data payload for light intensity is on the format "01 00 04 XX 10 00 10", where XX represents the brightness (0-255), i.e. in this case the output from the regulator. Type cast is used to convert the integer from the regulator to hex, and then "01 00 04", the converted value, and "10 00 10" are concatenated to form the correct data payload string. The same is done for the checksum command string, i.e. after the value is calculated it is type cast to hex. The block diagram is shown in Figure 4.15.
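The payload construction, together with a checksum of the common XBee API form (assumed here to be what equation 3.13 expresses: 0xFF minus the low byte of the byte sum), can be sketched as:

```python
def dim_payload(brightness):
    """Build the variable data payload '01 00 04 XX 10 00 10' with the
    regulator output inserted as a two-digit hex value."""
    assert 0 <= brightness <= 255
    return f"01 00 04 {brightness:02X} 10 00 10"

def xbee_checksum(frame_bytes):
    """XBee API-style checksum: 0xFF minus the low byte of the sum of
    the frame-specific bytes (assumed reading of equation 3.13)."""
    return 0xFF - (sum(frame_bytes) & 0xFF)
```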

4.1.6.7 Colour

The colour commands follow the same procedure as the on/off & dim commands in section 4.1.6.1: the State queue is dequeued in Process 2 and, depending on which colour is requested, a specific command string/frame is written to the radio module via LabVIEW VISA VIs and sent to light 2. The colour and saturation scales are shown in Figure 4.16, and the colour and saturation levels chosen for this system can be seen in Table 4.5.

4.1.6.8 Light show

The command "Light show" represents a function where light 2 fades between the colours red, blue, green and yellow every two seconds. The queue that is used is Lightshow and the process is


Figure 4.15: Block diagram of the Dim level control VI

Color     Color scale value [Hex/decimal]   Saturation scale value [Hex/decimal]

Red       FE/254                            FE/254
Blue      A9/169                            FE/254
Green     54/84                             FE/254
Yellow    2A/42                             FE/254
Normal    20/32                             87/135

Table 4.5: Color and saturation values for the different colors used


Figure 4.16: Color and saturation scale in decimal values (0-255)


Process 3. Once the command is dequeued from the queue, a case selector checks whether the command is light show or something else (in which case it does nothing). When the command light show is selected, a shift register is used to count from 0 to 3 (incrementing every iteration and resetting to 0 every fourth iteration), and the element at the index given by the current counter value is extracted from a premade colour array containing the command string for the radio module for each of the colours (red, blue, green and yellow); the VISA Write VI is then used to write the packet to the radio module. The block diagram of the process is shown in Figure 4.17.

Figure 4.17: Process 3 Block diagram with the code for the command Lightshow
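The counter-and-index scheme can be sketched as follows; the colour strings stand in for the real Zigbee command frames:

```python
# Placeholder frames; the real entries are full Zigbee command strings.
COLOR_FRAMES = ["red", "blue", "green", "yellow"]

def lightshow_frames(iterations):
    """Reproduce the shift-register counter: step through indices 0-3,
    wrapping back to 0 every fourth iteration."""
    frames = []
    idx = 0
    for _ in range(iterations):
        frames.append(COLOR_FRAMES[idx])
        idx = (idx + 1) % 4   # increment; reset to 0 every fourth step
    return frames
```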

4.2 Customized voice controlled system directly in LabVIEW

A customized voice controlled system was made using LabVIEW and NI myRIO. The program uses the FPGA to record audio samples and MFCC to process the samples. Finally, DTW is used to match the samples against a predefined dictionary, and a command is performed. The program can handle five commands: a wake word, "Bosse", and four colour commands sent by a radio module via the Zigbee protocol to Philips Hue colour bulbs. The processes are described below.

4.2.1 Components and setup

For this system the following hardware components were used:

• 1 National Instruments myRIO embedded device
• 1 Philips Hue smart LED lamp RGB E27 806
• 1 Digi XBee® 2.4 GHz 6.3 mW through-hole module with wire antenna (XB24CAWIT-001)
• 1 XBee Explorer USB dongle
• 1 Trust All-round USB microphone
• 1 breadboard
• 1 red LED
• 4 green LEDs
• 4 yellow LEDs
• Wires


4.2.2 FPGA configuration

When configuring the FPGA in LabVIEW, the first step is to open a reference that specifies the FPGA VI where all controls for the components exist. It is then possible to change the values of the controls needed in the application and set up references to them. In this project a reference is set up to the four built-in LEDs on the myRIO, which indicate when the wake word is registered. A reference is also set up to nine digital outputs on the myRIO. These are connected to LEDs on the breadboard, which indicate the time period during which a word can be said. Finally, a reference is set up to the audio input in stereo mode (left/right) for the audio recording. The code for this part is shown in Figure 4.18.

Figure 4.18: FPGA configuration VI with controls and inputs for the LEDs and audio.

4.2.3 Wake-word and LED lights

All modern speech recognition systems have a wake word that needs to be said before a command, which is why a wake word has been implemented in this system as well. The wake word should be short and consist of two to three syllables; the wake word chosen for this system is "Bosse".

The wake word was implemented by first recording a sample, processing it and detecting which word was said. A shift register then looks at the word said in the last iteration, and if this word was "Bosse", the program enters a case where all other words can be detected as well. The word "Bosse" can be said multiple times in a row, always allowing other words to be said in the next iteration. Every time the system detects the wake word, four LEDs on the myRIO light up, indicating that it is listening for other commands too. This is done by sending a true value to the control "LED:s On" in the FPGA reference VI in a separate sub-VI. The input is simply a TRUE/FALSE value, checking whether the word "Bosse" was said in either of the two cases.
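The wake-word gating can be sketched as a small Python state machine (a stand-in for the LabVIEW shift register and case structure; the word labels are the dictionary entries from this chapter):

```python
# Sketch of the wake-word logic: the word from the previous iteration
# decides whether the current word is accepted as a command.

WAKE_WORD = "Bosse"
COMMANDS = {"Red light", "Blue light", "Green light", "Yellow light"}

def step(previous_word, current_word):
    """One recognition iteration; returns (accepted_command_or_None, new_previous_word)."""
    if previous_word == WAKE_WORD:
        # Awake: commands are executed, and repeating the wake word
        # keeps the system awake for the next iteration as well.
        if current_word in COMMANDS:
            return current_word, current_word
        return None, current_word
    # Not awake: nothing is executed (only the myRIO LEDs react to "Bosse").
    return None, current_word

prev = None
ignored, prev = step(prev, "Red light")   # rejected: wake word not said yet
woke, prev = step(prev, "Bosse")          # wakes the system
accepted, prev = step(prev, "Red light")  # accepted as a command
print(ignored, accepted)  # None Red light
```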

To indicate when samples are being recorded, nine LEDs on the breadboard are connected to digital outputs on the myRIO. The LEDs are driven in a flat sequence where each LED lights up 0.5 seconds after the previous one, until every LED is on after 5 seconds. Then all LEDs are turned off at the same time, the first LED starts lighting up again, and so on, indicating that a new sample is being recorded.

4.2.4 Audio configuration and input

A VI is made that specifies how the sound from the microphone should be obtained and configured. In this VI, two user-configurable controls decide the sample rate and the read rate. The sample rate and read rate for this system and all tests are 11025 Hz and 1 Hz respectively, which is in the interval of appropriate sample rates for human speech according to Section 3.1.3.3. By multiplying the inverse of the sample rate with the internal clock rate of the FPGA (40 MHz), the number of ticks between samples is obtained; this value is written to the Count control in the FPGA reference VI by a Read/Write control. By multiplying the audio sample rate with the inverse of the read rate, the total number of samples for one channel (left/right) during 1 second is obtained. This amount is doubled to cover both the left and the right audio input channel, and then multiplied by the number of seconds the system should record; in this application the system samples audio for 5 seconds before it starts over again. With a sample rate of 11025 Hz, a read rate of 1 Hz and a sample period of 5 seconds, the total buffer, or maximum number of samples, is 110250 (55125 for the left audio input and 55125 for the right). This is set as the requested depth of the Audio IN FIFO queue, as seen in Figure 4.19.
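The arithmetic above can be checked with a few lines of Python; the constants are the ones given in the text, and the Count and FIFO-depth names follow the FPGA reference VI described here:

```python
# Reproducing the tick and buffer-size arithmetic from this section.

fpga_clock_hz = 40_000_000   # internal FPGA clock rate
sample_rate_hz = 11025       # audio sample rate
read_rate_hz = 1             # read rate
record_seconds = 5           # length of one recording
channels = 2                 # left and right audio input

# FPGA clock multiplied by the inverse sample rate gives ticks per sample,
# the value written to the Count control.
ticks_per_sample = round(fpga_clock_hz / sample_rate_hz)

# Samples per channel per second, doubled for stereo, times the recording
# time: the requested depth of the Audio IN FIFO queue.
fifo_depth = int(sample_rate_hz / read_rate_hz * channels * record_seconds)

print(ticks_per_sample)  # 3628
print(fifo_depth)        # 110250 (55125 per channel)
```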

To read the values from the audio input, a VI was created; the code is shown in Figure 4.20. The VI checks whether the total number of elements in the Audio IN FIFO queue is even, and corrects it if it is odd, so that the left and right audio inputs have the same size. The data in the Audio IN queue is then sorted into two arrays, "Left Audio In" and "Right Audio In". In this application the left audio array is used.

Figure 4.19: LabVIEW code for audio configuration

Page 67: Application of LabVIEW and myRIO to voice controlled home ...1301398/FULLTEXT01.pdf · service known as Alexa (developed by Amazon), ... The other system is more focusing on myRIO

Application of LabVIEW and myRIO for voice controlled home automation 53

Figure 4.20: LabVIEW code for retrieving elements from Audio IN FIFO queue
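The even-count correction and the left/right split of Figure 4.20 can be sketched in Python as follows (a stand-in for the LabVIEW code; the interleaving order L, R, L, R is an assumption):

```python
# Trim the FIFO contents to an even element count, then deinterleave the
# stereo samples into Left Audio In and Right Audio In arrays.

def split_stereo(fifo_elements):
    n = len(fifo_elements)
    if n % 2 == 1:               # uneven count: drop the last element
        n -= 1
    samples = fifo_elements[:n]
    return samples[0::2], samples[1::2]   # (left, right)

left, right = split_stereo([10, 20, 11, 21, 12])  # odd count: trailing 12 dropped
print(left, right)  # [10, 11] [20, 21]
```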

4.2.5 Decoding the audio signal

4.2.5.1 Frames

The data from the microphone is passed through a FIR filter with FIR coefficients 1 and 0.95 according to Equation 3.1. This data is then passed through a VI designed to divide the signal into frames of 20 milliseconds; the block diagram of the VI is shown in Figure 4.21. The frames overlap for continuity. The length of a frame is calculated by dividing the chosen frame time (20 ms in this case) by the period time of the input signal from the microphone. The total number of frames is calculated by, essentially, dividing the duration of the input signal by the frame time (20 ms); this is the number of frames without overlap. Since the frame step is 0.75 times the frame length (25 % overlap), the number of frames without overlap is multiplied by 4/3. In the for loop, the frame length multiplied by 0.75 gives the step at which frames are extracted from the microphone data array. In this project the duration of the input signal is 5 s and the sample rate is 11025 Hz, i.e. the period time is about 91 µs, which gives a frame length of 220 samples and 333 frames. The first 220 elements in the array are sent to the Hamming window, then elements 165-384, and so on. The Hamming window and its equation are described in Equation 3.2.

Figure 4.21: LabVIEW code for setting up frame length
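A minimal Python sketch of the framing step, using the numbers from the text (20 ms frames, a step of 0.75 times the frame length, Hamming window); it is a simplified stand-in for the LabVIEW VI:

```python
import math

def frame_signal(signal, sample_rate_hz, frame_ms=20, step_fraction=0.75):
    """Split a signal into overlapping Hamming-windowed frames."""
    frame_len = int(frame_ms / 1000 * sample_rate_hz)   # 220 samples at 11025 Hz
    step = int(step_fraction * frame_len)               # 165 samples
    # Hamming window, w[n] = 0.54 - 0.46 cos(2*pi*n/(N-1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append([s * w for s, w in zip(signal[start:start + frame_len], window)])
        start += step
    return frames

# 5 s of audio at 11025 Hz (55125 samples) gives 333 frames of 220 samples,
# matching the figures in the text.
frames = frame_signal([0.0] * 55125, 11025)
print(len(frames), len(frames[0]))  # 333 220
```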

4.2.5.2 Threshold

First, the signal energy of the noise is calculated from the 15 windowed frames with the lowest energy. The mean signal energy of these frames is then multiplied by 10, which gives the threshold value. The signal energy of each windowed frame is compared to this threshold; if it exceeds the threshold value, the output for the frame is set to 1, otherwise to 0.
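As a sketch, the threshold computation reads (plain-Python stand-in; the per-frame energies are assumed to be precomputed from the windowed frames):

```python
# Noise floor = mean energy of the 15 lowest-energy frames; the threshold
# is 10 times that value, and each frame is flagged 1 (speech) or 0.

def voiced_flags(frame_energies, noise_frames=15, factor=10):
    noise_floor = sum(sorted(frame_energies)[:noise_frames]) / noise_frames
    threshold = factor * noise_floor
    return [1 if e > threshold else 0 for e in frame_energies]

# 15 quiet frames of energy 1.0 set the threshold to 10, so only the
# frames with energies 12 and 30 are flagged.
energies = [1.0] * 15 + [5.0, 12.0, 30.0, 8.0]
flags = voiced_flags(energies)
print(flags[15:])  # [0, 1, 1, 0]
```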

Page 68: Application of LabVIEW and myRIO to voice controlled home ...1301398/FULLTEXT01.pdf · service known as Alexa (developed by Amazon), ... The other system is more focusing on myRIO

Application of LabVIEW and myRIO for voice controlled home automation 54

4.2.5.3 Start- and end time

Some external noise will still exceed the threshold value, which is why a median filter is used to eliminate it. The median filter looks at the three values to the left and right of a spike; if these are 0, the median is 0 and the spike is replaced by a 0. To detect the start value, a peak detector is used. The peak detector has a threshold value of 0.5, which means it will detect all boolean values of 1 and output their locations. Since the peak detector uses a quadratic fit to find the peaks, the locations are not integers, so rounding toward minus infinity is used to truncate the values. The index values are then converted to the corresponding times to obtain the specific start and end times. The code for this part is shown in Figure 4.22. The end time is, however, not very accurate, which is why a time limit is used instead. This time limit is chosen as 0.6 seconds, which corresponds to 34 rows beginning at the start time.

Figure 4.22: LabVIEW code detecting the start and end time of the utterance
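The spike removal and start-time conversion can be sketched as follows (a simplified stand-in: a median filter with a window of three values on each side, then the index of the first surviving 1 converted to seconds via the frame step):

```python
# Median-filter the boolean frame flags, then convert the index of the
# first remaining 1 to a start time in seconds.

def median_filter(flags, half_window=3):
    out = []
    for i in range(len(flags)):
        window = sorted(flags[max(0, i - half_window):i + half_window + 1])
        out.append(window[len(window) // 2])
    return out

def start_time_seconds(flags, frame_step_samples=165, sample_rate_hz=11025):
    cleaned = median_filter(flags)
    return cleaned.index(1) * frame_step_samples / sample_rate_hz

# An isolated spike at frame 2 is removed; the utterance from frame 10 on survives.
flags = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
t = start_time_seconds(flags)
print(round(t, 3))  # 0.15
```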

4.2.5.4 Feature extraction

The feature extraction begins with the computation of an FFT on all the frames using an FFT VI, where the number of spectral points is chosen as 1024, giving a 512-point spectrum. The energy of the FFT values is then extracted as the squared value. In parallel, the Mel filter is calculated. The number of filters should be chosen somewhere between 20 and 40, depending on the resolution wanted; in this project 36 filters are used, which gives the total number of centre frequencies. The maximum and minimum frequencies are converted to the Mel scale according to Equation 3.5. The centre frequencies are then linearly spaced between the minimum and maximum frequencies on the Mel scale and converted back to Hz. With the centre frequencies known, the triangular patterns are calculated with normalized amplitude using a Triangle pattern VI. This VI specifies the asymmetric character of the pattern and returns an array that contains the pattern with samples representing the Mel filter, i.e. the Mel filter bank; the banks are shown in Figure 4.24 and the LabVIEW code for this part in Figure 4.23. Each spectrum is weighted by multiplying it with every bank in the Mel scale. For each bank, all elements are summed, the natural logarithm of the sum is taken, and the DCT of the result is computed, which generates the MFCCs (code in Figure 4.25). Only twelve coefficients are saved, numbers 2-13, since the others represent fast-changing filter bank content and do not contribute to automatic speech recognition. For more accuracy and increased performance, the delta and delta-delta coefficients (code in Figures 4.26 and 4.27) are added to the vector, as well as the log energy of the frame, which increases the number of dimensions in the acoustic vector to 35 (1 log energy of the frame, 12 MFCCs, 12 delta coefficients and 10 delta-delta coefficients). The vector is normalized by subtracting the mean value.

Figure 4.23: Calculation of the Mel filter.

Figure 4.24: The Mel Filter Bank used in the project

Figure 4.25: Calculation of Mel frequency cepstral coefficients.


Figure 4.26: Calculation of delta coefficients.

Figure 4.27: Calculation of delta-delta coefficients.
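A condensed Python sketch of the pipeline above (mel conversion, triangular filter bank, log filter-bank energies, DCT). It is a simplified stand-in for the LabVIEW VIs, not the exact implementation, and the flat dummy spectrum is only there to exercise the code:

```python
import math

def hz_to_mel(f):
    return 2595 * math.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(num_filters=36, fft_points=1024, sample_rate=11025):
    """Triangular filters on mel-spaced centre frequencies (0 .. Nyquist)."""
    m_max = hz_to_mel(sample_rate / 2)
    mels = [i * m_max / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(fft_points * mel_to_hz(m) / sample_rate) for m in mels]
    banks = []
    for j in range(1, num_filters + 1):
        bank = [0.0] * (fft_points // 2)
        for k in range(bins[j - 1], bins[j]):      # rising edge
            bank[k] = (k - bins[j - 1]) / max(1, bins[j] - bins[j - 1])
        for k in range(bins[j], bins[j + 1]):      # falling edge
            bank[k] = (bins[j + 1] - k) / max(1, bins[j + 1] - bins[j])
        banks.append(bank)
    return banks

def mfcc(power_spectrum, banks, num_coeffs=12):
    """Log filter-bank energies followed by a DCT; keep coefficients 2-13."""
    log_e = [math.log(max(1e-10, sum(p * b for p, b in zip(power_spectrum, bank))))
             for bank in banks]
    n = len(log_e)
    coeffs = [sum(e * math.cos(math.pi * c * (i + 0.5) / n)
                  for i, e in enumerate(log_e)) for c in range(n)]
    return coeffs[1:1 + num_coeffs]                # drop the first coefficient

banks = mel_filterbank()            # 36 banks over a 512-point spectrum
features = mfcc([1.0] * 512, banks)
print(len(banks), len(features))  # 36 12
```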

4.2.5.5 Dictionary

A dictionary is created where each word is saved six times. The words in the dictionary are "Bosse", "Red light", "Blue light", "Green light" and "Yellow light". The dictionary is built by running the program, saying the word "Bosse" and saving the extracted acoustic vector from the feature extraction VI. This is repeated six times, i.e. six vectors for the word "Bosse" are created, and the same is done for all the other words, creating six different sets of five words. In this system the dictionary only consists of recordings made by Daniel. When the user runs the program, the acoustic vector is saved as "Match input" and compared to the sets of vectors in the dictionary: the Match input is tested against the first set and the word with the least distance is extracted, the same is done for all the other sets, and the word extracted most often is regarded as the best match. The distance is calculated by DTW as described in Section 3.1.3.4 and is illustrated with the help of a MathScript in Figure 4.29, where Match input is the test sequence and the set of words is the reference sequence. The distance for every word is put in an array, and the index of the minimum value for each set of words is saved in an integer array, as illustrated in Figure 4.28. Three conditions are then required to find the best match. First, the minimum distance value for the best matching word should be below a threshold value. Second, the minimum distance values of the other words should not be too close to that of the best matching word, with the difference set to at least 55. Third, the DTW should have produced the minimum distance value for the same word at least three times. If any of these requirements is not fulfilled, a default value is sent out. Each word is represented by an index, where 0 = Bosse, 1 = Red light, 2 = Green light, 3 = Blue light, 4 = Yellow light and 5 = default. The best matching index then enters a case where the command functions for these words are written. The block diagram for this process and these conditions is shown in Figure 4.30.
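The matching logic can be sketched in Python as follows. The DTW here is the standard dynamic-programming recurrence on scalar sequences for illustration (the thesis applies it to 35-dimensional acoustic vectors via a MathScript), while the thresholds (280 distance ceiling, a margin of 55, confidence of at least 3) are the ones given in the text:

```python
# DTW distance plus the three best-match conditions from this section.

def dtw_distance(test, ref):
    """Classic DTW cost between two scalar sequences."""
    n, m = len(test), len(ref)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(test[i - 1] - ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def best_match(word_indices, distances, max_distance=280, margin=55,
               min_confidence=3, default=5):
    """word_indices/distances hold the winning index and its distance from
    each of the six dictionary sets; returns the matched index or default."""
    best = max(set(word_indices), key=word_indices.count)
    best_dists = [d for i, d in zip(word_indices, distances) if i == best]
    others = [d for i, d in zip(word_indices, distances) if i != best]
    minimum = min(best_dists)
    if minimum > max_distance:                      # condition 1
        return default
    if others and min(others) < minimum + margin:   # condition 2
        return default
    if word_indices.count(best) < min_confidence:   # condition 3
        return default
    return best

# All six sets agree on index 3 ("Blue light") with distances under 280.
result = best_match([3, 3, 3, 3, 3, 3], [196, 210, 230, 205, 240, 250])
print(result)  # 3
```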


Figure 4.28: Block diagram of the code where the sets of words are tested against the uttered word in Match input, creating the Distance matrix and the Word index array

Figure 4.29: Calculation of the cost matrix used in DTW


Figure 4.30: Block diagram of the code for the requirements of the best match


Chapter 5

Results & Discussions

A picture presenting both systems integrated together is shown in Figure 5.1. Both systems can be regarded as successfully implemented. The first system can handle different communication protocols such as WiFi, I2C, Zigbee and UART, and controls several features by voice with a success rate of nearly 100 %. Alexa recognizes commands correctly, although it sometimes responds a little slowly; the only failures have been due to WiFi connection problems on the personal network. All commands, such as turning the lamps on and off, changing colors, dimming the light, voice feedback of the temperature and information on the LCD, perform as expected.

Figure 5.1: An overview of the system. 1. The radio module, which transmits Zigbee signals to the smart lamps. 2. LED diodes, which indicate when a word can be said to the customized system. 3. The LCD screen, which can display temperature and lux values on command in the system based on Amazon Alexa. 4. A light sensor used for the light level display on the LCD screen as well as in the light control system. 5. A temperature sensor used for the temperature display on the LCD screen and for spoken temperature feedback from a speaker.

The light control feature in the first system is, however, somewhat sensitive and hard to tune due to the large variations in light intensity from external sources. Two examples presented in Figures 5.2 and 5.3 show the same tuning under different light conditions.


Figure 5.2: A chart of the PID regulator in the light control system, which changes the light intensity between different lux values in the range 200-800 lux. The Y-axis represents the lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked.

Figure 5.3: A chart of the PID regulator in the light control system, which changes the light intensity between different lux values in the range 500-1500 lux. The Y-axis represents the lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked.

The customized system works well if the user has recorded the dictionary, but fails more often if another person has recorded it instead. A test of the system was performed by both Daniel and Tim, and the success rates are presented in Tables 5.1 and 5.2.


Command      | Success rate [%] | Sample size
Bosse        | 98               | 50
Red light    | 98               | 50
Blue light   | 92               | 50
Green light  | 96               | 50
Yellow light | 96               | 50

Table 5.1: Success rate for the customized system when Daniel is speaking

Command      | Success rate [%] | Sample size
Bosse        | 90               | 50
Red light    | 86               | 50
Blue light   | 84               | 50
Green light  | 66               | 50
Yellow light | 38               | 50

Table 5.2: Success rate for the customized system when Tim is speaking

A recorded audio sequence of the command "Blue Light" is shown in Figure 5.4, where the upper graph represents the raw audio input recorded over five seconds, the middle graph the audio processed through the FIR filter, and the lower graph the 0.51 second long utterance extracted from the filtered audio input.

Figure 5.4: Unfiltered and filtered audio input signals and the utterance sequence of the command "Blue Light"


The same utterance as in Figure 5.4 is analyzed in Figure 5.5, which shows the matching results compared to the prerecorded dictionary. In the distance matrix it can be seen that the least distance in each row is at the third element (counting from zero), which is also presented in "Word index". These values are all under the threshold value of 280 (requirement one fulfilled). "Mean index" indicates that no other word is close enough, i.e. no other value is below the "Minimum" value 196 + 55 (requirement two fulfilled). From "Word index" it can be seen that all six sets had the least distance at index three, giving a confidence of six, which is above the threshold of three (requirement three fulfilled). Since all requirements are fulfilled, the output is index three, shown in "Best Match" and representing "Blue Light".

Figure 5.5: Matching results and data from a run when the command ”Blue Light” was said


Chapter 6

Conclusions & Future work

This project shows that both systems perform to a high standard, with over 90 % success rate for all voice commands (for Daniel in the customized system), which has to be seen as a successful outcome. The system using Amazon Alexa shows the possibility of connecting third-party devices and applications to the myRIO both via the Internet and by hardwiring. The possibility of connecting devices to the myRIO over the Internet is especially valuable in this project, since it makes it possible to use the power of speech recognition devices such as Amazon Alexa. With the setup of a web server and IFTTT, the system is also flexible: IFTTT supports Google Home as well, so the system would be able to work with the Google Home voice recognition device too. With the radio module for Zigbee communication, the system combines WiFi, I2C, UART and Zigbee communication. The UART connection to the radio module, together with the fact that the system is standalone, greatly reduces the use of wires, which was one of the goals set up for this project, i.e. a standalone IoT system that wirelessly controls devices.

The customized voice recognition system is obviously more sensitive to dialects and different voices than the Alexa-based system, since the library the user's voice command is compared against is much smaller than what Alexa has available. The system does, however, perform quite well despite its limited word library and can recognize multi-word commands, which was a goal set at the beginning of the project. Like the Alexa-based system, the customized system is standalone and uses the radio module to transfer information to the Philips Hue bulb, which reduces the use of wires and allows the Hue bulb to be placed in whatever room the user chooses without the need for long wires, which is of great importance in an IoT/smart home setup. In its current form the system is most suitable for one user, since the library only consists of data from one person (Daniel). The choice of only using Daniel's recordings in the library was intentional: creating a large library with recordings of multiple persons (male, female, old, young, etc.) would take a very long time, so the system was designed to fit one person, i.e. Daniel. The system was then tested to see how well it performs with another user, i.e. Tim. The system functions fairly well even when Tim is speaking, and it is hypothesized that more recordings of Daniel's voice in the library would improve the performance for both users.

Overall, both systems performed well and the goals set before the project have been fulfilled, i.e. connecting Alexa to the myRIO wirelessly, controlling a smart lamp wirelessly, and controlling other devices via I2C. For the Alexa-based system, more commands and functions could always be added in future work. More Philips Hue bulbs could be added to the system, and although it was not tested in this project, the system is not limited to Philips Hue bulbs but could in theory use another brand of smart lights using Zigbee communication. Time-dependent functions could also be added, for example a command such as "turn on light 1 at 07:00", which could be used as an alarm. The customized system would need to expand its library in future work, both with different words and with recordings of different people, to be more precise and less sensitive when multiple people say the same word. A larger library would eventually also need some implementation of predictability, for example a Hidden Markov Model, to speed up the system, since it would otherwise be too slow. Another limitation is that the customized system depends on its surroundings: the same success rate may not be obtained if the tests were done in a different room, for example due to different acoustics. This all comes back to the library size, which would be the number one improvement to be made.


Appendix A

Figure A.1: A flow chart of process 1 (read the command variable; if its value differs from the last iteration, put the value in the State queue).


Figure A.2: A flow chart of process 2 (extract a value from the State queue; if the command function is noncontinuous, extract the radio module command string and write it to the radio module, otherwise put the value in the Process 3 or 4 queue).


Figure A.3: A flow chart of process 3 (extract a command from the Lightshow queue; for the Lightshow command, extract the radio module command string from the colour array at the current shift register value, write it to the radio module and increment the shift register, resetting it to 0 when it reaches 4; other commands are put in the State queue).


Figure A.4: A flow chart of process 4 (the continuous commands: temperature display, light sensor display, light control with the LabVIEW PID and Dim level control VIs, and spoken temperature, each handled through its own queue).


Figure A.5: A flow chart of the whole system (a voice command is interpreted by Alexa, IFTTT sends a web request to the HTTP method VI, which updates the network variable; process 1 queues changes, and processes 2-4 handle the Lightshow, sensor-related and other commands).


Appendix B

Phonetic alphabet | Example word | Phonetic letters of example word
AA | odd     | AA D
AE | at      | AE T
AH | hut     | HH AH T
AO | ought   | AO T
AW | cow     | K AW
AY | hide    | HH AY D
B  | be      | B IY
CH | cheese  | CH IY Z
D  | dee     | D IY
DH | thee    | DH IY
EH | Ed      | EH D
ER | hurt    | HH ER T
EY | ate     | EY T
F  | fee     | F IY
G  | green   | G R IY N
HH | he      | HH IY
IH | it      | IH T
IY | eat     | IY T
JH | gee     | JH IY
K  | key     | K IY
L  | lee     | L IY
M  | me      | M IY
N  | knee    | N IY
NG | ping    | P IH NG
OW | oat     | OW T
OY | toy     | T OY
P  | pee     | P IY
R  | read    | R IY D
S  | sea     | S IY
SH | she     | SH IY
T  | tea     | T IY
TH | theta   | TH EY T AH
UH | hood    | HH UH D
UW | two     | T UW
V  | vee     | V IY
W  | we      | W IY
Y  | yield   | Y IY L D
Z  | zee     | Z IY
ZH | seizure | S IY ZH ER

Table B.1: The phonetic alphabet.


