1. Introduction
More than one billion people worldwide—almost 15% of the population—live with one or more disabilities, according to the World Health Organization [1]. These disabilities may present early in childhood or develop later in life, such as impaired hand function resulting from a stroke [2]. Every day, people with disabilities struggle significantly to control their home appliances. Therefore, traditional homes should be transformed into “smart homes” to improve the quality of life of those with disabilities. Over the last decades, the objective of IoT technology has been to allow communication between devices without the need for human involvement [3]. More recently, IoT technology has been integrated with home devices to enable them to be controlled remotely through the internet [4]. The term IoT describes a network of physical objects, or “things”, that have sensors, software, and other technologies built into them to enable connecting and sharing data with other systems and devices over the internet. These devices include light switches that respond to on and off commands or thermostats that adjust the indoor temperature to reduce energy consumption. Several authors have presented solutions that help disabled people control devices remotely via IoT, based on the user’s voice or a smartphone graphical user interface (GUI). Mittal et al. [5] propose a smart home automation system (SHAS) to control home devices based on user voice commands. The system is designed to control five groups: light, access, fan, utility, and safety. A Voice Recognition Module V3 is used to recognize the voice commands, and an Arduino microcontroller then compares these commands with previously stored command templates to perform the desired action. Abidi et al. [6] present a voice-controlled smart home automation system to control four home devices: a TV, an LED, a fan, and a compact fluorescent lamp (CFL). The system architecture consists of an Android phone, an Arduino Mega, a GSM SIM900A module, a Bluetooth module, and an ultrasonic sensor (HC-SR04). A mobile application transforms the voice command into text, and a Bluetooth module (HC-05) then transmits this text to the Arduino Mega to control the dedicated device. The ultrasonic sensor (HC-SR04) also provides home security by detecting any movement and sending a message to the user’s phone via the GSM SIM900A to report that there is movement in the home.
Similarly, Mayer et al. [7] suggest a home automation architecture based on IoT technology to help people with disabilities who have trouble using a remote control or a smartphone to control their home devices. First, the voice command is recognized and processed using an automatic speech recognition (ASR) system on a remote ASR server. Afterwards, the ASR sends an XML file to the IoT gateway, which contains the command sent via XBee to the appropriate device. Moreover, Saha et al. [8] have developed a home automation system that lets people use voice commands to control their electronic devices. The system uses an Android phone to capture the user’s voice, which is then sent to an Arduino board via Bluetooth. The Arduino board processes the data and controls the devices accordingly through relays. The system also has an auto-mode that uses sensors to automatically control lighting and temperature. Ismaeel et al. [9] present a home automation system that aims to control electronic devices through voice or a webpage. If the user uses the voice application, the system receives the voice through a microphone or mobile phone and employs a Raspberry Pi development board for processing. The user’s voice is transcribed to text using the online speech-to-text interface (Wit). If the user instead uses the web page to turn a device on or off, the content script text is sent to the Raspberry Pi via the Apache remote server to control the general-purpose input-output (GPIO) pins, which activate or deactivate the relays connected to the devices.
In addition, Jat et al. [10] introduce the application of voice activity detection (VAD) in home automation systems to control home appliances through voice commands. VAD detects human speech using the microphone and activates a specific action. The system is implemented on a Raspberry Pi 3 Model B+, and the PocketSphinx speech recognition system captures the user’s voice, suppresses the noise, and converts the analyzed speech to text. The transcribed text is then processed to determine whether it includes recognized commands. According to the command, the system applies the action to the dedicated device through infrared, wireless LAN, or Bluetooth. Additionally, Alrumayh et al. [11] present a context awareness for voice assistants system (CANVAS) for use in smart homes with multiple occupants. The system consists of two major phases: a configuration webpage and a context-aware algorithm. The configuration webpage assists the homeowner in configuring different rules for accessing the home devices through a drag-and-drop GUI. The context-aware algorithm receives the device commands from a voice assistant and determines whether to discard or execute the user’s commands according to the previously defined rules. Khan et al. [12] introduce a web application known as HAS (home automation system) that lets users control many devices from their smartphones. This application uses Google Assistant to convert speech to text for controlling the appliances with the user’s voice. It also monitors the devices’ daily, weekly, and monthly power consumption. Omran et al. [13] present a Wi-Fi-based home automation system for a smart home (SH) prototype. The system uses a Raspberry Pi 3 Model B+ and an Arduino Mega 2560 and is controlled from an iOS device via the Blynk app.
Most of the methods mentioned above design their systems based on command templates; if the user issues an unstructured command, the system will not understand it. Therefore, several authors add natural language processing (NLP) to their systems to process and understand various types of commands. Intelligent personal assistant (IPA) [14] systems have recently gained popularity as a way to help individuals with many of their everyday tasks. A personal assistant system analyzes human speech based on NLP and produces a set of direct instructions. Examples of well-known IPAs are Google Assistant from Google [15], Siri from Apple [16], and Alexa from Amazon [17]. These well-known IPAs set alarms, set ringtones, close applications, make phone calls, order food online, play music, etc., with differing accuracy, according to each company. However, these IPAs do not work with all home devices; they work only with smart devices designed to be controlled by their applications. Many authors therefore integrate these IPAs with a control circuit to control any device. Uma et al. [18] introduced a home application system to turn home devices on or off via voice or text based on Google Assistant. When the user is not physically present in the environment, he can schedule the state of the appliances and is given the option to turn them on for a predetermined period of time. The user uses the mobile application to send a voice command through Google Assistant. The voice command is fed to Firebase, from which the NodeMCU fetches it to turn the home devices on or off. The application function is implemented using Node-RED technology, deployed in a Dialogflow account, and integrated with Google Assistant.
Furthermore, Kumer et al. [19] introduced a voice-based smart home automation system using IFTTT, the Adafruit cloud, the Ubidots IoT dashboard, and Google Assistant to efficiently control energy consumption. The voice commands are received by Google Assistant and sent to IFTTT. These commands are then sent to the microcontroller via Wi-Fi to turn the device on or off, according to the user’s command. The microcontroller updates the device status on the Ubidots dashboard, and feedback about this status is sent to the user. Putthapipat et al. [20] develop a home automation system based on the PiFrame framework. The system’s goal is to create an open platform on which researchers can extend the features and develop their own designs, similar to products such as Google Home and Amazon Alexa, but with more flexibility. The system hardware consists of a Raspberry Pi, a microphone, a speaker, and relays to control the home devices using voice commands. The Google Speech API and Wit.ai are used to understand the user’s command and communicate with the home appliances. The system stores the home devices’ current status in Google Cloud Platform Cloud SQL through a Node.js application on the Raspberry Pi. The response to the user is generated by translating the text to speech using Google Translate. Tiwari et al. [21] proposed a home assistant system that can be controlled and monitored using voice commands. In this system, the user can control the home devices and sensors from any cloud-connected platform through speech, text, or a mobile application. The system uses advanced speech recognition techniques, such as Mel-frequency cepstral coefficients (MFCC), with other features to create the feature vector. Vector quantization (VQ) and principal component analysis (PCA) are used for feature vector dimensionality reduction, and a Gaussian mixture model (GMM) is used to classify the speakers. The system uses the cloud services IBM Bluemix and Google’s cloud service for speech-to-text conversion. Mehrabani et al. [22] presented a personalized speech recognition system using dynamic hierarchical language modeling for controlling customizable devices in smart home applications.
The previous methods integrate IPAs with control circuits to benefit from the IPAs’ NLP, increasing command-understanding accuracy and allowing any device type to be controlled. However, while these methods can understand structured and semi-structured voice commands, they still cannot understand unstructured commands. Moreover, these methods depend mainly on the internet to take action, because they save commands on Firebase or use the mobile phone as a GUI to send voice commands.
This paper proposes a new IPA system integrated with IoT, called IRON, to help elderly and disabled people control their home devices remotely through IoT, based on their voices. The IRON system is composed of three modules: a speech-to-text module, a text analysis and classification module, and a command execution module. Instead of a mobile phone, IRON detects the user’s voice from different microphones spread around the home. This helps disabled people operate the devices from any place inside the home, even if the phone is lost or the internet is disconnected. The speech-to-text module transforms the incoming voice into text. IRON has two ways to turn voice commands into text: if the internet is available, it uses the Google Cloud API, and if it is not, it uses a speech recognition library. Thus, IRON can work online or offline. In the text analysis and classification module, the transcribed text passes through two main steps. First, IRON splits the text into a list of tokens by applying NLP and searches the tokens for the device name according to the list of predefined devices. If the device is not included in the devices list, the user can ask IRON to add a new device to the list and configure it without recoding IRON. Second, IRON removes the device name from the command and determines whether the statement is a question or an order. If the statement is a question, IRON answers with the device status and stops; otherwise, IRON considers the statement a command and sends the rest of the tokens to a pre-trained machine learning model to classify whether this command is a positive or negative statement. According to the statement type, IRON executes the required action on the device through the GPIO pins.
The IRON system is designed to overcome the inability of the previous methods to understand unstructured commands by adding a machine learning algorithm with NLP to classify these commands as positive or negative for turning a device on or off. The main contributions of this research can be summarized as follows:
An IRON system that helps elderly and impaired people control their home devices through their voice is proposed.
The dataset consists of 3000 normal, negative, and unstructured commands to control 100 devices.
Natural language processing is applied to the text generated by the Google Speech API to split the text into tokens for further classification.
A machine learning algorithm is included in IRON for classifying the commands as positive or negative.
Multi-microphones are distributed in different locations in the home to ensure that the elderly and disabled can access IRON from their locations.
New devices can be added and reconfigured by the impaired person’s voice without re-coding the IRON.
The IRON system is designed to work online or offline and to turn a device on or off, or adjust a device’s range, such as fan speed.
The IRON system is secured by requesting a password to start the controlling process.
2. System Architecture and Design
This research proposes an IPA system based on IoT, called IRON, to assist people with disabilities in controlling their home appliances based on their voices.
Figure 1 shows the proposed IRON system architecture. The figure shows that the IRON hardware consists of a microphone, a speaker, a Raspberry Pi, a Wi-Fi module, and relays. When the user starts IRON, it introduces itself and explains how to interact with it. IRON is usually in sleep mode. Once the user says “IRON”, it responds with a “hello” message and then asks the user what he wants. The user speaks the command, which is then analyzed to control the dedicated device. Algorithm 1 presents the summarized steps for interacting with the IRON system.
Algorithm 1: The IRON system procedure
Input: user voice command
Output: device control
Initialization: IRON introduces itself and tells the user how to interact with it.
While true:
    Wait in sleep mode        # microphone ready to receive voice
    If the user says “IRON”:  # the user knows this from the instructions
        IRON says a hello message and asks the user what he needs to do
        ReceiveOrder()
        SpeechToText()
        AnalyzeText()
        ExecuteCommand()
        ReturnFeedback()
    Else:
        Check whether the user wants to end IRON
End while
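The loop of Algorithm 1 can be sketched in Python. This is a minimal illustration, not the authors’ implementation: the helper names, wake-word handling, and the replacement of the microphone and speaker by plain text in and out are all assumptions.

```python
WAKE_WORD = "iron"

def run_session(utterances, handle_command):
    """Drive the Algorithm 1 loop over a sequence of transcribed utterances.

    `utterances` stands in for the microphone plus speech-to-text front end;
    `handle_command` stands in for the text-analysis and execution back end.
    """
    responses = []
    awake = False
    for text in utterances:
        text = text.strip().lower()
        if not awake:
            # Sleep mode: ignore everything until the wake word is heard.
            if text == WAKE_WORD:
                responses.append("hello, what do you need?")
                awake = True
        elif text in ("stop", "goodbye"):
            # The user ends the session; IRON returns to sleep mode.
            responses.append("goodbye")
            awake = False
        else:
            responses.append(handle_command(text))
    return responses
```

Passing the back end in as a function keeps the wake-word logic testable without any audio hardware.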
Figure 2 introduces IRON’s major modules and flowchart. The IRON system is composed of three modules: a speech-to-text module, a text analysis and classification module, and a command execution module. In the speech-to-text module, the Google API receives the user’s voice command through the microphone and converts it to text. The text analysis and classification module receives the text and analyzes it to extract the device name and the required command. According to the device and the command, the command execution module applies an action on that device through the Raspberry Pi. At the same time, the speaker responds and informs the user that the action is completed. The details of each module of IRON are described in the following subsections.
2.1. Speech-to-Text Module
In this module, the person’s commands are converted from speech to text using the Google Speech-to-Text Cloud API. The API transcribes the speech file using advanced deep learning neural network algorithms for automatic speech recognition (ASR) and returns the text statement [23]. The Google Speech-to-Text Cloud API is one of the simplest methods for recognizing speech and can analyze up to 1 min of voice data.
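The online/offline fallback described above can be sketched as follows. The sketch assumes the SpeechRecognition Python package, whose `Recognizer` objects expose `recognize_google` (cloud API) and `recognize_sphinx` (offline, via PocketSphinx); the recognizer is passed in as a parameter so the selection logic works without hardware or a network.

```python
def choose_engine(internet_available):
    # Per the IRON design: cloud API when online, local library otherwise.
    return "google_cloud" if internet_available else "offline_library"

def transcribe(recognizer, audio, internet_available):
    """Return the transcribed text for `audio` using the chosen backend.

    `recognizer` is assumed to follow the SpeechRecognition package's
    Recognizer interface (recognize_google / recognize_sphinx).
    """
    if choose_engine(internet_available) == "google_cloud":
        return recognizer.recognize_google(audio)
    return recognizer.recognize_sphinx(audio)
```

Injecting the recognizer also makes it trivial to substitute a different offline engine later without touching the routing logic.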
2.2. Text Analysis and Classification Module
This module is responsible for extracting the correct command from the text generated by the Google API and then confirming it with the user before executing the desired action. Because of the ambiguities of human language, it is extremely challenging to create software that correctly ascertains a text’s intended meaning, so NLP is used in this module to manipulate and recognize the text. NLP deconstructs the text into small units to help the computer understand the ingested text. Different libraries and algorithms have been proposed for NLP, such as the Natural Language Toolkit (NLTK) [24], Apache OpenNLP, Stanford CoreNLP, Pattern, GATE, and spaCy.
Table 1 compares the most popular NLP toolkits in terms of programming language and the processes they apply to text statements. NLTK is considered the most robust and is the mother of all natural language processing tools; thus, IRON utilizes it for analyzing the text. When IRON receives an input statement, the tokenization, stemming, and stop-word removal processes from the NLTK library are applied to the text.
Tokenization splits the text into smaller units called tokens to build a list representing all the vocabulary in the text. The tokens can be words, numbers, punctuation, stop words, and possessive markers. The stemming step then extracts the root word, reducing inflected words to their base or root form [25]. Many words share morphological characteristics and semantic properties, making them suitable for use in information retrieval applications. The stemmer strips affixes to form the root term; for example, “consider” is the root term of “consideration”. To reduce processing time and supply more accurate results from an enormous corpus of documents, several types of stemmers are employed to break terms down into their root terms; searching is then performed on these root terms rather than the original terms. In the stop-word removal step, stop words are treated as significant noise. Stop words such as “the”, “a”, “an”, and “in” do not add information, and removing them enhances the performance of information retrieval systems and text analytics [26].
After analyzing the text, splitting it into tokens, and removing the noise, IRON searches the tokens list for the device name to determine which device the user wants to control. Once IRON determines the dedicated device, it switches the device on or off according to the statement type. The statement type is deduced by a trained classifier that recognizes whether the statement is positive or negative, to switch the device on or off. The proposed IRON uses a logistic regression classification algorithm [27,28] to classify the rest of the tokens into positive and negative statements. The logistic regression classifier is an algorithm for binary classification problems. It estimates the probability of the target variable taking the value 0 or 1. The classifier is based on logistic regression, a statistical technique that makes predictions through the relationship between the dependent and independent variables. The logistic regression model can be trained through maximum likelihood estimation or stochastic gradient descent. Additionally, the classifier includes L1 and L2 regularization options to counteract overfitting and enhance the model’s generalization ability. The logistic regression equation is a mathematical formula that represents the relationship between the dependent and independent variables. It takes the form of a logistic function and can be written as:

p = 1 / (1 + e^(−z))

where p is the predicted probability of the dependent variable taking the value 1, and z is a linear combination of the independent variables and their coefficients. The coefficients are estimated during the training phase through optimization algorithms, such as maximum likelihood estimation or gradient descent. The resulting equation is then used to make predictions on new data. In the case of classifying positive and negative statements, the dependent variable represents the statement type, with 1 representing positive and 0 representing negative. The independent variables are features of the statement, such as the words used, tone, or other relevant information. The logistic regression classifier estimates the impact of each feature on the statement through the coefficient for that feature. The classifier then uses the estimated coefficients and the features of a new statement to compute the predicted probability of the statement being positive or negative. The prediction can be thresholded, with a value greater than 0.5 considered positive and a value less than 0.5 considered negative.
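To make the thresholding concrete, the following is a toy bag-of-words logistic scorer. The per-token weights are hand-set for illustration only; IRON’s actual coefficients would be learned from its command dataset by maximum likelihood estimation or gradient descent.

```python
import math

def sigmoid(z):
    # Logistic function: p = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-token coefficients (a trained model learns these).
WEIGHTS = {"on": 2.0, "open": 1.5, "start": 1.5,
           "off": -2.0, "close": -1.5, "stop": -1.5}

def predict_positive(tokens, weights=WEIGHTS, bias=0.0):
    """Classify a tokenized command: True = positive (switch on)."""
    z = bias + sum(weights.get(t, 0.0) for t in tokens)  # linear combination
    return sigmoid(z) > 0.5                              # threshold at 0.5
```

Tokens outside the vocabulary contribute zero weight, so unknown words neither push the command toward positive nor negative.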
2.3. Command Execution Module
In this module, IRON executes the desired action via the Raspberry Pi to turn the dedicated device on or off. The devices are connected to the Raspberry Pi according to their location relative to the main circuit. If the devices are close to the main circuit, they are wired directly; otherwise, they are connected wirelessly using the ESP8266 Wi-Fi module, as shown in Figure 3a,b, respectively. When IRON sends an activation signal to a device, this signal is transmitted through the ESP8266 to turn on or off the relays connected to the device. The ESP8266 is selected as the communication module because it is a low-cost Wi-Fi microcontroller that supports both TCP/IP and Wi-Fi protocols. Furthermore, it covers a range of up to 400 m in the open air, although this can be reduced to a few meters when obstacles such as walls or furniture are present.
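The relay-switching step might look like the sketch below. The pin assignments are hypothetical, and `set_pin` stands in for the hardware write (for example, `RPi.GPIO.output(pin, level)` on a wired pin, or a message forwarded to the ESP8266 for a wireless device). Many relay boards are active-low, which the sketch accounts for.

```python
# Hypothetical device-to-GPIO-pin mapping.
DEVICE_PINS = {"living room light": 17, "fan": 27, "kitchen light": 22}

def execute(device, turn_on, set_pin, active_low=True):
    """Drive the relay for `device`; returns the (pin, level) written.

    `set_pin(pin, level)` abstracts the hardware call so the logic can
    run off-device.
    """
    pin = DEVICE_PINS[device]
    if active_low:
        level = 0 if turn_on else 1   # relay energizes on logic low
    else:
        level = 1 if turn_on else 0
    set_pin(pin, level)
    return pin, level
```

Keeping the pin map in data (rather than hard-coded branches) is what lets new devices be registered without changing this function.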
Figure 4 shows the detailed system flow chart, which presents the detailed steps of the proposed IRON system as follows:
Step 1: The user asks IRON to switch on or off a device.
Step 2: IRON receives the user’s voice command through high-quality BY-MA microphones distributed in different locations in the home so the impaired person can reach one quickly.
Step 3: IRON transcribes the user’s speech to text using the Google Speech-to-Text API if the internet connection is available, or a speech-to-text conversion library if the internet is disconnected.
Step 4: The transcribed statement is then processed using NLTK, creating a list of tokens. IRON searches the tokens list for the device name to determine which device the user wants to control. If the device name exists, IRON will continue; otherwise, IRON asks the user to speak the device name again.
Step 5: IRON checks the tokens list to determine whether the statement is a question about the device status or a command. If the statement is a question, IRON responds with a proper answer about the device status and stops; otherwise, IRON treats the statement as a command.
Step 6: After recognizing the statement as a command, IRON removes the device name and classifies the rest of the tokens as a positive or negative command using logistic regression. If it is a positive command, the Raspberry Pi sends “on” to the device through the GPIO pin; otherwise, it sends “off” to the device to execute the action.
Step 7: IRON generates a response text that matches the user’s text and uses an offline text-to-speech conversion library to convert this response into voice. IRON plays the generated voice on the YST-1052 speaker, which is connected to the Raspberry Pi via USB and an audio jack, to inform the user that the action has been applied to the device, for example, “OK Sir, living room light is on now”.
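The Step 7 confirmation text can be generated as below; the phrasing is modeled on the paper’s example, and speaking the resulting string would be handed to an offline text-to-speech library such as pyttsx3, which is omitted here.

```python
def response_text(device, is_question, is_on):
    """Build the text IRON speaks for a status question or a command."""
    state = "on" if is_on else "off"
    if is_question:
        # Answer a status question, then stop (per Step 5).
        return f"{device} is currently {state}"
    # Confirm that a command was executed (per Step 7).
    return f"OK Sir, {device} is {state} now"
```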
Moreover, IRON adds a feature that lets the user add a new device via voice and configure it, as shown in Figure 5. First, the user connects the new device to the relay module unit. The user then asks IRON to turn this new device on or off. IRON checks whether this device is in the stored devices list. If the device is not found in the list, IRON asks the user to confirm adding it. After that, IRON requests the device configuration from the user and adds it to the file, as shown in Figure 6.
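The add-a-device flow can be sketched as a small JSON device registry. The on-disk format and field names here are assumptions; the paper only states that the configuration is appended to a file.

```python
import json
import os

def load_devices(path):
    """Read the stored devices list; an absent file means no devices yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def add_device(path, name, config):
    """Confirm-and-add step: persist a new device and its configuration
    without re-coding IRON."""
    devices = load_devices(path)
    devices[name] = config
    with open(path, "w") as f:
        json.dump(devices, f, indent=2)
    return devices
```

Because the registry is reloaded on each lookup, a device added by voice becomes controllable immediately, with no restart of the main loop.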