Smart-Object-Based Reasoning System for Indoor Acoustic Profiling of Elderly Inhabitants

Many countries are facing significant challenges in relation to providing adequate care for their elderly citizens. The roots of these issues are manifold, but include changing demographics, changing behaviours, and


Introduction
The number of people aged over 60 is expected to quadruple within the next two decades [1], placing existing care provision capabilities under strain and driving the need for research into more effective and efficient ways of delivering care. Research [2] has shown that the vast majority of elderly people prefer to stay in their own homes whenever possible, rather than living in residential care or nursing homes. This view has been further reinforced by the emergence of the COVID-19 pandemic, which highlighted the vulnerabilities of residential care homes [3]. Thus, the number of people who will need assistance to remain living independently in their own homes is expected to rise sharply in the coming years. It has long been recognised that technology can play a vital part in supporting people's desire to maintain an independent life as they grow older. For example, sensors and actuators, together with intelligent software, can augment everyday objects to provide additional functionalities to support those in need of care. Products that combine embedded computers, sensors/actuators, and intelligent software are referred to as 'smart objects', and form a fundamental building block of the Internet-of-Things (IoT) [4]. Technological advancement in the form of machine learning [5] has enabled these smart objects to work collaboratively, as a whole, creating intelligent environments [6] that can learn how to adapt their collective behaviour to meet the needs of their occupants. As a result, IoT can be used to foster a uniquely supportive environment, which is particularly useful for elderly care settings, such as assisted living systems, where the environment can be aware of and sensitive to the particular needs of its occupants [7][8][9].
While much thought has gone into creating effective financial and care models, a significant ongoing research challenge is how best to leverage advances in smart technologies to develop context-aware systems which promote the well-being of elderly people by improving their day-to-day living experiences. To this end, monitoring a person's physical state is an effective approach, as it can establish the functional status of an individual which, in turn, can be used to determine the level of assistance required, with minimum intervention [10][11][12]. In systems that monitor and gather behavioural data about people, inference techniques are commonly used [10,13], as no prior knowledge about the individuals is required. While applications designed to help older people to perform their day-to-day tasks have increasingly gained recognition [14,15], the vast majority of these applications rely on "active monitoring" approaches to deliver key information. Generally, monitoring is classified as either active or passive. Active devices require physical user interaction to function, while passive devices monitor the local environment autonomously. Active devices are regarded as being intrusive, interfering with the user's normal routine in order to function, whereas passive devices are seen as non-intrusive, since they place no physical or mental load on the user. Moreover, active devices, such as an alert button, have the disadvantage of requiring the user to be conscious and physically able to operate them, which, depending on the status of the individual at the time when assistance is required, may not be possible. While active participation can offer benefits within a tele-medical monitoring system [16,17], we argue that using passive or ambient environmental monitoring has the potential to develop a comprehensive assisted living scenario, in a less intrusive and less costly way. This can be achieved by combining Artificial Intelligence (AI) with IoT to create an environment that responds to the occupant's needs without reliance on their manual input, which overcomes issues such as incapacity, by exploiting the capability of passive sensors [7]. Developments in ambient assisted monitoring include fall-detection and 'lack of mobility' alerts which, for example, involve using passive infrared (PIR) motion sensors that work in tandem with accelerometers built into furniture and flooring [13]. Other techniques, such as ubiquitous sensing driven by a specific knowledge-driven framework, have also shown promising results [18]. Thus, it has been recognized that pervasive passive monitoring is a good way forward to enable elderly people to sustain greater independence and enjoy a better quality of life [19,20]. The pervasive nature of noise in the modern domestic environment means that monitoring ambient sound levels offers good potential for the detection of abnormal patterns of behaviour. Furthermore, since this approach needs only a few strategically placed sound sensors around a person's home, it is easy and cheap to install. However, while this is a most promising approach, there is a scarcity of research on this topic, with no off-the-shelf solutions being available. Furthermore, the few existing examples (discussed in Section 2) generally have built-in transceivers, with many being designed to be carried by users, which limits their practicality.
An overview of the most common unobtrusive wellness monitoring methods is given in [21], but to be accurate, these methods need to be backed by ontology-based algorithms [22], especially when the activities to be recognised are similar [23]. These issues are even more critical when designers have access to only a reduced training data set.
Thus, investigating these issues with the aim of overcoming existing barriers (especially the ones given by the lack of training data) is the main focus of the work reported in this paper; more precisely, whether sound (acoustic signals) can be used as the primary ambient source to create a relatively low-cost IoT system for identifying abnormal situations in domestic homes where elderly people may need external assistance.
Clearly, the focus of any research project (in our case, the AI methods) will contain bespoke and novel methods and technologies. However, in research that is aimed at practical deployment or commercial exploitation, for the findings to be convincing to a broader base (e.g., the industry), it is helpful if the supporting technologies and standards utilised are ones which are readily available to those that might seek to exploit and deploy the work. Thus, in our work, for such secondary platforms, we sought to build on widely used industry standards. For networking, we used the IEEE 802.15.4 protocol, which is the most popular wireless sensor network standard. Likewise, for middleware, we chose the popular open (and free) standard LinkSmart, developed by the FP7 EU project EBBITS. In this way, our results (theoretical and practical contributions) can be better appreciated and debated by the wider ambient intelligence community. The system functionality requirements were gathered by consulting residents in a residential home for the elderly operated by a UK local government authority. Initial experiments were conducted in a domestic home to assess the capability of the sound sensors for gathering suitable data (the distribution of sensors and its effect on live/dead spots) in areas of interest (e.g., kitchens) based on typical events (e.g., walking, boiling a kettle) that relate to the care of elderly people. In this work, we employed a Neural Network classifier combined with a context reasoning system to analyse and classify the acoustic data. In addition, we employed a goal-based approach to determine whether an alert should be triggered when an abnormal event was detected.
The remainder of this paper is structured as follows: Section 2, Low-Cost Wireless Sensor IoT Network Design; Section 3, Methodology, Design and Implementation; Section 4, System Prototyping and Discussion; Section 5, Experimental Setup and Results; Section 6, Conclusions.

Low-Cost Wireless Sensor IoT Network Design
The Internet-of-Things (IoT) comprises a network of embedded networked computers (implanted into everyday 'things') that are able to sense the environment, as well as communicate and interact with people and each other [24]. The falling costs of embedded devices and advanced networking have led to a proliferation of IoT applications in industries, cities, homes, and the natural world, all of which enhance people's lives [25]. Consequently, the 'assisted living' community is investing much ongoing research and development into the use of IoT. For example, an important area of IoT research needed to realise the 'assisted living' vision is inter-device communication, especially that relating to the development of the wireless sensor network (WSN). WSNs are networks that consist of multiple sensor nodes, with a gateway and a wireless transmission path. Generally, node communication is supported by technologies such as Wi-Fi, Bluetooth Low Energy, ZigBee, 6LoWPAN, and 4G/5G.
Table 1 presents a comparison of four main wireless technologies (Wi-Fi, Bluetooth, ZigBee, Z-Wave) typically used to create WSN-based smart environments, as a function of six characteristics: (1) standards, (2) indoor/smart home market share, (3) data throughput, (4) range, (5) reliability, and (6) ease of use. It can be seen that the Wi-Fi protocol offers the highest data throughput for the transmission range (∼10 m) required in this study, and therefore, the lowest communication latency and highest. In addition to the connectivity requirements, another key IoT element that requires special attention when designing assisted living systems is the environmental monitoring module [26,27].
To date, most of the approaches taken have been fairly conventional, relying on sensor networks to measure parameters such as temperature or humidity [28][29][30][31] to predict and maintain occupant comfort [26]. The use of IoT ambient acoustic systems for human activity prediction and recognition is somewhat rare in environmental monitoring. For example, in 1995, Reynolds [32] investigated the Gaussian Mixture Models (GMM) technique to predict users' activities or events. Environmental sound generated from users' activities/events has been explored by a number of scholars, including Virone and Istrate's work in 2007 [33], which used an environmental sound sensor for monitoring home activities using a simulator. Later, in 2009, Shaikh et al. [34] investigated sound virtualization in a virtual world, where they tried to capture environmental sound together with interpreting associated activities. While Virone and Istrate's research demonstrated validity via simulation, they did not investigate the implementation of IoT. Virone and Istrate's work on identifying types of sound was further developed by Segura-Garcia et al. [35] eight years later in 2015, where the new team focused on analysing urban noise levels. Segura-Garcia et al. used Zwicker's annoyance model to analyse road traffic noise in the City of Barcelona, utilising a WSN similar to the technological set-up of the IoT application we developed in this study [35]. However, our settings and focus differ from this earlier work in that our application is aimed at a domestic home environment occupied by elderly people. Nevertheless, the conclusions drawn by Segura-Garcia et al. [35] added weight to our hypothesis that an acoustic sensing approach had the potential to deliver a low-cost solution to the provision of care within a domestic setting.

Methodology, Design and Implementation
The main goal of this study was to identify sequences of events, and their meaning, that occur in domestic environments based on analysing and contextualising ambient acoustic signals. Accordingly, our study is divided into three areas. The first area is related to understanding the needs of the elderly; the second area is related to the technique used to identify and profile the acoustic signals; and the final area is related to the context system for improving overall system performance. The sections below describe these areas in more detail:

Understanding the Needs of the Elderly
To better understand the needs of the elderly, we utilised a core-task analysis (CTA) approach developed by Norros [28] and Pesonen et al. [36].

The Analysis Process
The methodology involved analysing the views and preferences of the target end-users and mapping this information into usability requirements, which were then embedded into the design process. The process comprised three phases:
• Phase 1: Science-based modelling of core tasks. This relied on gathering expert knowledge on elderly care practices within the UK, drawn from scientific and professional literature. The goal was to collect important information regarding the care needs of elderly people.
• Phase 2: Consultations with residents and staff in a residential home for elderly people in the UK to identify residents who would be suitable and willing to participate in our research.
• Phase 3: Identify habitual home activities in order to inform/train the analytic system to classify them, and then infer "abnormal" activities at a later deployment stage.
Our study focused on residential homes for elderly people run by county councils in the UK. The elderly homes studied in this research were a small sample of the 462,000 units of accommodation provided by specialist housing for older people in the social sector across England, Scotland and Wales. The residential homes in our study had either one or two occupants. The local council concerned (Cambridgeshire) recognised that the ageing population had become the key players in the housing market and that they needed to provide better and more cost-effective support to this sector via the use of innovative technology. This policy was echoed in an earlier report [37] published by the UK Department for Work and Pensions (DWP) and the Ministry of Housing, Communities and Local Government (MHCLG) in 2016, highlighting the need to deliver housing solutions to meet older people's needs through, for example, supporting them to live independently in their own homes for as long as possible. The following section describes the process of end-user data-gathering and preliminary results.

Consultation and Focus Group Interviews
Two separate sessions with the inhabitants of the residential homes were undertaken; one being administration of a questionnaire to identify their usual habitual activities, the second being an interview to assess their attitudes towards the acceptance of technology. As the target group for these sessions were elderly people, the interviews were designed to be broad, "light-hearted", and conducted in a manner that was as open and sensitive as possible. The questionnaires were prepared based on an earlier questionnaire format used in Oxford Time-Use Research (daily activities diary). In this study, household activities were divided into three categories: kitchen, living room, and other. Each category had between 6 and 16 pre-listed common activities in addition to an open-ended question labelled "other" for participants to complete if they could not find a particular activity that was itemised (Table 2). To cater for age-related vision problems, the questionnaire's font size was set to 20 points. The interview sessions were conducted in the community room within the residential home complex. No recordings were made, apart from note-taking. The procedure was as follows:
1. Residents of the elderly homes were invited to attend the sessions.
2. Residents were briefed about the study and asked to give their consent.
Questionnaires (second session: 1 h)
Eleven elderly residents participated in each of the sessions, with ages ranging from 60 to 95. Residents were asked for their views on assisted technology and if they had any concerns. They were also asked to record how many times during the day they thought they had conducted the activities listed on the questionnaire. Concerning the views on assisted technology, preliminary results showed that among the 11 elderly people, six expressed the view that they were quite happy to accept help from technology when needed; three had reservations; while two totally rejected the idea. Further analysis showed that for those who expressed reservations in accepting assisted technologies, cost was the main issue, followed closely by privacy.
In terms of household activities, preliminary results showed that the most frequent activity done in the kitchen area was making tea/coffee, whereas in the living room area it was watching TV while sitting on a sofa. Because making tea/coffee involves readily identifiable discrete activities, such as filling up the kettle, switching the kettle on, and waiting for the kettle to boil, the kitchen area was chosen as the experimental focus for our study.

Neural Network Classifier
Detection of anomalies in ambient sound is a special form of the speech recognition problem. A conventional solution is to train a series of classifiers to map some sound features to the events of interest. One way to train the classifiers is by using neural networks (NN). An NN is a computational model inspired by the structure of a biological brain. It consists of a large number of simple neural units. Each neural unit is connected with many others through weighted connections whose strengths can be increased or reduced to affect the computed value of the adjoining neural units.

Neural Network Classifier Description
The chosen neural algorithm, commonly used in a variety of applications, is the Feed-Forward (FF) algorithm, briefly described through Equations (1) and (2). In the inference phase, the FF algorithm starts with the computation of the output value of each of the neurons as a weighted summation of all inputs (x), with their corresponding weights (w), that is, the net value (1), followed by the firing of the net value with an activation function, such as the sigmoid function (2).
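The two inference steps described above can be sketched in their standard form as follows. This is a minimal sketch: the index i, the input count n, and the output symbol y are assumptions introduced for illustration, no bias term is shown, and the notation may differ from that of Equations (1) and (2).

```latex
\begin{align*}
\mathit{net} &= \sum_{i=1}^{n} w_i \, x_i
  && \text{weighted summation of the inputs (net value)} \\
y &= f(\mathit{net}) = \frac{1}{1 + e^{-\mathit{net}}}
  && \text{firing of the net value with the sigmoid function}
\end{align*}
```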
In order for the classifier to be able to recognize patterns, it has to be trained. There are many training algorithms available for FF NNs, and their selection is made according to the data to be classified. For large-scale data, one of the most appropriate learning algorithms is the scaled conjugate gradient method. The algorithm is based on the standard training process that minimizes the error function E, defined as the squared difference between the actual output value of the output layer neuron for the applied pattern and t, the target output value (3), by adjusting the neurons' weights (for this, a search direction, η, and a step size, ε, are set), such that the new error function (calculated based on the new weights) is less than the previous error function (4). One option is to set the search in the steepest descent direction, which corresponds to the negative gradient of the error function with respect to the weights (the gradient descent algorithm), but in the context of large-scale NNs, this algorithm shows poor convergence and requires a time-consuming line search. However, if the search direction uses information from the second-order approximation of the error function, combining the Levenberg-Marquardt method with the conjugate gradient approach, the training time can be drastically reduced. A detailed explanation of the algorithm is given by its author in [38].
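The training criterion described above can be sketched as follows, assuming a single output neuron. The symbol o for the actual output and the vector notation w for the weights are assumptions introduced for illustration; some formulations also scale the error by a factor of 1/2, which is not shown here.

```latex
\begin{align*}
E(\mathbf{w}) &= (o - t)^2
  && \text{squared difference between actual and target output} \\
\mathbf{w}_{\text{new}} &= \mathbf{w} + \varepsilon \, \boldsymbol{\eta}
  && \text{step of size } \varepsilon \text{ along search direction } \boldsymbol{\eta} \\
E(\mathbf{w}_{\text{new}}) &< E(\mathbf{w})
  && \text{the new error must be lower than the previous one} \\
\boldsymbol{\eta} &= -\nabla E(\mathbf{w})
  && \text{steepest descent choice of direction}
\end{align*}
```

The scaled conjugate gradient method replaces the last line with a direction built from second-order information, which is what removes the need for an explicit line search.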
NNs have been used to solve a wide variety of tasks. For example, in 2013, Swarnkar et al. [39] explored this method for classifying sound signals to auto-detect human coughs. The same technique was recently used in a wearable device to detect sound from a heartbeat [40] and to monitor a person's health. In hardware, Boes et al. [41] investigated NN learning models on low-cost devices to detect volatile events by learning signalling patterns. In their study, the authors discovered that better results could be achieved by providing greater flexibility and accommodation (in the learning model) but without losing prior knowledge. Recently, Ramanujam et al. investigated NN deep-learning models for human activity recognition, specifically targeting mobile and wearable devices. However, the study was limited to pre-existing datasets instead of using a real-time data acquisition method [42]. The section below describes the NN classifier design and implementation.

Neural Network Classifier Design and Implementation
In our study, we trained a series of NN classifiers to identify the sound of the most significant events that occur in the daily life of older people. We determined these events from our focus group interviews, which revealed that the most important sounds include sudden bangs, footsteps, water from taps, kettles boiling, and toilets flushing.
First, we collected data by making our own recordings of the sounds (in WAV format) in the home of a volunteer (see Section 5.1), as well as from the FreeSound website. In this way, we successfully obtained more than 36,000 s of sound for training and validation purposes. The main objective was to develop a proof-of-concept for classifying acoustic scenes; hence, we used standard feed-forward neural networks (FFNN) with mel-frequency cepstral coefficients (MFCC) as audio features. For this, Matlab programs were developed to calculate the MFCCs used to train, validate, and test the FFNN.
The sound data collected through recordings and downloaded from the FreeSound website had a sampling frequency of 44.1 kHz and a bit depth of 24 bits, on one or two channels. To calculate the MFCCs, the short-time Fourier transform was first applied with a window size of 2048 samples (46 ms) and a shift of 1024 samples (23 ms). The mel-spectrum was obtained by applying a mel filter bank of 128 bandpass filters in the range of 300 Hz to 22 kHz. The cepstral analysis was then performed, and 13 cepstral coefficients (including the 0th order) were obtained for each window analysed. The audio features were then stored in a data matrix of 130 rows (which corresponded to 13 MFCC features per time bin multiplied by 10 time bins) and N columns, where N is the corresponding number of frames (windows) of the sound analysed. Each matrix value was pre-processed by subtracting the minimum value of its corresponding column and normalizing it by dividing it by its column standard deviation. Consequently, the FFNN design consisted of an input layer with 130 neurons followed by two processing layers: a hidden layer with 10 neurons, and an output layer with two neurons with the sequence of '10' for true (the sound belongs to the classifier) and '01' for false (the sound does not belong to the classifier). The training algorithm applied was the scaled conjugate gradient backpropagation with a sigmoid activation function on the hidden layer, and a linear one on the output layer. On occasions when more than one classifier gave a positive detection, the one with the highest confidence value was chosen. For cases where all classifiers gave a negative detection ('01'), the sound was considered to be ambient noise.
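The feature layout, normalisation, network shape, and decision rule described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the Matlab implementation used in the study: the function names are hypothetical, the MFCC frames are assumed to be already computed, frames are grouped without overlap, the population standard deviation is used, and "confidence" is taken to be the margin between the two outputs. None of these details is specified in the text.

```python
import math

N_MFCC = 13   # cepstral coefficients per analysis window (from the text)
N_BINS = 10   # time bins stacked into one feature column (from the text)


def stack_frames(mfcc_frames, n_bins=N_BINS):
    """Stack n_bins consecutive MFCC frames into columns of n_bins * 13 values.

    Assumption: groups of frames do not overlap.
    """
    columns = []
    for start in range(0, len(mfcc_frames) - n_bins + 1, n_bins):
        column = []
        for frame in mfcc_frames[start:start + n_bins]:
            column.extend(frame)
        columns.append(column)
    return columns


def normalise_column(column):
    """Subtract the column minimum, then divide by the column standard deviation."""
    lowest = min(column)
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0.0:
        return [0.0 for _ in column]  # constant column: avoid division by zero
    return [(v - lowest) / std for v in column]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(column, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 130 inputs -> 10 sigmoid hidden units -> 2 linear outputs."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, column)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


def pick_event(outputs_by_classifier):
    """Choose the event with the highest confidence among positive classifiers.

    Each value is [true_score, false_score]. A classifier is positive when its
    'true' output exceeds its 'false' output (the '10' pattern). If none is
    positive, the sound is labelled ambient noise.
    """
    best_name, best_conf = "ambient noise", 0.0
    for name, (true_score, false_score) in outputs_by_classifier.items():
        confidence = true_score - false_score  # assumed definition of confidence
        if confidence > best_conf:
            best_name, best_conf = name, confidence
    return best_name
```

For example, 25 frames of 13 coefficients yield two 130-value columns, and `pick_event({"kettle": (0.9, 0.1), "tap": (0.6, 0.4)})` selects the kettle classifier because its margin is larger.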

Context Reasoning System and Implementation
In addition to the NN classifier, we also developed a context reasoning system to improve the monitoring performance. In many previous studies of home monitoring [9,28,43], events were considered as being independent from each other, and therefore an abnormal event was defined as a single event, such as a sudden bang due to falling down or breaking glass. However, in reality, we argue that events are related to each other. For example, before the kettle boils, we can expect there to be some sound from the tap (to fill the kettle with water); before the sound of the tap, we can expect some footstep sounds (indicating someone entering the kitchen or the area where the kettle is located). This chain of events provides a contextual clue as to what will happen next, and more certainty concerning any particular event. Therefore, an abnormal event is not a single event, but can be viewed as a "surprise" out of an otherwise normal routine of events. This concept is very important for improving the performance of home monitoring. Previously, Chik et al. [44] used this contextual reasoning system to predict prospective risks at home, in a robotic observation system, with positive results. Others, in order to overcome the limitations imposed by the large data set required to train supervised classifiers, have used unsupervised approaches in conjunction with ontological methods (translated into a Markov Logic Network), as reported in [22]. In the current study, we applied the context reasoning system to NN models trained on a small data set, in order to increase the reliability of the activity recognition system.
The data structure for the context reasoning system was developed as shown in Table 3. The first column shows different events that have been recognised by the NN classifier described in Section 3.2.2. As explained above, we defined the events manually, based on the interviews with occupants of a residential home for the elderly. For the purposes of this paper, our system recognised a fixed number of habitual events, but our plan for the future is to develop a novel detector that will learn and create dynamic events automatically. The second column in the table shows the maximum duration of the event. This was learnt from the training data gathered from the home of a volunteer (Section 3.1.2). The third, fourth, and fifth columns show the top three next most likely events (a pragmatic choice). For this, we counted the frequency of occurrence between subsequent events, and obtained the top three events with the highest likelihood to happen next (A, B, C). Thus, for example, Table 3 shows that, if the current event is tap water, then the next event is likely to be the kettle boiling. Another possibility is that, after the sound of tap water, the next event might be a footstep. A total of 1000 s of audio data were collected from the participant's home. The data contained sequences of events. The sequences were split into individual clips and fed into an NN classifier. The results were then used to construct the context data and train the system. After training, we deployed the system to assess its ability to generate alerts for potential accidents. We derived the alarm criteria as:
1. If the next event is not as expected, then send an alarm;
2. If the duration of the current event is longer than 1.5 times the maximum duration in the context table, then send an alarm.
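The context table and the two alarm criteria above can be sketched as follows. This is a minimal illustration, not the deployed system: the function names, the dictionary layout, and the choice to treat an event that is absent from the table as abnormal are assumptions, and the durations in the usage example are invented for illustration rather than taken from Table 3.

```python
from collections import Counter, defaultdict

DURATION_FACTOR = 1.5  # multiplier on the maximum duration (from the text)
TOP_K = 3              # number of likely next events kept per event (A, B, C)


def build_context(sequences):
    """Build {event: {"max_duration": seconds, "next": [A, B, C]}}.

    Each sequence is a list of (event_name, duration_seconds) tuples in the
    order in which the events were observed.
    """
    max_duration = defaultdict(float)
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for index, (event, duration) in enumerate(sequence):
            max_duration[event] = max(max_duration[event], duration)
            if index + 1 < len(sequence):
                # Count how often each event directly follows this one.
                transitions[event][sequence[index + 1][0]] += 1
    return {
        event: {
            "max_duration": max_duration[event],
            "next": [name for name, _ in transitions[event].most_common(TOP_K)],
        }
        for event in max_duration
    }


def should_alarm(context, current_event, current_duration, next_event=None):
    """Apply the two alarm criteria to the current (and optional next) event."""
    entry = context.get(current_event)
    if entry is None:
        return True  # assumption: events missing from the table are abnormal
    if current_duration > DURATION_FACTOR * entry["max_duration"]:
        return True  # criterion 2: event lasted too long
    if next_event is not None and next_event not in entry["next"]:
        return True  # criterion 1: next event was not expected
    return False


# Illustrative training sequences (durations are made up for the example).
training = [
    [("footstep", 3.0), ("tap water", 8.0), ("kettle boiling", 120.0)],
    [("footstep", 4.0), ("tap water", 10.0), ("footstep", 2.0)],
]
context = build_context(training)
```

With this table, a tap running for 16 s exceeds 1.5 times the 10 s maximum and raises an alarm, whereas tap water followed by the kettle boiling does not.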
The alarm is delivered in two ways. First, the sensor nodes will beep to alert the resident. Second, an email will be delivered to the assigned carers. We recognise this is a somewhat simplistic interpretation of alarm detection, but its purpose is simply to illustrate the principle, which will be adapted in our future work to reflect more complex conditions that can be dynamically determined and adjusted.

System Prototyping and Discussion
The main concern gathered from interviewing elderly people (see Section 3.1.2) was the cost, which is understandable as they were all retired and living on small pensions. Our investigation into the development of a low-cost unobtrusive monitoring system that can passively monitor domestic environments by processing their ambient acoustic information could help with this concern. As explained earlier, we adopted an IoT-enabled LinkSmart semantic middleware (Figure 1), an open source technology developed by the FP7 EU project EBBITS, which provides open interoperable capabilities suitable for applications using distributed physical devices with communication. In addition, the LinkSmart middleware has a modular framework, making it readily customisable. Furthermore, its ontological approach, which provides a semantic representation of devices in the form of services, offers great flexibility and robustness, making it suitable for low-cost implementation.
As explained in Section 2, a Wi-Fi-based WSN was chosen for our prototype implementation. In our prototype, the LinkSmart Virtual Entity module was used to act as the main semantic virtualization mechanism for representing and handling all the data on the WSN, while the coordinator node provided the data via IoT Service (Figure 1).
Nodes were distributed and discoverable via LinkSmart's device discovery feature and distributed architecture. LinkSmart is a popular standard, adopted by a number of other projects, including ELLIOT to create smart offices [45] for business use. Our study is the first to explore its use in a domestic assisted-living environment.
The development of a WSN for monitoring construction noise was investigated previously by Hughes et al. [46]. Their work compared different transmission modes, examining the effectiveness of a Bluetooth Low-Energy (BLE) network with a ZigBee network, specifically for an outdoor construction site setting. In their study, Beaglebone, Arduino, Raspberry Pi and Pi Zero computers were assessed for cost and availability. The node-to-gateway data transmission route choice was ZigBee, although BLE was considered. Other examples of low-cost IoT WSNs for environmental monitoring include the work by Ferdoush and Li [47] on analysing the combination of the Raspberry Pi and ZigBee components in a multi-node network. In their study [47], they drew the conclusion that a WSN with a Raspberry Pi base enabled the prototyping of a low-cost flexible WSN. In addition, Ahmed et al. [48] investigated various sensor types using Raspberry Pi and ZigBee. Their work emphasized the flexibility of the Raspberry Pi, particularly as the foundation of a WSN, and they concluded that it was a robust platform suitable for a wide variety of WSN locations. However, ZigBee has slightly higher power consumption and low data rate capability, and it is not suitable for supporting tasks requiring high bandwidth, such as gathering ambient acoustic signals.
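The bandwidth argument above can be illustrated with a short back-of-the-envelope calculation. The sketch assumes that raw, uncompressed audio at the recording format reported in Section 3.2.2 (44.1 kHz, 24 bit) would be streamed from the node, which the text does not state, and it uses the nominal 250 kbit/s rate of IEEE 802.15.4 in the 2.4 GHz band as the ZigBee ceiling. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100    # sampling frequency reported for the recordings
BIT_DEPTH = 24             # bits per sample reported for the recordings
ZIGBEE_MAX_BPS = 250_000   # nominal IEEE 802.15.4 rate in the 2.4 GHz band


def raw_pcm_bitrate(sample_rate_hz, bit_depth, channels=1):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


mono_bps = raw_pcm_bitrate(SAMPLE_RATE_HZ, BIT_DEPTH)
ratio = mono_bps / ZIGBEE_MAX_BPS

print(f"Raw mono audio: {mono_bps} bit/s")
print(f"Ratio to nominal ZigBee rate: {ratio:.2f}")
```

Under these assumptions, a single mono stream already needs more than four times the nominal ZigBee rate, before any protocol overhead is counted. Compressing the audio or transmitting extracted features instead of raw samples would change this picture.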
In addition to the cost concerns raised by the participants, a literature review showed that the adoption of smart environment technology had many other limiting factors, such as ease of implementation and perceived usefulness [49]. These factors have been taken into consideration in the design of our prototype, which involved the use of low-cost devices: two Raspberry Pi 3 boards were selected to act as WSN nodes, one equipped with a USB mini-microphone and the other with a speaker. The Pi 3 has a built-in Wi-Fi module, and data were transmitted between the nodes and the gateway over Wi-Fi (Figure 2).

IoT Hardware and Sensor Nodes Implementation
The prototype development was based on the principles employed in our previous study [28], the key aspects of which can be summarised as:

• The design of the Wireless Sensor Network (WSN) should be based around affordable devices with high-capacity processing [25].
• In line with IoT principles, a distributed network of devices should be employed to collect data independently and autonomously of other devices on the network [50].
• Raw data should be collated by a coordinator (control) node.
• On-node data processing should be restricted and the data transmission rate balanced against power consumption to work within the limits of the IoT devices.
The overall system architecture is shown in Figure 2. It uses a modular framework comprising hardware/device-level workflow, IoT middleware communication, NN learning modules, and GUIs (as explained in Section 3.2). All sensors/nodes communicated directly with the coordinator (control) node via Wi-Fi. The coordinator (control) node communicated via LinkSmart middleware. As mentioned earlier, the main benefit from adopting LinkSmart was device interoperability, while all nodes were implemented using a low-cost Raspberry Pi. A wireless LAN and an antenna were built into the board on the Pi 3, reducing the number of USB ports that were required and allowing for a smaller enclosure. All nodes were powered by a mains electricity connection to ensure a continuous and reliable data feed. SD cards of 8 GB were configured with the Linux-based Raspbian (Jessie), a Debian distribution, to provide an operating system for the Raspberry Pis. In addition, for the gateway node, an Apache server was used to present the data in graphical form. The prototype WSN nodes used low-cost, off-the-shelf microphones from Kinobo, which were easily obtained through on-line retailers.
As mentioned, our aim was to use cheap components; hence, the range was limited. However, sound levels recorded by the sensor node were sufficient for prototyping purposes. Consistency in the recordings from the microphones was ensured through comparative testing. Ambient acoustic data were gathered by a node placed in the kitchen, before being passed to the central controller node via Wi-Fi for analysis.

Experimental Setup and Results
For this study, we followed an experimental setup similar to the one used in our previous work [28], but with two main differences: the WSN was based on Wi-Fi, and the setup consisted of two nodes. The experiment was conducted in a domestic home with one bedroom, a kitchen, a living/dining room, and a shower room, measuring approximately 50 square metres in total (Figure 3). One node (acting as a gateway node with an attached speaker) was placed in the living room, and another node (a sensor node with an attached microphone) was placed in the kitchen area (Figure 4). The data that were collected informed the size of the different networks, as the optimal number of nodes for different locations varied with the location/room size and its ambient noise level. The gateway node received the transmissions from the sensor nodes and unpacked the data. A Python-based NN library was installed on the gateway node and was used to test the data and generate predictions.

Experimental Procedure
The experiment involved two phases. The first phase was to gather data for training purposes. In this phase, the data were gathered in the home of one of the participants. Three sequences of activities, each repeated 10 times, were recorded. The three sequences of activities were:
1. Walking, grasping the kettle, going to the tap, filling the kettle with water, switching the kettle on, the kettle boiling;
2. Walking, grasping the water filter jug, (1a) going to the tap, (1b) filling the jug with water, (2) pouring the water into the kettle, switching the kettle on, the kettle boiling;
3. Walking, opening the fridge, closing the fridge, walking.
The data gathered were processed offline on a Dell Optiplex computer and used to train the NN model and context reasoning system described in Sections 3.2 and 3.3. Once the models were fully trained, they were deployed to the gateway node, ready for the second phase of the experiment. The preliminary data of the second phase were then collected and subsequently analysed. We used the methodology developed in our previous work [28].
This involved running the same data collection and transmission script automatically, from boot, on the sensor node (recording, analysis of the recording, and transmission of the analysis ran in an infinite loop). A new recording and analysis started every second. The on-node analysis of recordings was done using SoX, a cross-platform audio-processing command line utility. The audio data were then sent via Wi-Fi to the gateway node for further processing and testing. The gateway node stored the data in a local database, from which they were visualised through real-time rendering on a graphical interface. This allowed quick visual comparisons of consecutive time periods. The participants were first asked to repeat the same sequence of activities as normal, and then to walk three steps and stop (remaining quiet). Figure 5 shows an example of the audio data gathered during the first phase of the experiment.
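The analyse-transmit-store steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used in the study: it assumes a one-second WAV file has already been recorded, that the on-node analysis uses the SoX `stat` effect (whose report is written to standard error), that readings travel as JSON, and that the gateway database is SQLite. The function names, the payload fields, and the `readings` table are hypothetical.

```python
import json
import sqlite3
import subprocess
import time


def parse_sox_stat(text):
    """Turn the 'name: value' lines printed by `sox <file> -n stat` into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # not a 'name: value' line
        try:
            # Collapse the padding SoX inserts inside names such as 'RMS     amplitude'.
            stats[" ".join(name.split())] = float(value)
        except ValueError:
            continue  # skip lines whose value is not numeric
    return stats


def analyse_recording(wav_path):
    """Run SoX on one recording (sensor node); the stat report arrives on stderr."""
    result = subprocess.run(
        ["sox", wav_path, "-n", "stat"],
        capture_output=True, text=True, check=True,
    )
    return parse_sox_stat(result.stderr)


def build_payload(node_id, stats, timestamp=None):
    """Pack one analysis result as JSON (hypothetical message format)."""
    return json.dumps({
        "node": node_id,
        "time": time.time() if timestamp is None else timestamp,
        "rms": stats.get("RMS amplitude"),
        "peak": stats.get("Maximum amplitude"),
    })


def store_reading(conn, payload):
    """Unpack a payload on the gateway and append it to a local table."""
    reading = json.loads(payload)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(node TEXT, time REAL, rms REAL, peak REAL)"
    )
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        (reading["node"], reading["time"], reading["rms"], reading["peak"]),
    )
    conn.commit()
```

On the sensor node, `analyse_recording` and `build_payload` would be called once per second inside the infinite loop. The transport itself (for example a socket or an HTTP request over Wi-Fi) is left out because the text does not specify it.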

Results and Analysis
The data collected from the Phase 1 experiment were used to train an NN classifier and context reasoning system. For comparison purposes, graphic representations of the sound waveforms, the logarithm of the filter bank energies, and the mel-frequency cepstrum for the kettle and footstep sounds are displayed in Figure 6 and Figure 7, respectively. The accuracy of the NN classifier was then investigated. In the experiment, the aim was to identify the sound of the kettle boiling, as distinct from other distracting noises such as walking, opening/closing the fridge, filling the kettle with water, switching the kettle on, and staying quiet. Initially, we used only the neural network classifier without the context reasoning. After training with 20 positive and 20 negative examples, we tested the system with 20 positive and 55 negative events based on the data gathered from one individual's home. Next, we included the context reasoning. We trained another two neural network classifiers to identify the water tap sound and the footstep sound (again using 20 positive and 20 negative examples for each case). The logic of the context reasoning was that, if the water tap sound and footstep sound were not detected before a probable detection of the kettle boiling sound, then the probability (or confidence) of the kettle boiling detection was reduced.
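The context reasoning rule described above can be illustrated with a minimal Python sketch. The event labels, the multiplicative `penalty`, and the decision `threshold` are assumptions made for illustration, since the text does not state how much the confidence is reduced. The sketch also assumes that the reduction applies when either cue is missing, which is one possible reading of the rule.

```python
def contextual_confidence(kettle_confidence, prior_events,
                          required=("footstep", "water_tap"), penalty=0.5):
    """Lower the kettle-boiling confidence when expected earlier sounds are missing.

    prior_events holds the labels detected before the candidate kettle event.
    The penalty factor is a hypothetical value, not taken from the paper.
    """
    seen = set(prior_events)
    if all(label in seen for label in required):
        return kettle_confidence  # context supports the detection
    return kettle_confidence * penalty  # context missing: reduce confidence


def is_kettle_boiling(kettle_confidence, prior_events, threshold=0.6):
    """Accept the detection only if the adjusted confidence clears the threshold."""
    return contextual_confidence(kettle_confidence, prior_events) >= threshold
```

With these assumed values, a raw confidence of 0.9 is accepted when footsteps and the water tap were heard beforehand, but it is halved to 0.45 and rejected when the tap sound is absent. This is how such a rule can remove false positives.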
The test results are shown as confusion matrices in Table 4 for the NN without context reasoning, and in Table 5 for the NN with context reasoning. Each table shows the performance of the NN trained to recognise the Boiling Kettle sound, with outcomes identified as "True Positive" (TP) when the NN predicted the Boiling Kettle and the Boiling Kettle sound was presented, "True Negative" (TN) when the NN predicted a non-Boiling Kettle and a non-Boiling Kettle sound was presented, "False Positive" (FP) when the NN predicted the Boiling Kettle and a non-Boiling Kettle sound was presented, and "False Negative" (FN) when the NN predicted a non-Boiling Kettle and a Boiling Kettle sound was presented. The confusion matrix was used to calculate the sensitivity (TP/(TP + FN)), specificity (TN/(TN + FP)), precision (TP/(TP + FP)), and accuracy ((TP + TN)/(TP + FP + TN + FN)). Table 6 shows the performance of the NN without and with the context reasoning system. The statistics indicate that the performance improvement comes from a reduction in false positives when the context is taken into account. We expect that the accuracy could be improved further by designing more complex context reasoning logic. However, placing these percentages in the context of Bayesian statistics, where the sensitivity and specificity are conditional probabilities, the prevalence is the prior probability, and the positive/negative predictive values are posterior probabilities, the model's predictions usually concern unconditional events. The positive predictive value (PPV), defined in (5), can therefore better characterise the overall performance of the classifier when the prevalence is not 50%.
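The measures named above can be computed directly from the four confusion-matrix counts, as in the short Python sketch below. The PPV function uses the standard Bayes form in terms of sensitivity, specificity, and prevalence, which is assumed to correspond to the definition referred to as (5). The counts in the usage note are purely illustrative and are not the values in Tables 4 and 5.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and accuracy from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Posterior probability that a positive prediction is a true event.

    Standard Bayes form: the prevalence acts as the prior probability.
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)
```

For hypothetical counts TP = 8, FP = 1, TN = 9, FN = 2, the sensitivity is 0.8 and the specificity is 0.9. At an assumed prevalence of 10%, the PPV is about 0.47, even though the accuracy on this balanced sample is 0.85. This illustrates why the PPV is the more informative figure when the event of interest is rare.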
The calculated PPV values are shown in Table 7. As expected, the value increased considerably, showing the importance of using a context reasoning system, especially with NNs trained on a small data set. Collating the data from all our test experiments and applying the context reasoning system to our NNs trained to recognise Kettle Boiling, Tap Water, and Footstep sounds, the calculated accuracy and PPV values are displayed in Table 8. In the Phase 2 experiment, when the participant was asked to repeat the same habitual sequence of activities, the system correctly did not trigger any alarm (as the sequence was among the predefined sequences of activities). When the participant walked three steps and stopped (a sequence not in the predefined list, and therefore treated as "abnormal"), an alarm was triggered. These results demonstrate that the system design is capable of recognising the sounds produced and classifying them as normal or abnormal.
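The normal/abnormal decision described above can be sketched as a lookup against the predefined sequences. This is a simplified illustration only: it assumes that detected sounds arrive as a list of labels and that a sequence counts as normal only when it matches a predefined one exactly. The labels and the contents of `NORMAL_SEQUENCES` are hypothetical stand-ins for the activity sequences recorded in Phase 1.

```python
# Hypothetical label sequences standing in for the predefined activities.
NORMAL_SEQUENCES = {
    ("footstep", "water_tap", "kettle_boiling"),
    ("footstep", "fridge_open", "fridge_close", "footstep"),
}


def is_abnormal(observed, normal_sequences=NORMAL_SEQUENCES):
    """Return True when the observed label sequence is not a predefined one."""
    return tuple(observed) not in normal_sequences


def check_sequence(observed):
    """Map an observed sequence to the action the alerting system would take."""
    return "alarm" if is_abnormal(observed) else "no alarm"
```

Under these assumptions, `check_sequence(["footstep", "water_tap", "kettle_boiling"])` yields "no alarm", whereas footsteps followed by silence, as in the "walking three steps and stopping" test, yields "alarm". A deployed system would probably need a more tolerant match, for example one that allows missed detections.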

Conclusions
The goals of this research, and how they were attained, can be summarised as follows: (i) The main goal of this study was to investigate a holistic approach combining acoustic sensing, artificial intelligence, and the Internet-of-Things as a cost-effective means of alerting care providers or relatives when an abnormal event is detected. (ii) We investigated the use of low-cost IoT devices and NN methods to unobtrusively monitor an ambient acoustic domestic environment. A prototype was successfully developed for this purpose. In developing the prototype, we investigated the performance of a Wi-Fi WSN and the semantic LinkSmart IoT middleware as a platform to support on-node real-time audio sample processing and distributed logic in IoT systems, demonstrating its architectural viability for such purposes. (iii) We investigated NN techniques as part of our acoustic processing work and combined the context reasoning system with the NN classifier trained on a small training data set, and good overall accuracy was obtained. Moreover, our work has shown that by contextualising events within a stream of activities, the overall accuracy, and therefore the reliability of the alerting system, can be improved (i.e., the number of false alarms is reduced). Although we have left a more focused and detailed study of the sequential logic, and how it might be refined, for a follow-on project, the findings from this research confirm that it is possible to create a stable, low-cost wireless sensor network for monitoring and modelling ambient acoustic levels using an NN learning model on a distributed Raspberry Pi network over Wi-Fi and LinkSmart middleware.
As we explained earlier in this paper, our main goal was to explore the potential for a combination of low-cost IoT devices, NN models (trained using a small data set), and a context reasoning system to unobtrusively monitor ambient acoustic domestic environments for the purposes of providing care to elderly people. At the core of that work was establishing a set of hardware and software architectural principles (e.g., the nature and distribution of the processing arrangements and AI models) capable of ensuring a deployable system in home care, which we believe this work has achieved.
Clearly, there are many directions that this work could follow in the future, and we suggest nine of them below (numbered i-ix): (i) One direction could include the investigation of more complex acoustic profiles (e.g., by including prior knowledge and context). (ii) A wider range of sound measurements could be captured to allow for a more detailed analysis of different types of sound and a more refined alert system (e.g., associating footsteps with particular people). (iii) To improve the context reasoning system, the sequential logic might be refined through a larger-scale study covering many more events. Likewise, it would be beneficial to (iv) further refine the NN classifier to optimise the network structure, the number of hidden units, and the training features, or make it more sustainable when scaling up the system, and (v) replace the FF-BF NN with a CNN, which is a more accurate NN when recognising picture-type patterns (such as those the mel-frequency cepstrum provides). (vi) Another useful line of research would be to address the cocktail party problem (several sounds mixing with each other) by, say, using blind source separation or other techniques. (vii) HCI is another aspect worthy of further work.
Although the project has focused on how the system could be of benefit in AAL settings, the demonstrated WSN could be easily adapted for different environments and applications that would benefit from remote ambient sound monitoring and alerts. For example, beyond the care of elderly people, the system might be applied to (viii) baby/child monitoring, monitoring children with ADHD (as proposed in [51], where accelerometers and proximity sensors were used), security systems, and even pest control.
Finally, (ix) concerning commercial deployment routes, given the ever-increasing popularity of smart-home voice control systems such as Amazon's Echo, it may be possible to integrate the techniques presented in this paper into such systems, thereby endowing them with a broader sound analysis capability and enabling care services to benefit from the 'economies of scale' associated with commercial smart-home technologies.
Thus, we are confident that our results will encourage further research on applications using systems trained with small data sets, and will support the increasing interest in applying sound analysis to a range of emerging IoT and smart home applications. Such applications will benefit us all, and especially older people who, to date, have not been the main beneficiaries of the technology revolution that is changing our world at an unprecedented rate; in our opinion, this has had a knock-on effect on the gathering of reliable behavioural data. Ultimately, such work would benefit all of us, since most of us are destined to be elderly later in our lives [51].

Figure 3. Floor plan of the experiment setup with two nodes in a domestic environment.

Figure 4. Sensor nodes with the USB microphone and speaker attached.
In Figure 5, the red circle indicates a 'kettle boiling' event as recognised by the NN classifier, and the green circle indicates a 'footstep' event, both within audio gathered during the first phase of the experiment.

Figure 5. Audio signal recorded by Pi 3 and transmitted to the computer for analysis.

Figure 6. The sound waveforms, the logarithm of the filterbank energies, and the mel-frequency cepstrum graphic representation for the kettle sound.

Figure 7. The sound waveforms, the logarithm of the filterbank energies, and the mel-frequency cepstrum graphic representation for the sound of the footsteps.

Table 1. Key characteristics of wireless protocols.

Table 3. Context data structure.

Table 4. Neural network confusion matrix without context reasoning.

Table 5. Neural network confusion matrix with context reasoning.

Table 8. The accuracy and PPV values of the NN classifiers with context reasoning.