Fusion of Unobtrusive Sensing Solutions for Home-Based Activity Recognition and Classification using Data Mining Models and Methods

Abstract: This paper proposes the fusion of Unobtrusive Sensing Solutions (USSs) for human Activity Recognition and Classification (ARC) in home environments. It also considers the use of data mining models and methods for cluster-based analysis of datasets obtained from the USSs. The ability to recognise and classify activities performed in home environments can help monitor health parameters in vulnerable individuals. This study addresses five principal concerns in ARC: (i) users' privacy, (ii) wearability, (iii) data acquisition in a home environment, (iv) actual recognition of activities, and (v) classification of activities from single to multiple users. Timestamp information from contact sensors mounted at strategic locations in a kitchen environment helped obtain the time, location, and activity of 10 participants during the experiments. A total of 11,980 thermal blobs gleaned from privacy-friendly USSs such as ceiling and lateral thermal sensors were fused using data mining models and methods. Experimental results demonstrated cluster-based activity recognition, classification, and fusion of the datasets with an average regression coefficient of 0.95 for tested features and clusters. In addition, a pooled Mean accuracy of 96.5% was obtained using classification-by-clustering and statistical methods for models such as Neural Network, Support Vector Machine, K-Nearest Neighbour, and Stochastic Gradient Descent in the Evaluation Test.


Introduction
Recognising individual activities of people susceptible to hazardous behaviours such as falls, wandering, and agitation has been an active research topic, which has witnessed the use of pervasive and non-pervasive Sensing Solutions (SSs) [1]. Interestingly, many cases of hazardous behaviours in ageing adults can be prevented [2,3]. While there are several SSs that can detect these behaviours when they occur, it would be of great benefit if they could be predicted prior to their occurrence. This may be achieved by using Data Mining (DM) and Machine Learning (ML) models, which can help discover patterns and potential deviations from established patterns in the data gleaned from a sensorised environment.
Pattern Deviation Assessment (PDA) in activity recognition is a vital tool in detecting abnormal activities [4]. Its outcome helps to determine if an ageing individual can be considered to be independent or not whilst performing certain activities [4]. This is an important part of the home-based assessment process to gauge if a person can remain living in their own home. PDA can also help determine the extent of recovery from injury, potential hazardous behaviour, and an individual's effectiveness. Pattern deviation can take forms such as detecting incomplete activities and sudden changes in activity, disposition, and posture. PDA outcomes are often positioned in clusters to help access a set of activities or patterns on demand. The present work benefits from cluster-based analysis of patterns discovered from features extracted from thermal images using DM models and methods.
Research in ARC has often considered the use of wearables such as accelerometers and video-based solutions such as Kinect [5][6][7][8][9][10][11]. Whilst accelerometers can provide information on orientation and angular acceleration of the worn part, wearability and data disruptions are some of the disadvantages. Likewise, Kinect has problems ranging from interference with external infrared sources to privacy and reflections in home environments [12][13][14][15]. This work tackles these problems through the usage of Unobtrusive Sensing Solutions (USSs) such as Infrared Thermopile Array (ITA) thermal SSs, which are non-wearable and not prone to reflections in home environments.
This article extends our previous research in [16] by conducting K-Means analysis on the fused, lateral, and ceiling sensor datasets using DM models. The extended work also includes the use of statistical tools such as an interval plot, a Two-Sample T-Test, and ANOVA to analyse average values of the models. Additional diagrams and annotations are also provided in the present work. The novel contributions of this work are four-fold. First, it presents an unobtrusive data collection through the use of non-wearable (i.e., privacy-friendly) USSs. Secondly, it presents a comprehensive analysis of the data gleaned from two ITA sensors through the use of DM models and methods. Thirdly, it proposes the fusion of data from the ceiling and lateral thermal sensors to address instances of occlusion. Fourthly, it compares the averages of models from the lateral, ceiling, and fused datasets using statistical methods such as T-Test and ANOVA.
The remainder of this paper is organised as follows: Section 2 discusses related work; Section 3 presents the materials and methods; Section 4 presents the experimental results; Section 5 presents discussions around the study findings and conclusions.

Related Work
Many SSs have been deployed over the years for the purposes of activity recognition [17][18][19]. These have included the use of wearable or non-wearable solutions or the fusion of both. Whilst they help data acquisition in the environment where they are deployed, their use in home settings can be negatively influenced by signals from other legacy systems and obstructive materials. Work in [20] proposed the use of a Hidden Markov Model to recognise human activities based on data gleaned from a waist-worn accelerometer. The model also classified collected signals according to a corresponding class. In the study, continuous monitoring was performed by a Gaussian Mixture Model. A further study by Ni et al. [21] used a Multivariate Online Change Detection algorithm for activity recognition.
Accelerometers for activity recognition have been featured in many studies [20,22]. In [23], the use of triaxial accelerometers was proposed for monitoring rest, movement, transition, and emergency states in ageing adults. Although the successful detections of the activities were noted in the study, the ability to distinguish between activities and classify them accordingly was considered for further improvements. In [24], a triaxial accelerometer was used to monitor daily physical activity. In addition to the challenges of the approach presented in [23], wearability was an issue reported in the latter study. Another multi-wearable sensor study was carried out by Gao et al. [22]. Whilst a garment-based accelerometer might exhibit improved performance in a laboratory environment, as illustrated by [22], real-life usage may suffer the risks of explosion or damage to the sensors during washing activity. Additionally, long-term usage can cause a feeling of discomfort for the user.
Activity Recognition and Classification (ARC) through the use of mobile devices has also been researched [16]. Work by Figo et al. [7] explored the use of a smartphone's accelerometer to recognise and classify activities such as running and walking during a certain period of the day. The study obtained information from the GPS sensor to suggest to the user routines similar to those performed in previous days. The work presented by [25] suggested that mobile devices should be optimised to enhance the continuous monitoring and processing of data acquired from their sensors. Whilst these suggestions seem innovative and worthy of exploration, battery life and the users' ability to remember to carry mobile devices around are major setbacks. Furthermore, in Konios et al. [26], a probabilistic examination of temporal and sequential aspects of activities using an approach based on the Cumulative Distribution Function is employed to determine abnormalities in activities. This approach involved deriving probabilities of normal behaviours with respect to the duration and the stages of an activity. Whilst this study introduced an effective way to detect (ab)normal activities, a profile analysis of users aimed at ensuring more precision in detecting the presence of health-related abnormalities is still being researched.
Data fusion from homogeneous and heterogeneous sensors has also been deployed in ARC. Garcia-Constantino et al. [18] investigated the fusion of data from wearable (accelerometer) and ambient (thermal) sensors by extracting relevant features from both. Initial results from this approach indicated an improvement in abnormal behaviour detection.
DM and ML models have positively influenced human activity recognition, clustering, and classification in home settings. In [27], the importance of cluster-based analysis of datasets is stressed beyond homogeneous to heterogeneous datasets, which are prevalent in real-world applications ranging from home-based to digital environments [28]. Work in [29] proposed the use of a Deep Neural Network (DNN) algorithm for home-based activity recognition. The study presented a fused DNN-based architecture that could predict the label of actions performed in a kitchen environment. In [30], dangerous and abnormal behaviours are predicted using a lossless algorithm and situation awareness mechanisms. Whilst many activity monitoring models can exhibit excellent performance in a controlled environment such as laboratories [21], others can only be moderated by trained personnel [31]. This often results in successful laboratory work which cannot be deployed in a real-life setting.
Presently, ARC in a home environment has featured sophisticated SSs. These solutions are often used to acquire data in different areas, including the prediction of prevalence and management of individuals with diseases such as dementia, osteoporosis, and increased fragility [32,33]. They also help to detect hazardous incidents [19]. Nevertheless, data acquisition in a home setting can be negatively influenced by gadgets that can interfere with signal propagation from different SSs. Whilst the many advantages of using a video camera for home monitoring solutions cannot be overstated, lack of privacy protection and changes in lighting conditions are the main concerns for its use. This study was performed to address five principal concerns in ARC: (i) users' privacy, (ii) wearability, (iii) data acquisition in a home environment, (iv) actual recognition of activities, and (v) classification of activities from single to multiple users. Hence, this study presents the fusion of data from unobtrusive (i.e., privacy-friendly) SSs for home-based ARC using DM models and methods.

Materials and Methods
Research in human activity recognition is an important monitoring process in smart homes [31] that has witnessed the use of wearable and non-wearable SSs. In this study, attention was given to privacy-friendly USSs. Additionally, the study was carried out in a smart laboratory kitchen that mimics a typical home kitchen [34]. More than 11,000 thermal blobs were recorded from 10 participants with two Infrared Thermopile Array (ITA) sensors.
Participants were asked to prepare either a cup of tea or coffee.
The present work uses two ITA-32 sensors to monitor and recognise activities in a laboratory kitchen, which is similar to a home kitchen. The two thermal sensors are used simultaneously to address instances of missing thermal blobs due to occlusion. Automated processing techniques are used to synchronise and extract features and to fuse data from both sensors. Contact sensors are used as the baseline to compare their timestamps with those of thermal sensors. The study was carried out in a laboratory kitchen (Figure 1), which measures 3.9 m by 3.4 m. Ten healthy participants took part in the study, and each of them participated in a total of seven experiments. To have a more realistic scenario, participants were allowed to take as long as they wished to complete the activities in each experiment. There were no time constraints or control on the duration of the activities undertaken.
The laboratory kitchen is composed of cupboards (labelled 1-4 in Figure 2) where tea, coffee, cups, and sugar were stored. Underneath the cupboards is a worktop with a microwave, a kettle, and a sink, thus mimicking a real-life kitchen. A refrigerator is located on the floor beneath the worktop, as indicated in Figure 2. The main kitchen area is where participants walked around to prepare a hot beverage (either tea or coffee), which was then taken to the table area for consumption.
In Figure 2, the lateral and the ceiling SSs are represented as T1 and T2, respectively. Whilst T1's indicative coverage included half of the kitchen area, as represented by the triangular shades in Figure 2, T2's coverage included a larger portion of the kitchen area, as indicated by the oval shades. During data acquisition, each participant (one at a time) walked in through door D1 to the main kitchen area where the cups were located. While some participants preferred to boil water in the kettle before going for the cups, others did the opposite. Data acquisition began a few seconds prior to opening door D1, notwithstanding the activity preferences of the participants.
Figure 2. The sensors are indicated by the navy-blue oval shape as T1 and T2 for lateral and ceiling thermal sensors, respectively. The coverage of T1 is indicated by the triangular area, while that of T2 is indicated by the oval area.
Data from T1 and T2 were stored in a bespoke time-series database referred to as SensorCentral [35,36]. A total of 11,980 frames of data (1198 from each participant) were collected from the seven experiments. The contact sensors, which were also associated with the database, were able to record the times when each activity began and ended. Moreover, contact sensors were used as the baseline to compare the timestamps of both types of sensors. They also helped to indicate which of the participants had tea or coffee.
DM tools and algorithms were used to extract features and to fuse data from both sensors. The DM algorithms used included the Hierarchical Clustering Algorithm (HCA) and the K-Means Algorithm (KMA). Metrics such as Classification Accuracy (CA), Specificity, weighted average (F1), Recall, and Area Under the Curve (AUC) were used to ascertain the performance of DM models such as K-Nearest Neighbours (KNN), Logistic Regression (LR), and Neural Network (NN). Others included Random Forest (RF), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM).
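As an illustration of how the metrics listed above relate to one another, the following minimal Python sketch derives CA, Precision, Recall, Specificity, and F1 from predicted and actual labels. It is a simplified two-class version written for this explanation and is not the toolkit used in the study; the function name `binary_metrics` and the sample labels are hypothetical, and the multi-class, weighted-average F1 reported in the paper would require averaging these quantities per class.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common classification metrics for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "CA": (tp + tn) / len(pairs),   # Classification Accuracy
        "Precision": precision,
        "Recall": recall,
        "Specificity": specificity,
        "F1": f1,
    }

# Placeholder labels, not data from the experiments.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(actual, predicted)
```

AUC is omitted from the sketch because it depends on ranked class probabilities rather than on hard labels.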

Results
Experimental results indicated that activities such as using a bottle of milk could be identified and distinguished from using a kettle of hot water (Figure 3) using thermal blobs from T1. While a bottle of milk was seen as monochromatic shades of black due to its low temperature, a kettle of hot water appeared in shades of white due to its high temperature, as presented in Figure 3. Moreover, it is important to note that notwithstanding the closeness of the participants to the thermal sensor (Figure 3), their identities were still protected. The RGB equivalents of the activities such as opening the fridge (Figure 4a), heating a hot kettle (Figure 4b), and having tea or coffee at the kitchen table (Figure 4c) are also presented for comparative purposes.
After preparing a cup of tea, it was straightforward to determine from the thermal blobs whether the user successfully reached the table. In addition, it was necessary to know where the participant placed the hot kettle (after using it), which is a potentially hazardous object. As presented in Figure 5, these activities were clearly viewed on the thermal image. Whilst the hot kettle was represented as a large blob adjacent to the participant, the tea/coffee cup was viewed as a small bright spot in what could be viewed as the hand of the user (Figure 5).
Figure 5. Distinguishable thermal blobs. On thermal_407, the blue arrow points to the hot kettle, the black arrow points to the participant, and the red arrow to the tea/coffee cup after the initial act of tea/coffee making.
In some instances, the heat spot of a cup or kettle may be occluded by a participant when it is viewed from the lateral thermal sensor (see Figure 6). When this happens, abnormal behaviours or activities may go unnoticed. To address these concerns, the ceiling sensor (T2) can be used to collect an aerial view as presented in Figure 7. This illustrates the usefulness of dual sensing in this study.

Figure 7. Heat spots from tea/coffee cups occluded from the lateral thermal sensor (T1) but indicated by the ceiling thermal sensor (T2). The black arrow on thermal_242 points to the location of T1, the white arrow points to the heat spot, and the red arrow points to the hand of the participant (occluding the heat spot).

Sensor Data Fusion
Sensor fusion using DM tools helps to extract and cluster features and to merge data from both SSs. A block diagram of the sensor data fusion architecture employed in this study is presented in Figure 8 [37].
In Figure 8, data acquisition and preprocessing are performed by individual thermal sensors (T1 and T2). Up to 1000 features are extracted from the thermal (grayscale and binary) images. Thermal blobs gleaned from the ITA sensors are stored in a predetermined folder with timestamps to enable a time-based fusion of the data. During sensor fusion, data from T1 and T2 were imported into the data merging system. The system then created an imaginary table for the two sets of data before carrying out a matching row appending. Whilst file-import enables the reading of tabular data and their instances from an Excel spreadsheet or a text document, the image-import toolkit helps upload images from folders. Information such as image width, size, height, path, and name is automatically appended to each image uploaded in a tabular format.
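The time-based fusion and matching row appending described above can be pictured with a minimal Python sketch. It is an illustration only and not the data merging system used in the study: the function name `fuse_by_timestamp`, the row layout (timestamp plus feature list), and the one-second matching tolerance are all assumptions made for this example.

```python
from datetime import datetime, timedelta

def fuse_by_timestamp(lateral_rows, ceiling_rows, tolerance=timedelta(seconds=1)):
    """Append to each lateral (T1) row the features of the ceiling (T2) row
    closest in time, keeping only pairs that fall within `tolerance`."""
    fused = []
    for t1_time, t1_features in lateral_rows:
        # Nearest T2 frame in time; None when the ceiling list is empty.
        nearest = min(ceiling_rows,
                      key=lambda row: abs(row[0] - t1_time),
                      default=None)
        if nearest is not None and abs(nearest[0] - t1_time) <= tolerance:
            fused.append((t1_time, t1_features + nearest[1]))
    return fused

# Illustrative rows: (timestamp, feature list). All values are placeholders.
start = datetime(2019, 5, 8, 10, 0, 0)
lateral_rows = [(start, [0.1, 0.2]),
                (start + timedelta(seconds=5), [0.3, 0.4])]
ceiling_rows = [(start + timedelta(milliseconds=400), [0.9]),
                (start + timedelta(seconds=10), [0.8])]
fused = fuse_by_timestamp(lateral_rows, ceiling_rows)
```

In this sketch a lateral frame with no ceiling frame inside the tolerance window is dropped; a real pipeline might instead keep it with missing values, which is a design choice the source does not specify.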
Preliminary feature extraction was programmed to begin automatically. To ensure that the features are correctly matched, a matching row appending was used. Moreover, definitive feature extraction takes place at a data embedding capsule where up to 1000 features, represented as vectors (n0 to n999), are extracted from each ITA image. The extraction was performed by using the SqueezeNet architecture, a deep neural network model for image recognition [37]. The SqueezeNet architecture contains fewer parameters and therefore requires less bandwidth and communication across servers during a distributed training process [38]. Unlike many sensor fusion or classification architectures that manually allocate clusters to images, the Louvain clustering algorithm [37] was used alongside distance metrics to automatically detect clusters. One of the advantages of using Louvain clustering is that it determines the number of clusters itself. The Louvain clustering algorithm further detects and integrates communities into the module. It also converts grouped features into a KNN graph and optimises their structures to obtain nodes that are interconnected.
Distance metrics, such as the cosine distance, were utilised in the Distances Application (DA). Additionally, feature normalisation, which performed column-wise normalisation for both categorical and numerical data, was applied [37]. The output of DA was connected to the hierarchical clustering module for the classification of the distanced features. Moreover, a dendrogram corresponding to a cluster of similar features from the DA was computed using the HCA. The clusters were primarily affected by resolution and Principal Component Analysis (PCA) parameters. In essence, increasing any of these parameters resulted in a corresponding decrease in the number of clusters that the algorithm detected. Data fusion outputs were viewed using a scatterplot, a data table, and a data viewer widget.
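To make the distance step concrete, the sketch below shows column-wise normalisation followed by a pairwise cosine distance matrix, which is the kind of input a hierarchical clustering step consumes. It is a minimal illustration under stated assumptions: min-max scaling is assumed because the source does not say which normalisation was applied, and the function names and sample vectors are hypothetical.

```python
import math

def normalise_columns(rows):
    """Min-max scale every column to [0, 1] (assumed normalisation scheme)."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def distance_matrix(rows):
    """Pairwise cosine distances between all feature vectors."""
    return [[cosine_distance(a, b) for b in rows] for a in rows]

# Placeholder feature vectors standing in for image embeddings.
features = [[0, 10], [5, 20], [10, 30]]
scaled = normalise_columns(features)
distances = distance_matrix(scaled)
```

Vectors pointing in the same direction have a distance near zero regardless of their magnitude, which is why cosine distance is a common choice for high-dimensional embeddings.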
One of the advantages of the sensor data fusion architecture proposed in this study is the ability to view clusters comprising all similar activities, as presented in Figure 9, even if the activity was performed at different times by different participants. In Figure 9, for example, it could be easily deduced that a participant codenamed C_ID was at the kitchen table with a hot cup of tea/coffee on 8 May 2019, at a different date and time from another participant codenamed C_OR. With this information, activities can be easily monitored in clusters, notwithstanding the times and dates they were performed.
It is important to note that up to 1000 features (labelled n0 to n999) were extracted from each thermal image during the feature extraction process. Using these features, a PCA and scoring of the clusters performed between features n525 and n830 at 99% variance coverage indicated a regression coefficient (r) of 0.98 and 1.00 for Clusters 2 and 12, respectively, as presented in Figure 10.
Similarly, a PCA and scoring analysis performed between features n246 and n170 for Clusters 1, 6, and 9 yielded (r) of 0.83, 0.99 and 1.00, respectively, as presented in Figure 11. These resulted in an average (r) of 0.95 for all the tested features and clusters, which were randomly selected from the HCA interface.
To further ascertain the certainty of the predicted clusters, an Evaluation Test was performed on all the clusters in the HCA using the KNN, LR, NN, and RF models. While KNN yielded the lowest CA of 85.0%, LR and NN gave CAs of 96.1% and 100.0%, respectively, as presented in Table 1. In addition, the proportion of true positives of the positively classified instances (Precision) followed a similar trend as the CA. Furthermore, the NN yielded a value of 100.0% for the AUC, F1, CA, Precision, Recall, and Specificity followed by RF with an average of 99.7%, as presented in Table 1.
LogLoss, also referred to as cross-entropy loss, accounts for the performance of the classification model with respect to its variation from the actual label and was relatively low (less than 0.4%) for all the models (Table 1). NN had the lowest value of 0.001%. While an average regression coefficient of 0.95 was obtained in the PCA and scoring test, an average accuracy of 96.5% was obtained for all the metrics (in Table 1) in the Evaluation Test.
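For readers unfamiliar with LogLoss, the following minimal Python sketch shows how cross-entropy loss is commonly computed from predicted class probabilities. It is a generic textbook formulation and not the evaluation code used in the study; the function name, the clipping constant `eps`, and the sample probabilities are assumptions for illustration.

```python
import math

def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probabilities: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, class_probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when a model is fully confident.
        p_true = min(max(class_probs[label], eps), 1.0 - eps)
        total -= math.log(p_true)
    return total / len(y_true)

# Placeholder predictions for two samples and two classes.
example_loss = log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
perfect_loss = log_loss([0], [[1.0, 0.0]])
```

The loss approaches zero as the model assigns probability close to 1 to the correct class, which is consistent with the very low value reported for the NN.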
Another demonstration of the accuracy of the architecture was in the analysis of the ceiling and lateral thermal sensors data using the K-Means Clustering Method (KMCM). The KMCM is rated as a useful tool capable of providing quantitative and qualitative insight in multivariate analysis [39]. The data fusion and evaluation architecture based on the KMCM [40] is presented in Figure 12.
The KMCM-based architecture (Figure 12) fused thermal blobs data from thermal sensors T1 and T2. The fusion toolkit was linked directly to the image embedder. At the embedder, Inception V3, Google's ImageNet-trained model [41], was used to embed the thermal blobs. KMA performed a maximum of 300 iterations over the data after column normalisation in the K-Means toolkit. The output from the K-Means toolkit was used to train DM models such as KNN, NN, SGD, and SVM based on a 66% training-set size. The evaluation results from the analysis based on a 10-fold cross-validation [42] are presented in Table 2.
Legend for Table 2: KNN = K-Nearest Neighbours, NN = Neural Network, SGD = Stochastic Gradient Descent, SVM = Support Vector Machine, CA = Classification Accuracy, and AUC = Area Under the Curve.
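The iterative behaviour of the K-Means step can be sketched in a few lines of Python. This is a plain textbook version capped at 300 iterations to mirror the setting mentioned above; it is not the K-Means toolkit used in the study. The initialisation from the first k points, the function names, and the sample points are assumptions chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=300):
    """Assign points to k clusters; stops early once assignments settle."""
    # Assumed initialisation: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged before reaching the iteration cap
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Placeholder two-dimensional points forming two obvious groups.
sample_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centres = kmeans(sample_points, k=2)
```

The resulting cluster labels can then serve as class targets for the downstream models, which is the classification-by-clustering idea referred to in the abstract.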
In Table 2, an average accuracy of more than 95% was obtained in all the parameters evaluated. The parameters included AUC, CA, F1, Specificity, Precision, and Recall. Specificity had the highest average accuracy of 99.8%, followed by an AUC of 99.1%. CA and F1 had the lowest accuracy in Table 2, at 95.0%. A closer look at each model indicated that KNN had the lowest accuracies in CA, F1, Precision, and Recall. Although the pooled average in KMCM was the same as PCA's, they cannot be directly compared because different models were used in their analysis. KMCM, however, presented a very useful and explanatory analysis of the datasets compared with HCA.
Another KMCM-based analysis was performed to evaluate the models and parameters for the T1, T2, and fused (F1) datasets. Data from T1 and T2 were analysed separately for the four models: KNN, SGD, NN, and SVM. The evaluation results are presented in Tables 3 and 4 for T1 and T2, respectively.
Legend: KNN = K-Nearest Neighbours, NN = Neural Network, SGD = Stochastic Gradient Descent, SVM = Support Vector Machine, CA = Classification Accuracy, and AUC = Area Under the Curve.
In Table 3, the average accuracies of AUC and Specificity were 99.6% and 99.7%, respectively. Compared with the corresponding values in Table 4 (98.3% and 99.3%), AUC and Specificity had their highest accuracies in Table 3. Additionally, the metrics CA, F1, Precision, and Recall in Table 3 obtained accuracies that were 4.1% higher than those in Table 4. A combination of the averages of all the metrics (excluding LogLoss) in Tables 2-4 is presented in Table 5. In Table 5, a combination of the parameters AUC, CA, Precision, F1, Recall, and Specificity indicated that T1 had the highest accuracy in all the models compared with those from the T2 and F1 datasets. In addition, the Mean accuracy for all the models was 98.0%, 95.0%, and 96.5% for the T1, T2, and F1 datasets, respectively. This implied that T1 obtained the highest Mean accuracy, followed by F1 and then T2. An interval plot further illustrates the Mean accuracy of the T1, T2, and F1 datasets, as presented in Figure 13. It should be noted that the intervals were calculated using the pooled Standard Deviation (SD).
Nevertheless, although the previous analysis indicated a higher Mean accuracy in favour of T1, a one-way ANOVA of the models in the T1, T2, and F1 datasets using Welch's Test at a 95% Confidence Interval indicated that there was no significant difference (p = 0.105) between the average values of the parameters. In addition, Two-Sample T-Tests between T1 and F1 and between T2 and F1 indicated no significant difference between the fused data and those from individual SSs (p = 0.08 and 0.156, respectively). Further analysis with Grubbs' Test on the T1, T2, and F1 datasets at a 5% significance level indicated no outlier in the Mean values of the datasets.
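As an illustration of the kind of two-sample comparison described above, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The accuracy values are hypothetical and do not reproduce the reported p-values; converting the statistic to a p-value requires the t-distribution, which is available in common statistics packages but is left out here to keep the example dependency-free.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-model accuracies (%) for two datasets, for illustration only.
group_a = [98.0, 97.5, 99.0, 97.5]
group_b = [96.0, 97.0, 95.5, 97.5]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```

With only four models per dataset, the degrees of freedom are small, so even a visible gap in means may not reach significance at the 5% level.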
The pooled SD, indicating the weighted average of the SDs for the three groups, yielded a low value of 1.5. In addition, the pooled Mean accuracy of all the models and parameters was obtained as 96.5%. Detailed analyses of the Mean values are presented in Table 6.
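The pooled quantities can be illustrated with the conventional formulas: the pooled SD is the square root of the degrees-of-freedom-weighted average of the group variances, and the pooled Mean is the size-weighted average of the group means. In the sketch below, the three means are the T1, T2, and F1 values quoted in the text, whereas the group sizes (four models each) and the per-group SDs are assumptions made for illustration, so the printed pooled SD is not the reported value.

```python
import math

def pooled_sd(sds, sizes):
    """Pooled SD: sqrt of the (n - 1)-weighted average of the group variances."""
    num = sum((n - 1) * s ** 2 for s, n in zip(sds, sizes))
    den = sum(n - 1 for n in sizes)
    return math.sqrt(num / den)

def pooled_mean(means, sizes):
    """Pooled mean: size-weighted average of the group means."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

means = [98.0, 95.0, 96.5]   # T1, T2, F1 Mean accuracies (%) quoted in the text
sizes = [4, 4, 4]            # assumption: four models per dataset
sds = [1.0, 2.0, 1.5]        # hypothetical per-group SDs, not from the study

overall_mean = pooled_mean(means, sizes)
overall_sd = pooled_sd(sds, sizes)
print(overall_mean, round(overall_sd, 4))
```

With equal group sizes, the pooled Mean reduces to the simple average of the three dataset means, which gives 96.5%.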

Discussion and Conclusions
This study presented the fusion of data gleaned from USSs for the purposes of recognising and classifying indoor activities in home environments. It considered the use of DM models and methods for the cluster-based analysis of data obtained from the USSs. Results from data analysis demonstrated a pooled Mean accuracy of 96.5% for all the models and metrics considered in the study. Although the Mean accuracy in F1 data was slightly lower than in T1, a one-way ANOVA of the T1, T2, and F1 datasets indicated no significant difference between their Mean values. In addition, data fusion provided more information on instances of occlusion, which can make an incident go unnoticed.
The advantages of the proposed method over other indoor activity recognition research [33,43] include privacy-friendly postures and better accuracy. The accuracies obtained in this work can be compared with those obtained in [44], which used channel state information of a WiFi system to recognise activities such as lying down, standing, and walking. While the WiFi-based system has no information on the postural orientation of participants or the presence of hazardous objects, our model included privacy-friendly postures. Knowledge of the pose of room occupants and the surrounding objects can give further details, such as hot liquid spills, which can be hazardous to vulnerable individuals. The application of this study to smart homes and healthcare facilities can help encourage independent living [45-47].
One of the limitations of this study is the use of the contact sensors to determine if an occupant drank tea or coffee during the experiments, since both tea and coffee were placed in the same cupboard. This implies that, depending on the data from the thermal sensors alone, it would be difficult to determine if an occupant had tea or coffee. In a real-life setting, however, this confusion could be resolved if tea and coffee are placed in separate cupboards that are more than 1 m apart. Another challenge with using the thermal sensors only, without the contact sensors, is determining if the occupant used milk or cold water when both are placed in similar containers. To address this limitation in a real-life application, milk and cold water should be placed in containers of different sizes so that their blobs can be easily differentiated.
In conclusion, this study presented the use of low-cost unobtrusive (privacy-friendly) SSs for indoor ARC in a laboratory kitchen environment similar to a home environment. Experimental results indicated instances of activity recognition during activities such as making a cup of tea/coffee and classification of the same actions using DM models and methods with a pooled Mean predictive accuracy of 96.5%. Future studies will calculate the speed and range of these activities, including the use of DM tools to score and evaluate their performance.

Figure 1. Pictorial view of the smart laboratory kitchen used for the study. A detailed description of the kitchen layout is presented in Figure 2.

Figure 2. Laboratory kitchen layout. The areas marked in red indicate the location of the contact sensors. Thermal sensors are indicated by the navy-blue oval shape as T1 and T2 for lateral and ceiling thermal sensors, respectively. The coverage of T1 is indicated by the triangular area, while that of T2 is indicated by the oval area.

Figure 3. Thermal blobs of a bottle of cold milk (shades of black) distinguishable from a hot kettle (shades of white).

Figure 4. RGB equivalents of activities: (a) opening the fridge, (b) heating a hot kettle, and (c) having tea or coffee at the kitchen table.

Figure 5. Distinguishable thermal blobs. On thermal_407, the blue arrow points to the hot kettle, the black arrow points to the participant, and the red arrow points to the tea/coffee cup after the initial act of tea/coffee making.

Figure 6. Thermal images from the lateral sensor indicating instances of occluded tea/coffee cups that are visible on the ceiling sensor. Refer to the thermal blobs with the same names (thermal_345, thermal_357, thermal_587, and thermal_598) in Figure 7.

Figure 7. Heat spots from tea/coffee cups occluded from the lateral thermal sensor (T1) but indicated by the ceiling thermal sensor (T2). The black arrow on thermal_242 points to the location of T1, the white arrow points to the heat spot, and the red arrow points to the hand of the participant (occluding the heat spot).

Figure 10. Features-based Principal Component Analysis and scoring of clusters. Features n525 and n830 are indicated on the X- and Y-axes, respectively. The clusters are colour coded, and the colour of each regression line on the graph matches the colour on the cluster legend on the right.

Figure 11. Features-based Principal Component Analysis and scoring of clusters. Features n170 and n246 are indicated on the X- and Y-axes, respectively. Additionally, the clusters are colour coded, and the colour of each regression line on the graph matches the colour on the cluster legend on the right.

Figure 12. Simplified data fusion architecture based on the K-Means Clustering Method (KMCM).

Figure 13. Interval plot of Lateral, Ceiling, and Fused datasets computed from their pooled standard deviation.

Table 1. Evaluation results from data mining models for parameters such as AUC, CA, F1, Precision, Recall, LogLoss, and Specificity.

Table 2. K-Means evaluation results for fused datasets (F1) using data mining models such as KNN, SGD, NN, and SVM.

Table 3. Evaluation results for Lateral Sensor (T1) data using the K-Means Clustering Method (KMCM).

Table 6. Detailed analyses of Mean values from Lateral, Ceiling, and Fused datasets using one-way ANOVA.