SemImput: Bridging Semantic Imputation with Deep Learning for Complex Human Activity Recognition

The recognition of activities of daily living (ADL) in smart environments is a well-known and important research area, which presents the real-time state of humans in pervasive computing. The process of recognizing human activities generally involves deploying a set of obtrusive and unobtrusive sensors, pre-processing the raw data, and building classification models using machine learning (ML) algorithms. Integrating data from multiple sensors is a challenging task due to the dynamic nature of data sources. This is further complicated by semantic and syntactic differences between these data sources. These differences become even more problematic if the generated data are imperfect, which ultimately has a direct impact on their usefulness in yielding an accurate classifier. In this study, we propose a semantic imputation framework to improve the quality of sensor data using ontology-based semantic similarity learning. This is achieved by identifying semantic correlations among sensor events through SPARQL queries and by performing a time-series longitudinal imputation. Furthermore, we applied a deep learning (DL) based artificial neural network (ANN) on public datasets to demonstrate the applicability and validity of the proposed approach. The results showed higher accuracy with the semantically imputed datasets using the ANN. We also present a detailed comparative analysis against the state-of-the-art from the literature. We found that our semantically imputed datasets improved the classification accuracy, reaching up to 95.78%, thus demonstrating the effectiveness and robustness of the learned models.


Introduction
Over the past few decades, a rapid advancement has been observed in pervasive computing for the assessment of the cognitive and physical well-being of older adults. For this purpose, monitoring of Activities of Daily Living (ADLs) is often performed over extended periods of time [1]. This is generally carried out in intelligent environments containing various pervasive computing and sensing solutions. Recognition of ADLs has been undertaken across a wide variety of applications including cooking, physical activity, personal hygiene, and social contexts. Generally, solutions for recognizing ADLs are underpinned by rule-based or knowledge-driven approaches supported by conventional Machine Learning (ML) algorithms [2,3]. In such environments, the embedded or wireless sensors generate high volumes of streaming data [4], which in a real-world setting can contain large amounts of missing or duplicate values [5]. Such noisy and imprecise data can be one of the major causes of erroneous classification or imprecise recognition. Conversely, several challenges also exist when coping with missing values; an efficient mechanism for imputing the sensory data is thus required.

Problem Statement
In this section, we first introduce key definitions, which are carried throughout the paper. These definitions are necessary for understanding the concepts referred to in this paper. Later, an illustrative example is presented to describe the research problem for HAR addressed in this study.

Some Definitions
In this section, we first give preliminary definitions of the problems that the methodology aims to address, and then introduce the notion of semantic imputation. Let T_d = {t_1, . . . , t_p} be the set of training tuples for a dataset D_n containing missing attributes or values. Let t_m ∈ T_d be a tuple with q attributes A_1, . . . , A_q, which may have one or more missing attributes or values. Let t_ma be the missing attribute A and t_mv be the missing value on attribute A, where A ∈ {A_1, . . . , A_q}. Given a candidate imputed set, t_m = m_1(t_ma ∪ t_mv) for the possible missing attributes or values of t_m.
Definition 3. (Ontology) A core ontology is a structure O := (C, ≤_C, R, σ, ≤_R) consisting of two disjoint sets of concept identifiers C and relation identifiers R, a partial order ≤_C on C, called the concept hierarchy or taxonomy, a function σ representing the signature, and a partial order ≤_R on R defining the relation hierarchy.
Definition 6. (Conjunctive Query) Conjunctive queries Q produce answers by identifying attributes or their values, and can be written as

∀Ā. R̄(Ā, C̄_k) ∧ not(N̄(Ā, C̄_k)) (1)

where Ā represents the vector of attributes (A_1, . . . , A_q), C̄_k the vector of concept instances, R̄ the conjoined predicates (relations), and N̄ the vector of disjoined predicates (relations).

Problem Formulation: Semantic Imputation
A Knowledge Base is a consistent structure K = (T, A), and we revise the ABox A to A_I such that K = (T, A_I) is also consistent. Since

A_m = D_n \ A (2)

I(A_m) = I_SS(A_m) + I_SI(A_m) + I_L(A_m)

where A_m represents the missing attributes or their values, and I_SS(A_m), I_SI(A_m), and I_L(A_m) measure the structure-based, instance-based, and longitudinal imputations for the missing attributes and their values, respectively. Hence, we define our problem as a 4-tuple (D, K, Q, I) such that D denotes the input data, modeled over the ontology O with assertion set A, which is retrieved using conjunctive queries Q, with the results used to perform the semantic imputation I(A_m), introducing improved assertions A_I. We ensure that, during the whole process, K remains consistent with the addition of the imputed assertions A_I.
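As a minimal sketch of the decomposition above, the imputation score I(A_m) can be computed by summing the three component measures. The component functions and their values here are hypothetical placeholders standing in for the structural, instance-based, and longitudinal measures defined later in the paper.

```python
def imputation_score(a_m, i_ss, i_si, i_l):
    """Combine the three imputation measures for missing attributes A_m,
    mirroring I(A_m) = I_SS(A_m) + I_SI(A_m) + I_L(A_m)."""
    return i_ss(a_m) + i_si(a_m) + i_l(a_m)

# Toy component measures (assumed values, for illustration only).
score = imputation_score(
    a_m={"SensorKitchenMovement": None},  # a missing sensor state
    i_ss=lambda a: 0.4,   # structure-based similarity contribution
    i_si=lambda a: 0.35,  # instance-based similarity contribution
    i_l=lambda a: 0.2,    # longitudinal (LOCF/NOCB) contribution
)
```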

Preliminaries of Sensing Technologies
In this section, we describe the nature of available HAR public datasets D n with underlying sensing technologies. These can be differentiated into two broad categories of unobtrusive and obtrusive activity sensing based on the wearables and data sources. We, therefore, provide a brief description of both categories using UCamI [19], Opportunity [20], and UCI-ADL [21] public datasets for their distinct sensing functionalities, signal type, sampling frequencies, and protocols.

Unobtrusive Sensing
Unobtrusive sensing enables continuous monitoring of activities and physiological patterns during the daily life of the subject. These deployments most often involve binary sensors (BinSens), PIR sensors, and pressure sensors embedded within smart objects or the ambient environment. BinSens generate an event stream comprising binary values and operate on the principles of the Z-Wave protocol, implemented through unobtrusive wireless magnetic sensors. This can be explained through the Prepare breakfast example in Figure 1. For the 'Pantry', 'Refrigerator', and 'Microwave' objects, the Open state means the magnets are detached and the objects are in use, whereas the Close state shows they are not in use. The inhabitant's movements are recorded at a sample rate of 5 Hz using the ZigBee protocol implemented in 'PIR sensors' such as the 'Sensor Kitchen Movement' [22], which also produce binary values, Movement or No Movement. The presence of an inhabitant on the 'Sofa', 'Chair', and 'Bed' objects is collected via the Z-Wave sensing protocol, implemented through the 'Textile Layer Sensors', which produce the binary values Present or Not present. Similarly, a continuous stream of data is also observed for unobtrusive spatial data gathered through the suite of capacitive sensors installed underneath the floor. The dataset generated through the BinSens is challenging in nature, as the duration of the generated stream may be instantaneous, lasting a few seconds, or may continue for hours. As shown in Figure 1, filling the gaps between two states for BinSens is also challenging, since every BinSens has a different operational nature and state transition time depending on the activities performed.
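The gap between a sensor's Open and Close events described above can be illustrated with a small sketch. The event stream and sensor names below are hypothetical, modeled on the Prepare breakfast example; a missing endpoint yields an unknown duration, which is exactly the gap the imputation stage must fill.

```python
from datetime import datetime

# Hypothetical BinSens event stream: (timestamp, sensor, state) tuples.
events = [
    (datetime(2017, 11, 10, 8, 0, 0), "Refrigerator", "Open"),
    (datetime(2017, 11, 10, 8, 0, 12), "Refrigerator", "Close"),
    (datetime(2017, 11, 10, 8, 1, 3), "Microwave", "Open"),  # Close never observed
]

def state_duration(events, sensor):
    """Return seconds between a sensor's Open and Close events, if both exist."""
    opened = closed = None
    for ts, s, state in events:
        if s == sensor and state == "Open":
            opened = ts
        elif s == sensor and state == "Close":
            closed = ts
    if opened and closed:
        return (closed - opened).total_seconds()
    return None  # one endpoint missing: the gap must be imputed

print(state_duration(events, "Refrigerator"))  # 12.0
print(state_duration(events, "Microwave"))     # None
```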

Obtrusive Sensing
The proximity data from the Bluetooth Low Energy (BLE) beacons are collected through an Android application installed on the smart-watch at a sample rate of 0.25 Hz [22]. BLE beacons are measured through RSSI: the smaller the distance between an object and the smart-watch, the higher the RSSI value, and vice versa. BLE beacons are used for 'Food Cupboard', 'Fridge', 'Pot Drawer', etc., in the Prepare breakfast activity example in Figure 1. Ambulatory motion is represented by acceleration data, which is again gathered through the Android application installed on the smart-watch. The 3D acceleration data are collected continuously at a sampling frequency of 50 Hz. Such acceleration data [20] are also measured through body-worn sensors, object sensors, and ambient sensors, which measure 3D acceleration using inertial measurement units, 3D acceleration with 2D rate of turn, and 3D acceleration with multiple switches, respectively.

Methodology
In this section, we present the proposed methodology: the overall functional architecture and workflow in Section 3.1; an ontology model to represent the activities in Section 3.2; and the specially designed SPARQL queries for semantic segmentation in Section 3.3. Ontology-based complex activity identification and conjunction separation for semantic data expansion are explained in Section 3.4. An algorithm to perform semantic imputation is then described in Section 3.5. Lastly, the classification method describing HAR using DL-based ANNs is presented.

High-Level Overview of the SemImput Functional Framework
The presented work describes a layered Semantic-based Imputation (SemImput) framework, which supports an innovative means to synchronize, segment, and complete the missing sensor data. This is achieved by automatically recognizing the indoor activities within the smart environment. The architecture depicted in Figure 2 comprises (a) the Data Sensing and Representation Layer, designed to capture data; (b) the Semantic Segmentation Layer, which segments the data based on timestamps over 1-second intervals; (c) the Semantic Expansion Layer, which segregates the concurrent activities represented by separate features into a sensor event matrix; (d) the Semantic Imputation Layer, responsible for filling the missing data and sensor states of a periodic nature and providing continuity to the data using the proposed strategies; (e) the Semantic Vectorization Layer, which receives the filled sensor event matrix and generates vector sets; and (f) the Classification Layer, which uses a neural network to classify the augmented semantic vectors for evaluation purposes.

Data Sensing and Representation
The Data Sensing and Representation layer utilizes the sensor streams, which are simulated over a dynamic sliding window. We used ontological constructs, derived through data-driven techniques, for representing sequential and parallel activities. This layer is encapsulated by the newly modeled set of OWL2 Semantic Imputation Ontologies (SemImputOnt) to map sensory data. It models sensor streams, identifies patterns, and discovers the overlapping temporal relations in them. It supports generality in terms of data semantization [23], offers more expressiveness, and helps in decoupling the concurrent fragments of sensor data better than non-semantic models. It not only provides a basic model for representing atomic and complex ADLs but also supports the expansion of dataset instances through SPARQL queries.

Taxonomy Construction
We followed and utilized data-driven techniques to model sensor streams for identifying complex concurrent sensor temporal state patterns. These state patterns become the basis for the parallel and interleaved ADLs, which are of static and dynamic nature as mentioned in Table 1. An ontology engineer utilizes the complete knowledge of the involved sensors and the nature of the data produced by them. In addition, the core vocabulary required to model and design the SemImputOnt is obtained through the temporal patterns of sensor stream data, describing the complex ADLs' main class definitions. The descendants of these main classes, however, have been described to model each sensor object, which generates discrete or continuous sensory data. These primitive classes are related to ADLs using "SensorStateObject" properties. Object properties such as hasBinarySensorObject show the relationship between the ADL and the core sensor object defining its state. Again, the state is linked by a property hasBinarySensorState with SensorStateObjects. Similarly, the other obtrusive sensor objects have the properties hasAccelerometer and hasBLESensor, with the hasRSSI data property. All these sensor objects define the ADL with open intervals without any prior knowledge of Start-time or End-time [1]. The temporal relations for each sensor object are obtained using the object properties hasStartTime and hasEndTime.
How comprehensive SemImputOnt is at representing disjoint ADLs can be visualized and explained through the example of the activity Breakfast modeled in Figure 3. In this example, the ADL Breakfast is represented as a class. The ADL Breakfast is a descendant of the Activities class, defined as an equivalent class relating to instances of BinarySensorObject, BinarySensorState, Accelerometer, Devices, FloorCapacitance, BLESensors, and DaySession. This means that, to be a member of the defined class Breakfast, an instance of the Activities class must have a property of type hasBinarySensorObject, which relates to an instance of the SensorKitchenMovement class, and this property can only take as value an instance of the SensorKitchenMovement class. The instance of the Activities class must also have a property of type hasBinarySensorState, which relates to an instance of the Movement class or the NoMovement class, and this property can only take as value an instance of one of them. The instance of the Activities class must also have a property of type hasAccelerometer, which relates to an instance of the x class, y class, and z class. This property must only relate to the instances of these three classes. The instance of the Activities class must also have a property of type hasDevice, which relates to an instance of the Device1 class and Device2 class. This property must only relate to the instances of these two classes. The instance of the Activities class must also have a property of type hasFloorCapacitance, which relates to an instance of the C1 class, C2 class, C3 class, C4 class, C5 class, C6 class, C7 class, and C8 class. This property must only relate to the instances of these eight classes. The instance of the Activities class must also have a property of type hasBLESensor, which relates to an instance of the Tap class, FoodCupboard class, Fridge class, and WaterBottle class for this example.
This property must only relate to the instances of these four classes and every class must also have a property hasRSSI, which relates to the instance of RSSI class. Moreover, the instance of the Activities class must also have a property of type hasDaySession, which relates to an instance of the Morning class and only to an instance of the Morning class. Thus, if an instance of the Activities class fulfills the seven existential restrictions on the properties hasBinarySensorObject, hasBinarySensorState, hasAccelerometer, hasDevice, hasFloorCapacitance, hasBLESensor, and hasDaySession, the instance will be inferred as being a member of the Breakfast class.

Concurrent Sensor State Modeling
The object properties introduced in SemImputOnt as existential restrictions support the management of concurrent and sequential sensor states, as explained in the Breakfast activity model example. These properties not only describe the hierarchy of sensor object states and their actions by establishing object-data relationships but also support augmenting the incomplete sensor sequences using SPARQL queries. Moreover, the relationships also provide support while generalizing data-driven rules, as shown in the anonymous equivalent class for the activity Breakfast. These rules map sensor states in SemImputOnt to model an activity rather than tracking rigid sensor state patterns. These sensor state patterns are identified and linked to their respective timestamps using temporal datatype properties such as hasStartTime and hasEndTime. SemImputOnt comprehensively models sensor situations using sensor state concepts independently and concurrently by exploiting their relationships using Allen's temporal operators [15].

Semantic Segmentation
The Semantic Segmentation Layer in the SemImput framework describes the ontological operations that illustrate the modeling patterns of ADLs, by observing them in a sliding window. The first step is to retrieve and synchronize the non-segmented sensor state instances obtained from obtrusive and unobtrusive data sources along with their temporal information. We used a non-overlapping, static sliding time window [24] approach, in which each sensor state is identified by a timestamp. For this, we used a set of 9 SPARQL-based query templates for retrieving and interpreting rules to deal with the underlying temporal sensor state relations, as well as their structural properties. Moreover, the SPARQL queries require additional parameters in order to correlate, interpret, and aggregate sensor states within the endpoints of the sliding window [25]. Some of the initializing parameters include the start-time, end-time, and a list of sensors within the sliding window identified based on the start-time and datatype properties. These parameters provide support for manipulating concurrent sensor states, which are expanded and imputed as illustrated in further sections. SemImputOnt is also used for validating temporal constraints and for the verification of property values within a sliding window [26]. The sensor state endpoints are retrieved through the following custom set of conjunctive ABox SPARQL queries CQ, where cq_i ∈ CQ, over the sliding time window, whereas the concurrent sensor states are retrieved through the following SPARQL-based query templates, which are coincidental at their:
• cq_6: start-time, with still Open sensor states
• cq_7: start-time, but Closed sensor states
• cq_8: end-time, with still Open sensor states
• cq_9: end-time, but Closed sensor states
The SPARQL query cq_1 refers to the identifiers of the SemImputOnt-retrieved instances, which are still active but are yet to finish.
These states are identified based on their initialization timestamps represented by the start-time. The query cq_2 retrieves SemImputOnt instances having both endpoints identified by start-time and end-time. The query cq_3 retrieves the start-time of the sensor initialization, which may deactivate and at the same time become active in the current sliding time window. The query cq_4 retrieves a sensor state that has just started in the sliding window; this query provides the start-time. The query cq_5 is a specially designed query to monitor a sensor state that is currently active in the sliding window and changes to a deactivated or off state; this query retrieves the end-time for such a state transition. The query cq_6 retrieves active concurrent sensor states for more than one sensor, based on the start-time within the current sliding time window, which are yet to finish. The query cq_7, on the other hand, fetches the start-time for such concurrent sensors that have closed states with valid end-times. Similarly, the queries cq_8 and cq_9 retrieve the active and inactive concurrent sensor states based on some end-time data value, respectively. The above-mentioned queries cq_3, cq_4, and cq_6 are responsible for initializing a separate thread to monitor and keep track of sensor states that are to become inactive by identifying the end-time.
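The window-endpoint semantics behind some of these queries can be sketched in plain Python. The ABox here is reduced to a hypothetical list of (sensor, start_time, end_time) tuples, with end_time None for still-open states; the paper issues the actual queries as SPARQL over SemImputOnt, so this is only an illustration of the filtering logic of cq_1, cq_2, cq_4, and cq_5.

```python
def window_states(abox, w_start, w_end):
    """Classify sensor states against a sliding window [w_start, w_end]."""
    # cq_1-like: states that started but have no end-time yet (still active)
    active = [s for s, st, et in abox if st <= w_end and et is None]
    # cq_2-like: states with both endpoints inside the window
    complete = [s for s, st, et in abox
                if et is not None and st >= w_start and et <= w_end]
    # cq_4-like: states whose start-time falls in the window
    just_started = [s for s, st, et in abox if w_start <= st <= w_end]
    # cq_5-like: states whose end-time (deactivation) falls in the window
    just_ended = [s for s, st, et in abox
                  if et is not None and w_start <= et <= w_end]
    return active, complete, just_started, just_ended

# Hypothetical ABox snapshot (times as integers for brevity).
abox = [("Fridge", 1, 3), ("PIR_Kitchen", 2, None), ("Microwave", 0, 5)]
active, complete, started, ended = window_states(abox, 1, 4)
```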
The segments returned through the SPARQL queries may be considered complete if they contain both endpoints, represented by dissimilar sensor states. If one of the endpoints is missing, however, the segment becomes anomalous or erroneous in the sensor stream data. Such erroneous behavior is identified using semantic data expansion and resolved through semantic imputation.

Semantic Data Expansion
The proposed set of SemImputOnt models sensor objects (concepts and properties) and their states (instances) from the segmented D_n datasets. It not only maps sensor streams but also captures structure, preserving the associations within the sensor state instances using a data-driven approach. A structure-preserving transformation encompasses each sensor object, their associations, and the subsumptions relating to different concurrent activities [27]. These preserved semantics and associations are separated by analyzing the complex activity structures. The separation process includes the conversion of these semantics into distinct columns, while the conjunctions between them provide the essential existential conditions for representing activities in a matrix.

Ontology-Based Complex Activity Structures
To encode more detailed structure, SemImputOnt uses primitive and defined concepts with value restrictions and conjunctions as concept-forming operators. These value restrictions are enforced through classifiable attributes (roles) and non-classifiable attributes (non-definitional roles) to model the HAR datasets. In SemImputOnt, primitive concepts (Activities) provide necessary conditions for membership, whereas defined concepts (Sensors, Objects, Data sources) provide both necessary and sufficient conditions for membership:

A ⊑ C (4)
A ≡ C (5)

where A is any Activity name, and C is a primitive concept or a defined concept, as given in Equations (4) and (5), respectively. These concepts are used to form an expression, which can be either a sensor state or a conjunction of sensor states, with or without a value restriction, where A_1, A_2 are attributes, R is a conjoined predicate, and C_1, C_2 are concept instances forming the expressions.
Utilizing the Description Logic (DL) notations, the example Breakfast activity from the UCamI dataset can be described in a DL expression as:

Breakfast ≡ ∃hasBinarySensorObject.SensorKitchenMovement ⊓ ∃hasBinarySensorState.(Movement ⊔ NoMovement) ⊓ ∃hasAccelerometer.(x ⊓ y ⊓ z) ⊓ ∃hasDevice.(Device1 ⊓ Device2) ⊓ ∃hasFloorCapacitance.(C1 ⊓ . . . ⊓ C8) ⊓ ∃hasBLESensor.(Tap ⊓ FoodCupboard ⊓ Fridge ⊓ WaterBottle) ⊓ ∃hasDaySession.Morning

whereas the same activity Breakfast is described analogously using the DL attributes from the UCI-ADL dataset. In both expressions, the activity Breakfast is represented by different concept attributes modeled into their corresponding ontologies in SemImputOnt. It is evident that this activity is represented by different sets of underlying ontological concepts depending upon the nature of the sensors deployed for acquiring the datasets for that activity. Keeping the same definition of each activity represented by different underlying constructs may result in recognition performance degradation. For this reason, they are defined separately, as the focus of the study is to fill in the gaps for missing sensor states.
The primitive concepts are mapped into partial concepts using Web Ontology Language (OWL), which are encoded with rdfs:subClassOf construct (Equation (4)). In addition, the defined concepts are mapped to complete concepts in OWL, which are encoded as class equivalence axioms represented as owl:equivalentClass (Equation (5)). The concept names and concept conjunctions are mapped to class names and class intersections in OWL, respectively, whereas roles are mapped with object properties. These primitive and defined concepts definitions map the data instances into SemImputOnt models for representing complex activities.

Conjunction Separation
The concepts expressed in DL for the Breakfast definition use conjunctions to relate the sensor state events [28]. The Breakfast equivalent class forms a complex activity involving several class concepts, relationships (object and data properties), and data instances. All the involved class concepts coupled with conjunctions defining the activity equivalent classes are transformed into independent entities by separating them at the involved conjunctions [14]. Conjunction separation emphasizes the idea of separating the concepts (ϕ, ψ, ω, χ, . . .) over the intension I. These independent entities are transformed into multi-dimensional vectors representing the features from all sensor states for a particular activity w.r.t. the associated timestamps. The size of the multi-dimensional vector may vary for each activity, based on the conjunctive class concepts learned through the data modeled over SemImputOnt.
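As a minimal sketch of conjunction separation, an equivalent-class definition written as a conjunction of (property, filler) pairs can be split into independent feature columns. The property and filler names below follow the Breakfast example; the column-naming scheme is an illustrative assumption, not the paper's exact encoding.

```python
# Hypothetical conjunctive definition of the Breakfast equivalent class.
breakfast_definition = [
    ("hasBinarySensorObject", "SensorKitchenMovement"),
    ("hasBLESensor", "Fridge"),
    ("hasDaySession", "Morning"),
]

def separate_conjunctions(definition):
    """Turn each conjunct into its own feature column (initially unfilled),
    so each sensor state becomes an independent dimension of the vector."""
    return {f"{prop}.{filler}": None for prop, filler in definition}

columns = separate_conjunctions(breakfast_definition)
```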

Feature Transformation
The predicates separated in the previous step produce a row vector identified by a single activity label, whereas each column represents a class concept with a state as an instance. These predicates in the feature space represent activities along the timeline. The features ensure the reliability of the activities through mappings with SemImputOnt [12,28]. In our case, SemImputOnt supports the essential properties while generating and validating the data into ABox A features, as shown using an example from the UCamI dataset.

Semantic Data Imputation
The resulting n-dimensional feature vector matrix has missing sensor states (Null), which lead to a loss in efficiency for the activity classification model. Such losses can be addressed with suitable imputation techniques, which semantically enrich the expanded data by filling in the missing sensor states. We propose a semantic imputation algorithm to capture the temporally missing sensor states semantically and perform an overall feature vector matrix enrichment [29]. We adapt two similarity-based methods and a time-series longitudinal imputation strategy to assess the similarity of the concepts T and instances A for the imputation I(A_m), as described in Algorithm 1, excerpted here:

for all timestamps t = 1 to T do
    function ImputeBinSens(A_m, CQ, A, T)    ▷ BinSens attributes with their state imputation
        for each cq_i ∈ CQ do
            BinSens_Attrib ← execute(cq_i).filter(BinSens, A_m)    ▷ using SPARQL queries
            BinSens_Target ← execute(cq_i).filter(BinSens_Attrib, T)

    function ImputeFloor(A_m, CQ, A)    ▷ imputation for Floor sensors and their values
        for each cq_i ∈ CQ do
            . . .
    A_Acc ← Update(A_mAcc ∪ mean(acc_tuples))    ▷ update using the mean of the last 10 tuples
    Return imputed A_Acc
A_m^Imp ← A_BS ∪ A_Prox ∪ A_floor ∪ A_Acc

The structural patterns in the TBox (T) are identified and exploited using SPARQL queries over SemImputOnt. These queries retrieve T assertions based on the query criteria to measure semantic similarity with target activity patterns. However, choosing a suitable pattern from the target activities and selecting the appropriate sensor state to fill in the missing ones is addressed through a structure-based similarity measure. We define the structural similarity function for a target description A_n and an activity A_m with missing attributes so as to identify the maximum-probability candidate. It returns semantically equivalent sensor states where the child nodes of the two concepts are similar [30]. We use the Tanimoto coefficient between A_n and A_m for measuring the structural similarity, where A_n gives the binary description of the involved sensors and A_m the available sensor predicates for the activity with missing predicates:

I_SS(A_m) = (A_n · A_m) / (|A_n|^2 + |A_m|^2 − A_n · A_m)

The I_SS(A_m) function determines the structural similarity between the target A_n and A_m; the higher its numerical value, the closer the structural description of the A_m instance is to the A_n description [31,32]. As a result, structural attributes are suggested for a tuple A_m with missing attributes.

Instance-Based Imputation Measure
The ABox A comprises a finite set of membership assertions referring to the concepts and membership roles of the respective TBox T. The set of assertions A for the UCamI dataset is represented as A ← (ts, r_s, R_i, V_i), where each assertion is a combination of a sensor r_s with a certain state V_i at a timestamp ts.
Instance-based similarity exploits neighborhood similarity, measured through the function Sim_I(A_n, A_m), where m is the mapping between A_n and A_m in conjunction with concept-to-concept and role-to-role correspondences, and A_n ⊎ A_m represents the disjoint union of the memberships pertaining to the concepts and their roles. Thus, the instance with the highest similarity value is chosen to supply the attribute states to be imputed for a tuple A_m with missing states.

Longitudinal Imputation Measure
The quality of the data resulting from structure- and instance-based imputation, in matrix form, is further improved by using the classical techniques of Last Observation Carried Forward (LOCF) and Next Observation Carried Backward (NOCB). LOCF and NOCB are applied to the data in an observable manner by analyzing each longitudinal segment, as described in Equation (7), for activity states retrieved through SPARQL queries. While observing the binary sensors and their states in the time-series longitudinal segments, it is observed that the sensor states are triggered only once, either for activation or deactivation. For example, the object Washing Machine in the UCamI dataset has a contact-type sensor with an Open state at T_1 = 2017-11-10 13:37:56.0 and a Close state at T_2 = 2017-11-10 13:38:39.0. In this case, while synchronizing this sensor's data with other states per unit time, Null values appear after T_1 until T_2, as the states are triggered only once. For LOCF, a sample-and-hold method is applied, which carries forward the last available sensor state and imputes the Null values with it. Similarly, NOCB imputes the missing values from the next available state, which is carried backwards. The missing states for proximity sensors in the case of the UCamI dataset are imputed in a slightly different way, as elaborated in Algorithm 1: the proximity sensors and their respective RSSI values are identified within the sliding window, and maximum-value imputation is used, in which the LOCF method is applied until some other proximity sensor with a value greater than the already known value is identified. For continuous data such as Floor and Acceleration, a statistical approach is adopted to replace the missing states with the mean of the corresponding observed attributes. The mean imputation method tends to be robust and straightforward for substituting the missing values.
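The LOCF and NOCB passes over one longitudinal segment can be sketched in a few lines. The segment below is a hypothetical run of binary states with None marking the Null observations described above.

```python
def locf(states):
    """Last Observation Carried Forward: hold the last seen state over Nulls."""
    filled, last = [], None
    for s in states:
        last = s if s is not None else last
        filled.append(last)
    return filled

def nocb(states):
    """Next Observation Carried Backward: LOCF applied to the reversed segment."""
    return list(reversed(locf(list(reversed(states)))))

segment = ["Open", None, None, "Close", None]
locf_res = locf(segment)  # ['Open', 'Open', 'Open', 'Close', 'Close']
nocb_res = nocb(segment)  # ['Open', 'Close', 'Close', 'Close', None]
```

Note that a leading Null (for NOCB, a trailing one) stays unfilled, which is where the structure- and instance-based measures complement the longitudinal pass.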

Classification
To examine the effectiveness of the imputed datasets produced by the proposed SemImput framework, we used a Deep Learning-based Artificial Neural Network (ANN) classifier [34], which proved suitable for the multimodal, multi-sensory, and multi-feature HAR datasets. For this, an ANN model was trained with the labeled 2D training matrix instances of the UCamI, Opportunity, and UCI-ADL datasets. The computational complexity and recognition accuracies were then assessed.

One-Hot Code Vectorization
It has been shown to be advantageous to transform categorical variables using suitable feature engineering before applying a neural network [35]. For this, we used one-hot encoding, a robust feature engineering scheme, to generate suitable feature vector indices [16]. The categorical features are mapped into sensor state vector indices representing the concurrent sensor activation patterns for a particular activity. This scheme expands the dimension of the feature matrix to cover the 2^n possible combinations of binary states for the n sensors involved in the feature vector. As described in Algorithm 2, an n-dimensional sparse vector per unit time is obtained for populating the feature matrix required for classification. The value 1 is encoded where a sensor has an active state, and the value 0 is assigned for a missing state in the row vector [35]. The missing value indicator r in the matrix is represented as r_{n,p}, with the n-th row and p-th column.
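A minimal sketch of this vectorization: the concurrent sensor activations at one timestamp are mapped to a sparse 0/1 row vector over all n sensors. The sensor names are illustrative placeholders.

```python
def one_hot_row(active_sensors, all_sensors):
    """Encode 1 for an active sensor state and 0 otherwise, per timestamp."""
    return [1 if s in active_sensors else 0 for s in all_sensors]

sensors = ["Fridge", "Microwave", "PIR_Kitchen", "Tap"]  # the n sensors
row = one_hot_row({"Fridge", "PIR_Kitchen"}, sensors)    # [1, 0, 1, 0]
```

Stacking one such row per unit time yields the 2D feature matrix that the classifier consumes.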

Artificial Neural Networks for HAR
We introduce a Semantic Deep Learning-based Artificial Neural Network (SemDeep-ANN) with the ability to extract a hierarchy of abstract features [36,37] using a stack of convolutional operators, as supported by Convolutional Neural Networks (CNNs). SemDeep-ANN consists of three layers, namely the input layer, hidden layers, and output layer, which use the vectorized data to train a model for probability estimation over the test data. The estimated probabilities are obtained from the output layer through the softmax activation function, trained with the gradient descent algorithm. Further details of SemDeep-ANN are given in Algorithm 3, excerpted here:

for all timestamps t = 1 to T do
    function BinSensVectorization(CQ, A_m^Imp)
        for each cq_i ∈ CQ do
            BinSens_Attrib ← execute(cq_i).filter(BinSens, A_m^Imp)    ▷ using SPARQL queries
            BinSens_states ← execute(cq_i).filter(BinSens_Attrib)

21:
M ← BinSens stride Prox stride A f loor A Acc applying nonlinear transformation σ using y = σ w T x + b 9: f c y ← f ully_connected_NN(y)

10:
A n ← so f t_max( f c y ) Update weights in the network 11:

12:
Compute Cross entropy gradient Use trained network to predict Activity labels 13: Apply gradient descent Update network parameters 14: Activity Labels ← Use trained network model Predict labels
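The training steps of Algorithm 3 (nonlinear transformation, fully connected layer, softmax, cross-entropy gradient, and gradient descent) can be illustrated with a minimal NumPy sketch. The toy data, network sizes, and learning rate below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data standing in for the vectorized sensor matrix M:
# 200 time steps, 8 binary sensor features, 3 activity classes.
X = rng.integers(0, 2, size=(200, 8)).astype(float)
y = np.argmax(X @ rng.normal(size=(8, 3)), axis=1)
Y = np.eye(3)[y]  # one-hot activity labels

# One hidden layer: y = sigma(w^T x + b), then a fully connected output.
W1 = rng.normal(0, 0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, size=(16, 3)); b2 = np.zeros(3)
lr = 0.5

for epoch in range(300):
    H = np.tanh(X @ W1 + b1)          # nonlinear transformation sigma
    P = softmax(H @ W2 + b2)          # class probability estimates
    G = (P - Y) / len(X)              # cross-entropy gradient at the output
    dH = (G @ W2.T) * (1 - H**2)      # back-propagate through tanh
    # Gradient descent updates of the network parameters.
    W2 -= lr * H.T @ G;  b2 -= lr * G.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

# Use the trained network to predict activity labels.
pred = np.argmax(softmax(np.tanh(X @ W1 + b1) @ W2 + b2), axis=1)
print("train accuracy:", (pred == y).mean())
```

The derivative (P - Y) is the standard gradient of the cross-entropy loss with respect to the pre-softmax logits, which is why no separate softmax derivative appears in the update.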

Results and Discussion
The performance of the SemImput framework is evaluated using non-imputed and semantically imputed HAR datasets. The results are compared with other popular methods that were investigated on the same datasets.

Data Description
To compare the HAR performance of the proposed SemImput framework, the experiments were first performed on the UCamI dataset, which offers recognition of a set of 24 activities for the non-imputed and imputed datasets. Secondly, the Opportunity dataset contains manipulative gestures of short duration, such as the opening and closing of doors, a dishwasher, and drawers. These were collected from four subjects who were equipped with five different body-attached sensors for the tracking of static and dynamic activities [38]. Due to the involvement of several sensors, data transmission problems among the wireless sensors lead to missing segments of data, represented by Null. For this reason, we analyzed the data and performed the required imputation to complement the missing segments [37,39]. Lastly, we tested the SemImput framework on the UCI-ADL dataset, which was collected while monitoring 10 different ADLs [40] using passive infrared sensors, reed switches, and float sensors. These sensors were used to detect motion and the binary open/closed states of objects, covering activities such as toileting, sleeping, and showering.

Performance Metrics
We measured the impact of imputation against the non-imputed datasets using commonly used metrics such as accuracy, precision, and F-measure. The SemDeep-ANN models were validated by splitting each dataset independently into training and test sets using a leave-one-day-out approach. During the evaluation process, we retained one full day from each dataset for testing, while the remaining samples were used as the training set. This process was repeated for each day, with the overall average accuracy taken as the performance measure.
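The leave-one-day-out protocol described above can be sketched as follows; the sample records, dates, and the stand-in scorer are hypothetical, and in the paper the scorer would be SemDeep-ANN training and evaluation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (day, feature_vector, label). Each full day is
# held out in turn; the remaining days form the training set.
samples = [("2017-11-0%d" % d, [d, i], d % 2)
           for d in range(1, 5) for i in range(3)]

def leave_one_day_out(samples, train_and_score):
    """Return the average accuracy over all held-out days."""
    by_day = defaultdict(list)
    for day, x, y in samples:
        by_day[day].append((x, y))
    accuracies = []
    for held_out in sorted(by_day):
        test = by_day[held_out]
        train = [s for d in by_day if d != held_out for s in by_day[d]]
        accuracies.append(train_and_score(train, test))
    return mean(accuracies)  # overall average accuracy

# Dummy scorer standing in for SemDeep-ANN training/evaluation.
avg = leave_one_day_out(samples, lambda train, test: 1.0)
print(avg)  # 1.0
```

Splitting by whole days rather than by random samples prevents temporally adjacent sensor events from leaking between the training and test sets.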

Discussion
This study examines and evaluates the SemImput framework on HAR classification, for which the precision and recall curves are shown in Figure 4a-h. The framework achieved an overall accuracy of 71.03% for the set of activities recognized from the non-imputed UCamI dataset, as reported in Table 2. The activity Prepare breakfast (Act02) yielded the highest precision of 87.55%, but it was also misclassified as the activities Breakfast (Act05) and Dressing (Act22). Similarly, the activity Enter the SmartLab (Act10) was classified with high precision, but it was misclassified as the activity Put waste in the bin (Act15). The activity Breakfast (Act05), with the lowest precision of 52.14%, was mostly misclassified as the activities Prepare breakfast (Act02) and Wake up (Act24). Furthermore, the activity Eat a snack (Act08), with a lower precision of 57.95%, was misclassified as the activity Prepare lunch (Act03) due to the involvement of similar sensors and the same floor area. The activities Visit in the SmartLab (Act14) and Wash dishes (Act19) were hard to detect as they have fewer annotated examples. The experimental results indicate an increase in recognition accuracy to 92.62% after modeling the UCamI dataset into ontology-based complex activity structures and performing the semantic imputation, as shown in Figure 4b. The corresponding plot illustrates that the activity Breakfast (Act05), with the lowest recognition precision of 81.54%, was most often classified as the activity Prepare breakfast (Act02). The activities Play a videogame (Act11) and Visit in the SmartLab (Act14), which had lower accuracies on the non-imputed data, were recognized with 100% accuracy. Similarly, the activity Relax on the sofa (Act12) was also recognized with the highest precision rate of 98.44%, as shown in Table 2.
This suggests that semantic data imputation supplied plausible data values, which resulted in an increase in classification accuracy for individual activities. The Opportunity dataset represents 17 ADLs and is complex in nature, having missing samples labeled as Null due to sensor disconnections. Figure 4c,d shows the per-class precision and recall for the ADLs recognized in the Opportunity dataset. The presented framework evaluates the Opportunity dataset without the Null class, obtaining an overall accuracy of 86.57%, which increases to 91.71% with the imputed dataset. The comparison of both confusion matrices is shown in Table 3.
As shown in Figure 4e,f for the raw UCI-ADL Ordóñez-A dataset, an overall classification accuracy of 82.27% was obtained. The dataset includes activities such as Grooming, Spare_Time/TV, and Toileting with the largest numbers of instances, and the activity Lunch with the smallest number of instances. However, the classification results in Table 4 show that the activities Leaving and Breakfast have the highest recognition accuracy, while the activity Grooming has a lower classification accuracy. To further verify the proposed SemImput framework, it was also tested on the semantically imputed UCI-ADL Ordóñez-A dataset. This significantly increased the recognition accuracy for activities such as Breakfast, Lunch, and Leaving, as shown in Figure 4f. This was due to the introduction of semantic structure understanding of events with respect to morning and afternoon, and the generalization of semantic rules for such activities when imputing missing values. The improvement in statistical quality through imputation raised the recognition accuracy significantly, up to 89.20%. Similarly, increased performance is also observed for the UCI-ADL Ordóñez-B dataset across all activities with imputed data, especially for Dinner and Showering, as shown in Table 5. The global accuracy for the UCI-ADL Ordóñez-B dataset improved from 84.0% to 90.34%, which also demonstrates the significance of the proposed framework, as shown in Table 6.

Table 7. Method, dataset, number of activities, mean recognition accuracy (non-imputed vs. imputed), and standard deviation.

As shown in Table 7, the proposed SemImput framework, together with the SemDeep-ANN model, not only improved the recognition rate for individual activities within the datasets but also improved the global accuracy over each dataset. We also compared the activity classification performance of our framework with different state-of-the-art methods. The presented results show the potential of the SemImput framework, with a significant accuracy gain. Although our methodology performed below the best reported results for the UCI-ADL Ordóñez-A and Opportunity datasets, it still achieved significant recognition performance scores of 89.20% and 91.71%, respectively. These findings show that combining ADL classification with semantic imputation can lead to comparatively better HAR performance.

Conclusions and Future Work
This paper proposed a novel SemImput framework to perform semantic imputation of missing data on public datasets for the offline recognition of ADLs. It leverages the strengths of both structure-based and instance-based similarities while performing semantic data imputation. Using the ontological model SemImputOnt, it executes SPARQL queries over the ABox data for semantic data expansion, conjunction separation, and the identification of missing attributes and their instances, leading towards semantic imputation. To further increase the quality of the data, we also utilized time-series longitudinal imputation. The obtained results and the presented analysis suggest that the gain in recognition accuracy through SemImput varies with the nature and quality of the dataset. We validated the framework over the UCamI, Opportunity, and UCI-ADL datasets. It achieves its highest accuracy of 92.62% for the UCamI dataset using a pre-trained SemDeep-ANN model. A substantial, comprehensive, and comparative analysis against state-of-the-art methodologies for these three datasets was also performed and presented in this paper. Based on the empirical evaluation, it was shown that SemDeep-ANN consistently performed well on semantically imputed data by achieving an improved overall classification accuracy. Such a technique can be applied to HAR-based systems that generate data from obtrusive and unobtrusive sources in a smart environment. In the future, we plan to explore, execute, and enhance the SemImput framework for real-time HAR systems. Furthermore, we plan to extend our methodology to improve longitudinal imputation, as some accuracy degradation was observed during HAR. We believe that our approach will help increase the quality of smart-home data by performing missing-data imputation and will increase recognition accuracy. On the negative side, the SemImput framework requires an ontology modeling effort for any activity inclusion or the introduction of a new dataset. To address this, we plan to explore a unified activity modeling ontology for representing the same activities and to investigate it further for HAR performance.
Author Contributions: M.A.R. is the principal researcher and main author of this work. He initially proposed the idea and implemented and evaluated the methodology. I.C. reviewed the initial manuscript and modified representations. S.L. and C.N. supervised the whole ideation phase and provided discussions for achieving this scientific content. All authors have read and approved the final manuscript.