Support Vector Machine Binary Classifiers of Home Presence Using Active Power

Abstract: The Internet of Things (IoT) has facilitated the intelligent analysis of electrical parameters by providing access to large amounts of data with customized sampling times. Meanwhile, binary classifiers based on support vector machines (SVM) resolve nonlinear cases through kernel functions. This work presents two binary classifiers of presence in the home that use total household active power data obtained through automated readings from an IoT device. Both classifiers are SVMs, one using a linear kernel function and the other a nonlinear kernel function. The data were acquired with the Emporia Gen 2 Vue energy monitor over 20 days without interruption, as averaged readings every 15 min. Of these data, 75% was used to train the classifiers and the remaining 25% for validation. Contrary to expectations, the evaluation yielded accuracies of 91.67% for the nonlinear SVM and 92.71% for the linear SVM, indicating similar performance.


Introduction
Multiple proposals exist for home automation, requiring various sensors to determine the states inside and outside the house [1,2]. These intelligent devices facilitate data acquisition and decision-making. Devices that help reduce energy consumption are highly valued but costly [3], and the recent pandemic has caused severe economic problems [4]. Low-cost data acquisition and control devices have therefore emerged to make these features more accessible [5]. Even so, the infrastructure of many homes makes it difficult to implement new technologies. Artificial intelligence, however, can potentially reduce the number of devices required to determine states in the home [6].
The Internet of Things (IoT) is a constantly growing technology that has generated numerous commercial and research proposals for automating various processes [7–10]. However, these services have been criticized for multiple reasons, such as security, privacy, standardization, ethics, scalability, reliability, and quality [8,11–15]. Beyond automation, the IoT also provides real-time data availability and the continuous acquisition of information for subsequent analysis. This allows precise decisions to be made and fed back into the operation of the system so that it adapts to the needs of its environment [16–18].
At present, the IoT is widely used for electric power data acquisition and monitoring, and for controlling network statistics to achieve an efficient power supply [19–22]. The smart home, in turn, is an interconnected home in which all kinds of elements interact through the Internet to automate domestic activities. Its benefits include the monitoring of electrical energy, which supports decisions that optimize electricity use [23–25]. Other smart home features include lighting and security management, and further improvements may include the integration of sunlight and efficient management of appliance running times [26–28]. For these features to work correctly, the system must be given a certain degree of intelligence [29,30]. Examples of energy optimization using artificial intelligence include deriving comfortable temperatures, device-free sleep prediction, and occupancy-based prediction of outings.
Machine learning (ML) has been widely used to model and predict power systems because it allows their behavior to be simulated offline and failures to be anticipated. This has markedly improved the precision, robustness, and ability to generalize the behavior of these systems [31–33]. ML models make it possible to forecast the energy consumption and performance of buildings, and although many techniques exist for obtaining these models, all of them provide reasonable accuracy when given a large amount of data and optimized parameters [6,34,35]. Prediction models based on a support vector machine (SVM) are fast, simple, reliable, and accurate, and in the energy field they have been used to forecast solar and wind resources [36,37]. Classically, SVMs are linear binary classifiers, but they have been extended to nonlinear cases using kernel functions. Linear SVMs require less training time, but there is no consensus on which variant performs better; results differ from case to case [38–40].
Among similar works, the authors of [41] used machine learning for the nonintrusive detection of absence from the home based on the electricity use of household appliances. Several machine learning algorithms were evaluated, and the results showed that detecting home absences from the energy consumption of household appliances is feasible. In [42], the authors developed a home care video monitoring system to detect normal and abnormal events, applying the SVM method in the decision-making component. Experiments on a fall detection dataset validated the reliability of the proposed method, which achieved a high detection rate.
In [43], a new load monitoring approach based on an IoT architecture was proposed for activity recognition. The primary function of appliance recognition is to tag the sensor data, enabling different applications in the home. Three classifiers were used: a feedforward neural network, a long short-term memory (LSTM) network, and an SVM. Features were extracted from consumption (watt-hours) and switch-on hours. In validation, the system was applied to a different house, which significantly reduced its accuracy. The results suggest that the classifier may need to be retrained with new data before the system is fully operational.
Finally, [44] presented two new incremental SVM methods to improve SVM classification performance for human activity recognition. Feature extraction was based on sensor dependency and considered only the last sensor event. Cluster-based and similarity-based approaches were also used to boost the learning performance of the incremental SVM algorithms by exploiting the relationship between each data chunk and the support vectors of the previous chunk. The results demonstrated the feasibility of these methods and their improvements in real-time learning and classification performance. In terms of training time, similarity-based incremental learning was 5 to 9 times faster than the other methods, and the proposed last-state sensor feature improved on the reference SVM classifier by at least 5%.
This manuscript presents two home presence classifiers based on active power data acquired automatically over 20 days with an IoT device. Classifiers of this type can contribute to the development of sustainable cities. The document is organized into four sections: Section 1 introduces the topic, Section 2 describes the materials and methods, Section 3 presents the training and evaluation results, and Section 4 contains the conclusions.

Materials and Methods
This research followed the scheme in Figure 1. First, the total household active power was acquired continuously over a period of time using a commercial device with IoT features, whose mobile application allowed the data to be exported. At the same time, the presence of the inhabitants in the home was recorded to generate the supervised learning labels. Part of the data was used to train two SVM models, one nonlinear and one linear. In the evaluation, the rest of the data was used to determine the performance of each classification model and compare the results.
Designs 2022, 6, 108

Data
The data were obtained with an Emporia Gen 2 Vue energy monitor [45], which connects to the internet over Wi-Fi and includes non-contact clamp sensors that measure electrical currents of up to 200 A. Figure 2 shows the connections made to monitor electricity in the home for data acquisition (L: line; N: neutral). The voltage was obtained by connecting the line and neutral conductors directly to the energy monitor, and the current by placing a clamp on each conductor. The meter was installed downstream of the protective breaker. The household states were recorded manually, with label 1 for presence and −1 for non-presence (absence). Four people with different work schedules lived in the study home, and sleep hours were recorded as non-presence. The sensor has a sampling frequency of up to 1 Hz and an accuracy of 2%. Table 1 presents more information about the sensor.
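The monitor samples at up to 1 Hz, while the study used averaged readings every 15 min, each described by the hour of the day and the active power. The sketch below shows one way such averages and feature pairs could be produced. It assumes raw samples are available as (timestamp, watts) pairs. The helper names `average_15min` and `to_features` and the synthetic samples are hypothetical and are not part of the Emporia application or any Emporia API.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_15min(samples):
    """Group (timestamp, watts) samples into 15-min bins and average each bin.

    Hypothetical helper: assumes naive local-time datetimes.
    Returns a list of (bin_start, mean_watts) sorted by bin_start.
    """
    bins = {}
    for ts, watts in samples:
        # Floor the timestamp to the start of its 15-minute interval.
        start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        bins.setdefault(start, []).append(watts)
    return [(start, mean(vals)) for start, vals in sorted(bins.items())]


def to_features(bin_start, mean_watts):
    """Feature pair (x1, x2): fractional hour of day in [0, 24) and active power in W."""
    hour = bin_start.hour + bin_start.minute / 60.0
    return (hour, mean_watts)


# Synthetic example: 30 min of 1 Hz samples, 300 W for 15 min and then 500 W.
t0 = datetime(2022, 1, 1, 20, 0, 0)
samples = [(t0 + timedelta(seconds=s), 300.0 if s < 900 else 500.0) for s in range(1800)]
readings = average_15min(samples)
features = [to_features(start, w) for start, w in readings]
print(features)  # [(20.0, 300.0), (20.25, 500.0)]
```

Each feature pair would then be joined with the manually recorded label (1 or −1) for the same interval.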

Classifiers
For this work, two binary classifier models were used to determine presence in the home, following [46]. The general SVM model is given by Equation (1), where b is the bias and α_i are the model coefficients, one for each support vector. The model relies on a kernel function K evaluated on the study variables: x_1, the hour of the day (0–24 h), and x_2, the active power in watts [W]. In other words, presence in the home was predicted from the time of day and the active power measurement.
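For reference, the standard textbook form of a kernel SVM decision function for labels y_i ∈ {1, −1} is shown below. It is a sketch of the general model, not a transcription of Equation (1), and the form in [46] may differ in detail, for example by folding the labels y_i into the coefficients α_i.

```latex
f(\mathbf{x}) = \sum_{i=1}^{N_{sv}} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
\hat{y}(\mathbf{x}) = \operatorname{sign}\bigl(f(\mathbf{x})\bigr) \in \{1, -1\},
\qquad
\mathbf{x} = (x_1, x_2)
```

Here the x_i are the N_sv support vectors, and a positive score is read as presence (label 1).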
Two SVM models were obtained for this investigation: a nonlinear model based on the Gaussian kernel function (2) and a linear model based on the linear kernel function (3). The goal was to determine which of the two options performs better.
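The two kernels and the kernel SVM decision function can be sketched in plain Python as follows. This assumes the textbook Gaussian kernel exp(−‖u − v‖² / (2σ²)) with an assumed width `sigma`. Implementations such as [46] may scale the width differently, for example through a kernel scale. It also assumes standardized (hour, power) features. The support vectors, coefficients, and bias are hand-picked hypothetical values, not a trained model.

```python
import math


def linear_kernel(u, v):
    """Linear kernel: dot product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||u - v||^2 / (2 sigma^2)); sigma is an assumed width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision(x, support_vectors, labels, alphas, bias, kernel):
    """Score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b and its sign as the label."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return score, (1 if score >= 0 else -1)


# Hypothetical support vectors on standardized (hour, power) features.
svs = [(0.0, 1.0), (0.0, -1.0)]
ys = [1, -1]          # 1 = presence, -1 = non-presence
alphas = [0.5, 0.5]
b = 0.0

print(decision((0.0, 2.0), svs, ys, alphas, b, linear_kernel))     # (2.0, 1)
print(decision((0.0, -2.0), svs, ys, alphas, b, gaussian_kernel))  # negative score, label -1
```

Only the kernel argument changes between the two models. This reflects how a single SVM formulation covers both the linear and the nonlinear classifier.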

Results
The data were recorded continuously for 20 days, yielding a total of 480 h of recordings (Figure 3). At one averaged reading every 15 min, this corresponds to 1920 active power readings.

Training
Both SVM models were trained on 75% of the data, that is, 15 days of active power records. This amounted to 360 continuous hours of measurements, or 1440 active power readings, as shown in Figure 4.

The training data were organized by hour of the day (0–24 h), together with the presence and non-presence labels. Figure 5 shows the dispersion of active power readings by status label. In general, the highest readings corresponded to presence in the home, although there were particular cases with non-presence labels.
The nonlinear SVM model identified 400 support vectors, which represented 27.74% of the training data. Figure 6 shows the nonlinear binary classifier with its support vectors. Two particular cases are evident in this model: between 2 and 3 h of the day, there was always non-presence because everyone was asleep, and between 20 and 22 h (GMT-5), the occupants were always at home.
The linear SVM model, on the other hand, identified 341 support vectors, which represented 23.65% of the training data. Figure 7 shows the linear binary classifier with its support vectors. The boundary is a line with a slight negative slope whose vertical intercept is approximately 314 [W].
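With a linear kernel, the decision function reduces to f(x) = w_1 x_1 + w_2 x_2 + b. The boundary f = 0 is therefore the line x_2 = −(w_1 x_1 + b)/w_2, with slope −w_1/w_2 and vertical intercept −b/w_2. The sketch below shows how a boundary like the one in Figure 7 follows from the weights. The weights are hypothetical and were chosen only so that the intercept falls at 314 W. They are not the trained model's values.

```python
def boundary_power(hour, w1, w2, b):
    """Active power [W] on the linear boundary w1*hour + w2*power + b = 0 (w2 must be nonzero)."""
    return -(w1 * hour + b) / w2


def classify(hour, power, w1, w2, b):
    """Return 1 (presence) if the linear score is non-negative, else -1 (non-presence)."""
    return 1 if w1 * hour + w2 * power + b >= 0 else -1


# Hypothetical weights: slope -w1/w2 = -5 W per hour, intercept -b/w2 = 314 W.
w1, w2, b = 2.5, 0.5, -157.0

print(boundary_power(0.0, w1, w2, b))    # 314.0
print(boundary_power(12.0, w1, w2, b))   # 254.0
print(classify(12.0, 600.0, w1, w2, b))  # 1  (above the line: presence)
print(classify(12.0, 150.0, w1, w2, b))  # -1 (below the line: non-presence)
```

Because w_2 > 0 here, readings above the line are classified as presence. This matches the observation that the highest readings generally corresponded to presence.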

Validation
Both SVM models were validated on the remaining 25% of the data, that is, 5 days of active power records. This amounted to 120 continuous hours of measurements, or 480 active power readings, as shown in Figure 8.
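The reading counts follow from the sampling interval. Because both the training and validation blocks are described as continuous hours, a chronological split (earlier days for training, later days for validation, no shuffling) is assumed in the sketch below. The helper `chronological_split` is hypothetical, and the integer placeholders stand in for the real (hour, power, label) records.

```python
def chronological_split(readings, train_fraction=0.75):
    """Split a time-ordered list into an earlier training block and a later validation block.

    Assumes the readings are already sorted by timestamp; no shuffling is done,
    so both blocks stay contiguous in time.
    """
    cut = int(len(readings) * train_fraction)
    return readings[:cut], readings[cut:]


READINGS_PER_HOUR = 4                        # one averaged reading every 15 min
n_readings = 20 * 24 * READINGS_PER_HOUR     # 20 days of recording -> 1920 readings
readings = list(range(n_readings))           # placeholders for (hour, power, label) records

train, valid = chronological_split(readings)
print(len(train), len(valid))                                            # 1440 480
print(len(train) // READINGS_PER_HOUR, len(valid) // READINGS_PER_HOUR)  # 360 120
```

A time-ordered split avoids placing near-identical neighboring 15-min readings in both sets. A random shuffle would tend to inflate validation accuracy.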

Evaluating the nonlinear SVM model produced the classification shown in Figure 9, where classification errors can be observed for both labels. Table 2 shows the confusion matrix. The accuracy was 97.4% in detecting non-presence and 81.4% in detecting presence, and false negatives (presence classified as non-presence) were the more frequent error. Overall, the nonlinear SVM classifier achieved an accuracy of 91.67%.

Evaluating the linear SVM model produced the classification shown in Figure 10, which likewise shows classification errors for both labels. Table 3 shows the confusion matrix. The accuracy was 91.75% in detecting non-presence and 85.47% in detecting presence. False negatives again predominated, although presence detection improved relative to the nonlinear model. Overall, the linear SVM classifier achieved an accuracy of 92.71%, slightly higher than that of the nonlinear SVM classifier.
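The overall accuracy and the per-class figures derived from a confusion matrix such as those in Tables 2 and 3 can be computed from true and predicted labels as sketched below. The labels are made up for illustration and are not the study's data. Presence (1) is treated as the positive class. The per-class figure is computed here as recall, meaning the fraction of each true class that was detected. This is one common reading of "accuracy in the detection of" a class and may differ from the exact definition used for the tables.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for labels in {1, -1}, with 1 (presence) as the positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def summarize(counts):
    """Overall accuracy plus per-class recall; assumes both classes are present."""
    tp, tn, fp, fn = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "presence_recall": tp / (tp + fn),
        "non_presence_recall": tn / (tn + fp),
    }


# Made-up labels: 1 = presence, -1 = non-presence.
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, -1, -1, 1]
c = confusion_counts(y_true, y_pred)
print(dict(c))       # {'TP': 2, 'FN': 2, 'TN': 3, 'FP': 1}
print(summarize(c))  # {'accuracy': 0.625, 'presence_recall': 0.5, 'non_presence_recall': 0.75}
```

In this toy case, false negatives outnumber false positives, so presence recall is lower than non-presence recall. This is the same pattern described for both classifiers.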

Discussion
In this investigation, active power readings were used to determine presence in the home. SVM classifiers were chosen because they are widely used and are fast to train and to respond. In addition, no pre-processing was applied to the data, so that the approach remains applicable with minimal computational requirements. The good performance of the classifiers despite using raw data supports this choice.
In the literature, few non-invasive studies have inferred activity in the home, and most of them relied on multiple sensors installed in the home. In [44], the authors used modern classifiers (incremental SVM), but their classification was based on data from several temperature, motion, and door-status sensors. The results are therefore not directly comparable, and implementing a similar system requires complex infrastructure.
In [43], the precision of the SVM classifier was around 80%, whereas in our work both SVM classifiers exceeded 90% accuracy. In that work, however, the labels were classified based on the energy consumed, and a non-binary classification was performed. Further studies of non-binary classifiers with different kernel functions are therefore needed to establish which ones perform better than other SVMs. Another similar study attempted to detect absence from the home [41] but lacked a data set containing both energy data and real records of home absence. Instead, artificial departure events were introduced into the data set, which does not guarantee the method's usefulness with real data.

Conclusions
This work presents the training and evaluation of two binary SVM classifiers using data acquired with the Emporia Gen 2 Vue energy monitor. The nonlinear SVM model, based on the Gaussian kernel function, achieved an accuracy of 91.67%, and the linear SVM model, based on the linear kernel function, achieved an accuracy of 92.71%. According to the data obtained in this research, both binary classification methods