An Enhanced Inference Algorithm for Data Sampling Efficiency and Accuracy Using Periodic Beacons and Optimization

Transferring data from a sensor or monitoring device in electronic health, vehicular informatics, or Internet of Things (IoT) networks has long posed the challenge of improving data accuracy with relative efficiency. Previous works have proposed the use of an inference system at the sensor device to minimize the data transfer frequency as well as the size of data, to save network usage and battery resources. This has been implemented using various sampling and inference algorithms, with a tradeoff between accuracy and efficiency. This paper proposes to enhance the accuracy without compromising efficiency by introducing new sampling algorithms through a hybrid inference method. The experimental results show that accuracy can be significantly improved whilst the efficiency is not diminished. These algorithms will contribute to saving operation and maintenance costs in data sampling where computational and battery resources are constrained and limited, such as in wireless personal area networks merged with IoT networks.


Introduction
In health applications, wearable devices are increasingly used for various purposes, such as monitoring the cardiovascular system by capturing electrocardiography (ECG), blood pressure, pulse rate, glucose levels, etc. Information collected through sensors and implantable devices requires smart inference functions for intelligent processing and for communicating accurately and wirelessly with external systems. The purpose of such a smart inference function is to manage huge data loads through filtering, and to ensure that critical information can be transmitted between sensor devices in wireless body area networks (WBAN) and other external networks which provide healthcare services and applications (e.g., IoT or healthcare). The application of such an inference system can include areas such as the office, home, and public spaces [1]. Hence, in the context of big data, the training and sampling of data are critical in determining the quality of data wrangling and accuracy, as well as in their effect on network performance and capacity. For example, in mobile health networks, transferring data over radio consumes 2.5 times the battery power compared to sensing data with sensor devices [2].
We could improve the accuracy of data by taking a higher number of data points for transfer; however, this would increase battery consumption, which reduces efficiency. Thus, the frequency of data transfer significantly affects the battery life of devices, which can be implanted inside the body. Not to mention, the inconvenience of replacing batteries in implanted monitoring devices leads to the question of how accurately and efficiently data should be sampled at the sensors and recovered at the destination. We may improve the efficiency by reducing the amount of sampling and transferring, however at the cost of reduced data accuracy. As there is a compromise between the two aspects of accuracy and efficiency, we propose to enhance the sampling method with these characteristics in mind, by incorporating fine-tuning aspects into the algorithm.
In this paper, we propose to improve the data inference system at sensor nodes using optimized algorithms to further reduce the battery consumption of sensors when they transfer data to smart devices, which subsequently forward them to servers in the cloud.
The main contribution of this work is to enable sensors to improve the accuracy of data sampling, whilst maintaining the same efficiency during data sampling. This is achieved by introducing a hybrid methodology of algorithms applied to a raw dataset, with a fine-mode method (FM) and a coarse-mode method (CM) combined with a beacon-mode method (BM). These combined methods are called the beacon-fine-mode method (BFM) and the beacon-coarse-mode method (BCM). New algorithms are used to compute the best accuracy and efficiency, in addition to the hybrid methodologies such as BFM and BCM, so that these can be further improved for better accuracy without diminishing the efficiency.
The rest of the paper is organized as follows. In Section 2, we provide the methodology for evaluating our proposed data sampling and inference mechanism. In Section 3, we describe the algorithms introduced to intelligently select the data points and the inference mechanism using FM, CM, BM, BFM, and BCM. The results of our experiments, comparing these algorithms as well as the hybrid algorithms, are presented in Section 4. The final section describes the conclusion and future works.

Methodology
Some existing methodologies address the issue of improving the reading and sampling of sensed data. For example, to provide an inference method for social networks dealing with cluster structure modelling, a Bayesian inference method has been proposed to compare the performance of inference algorithms with the infinite relational model and variational Bayesian inference methods [3]. Another Bayesian inference method has been developed for data-intensive computing, using the Gibbs sampling method in a random Bayesian inference algorithm to derive and infer the final result [4].
Sampling data in distributed sensor networks with non-uniform periods [5] has been proposed, and this can be further considered in a comprehensive setting with large-scale datasets, given the rapid growth of big data for real-time analysis, such as in social networks. The existing normal distribution sampling method can be extended to generate a summary of datasets to support these cases [6].
Our method takes a novel approach, with inference algorithms for accuracy and efficiency, as opposed to existing methodologies for assessment. The advantage of our method is that it demonstrates the feasibility of using an inference system to save bandwidth and battery power; determining the best method for sensing, processing, and transferring data with the best accuracy was not within the scope of this project. To assess the efficiency of our proposed inference algorithm, the factors below have been considered [7]:

•
Efficiency: Ratio of saved (reduced) data volume to actually transmitted data.

•
Savings: Ratio of reduced data to sensed data (%).

•
Accuracy: Ratio of the total value of transmitted data to original data (%).

•
Number of Sensed data: Total number of data points sensed by the sensors. These data are used to take sample data for inferencing.

•
Number of Transferred data: Total number of inferred data points to be transferred to smart devices from the sensor after the inference algorithm has been applied.

Gaps between the original and the inferred volume are the areas shown below: Figure 1a reflects the gap between the upper and lower lines after application of an inference mechanism. The bigger the gap area (e.g., the green and yellow areas in Figure 1a), the less accurate the result. The gap in Figure 1a, as proposed by Kang et al. [8], can be further improved for accuracy by adjusting the sampling data point, as shown in Figure 1b. To do this, an additional decision-making step is required to calculate the area S and decide whether to adjust the sampling data point against the pre-defined threshold data. In this scenario, two (2) data points were added to improve the accuracy; however, this came at a cost to efficiency. To compensate for the efficiency lost to the added data points, further adjustment is required. As shown in Figure 1c, the data point w (dark blue) allows removal of the two existing data points a and b (light blue) without degrading the accuracy, i.e., the area of the gap.
Comparing Figure 1a and Figure 1c, the total number of data points in the examined area is the same, whilst the accuracy has been significantly improved. Figure 1 depicts the aforementioned steps and scenarios.
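To make the assessment factors concrete, they can be sketched in Python as below; the function names are ours, and the reading of accuracy as a ratio of totals after the inferred samples have been interpolated back to the original length is our assumption, not the paper's exact formula:

```python
def savings_rate(n_sensed, n_transferred):
    """Savings: ratio of reduced data to sensed data (%)."""
    return 100.0 * (n_sensed - n_transferred) / n_sensed

def efficiency(n_sensed, n_transferred):
    """Efficiency: ratio of saved (reduced) data volume to transmitted data."""
    return (n_sensed - n_transferred) / n_transferred

def accuracy(original, reconstructed):
    """Accuracy: ratio of the total value of the reconstructed (inferred)
    series to the original series (%), assuming the inferred samples have
    been interpolated back to the original length."""
    return 100.0 * sum(reconstructed) / sum(original)
```

For the experimental scale used later in the paper (1800 sensed points, 70 transferred points), `savings_rate(1800, 70)` gives roughly 96.1%.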

Algorithm Development
This section describes the algorithms, with steps to implement the analysis and adjustment of data points for better accuracy and efficiency, based on sample data points and historic data, or pre-defined comparison values stored in a database of inference values. The algorithm provides a high-level overview of the proposed implementation, with reference to the following workflow and description. The algorithm compares the pre-determined threshold values, the sample data points, and the S values.
Definitions used in the algorithm are given below (Table 1):

S upper
The upper gap after inference has been applied for the first and last data point (x and y)

S lower
The lower gap after inference has been applied for the first and last data point

|S diff |
Absolute value of the difference between the original and inferred gap for the first and last data point

|S sum |
Absolute value of the summation of the original and inferred gap for the first and last data point

[x, y]
Data points to be added to SD I from SD O

X
Currently selected data point to be compared for gaps

[(a, b)]
Data points to be removed from SD I

Step 1: The original data points are monitored and inferred to provide sample data points. Algorithm 2 infers the gap area S, which produces accuracy and efficiency rates between the first and last data point, as shown in Figure 4. S is composed of S diff, the absolute value of the difference between the original and inferred gaps for the first and last data point, and S sum, the absolute value of the summation of the original and inferred gaps for the first and last data point.
The following steps will be executed n − 1 times for each pair of adjacent data points (i.e., n and n + 1) for the following equation: Algorithm 1 monitors the original data points and infers them to create sample data points. For example, the original data point range is 1-1800 and the sample (inferred) data point range is 1-70; thus, n = 70.
The Algorithm 1 logic to provide sample data points is as below:

1.
Compare the value of the data point at X with Y = X − 1 and Z = X + 1. When the value G (the gap between X and X − 1 or X + 1) is larger than a value K (this can be a percentile, e.g., 2%, 5%, etc., and it determines the "fine" or "coarse" mode of the inference), the data point is selected as a sample, and the process moves on (i.e., the data point at X + 1 is compared with X and X + 2).

2.
If a sample is not selected, then X is compared with X − 2 and X + 2. When the gap is larger than K, it is selected as a sample data point. The counter C increments up to a pre-defined value D, which determines the distance to neighbor data points.

3.
If a sample is not selected, then the same comparison is repeated with Y and Z at increasing distances.

4.
If a sample is not selected and C is larger than D, then X is ignored, and the process moves from data point X to X + 1 to compare with Y and Z.

Step 2: The gap area S between the selected data points of the sample and the original data points is calculated and denoted as S upper, as shown in Figure 1a. The algorithm (Algorithm 2) checks the difference between S upper and S lower to determine the accuracy of the mean value, and uses the outcome in Algorithm 3 so that it can adjust the sample data point.
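The four steps above can be sketched as follows; this is a minimal illustration of the selection loop with our own function name and list-based bookkeeping, not the authors' implementation:

```python
def create_sample_points(data, K, D):
    """Sketch of the Algorithm 1 selection loop: a point X is selected as a
    sample when its gap to a neighbor within distance D exceeds the
    threshold K; otherwise X is ignored (step 4)."""
    sample = []  # indices of selected sample data points
    for x in range(len(data)):
        selected = False
        for c in range(1, D + 1):          # widen the comparison distance (steps 1-3)
            for neighbor in (x - c, x + c):
                if 0 <= neighbor < len(data):
                    gap = abs(data[x] - data[neighbor])   # the gap G
                    if gap > K:
                        sample.append(x)
                        selected = True
                        break
            if selected:
                break
    return sample
```

For instance, with `K = 10` and `D = 1`, a flat series yields no samples, while a series with one spike selects the spike and its immediate neighbors.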
This outcome is compared against the inference threshold database for a pre-defined value (P), which will determine whether Algorithm 3 will be executed to adjust the sample data points. For example, S diff in Figure 1a may invoke Algorithm 3 if it is larger than the pre-defined value P.

If the gap difference or the gap summation is larger than the pre-defined value P or Q, respectively, Algorithm 3 is called to adjust the sample data points.
Step 5: Algorithm 4 compares the adjacent data points up to the metric M from the said data point, and removes original sample data points as required to improve efficiency. For example, the existing data point (a) and data point (b) in Figure 1c are removed (deselected from the sample), and the new data point (w) is selected along with the recently added data points (x) and (y). Therefore, data points (a) and (b) will be removed from the sample data point set. In consequence, the total number of sample data points is not changed, i.e., (x) and (y) are selected and (a) and (b) deselected from the group, whilst the gap area S has been significantly reduced for accuracy improvement.
We depict in Figure 3, the workflow for the implementation of our analysis of data sampling efficiency by adjusting data points in the data points sample based on the original data points.


Testing Results and Discussion
Subjects provided their written and informed consent in accordance with ethical clearance codes, e.g., the Australian National Statement on Ethical Conduct in Human Research [9].Devices used are wearables (Fitbit Charge HR and Intel Basis Peak), Raspberry Pi (Raspberry Pi 2 Model B), PC, Smartphone (Samsung Note 4), and production servers (Fitbit) in the cloud.Data were retrieved from the cloud using a customized program developed by Maple Hill Software [10] for analysis.
Heart pulse rate was monitored over 90 min during walking exercises using a fitness tracker at fine and coarse inference levels, with and without beacon sampling. Two subjects were used to collect heart rates, with two devices each, to avoid sensing errors. We observed that sensors can fail to detect HR data due to misplacement of the device on the wrist. To compensate for such errors, each subject wore two devices, one on each wrist, and the results were averaged during preprocessing.

•
Original data points were monitored over 90 min, and 1800 data points were selected as raw data to be used for inferencing, based on sensing every second

•
Beacon samples were selected as a baseline to compare with non-beacon samples

•
Fine and coarse inference levels were used to select samples

•
Beacon data points were combined with fine and coarse samples for comparison

•
Algorithms 3 and 4 were applied to the original data for sampling with further adjustment.
The evaluation approach includes three aspects. Firstly, a coarse method (CM) of inference is used. In Figure 4, the original data points, sensed on a per-second basis, resulted in 1800 data points, and a sample taken every minute resulted in 70 data points. When the accuracy is less than the expected value, a fine method (FM) is used to take samples. In Figure 5, the accuracy is much improved, at 99%; however, the efficiency is very poor, with a 75.1% savings rate, which is not acceptable. To overcome the efficiency issue of FM and the accuracy issue of CM, a new method is used, which simply takes samples at a regular interval, referred to as the beacon method (BM). As shown in Figure 6, its accuracy is lower than that of the other methods, even though its efficiency is good. As shown in Table 2, the accuracy of BM sits between FM and CM, with the minimum number of samples taken, showing the best efficiency. To improve the accuracy in this case, a further hybrid method can be used by combining the beacon method with the others. Figure 7 shows the beacon method combined with FM (BFM), which shows the best accuracy rate of 99.5%. CM can also be improved by combining it with BM, i.e., the beacon-coarse method (BCM), showing a significant improvement in accuracy, as shown in Figure 8. In terms of efficiency, both FM and CM fail to show positive improvement, as both take many more data points as samples. As a result, we conclude as below:

•
Accuracy trades off with efficiency

•
BM produces the best efficiency with a reasonable accuracy rate

•
Combining BM with other methods can improve the accuracy; however, the efficiency will decrease

To achieve the best accuracy, BFM can be used, at the cost of the lowest efficiency. The following shows the results of the inferencing algorithms applied to a raw dataset of heart rates with FM, CM, and BM, followed by the hybrid methods BFM and BCM.
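Under the assumption that BM takes every k-th point and that FM/CM select points whose gap to the previous point exceeds a threshold K (small K for fine, large K for coarse), the plain and hybrid selections can be sketched as below; the interval and threshold values are illustrative, not the paper's experimental settings:

```python
def beacon_sample(data, interval):
    """BM: take a sample at a fixed interval (periodic beacon)."""
    return sorted(set(range(0, len(data), interval)))

def threshold_sample(data, K):
    """FM/CM: select points whose gap to the previous point exceeds K
    (a small K gives the fine mode, a large K the coarse mode)."""
    return [i for i in range(1, len(data)) if abs(data[i] - data[i - 1]) > K]

def hybrid_sample(data, interval, K):
    """BFM/BCM: union of beacon samples and threshold samples."""
    return sorted(set(beacon_sample(data, interval)) | set(threshold_sample(data, K)))
```

In this sketch, BFM and BCM are simply the union of the beacon indices with the fine or coarse threshold selections, which mirrors how the hybrid methods combine samples in the experiments.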


Conclusions and Future Work
This work proposes a novel approach to the inferencing of data from body sensors, which can be used in various real-world applications, such as health services, transport, smart cities, and various other IoT applications. With big data and convergence with the IoT network, extra demand is expected to overload body sensors with additional transactions and traffic. This paper has proposed a novel inference system that improves data accuracy with the aim of transmitting critical information for timely decision making. We demonstrated several sampling methods to find the optimal sampling data points. BM is a critical method for improving efficiency, whilst providing reasonable accuracy. We presented an algorithm for sampling data points for inference that increases accuracy by adjusting sample data points, reselecting samples within the same (or a smaller) number of samples. Initially, the algorithm adds more samples to increase the accuracy, then reassesses the samples and adjusts the total number of data points. Overall, our model achieved improved accuracy with negligible loss in efficiency.
As future work, we will implement a large-scale database environment to perform real-time evaluation, to automatically present the improvement in data accuracy and efficiency. This will also allow a customized approach, where users can request an expected accuracy and efficiency so that the system can produce the best optimized methodology for individual needs. When health data are collected and processed across heterogeneous networks, compliance using a template is required, as well as defining the data requested for certain applications. To achieve this, the system must be able to send an automatic data request for immediate inquiries, including data integrity checks against the inference in cloud storage to verify the integrity of a third party for security purposes [11][12][13], when an additional request should be sent out to all networks. We will implement experiments to measure the network response time together with accuracy in this case, as well as how accurately the proposed method can provide the best optimized data sampling at large scale using big data in a cloud server, which may be located in a centralized location, in the context of software defined network (SDN) infrastructure with fault tolerant mechanisms [14,15].
In the case of wide area networks across states and countries, data servers may be required in multiple networks, with synchronization to minimize the access time from end users to the server. This may need further study to implement solutions such as SDN or software defined wide area networks (SDWAN). When the knowledge of past historical data is not accurate or is insufficient for processing, the data accuracy may not be enough for inferring, which may cause numerous false positives (FPs). To avoid this case, we will develop algorithms to infer the situation from the previous history of accuracy and efficiency rates, to predict how much the algorithm can improve those aspects.

Figure 1 .
Figure 1. Difference between the original and inferred values with gaps (a), and adjusted sampling data point to improve the accuracy (b). This can be improved with further adjustment to compensate for efficiency by matching the data point samples (c).
count I
The total number of sample data points after each adjustment of the sample data

db I
Inference database

D
Pre-defined value for the distance to neighbor data points

G
Gap between X and its neighboring data point value

K
Threshold value of FM and CM

N
Total number of original data points; according to Figure 4, N = total number of DP

n
Total number of sampling data points; according to Figure 4, n = total number of inferred DP

P
Pre-defined threshold data for decision making in comparison with the gap difference

Q
Pre-defined threshold data for decision making in comparison with the gap summation

R
Pre-defined threshold data for decision making

SD I [ ]
Collection of sample data after inference

SD O [ ]
Collection of original sample data

Figure 2 shows the flow of the algorithm to create sample data points.


Figure 2 .
Figure 2. Flowchart of Algorithm 1, which takes samples based on pre-defined fine-mode method (FM) and coarse-mode method (CM) threshold values against neighboring data points.

Algorithm 2 .
Calculation and comparison of Gap area
1. // Calculates the Gap area and compares the outcome with the inference threshold database values
2. function retrieveThresholdValue()
3. P → find pre-defined threshold data from inference threshold database db I
4. Q → find pre-defined threshold data from inference threshold database db I
5. R → find pre-defined threshold data from inference threshold database db I
6. return P, Q, R
7. end function
8. function calculateGapArea(first, last)
9. S upper → calculateUppergapVolume(first, last)
10. S lower → calculateLowergapVolume(first, last)
11. S diff → S upper − S lower
12. |S sum| → S upper + S lower
13. return S diff, |S sum|
14. end function
15.
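A hedged Python rendering of Algorithm 2's gap calculation; since the paper does not give the area formulas, `calculateUppergapVolume` and `calculateLowergapVolume` are approximated here by summing the point-wise gaps above and below a straight inference line between the first and last points:

```python
def interpolate(first_val, last_val, n):
    """Straight inference line between the first and last data point (n >= 2)."""
    step = (last_val - first_val) / (n - 1)
    return [first_val + step * i for i in range(n)]

def calculate_gap_area(original):
    """Sketch of calculateGapArea: S_upper sums the gaps where the original
    lies above the inferred line, S_lower where it lies below; their
    difference and summation drive the Algorithm 3 decision."""
    inferred = interpolate(original[0], original[-1], len(original))
    s_upper = sum(o - i for o, i in zip(original, inferred) if o > i)
    s_lower = sum(i - o for o, i in zip(original, inferred) if o < i)
    s_diff = s_upper - s_lower       # gap difference S_diff
    s_sum = abs(s_upper + s_lower)   # gap summation |S_sum|
    return s_diff, s_sum
```

A symmetric zig-zag around the inference line then yields a zero gap difference but a nonzero gap summation, which is exactly the case where comparing both values against P and Q matters.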

Algorithm 3 .
Adjustment of data points sample based on gap difference and summation
1. function adjustDataSample1([x, y])
2. if ( x in SD O [ ] and y in SD O [ ] ) then
3. until ( S diff <= P ) or ( |S sum| <= Q ) do loop
4. add (x, y) data point in SD I [ ] from SD O [

Algorithm 4 .
Adjustment of data points sample based on total number of data points
1. function adjustDataSample2(w, [a, b])
2. remove (a, b) data point from SD I [ ]
3. add (w) data point in SD I [ ] from SD O [ ]
4. end function
5.
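Algorithm 4's swap can be sketched in Python as below; the index-based sample representation is our own simplification of the listing:

```python
def adjust_data_sample2(sample, w, a, b):
    """Sketch of adjustDataSample2: deselect data points a and b from the
    sample and select the new data point w, so the total number of sample
    points stays the same or decreases while the gap area shrinks."""
    adjusted = [p for p in sample if p not in (a, b)]
    if w not in adjusted:
        adjusted.append(w)
    return sorted(adjusted)
```

For example, removing the points a = 20 and b = 30 while adding w = 25 shrinks a four-point sample to three points, matching the "same (or less) number of samples" property described in the text.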

Figure 7 .
Figure 7. Original data points versus the beacon-fine-mode method (BFM) applied for inference (original DP: 1800, inferred DP: 496). It has the greatest number of DPs; however, it provides the best accuracy rate, representing the best outcome of the original DPs.


Table 1 .
Definitions of notation

Algorithm 1 .
Creation of sample data collection
1. /* To create a collection of sample data points as SD i [ ] from the original data points array SD O [ ] */
2. function retrievePredefinedValues( )
3. K → retrieve pre-defined threshold values for FM and CM from inference threshold database db I
4. D → retrieve pre-defined threshold values for distance to neighboring data point from db I
5. return K, D
6. end function
7. function createSampleDataPoints(X, C)
8. Y = X − C // to store the value of the neighbor data point, one prior data point
9. Z = X + C // to store the value of the next consecutive data point
10. G → CalculateGap
11. if ( G > K ) then
12. Add X to SD i [ ] // add to the collection of sample data after inference
13. end if
14. end function
15.

Table 2 .
Summary of samples and inference results. The efficiency rate is not shown in this table, as it can be calculated from the savings rate. Beacon fine-mode method (BFM), beacon coarse-mode method (BCM).