# Analyzing Particularities of Sensor Datasets for Supporting Data Understanding and Preparation

## Abstract


## 1. Introduction

- RQ1: Which aspects of sensors are linked to the variation of data and the potential presence and role of outliers? Are we making some assumptions that are not met?
- RQ2: Which statistical solutions can provide valuable information about these aspects, so that we can analyze them and understand how they behave for different sensors? How can we use, combine and interpret their outcomes?
- RQ3: In case outliers are present, how do they affect the basic characteristics of sensors’ data? Are there alternatives that could mitigate the problem?

- RQ4: How does correlation work with different types of sensors? What is the potential utility of correlation in understanding data?
- RQ5: How do the different solutions perform? Are they also affected by outliers? How?

- RQ6: Is it possible to define a set of processes, as a way to automate and formalize data understanding, applicable to different domains?
- RQ7: Can we exploit computational resources in a better way, increasing the efficiency of the proposed solution?

## 2. Related Work

## 3. Research Methodology and Data Used

#### 3.1. Research Methodology

- Activity 1. Problem identification and motivation—As stated in the introduction, we believe that it is necessary to formalize data understanding processes and to provide a tool for supporting such formalization, based on statistical tests (as they can be fast and flexible), and applicable to multiple domains. We identified several research questions related to the approaches to use and the aspects to address (in Section 1). With such tools, it will be possible to understand how data are expected to behave, discarding solutions that might be affected by the nature of the data (e.g., as mentioned in Section 2, some ML-based algorithms could show problems when using datasets that do not follow a normal distribution).
- Activity 2. Define the objectives for a solution—A set of objectives was defined, including the concrete aspects that must be addressed (variability, distribution of the data, presence of outliers and their effect), the application of multiple statistical solutions (exploring as many as possible), the applicability to different domains (using datasets from multiple domains was a must) and the efficiency of execution (hence the parallel implementation, enabling experimentation with execution on multiple cores, as available in edge environments and Cloud solutions). The solution must provide enough information for the users to make decisions on how to process the data.
- Activity 3. Design and development—This activity was the most complex one, applying a quantitative research method approach, in which certain experiments were carried out to analyze different statistical solutions, with the purpose of observing which solutions were working, what results they were providing and how they could be used. As a result, we identified the key aspects to include and designed the three processes to be implemented. Then, these processes were implemented with R scripts, applying parallelization to the code.
- Activity 4. Demonstration—As a way to demonstrate the validity of the solution, we carried out a complete analysis of a new dataset (with a new sensor type not studied before) using the implemented R scripts. We examined the information generated by the scripts, configured to use data from two air quality sensors. The demonstration included the execution with different numbers of cores, showing how the implementation could be scaled up (10 executions were performed per script and core configuration).
- Activity 5. Evaluation—We observed that it was possible to quickly obtain valuable information about the characteristics of the data by using the scripts. We checked whether the information provided was accurate, including the identification of outliers (by comparing the generated results with the visual observation of the dataset, as well as other graphs such as histograms). Metrics such as F-score were available in some cases (e.g., in outlier detection). We also generated corresponding graphs for speedup and execution time, in order to observe the efficiency of parallel execution.
- Activity 6. Communication—Once we had results to communicate, we proceeded with the preparation of an article to explain the results from our research.

- Design as an Artifact: We produced a set of processes for data understanding and decision making, together with their (parallel) implementation;
- Problem Relevance: Our work addressed an important business problem of understanding the data before carrying out complex data analytics. The implemented scripts could support researchers and practitioners when selecting the most appropriate data analytics and ML solutions;
- Design Evaluation: We defined a way to evaluate several aspects of our results, including not only the capability to provide relevant information and accuracy of results, but also the performance when executed in parallel;
- Research Contributions: We identified several aspects that affect the characteristics of the sensors’ datasets, such as potential issues with variance, the reality about the probability distribution of the data, statistical tests that may be problematic, etc. This knowledge was used to implement a set of useful scripts, and we also demonstrated the utility of parallelization as a way to increase performance when analyzing sensors data;
- Research Rigor: As explained, we followed formal research methods to identify key aspects and to design the R scripts, while the evaluation method was also formally defined;
- Design as a Search Process: We used all possible means to obtain a useful solution, adding as many sensor types (those to which we had access, plus newly generated datasets) and statistical solutions as possible, to contrast the results and gain more knowledge;
- Communication of Research: This article is a good representation of communication to a technology-oriented audience.

#### 3.2. Data Sources

The devices measured NO₂ concentration, O₃ concentration, PM10 concentration and PM2.5 concentration, with measurements taken almost every minute, covering one week of data. The devices were separated by distances ranging from hundreds of meters to several kilometers.

## 4. Research Results

#### 4.1. Sensor Types

#### 4.2. Units of Measurement and Ranges

Each sensor type uses its own units of measurement (e.g., humidity in mg/L, sound level in dB, etc.). This means that the ranges of data that a humidity sensor generates have nothing to do with those of a sonometer.
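One common way to put such heterogeneous ranges on a comparable footing is min-max scaling. This is only an illustrative sketch (the paper does not prescribe this step); the readings below are invented:

```python
def min_max_scale(values):
    """Rescale a list of sensor readings to [0, 1] so that series
    measured in unrelated units (e.g., mg/L vs. dB) become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant signal: no spread to normalize
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

humidity = [12.0, 14.5, 13.2, 15.8]   # hypothetical mg/L readings
noise = [60.1, 72.4, 65.0, 80.3]      # hypothetical dB readings
print(min_max_scale(humidity))
print(min_max_scale(noise))
```

Note that scaling like this removes the units entirely, which matters for unit-dependent statistics such as the IQR discussed below.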

The interquartile range (IQR) is defined as Q₃ − Q₁ (the difference between the 75th and 25th percentiles) and provides an idea of the statistical dispersion of the values. Because the outliers tended to be out of that range, we eliminated the anomalies they introduced (for instance, when calculating the mean), although the measurement units used will also affect the IQR calculation (it is not a unit-less metric; hence, the same measurements expressed in hPa or in decapascals would result in different IQR values).
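The IQR-based filtering described above can be sketched as follows. This is a minimal illustration using the common Tukey fences (the 1.5 multiplier and the percentile convention are assumptions, not necessarily the paper's exact R implementation):

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

pressure = [1012, 1013, 1011, 1014, 1012, 1013, 950, 1012]  # hPa, one anomaly
print(iqr_outliers(pressure))
```

Averaging only the values that survive the fences gives the outlier-resistant mean mentioned in the text.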

#### 4.3. Data Distribution

#### 4.4. Outliers and Homogeneity

The outlier was initially detected with a p-value on the order of 10⁻⁹. The Dixon test was applied only to the middle of the dataset (due to its sample-size limitation) and detected the same outlier with Q = 0.45954 and p-value = 0.0188. The ESD test was executed with an upper limit of nine outliers and correctly reported all nine.
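The paper used the R implementations of these tests; as a dependency-free sketch, the Grubbs statistic G = max|xᵢ − x̄|/s can be computed directly. Significance would normally be judged against a t-distribution-based critical value, which is omitted here to keep the example self-contained:

```python
import statistics

def grubbs_statistic(values):
    """Return (G, index) for the single most extreme observation,
    where G = max|x_i - mean| / sample standard deviation."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    deviations = [abs(v - mean) for v in values]
    g = max(deviations)  / s
    return g, deviations.index(max(deviations))

# Hypothetical readings with one obvious anomaly at the end.
readings = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 15.0]
g, idx = grubbs_statistic(readings)
print(f"G = {g:.3f}, most extreme value at index {idx}")
```

In practice, G is compared against the critical value for the given sample size and significance level (derived from the t-distribution, as in Grubbs' original procedure).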

#### 4.5. Equivalent Sensors and Their Locations

The normality tests rejected a normal distribution for the full datasets (p-values on the order of 10⁻¹⁶ for Dique Exento Norte and 5.371 × 10⁻¹⁵ for Dique Exento Sur, and Q-Q plots showed a clear light tail). For the 1-day data, in some cases the tests determined that the data were following a normal distribution.
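The paper applies formal normality tests (e.g., Shapiro-Wilk) in R. As a rough, dependency-free stand-in, sample skewness can hint at the same departures from normality that the Q-Q plots revealed (values far from 0 suggest non-normal data). This is only a heuristic sketch, not a replacement for the tests:

```python
import math
import random

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)

random.seed(42)
normal_like = [random.gauss(0, 1) for _ in range(2000)]   # roughly symmetric
skewed = [random.expovariate(1.0) for _ in range(2000)]   # right-skewed

print(f"gaussian-like sample skewness: {skewness(normal_like):.3f}")
print(f"exponential sample skewness: {skewness(skewed):.3f}")
```

A value near 0 is consistent with (but does not prove) normality, whereas a strongly positive value matches the "skewed right" pattern seen in the wind speed dataset.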

#### 4.6. Proposal for a Decision-Making Procedure

#### 4.7. Evaluation Applying the Processes

For this evaluation, we used data from NO₂ sensors (Figure 9 shows the measurements), which is a type of metric not studied before. Long and short windows of data, as well as sliding windows, were used in order to analyze the data. While the size of the long windows covered 6% (10 h) of the total amount of data, the size of the short windows covered only 1% (75 min of data).
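The windowing scheme can be sketched as follows. The splitting logic and the sizes below are illustrative assumptions, not the paper's R code:

```python
def fixed_windows(series, size):
    """Non-overlapping windows of `size` consecutive samples."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def sliding_windows(series, size, step):
    """Overlapping windows that advance by `step` samples."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

data = list(range(600))              # e.g., 600 one-minute samples (10 h)
long_w = fixed_windows(data, 60)     # 10 non-overlapping 1 h windows
slide = sliding_windows(data, 60, 30)  # the same windows, half-overlapping
print(len(long_w), len(slide))
```

Sliding windows trade extra computation for smoother coverage, which is why (as noted below) they help make borderline outliers more evident.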

#### 4.7.1. Variation Analysis Process

Since NO₂ is measured in µg/m³, we considered that it has an absolute zero. Therefore, the coefficient of variation (CV) was also calculated. For the first sensor, the CV ranged from 0.27 to 0.77 in long windows and from 0.20 to 1.07 in short windows (although around two thirds of the values were below 0.50). The CV mean was 0.49 for long windows and 0.44 for the short ones, while the median was 0.48 and 0.38, respectively. For the second sensor, values ranged from 0.31 to 0.73 in long windows, although high values were unusual, and most experiments reported coefficients of variation between 0.35 and 0.50, with a mean of 0.45 and a median of 0.40. In the case of short windows, the CV went from 0.19 to 1.32 (in a few cases with extreme variation), with a mean of 0.42 and a median of 0.35.
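The per-window CV is simply the standard deviation divided by the mean, which is only meaningful on a ratio scale with an absolute zero (as with concentrations in µg/m³). A minimal sketch (the paper's implementation is in R; the readings are invented):

```python
import statistics

def coefficient_of_variation(window):
    """CV = population standard deviation / mean. Requires a ratio
    scale (absolute zero), otherwise the ratio is not interpretable."""
    mean = statistics.fmean(window)
    return statistics.pstdev(window) / mean

window = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative concentration readings
print(f"CV = {coefficient_of_variation(window):.2f}")
```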

#### 4.7.2. Outlier Analysis Process

P-values for the detected outliers ranged from extremely small values (on the order of 10⁻¹²) to some near the significance limit of the test (p-value = 0.0489). In doubtful cases, the use of sliding windows supported the detection of some outliers (because they became more evident). The process was quite accurate in outlier detection, failing only four times (compared with visual checks). Therefore, its accuracy was calculated as 0.9 and its F1-score as 0.77.

For the detected outliers, p-values ranged from the order of 10⁻¹² to 0.048, and very few false positives were reported (when comparing the test results with visual inspection). The homogeneity tests reported some false positives as well. In general, the sliding windows worked well, with values varying as expected for the detection, and failed in only four cases. In this case, the accuracy was 0.9 and the F1-score was 0.77 (although precision and recall differed from the first case).
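Accuracy and F1-score follow directly from the confusion-matrix counts of detected vs. visually confirmed outliers. The counts below are hypothetical, chosen only to show the arithmetic; they are not the paper's actual counts:

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all decisions (outlier / not outlier) that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 7 true outliers found, 2 false alarms,
# 2 missed outliers, 29 correctly ignored normal windows.
tp, fp, fn, tn = 7, 2, 2, 29
print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}, F1 = {f1_score(tp, fp, fn):.2f}")
```

Because F1 ignores true negatives, two runs can share the same accuracy and F1 while having different precision/recall trade-offs, as the text observes.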

#### 4.7.3. Correlation Analysis Process

#### 4.7.4. Performance Analysis

The experiments were run on a machine with an Intel® Core i5-8350U vPro processor (four physical cores at 1.70 GHz, up to 3.60 GHz in turbo mode, 6 MB smart cache and a bus speed of 4 GT/s) and 8 GB of DDR4 2400 MHz RAM. The software used was the Windows 10 Enterprise operating system (build 19041.1052) with R version 3.6.2.
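The scripts parallelize the per-window analyses across cores (in R, typically via the parallel package). A minimal Python analogue of the dispatch pattern, using a thread pool for brevity (genuinely CPU-bound work would use a process pool instead):

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

def analyze_window(window):
    """Per-window statistics; each window is independent,
    so calls can be dispatched to separate workers."""
    return statistics.fmean(window), statistics.pstdev(window)

windows = [[1, 2, 3], [4, 4, 4], [2, 4, 6, 8]]

# map() preserves input order, so parallel and serial results match.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(analyze_window, windows))

serial_results = [analyze_window(w) for w in windows]
print(parallel_results == serial_results)
```

The key property exploited by the paper's parallel R scripts is the same: windows are analyzed independently, so the work partitions cleanly across cores.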

## 5. Discussion and Conclusions

#### 5.1. Particularities of Sensor Data

#### 5.2. Proposed Processes and Their Implementation

#### 5.3. Research Limitations

#### 5.4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Jeffery, S.R.; Alonso, G.; Franklin, M.J.; Hong, W.; Widom, J. Declarative Support for Sensor Data Cleaning. Lect. Notes Comput. Sci. **2006**, 3968, 83–100.
- Bruijn, B.; Nguyen, T.; Bucur, D.; Tei, K. Benchmark Datasets for Fault Detection and Classification in Sensor Data. In Proceedings of the 5th International Conference on Sensor Networks, Rome, Italy, 17–19 February 2016; pp. 185–195.
- CrowdFlower. 2017 Data Scientist Report. Available online: https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport.pdf (accessed on 20 July 2021).
- Kaggle. 2018 Kaggle Machine Learning and Data Science Survey. Available online: https://www.kaggle.com/paultimothymooney/2018-kaggle-machine-learning-data-science-survey (accessed on 20 July 2021).
- Anaconda. The State of Data Science. 2020. Available online: https://www.anaconda.com/state-of-data-science-2020 (accessed on 20 July 2021).
- Teh, H.Y.; Kempa-Liehr, A.W.; Wang, K.I.K. Sensor data quality: A systematic review. J. Big Data **2020**, 7, 1–49.
- Firat, M.; Dikbas, F.; Koc, A.C.; Gungor, M. Analysis of temperature series: Estimation of missing data and homogeneity test. Meteorol. Appl. **2012**, 19, 397–406.
- Che, F.R.; Hiroyuki, T.; Lariyah, M.S.; Hidayah, B. Homogeneity and trends in long-term rainfall data, Kelantan River basin, Malaysia. Int. J. River Basin Manag. **2016**, 14, 151–163.
- Alexandersson, H. A homogeneity test applied to precipitation data. J. Climatol. **1986**, 6, 661–675.
- Pettitt, A.N. A non-parametric approach to the change point problem. J. R. Stat. Soc. Ser. C Appl. Stat. **1979**, 28, 126–135.
- Buishand, T.A. Some Methods for Testing the Homogeneity of Rainfall Records. J. Hydrol. **1982**, 58, 11–27.
- Ni, K.; Ramanathan, N.; Chehade, M.N.H.; Balzano, L.; Nair, S.; Zahedi, S.; Kohler, E.; Pottie, G.; Hansen, M.; Srivastava, M. Sensor network data fault types. ACM Trans. Sen. Netw. **2009**, 5, 1–29.
- Baljak, V.; Tei, K.; Honiden, S. Fault classification and model learning from sensory readings—Framework for fault tolerance in wireless sensor networks. In Proceedings of the IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing, Melbourne, Australia, 2–5 April 2013; pp. 408–413.
- Erhan, L.; Ndubuaku, M.; di Mauro, M.; Song, W.; Chen, M.; Fortino, G.; Bagdasar, O.; Liotta, A. Smart anomaly detection in sensor systems: A multi-perspective review. Inf. Fusion **2021**, 67, 64–79.
- Zhang, Y.; Szabo, C.; Sheng, Q.Z. Reduce or Remove: Individual Sensor Reliability Profiling and Data Cleaning. Intell. Data Anal. **2016**, 20, 979–995.
- Kenda, K.; Mladenić, D. Autonomous Sensor Data Cleaning in Stream Mining Setting. Bus. Syst. Res. J. **2018**, 9, 69–79.
- Ramotsoela, D.; Abu-Mahfouz, A.; Hancke, G. A Survey of Anomaly Detection in Industrial Wireless Sensor Networks with Critical Water System Infrastructure as a Case Study. Sensors **2018**, 18, 2491.
- Magán-Carrión, R.; Camacho, J.; García-Teodoro, P. Multivariate statistical approach for anomaly detection and lost data recovery in wireless sensor networks. Int. J. Distrib. Sens. Netw. **2015**, 11, 672124.
- Liu, J.; Deng, H. Outlier detection on uncertain data based on local information. Knowl.-Based Syst. **2013**, 51, 60–71.
- Martins, H.; Palma, L.; Cardoso, A.; Gil, P. A support vector machine based technique for online detection of outliers in transient time series. In Proceedings of the 10th Asian Control Conference (ASCC), Kota Kinabalu, Malaysia, 31 May–3 June 2015; pp. 1–6.
- Hasan, M.; Islam, M.M.; Zarif, M.I.; Hashem, M.M. Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches. Internet Things **2019**, 7, 100059.
- Maseda, F.J.; López, I.; Martija, I.; Alkorta, P.; Garrido, A.J.; Garrido, I. Sensors Data Analysis in Supervisory Control and Data Acquisition (SCADA) Systems to Foresee Failures with an Undetermined Origin. Sensors **2021**, 21, 2762.
- Martí, L.; Sanchez-Pi, N.; Molina, J.M.; Garcia, A.C.B. Anomaly Detection Based on Sensor Data in Petroleum Industry Applications. Sensors **2015**, 15, 2774–2797.
- Oucheikh, R.; Fri, M.; Fedouaki, F.; Hain, M. Deep Real-Time Anomaly Detection for Connected Autonomous Vehicles. Procedia Comput. Sci. **2020**, 177, 456–461.
- Box, G.E.P.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. Ser. B **1964**, 26, 211–252.
- Yeo, I.; Johnson, R. A New Family of Power Transformations to Improve Normality or Symmetry. Biometrika **2000**, 87, 954–959.
- Hevner, A.R.; March, S.T.; Park, J. Design Research in Information Systems Research. MIS Q. **2004**, 28, 75–105.
- Peffers, K.; Tuunanen, T.; Rothenberger, M.A.; Chatterjee, S. A Design Science Research Methodology for Information Systems Research. J. Manag. Inf. Syst. **2007**, 24, 45–77.
- Ingelrest, F.; Barrenetxea, G.; Schaefer, G.; Vetterli, M.; Couach, O.; Parlange, M. SensorScope: Application-specific sensor network for environmental monitoring. ACM Trans. Sens. Netw. **2010**, 6, 1–32.
- Reed, J.F.; Lynn, F.; Meade, B.D. Use of Coefficient of Variation in Assessing Variability of Quantitative Assays. Clin. Diagn. Lab. Immunol. **2002**, 9, 1235–1239.
- Grubbs, F. Procedures for Detecting Outlying Observations in Samples. Technometrics **1969**, 11, 1–21.
- Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing; Springer Topics in Signal Processing Series; Springer: Berlin, Germany, 2009; Volume 2, pp. 1–4.
- Anderson, T.W.; Darling, D.A. A test of goodness-of-fit. J. Am. Stat. Assoc. **1954**, 49, 765–769.
- Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika **1965**, 52, 591–611.
- Dixon, W.J. Processing data for outliers. Biometrics **1953**, 9, 74–89.
- Rosner, B. Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics **1983**, 25, 165–172.
- Tietjen, G.; Moore, R. Some Grubbs-Type Statistics for the Detection of Several Outliers. Technometrics **1972**, 14, 583–597.
- Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psych. **2013**, 49, 764–766.
- Buishand, T.A. Tests for Detecting a Shift in the Mean of Hydrological Time Series. J. Hydrol. **1984**, 73, 51–69.
- Lanzante, J.R. Resistant, robust and non-parametric techniques for the analysis of climate data: Theory and examples, including applications to historical radiosonde station data. Int. J. Clim. **1996**, 16, 1197–1226.
- Mann, H.B. Nonparametric tests against trend. Econometrica **1945**, 13, 245–259.
- Aggarwal, R.; Ranganathan, P. Common pitfalls in statistical analysis: The use of correlation techniques. Perspect. Clin. Res. **2016**, 7, 187–190.
- Kendall, M.G. The treatment of ties in rank problems. Biometrika **1945**, 33, 239–251.
- Dodge, Y. Spearman Rank Correlation Coefficient. In The Concise Encyclopedia of Statistics; Springer: New York, NY, USA, 2008.
- Wirth, R.; Hipp, J. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000; pp. 29–39.
- Huber, S.; Wiemer, H.; Schneider, D.; Ihlenfeldt, S. DMME: Data mining methodology for engineering applications—A holistic extension to the CRISP-DM model. Procedia CIRP **2019**, 79, 403–408.
- Delignette-Muller, M.; Dutang, C. Fitdistrplus: An R Package for Fitting Distributions. J. Stat. Softw. **2015**, 64, 1–34.
- Ryan, C.M.; Parnell, A.; Mahoney, C. Real-time anomaly detection for advanced manufacturing: Improving on Twitter’s state of the art. arXiv **2019**, arXiv:1911.05376. Available online: https://arxiv.org/abs/1911.05376 (accessed on 20 July 2021).
- Hochenbaum, J.; Vallis, O.S.; Kejariwal, A. Automatic anomaly detection in the cloud via statistical learning. arXiv **2017**, arXiv:1704.07706. Available online: https://arxiv.org/abs/1704.07706 (accessed on 20 July 2021).
- Hoefler, T.; Belli, R. Scientific benchmarking of parallel computing systems: Twelve ways to tell the masses when reporting performance results. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, TX, USA, 15–20 November 2015; Association for Computing Machinery: New York, NY, USA, 2015; Article 73, pp. 1–12.
- Amdahl, G.M. Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. In Proceedings of the Spring Joint Computer Conference, Atlantic City, NJ, USA, 18–20 April 1967; pp. 483–485.
- Tanuska, P.; Spendla, L.; Kebisek, M.; Duris, R.; Stremy, M. Smart Anomaly Detection and Prediction for Assembly Process Maintenance in Compliance with Industry 4.0. Sensors **2021**, 21, 2376.
- Guh, R.S. Effects of non-normality on artificial neural network based control chart pattern recognizer. J. Chin. Inst. Ind. Eng. **2002**, 19, 13–22.

**Figure 1.** Behavior of different sensors: (**a**) Outdoor temperature sensor (1 week); (**b**) Salinity sensor (1 week); (**c**) Outdoor luminosity sensor during the afternoon (30 min); (**d**) Indoor luminosity sensor during nighttime (10 min).

**Figure 2.** This figure shows the graphical analysis to determine if the datasets follow a normal distribution: (**a**) Histogram for a dataset with temperature measurements; (**b**) Q-Q plot of the same temperature dataset showing the deviation from a normal distribution (heavy tailed); (**c**) Histogram for the wind speed dataset; (**d**) Q-Q plot of the same wind speed dataset showing the deviation from a normal distribution (skewed right).

**Figure 3.** This figure shows the graphical analysis of data distribution for the Figure 2 temperature dataset with a bias fault injected: (**a**) Histogram of the temperature dataset; (**b**) Q-Q plot of the same temperature dataset, showing the deviation from a normal distribution (skewed right).

**Figure 4.** This shows an example of atmospheric pressure outlier detection with homogeneity tests: (**a**) Full dataset with atmospheric pressure metrics, with outliers; (**b**) Graph of the statistic calculated with SNHT; (**c**) Graph of the statistic calculated with Pettitt; (**d**) Graph of the statistic calculated with Buishand range test; (**e**) Graph of the statistic calculated with Lanzante’s test; (**f**) Graph of the statistic calculated with Buishand U test.

**Figure 5.** This figure shows the correlation diagram for different types of sensors and datasets: (**a**) Spearman correlation for wind speed (1 week); (**b**) Spearman correlation for temperature without errors (1 week); (**c**) Spearman correlation for temperature with drift errors (1 week); (**d**) Spearman correlation for atmospheric pressure (1 month), with different measurement units.

**Figure 7.** Processes for analyzing data from sensors: (**a**) Process to analyze variation of data; (**b**) Process for analyzing the presence of outliers in the data.

**Figure 9.** Datasets used for analysis with NO₂ measurements: (**a**) Sensor 1 measurements; (**b**) Sensor 2 measurements.

**Figure 10.** Graphs showing the performance of the parallel version of the scripts. (**a**) Speedup for the variation script. (**b**) Execution time for the variation script. (**c**) Speedup for the outlier script. (**d**) Execution time for the outlier script. (**e**) Speedup for the correlation script. (**f**) Execution time for the correlation script.

**Table 1.** Outcomes of the correlation tests under different failures in the dataset.

Type of Failure/Test | Pearson | Kendall | Spearman
---|---|---|---
No error | 0.916444 | 0.8414264 | 0.9399103
Malfunction | 0.816932 | 0.752025 | 0.8621708
Bias | 0.5387241 | 0.6189237 | 0.6764626
Drift | 0.3560831 | 0.7168082 | 0.8339586
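The robustness pattern in Table 1 (rank correlations degrading less than Pearson under faults) can be reproduced with a small dependency-free sketch. Here a single large faulty reading is injected; it leaves the ranks, and hence Spearman, untouched, while Pearson drops sharply (illustrative data, not the paper's datasets):

```python
import statistics

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman = Pearson computed on ranks (no ties in this illustration)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one faulty reading at the end
print(f"pearson = {pearson(x, y):.3f}, spearman = {spearman(x, y):.3f}")
```

Because Spearman depends only on the ordering of the values, monotone distortions and isolated extreme readings affect it far less, which matches the drift row in Table 1.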

R Script | Results | Serial | 2 Cores | 3 Cores | 4 Cores | 5 Cores | 6 Cores | 7 Cores
---|---|---|---|---|---|---|---|---
Variation Process | Mean | 94.72 | 48.42 | 34.37 | 27.73 | 34.77 | 23.58 | 31.99
 | Best | 57.48 | 45.23 | 24.05 | 21.39 | 29.52 | 21.08 | 26.87
 | Worst | 138.88 | 52.53 | 38.32 | 34.12 | 47.17 | 26.13 | 34.84
 | C. of Variation | 0.28 | 0.04 | 0.13 | 0.15 | 0.12 | 0.06 | 0.06
Outlier Process | Mean | 471.42 | 247.58 | 212.99 | 208.1 | 154.95 | 160.91 | 186.21
 | Best | 432.64 | 235.89 | 199.65 | 178.57 | 143.91 | 144.05 | 169.06
 | Worst | 535.75 | 255.68 | 218.5 | 230.13 | 159.91 | 178.54 | 202.19
 | C. of Variation | 0.08 | 0.02 | 0.02 | 0.09 | 0.03 | 0.06 | 0.04
Correlation Process | Mean | 4.36 | 2.82 | 2.47 | 2.45 | 2.63 | 2.75 | 2.83
 | Best | 4.09 | 2.68 | 2.36 | 2.39 | 2.55 | 2.53 | 2.77
 | Worst | 4.69 | 3.25 | 2.89 | 2.53 | 2.75 | 2.97 | 2.92
 | C. of Variation | 0.05 | 0.07 | 0.05 | 0.01 | 0.01 | 0.04 | 0.01
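From the mean execution times above, speedup and parallel efficiency follow directly (speedup = serial time / parallel time; efficiency = speedup / number of cores). A quick sketch using the outlier-process means from the table:

```python
# Mean execution times for the outlier process, taken from the table above.
mean_times = {1: 471.42, 2: 247.58, 3: 212.99, 4: 208.10, 5: 154.95, 6: 160.91, 7: 186.21}

def speedup(serial_time, parallel_time):
    """Classic speedup: how many times faster than the serial run."""
    return serial_time / parallel_time

for cores, t in sorted(mean_times.items()):
    s = speedup(mean_times[1], t)
    print(f"{cores} cores: speedup {s:.2f}, efficiency {s / cores:.2f}")
```

Efficiency falls as cores are added, which is the diminishing-returns behavior that Amdahl's law (cited in the references) predicts for workloads with a serial fraction.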

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nieto, F.J.; Aguilera, U.; López-de-Ipiña, D.
Analyzing Particularities of Sensor Datasets for Supporting Data Understanding and Preparation. *Sensors* **2021**, *21*, 6063.
https://doi.org/10.3390/s21186063
