# Outlier Detection Transilience-Probabilistic Model for Wind Tunnels Based on Sensor Data


## Abstract


## 1. Introduction

## 2. Theoretical Foundations

- Statistical Models. Statistical anomaly detection models are based on statistical knowledge [15]; that is, on statistical techniques and models. There are different approaches to classifying this kind of method; in this research, a new classification into two main groups is proposed:
  - Based on magnitudes. These kinds of models are based on the values of statistical magnitudes, which must be calculated beforehand. The possible outliers in the data set are identified from these values. Examples include Box and Whiskers (based on the use of quartiles), bagplot, mean and variance, and the Cochran test.
  - Based on probability distribution. These kinds of models are based on the probability distributions of the analyzed variables, which must be obtained beforehand. The outliers are identified from those distributions. Different types of probability distributions can be analyzed to find outliers, such as Normal and Log-Normal. Examples include the Chi-square test, Dixon test, Grubbs test, Tietjen–Moore test, and Generalized Extreme Studentized Deviate test.

- Distance-based Models. These models are based on the definition of a distance function between the data, such as the Minkowski, Euclidean, Manhattan, Mahalanobis, or Chebyshev distance [16]. The distance is calculated between all the observations in the data set, and the values obtained must follow a certain pattern if the data are normal. The outliers are those values that do not follow this pattern [17,18]. These methods include, for example, K-nearest neighbors and Angle-Based Outlier Detection (ABOD).
- Density-based Models. These models are based on the number of observations found in the different regions of the space. If the values are normal, the density must follow a pattern across the regions. If some regions show a lower density than the observed pattern, the values in those regions could be outliers [19]. These include, for example, Local Outlier Factor (LOF) and Shared Nearest Neighbour (SNN).
- Spectral Decomposition Models. These models divide the data into sub-spaces of lower dimension than the initial one and identify those sub-spaces as having a normal, anomaly, or noise Sub-space Outlier Degree. Once these sub-spaces have been defined, outliers can be identified in the anomaly sub-space [20].
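As a small illustration of the distance functions named above, the following sketch computes Minkowski distances (Manhattan for p = 1, Euclidean for p = 2) and the Chebyshev distance, and then uses the mean distance to the k nearest neighbours as a simple distance-based outlier score. The function names and the toy data are illustrative assumptions, not part of the method proposed in this paper.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_score(points, idx, k):
    """Mean Euclidean distance from points[idx] to its k nearest neighbours."""
    dists = sorted(
        minkowski(points[idx], q, 2) for j, q in enumerate(points) if j != idx
    )
    return sum(dists[:k]) / k


# Toy data set: four points close together and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (8.0, 8.0)]
scores = [knn_score(points, i, k=2) for i in range(len(points))]
most_isolated = scores.index(max(scores))  # index of the candidate outlier
```

A larger score means that the observation lies farther from its neighbours; where to place the threshold between normal values and outliers is left to the analyst.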

- Supervised Models. These are models that use a training set that contains normal data and anomalies. If there are different classes of anomalies, they must be contained in the training set data.
- Unsupervised Models. These models are used when it is not known whether the data in the data set are normal or outliers; that is, when the data do not have a class label identifying each datum as normal or as an outlier. To establish whether a datum is an outlier, a parameter called the outlier degree is used. This parameter is set by the analyst and determines, for each datum, whether it is an outlier.
- Semi-supervised Models. In these models, some data in the data set are considered normal, while other data are not labeled as normal or outlier. These models use the labeled data as reference, in order to obtain a parameter that allows for the identification of the non-labeled objects as outliers or normal data.

#### 2.1. Statistical Probability Distribution Outlier Detection Models

#### 2.2. Local Transilience Outlier Identification Method (LTO)

- In the first phase (Phase A), the laminarity of the studied variable is established in the following way:
  - The first step establishes the local area, that is, the number k of nearest neighbors used in the study: $k=3$ or $k=4$ may be enough, but lower or greater values can be selected. In this paper, $k=3$ was used. For each value of the variable v at time i, ${v}_{i}$, the Euclidean distance to each of the k nearest values is calculated:
    $$\text{for } j=1,\dots ,k:\quad {d}_{i,i-j}=\sqrt{\sum _{l=1}^{ds}{({v}_{i,l}-{v}_{i-j,l})}^{2}},$$
    where the sum runs over the $ds$ components of each value (a single term for a scalar variable).
  - Once the k distances are obtained (three in this paper), the second step consists of the calculation of their mean:
    $${m}_{i}=\frac{{\sum}_{j=1}^{k}{d}_{i,i-j}}{k}.$$
    This is performed for all values in the training set.
  - The third step consists of the calculation of the mean and variance of the means calculated in Step 2. If there are n values in the data set, the calculation is:
    $$m=\frac{{\sum}_{i=1}^{n}{m}_{i}}{n},\qquad {{\sigma}_{m}}^{2}=\frac{{\sum}_{i=1}^{n}{({m}_{i}-m)}^{2}}{n}.$$
    Once m and ${{\sigma}_{m}}^{2}$ have been obtained for the training set, they are used to identify anomalies in the same variable during the normal operation of the laminar variable (in this paper, the wind tunnel variables). A statistical method based on probability distributions (as explained in Section 2.1) is applied to obtain the interval of normal values of ${m}_{i}$. This interval is defined as the degree of non-laminarity of the value. In the case of this study, the interval range was established as $\pm 3{\sigma}^{2}$.

- In the second phase (Phase B), the same first three steps of Phase A are applied to the sample under study, and ${m}_{i}$ is obtained for each of its values. To detect outliers, each observed value is checked: if its mean leap value lies within the laminarity interval, the value is normal; if it lies outside the interval, it is classified as a transilient leap. This can be done in real time because, for each value, only the k nearest (or, in the case of time-series data, the k previous) values of the variable are needed.

- In Phase A, establishing the laminarity of the variable and the range of normal values.

Algorithm 1: Establishing the laminarity of the variable and the range of normal values.

- In Phase B, the LTO method is used to detect the outliers of the variable.

Algorithm 2: Using the LTO method to detect the outliers.
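The two phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the variable is scalar (so the Euclidean distance reduces to an absolute difference), the k nearest values are taken to be the k previous values of the time series, the mean and variance are divided by the number of available means, and the interval is taken as m ± 3σ_m (three standard deviations), whereas the text above states the range as ±3σ². All function names are hypothetical.

```python
import math


def mean_leaps(values, k=3):
    """Steps 1-2: m_i = mean distance from v_i to its k previous values."""
    out = []
    for i in range(k, len(values)):
        dists = [abs(values[i] - values[i - j]) for j in range(1, k + 1)]
        out.append(sum(dists) / k)
    return out


def fit_laminarity(train, k=3, width=3.0):
    """Phase A: interval of normal m_i estimated from the training set."""
    m_i = mean_leaps(train, k)
    n = len(m_i)
    m = sum(m_i) / n
    var = sum((x - m) ** 2 for x in m_i) / n
    half = width * math.sqrt(var)  # assumption: width * standard deviation
    return m - half, m + half


def is_transilient(history, value, interval, k=3):
    """Phase B: True if the mean leap of `value` falls outside the interval."""
    m_new = sum(abs(value - h) for h in history[-k:]) / k
    low, high = interval
    return not (low <= m_new <= high)


train = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 9.8, 10.0, 10.1, 9.9]
interval = fit_laminarity(train)
normal_flag = is_transilient(train, 10.05, interval)  # small leap
leap_flag = is_transilient(train, 14.0, interval)     # large leap
```

Because Phase B only needs the last k values, `is_transilient` can be called on each new reading as it arrives, which matches the real-time use described above.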

## 3. Anomaly Detection in Wind Tunnel System Performance

#### 3.1. Variables Used in Monitoring the Behavior of Wind Tunnels

- Communication set-up.
- Acquisition of data.
- Real-time visualization of the installation status with Real Time System.
- Information storage in databases with a Reporting Server System.
- Post-processing of the information with Reporting Server for later historical visualization through graphics using the Real Time system.

- Analog. These can take a continuous value, usually within a known range. An example is the air speed, expressed as a percentage of the operating setpoint, which normally takes values between 0 and 100%. The minimum value may be a limit programmed in the PLC. The maximum value corresponds to the value configured for the application, which, in this case, is greater than 100%, because the installation was configured with the possibility of operating at overspeed.
- Digital. These are bi-state variables, such as YES or NO, ONE or ZERO, TRUE or FALSE. A ZERO/FALSE value indicates that the event is not active. A ONE/TRUE value indicates that the event is active. If there is a TRUE in a run command, it means that the system is activating the run command. If there is a TRUE in an alarm, that alarm is active; when it goes to FALSE, the alarm stops—this could be due to a condition or acknowledgment. CONDITION means that if the condition that produces the alarm disappears, the alarm event also disappears. ACKNOWLEDGMENT means that if the condition that produces the alarm disappears, the alarm event does not disappear, such that an acknowledgment action by the operator is necessary (e.g., pressing an alarm reset button). The capture of event data allows for identifying the number of alarms produced and their typology. Events can identify previous malfunction situations, so their analysis can allow for anticipating a machine stoppage or the performance of preventive maintenance actions.
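The two alarm-reset behaviours described above can be sketched as a small state function. The function and parameter names are hypothetical, and the behaviour when an acknowledgment arrives while the triggering condition still holds is an assumption (here, the alarm stays active).

```python
def alarm_active(condition, was_active, mode, acknowledged=False):
    """Return the next state (True/False) of a digital alarm event.

    mode == "condition": the alarm follows the triggering condition.
    mode == "acknowledgment": the alarm latches until the condition has
    disappeared and the operator has acknowledged it.
    """
    if condition:
        return True  # the triggering condition is present
    if mode == "condition":
        return False  # condition gone, so the alarm event disappears
    # Acknowledgment mode: stay active until the operator resets the alarm.
    return was_active and not acknowledged


# Condition disappears, no operator action yet.
by_condition = alarm_active(False, True, "condition")
latched = alarm_active(False, True, "acknowledgment")
after_reset = alarm_active(False, True, "acknowledgment", acknowledged=True)
```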

- Fans:
  - Speed. Fan rotation speed, measured as a percentage (%); the expected value therefore lies in the interval (0, 100). It is measured by the drivers.
  - Speed reference VSD (Variable Speed Drive). Speed reference for the drivers, that is, the speed setpoint of the driver. It is expressed as a percentage, so the expected value lies in the interval (0, 100). If the driver is receiving a setpoint of 50%, it is modulating its nominal output. It is measured by the driver itself.
  - Current. Average instantaneous current of the motor output phases. Measured in amperes (A). It is measured by the driver.
  - Power. Instantaneous power consumed by the motor. Measured in kilowatts (kW). It is measured by the driver.
  - Winding temperature of the motor phases V, U, and W. The motor is a three-phase motor, and this variable measures the winding temperature of each of the phases V, U, and W. It is measured in degrees centigrade (°C) by sensors installed in the motor winding. A value of 850 °C indicates a problem in the probe or in the acquisition electronics.
  - Engine front and rear bearing temperatures. These are measured in degrees centigrade (°C) and are collected by sensors installed in the motor bearing. The sensors are read by the ZMDAQ electronics, PT100 input channel. A value of 200 °C indicates a problem in the probe or in the acquisition electronics.
  - Vibration in the general motor. Measured in millimeters per second (mm/s). It is measured by sensors installed in the motor housing. The sensor is read by the ZMDAQ electronics, 4–20 mA analogue input channel.
  - Measured pumping pressure value. Measured in Pascals (Pa). It is measured by a Petermann sensor in the fan. The pressure differential between taps B+ and B- is measured using a pneumatic inlet channel.
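The fault readings and expected ranges quoted in the list above lend themselves to a simple plausibility check before any outlier analysis. The sketch below is only an illustration: the variable names are hypothetical, the fault values are those stated in the text, and treating the endpoints 0 and 100 as valid percentages is an assumption.

```python
import math

# Fault readings quoted in the text (degrees centigrade); keys are hypothetical.
FAULT_READINGS_C = {"winding_temperature": 850.0, "bearing_temperature": 200.0}
# Variables described in the text as percentages expected between 0 and 100.
PERCENT_VARIABLES = {"speed", "speed_reference_vsd"}


def check_reading(name, value):
    """Classify one sensor reading before it is passed to outlier detection."""
    fault = FAULT_READINGS_C.get(name)
    if fault is not None and math.isclose(value, fault, abs_tol=1e-6):
        return "probe_or_acquisition_fault"
    if name in PERCENT_VARIABLES and not (0.0 <= value <= 100.0):
        return "outside_expected_range"
    return "ok"


status = check_reading("winding_temperature", 850.0)
```

Readings flagged in this way point to a sensor or acquisition problem rather than to anomalous behavior of the fan, so it may be useful to handle them separately from the statistical outliers.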

- Drivers:
  - Alternating voltage between the driver input phases R and S.
  - Alternating voltage between the driver input phases S and T.
  - Alternating voltage between the driver input phases T and R.
  - Average alternating voltage of the three motor phases U, V, and W.
  - Motor output IGBT temperature (these are the most critical switching elements).
  - General temperature of the motor.
  - Instantaneous power output to motor.
  - Average current of the three motor phases U, V, and W.
  - Motor output phase U current.
  - Motor output phase V current.
  - Motor output phase W current.
  - Instantaneous speed of motor rotation in percentage.
  - Instantaneous speed of motor rotation in Hertz.
  - Instantaneous speed of motor rotation in revolutions per minute.
  - Sum of energy consumed; these are data returned by the drive.
  - Sum of energy regenerated to the grid; these are data returned by the drive.
  - Instantaneous drive input power.
  - Inverter input phase R current.
  - Inverter input phase S current.
  - Inverter input phase T current.
  - Cosine of Phi at the input of the drive.
  - Harmonic distortion value in intensity at the input of the drive.

- Wind tunnel facility:
  - System speed setpoint in percentage.
  - Sum of powers of the four fans.
  - Total flow value of the installation.
  - Air flow speed.
  - Temperature setpoint inside the tunnel.
  - Measurement of the temperature inside the tunnel.

#### 3.2. Anomaly Detection in Wind Tunnel Variables

#### 3.2.1. Applying Modified Local Transilience Outlier Identification Method (LTO) for Outlier Detection in Wind Tunnels

#### 3.2.2. Applying Probability Distributions for Anomaly Detection in Wind Tunnels

## 4. Research Results

#### 4.1. Intervals of Correct Operation and Anomalies Observed in Variables Involved in Wind Tunnels Performance Applying the Probability Distribution Method

- Pressure: The values showed a marked difference for one of the fans, whose mean and standard deviation were above those of the other fans. The fans also reached very different minima and maxima. Recall from the description of the pressure variable that the fans should normally operate at a negative mean value (or close to zero). There was a limit value of 400 Pa, above which the fan operates in a high pump zone; above this value, one should consider stopping its operation. The pressure distribution analysis revealed frequent values above zero, some even exceeding 400 Pa. Although the distribution curve resembled a Gaussian distribution and was symmetric with respect to the mean, in most cases (all except fan 1) the means had positive values, the highest being that of fan 3. The data obtained during the analysis show the intervals detected by the model in which the normal operating values lie. Any value outside this range is an anomalous value within the operations of the system. With the developed method, a range of normal values was established, in which 99.99% of cases were found (lower bound to upper bound):
  - #1 Fan: −152.4557 to 140.9509
  - #2 Fan: −152.9021 to 166.1269
  - #3 Fan: −220.8516 to 250.6594
  - #4 Fan: −140.6571 to 159.1673

  Running the probability distribution method on the test data yielded the following results:
  - #1 Fan: 0.4% of the values were detected as outliers, mainly above the upper bound.
  - #2 Fan: 0.7% of the values were detected as outliers, mainly above the upper bound.
  - #3 Fan: 0.6% of the values were detected as outliers, mainly above the upper bound.
  - #4 Fan: 0.5% of the values were detected as outliers, mainly above the upper bound.

- Vibration: This refers to the vibration in the general engine; it is measured in millimeters per second, and values above 10.2 mm/s are excessive. The values were collected through a sensor in the motor housing. The vibration variable was analyzed to identify anomalous values. The following data present the intervals detected by the model in which the normal operating values lie. Any value outside this range is an anomalous value within the operations of the system. With the developed method, a range of normal values was established, in which 99.73% of cases were found (lower bound to upper bound):
  - #1 Fan: −1.365203 to 4.060265
  - #2 Fan: −1.287130 to 3.253275
  - #3 Fan: −1.384863 to 2.924214
  - #4 Fan: −2.814419 to 6.267012

  Running the probability distribution method on the test data showed the following results:
  - #1 Fan: 0.3% of the values were detected as outliers, mainly above the upper bound.
  - #2 Fan: 0.1% of the values were detected as outliers, mainly above the upper bound.
  - #3 Fan: 0.08% of the values were detected as outliers, mainly above the upper bound.
  - #4 Fan: 0.01% of the values were detected as outliers, mainly above the upper bound.

- Flow: This is a variable relative to the installation, which represents the total flow value of the installation, measured in m${}^{3}$/s. The differential pressure is measured in the installation, and the flow is calculated from that pressure and the measurement section configured in the PLC. From the analysis of the variable total flow, the following interval was obtained, in which the normal operation of the system must lie (lower bound to upper bound):
  - Total Flow: 251.8248 to 1159.6980

  During the execution of the analysis method, the test data showed that a total of 0.9% outlier values were detected, while 98.75% of the data were in the range of normal values, as can be seen in Figure 5.

- Speed: This is the air flow speed, in km/h, which is calculated based on the differential pressure measurement of the installation. From the analysis of the variable speed, the following interval was obtained, in which the normal operation of the system must lie:
  - Speed: −30.94886 to 291.87405

  During the execution of the analysis method, the test data showed that a total of 0.3% outlier values were detected, while 98.70% of the data were in the range of normal values, as can be seen in Figure 6.

- Temperature: This is a measure of the general installation, namely the interior temperature in the tunnel. It is measured in degrees centigrade and consists of values that a cooling device sends to the system. From the analysis of the variable temperature, the following interval was obtained, in which the normal operation of the system must lie:
  - Temperature: 15.90087 to 31.51146

  During the execution of the analysis method, the test data showed that a total of 0.3% outlier values were detected, while 99.70% of the data were in the range of normal values, as can be seen in Figure 7.
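The way such an interval of normal operation and the corresponding outlier percentage can be obtained is sketched below. This is an illustration under assumptions, not the exact procedure of the paper: the variable is assumed to follow a Normal distribution, the interval is centred on the training mean, and the coverage is a parameter (99.73% corresponds to the mean plus or minus three standard deviations). The function names and the toy data are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def normal_interval(train, coverage=0.9973):
    """Interval that contains `coverage` of a Normal fit to the training data."""
    mu = fmean(train)
    sigma = pstdev(train)
    # Two-sided quantile: coverage 0.9973 gives z very close to 3.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mu - z * sigma, mu + z * sigma


def outlier_rate(test, interval):
    """Fraction of test values that fall outside the normal interval."""
    low, high = interval
    flagged = [x for x in test if x < low or x > high]
    return len(flagged) / len(test)


# Toy temperature readings (degrees centigrade), not data from the paper.
train = [20.0, 21.0, 22.0, 23.0, 24.0]
interval = normal_interval(train)
rate = outlier_rate([22.0, 30.0, 23.0, 10.0], interval)
```

Changing `coverage` widens or narrows the interval, which is one way to account for the different percentages of cases (99.73% and 99.99%) quoted for different variables.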

#### 4.2. Intervals of Correct Operation and Anomalies Observed in Variables Involved in Wind Tunnel Performance Applying the New Mean of Distances to Local Transilience Outlier Identification Method (LTO)

- Speed: This variable belongs to the data collected from the fan frequency drivers and records the instantaneous speed of motor rotation in Hertz, a value that is returned by the driver itself. The data obtained in the analysis allowed us to identify the intervals in which the normal operating values lie. Any value outside this range is an anomalous value within the operations of the system. With the developed method, a range of normal values was established, in which 99.99% of cases were found (lower bound to upper bound):
  - #1 Driver: −35.60271 to 35.61316
  - #2 Driver: −35.70871 to 35.71897
  - #3 Driver: −35.16069 to 35.17099
  - #4 Driver: −35.19508 to 35.20562

  Running the method based on the new mean of distances to LTO outlier identification on the test data showed the following results:
  - #1 Driver: 0.1% of the values were detected as outliers, mainly above the upper bound.
  - #2 Driver: 0.2% of the values were detected as outliers, mainly above the upper bound.
  - #3 Driver: 0.2% of the values were detected as outliers, mainly above the upper bound.
  - #4 Driver: 0.1% of the values were detected as outliers, mainly above the upper bound.

  Figure 8 shows the outliers found during the analysis of the variable. Those values outside the normal range (and, thus, identified as outliers) can be seen in the graph.

- Temperature IGBT: This is the temperature of the motor output IGBTs, which are a very critical element in the system. The temperature is measured in degrees centigrade and is a value returned by the drive. The data obtained in the analysis allowed us to identify the intervals in which the normal operating values lie. Any value outside this range is an anomalous value within the operations of the system. With the developed method, a range of normal values was established, in which 99.99% of cases were found (lower bound to upper bound):
  - #1 Driver: −16.13426 to 16.13375
  - #2 Driver: −16.21591 to 16.21481
  - #3 Driver: −16.27438 to 16.27396
  - #4 Driver: −14.84254 to 14.84203

  Running the method based on the new mean of distances to LTO outlier identification on the test data showed the following results:
  - #1 Driver: 0.2% of the values were detected as outliers, mainly above the upper bound.
  - #2 Driver: 0.3% of the values were detected as outliers, mainly above the upper bound.
  - #3 Driver: 0.3% of the values were detected as outliers, mainly above the upper bound.
  - #4 Driver: 0.3% of the values were detected as outliers, mainly above the upper bound.

  Figure 9 shows the outliers found during the analysis of the variable. Those values outside the normal range (and, thus, identified as outliers) can be seen in the graph.

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Zhu, W. Models for wind tunnel tests based on additive manufacturing technology. Prog. Aerosp. Sci. **2019**, 110.
2. Ruchała, P.; Placek, R.; Stryczniewicz, W.; Matyszewski, J.; Cieśliński, D.; Bartkowiak, B. Wind Tunnel Tests of Influence of Boosters and Fins on Aerodynamic Characteristics of the Experimental Rocket Platform. Trans. Aerosp. Res. **2019**, 2017.
3. Bayati, I.; Belloli, M.; Bernini, L.; Zasso, A. Aerodynamic design methodology for wind tunnel tests of wind turbine rotors. J. Wind Eng. Ind. Aerodyn. **2017**, 167.
4. Blocken, B.; Stathopoulos, T.; van Beeck, J.P. Pedestrian-level wind conditions around buildings: Review of wind-tunnel and CFD techniques and their accuracy for wind comfort assessment. Build. Environ. **2016**, 100.
5. Eidenberger, H.; Mossel, A. Indoor skydiving in immersive virtual reality with embedded storytelling. In Proceedings of the 21st ACM Symposium on Virtual Reality Software and Technology, Beijing, China, 13–15 November 2015.
6. Anh, D.T.; Karol, D.; Katarzyna, S. The Predictive Maintenance Concept in the Maintenance Department of the “Industry 4.0”. Prod. Enterpris. Found. Manag. **2018**, 10, 283–292.
7. Ala’raj, M.; Majdalawieh, M.; Abbod, M.F. Improving binary classification using filtering based on k-NN proximity graphs. J. Big Data **2020**.
8. Cuadrado-Gallego, J.J.; Demchenko, Y. Data Science Body of Knowledge. In The Data Science Framework: A View from the EDISON Project; Cuadrado-Gallego, J.J., Demchenko, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 43–73.
9. Hawkins, D.M. Identification of Outliers; Chapman and Hall: London, UK, 1980; Volume 11.
10. Chandola, V. Anomaly detection: A survey. ACM Comput. Surv. CSUR **2009**, 41, 1–58.
11. Gupta, M. Outlier detection for temporal data: A survey. IEEE Trans. Knowl. Data Eng. **2013**, 26, 2250–2267.
12. Martí, L.; Sanchez-Pi, N.; Molina, J.; Garcia, A. Anomaly detection based on sensor data in petroleum industry applications. Sensors **2015**, 15, 2774–2797.
13. Ang, J.C.; Andri, M.; Habibollah, H.; Haza Nuzly, A.H. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans. Comput. Biol. Bioinform. **2015**, 13, 971–989.
14. Cuadrado, G.; Demchenko, Y. (Eds.) The Data Science Framework: A View from the EDISON Project; Springer: Berlin/Heidelberg, Germany, 2020.
15. Barnett, V.; Lewis, T. Outliers in Statistical Data; Wiley: New York, NY, USA, 1978.
16. Angiulli, F.; Fassetti, F. Detecting distance-based outliers in streams of data. In Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, Lisboa, Portugal, 11 June 2007; pp. 811–820.
17. Yang, D.; Ward, M.O. Neighbor-based pattern detection for windows over streaming data. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, Saint Petersburg, Russia, 24–26 March 2009; pp. 529–540.
18. Cao, H.; Zhou, Y.; Chen, G. Attribute outlier detection over data streams. In Proceedings of the 15th International Conference DASFAA Part II, Tsukuba, Japan, 1–4 April 2010; pp. 216–230.
19. Breunig, M.M.; Kriegel, H.-P.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16 May 2000; pp. 93–104.
20. Pincombe, B. Anomaly detection in time series of graphs using ARMA processes. ASOR Bull. **2005**, 24, 2–10.
21. Gogoi, P.; Bhattacharyya, D.K. Anomaly detection analysis of intrusion data using supervised and unsupervised approach. J. Conver. Inf. Technol. **2010**, 5, 95–110.
22. Wang, H.; Bah, M.J.; Hammad, M. Progress in Outlier Detection Techniques: A Survey. IEEE Access **2019**, 7.
23. Safaei, M.; Asadi, S.; Driss, M.; Boulila, W.; Alsaeedi, A.; Chizari, H.; Abdullah, R.; Safaei, M. A systematic literature review on outlier detection in wireless sensor networks. Symmetry **2020**, 12, 328.
24. Basu, S.; Meckesheimer, M. Automatic outlier detection for time series: An application to sensor data. Knowl. Inform. Syst. **2007**, 11, 137–154.

**Figure 8.** Example of the instantaneous speed of motor rotation outliers detected in the testing data.

**Figure 9.** Example of the Insulated-Gate Bipolar Transistor (IGBT) temperature outliers detected in the testing data.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Quesada, E.; Cuadrado-Gallego, J.J.; Patricio, M.Á.; Usero, L.
Outlier Detection Transilience-Probabilistic Model for Wind Tunnels Based on Sensor Data. *Sensors* **2021**, *21*, 2532.
https://doi.org/10.3390/s21072532
