# Anomaly Detection on Data Streams for Smart Agriculture

## Abstract

## 1. Introduction

1. we present a detailed state of the art on anomaly-detection techniques with a focus on smart agriculture;
2. we propose a robust ensemble-based methodology for detecting anomalies in data streams in a smart agriculture context;
3. we apply the proposed technique to a data stream of combine-harvester GPS logs with the aim of identifying anomalies that impact the harvest efficiency of farm machinery; and
4. we apply the proposed technique to crop data with the aim of identifying anomalies that reveal the state of the crop during harvest.

## 2. Materials and Methods

#### 2.1. Data Preprocessing and Transformation

#### 2.1.1. Scenario A: Combine Harvester GPS Logs

**Definition 1.**

- ${\mathrm{c}}_{\mathrm{id}}$ is the combine harvester identifier,
- t is the timestamp of the GPS log,
- x is the longitude of the combine ${\mathrm{c}}_{\mathrm{k}}$ at time t,
- y is the latitude of ${\mathrm{c}}_{\mathrm{k}}$ at time t,
- s is the speed of ${\mathrm{c}}_{\mathrm{k}}$ at time t in miles/hour,
- b is the bearing of ${\mathrm{c}}_{\mathrm{k}}$ at time t in degrees,
- a is the accuracy of the captured GPS location of ${\mathrm{c}}_{\mathrm{k}}$ at time t.
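The fields listed above can be sketched as a small record type. This is an illustrative sketch only: the class name `GpsLog`, the field types, and the range checks (longitude, latitude, bearing in [0, 360), non-negative speed) are assumptions that are not stated in the definition, and the unit of the accuracy field is not given in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GpsLog:
    """One GPS log record; field names follow the symbols in Definition 1."""
    c_id: str     # combine harvester identifier
    t: datetime   # timestamp of the GPS log
    x: float      # longitude in degrees
    y: float      # latitude in degrees
    s: float      # speed in miles/hour
    b: float      # bearing in degrees
    a: float      # accuracy of the captured GPS location (unit not stated)

    def __post_init__(self):
        # The range checks below are assumptions added for illustration.
        if not -180.0 <= self.x <= 180.0:
            raise ValueError(f"longitude out of range: {self.x}")
        if not -90.0 <= self.y <= 90.0:
            raise ValueError(f"latitude out of range: {self.y}")
        if self.s < 0.0:
            raise ValueError(f"speed must be non-negative: {self.s}")
        if not 0.0 <= self.b < 360.0:
            raise ValueError(f"bearing out of range: {self.b}")


# Hypothetical record; the values are made up for the example.
example = GpsLog(
    c_id="c1",
    t=datetime(2021, 9, 23, 12, 0, tzinfo=timezone.utc),
    x=-86.9, y=40.4, s=3.5, b=90.0, a=5.0,
)
```

Rejecting out-of-range values at construction time keeps malformed logs out of later preprocessing steps; whether such records should instead be kept and flagged as anomalies is a design choice the sketch does not settle.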

**Definition 2.**

#### 2.1.2. Scenario B: Crop Dataset

#### 2.2. Proposed Approach

#### 2.2.1. Enhanced LSCP Algorithm (ELSCP)

1. Using a ball tree nearest-neighbour algorithm with the Haversine distance metric, the local region is defined as the set of nearest training points in randomly sampled feature subspaces that occur more frequently than a defined threshold over multiple iterations.
2. Using the local region, a local pseudo-ground truth is defined, and the Kendall correlation is calculated between each base detector’s training outlier scores and the pseudo-ground truth.
3. A histogram is built from the Kendall correlation scores, and the detectors in the largest bin are selected as competent base detectors for the given test instance.
4. Using the correlation scores, the best detector is selected. The final score for the test instance is computed as the average of the best detector’s local region scores.
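The steps above can be sketched in plain Python under several simplifying assumptions. A brute-force Haversine search stands in for the ball tree, the random-subspace voting and the histogram step are omitted, and the pseudo-ground truth is taken to be the mean score across detectors, which the text does not specify. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def local_region(train_points, test_point, k):
    """Indices of the k training points nearest to test_point (brute force)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: haversine_km(train_points[i], test_point))
    return order[:k]


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(a)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        s += (prod > 0) - (prod < 0)   # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


def select_and_score(train_scores, region):
    """train_scores[d][i] is the outlier score of detector d for training point i."""
    local = [[scores[i] for i in region] for scores in train_scores]
    # Assumption: pseudo-ground truth = mean score across detectors per point.
    pseudo = [sum(col) / len(col) for col in zip(*local)]
    taus = [kendall_tau(s, pseudo) for s in local]
    best = max(range(len(taus)), key=taus.__getitem__)
    # Final score: average of the best detector's local region scores.
    return best, sum(local[best]) / len(local[best]), taus


# Toy data: four nearby training points and one far away (made-up values).
train_points = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.0), (0.001, 0.001), (1.0, 1.0)]
train_scores = [
    [1, 2, 3, 4, 9],   # detector 0
    [4, 3, 2, 1, 9],   # detector 1
    [1, 2, 4, 3, 9],   # detector 2
]
region = local_region(train_points, (0.0005, 0.0005), k=4)
best, final_score, taus = select_and_score(train_scores, region)
```

On this toy data the far-away point is excluded from the local region, and the detector whose local ranking agrees most closely with the pseudo-ground truth is chosen. A production version would replace the brute-force search with a spatial index and add the subspace sampling and histogram-based competence filter described in steps 1 and 3.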

#### 2.2.2. Performance Indicators

- True positive (TP): anomalies that are correctly identified as anomalies.
- False positive (FP): normal data that are incorrectly identified as anomalies.
- True negative (TN): normal data that are correctly identified as normal.
- False negative (FN): anomalies that are incorrectly identified as normal.
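As an illustration of how these four counts feed the derived indicators listed in the abbreviations (precision P, recall R, which equals the true positive rate TPR, false-positive rate FPR, and the F1 score), the following minimal sketch uses the standard definitions. The function names, the 1 = anomaly / 0 = normal label encoding, and the choice to return 0.0 when a denominator is zero are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, with 1 = anomaly and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (a convention assumed here)."""
    return num / den if den else 0.0


def indicators(tp, fp, tn, fn):
    """Precision, recall (TPR), false-positive rate, and F1 from the counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    fpr = safe_div(fp, fp + tn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}


# Made-up labels: three true anomalies, one missed, one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
result = indicators(*counts)
```

Because anomalies are usually rare, precision and recall tend to be more informative here than accuracy, which a detector can inflate simply by labelling everything as normal.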

## 3. Results

#### 3.1. Scenario A: Combine Harvester GPS Data

#### 3.2. Scenario B: Crop Damage

## 4. Conclusions

#### Limitations and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
ANNs | Artificial neural networks |
AUC-ROC | Area under the curve of the receiver operating characteristic |
AUCPR | Area under the curve of precision-recall |
ARIMA | Autoregressive integrated moving average model |
CBLOF | Clustering-based local outlier factor |
COPOD | Copula-based outlier detector |
DBSCAN | Density-based spatial clustering of applications with noise |
ELSCP | Enhanced locally selective combination in parallel outlier ensembles |
FP | False positive |
FPR | False-positive rate |
FN | False negative |
GPS | Global positioning system |
GPU | Graphics processing unit |
HBOS | Histogram-based outlier score |
IQR | Interquartile range |
kNN | k-nearest neighbours detector |
LOF | Local outlier factor |
LODA | Lightweight online detector of anomalies |
LSTM | Long short-term memory |
LSCP | Locally selective combination in parallel outlier ensembles |
MCD | Minimum covariance determinant |
OCSVM | One-class support vector machines |
P | Precision |
PyOD | Python outlier detection |
QGIS | Quantum geographic information system |
R | Recall |
SAR | Synthetic aperture radar |
SVM | Support vector machine |
TP | True positive |
TPR | True positive rate |
TN | True negative |

## References

1. Allahyari, M.S.; Damalas, C.A.; Ebadattalab, M. Farmers’ technical knowledge about integrated pest management (IPM) in olive production. Agriculture **2017**, 7, 101.
2. Fargnoli, M.; Lombardi, M.; Puri, D. Applying hierarchical task analysis to depict human safety errors during pesticide use in vineyard cultivation. Agriculture **2019**, 9, 158.
3. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) **2009**, 41, 1–58.
4. Ou, C.H.; Chen, Y.A.; Huang, T.W.; Huang, N.F. Design and Implementation of Anomaly Condition Detection in Agricultural IoT Platform System. In Proceedings of the 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 7–10 January 2020; pp. 184–189.
5. Christiansen, P.; Nielsen, L.N.; Steen, K.A.; Jørgensen, R.N.; Karstoft, H. DeepAnomaly: Combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field. Sensors **2016**, 16, 1904.
6. Xu, J.; Guga, S.; Rong, G.; Riao, D.; Liu, X.; Li, K.; Zhang, J. Estimation of Frost Hazard for Tea Tree in Zhejiang Province Based on Machine Learning. Agriculture **2021**, 11, 607.
7. Abdallah, M.; Lee, W.J.; Raghunathan, N.; Mousoulis, C.; Sutherland, J.W.; Bagchi, S. Anomaly Detection through Transfer Learning in Agriculture and Manufacturing IoT Systems. arXiv **2021**, arXiv:2102.05814.
8. Mouret, F.; Albughdadi, M.; Duthoit, S.; Kouamé, D.; Rieu, G.; Tourneret, J.Y. Outlier detection at the parcel-level in wheat and rapeseed crops using multispectral and SAR time series. Remote Sens. **2021**, 13, 956.
9. Blackmore, S. The interpretation of trends from multiple yield maps. Comput. Electron. Agric. **2000**, 26, 37–51.
10. Matheron, G. Principles of geostatistics. Econ. Geol. **1963**, 58, 1246–1266.
11. Blackmore, S.; Godwin, R.J.; Fountas, S. The analysis of spatial and temporal trends in yield map data over six years. Biosyst. Eng. **2003**, 84, 455–466.
12. Ehsani, R. Increasing field efficiency of farm machinery using GPS. EDIS. 2010, 2010. Available online: https://journals.flvc.org/edis/article/view/118721 (accessed on 23 September 2021).
13. Wang, Y.; Balmos, A.; Krogmeier, J.; Buckmaster, D. Data-Driven Agricultural Machinery Activity Anomaly Detection and Classification. In Proceedings of the 14th International Conference on Precision Agriculture, Montreal, Quebec, Canada, 24–27 June 2018.
14. Ramaswamy, S.; Rastogi, R.; Shim, K. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 427–438.
15. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 93–104.
16. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. **2001**, 13, 1443–1471.
17. Zhao, Y.; Nasrullah, Z.; Hryniewicki, M.K.; Li, Z. LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining, SIAM, Calgary, Alberta, Canada, 2–4 May 2019; pp. 585–593.
18. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
19. Pevný, T. Loda: Lightweight on-line detector of anomalies. Mach. Learn. **2016**, 102, 275–304.
20. Goldstein, M.; Dengel, A. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In KI-2012: Poster and Demo Track; 2012. Available online: https://www.goldiges.de/publications/HBOS-KI-2012.pdf (accessed on 23 September 2021).
21. He, Z.; Xu, X.; Deng, S. Discovering cluster-based local outliers. Pattern Recognit. Lett. **2003**, 24, 1641–1650.
22. Li, Z.; Zhao, Y.; Botta, N.; Ionescu, C.; Hu, X. COPOD: copula-based outlier detection. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 1118–1123.
23. Zimek, A.; Campello, R.J.; Sander, J. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM Sigkdd Explor. Newsl. **2014**, 15, 11–22.
24. Britto Jr, A.S.; Sabourin, R.; Oliveira, L.E. Dynamic selection of classifiers—A comprehensive review. Pattern Recognit. **2014**, 47, 3665–3680.
25. Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. **2006**, 6, 21–45.
26. Ho, T.K.; Hull, J.J.; Srihari, S.N. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. **1994**, 16, 66–75.
27. Woods, K.; Kegelmeyer, W.P.; Bowyer, K. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Mach. Intell. **1997**, 19, 405–410.
28. Zhang, Y.; Krogmeier, J. Combine Kart Truck GPS Data Archive. Purdue University Research Repository. 2020. Available online: https://purr.purdue.edu/publications/3083/2 (accessed on 23 September 2021).
29. Zhang, Y.; Balmos, A.; Krogmeier, J.V.; Buckmaster, D. Working zone identification for specialized micro transportation systems using GPS tracks. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Canary Islands, Spain, 15–18 September 2015; pp. 1779–1784.
30. Koninti, S.K. AV JanataHack: Machine Learning in Agriculture; Analytics Vidhya, 2020. Available online: https://datahack.analyticsvidhya.com/contest/janatahack-machine-learning-in-agriculture/#DiscussTab (accessed on 23 September 2021).
31. Van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate imputation by chained equations in R. J. Stat. Softw. **2011**, 45, 1–67.
32. Aggarwal, C.C.; Sathe, S. Theoretical foundations and algorithms for outlier ensembles. ACM Sigkdd Explor. Newsl. **2015**, 17, 24–47.
33. Aggarwal, C.C. Outlier analysis. In Data Mining; Springer: Berlin, Germany, 2015; pp. 237–263.
34. Rousseeuw, P.J.; Hubert, M. Anomaly detection by robust statistics. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. **2018**, 8, e1236.
35. Rousseeuw, P.J.; Driessen, K.V. A fast algorithm for the minimum covariance determinant estimator. Technometrics **1999**, 41, 212–223.
36. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology **1982**, 143, 29–36.
37. Boyd, K.; Eng, K.H.; Page, C.D. Area under the precision-recall curve: Point estimates and confidence intervals. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin, Germany, 2013; pp. 451–466.
38. Campos, G.O.; Zimek, A.; Sander, J.; Campello, R.J.; Micenková, B.; Schubert, E.; Assent, I.; Houle, M.E. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. **2016**, 30, 891–927.
39. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. **2006**, 27, 861–874.
40. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE **2015**, 10, e0118432.
41. Zhao, Y.; Nasrullah, Z.; Li, Z. PyOD: A Python Toolbox for Scalable Outlier Detection. J. Mach. Learn. Res. **2019**, 20, 1–7.
42. Mandrekar, J.N. Receiver operating characteristic curve in diagnostic test assessment. J. Thorac. Oncol. **2010**, 5, 1315–1316.
43. Wang, B.; Mao, Z. Outlier detection based on a dynamic ensemble model: Applied to process monitoring. Inf. Fusion **2019**, 51, 244–258.
44. Hajebi, K.; Abbasi-Yadkori, Y.; Shahbazi, H.; Zhang, H. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Centre, Convencions Internacional Barcelona, 19–22 July 2011.
45. Cruz, R.M.; Sabourin, R.; Cavalcanti, G.D. Dynamic classifier selection: Recent advances and perspectives. Inf. Fusion **2018**, 41, 195–216.
46. Rayana, S.; Akoglu, L. Less is more: Building selective anomaly ensembles. ACM Trans. Knowl. Discov. Data (TKDD) **2016**, 10, 1–33.

**Figure 1.** Movement trajectory of a tractor in a citrus grove [12].

**Figure 2.** Field of interest: trajectory of a combine harvester showing normal points in green and anomalies in red.

**Figure 8.** LSCP flowchart. Steps requiring recomputation highlighted in yellow; cached steps in grey [17].

Column Name | Description |
---|---|
Id | Unique ID |
Estimated_Insects_Count | Estimated insects count per square meter |
Crop_Type | Category of crop (0, 1) |
Soil_Type | Category of soil (0, 1) |
Pesticide_Use_Category | Type of pesticides used (1, never; 2, previously used; 3, currently using) |
Number_Doses_Week | Number of doses per week |
Number_Weeks_Used | Number of weeks used |
Number_Weeks_Quit | Number of weeks pesticide not used |
Season | Season category (1, 2, 3) |
Crop_Damage | Crop damage category (0 = alive, 1 = damage due to other causes, 2 = damage due to pesticides) |

Model | AUC-ROC | AUCPR | F1 Score |
---|---|---|---|
ELSCP | 0.998 | 0.972 | 0.921 |
OCSVM | 0.897 | 0.385 | 0.167 |
LODA | 0.913 | 0.215 | 0.078 |
COPOD | 0.934 | 0.173 | 0.228 |
CBLOF | 0.756 | 0.038 | 0.014 |
LSCP | 0.533 | 0.022 | 0.032 |

Model | AUC-ROC | AUCPR | F1 Score |
---|---|---|---|
ELSCP | 0.641 | 0.277 | 0.343 |
OCSVM | 0.595 | 0.253 | 0.291 |
LODA | 0.580 | 0.200 | 0.122 |
COPOD | 0.675 | 0.297 | 0.282 |
CBLOF | 0.636 | 0.226 | 0.212 |
LSCP | 0.452 | 0.169 | 0.135 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Moso, J.C.; Cormier, S.; de Runz, C.; Fouchal, H.; Wandeto, J.M.
Anomaly Detection on Data Streams for Smart Agriculture. *Agriculture* **2021**, *11*, 1083.
https://doi.org/10.3390/agriculture11111083
