Industrial-AdaVAD: Adaptive Industrial Video Anomaly Detection Empowered by Edge Intelligence
Abstract
1. Introduction
- How to implement lightweight adaptive anomaly detection on resource-constrained equipment. Most anomalies in the industrial production process are sudden, so they need to be detected as quickly as possible. Existing methods consider only detection accuracy and ignore detection efficiency in practical applications: because the models have a large number of parameters, they place a high computing load on the edge computing device and lead to long detection delays. Therefore, it is necessary to explore lightweight, low-latency, and adaptive anomaly detection methods for industrial production scenarios.
- How to realize transferable anomaly detection between complex industrial scenes. Industrial production video scenes are more complex and changeable than ordinary surveillance videos. The same abnormal behavior may occur in different industrial scenes, but because the video backgrounds differ, the data distribution of abnormal events also changes, and a model trained in a single scenario may not perform well in other scenarios. Therefore, it is necessary to explore methods that improve the generalization ability of the model in unknown scenarios and thereby its adaptability to the scene.
- We propose a lightweight, feature-adaptive method for video anomaly detection, centered on an enhanced feature extraction module. This module employs residual networks and channel attention mechanisms to minimize the parameter count and adopts dense connections to enable efficient output pruning according to different parameter volumes. Furthermore, we propose a method for selecting optimized pruned models through a multidimensional evaluation to maintain a balance between computational efficiency and accuracy. Although there is still room for improvement, the current balance of accuracy and latency meets the basic requirements for real-time industrial video anomaly detection in edge environments.
- In the scene adaptation module, we propose a scene-adaptive transferable video anomaly detection method to address the poor generalization ability of existing video anomaly detection methods in industrial production scenarios. This method applies multi-layer adversarial domain adaptation at different feature levels to preserve the accuracy of industrial video anomaly detection across scenes.
- We constructed a dataset based on coal mine industrial video surveillance data. Experimental results show that our model achieves low-latency detection on edge devices and can perform cross-scenario detection on the dataset, which indicates that the proposed method can be deployed on edge devices and can effectively detect abnormal events in industrial production videos.
2. Related Works
2.1. Video Anomaly Detection
2.2. Domain Adaptation
2.3. Edge Intelligence
3. Methodology
3.1. Problem Definition
3.2. Framework
- The video preprocessing module processes industrial production videos according to their characteristics to obtain the video frames that are actually fed to the model.
- The lightweight adaptive feature extractor can be deployed on resource-constrained edge devices to adaptively extract video frame features based on computing resources and video scene complexity.
- The anomaly detector reconstructs the input frame and determines whether it is abnormal by calculating the difference between the reconstructed frame and the actual frame.
- The scene adaptation module takes source scene and target scene frames as input and computes an adversarial loss with the adversarial domain adaptation method to achieve unsupervised industrial scene adaptation.
- The model evaluation module evaluates different pruned models in terms of FLOPs, energy consumption, and detection accuracy in order to select the appropriate pruned model for each operating environment.
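
The anomaly detector's scoring step can be sketched as follows. This is a minimal illustration, assuming frames are NumPy arrays scaled to [0, 1] and that PSNR is the reconstruction-difference measure (PSNR is named as the anomaly score in Algorithm 1). The min-max normalization over a clip and the `threshold` value are assumptions made for the example, not details given in the text.

```python
import numpy as np

def psnr(actual: np.ndarray, reconstructed: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a frame and its reconstruction."""
    diff = actual.astype(np.float64) - reconstructed.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # perfect reconstruction
    return float(10.0 * np.log10(max_value ** 2 / mse))

def normalize_scores(psnr_values) -> np.ndarray:
    """Min-max normalize PSNR over a clip to [0, 1] (assumed convention).

    A low normalized score means a poorly reconstructed frame.
    """
    values = np.asarray(psnr_values, dtype=np.float64)
    span = values.max() - values.min()
    if span == 0.0:
        return np.ones_like(values)
    return (values - values.min()) / span

def is_abnormal(normalized_score: float, threshold: float = 0.5) -> bool:
    """Flag a frame whose normalized score falls below a hypothetical threshold."""
    return bool(normalized_score < threshold)
```

A frame that the model reconstructs poorly has a large mean squared error and therefore a low PSNR, which is why low scores are treated as abnormal in this sketch.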
3.2.1. Video Preprocessing
3.2.2. Adaptive Lightweight Feature Extraction
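
The feature extractor is described as combining residual networks with channel attention. The text here does not specify which attention variant is used, so the sketch below assumes a squeeze-and-excitation style block; the function names, the weight shapes, and the reduction ratio are all hypothetical and serve only to show the idea of reweighting channels and adding a residual connection.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features: np.ndarray,
                      w_reduce: np.ndarray,
                      w_expand: np.ndarray) -> np.ndarray:
    """Reweight the channels of a (C, H, W) feature map.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r),
    where r is an assumed reduction ratio.
    """
    squeezed = features.mean(axis=(1, 2))           # global average pool -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)   # bottleneck + ReLU -> (C // r,)
    weights = sigmoid(w_expand @ hidden)            # per-channel gate in (0, 1)
    return features * weights[:, None, None]

def residual_attention_block(features: np.ndarray,
                             w_reduce: np.ndarray,
                             w_expand: np.ndarray) -> np.ndarray:
    """Add the attended features back onto the input (residual connection)."""
    return features + channel_attention(features, w_reduce, w_expand)
```

The gate adds only two small matrices per block, which is one reason this kind of attention is often paired with lightweight backbones.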
3.2.3. Scene-Adaptive Transferable Video Anomaly Detection
Algorithm 1 Scene-adaptive algorithm.
Input: Source scene frame, Target scene frame
Output: Anomaly score (PSNR) of target scene frame
1: for I in source video, target video do
2:   Extract video frame features with the feature extractor
3:   Reconstruct video frame
4:   Update scene classifier C
5:   for each i in adaptive levels do
6:     if then
7:       Calculate the reconstruction loss by Equation (7)
8:     end if
9:     if then
10:       Calculate the adversarial loss by Equation (8)
11:     end if
12:   end for
13:   Calculate total loss by Equation (9)
14:   Update feature extractor and reconstructor by Equation (10)
15: end for
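
The loss accumulation in steps 5 to 13 of Algorithm 1 can be sketched as below. Equations (7) to (9) and the two `if` conditions are not reproduced in this text, so everything specific here is an assumption: mean squared error stands in for the reconstruction loss, binary cross-entropy on the scene classifier output stands in for the adversarial loss, the total is a weighted sum with a hypothetical `adv_weight`, and the conditions are passed in as per-level flags rather than guessed.

```python
import numpy as np

def reconstruction_loss(frame: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error, used here as a stand-in for Equation (7)."""
    return float(np.mean((frame - reconstructed) ** 2))

def adversarial_loss(scene_prob: float, is_target: bool) -> float:
    """Binary cross-entropy on P(target scene), a stand-in for Equation (8)."""
    p = float(np.clip(scene_prob, 1e-7, 1.0 - 1e-7))
    return float(-np.log(p) if is_target else -np.log(1.0 - p))

def total_loss(frame, reconstructed, scene_probs, is_target,
               use_recon, use_adv, adv_weight: float = 0.1) -> float:
    """Accumulate the per-level terms into one scalar (stand-in for Equation (9)).

    scene_probs: scene classifier output at each adaptive level.
    use_recon / use_adv: per-level booleans supplied by the caller in place
    of the two conditions in Algorithm 1, which are not shown in the text.
    """
    loss = 0.0
    for prob, recon_on, adv_on in zip(scene_probs, use_recon, use_adv):
        if recon_on:
            loss += reconstruction_loss(frame, reconstructed)
        if adv_on:
            loss += adv_weight * adversarial_loss(prob, is_target)
    return float(loss)
```

In an actual training loop this scalar would drive the update of the feature extractor and reconstructor in step 14, while the scene classifier is updated separately in step 4.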
3.2.4. Multidimensional Evaluation Pruned Model Selection Method
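
The model evaluation module is described as comparing pruned models on FLOPs, energy consumption, and detection accuracy. A minimal sketch of such a selection step is given below, assuming min-max normalization of each criterion and a fixed weighted sum. AHP appears in the abbreviation list, but the weighting procedure is not shown in this text, so the `weights` tuple and the `PrunedModel` fields are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class PrunedModel:
    name: str
    flops: float        # lower is better
    energy_mah: float   # lower is better
    auc: float          # higher is better

def _normalize(values, higher_is_better: bool):
    """Min-max scale to [0, 1] so that 1 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def select_model(candidates, weights=(0.3, 0.3, 0.4)):
    """Return the best candidate and all scores under assumed weights.

    weights = (FLOPs weight, energy weight, accuracy weight).
    """
    flops = _normalize([m.flops for m in candidates], higher_is_better=False)
    energy = _normalize([m.energy_mah for m in candidates], higher_is_better=False)
    auc = _normalize([m.auc for m in candidates], higher_is_better=True)
    w_f, w_e, w_a = weights
    scores = [w_f * f + w_e * e + w_a * a for f, e, a in zip(flops, energy, auc)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores
```

Changing the weights models different operating environments: an energy-constrained device would raise the first two weights, while a mains-powered device could put most of the weight on accuracy.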
4. Experiments
4.1. Experimental Settings
4.1.1. Datasets
- The UCSD Ped dataset [32] contains two sub-datasets, Ped1 and Ped2, both recorded by a stationary camera overlooking pedestrian walkways: in Ped1 pedestrians walk toward and away from the camera, while in Ped2 they move parallel to the camera plane. Crowd density ranges from sparse to dense, and the abnormal events include unexpected behaviors such as walking on the road, walking on the grass, and vehicle movement on the sidewalk. Ped1 consists of 34 training and 36 testing video samples, and Ped2 consists of 16 training and 12 testing video samples.
- The CUHK Avenue dataset [3] includes 16 training videos and 21 test videos. Abnormal events in the dataset include people running, abandoned objects, and people walking with suspicious objects.
4.1.2. Baselines
- MPPCA [33]: Detects abnormal events in videos with a spatio-temporal Markov random field (MRF) model whose nodes correspond to different areas of the video frame.
- MDT [34]: A joint spatio-temporal anomaly detector that integrates temporal and spatial information to detect abnormal behaviors in crowded scenes.
- Unmasking [35]: An unsupervised framework for abnormal event detection based on the unmasking technique. A binary classifier is trained iteratively to distinguish consecutive video sequences, and sequences on which the classifier retains a high training accuracy are treated as abnormal.
- ConvAE [36]: Proposes two autoencoder-based methods. The first learns a fully connected autoencoder on traditional hand-designed spatio-temporal local features. The second builds a fully convolutional feedforward autoencoder in an end-to-end learning framework that captures the regular patterns in the dataset to detect anomalies.
- MemAE [37]: A memory-augmented autoencoder (MemAE) that improves the robustness of autoencoder-based anomaly detection so that anomalies are reconstructed less faithfully than normal data.
- MNAD [38]: Uses a memory module with an update scheme in which the memory items record typical patterns of normal data. This enhances the discriminative power of both the memory items and the deep features learned from normal data, and improves anomaly detection.
- Feature Generalization [22]: Analyzes the feature embeddings of a pre-trained CNN, uses cross-domain generalization metrics to study how well source features generalize to different target video domains, and verifies the practicability of the approach on different video datasets.
- DANN [23]: A DANN-based adaptation method that transfers anomaly detection knowledge in an unsupervised manner. Unsupervised adversarial domain adaptation is used to separate the distributions of normal and abnormal data in the target domain, thereby achieving anomaly detection in new scenarios.
- Finetune [39]: An extensive benchmark that uses 12 different CNN models trained on ImageNet as feature extractors and fine-tunes them on seven video anomaly detection benchmark datasets.
- Meta-Learning [40]: Introduces the problem of few-shot scene-adaptive anomaly detection, which aims to detect anomalies in previously unseen scenes using only a small number of frames, and proposes a meta-learning-based method to cope with the lack of video data from the target scene.
4.1.3. Implementation Details
4.2. Results and Discussion
4.2.1. Performance on Adaptive Lightweight Feature Extraction
4.2.2. Performance on Scene-Adaptive Detection
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AIoT | Artificial Intelligence of Things |
VAD | Video Anomaly Detection |
SVM | Support Vector Machine |
CNN | Convolutional Neural Network |
L-CNN | Lightweight Convolutional Neural Network |
HOG | Histogram of Oriented Gradients |
DNN | Deep Neural Network |
AHP | Analytical Hierarchy Process |
MRF | Markov Random Field |
MemAE | Memory-Augmented Autoencoder |
ROC | Receiver Operating Characteristic |
TPR | True Positive Rate |
FPR | False Positive Rate |
References
- Ma, L.; Dong, J.; Peng, K.; Zhang, C. Hierarchical monitoring and root-cause diagnosis framework for key performance indicator-related multiple faults in process industries. IEEE Trans. Ind. Inform. 2018, 15, 2091–2100.
- Luo, W.; Liu, W.; Gao, S. A revisit of sparse coding based anomaly detection in stacked rnn framework. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 341–349.
- Lu, C.; Shi, J.; Jia, J. Abnormal event detection at 150 fps in matlab. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2720–2727.
- Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249.
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. (CSUR) 2020, 53, 1–34.
- Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
- Patrikar, D.R.; Parate, M.R. Anomaly detection using edge computing in video surveillance system. Int. J. Multimed. Inf. Retr. 2022, 11, 85–110.
- Georgiou, T.; Liu, Y.; Chen, W.; Lew, M. A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision. Int. J. Multimed. Inf. Retr. 2020, 9, 135–170.
- Chen, Y.; Zhou, X.S.; Huang, T.S. One-class SVM for learning in image retrieval. In Proceedings of the 2001 International Conference on Image Processing (Cat. No. 01CH37205), Thessaloniki, Greece, 7–10 October 2001; Volume 1, pp. 34–37.
- Zhao, B.; Li, F.-F.; Xing, E.P. Online detection of unusual events in videos via dynamic sparse coding. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3313–3320.
- Mehran, R.; Oyama, A.; Shah, M. Abnormal crowd behavior detection using social force model. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 935–942.
- Nayak, R.; Pati, U.C.; Das, S.K. A comprehensive review on deep learning-based methods for video anomaly detection. Image Vis. Comput. 2021, 106, 104078.
- Nikouei, S.Y.; Chen, Y.; Song, S.; Xu, R.; Choi, B.Y.; Faughnan, T. Smart surveillance as an edge network service: From harr-cascade, svm to a lightweight cnn. In Proceedings of the 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), Philadelphia, PA, USA, 18–20 October 2018; pp. 256–265.
- Xu, R.; Nikouei, S.Y.; Chen, Y.; Polunchenko, A.; Song, S.; Deng, C.; Faughnan, T.R. Real-time human objects tracking for smart surveillance at the edge. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6.
- Wang, C.; Dong, S.; Zhao, X.; Papanastasiou, G.; Zhang, H.; Yang, G. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT. IEEE Trans. Ind. Inform. 2019, 16, 2667–2676.
- Jiang, T.; Li, Y.; Xie, W.; Du, Q. Discriminative reconstruction constrained generative adversarial network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4666–4679.
- Liu, W.; Luo, W.; Lian, D.; Gao, S. Future frame prediction for anomaly detection – A new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6536–6545.
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35.
- Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176.
- Zhang, Y.; Tang, H.; Jia, K.; Tan, M. Domain-symmetric networks for adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5031–5040.
- Wang, Q.; Michau, G.; Fink, O. Domain adaptive transfer learning for fault diagnosis. In Proceedings of the 2019 Prognostics and System Health Management Conference (PHM-Paris), Paris, France, 2–5 May 2019; pp. 279–285.
- dos Santos, F.P.; Ribeiro, L.S.; Ponti, M.A. Generalization of feature embeddings transferred from different video anomaly detection domains. J. Vis. Commun. Image Represent. 2019, 60, 407–416.
- Fan, C.; Zhang, F.; Liu, P.; Sun, X.; Li, H.; Xiao, T.; Zhao, W.; Tang, X. Importance weighted adversarial discriminative transfer for anomaly detection. arXiv 2021, arXiv:2105.06649.
- Xu, Z.; Li, J.; Zhang, M. A surveillance video real-time analysis system based on edge-cloud and fl-yolo cooperation in coal mine. IEEE Access 2021, 9, 68482–68497.
- Chriki, A.; Touati, H.; Snoussi, H.; Kamoun, F. Deep learning and handcrafted features for one-class anomaly detection in UAV video. Multimed. Tools Appl. 2021, 80, 2599–2620.
- Kim, W.J.; Youn, C.H. Lightweight online profiling-based configuration adaptation for video analytics system in edge computing. IEEE Access 2020, 8, 116881–116899.
- Kim, J.H.; Kim, N.; Won, C.S. Deep edge computing for videos. IEEE Access 2021, 9, 123348–123357.
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In International Workshop on Deep Learning in Medical Image Analysis; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6956–6965.
- Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning convolutional neural networks for resource efficient inference. arXiv 2016, arXiv:1611.06440.
- Liu, S.; Li, X.; Zhou, Z.; Guo, B.; Zhang, M.; Shen, H.; Yu, Z. Adaenlight: Energy-aware low-light video stream enhancement on mobile devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2023, 6, 172.
- Chan, A.B.; Vasconcelos, N. Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 909–926.
- Kim, J.; Grauman, K. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 2921–2928.
- Li, W.; Mahadevan, V.; Vasconcelos, N. Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 18–32.
- Tudor Ionescu, R.; Smeureanu, S.; Alexe, B.; Popescu, M. Unmasking the abnormal events in video. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2895–2903.
- Hasan, M.; Choi, J.; Neumann, J.; Roy-Chowdhury, A.K.; Davis, L.S. Learning temporal regularity in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 733–742.
- Gong, D.; Liu, L.; Le, V.; Saha, B.; Mansour, M.R.; Venkatesh, S.; Hengel, A.v.d. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1705–1714.
- Park, H.; Noh, J.; Ham, B. Learning memory-guided normality for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14372–14381.
- Gutoski, M.; Ribeiro, M.; Hattori, L.T.; Romero, M.; Lazzaretti, A.E.; Lopes, H.S. A comparative study of transfer learning approaches for video anomaly detection. Int. J. Pattern Recognit. Artif. Intell. 2021, 35, 2152003.
- Lu, Y.; Yu, F.; Reddy, M.K.K.; Wang, Y. Few-shot scene-adaptive anomaly detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 125–141.
Dataset | UCSD Ped1 | UCSD Ped2 | Avenue | Coal Mine Video |
---|---|---|---|---|
Training Video Frame | 6766 | 2533 | 15,328 | 12,687 |
Testing Video Frame | 7164 | 1997 | 15,324 | 11,529 |
Method | Training Data | AUC | Energy Consumption [mAh] | Peak Power [W] | Average Power [W] |
---|---|---|---|---|---|
DANN (Based on Unet) | Source | 76.46% | - | - | - |
DANN (Based on Unet) | 100 | 77.47% | 25.60 | 7.648 | 7.579 |
DANN (Based on Unet) | 500 | 81.86% | 126.80 | 7.637 | 7.651 |
DANN (Based on Unet) | 1000 | 84.68% | 257.40 | 7.661 | 7.674 |
Lightweight Feature Extractor | Source | 80.10% | - | - | - |
Lightweight Feature Extractor | 100 | 76.85% | 9.40 | 7.246 | 7.233 |
Lightweight Feature Extractor | 500 | 80.93% | 44.30 | 7.271 | 7.264 |
Lightweight Feature Extractor | 1000 | 84.31% | 83.30 | 7.197 | 7.201 |
Method | Training Data | Training Time, Jetson [min] | Training Time, PC [min] | Inference Time, Jetson [ms/f] | Inference Time, PC [ms/f] |
---|---|---|---|---|---|
DANN (Based on Unet) | Source | - | - | - | - |
DANN (Based on Unet) | 100 | 0.89 | 1.68 | 226 | 95 |
DANN (Based on Unet) | 500 | 4.76 | 3.88 | 246 | 84 |
DANN (Based on Unet) | 1000 | 9.64 | 5.75 | 213 | 87 |
Lightweight Feature Extractor | Source | - | - | - | - |
Lightweight Feature Extractor | 100 | 0.32 | 1.07 | 83 | 23 |
Lightweight Feature Extractor | 500 | 1.61 | 2.11 | 85 | 20 |
Lightweight Feature Extractor | 1000 | 3.19 | 3.46 | 78 | 21 |
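
To make the efficiency comparison in the two tables above easier to read, the short script below computes the relative reduction of the lightweight feature extractor against the Unet-based DANN for the rows with Training Data = 1000. The numbers are copied from the tables; the dictionary keys and the helper name are chosen only for this illustration.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return 1.0 - value / baseline

# Rows with Training Data = 1000, Jetson measurements where applicable.
dann = {"energy_mah": 257.40, "train_min": 9.64, "infer_ms": 213, "auc": 84.68}
light = {"energy_mah": 83.30, "train_min": 3.19, "infer_ms": 78, "auc": 84.31}

for key in ("energy_mah", "train_min", "infer_ms"):
    print(f"{key}: {relative_reduction(dann[key], light[key]):.1%} lower")
print(f"AUC gap: {dann['auc'] - light['auc']:.2f} percentage points")
# energy_mah: 67.6% lower
# train_min: 66.9% lower
# infer_ms: 63.4% lower
# AUC gap: 0.37 percentage points
```

On these rows the lightweight extractor uses roughly one third of the energy, training time, and per-frame inference time on the Jetson while giving up 0.37 percentage points of AUC.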
Adaptation | Method | Ped1 → Ped2 | Ped1 → Coal Mine Video |
---|---|---|---|
No Ada. | MPPCA [33] | 69.30% | - |
No Ada. | MPPCA+SFA [33] | 61.30% | - |
No Ada. | MDT [34] | 82.90% | - |
No Ada. | Unmasking [35] | 82.20% | - |
No Ada. | ConvAE [36] | 81.10% | 78.16% |
No Ada. | MNAD [38] | 81.40% | 76.96% |
No Ada. | MemAE (directly) [37] | 81.83% | 74.24% |
Ada. | Feature Gen [22] | 80.18% | 77.59% |
Ada. | MemAE (finetune) [37] | 82.00% | 78.63% |
Ada. | DANN [23] | 91.90% | 83.89% |
Ada. | Finetune [39] | 89.30% | 81.03% |
Ada. | Meta-Learning [40] | 90.21% | 79.69% |
Ada. | Our Method | 92.03% | 84.31% |
Methods | Cableway | Equipment Center | Track | Control Room | Average AUC |
---|---|---|---|---|---|
Feature Generalization | 81.23% | 79.14% | 85.32% | 80.45% | 81.54% |
MemAE (Finetune) | 83.91% | 84.25% | 87.41% | 82.13% | 84.43% |
DANN | 86.77% | 87.93% | 86.82% | 85.22% | 86.69% |
Our Method | 87.45% | 89.76% | 90.23% | 84.78% | 88.06% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, J.; Shen, H.; Ding, Y.; Guo, B. Industrial-AdaVAD: Adaptive Industrial Video Anomaly Detection Empowered by Edge Intelligence. Mathematics 2025, 13, 2711. https://doi.org/10.3390/math13172711