Swin Transformer-Based Real-Time Multi-Tasking Image Detection in Industrial Automation Production Environments
Abstract
1. Introduction
- Propose MSTUnet, a hybrid model that integrates an anomaly simulation module, a Swin Transformer-based complex atom network, and a discriminative sub-network to address real-time multi-task image detection in industrial environments.
- Demonstrate that the Swin Transformer's shifted-window attention, which establishes connections across windows, outperforms traditional CNN/GAN-based methods in both image defogging (PSNR = 23.49 dB, SSIM = 0.9195) and defect detection (99.21% average recognition rate across six types of PE pipe defects).
- Develop a browser/server (B/S) architecture-based industrial detection system that raises the excellent-product rate by 12.12 percentage points over traditional manual monitoring (average 15.32% versus 3.20%), verifying its practical applicability in production.
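The cross-window claim above rests on the shifted-window scheme of the original Swin Transformer (Liu et al. [23]): self-attention is computed inside non-overlapping M × M windows, and every second block cyclically shifts the feature map by ⌊M/2⌋ before partitioning, so the new windows straddle the previous window boundaries. The pure-Python sketch below illustrates only that partition-and-shift step on a 4 × 4 grid of token indices. It is not the authors' MSTUnet code; the function names and the window size M = 2 are hypothetical choices for illustration.

```python
from typing import List

Grid = List[List[int]]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major window order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size")
    return [
        [row[left:left + m] for row in grid[top:top + m]]
        for top in range(0, h, m)
        for left in range(0, w, m)
    ]


def cyclic_shift(grid: Grid, s: int) -> Grid:
    """Roll the grid up and left by s cells (same effect as torch.roll(x, (-s, -s), dims=(1, 2)))."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


M = 2  # hypothetical window size for this toy grid (Swin's default is 7)
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 map of token ids 0..15
regular = window_partition(tokens, M)                        # windows for W-MSA
shifted = window_partition(cyclic_shift(tokens, M // 2), M)  # windows for SW-MSA

print(regular[0])  # [[0, 1], [4, 5]]  (one regular window)
print(shifted[0])  # [[5, 6], [9, 10]] (one token from each of four regular windows)
```

In the actual model the shifted windows also use an attention mask so that tokens wrapped around from the opposite border do not attend to each other; the sketch omits attention entirely and only shows which tokens end up sharing a window.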
2. Method
2.1. Real-Time Multi-Task Image Detection Model Construction
2.1.1. Anomaly Generation and Mask Generation Modules
2.1.2. Swin Transformer-Based Complex Atom Networks
2.1.3. Discriminative Sub-Networks
2.1.4. Loss Function
2.2. Industrial Automation Production Anomaly Detection System Design
- (1) In the display layer, users access the system through a browser. The front-end UI framework Vuetify 3 provides an interactive, user-friendly interface, and the front-end framework Vue 3 enables declarative, reactive design, which simplifies development, reduces code complexity, and makes the interaction logic between the front end and the back end more efficient and flexible. The Axios library encapsulates HTTP request methods such as GET and POST, enabling efficient asynchronous communication between the browser and the server.
- (2) In the business layer, functional modules such as user rights, image detection, video detection, and record management are designed. The user rights module provides user registration and login services. The image detection module lets users upload the images to be inspected and runs image defect detection on them, enabling fast and accurate detection of industrial product defects.
- (3) In the basic layer, the deep learning framework PyTorch serves as the computing framework to simplify the development of the detection algorithms, while CUDA and cuDNN drive and accelerate GPU computation. The back end adopts Flask 3, a web framework running on Python 3.8, whose lightweight and extensible design meets the system's need for convenient development and flexible customization. Redis serves as a lightweight message queue to support high concurrency, and Celery performs asynchronous task processing to improve the system's response speed.
- (4) In the data layer, the relational database MySQL serves as the underlying database, ensuring data security, stability, and ease of use. Together, these layers yield a high-performance, easy-to-use, and scalable real-time inspection system for industrial product images.
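The business layer's user registration and login services are not specified in further detail. Purely as an illustration, the sketch below shows one common way such a user-rights module could store credentials: a random per-user salt and a PBKDF2-HMAC-SHA256 hash from Python's standard `hashlib`, checked in constant time with `hmac.compare_digest`. The in-memory dictionary stands in for a database table, and `register`, `login`, and the iteration count are hypothetical choices, not the authors' implementation.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

# Hypothetical in-memory user store (username -> (salt, hash)); in the described
# system these rows would live in the MySQL data layer instead.
_USERS: Dict[str, Tuple[bytes, bytes]] = {}
_ITERATIONS = 100_000  # assumed PBKDF2 work factor; tune for the deployment hardware


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)


def register(username: str, password: str) -> bool:
    """Create an account with a fresh random salt; return False if the name is taken."""
    if username in _USERS:
        return False
    salt = os.urandom(16)
    _USERS[username] = (salt, _hash_password(password, salt))
    return True


def login(username: str, password: str) -> bool:
    """Verify credentials, comparing hashes in constant time."""
    record = _USERS.get(username)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, _hash_password(password, salt))


register("operator1", "s3cret")
print(login("operator1", "s3cret"), login("operator1", "wrong"))  # True False
```

Storing only a salted, deliberately slow hash means a leaked user table does not directly expose passwords; the constant-time comparison avoids leaking how many bytes of a guess matched.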
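The combination of Flask, Redis, and Celery in the basic layer follows a producer/consumer pattern: in a typical setup, a request handler enqueues a detection job and immediately returns a task id, while worker processes pull jobs from the broker, run inference, and store results that the client can poll for later. The standard-library sketch below mimics that flow, with `queue.Queue` in place of the Redis broker and threads in place of Celery workers. `TaskQueue` and `fake_detect` are hypothetical names, and the model call is a placeholder.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional


class TaskQueue:
    """Stdlib stand-in for a Redis broker plus Celery workers (hypothetical, illustration only)."""

    def __init__(self, handler: Callable[[str], str], workers: int = 2) -> None:
        self._jobs: queue.Queue = queue.Queue()   # plays the role of the Redis queue
        self._results: Dict[str, str] = {}        # plays the role of a result backend
        self._lock = threading.Lock()
        self._handler = handler
        self._threads = [threading.Thread(target=self._work, daemon=True) for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, image_path: str) -> str:
        """Enqueue a detection job and return its id at once, without waiting for inference."""
        task_id = uuid.uuid4().hex
        self._jobs.put((task_id, image_path))
        return task_id

    def result(self, task_id: str) -> Optional[str]:
        """Return the finished result, or None while the job is still pending."""
        with self._lock:
            return self._results.get(task_id)

    def _work(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:              # sentinel: stop this worker
                self._jobs.task_done()
                return
            task_id, image_path = job
            try:
                outcome = self._handler(image_path)
            except Exception as exc:     # record failures instead of killing the worker
                outcome = f"error: {exc}"
            with self._lock:
                self._results[task_id] = outcome
            self._jobs.task_done()

    def shutdown(self) -> None:
        """Wait for queued jobs to finish, then stop all workers."""
        self._jobs.join()
        for _ in self._threads:
            self._jobs.put(None)
        for t in self._threads:
            t.join()


def fake_detect(image_path: str) -> str:
    # Placeholder for model inference; a real worker would run the PyTorch model on the GPU.
    return "defect" if "scratch" in image_path else "ok"


tq = TaskQueue(fake_detect)
ids = [tq.submit(p) for p in ["pipe_001.png", "pipe_002_scratch.png"]]
tq.shutdown()
print([tq.result(i) for i in ids])  # ['ok', 'defect']
```

Celery adds what this sketch lacks, such as workers in separate processes or machines, retries, and result expiry; the sketch only shows why `submit` can return before inference finishes.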
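The record management module and the MySQL data layer are described only at a high level. The sketch below assumes a single hypothetical `detection_record` table and uses Python's built-in `sqlite3` so it runs without a database server; with a MySQL driver the SQL would be nearly the same, although details such as `AUTOINCREMENT` and the `?` placeholder syntax differ between engines and drivers. The defect type names come from the defect detection results reported later.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical schema for the record management module. sqlite3 stands in for MySQL
# only so that the sketch runs without a database server.
SCHEMA = """
CREATE TABLE detection_record (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    username    TEXT NOT NULL,
    source_type TEXT NOT NULL CHECK (source_type IN ('image', 'video')),
    file_name   TEXT NOT NULL,
    defect_type TEXT,           -- NULL when no defect was found
    confidence  REAL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_record(conn: sqlite3.Connection, username: str, source_type: str, file_name: str,
               defect_type: Optional[str] = None, confidence: Optional[float] = None) -> int:
    """Store one detection result and return its row id (parameterized to avoid SQL injection)."""
    cur = conn.execute(
        "INSERT INTO detection_record (username, source_type, file_name, defect_type, confidence) "
        "VALUES (?, ?, ?, ?, ?)",
        (username, source_type, file_name, defect_type, confidence),
    )
    conn.commit()
    return cur.lastrowid


def defect_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count stored detections per defect type, ignoring defect-free records."""
    rows = conn.execute(
        "SELECT defect_type, COUNT(*) FROM detection_record "
        "WHERE defect_type IS NOT NULL GROUP BY defect_type ORDER BY defect_type"
    ).fetchall()
    return dict(rows)


conn = open_db()
add_record(conn, "operator1", "image", "pipe_001.png")
add_record(conn, "operator1", "image", "pipe_002.png", "Surface pit", 0.97)
add_record(conn, "operator2", "video", "line_b_0930.mp4", "Surface pit", 0.91)
print(defect_counts(conn))  # {'Surface pit': 2}
```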
3. Results and Discussion
3.1. Real-Time Multi-Task Image Detection Model Simulation Test
3.1.1. Image Defogging Effect Analysis
- (1) Dataset
- (2) Image quality evaluation metrics
- (3) Model performance analysis
3.1.2. Performance Analysis of Industrial Product Defect Detection
- (1) Detection model evaluation metrics
- (2) Experimental data analysis
3.2. Analysis of Industrial Production Inspection System Application Practice
3.2.1. Display Effect of Detection Data Visualization
3.2.2. Long-Term Operational Effectiveness of the System
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Abd Al Rahman, M.; Mousavi, A. A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access 2020, 8, 183192–183271.
- Rai, R.; Tiwari, M.K.; Ivanov, D.; Dolgui, A. Machine learning in manufacturing and industry 4.0 applications. Int. J. Prod. Res. 2021, 59, 4773–4778.
- Wang, H.; Li, C.; Li, Y.F.; Tsung, F. An Intelligent Industrial Visual Monitoring and Maintenance Framework Empowered by Large-Scale Visual and Language Models. IEEE Trans. Ind. Cyber-Phys. Syst. 2024, 2, 166–175.
- McCann, R.; Obeidi, M.A.; Hughes, C.; McCarthy, É.; Egan, D.S.; Vijayaraghavan, R.K.; Joshi, A.M.; Garzon, V.A.; Dowling, D.P.; McNally, P.J. In-situ sensing, process monitoring and machine control in Laser Powder Bed Fusion: A review. Addit. Manuf. 2021, 45, 102058.
- Jain, S.; Chandrasekaran, K. Industrial automation using internet of things. In Security and Privacy Issues in Sensor Networks and IoT; IGI Global: Hershey, PA, USA, 2020; pp. 28–64.
- Kumar, B.S.; Ramalingam, S.; Divya, V.; Amruthavarshini, S.; Dhivyashree, S. LoRa-IoT based industrial automation motor speed control monitoring system. In Proceedings of the 2023 International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 5–7 January 2023; pp. 11–15.
- Javaid, M.; Haleem, A.; Singh, R.P.; Rab, S.; Suman, R. Significance of sensors for industry 4.0: Roles, capabilities, and applications. Sens. Int. 2021, 2, 100110.
- Scime, L.; Beuth, J. Anomaly detection and classification in a laser powder bed additive manufacturing process using a trained computer vision algorithm. Addit. Manuf. 2018, 19, 114–126.
- Villalba-Diez, J.; Schmidt, D.; Gevers, R.; Ordieres-Meré, J.; Buchwitz, M.; Wellbrock, W. Deep learning for industrial computer vision quality control in the printing industry 4.0. Sensors 2019, 19, 3987.
- Alcácer, V.; Cruz-Machado, V. Scanning the industry 4.0: A literature review on technologies for manufacturing systems. Eng. Sci. Technol. Int. J. 2019, 22, 899–919.
- Xia, C.; Pan, Z.; Polden, J.; Li, H.; Xu, Y.; Chen, S.; Zhang, Y. A review on wire arc additive manufacturing: Monitoring, control and a framework of automated system. J. Manuf. Syst. 2020, 57, 31–45.
- Misra, N.N.; Dixit, Y.; Al-Mallahi, A.; Bhullar, M.S.; Upadhyay, R.; Martynenko, A. IoT, big data, and artificial intelligence in agriculture and food industry. IEEE Internet Things J. 2020, 9, 6305–6324.
- Scott, D.M.; McCann, H. Process Imaging for Automatic Control; CRC Press: Boca Raton, FL, USA, 2018.
- Uhlemann, T.H.J.; Lehmann, C.; Steinhilper, R. The digital twin: Realizing the cyber-physical production system for industry 4.0. Procedia CIRP 2017, 61, 335–340.
- Zhou, X.; Xu, X.; Liang, W.; Zeng, Z.; Shimizu, S.; Yang, L.T.; Jin, Q. Intelligent small object detection for digital twin in smart manufacturing with industrial cyber-physical systems. IEEE Trans. Ind. Inform. 2021, 18, 1377–1386.
- Chamara, N.; Islam, M.D.; Bai, G.F.; Shi, Y.; Ge, Y. Ag-IoT for crop and environment monitoring: Past, present, and future. Agric. Syst. 2022, 203, 103497.
- Delli, U.; Chang, S. Automated process monitoring in 3D printing using supervised machine learning. Procedia Manuf. 2018, 26, 865–870.
- Yang, J.; Wang, C.; Jiang, B.; Song, H.; Meng, Q. Visual perception enabled industry intelligence: State of the art, challenges and prospects. IEEE Trans. Ind. Inform. 2020, 17, 2204–2219.
- Gehrmann, C.; Gunnarsson, M. A digital twin based industrial automation and control system security architecture. IEEE Trans. Ind. Inform. 2019, 16, 669–680.
- de Souza Cardoso, L.F.; Mariano, F.C.M.Q.; Zorzal, E.R. A survey of industrial augmented reality. Comput. Ind. Eng. 2020, 139, 106159.
- Alsakar, Y.M.; Elazab, N.; Nader, N.; Mohamed, W.; Ezzat, M.; Elmogy, M. Multi-label dental disorder diagnosis based on MobileNetV2 and swin transformer using bagging ensemble classifier. Sci. Rep. 2024, 14, 25193.
- Sun, J.; Zheng, H.; Diao, W.; Sun, Z.; Qi, Z.; Wang, X. Prototype-Optimized unsupervised domain adaptation via dynamic Transformer encoder for sensor drift compensation in electronic nose systems. Expert Syst. Appl. 2025, 263, 125444.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Guo, C.; Chen, X.; Chen, Y.; Yu, C. Multi-Stage Attentive Network for Motion Deblurring via Binary Cross-Entropy Loss. Entropy 2022, 24, 1414.
- Golts, A.; Freedman, D.; Elad, M. Unsupervised single image dehazing using dark channel prior loss. IEEE Trans. Image Process. 2019, 29, 2692–2701.
- Yang, X.; Xu, Z.; Luo, J. Towards perceptual image dehazing by physics-based disentanglement and adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 7485–7492.
- Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
- Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323.





| Method | PSNR (dB) ↑ | SSIM ↑ | FADE ↓ | NIQE ↓ |
|---|---|---|---|---|
| DCP [25] | 15.48 | 0.7562 | 0.6859 | 5.0712 | 
| DisentGAN [26] | 16.69 | 0.7891 | 0.6604 | 4.2812 | 
| DehazeNet [27] | 16.98 | 0.7915 | 0.3512 | 4.1661 | 
| GridDehazeNet [28] | 20.82 | 0.8805 | 0.3597 | 3.9465 | 
| Ours | 23.49 | 0.9195 | 0.3254 | 3.7593 | 
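PSNR and SSIM in the dehazing comparison above are full-reference metrics, computed between a dehazed image and its haze-free ground truth (higher is better), whereas FADE and NIQE are no-reference scores (lower is better). The sketch below, assuming 8-bit grayscale images flattened to lists of pixel values, shows PSNR = 10·log10(MAX²/MSE) and a simplified SSIM evaluated once over the whole image. Standard SSIM averages the same expression over local Gaussian windows, so this simplified form will not reproduce the table's values; the pixel values in the example are hypothetical.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(ref) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def global_ssim(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """SSIM computed once over the whole image (simplified: no sliding Gaussian window)."""
    n = len(ref)
    mu_x, mu_y = sum(ref) / n, sum(test) / n
    var_x = sum((a - mu_x) ** 2 for a in ref) / n
    var_y = sum((b - mu_y) ** 2 for b in test) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, test)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # usual stabilizing constants
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


clear = [52, 55, 61, 66, 70, 61, 64, 73]     # hypothetical haze-free pixels
dehazed = [60, 62, 66, 70, 77, 66, 70, 80]   # hypothetical model output
print(f"PSNR = {psnr(clear, dehazed):.2f} dB, SSIM = {global_ssim(clear, dehazed):.4f}")
```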

| Defect Type | Sample Size | Correctly Classified | Misclassified | Recognition Rate |
|---|---|---|---|---|
| Inner scratch | 1500 | 1493 | 7 | 99.53% | 
| External hole | 1500 | 1482 | 18 | 98.80% | 
| Surface stain | 1500 | 1490 | 10 | 99.33% | 
| Surface pit | 1500 | 1489 | 11 | 99.27% | 
| Thickness inequality | 1500 | 1482 | 18 | 98.80% | 
| Surface ripple | 1500 | 1493 | 7 | 99.53% | 
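The recognition rate in the defect detection table is the share of correctly classified samples, Correct / (Correct + Misclassified). The short script below recomputes each row from the table's counts and checks the 99.21% average quoted in the introduction. Because every defect class has 1500 samples, the mean of the per-class rates (macro average) coincides with the pooled rate over all 9000 samples (micro average).

```python
# (correctly classified, misclassified) counts per defect type, copied from the table above
RESULTS = {
    "Inner scratch": (1493, 7),
    "External hole": (1482, 18),
    "Surface stain": (1490, 10),
    "Surface pit": (1489, 11),
    "Thickness inequality": (1482, 18),
    "Surface ripple": (1493, 7),
}


def recognition_rate(correct: int, wrong: int) -> float:
    """Percentage of samples classified correctly."""
    return 100.0 * correct / (correct + wrong)


per_class = {name: recognition_rate(c, w) for name, (c, w) in RESULTS.items()}
macro_avg = sum(per_class.values()) / len(per_class)
micro_avg = recognition_rate(sum(c for c, _ in RESULTS.values()),
                             sum(w for _, w in RESULTS.values()))

for name, rate in per_class.items():
    print(f"{name:<22}{rate:6.2f}%")
print(f"Macro average: {macro_avg:.2f}%, pooled (micro) rate: {micro_avg:.2f}%")
```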

| Production Line | Production Date | Total Production | Excellent Product Rate | Qualified Product Rate | Unqualified Product Rate |
|---|---|---|---|---|---|
| Production line A | 11.16 | 1415 | 5.73% | 53.40% | 46.60% |
| | 11.17 | 1218 | 5.83% | 70.60% | 29.40% |
| | 11.18 | 1108 | 0.17% | 53.44% | 46.56% |
| | 11.19 | 1190 | 0.55% | 78.43% | 21.57% |
| | 11.20 | 1007 | 2.76% | 77.22% | 22.78% |
| | 11.21 | 1288 | 2.08% | 69.23% | 30.77% |
| | 11.22 | 916 | 1.95% | 66.10% | 33.90% |
| | 11.23 | 1144 | 1.35% | 70.86% | 29.14% |
| | 11.24 | 1461 | 5.74% | 67.20% | 32.80% |
| | 11.25 | 1107 | 3.72% | 72.64% | 27.36% |
| | 11.26 | 1464 | 3.34% | 56.24% | 43.76% |
| | 11.27 | 1308 | 4.15% | 57.35% | 42.65% |
| | 11.28 | 1264 | 1.72% | 70.20% | 29.80% |
| | 11.29 | 1072 | 4.90% | 54.52% | 45.48% |
| | 11.30 | 1043 | 4.02% | 66.73% | 33.27% |
| | Average | 1200 | 3.20% | 65.61% | 34.39% |

| Production Line | Production Date | Total Production | Excellent Product Rate | Qualified Product Rate | Unqualified Product Rate |
|---|---|---|---|---|---|
| Production line B | 11.16 | 1003 | 19.44% | 83.54% | 16.46% |
| | 11.17 | 1296 | 13.66% | 97.19% | 2.81% |
| | 11.18 | 1236 | 11.17% | 86.11% | 13.89% |
| | 11.19 | 1261 | 16.77% | 92.63% | 7.37% |
| | 11.20 | 918 | 15.11% | 94.22% | 5.78% |
| | 11.21 | 1132 | 11.32% | 80.36% | 19.64% |
| | 11.22 | 1108 | 15.29% | 91.49% | 8.51% |
| | 11.23 | 1080 | 16.53% | 83.35% | 16.65% |
| | 11.24 | 1061 | 16.89% | 95.87% | 4.13% |
| | 11.25 | 1044 | 18.63% | 96.35% | 3.65% |
| | 11.26 | 986 | 18.82% | 96.42% | 3.58% |
| | 11.27 | 1296 | 16.59% | 96.73% | 3.27% |
| | 11.28 | 1185 | 11.31% | 98.38% | 1.62% |
| | 11.29 | 920 | 17.30% | 88.85% | 11.15% |
| | 11.30 | 1435 | 11.02% | 89.49% | 10.51% |
| | Average | 1131 | 15.32% | 91.40% | 8.60% |
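The 12.12% improvement quoted in the introduction matches the difference between the average excellent-product rates of the two production-line tables (15.32% and 3.20%), so it is an absolute gain in percentage points. The script below recomputes those averages as unweighted means of the 15 daily rates, which reproduces the tables' "Average" rows. Treating line A as the manually monitored baseline and line B as the line running the detection system is an assumption inferred from that figure.

```python
from statistics import mean

# Daily rates (%) for 11.16 to 11.30, copied from the two production-line tables above.
EXCELLENT = {
    "A": [5.73, 5.83, 0.17, 0.55, 2.76, 2.08, 1.95, 1.35, 5.74, 3.72, 3.34, 4.15, 1.72, 4.90, 4.02],
    "B": [19.44, 13.66, 11.17, 16.77, 15.11, 11.32, 15.29, 16.53, 16.89, 18.63,
          18.82, 16.59, 11.31, 17.30, 11.02],
}
QUALIFIED = {
    "A": [53.40, 70.60, 53.44, 78.43, 77.22, 69.23, 66.10, 70.86, 67.20, 72.64,
          56.24, 57.35, 70.20, 54.52, 66.73],
    "B": [83.54, 97.19, 86.11, 92.63, 94.22, 80.36, 91.49, 83.35, 95.87, 96.35,
          96.42, 96.73, 98.38, 88.85, 89.49],
}

exc = {line: mean(rates) for line, rates in EXCELLENT.items()}   # unweighted daily means
qual = {line: mean(rates) for line, rates in QUALIFIED.items()}

print(f"Excellent: A {exc['A']:.2f}%, B {exc['B']:.2f}%, gain {exc['B'] - exc['A']:.2f} pp")
print(f"Qualified: A {qual['A']:.2f}%, B {qual['B']:.2f}%, gain {qual['B'] - qual['A']:.2f} pp")
```

Weighting each day by its total production would give slightly different averages; the unweighted form is used here only because it is the one that matches the reported figures.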
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, H.; He, W.; Lan, A. Swin Transformer-Based Real-Time Multi-Tasking Image Detection in Industrial Automation Production Environments. Machines 2025, 13, 972. https://doi.org/10.3390/machines13100972