Smart PV Monitoring and Maintenance: A Vision Transformer Approach within Urban 4.0
Abstract
1. Introduction
- An intelligent PV monitoring and maintenance system based on the ViT model, capable of processing images and detecting anomalies. In a comparative study, the model outperformed several CNN architectures.
- A novel real-time software application for detecting anomalies in PV panels.
2. Related Work
3. The Adopted Architecture of the PV Panel Anomalies Detection System
3.1. Context and Motivation
- Patching: Unlike CNNs, which process individual pixels or small pixel windows, ViT divides the input image into small, fixed-size patches P1, …, Pn. ViT then flattens each patch Pi into a vector Vi.
- Linear Embedding: A linear projection is applied to transform each patch vector Vi into an embedding Ei; in the standard ViT formulation this is Ei = Vi·WE, where WE is a learnable projection matrix.
- Adding Positional Embeddings: Since patch order matters for spatial image understanding, positional embeddings Epos are added to the patch embeddings to retain spatial position information: Zi = Ei + Epos,i. The resulting sequence Zi (for i = 1, …, n) forms the input to the transformer blocks.
- Transformer Blocks: Each transformer block consists of two main sub-layers.
- Multihead Self-Attention layer allows the model to focus on different parts of the image simultaneously and to model the relationships between patches.
- Feedforward Network (FFN) is a series of fully connected layers applied independently to each position [38].
- Classification: At the end of the transformer network, the classification token (a specific vector VClass added to the patch embeddings) is extracted and passed through one or more fully connected layers to perform the final classification task. The ViT architecture is conceptually simpler than deep CNN architectures, as it avoids complex convolutions and successive downsampling. Moreover, the model can be pre-trained on large research image databases and then adapted to various computer vision tasks with excellent knowledge transfer [37].
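The patching, embedding, and positional-encoding steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the image size, patch size, embedding dimension, and weight initializations here are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's settings): a 224x224 RGB image
# split into 16x16 patches, embedded in a 64-dimensional space.
H = W = 224
P = 16                        # patch side length
C = 3                         # channels
D = 64                        # embedding dimension
n = (H // P) * (W // P)       # number of patches (196)

image = rng.random((H, W, C))

# Patching: cut the image into n fixed-size patches and flatten each
# patch P_i into a vector V_i of length P*P*C.
patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
V = patches.reshape(n, P * P * C)

# Linear embedding: E_i = V_i @ W_E (W_E is learnable in practice).
W_E = rng.normal(scale=0.02, size=(P * P * C, D))
E = V @ W_E

# Prepend the class token VClass and add positional embeddings,
# giving Z_i = E_i + E_pos,i for each of the n + 1 tokens.
cls = rng.normal(scale=0.02, size=(1, D))
E_pos = rng.normal(scale=0.02, size=(n + 1, D))
Z = np.concatenate([cls, E], axis=0) + E_pos

print(Z.shape)  # (197, 64): n + 1 tokens ready for the transformer blocks
```

The sequence `Z` is exactly what the transformer blocks in the next step consume.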
3.2. The Adopted System Architecture
3.2.1. Image Preprocessing (I2P)
3.2.2. Input Patching
3.2.3. Feature Extraction and Selection
- The multihead attention mechanism in ViT enables the model to handle different aspects of information at various levels of detail, capturing both local and global dependencies within an image. Long-range dependencies refer to the relationships and interactions between distant parts of an image that are not immediately adjacent to each other. This capability allows the model to understand how features in one image area might affect or be related to those in another distant area. This dynamic focus across different patches of a PV image allows ViT to consider all relevant information, ensuring a comprehensive analysis of both intricate local details and broader global interactions. This capability makes it particularly suitable for detecting all types of anomalies, aligning it with the objectives of our application.
- Class Token: A unique class token is appended to the sequence of patch embeddings during the input stage. This token accumulates and summarizes the image characteristics as it passes through the transformer layers. The class token acts as a repository of information and gathers insights from various patches. This mechanism allows the model to integrate and synthesize data from the entire image, leading to a unified representation that is essential for accurate classification.
- Output Vector of the Class Token: After processing through the final transformer block, the output vector associated with the class token encapsulates the global representation of the PV image. This vector is then used for the classification task. It demonstrates that the model does not rely on isolated patch features but integrates them into a cohesive and comprehensive representation. This integration ensures that the classification decisions are based on a holistic image understanding, capturing local details, and overarching patterns.
- Positional Encoding: Positional encodings are added to the patch embeddings before the transformer processes them to retain spatial information. This step ensures that the model knows the relative positions of the patches within the image, which is crucial for understanding the spatial context and relationships between different regions.
- FFN: Each transformer layer includes a feedforward neural network that processes the output of the multihead attention mechanism. This network helps to refine and transform the extracted features, enhancing the model’s ability to capture complex patterns and representations within the data.
- Layer Normalization and Residual Connections: Transformer layers incorporate layer normalization and residual connections, stabilizing the training process and enabling the model to learn more effectively. These components help maintain the integrity of the feature representations and support the extraction of high-quality features for classification.
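The components listed above fit together into one transformer encoder block, which can be sketched in NumPy as follows. This is a simplified, assumption-laden illustration: it uses a pre-norm layout, ReLU in the FFN (standard ViT uses GELU), small random weights, and 4 heads over a 64-dimensional embedding, none of which are claimed to match the paper's configuration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token's features to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multihead_attention(Z, Wq, Wk, Wv, Wo, h):
    # Split the embedding into h heads so each head can attend to a
    # different aspect of the patch relationships.
    n, d = Z.shape
    dh = d // h
    Q = (Z @ Wq).reshape(n, h, dh).transpose(1, 0, 2)
    K = (Z @ Wk).reshape(n, h, dh).transpose(1, 0, 2)
    V = (Z @ Wv).reshape(n, h, dh).transpose(1, 0, 2)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))  # (h, n, n) maps
    out = (A @ V).transpose(1, 0, 2).reshape(n, d)
    return out @ Wo

def transformer_block(Z, params, h=4):
    # Pre-norm block: LN -> MSA -> residual, then LN -> FFN -> residual.
    Wq, Wk, Wv, Wo, W1, b1, W2, b2 = params
    Z = Z + multihead_attention(layer_norm(Z), Wq, Wk, Wv, Wo, h)
    hidden = np.maximum(layer_norm(Z) @ W1 + b1, 0.0)  # FFN (ReLU here)
    return Z + hidden @ W2 + b2

rng = np.random.default_rng(0)
n, d, h = 197, 64, 4  # 196 patch tokens + 1 class token
Z = rng.normal(size=(n, d))
params = (
    rng.normal(scale=0.02, size=(d, d)), rng.normal(scale=0.02, size=(d, d)),
    rng.normal(scale=0.02, size=(d, d)), rng.normal(scale=0.02, size=(d, d)),
    rng.normal(scale=0.02, size=(d, 4 * d)), np.zeros(4 * d),
    rng.normal(scale=0.02, size=(4 * d, d)), np.zeros(d),
)
out = transformer_block(Z, params, h)
print(out.shape)  # (197, 64): same token shape, refined features
```

The residual connections and layer normalization appear exactly where the bullets above place them; stacking several such blocks yields the full encoder.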
3.2.4. Learning and Classification
- Feature Learning: In the second phase, the ViT learns to identify and classify objects across thousands of categories. This experience allows it to develop detailed and flexible visual representations that serve as a basis for more specific classification tasks.
- Fine-tuning: Following the pretraining step, fine-tuning [13] is used for anomaly detection: the model is adapted specifically to detect anomalies in solar panels. This step replaces the model’s output layer with a new one that distinguishes between the various anomaly types.
- The model is then trained on a dataset composed exclusively of solar panel images, each annotated for the presence or absence of anomalies. This process refines the model’s ability to recognize the often subtle signs of panel anomalies.
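The head-replacement step can be illustrated with a minimal NumPy sketch: pretrained class-token features stand in for the frozen ViT backbone, and only a freshly initialized output layer is trained with cross-entropy. The features, labels, class count, and training loop here are illustrative assumptions, not the paper's data or anomaly taxonomy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: class-token features from a frozen pretrained
# backbone for 200 panel images, and 3 hypothetical classes (the real
# anomaly taxonomy would come from the annotated panel dataset).
n_samples, d, n_classes = 200, 64, 3
features = rng.normal(size=(n_samples, d))           # ViT class-token outputs
labels = rng.integers(0, n_classes, size=n_samples)  # placeholder labels

# Fine-tuning step: discard the old output layer, train a fresh one.
W = np.zeros((d, n_classes))
b = np.zeros(n_classes)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

lr = 0.1
for _ in range(200):
    probs = softmax(features @ W + b)
    # Gradient of mean cross-entropy w.r.t. the logits.
    grad = probs.copy()
    grad[np.arange(n_samples), labels] -= 1.0
    grad /= n_samples
    W -= lr * features.T @ grad
    b -= lr * grad.sum(0)

acc = (softmax(features @ W + b).argmax(-1) == labels).mean()
print(f"training accuracy of the new head: {acc:.2f}")
```

In practice the backbone weights can also be unfrozen at a lower learning rate once the new head has stabilized.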
3.2.5. Detection and Deployment
4. Software Application
- User Access and Initialization: As a first step (1), users access the application via the user interface. They initiate the process by loading a PV panel image.
- Image Upload: In the next step (2), the system prompts for an image upload. The user selects and uploads a PV panel image to the system database.
- Image Processing: The stored image is processed using the loaded ViT model. The transformer blocks analyze the input to detect anomalies (step (3)). The analysis results are then sent back to the user interface.
- Result Display: In the last step (4), users receive a detailed report on the PV panel’s condition, including:
- Anomaly Detection: Indicates whether an anomaly is present or not.
- Type Classification: Specifies the type of PV panel (mono or poly).
- Defect Probability: Provides the probability of a detected anomaly, offering an in-depth understanding of the panel’s health.
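The four-step flow above can be summarized as a single handler function. This is a hypothetical sketch: the function, field names, and the 0.5 decision threshold are illustrative, not the application's actual code.

```python
# Hypothetical sketch of the application's four-step flow; names and
# the 0.5 threshold are illustrative assumptions.

def analyze_panel_image(image_path, model, database):
    # Steps (1)-(2): the user uploads an image, which is stored first.
    database.append(image_path)

    # Step (3): the loaded ViT model analyzes the stored image.
    anomaly_prob, panel_type = model(image_path)

    # Step (4): return the report shown in the user interface.
    return {
        "anomaly_detected": anomaly_prob >= 0.5,
        "panel_type": panel_type,          # "mono" or "poly"
        "defect_probability": anomaly_prob,
    }

# Stub standing in for the real ViT inference call.
def stub_model(path):
    return 0.87, "mono"

db = []
report = analyze_panel_image("panel_001.png", stub_model, db)
print(report)
```

Swapping `stub_model` for the real inference routine leaves the rest of the flow unchanged.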
5. Experiment and Evaluation
5.1. Electroluminescent PV Data Set
5.2. Data Analysis
5.3. Experimental Design
5.3.1. ViT Hyperparameters
5.3.2. ViT and CNNs Models Parameters
5.3.3. Performance Metrics
- TP: True positive value occurs when the model correctly predicts the positive class.
- FP: False positive value occurs when the model incorrectly predicts the positive class.
- TN: True negative value occurs when the model correctly predicts the negative class.
- FN: False negative value occurs when the model incorrectly predicts the negative class.
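The four counts above combine into the standard evaluation metrics. The helper below computes accuracy, precision, recall, and F1 from a confusion matrix; the counts used in the example are illustrative, not the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    # Standard metrics derived from the confusion-matrix counts.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative confusion matrix (not the experimental results).
m = classification_metrics(tp=80, fp=10, tn=95, fn=15)
print(m)  # accuracy 0.875, precision ~0.889, recall ~0.842
```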
5.4. Results and Discussion
5.4.1. Results
5.4.2. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Musarat, M.A.; Sadiq, A.; Alaloul, W.S.; Abdul Wahab, M.M. A systematic review on enhancement in quality of life through digitalization in the construction industry. Sustainability 2022, 15, 2022. [Google Scholar] [CrossRef]
- Fathi, M.; Naderpour, M. Intelligent Urban Infrastructures: Realizing the Vision of Urban 4.0. J. Urban Technol. 2018, 25, 1–17. [Google Scholar]
- Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; Zorzi, M. Internet of Things for Smart Cities. IEEE Internet Things J. 2014, 1, 22–32. [Google Scholar] [CrossRef]
- Haight, R.; Haensch, W.; Friedman, D. Solar-powering the Internet of Things. Science 2016, 353, 124–125. [Google Scholar] [CrossRef]
- Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Mansouri, M.; Trabelsi, M.; Nounou, H.; Nounou, M. Deep learning-based fault diagnosis of photovoltaic systems: A comprehensive review and enhancement prospects. IEEE Access 2021, 9, 126286–126306. [Google Scholar] [CrossRef]
- Haripriya, P.; Parthiban, G.; Porkodi, R. A study on CNN architecture of VGGNet and ResNet for DICOM image classification. NeuroQuantology 2022, 20, 2027. [Google Scholar]
- Hashem, I.A.T.; Usmani, R.S.A.; Almutairi, M.S.; Ibrahim, A.O.; Zakari, A.; Alotaibi, F.; Alhashmi, S.M.; Chiroma, H. Urban computing for sustainable smart cities: Recent advances, taxonomy, and open research challenges. Sustainability 2023, 15, 3916. [Google Scholar] [CrossRef]
- Khalifi, H.; Riahi, S.; Cherif, W. Smart Cities and Sustainable Urban Development in Morocco. Ingénierie Des Systèmes D’information 2024, 29, 741. [Google Scholar] [CrossRef]
- Le, M.; Nguyen, D.K.; Dao, V.D.; Vu, N.H.; Vu, H.H.T. Remote anomaly detection and classification of solar photovoltaic modules based on deep neural network. Sustain. Energy Technol. Assess. 2021, 48, 101545. [Google Scholar] [CrossRef]
- Gao, Y.; Liu, J. Vision Transformers for Anomaly Detection in Solar Energy Systems. IEEE Trans. Ind. Inform. 2021, 17, 8465–8474. [Google Scholar]
- Tina, G.M.; Ventura, C.; Ferlito, S.; De Vito, S. A state-of-art-review on machine-learning based methods for PV. Appl. Sci. 2021, 11, 7550. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. (CSUR) 2021, 54, 200. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Zakaria, N.; Hassim, Y.M.M. A Review Study of the Visual Geometry Group Approaches for Image Classification. J. Appl. Sci. Technol. Comput. 2024, 1, 14–28. [Google Scholar]
- Dhillon, A.; Verma, G.K. Convolutional neural network: A review of models, methodologies and applications to object detection. Prog. Artif. Intell. 2020, 9, 85–112. [Google Scholar] [CrossRef]
- Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of image classification algorithms based on convolutional neural networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
- Bibri, S.E. The eco-city and its core environmental dimension of sustainability: Green energy technologies and their integration with data-driven smart solutions. Energy Inform. 2020, 3, 4. [Google Scholar] [CrossRef]
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 15. [Google Scholar] [CrossRef]
- de Oliveira, L.G.; Aquila, G.; Balestrassi, P.P.; de Paiva, A.P.; de Queiroz, A.R.; de Oliveira Pamplona, E.; Camatta, U.P. Evaluating economic feasibility and maximization of social welfare of photovoltaic projects developed for the Brazilian northeastern coast: An attribute agreement analysis. Renew. Sustain. Energy Rev. 2020, 123, 109786. [Google Scholar] [CrossRef]
- Ebnou Abdem, S.A.; Chenal, J.; Diop, E.B.; Azmi, R.; Adraoui, M.; Tekouabou Koumetio, C.S. Using Logistic Regression to Predict Access to Essential Services: Electricity and Internet in Nouakchott, Mauritania. Sustainability 2023, 15, 16197. [Google Scholar] [CrossRef]
- Amiri, A.F.; Oudira, H.; Chouder, A.; Kichou, S. Faults detection and diagnosis of PV systems based on machine learning approach using random forest classifier. Energy Convers. Manag. 2024, 301, 118076. [Google Scholar] [CrossRef]
- Deitsch, S.; Christlein, V.; Berger, S.; Buerhop-Lutz, C.; Maier, A.; Gallwitz, F.; Riess, C. Automatic classification of defective photovoltaic module cells in electroluminescence images. Sol. Energy 2019, 185, 455–468. [Google Scholar] [CrossRef]
- De Benedetti, M.; Leonardi, F.; Messina, F.; Santoro, C.; Vasilakos, A. Anomaly detection and predictive maintenance for photovoltaic systems. Neurocomputing 2018, 310, 59–68. [Google Scholar] [CrossRef]
- Al-Mashhadani, R.; Alkawsi, G.; Baashar, Y.; Alkahtani, A.A.; Nordin, F.H.; Hashim, W.; Kiong, T.S. Deep learning methods for solar fault detection and classification: A review. Sol. Cells 2021, 11, 12. [Google Scholar]
- Akram, M.W.; Li, G.; Jin, Y.; Chen, X.; Zhu, C.; Zhao, X.; Khaliq, A.; Faheem, M.; Ahmad, A. CNN based automatic detection of photo-voltaic cell defects in electroluminescence images. Energy 2019, 189, 116319. [Google Scholar] [CrossRef]
- Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]
- Zulfauzi, I.A.; Dahlan, N.Y.; Sintuya, H.; Setthapun, W. Anomaly detection using K-Means and long-short term memory for predictive maintenance of large-scale solar (LSS) photovoltaic plant. Energy Rep. 2023, 9, 154–158. [Google Scholar] [CrossRef]
- Maurício, J.; Domingues, I.; Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 2023, 13, 5521. [Google Scholar] [CrossRef]
- Islam, K. Recent advances in vision transformer: A survey and outlook of recent work. arXiv 2022, arXiv:2203.01536. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
- Herrmann, C.; Sargent, K.; Jiang, L.; Zabih, R.; Chang, H.; Liu, C.; Krishnan, D.; Sun, D. Pyramid adversarial training improves vit performance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13419–13429. [Google Scholar]
- Lin, Z.; Wang, H.; Li, S. Pavement anomaly detection based on transformer and self-supervised learning. Autom. Constr. 2022, 143, 104544. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV) 2014, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Sharma, G.; Jadon, V.K. Classification of image with convolutional neural network and TensorFlow on CIFAR-10 dataset. In Innovations in VLSI, Signal Processing and Computational Technologies, Proceedings of the International Conference on Women Researchers in Electronics and Computing, Jalandhar, India, 21–23 April 2023; Springer Nature: Singapore, 2023; pp. 523–535. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Huo, Y.; Jin, K.; Cai, J.; Xiong, H.; Pang, J. Vision transformer (Vit)-based applications in image classification. In Proceedings of the 2023 IEEE 9th International Conference on Big Data Security on Cloud (BigDataSecurity), Proceedings of the IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS), New York, NY, USA, 6–8 May 2023; pp. 135–140. [Google Scholar]
- Bounabi, M.; Elmoutaouakil, K.; Satori, K. A new neutrosophic TF-IDF term weighting for text mining tasks: Text classification use case. Int. J. Web Inf. Syst. 2021, 17, 229–249. [Google Scholar] [CrossRef]
- Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. Xnor-net: Imagenet classification using binary convolutional neural networks. In Computer Vision–ECCV 2016, Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 525–542. [Google Scholar]
- Hammoudi, Y.; Idrissi, I.; Boukabous, M.; Zerguit, Y.; Bouali, H. Review on maintenance of photovoltaic systems based on deep learning and internet of things. Indones. J. Electr. Eng. Comput. Sci. 2022, 26, 1060–1072. [Google Scholar]
- Buerhop-Lutz, C.; Deitsch, S.; Maier, A.; Gallwitz, F.; Berger, S.; Doll, B.; Hauch, J.; Camus, C.; Brabec, C.J. A Benchmark for Visual Identification of Defective Solar Cells in Electroluminescence Imagery. In Proceedings of the European PV Solar Energy Conference and Exhibition (EU PVSEC), Brussels, Belgium, 24–28 September 2018. [Google Scholar] [CrossRef]
- Dobrzański, L.A.; Szczęsna, M.; Szindler, M.; Drygała, A. Electrical properties mono-and polycrystalline silicon solar cells. J. Achiev. Mater. Manuf. Eng. 2013, 59, 67–74. [Google Scholar]
- Agresti, A. Categorical Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 792. [Google Scholar]
- Bengio, Y.; Goodfellow, I. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Bounabi, M.; Moutaouakil, K.E.; Satori, K. A comparison of text classification methods using different stemming techniques. Int. J. Comput. Appl. Technol. 2019, 60, 298–306. [Google Scholar] [CrossRef]
- Abdem, S.A.E.; Azmi, R.; Diop, E.B.; Adraoui, M.; Chenal, J. Identifying determinants of waste management access in Nouakchott, Mauritania: A logistic regression model. Data Policy 2024, 6, e29. [Google Scholar] [CrossRef]
- Korovin, A.; Vasilev, A.; Egorov, F.; Saykin, D.; Terukov, E.; Shakhray, I.; Zhukov, L.; Budennyy, S. Anomaly detection in electroluminescence images of heterojunction solar cells. Sol. Energy 2023, 259, 130–136. [Google Scholar] [CrossRef]
- Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 1995, 14, 1137–1145. [Google Scholar]
- Almadhor, A. Proactive Monitoring, Anomaly Detection, and Forecasting of Solar Photovoltaic Systems Using Artificial Neural Networks. Ph.D. Dissertation, Daniel Felix Ritchie School of Engineering and Computer Science, University of Denver, Denver, CO, USA, 2019. [Google Scholar]
Hyperparameter | Value | Description |
---|---|---|
Number of Epochs | 15 | Number of times the entire dataset passes through the model. |
Batch Size | 32 | Number of samples per gradient update. |
Learning Rate | 0.001 | Step size for updating the model weights. |
Optimizer | Adam | Algorithm used to adjust the model weights during training. |
Loss Function | Cross-Entropy | Measures the difference between the predicted and true labels. |
Dropout Rate | 0.1 | Fraction of units to drop during training to prevent overfitting. |
Weight Decay | 0.01 | Regularization term added to the loss function. |
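The table's optimizer and loss settings can be made concrete with one training step in NumPy. The hyperparameter values (batch size 32, learning rate 0.001, weight decay 0.01) come from the table, but the one-layer "model" and the AdamW-style decoupled weight decay are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One training step with the table's settings: batch size 32,
# learning rate 0.001, Adam, cross-entropy loss, weight decay 0.01.
# A single linear layer stands in for the full ViT to keep it short.
batch_size, d, n_classes = 32, 64, 3
lr, weight_decay = 0.001, 0.01
beta1, beta2, eps = 0.9, 0.999, 1e-8

X = rng.normal(size=(batch_size, d))
y = rng.integers(0, n_classes, size=batch_size)
W = rng.normal(scale=0.02, size=(d, n_classes))

# Cross-entropy loss and its gradient w.r.t. the weights.
logits = X @ W
logits -= logits.max(-1, keepdims=True)        # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
loss = -np.log(probs[np.arange(batch_size), y]).mean()

grad = probs.copy()
grad[np.arange(batch_size), y] -= 1.0
grad = X.T @ (grad / batch_size)

# One Adam moment update with decoupled (AdamW-style) weight decay.
m = np.zeros_like(W); v = np.zeros_like(W); t = 1
m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad**2
m_hat = m / (1 - beta1**t)
v_hat = v / (1 - beta2**t)
W -= lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * W)

print(f"cross-entropy loss before the step: {loss:.3f}")
```

Repeating this step over the whole dataset for the 15 epochs in the table (with dropout applied inside the network) gives the full training loop.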
Model | Parameters (Million) | FLOPs | Depth (Layers) | Comments |
---|---|---|---|---|
ViT | ~86 M | ~17.6 GFLOPs | 12 | Very powerful for capturing global dependencies but highly resource-intensive [5]. |
AlexNet | ~60 M | ~725 MFLOPs | 8 | Less complex, suitable for simpler architectures [7]. |
VGG | ~138 M | ~15.5 GFLOPs | 16 | Very complex in terms of parameters, with a high computational cost [7]. |
ResNet | ~25.5 M | ~4.1 GFLOPs | 50 | Good depth with increased efficiency thanks to residual blocks [5]. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bounabi, M.; Azmi, R.; Chenal, J.; Diop, E.B.; Ebnou Abdem, S.A.; Adraoui, M.; Hlal, M.; Serbouti, I. Smart PV Monitoring and Maintenance: A Vision Transformer Approach within Urban 4.0. Technologies 2024, 12, 192. https://doi.org/10.3390/technologies12100192