1. Introduction
Photovoltaic modules play a crucial role in photovoltaic energy systems, which are part of ongoing efforts to transition toward renewable energy resources. This transition aims to minimize carbon dioxide emissions and mitigate their detrimental effects [1,2]. The International Renewable Energy Agency (IRENA) has taken a firm stance on renewable energy, and global investment in the renewable energy sector amounted to USD 282 billion as of 2019 [3]. This growing momentum has significantly increased the demand for solar photovoltaic (PV) systems relative to other energy generation systems. However, solar cells may exhibit various defects and shortcomings that reduce the overall energy efficiency of a PV energy system. Consequently, solar cells need to be inspected from their manufacturing phase onward and throughout their lifespan. Given these trends in energy systems, a viable and robust assessment mechanism for solar PV modules is essential to ensure that PV energy systems deliver the anticipated energy yield.
Solar PV modules are typically designed with protective measures to withstand different weather conditions and remain resilient against environmental elements. The front side of a module is shielded by tempered glass, which resists environmental stresses. To safeguard against temperature variations, humidity, and corrosion caused by water ingress, ethylene vinyl acetate (EVA) is used as an encapsulant [4]. In addition, a backsheet provides mechanical stability, further protection against environmental elements, and insulation for the module [5]. Despite these protective components, multiple defects can occur in modules over their lifespan: weather conditions and mechanical damage can cause surface defects, and artifacts may also arise during manufacturing [6]. Varying temperature and irradiance also affect certain parameters of photovoltaic modules and make these parameters difficult to estimate, even though the overall throughput of a photovoltaic system hinges on their accurate assessment [7]. The authors of [7] propose the L-SHADE and L-SHADED techniques, the latter of which first performs a dimensionality reduction step; following this phase, a success-history-based adaptive differential evolution method with linear population size reduction (L-SHADE) is employed to determine the unknown parameters. PV modules such as the multi-crystalline KC200GT and the mono-crystalline SM55 are evaluated under varying temperature and irradiance conditions to identify unknown parameters such as the photo-generated current, series resistance, and diode reverse saturation current. Consequently, inspecting and assessing the condition of PV modules becomes critical in solar energy systems.
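The parameters listed above belong to an equivalent-circuit model of the PV module. As an illustration only, and assuming the widely used single-diode model (the exact model used in [7] is not restated in this section), the terminal current $I$ at voltage $V$ satisfies the implicit equation below, where $I_{ph}$ is the photo-generated current, $I_0$ the diode reverse saturation current, $R_s$ the series resistance, $R_{sh}$ the shunt resistance, $n$ the diode ideality factor, $N_s$ the number of series-connected cells, and $V_t$ the thermal voltage:

```latex
I = I_{ph} - I_0\left[\exp\!\left(\frac{V + I R_s}{n N_s V_t}\right) - 1\right] - \frac{V + I R_s}{R_{sh}},
\qquad V_t = \frac{k T}{q}
```

Because $I$ appears on both sides and $V_t$ depends on temperature, the parameters cannot be read off directly; metaheuristics such as L-SHADE instead search for the parameter set that minimizes the mismatch between this model and measured I–V data at the given temperature and irradiance.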
Photovoltaic modules are designed to endure approximately 25 years of continuous exposure to challenging environmental conditions [8]. Manual visual inspection of solar PV modules is laborious, and even a specialist's scrutiny may not be effective, as numerous defects are not visible to the naked eye. Infrared (IR) cameras can instead be used to inspect PV modules: faulty solar cells that fail to convert solar energy into electrical output emit heat, which can be detected by infrared imaging [9]. However, certain micro-defects may not be captured by infrared imaging, and the limited resolution of infrared cameras further restricts this approach to inspection [10].
Electroluminescence (EL) imaging is considered preferable to infrared (IR) imaging because it provides high-quality images capable of capturing micro-defects in solar cells. EL images are obtained by capturing the emission at a wavelength of 1150 nm using a silicon charge-coupled device (CCD) sensor [11]. In these grayscale EL images, micro-cracks appear as dark-gray areas [12]. However, visual inspection of EL images is time-consuming and tedious, even for an expert. With solar energy playing a major role in the growth of renewable energy systems, manual visual assessment is impractical for solar PV energy systems, where thousands of PV modules must be inspected throughout their lifespan. Considering these factors, this manuscript proposes an autonomous assessment method that uses EL images to detect various defects and features in solar PV modules.
In this manuscript, we introduce a novel deep learning-based framework for the automatic segmentation of 24 defects and features in EL images of solar photovoltaic (PV) modules. Our proposed framework aims to achieve efficient EL image segmentation by utilizing a minimal number of model parameters, resulting in a lightweight system.
Figure 1 depicts different EL images of solar PV modules, showcasing the presence of multiple co-occurring defects and features. The contributions of this research work are as follows:
Development of a deep learning-based framework for the automatic segmentation of 24 defects and features in EL images of solar PV modules.
Emphasis on a lightweight segmentation system by optimizing the number of model parameters.
A comparison of class-weighted loss functions to address class imbalance: because multiple defects and features coexist in an image and various micro-defects occupy only a small number of pixels, the classes are highly imbalanced. Three different loss functions are therefore evaluated with custom class weights, and this comparison helps determine the most effective loss function for segmenting the 24 unique classes present in EL imagery.
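The exact loss definitions used for SEiPV-Net are not reproduced in this section. The following is a minimal NumPy sketch of one common formulation of each of the three class-weighted losses named in the conclusions (weighted cross-entropy, weighted squared Dice, weighted Tanimoto), assuming softmax probabilities `probs` of shape (N, C) for N pixels and C classes, one-hot targets `onehot` of the same shape, and a per-class weight vector `w`. The function names and the smoothing constant `eps` are hypothetical.

```python
import numpy as np

def weighted_cross_entropy(probs, onehot, w, eps=1e-7):
    """Mean over pixels of -w[c] * log(p[c]) for each pixel's true class c."""
    p = np.clip(probs, eps, 1.0)                            # avoid log(0)
    per_pixel = -np.sum(onehot * w * np.log(p), axis=1)    # w broadcasts over classes
    return float(per_pixel.mean())

def weighted_squared_dice_loss(probs, onehot, w, eps=1e-7):
    """1 - weighted Dice coefficient, with squared terms in the denominator."""
    inter = np.sum(probs * onehot, axis=0)                  # per-class soft overlap
    denom = np.sum(probs ** 2, axis=0) + np.sum(onehot ** 2, axis=0)
    dice = (2.0 * np.sum(w * inter) + eps) / (np.sum(w * denom) + eps)
    return float(1.0 - dice)

def weighted_tanimoto_loss(probs, onehot, w, eps=1e-7):
    """1 - weighted Tanimoto coefficient (a Jaccard-like overlap on soft predictions)."""
    inter = np.sum(probs * onehot, axis=0)
    denom = np.sum(probs ** 2, axis=0) + np.sum(onehot ** 2, axis=0) - inter
    tanimoto = (np.sum(w * inter) + eps) / (np.sum(w * denom) + eps)
    return float(1.0 - tanimoto)

# Toy example: 4 pixels, 3 classes, with the third class up-weighted.
onehot = np.eye(3)[[0, 1, 2, 2]]
w = np.array([0.25, 0.25, 0.45])
uniform = np.full((4, 3), 1.0 / 3.0)
print(weighted_squared_dice_loss(uniform, onehot, w))
```

A perfect prediction drives all three losses to zero, while up-weighting a rare class makes its errors count more in the total.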
These contributions emphasize the novelty and importance of the proposed SEiPV-Net (Segmentation Network for Electroluminescence Images of Solar Photovoltaic Modules) framework in achieving accurate and efficient automatic segmentation of EL images of solar PV modules while taking into account the challenges posed by imbalanced classes and the presence of multiple defects and features in the images.
2. Related Works
Electroluminescence (EL) imaging, first explored in 2005, has been widely used to capture the degradation patterns of silicon solar cells [13]. In a study by Fuyuki et al. [14], EL imaging was deployed to determine the size of crystalline silicon solar cells and to identify cracks and defects, which appeared as darker regions; the study successfully identified defective areas in the crystalline silicon solar cells. In another work, Deitsch et al. [15] detected various defects in mono-crystalline and poly-crystalline PV modules by manually extracting features and employing a support vector machine (SVM) classifier. To further improve defect classification, convolutional neural networks (CNNs) were trained on a dataset of 1968 solar cell images extracted from EL images of PV modules. The CNNs achieved higher classification accuracy than the SVM classifier, although they were more demanding in terms of hardware resources. In a different domain, Shujaat et al. [16] exploited CNNs to identify the promoters responsible for initiating gene transcription.
Similarly, in a comparative study, Karimi et al. [17] evaluated the performance of CNNs against machine learning-based SVM and random forest classifiers. The objective of the study was to classify solar cell images acquired from EL imaging of PV modules into three predefined categories. The framework also incorporated data augmentation to increase the overall size of the dataset, which consisted of 5400 solar cell images.
In a study by Tsai et al. [18], a Fourier image reconstruction approach was employed to identify defective solar cells in EL images of poly-crystalline PV modules, providing a reliable means of ascertaining the presence of defects. In a separate investigation, Anwar et al. [19] combined image segmentation techniques with anisotropic diffusion filtering to detect micro-cracks in solar cells; using a dataset of 600 EL images, they demonstrated that the approach accurately discerns micro-cracks.
Deep learning has emerged as a potent method for autonomous decision making in several domains, enabling accurate and fast identification of objects and regions of interest [20,21]. In biomedical image segmentation, the U-Net architecture proposed in [22] has become widely adopted owing to its effectiveness. It has an encoder–decoder structure that forms a U-shaped network with a contracting path and an expanding path. Several U-Net variants have been developed for specialized segmentation tasks in biomedical imaging [23,24], including Attention U-Net [25], Dilated Inception U-Net [26], UNet++ [27], SegR-Net [28], RAAGR2-Net [21], and R2U-Net [29]. These variants aim to improve U-Net's performance in particular scenarios. DeepLabv3+ is another well-known semantic segmentation framework, which extends the DeepLabv3 paradigm [30] with an encoder–decoder structure, using Xception and ResNet-101 as network backbones [31]. This architecture was designed specifically for semantic segmentation and has performed well. In biomedical imaging, dual-encoder and dual-decoder deep learning (DL) frameworks have been employed to aid the diagnosis of colorectal cancer through polyp segmentation in colonoscopy images [32]. In another work, Wooram et al. [33] adopted a convolutional neural network with atrous spatial pyramid pooling and separable convolutions, integrated with a decoder module, to perform real-time segmentation of external cracks in structures.
The literature includes a variety of deep learning approaches that classify cells and modules into a wide range of defects while accounting for the varying severity of these defects in solar PV modules [34,35]. In one work, Rahman et al. [36] adapted the U-Net architecture to identify defects in poly-crystalline solar cells using EL images, demonstrating the modified U-Net's accuracy in segmenting and locating defects in solar cells. Deqiang et al. [37] proposed a U-Net-based framework for single image super-resolution (SISR), whose sole purpose is to reconstruct finer details from coarse textures. The approach is termed anti-illumination, as it effectively suppresses noise in images while preserving illuminance details.
In another paper, Chen et al. [38] reported the segmentation of fractures in EL images of multi-crystalline solar cells, presenting an approach for identifying and isolating fractures that helped characterize the faults. To extract global features, Ruixuan et al. [39] introduced a transformer-based architecture, LF-DET, for light-field spatial super-resolution. The model comprises two parts: the first introduces convolutional layers before the self-attention modules to obtain global features, and the second obtains feature representations from macro regions at various levels using angular modeling across multiple scales. Furthermore, Pratt et al. [40] used EL images of PV modules to detect defects in mono-crystalline and multi-crystalline silicon solar cells through a U-Net-based semantic segmentation framework, which yielded encouraging results in detecting and segmenting faults. For crack segmentation, Young et al. [41] proposed a convolutional neural network-based DL model and compared it with edge detection approaches such as the Canny and Sobel detectors. Another work by Young et al. [42] reported the adaptation of a faster region-based convolutional neural network (Faster R-CNN) to automatically detect five types of damage during visual inspection of structures.
We present a lightweight framework for the semantic segmentation of solar cells using EL imagery. Although deep learning techniques have yielded promising results in a variety of segmentation problems, current models frequently face constraints such as excessive computational complexity and poor performance on imbalanced classes. To address these constraints, our proposed system employs a lightweight architecture built specifically for solar cell segmentation, aiming for high segmentation precision and accuracy across 24 separate classes that cover a wide range of defects and features. The dataset for training and evaluation was acquired from [40,43] and provides a wide variety of samples for a thorough examination. The following sections describe the approach, experimental setup, and outcomes of our study.
3. Proposed Network Architecture
The proposed network architecture is presented in Figure 2, which depicts its two main components: the encoding and decoding modules. The contracting (encoder) part uses the DSF (Dense and Successive Features) block as its primary component, while the CCEAF (Contextual Characteristics Extraction and Attribute Fusion) block is the essential component of the expanding (decoder) part. In the skip connections, an HFPE (Hierarchical Feature Precision and Extraction) block is employed together with an attention gate [44] as an inbuilt module. The attributes of each block are detailed in the following subsections.
The encoder and decoder modules are built from DSF blocks and CCEAF blocks, respectively. The encoder consists of consecutive DSF blocks, each followed by a max-pooling layer. Each DSF block and its max-pooling layer extract features from the input EL image, and successive DSF/max-pooling pairs continue this process from high-level to low-level features. Skip connections fuse feature maps from the encoder into the decoder. Each skip connection from the encoder is processed by an HFPE block, which is followed by an attention gate block, and the resulting feature representations are passed to the decoder module. The decoder uses the CCEAF block as its primary unit and consists of four consecutive layers, each pairing an up-sampling block with a CCEAF block. In each decoder layer, the up-sampled features and the output of the HFPE block at the corresponding encoder level are processed by an attention gate to emphasize features that are crucial for segmentation precision and localization. The output of the attention gate is combined with the up-sampled feature maps, and this combined output serves as the input to the CCEAF block. Finally, a 3 × 3 convolutional layer produces the semantic segmentation of the 24 classes.
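The data flow described above can be traced as tensor shapes. This is a minimal sketch under stated assumptions: the input size, channel widths, the bottleneck block between encoder and decoder, taking skips before pooling, and combining the attention-gated skip with the up-sampled features by concatenation are all hypothetical choices for illustration, not values reported for SEiPV-Net.

```python
def trace_shapes(height, width, in_channels=1, widths=(32, 64, 128, 256),
                 bottleneck=512, num_classes=24):
    """Return (stage name, (H, W, C)) tuples for a four-level encoder-decoder pass.

    All channel widths and the bottleneck are hypothetical; "same" padding is assumed,
    so convolutions keep the spatial size and only pooling/up-sampling change it.
    """
    levels = len(widths)
    if height % 2 ** levels or width % 2 ** levels:
        raise ValueError("input size must be divisible by 2**levels for clean pooling")
    trace = [("input", (height, width, in_channels))]
    skips = []
    h, w = height, width
    for level, c in enumerate(widths, start=1):
        trace.append((f"DSF{level}", (h, w, c)))
        skips.append((h, w, c))              # skip (later refined by HFPE + attention gate)
        h, w = h // 2, w // 2                # 2 x 2 max pooling halves H and W
        trace.append((f"pool{level}", (h, w, c)))
    trace.append(("bottleneck", (h, w, bottleneck)))   # assumed bridge block
    c = bottleneck
    for level, (sh, sw, sc) in enumerate(reversed(skips), start=1):
        h, w = h * 2, w * 2                  # up-sampling doubles H and W
        trace.append((f"up{level}", (h, w, c)))
        # attention-gated skip features (sc channels) concatenated with up-sampled ones
        trace.append((f"concat{level}", (h, w, c + sc)))
        c = sc                               # CCEAF assumed to reduce to the skip width
        trace.append((f"CCEAF{level}", (h, w, c)))
    trace.append(("final 3x3 conv", (h, w, num_classes)))
    return trace

for name, shape in trace_shapes(256, 256):
    print(f"{name:16s} {shape}")
```

With a hypothetical 256 × 256 input, the deepest features sit at 16 × 16, and the final layer returns one score map per class at full resolution.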
3.1. Dense and Successive Features (DSF) Block
The DSF block, depicted in Figure 3, is the main component of the encoder and is responsible for obtaining rich feature attributes from EL images. It consists of four convolutional (conv) layers connected in a dense and successive pattern. The input is fed to the first conv layer, and the second conv layer is connected successively to the first. After the resulting feature maps are concatenated with the block input, two further conv layers are applied consecutively to yield the final output of the DSF block. The DSF block is designed to acquire a set of divergent rather than highly correlated features; the latter are typical of a conventional U-Net [22] model. Because earlier feature maps are reused in successive convolutional layers through concatenation, the encoder is less prone to vanishing gradients.
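A minimal NumPy sketch of the dense-and-successive wiring, under loudly stated assumptions: the kernel sizes of the DSF layers are not given here, so every layer uses a hypothetical 3 × 3 kernel; ReLU activations, "same" padding, and concatenating the block input with the outputs of the first two convs are also assumptions. `conv2d_same`, `dsf_block`, and `init_params` are illustrative helpers, not the authors' code.

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """'Same'-padded 2D convolution (cross-correlation, as in DL frameworks)
    of an (H, W, Cin) array with a (k, k, Cin, Cout) kernel; k is assumed odd."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros((h, w, kernel.shape[3])) + bias
    for i in range(k):
        for j in range(k):
            # each kernel tap mixes the channels of a shifted window
            out += xp[i:i + h, j:j + w, :] @ kernel[i, j]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def dsf_block(x, params):
    """Hypothetical DSF-style block: two successive convs, dense concatenation
    with the block input, then two more consecutive convs."""
    f1 = relu(conv2d_same(x, *params[0]))
    f2 = relu(conv2d_same(f1, *params[1]))
    dense = np.concatenate([x, f1, f2], axis=-1)   # reuse earlier feature maps
    f3 = relu(conv2d_same(dense, *params[2]))
    return relu(conv2d_same(f3, *params[3]))

def init_params(in_ch, filters, k=3, seed=0):
    """Random (kernel, bias) pairs; the third conv sees in_ch + 2 * filters channels."""
    rng = np.random.default_rng(seed)
    in_channels = [in_ch, filters, in_ch + 2 * filters, filters]
    return [(rng.normal(0.0, 0.1, (k, k, c, filters)), np.zeros(filters))
            for c in in_channels]

x = np.random.default_rng(1).random((8, 8, 1))     # toy single-channel "EL patch"
out = dsf_block(x, init_params(in_ch=1, filters=4))
print(out.shape)
```

The concatenation is what lets later layers see the raw input and both intermediate feature maps at once, which also gives gradients a short path back to earlier layers.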
3.2. Hierarchical Feature Precision and Extraction (HFPE) Block
The HFPE block is illustrated in Figure 4. In the encoder, the DSF blocks successively compute high- to low-level feature attributes, and the HFPE block refines these features on each skip connection. The HFPE block comprises four convolutional layers connected serially, each operating on the output of the previous layer, and each layer obtains feature representations according to its own kernel size. The first convolutional layer takes the input and learns abstract feature attributes, and each subsequent conv layer builds on the output of its predecessor, so that the block output is produced by a consecutive combination of conv layers. Feature maps are thus extracted hierarchically: shallow features are extracted first, and progressively deeper features are built on top of them. In this way, the HFPE block contributes to generating more precise feature representations.
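One way to see why serially stacked convolutions move from shallow to deeper features is to track how far each layer "sees" into the input. The sketch below computes the receptive field after each layer of a stride-1, undilated stack. Because the HFPE kernel sizes are not stated in this section, the two four-layer configurations are hypothetical.

```python
from itertools import accumulate

def receptive_fields(kernel_sizes):
    """Receptive field (in input pixels) after each layer of a serial stack of
    stride-1, undilated convolutions: every k x k layer widens it by k - 1."""
    return list(accumulate(kernel_sizes, lambda rf, k: rf + k - 1, initial=1))[1:]

# Two hypothetical four-layer configurations.
for kernels in ([1, 3, 1, 3], [3, 3, 3, 3]):
    print(kernels, "->", receptive_fields(kernels))
```

A 1 × 1 layer leaves the receptive field unchanged (it only mixes channels), whereas each 3 × 3 layer widens it by two pixels, so later layers aggregate progressively larger neighborhoods.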
3.3. Attention Gate Block
Figure 5 illustrates the attention gate block. In image segmentation, and particularly in this case with 24 classes, accurately segmenting micro-defects in EL images is a complex task. Because various micro-defects occupy narrow pixel regions in an image, the classes are imbalanced. To effectively localize micro-defects such as cracks, gridlines, and inactive areas, the network needs to focus only on particular regions of the feature maps. The attention gate (AG) block is employed after the HFPE block to process the feature attributes and highlight only the relevant regions of interest (ROIs) before the features are propagated to the decoder module. In this manner, salient features are retained while irrelevant features are suppressed, leading to effective and precise segmentation and localization. In [45,46,47], attention mechanisms were used to improve the segmentation performance of remote sensing and biomedical images. Similarly, Dong et al. [48] and Rahmat et al. [49] reported the utility of attention-based neural networks for crack segmentation, employing an encoder–decoder-based network and a generative adversarial network.
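A minimal NumPy sketch of an additive attention gate of the kind commonly used in attention U-Nets; the exact gate in [44] and in SEiPV-Net may differ. It assumes the skip features `x` (for example, an HFPE output) and the decoder gating features `g` already share the same spatial size, and it implements the 1 × 1 convolutions as matrix products; all weight names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, b, psi, b_psi):
    """Additive attention gate.

    x: skip-connection features (H, W, Cx)
    g: gating features from the decoder (H, W, Cg), assumed resized to H x W
    w_x: (Cx, Ci), w_g: (Cg, Ci), b: (Ci,)   1x1 convolutions as matrix products
    psi: (Ci, 1), b_psi: scalar              collapses to a single attention map
    Returns the gated features x * alpha and the attention map alpha (H, W, 1).
    """
    q = np.maximum(x @ w_x + g @ w_g + b, 0.0)   # ReLU of the summed projections
    alpha = sigmoid(q @ psi + b_psi)              # per-pixel coefficients in [0, 1]
    return x * alpha, alpha                       # alpha broadcasts over Cx channels

rng = np.random.default_rng(0)
x = rng.random((4, 4, 8))
g = rng.random((4, 4, 16))
gated, alpha = attention_gate(x, g, rng.normal(size=(8, 4)), rng.normal(size=(16, 4)),
                              np.zeros(4), rng.normal(size=(4, 1)), 0.0)
print(gated.shape, alpha.shape)
```

Pixels where the skip and gating signals jointly respond strongly receive coefficients near 1 and pass through, while the rest are scaled toward 0, which is how irrelevant regions are suppressed before the features reach the decoder.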
3.4. Contextual Characteristics Extraction and Attribute Fusion Block (CCEAF)
The CCEAF block is depicted in Figure 6. It is used to generate segmentation masks for EL images containing various micro-defects such as cracks, inactive areas, and gridlines. It comprises dilated (atrous) convolutional layers along with average pooling and global average pooling layers: three dilated conv layers with dilation rates of 1, 3, and 6, and two average pooling layers with pool sizes of 2 and 4, alongside a global average pooling layer. Input features are fed in parallel to the branches of the CCEAF block, and the dilated conv layers with dilation rates of 3 and 1 are applied successively as one combined branch. The outputs of all branches are concatenated: the two average pooling layers (pool sizes 2 and 4), the global average pooling layer, the dilated conv layer with a dilation rate of 6, and the two combined dilated conv layers. The CCEAF block captures the contextual representations of the feature maps. Because the 24 classes include micro-defects such as cracks and gridlines that span only a narrow range of image pixels, understanding both the global and the local context is necessary. Dilated convolutions expand the receptive field without adding parameters, enabling the network to gather context information over a wider region. The combined effect of average pooling, global pooling, and dilated convolutions improves the overall segmentation performance.
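The receptive-field benefit of the dilated branches can be checked with a little arithmetic. A k-tap kernel with dilation rate d still has k weights per axis, but they are spaced d pixels apart, so it spans k + (k − 1)(d − 1) pixels. The sketch below assumes 3 × 3 kernels, which this section does not state; the helper names are hypothetical. For comparison, the average pooling branches summarize 2 × 2 and 4 × 4 windows, and global average pooling summarizes the whole feature map.

```python
def tap_offsets(k, d):
    """Pixel offsets (along one axis) sampled by a k-tap kernel with dilation d; k odd."""
    half = k // 2
    return [d * i for i in range(-half, half + 1)]

def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def serial_extent(layers):
    """Receptive field of stride-1 conv layers applied one after another, given (k, d) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

K = 3  # assumed kernel size for every dilated conv in this sketch
print(tap_offsets(K, 6))                 # still 3 taps per axis, spread 6 pixels apart
print(effective_kernel(K, 6))            # extent of the rate-6 branch
print(serial_extent([(K, 3), (K, 1)]))   # extent of the combined rate-3 -> rate-1 branch
```

Under this assumption, the rate-6 branch covers a 13 × 13 neighborhood and the combined rate-3/rate-1 branch a 9 × 9 neighborhood, each with the same number of weights as an ordinary 3 × 3 convolution.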
4. Dataset and Materials
The dataset for this research work was obtained from [43] and comprises 24 different classes, subdivided into 12 generalized intrinsic features of solar PV modules and 12 defects. The dataset contains images displaying these 24 classes (defects and inbuilt features), which were specifically annotated during the dataset preparation phase. These classes correspond to the intrinsic features of a module and to various defects that can occur during the lifespan of a solar PV module. The size of an EL image of a solar cell is listed in Table 1. A total of 593 EL images were used, with an almost equal number of images for multi-crystalline and mono-crystalline solar wafers [43]. The dataset consists of 1912 training images (896 mono-crystalline and 1016 multi-crystalline solar cell images) and 54 validation images. For testing the model, a set of 50 images was used, with equal numbers of multi- and mono-crystalline solar wafers. The details of the dataset, including the total number of images, image size, total number of classes, and the training split, are provided in Table 1. Defects and features such as cracks, inactive areas, and gridlines occur frequently in the test images, which allows the model's robustness against various micro-defects to be evaluated.
Class Weights
For effective model training, a crucial choice is the weight assigned to each class in the dataset. A class weight is the numerical value given to a class during training, and it determines how much emphasis that class receives. Two training strategies were used. In the first, all classes were assigned equal weights: a weight of 0.25, chosen experimentally, was assigned to all 24 classes. In the second, custom class weights were selected for classes such as cracks, inactive areas, gridlines, and ribbons, so that certain defects were given higher priority than others. For example, cracks were assigned a weight of 0.45 because these micro-defects occupy very narrow pixel regions in EL images. Under the second strategy, the model was therefore trained to focus on particular micro-defects according to their weight assignments.
Table 2 lists the class-weight assignments for the two strategies, i.e., equal class weights and custom class weights.
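A minimal sketch of how the two strategies could turn a label mask into per-pixel weights. Only the equal weight of 0.25 and the crack weight of 0.45 come from the text; the class index `CRACK = 3` is hypothetical, the remaining custom values from Table 2 are deliberately not filled in, and the normalized weighted average is one common choice rather than necessarily the authors'.

```python
import numpy as np

NUM_CLASSES = 24
CRACK = 3  # hypothetical class index for "crack"; the real label map is dataset-specific

def class_weight_vectors():
    """Return (equal, custom) per-class weight vectors."""
    equal = np.full(NUM_CLASSES, 0.25)   # strategy 1: 0.25 for every class
    custom = equal.copy()
    custom[CRACK] = 0.45                 # strategy 2: cracks up-weighted
    # Other custom values (inactive areas, gridlines, ribbons, ...) are listed in
    # Table 2 and are not reproduced here.
    return equal, custom

def pixel_weight_map(mask, class_weights):
    """Look up each pixel's class weight; mask holds integer class ids."""
    return class_weights[mask]

def weighted_mean_loss(per_pixel_loss, weight_map):
    """Normalized weighted average of per-pixel losses."""
    return float(np.sum(weight_map * per_pixel_loss) / np.sum(weight_map))

equal, custom = class_weight_vectors()
mask = np.array([[0, CRACK], [CRACK, 1]])       # toy 2 x 2 label mask
losses = np.array([[0.1, 0.9], [0.8, 0.2]])     # toy per-pixel losses
print(weighted_mean_loss(losses, pixel_weight_map(mask, equal)))
print(weighted_mean_loss(losses, pixel_weight_map(mask, custom)))
```

With equal weights the result is simply the mean pixel loss; with the custom weights the poorly predicted crack pixels pull the total upward, so training is pushed harder toward fixing them.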
7. Conclusions
We present SEiPV-Net, a novel encoder–decoder-based network for segmenting EL images of solar PV modules. Our method handles the segmentation of 24 separate classes, covering both intrinsic characteristics and defects. Dense and Successive Features (DSF), Hierarchical Feature Precision and Extraction (HFPE), Contextual Characteristics Extraction and Attribute Fusion (CCEAF), and attention gate blocks are the four main components of SEiPV-Net. We use two distinct model training procedures to evaluate the performance of SEiPV-Net: equal class-weight and custom class-weight assignments. Qualitative and quantitative comparative analyses are carried out between SEiPV-Net, DeepLabv3+, U-Net, and PSP-Net. We investigate the use of three alternative loss functions during training and testing: the weighted cross-entropy (WCE) loss, weighted squared Dice loss (WSDL), and weighted Tanimoto loss (WTL). An ablation study is also carried out to better understand the role of each individual block in the encoder and decoder modules. The model parameters are also analyzed and compared with those of the aforementioned state-of-the-art methods, highlighting SEiPV-Net's lightweight nature. This thorough study sheds light on the efficacy of our proposed framework and its potential for precise and efficient segmentation of EL imagery of solar PV modules.