1. Introduction
Quality inspection applications in industry are becoming increasingly important, driven by the move towards zero-defect manufacturing, with unitary non-destructive inspection and traceability of produced parts. This is one of the applications where image analysis with deep learning (DL) methods is showing its full potential. DL has been proven to greatly improve on solutions based on traditional vision techniques in terms of precision, robustness, and flexibility. These improvements allow models to be adapted to incorporate new features of interest, to transfer learned models between different domains, and to speed up the design and development of models for new tasks.
However, quality inspection environments in industry have peculiarities that must be taken into account when applying DL-based solutions. First, it is not easy to generate large enough datasets with representative images of the different characteristics of interest. Manual labeling of each of the examples must be carried out, which is usually an arduous task that consumes large amounts of time and resources. Furthermore, the scarcity of available examples, together with the fact that successively manufactured industrial products look very similar to each other, can make DL models prone to overfitting. Finally, in new applications or industrial processes, there are no defective samples from the beginning, so it would be necessary to wait a long time before DL models capable of identifying the faults that may appear could be trained.
Thus, this work presents a methodology to deal with these peculiarities. This methodology should work as a guide towards robust classification and segmentation models, giving rise to fault detection models that are able to detect anomalous feature patterns from the start-up of a new line. The methodology will make use of an anomaly detection model which allows anomalous patterns to be detected in the produced parts, and in addition, the detected anomalies will serve as automatic annotations making the labeling of the images much faster. The work will be tailored to the solar cell manufacturing industry; however, it could be extrapolated to different domains.
In the last decade, about 2.6 trillion dollars have been invested in renewable energies, half of it in solar energy, with the objective of developing efficient alternatives to traditional energy sources, such as oil or gas [
1]. The development of the technology has reduced solar electricity generation cost per kilowatt-hour by 81%. This cost reduction has turned solar energy into an attractive source of energy for electricity production, increasing the installation of Photovoltaic (PV) cells by 36.8% between 2010 and 2018 [
1]. This investment trend is expected to continue in the coming years [
2].
During the assembly of the panels, different events, such as excessive mechanical stress on the panel or a soldering failure, can lead to defects that can harm the long-term energy generation capacity of the module. A defect that covers 8% of the total cell area may not have a significant impact on the performance if the cell is isolated. However, the same area can have a significant impact when cells are connected and soldered to each other in cell arrays [
3], which is the most common layout. The defective area may spread with time, breaking the cell and considerably reducing the energy production capacity of the module. As cell production increases, quality inspection becomes critical to avoid defective cells being assembled into the final panel and, thus, to ensure high efficiency and reliable performance of the produced panels.
Today, different imaging techniques are used during PV module inspection to obtain images where defects appear highlighted, for example, Electroluminescence (EL) [
4,
5], Photoluminescence (PL) [
6,
7], or Thermography [
8,
9]. During the assembly stage, EL is one of the predominant techniques. In EL, the cells emit light under electrical current by the phenomenon of Electroluminescence. This light is then captured in high-resolution images where defective areas, with less current flow, appear darker than the remaining parts of the cell [
10]. The most common defects that may arise are cracks, breaks, and finger interruptions [
11].
Figure 1 shows the appearance of these kinds of defects in EL images. This technique requires a high degree of control over environmental conditions since the images have to be taken in total darkness. This requirement makes the application of EL unfeasible for outdoor panel inspection but suitable for inspection in the manufacturing phase with controlled environmental conditions. EL provides high-resolution images where defects are highlighted.
Despite these enhanced images, the defect detection process has to be carried out by checking each cell individually. This is currently done to a great extent by human operators, who are prone to error, as it is hard for humans to meet industrial production cycle times. For example, a panel composed of 60 cells must be examined in under 30 s, which leaves half a second per cell. In addition, human subjectivity is inevitable when deciding whether a cell is defective, affecting the effectiveness of the quality inspection. In recent years, several proposals have been made towards the automation of quality inspection. By automating the inspection, all the cells can be checked faster and always under the same objective criteria, overcoming the previous limitations.
The proposed approaches for automatic PV module inspection can be grouped into three categories according to the required level of human intervention: (1) traditional image processing-based approaches, where the procedures used to highlight and binarize defective areas in the images must be manually defined, (2) shallow learning approaches, where machine learning techniques are used for defect identification based on meaningful features that must be obtained through manual feature engineering, and (3) deep learning techniques, where the features are automatically obtained from the data. Note that higher levels of human intervention in the implementation of the image processing algorithm imply longer development times when adapting it to new requirements.
The remainder of the paper is organized as follows:
Section 2 presents some background and related works in the field of photovoltaic cell inspection. In
Section 3, the unsupervised and supervised training are explained.
Section 4 details the dataset, metrics, and hardware and software specifications used in the experiments.
Section 5 describes the performed experiments and their results. Finally,
Section 6 provides some conclusions about the work.
2. Related Works
In this section, some of the proposed approaches for the automatic detection of defects in images of PV modules are summarized.
The traditional image processing methods are mainly based on manual feature engineering. In this process, the discriminating characteristics of the defects are used to process and binarize the images to highlight the defects. For example, using anisotropic diffusion filters [
12,
13] or modified steerable filters [
14,
15], the background in the modules is smoothed such that only the defects remain. Conversely, anisotropic diffusion is applied in Tsai et al. [
16] and frequency-domain filters in Tsai et al. [
17] to remove the defects from the cells; the difference between the filtered and the original image is then used to highlight the defects.
In other works, the manual feature extraction is combined with shallow learning methods: In Tsai et al. [
18] and Zhang et al. [
19], they extract Independent Component Analysis basis (ICA) from defect-free solar cells samples to construct a demixing matrix. At the inspection stage, the images are reconstructed using the learned basis images and the reconstruction error is used for detecting the presence of defects. In Rodriguez et al. [
20], 20 different LoG-Gabor Filters are used to extract 81 features for each pixel in the images. Then, Principal Component Analysis (PCA) is used to refine these features, and finally, a Random Forest model classifies each pixel as non-defective or defective. In Tsai et al. [
21], they extract characteristics of local grain patterns and cluster them using Fuzzy C-means. At testing, the distance of the grains in a sample to the cluster centers is used to decide whether the grain is defective or not. Similarly, in Su et al. [
22], they use a modified Center-Symmetric Local Binary Patterns (CS-LBP) feature descriptor to extract features from the defective areas in the cells, which are then used to train the K-means algorithm. The cluster centroids from training samples are employed to generate global feature vectors to train a classification algorithm, such as a Support Vector Machine (SVM).
Overall, both traditional image processing methods and traditional methods in combination with shallow machine learning techniques can achieve high defect detection rates. However, manual feature engineering is usually time-consuming and requires high domain knowledge. In addition, inspection systems based on these approaches are commonly very case-specific solutions that lack adaptability. A change in the data can mean a substantial change in the inspection system, which would require additional time-consuming manual feature engineering labor to adapt it.
In more recent works, DL methods have been widely applied in the solar cell inspection field. These methods can directly extract meaningful features from the raw data without any feature engineering work, thus making these methods more flexible to changes. References [
4,
23,
24,
25,
26,
27] are some examples of how Convolutional Neural Networks (CNN) have been employed for classifying solar cells as defective or defect-free during quality inspection. In addition to classification, in some cases, the location of the defects in the cells is also provided. For example, in our previous works, we used the sliding window approach with a CNN designed for classification to process cell images by patches and accumulate the results in a heatmap-like image, highlighting areas with a high probability of being defective [
28]. Or we explicitly train a Fully Convolutional Network to perform pixel-wise classification [
29]. Additionally, other researchers have also proposed other types of defect location, using bounding boxes [
30,
31], or by visualizing the activation maps from the last network layer [
25,
32].
Nevertheless, to obtain high detection rates, the networks are trained using supervised learning, which requires a considerable amount of annotated defective data; the quality of the results (i.e., the detection rate) generally grows with the amount of annotated data employed. This represents a challenge in many industrial applications, as sufficient defective samples may be difficult to obtain in an industrial setting. The creation of accurate inspection models may thus be difficult, as a new manufacturing line will need time to generate a representative dataset with enough examples. There may also be certain very rare defect types that are difficult to gather for the dataset.
To tackle the problem of insufficient defective data, several researchers have proposed different solutions. One of the approaches is transfer learning [
26,
33,
34], where the neural network is initialized using weights from a previously trained network. Then, the model is refined using a few case specific samples. Transfer learning is limited by the similarity between the source and target domains. Currently, the available pre-trained weights have been mainly trained on natural images rather than on industrial datasets, which can limit their use in industrial cases.
Another approach consists of generating synthetic data to compensate for imbalanced datasets by employing variants of the Generative Adversarial Network (GAN) [
35]. These architectures have shown remarkable capabilities in learning latent representations of real data to generate realistic synthetic samples. In this way, synthetic defective samples are generated and employed along with real samples to train a conventional CNN. This approach alleviates the risk of overfitting and improves the generalization capability of the network. This strategy has been successfully employed to generate realistic human faces [
36], synthetic machinery faulty signals [
37], and also defective solar cell samples [
11,
38]. Nonetheless, both Transfer Learning and GANs still require defective data.
In other domains, researchers have used an anomaly detection approach to avoid the need for defective data. The objective of this approach is to train a network to learn the probabilistic distribution of normal data. The learned features can then be used to discriminate samples that lie far from what is considered normal and, thus, to detect defective samples. In anomaly detection, only defect-free samples are used during training, and there is no need for annotations. These features make anomaly detection an interesting approach for industrial applications. Anomaly detection has been applied in different industrial cases, e.g., Haselmann et al. [
39] and Staar et al. [
40] and also in the medical domain, e.g., Schlegl et al. [
41] and Chen et al. [
42], where it is also difficult to obtain anomalous data for training. However, these approaches usually result in less accurate models than those obtained with supervised training.
In the case of solar cell inspection, anomaly detection approaches have been proposed in Qian et al. [
34,
43], where they train a Stacked Denoising AutoEncoder (SDAE) to extract features from defect-free samples using the sliding window method. In Qian et al. [
34], they extend the network architecture with a pre-trained VGG16 network that works as an additional feature extractor. This extra branch extracts additional features, which are fused with the already extracted feature maps, enriching the obtained information. At testing time, the same procedure is applied, and the extracted features are processed using matrix decomposition to localize the defects in the cells. After that, some morphological processing is applied to improve the results. However, these works only target the detection of cracks. Furthermore, the images are processed using the sliding window method, a procedure that slows down the inspection process and limits its deployment in a real production environment.
An inspection system should be able to detect the maximum number of defects, be fast to meet the established inspection time, and require the minimum human intervention in order to save resources and time. The main contribution of this work is a methodology that tries to meet these requirements by combining the accuracy of supervised models with the benefits of an anomaly detection approach, i.e., that it only requires defect-free samples for training and avoids the need for data labeling. The approach has been tailored for the detection and segmentation of different types of defects, such as cracks, microcracks, or finger interruptions, in EL images of solar cells; however, it should also be applicable in other industrial inspection tasks. The methodology is illustrated in
Figure 2 and consists of two stages:
First, using an anomaly detection approach, defect-free samples can be employed to obtain an initial inspection model that from the very beginning of a new production line can detect and segment anomalies in EL images of cells. For this purpose, f-AnoGAN [
41], a GAN-based anomaly detection network that has been shown to work well with medical images, is adapted for inspection. The original architecture has been modified such that instead of using a sliding window method, the images can be processed as a whole, reducing the processing time drastically. In addition, a modified training scheme is proposed which improves the defect detection rates with respect to the results with the original training scheme.
Then, as defective cells arise, the anomaly detection model will separate them from the defect-free ones and it will generate pixel-level annotations without any human intervention. The experiments have shown that these segmentation results can be used as pixel-wise labels for the supervised training of a U-Net [
44]-based model that improves the defect detection rates of the anomaly detection model.
3. Methodology
This section details how the different networks used in the methodology are trained.
3.1. Unsupervised Model for Anomaly Detection
In this stage, the objective is to train an anomaly detection model that can detect and locate anomalous patterns within solar cell images. This is achieved by training the f-AnoGAN network to encode and reconstruct only defect-free samples, so that, when processing defective samples, it outputs a defect-free version of them. The differences between the original and the reconstructed defect-free version then highlight the anomalies in the cells.
f-AnoGAN is composed of three different sub-networks (a generator G, a discriminator D, and an encoder E) that are trained in two phases.
In the first training phase, the generator and discriminator are trained in an adversarial manner to learn a latent space of normal data variability using just normal data. In this work, defect-free samples are considered as normal data and defective samples as anomalous data.
In the second phase, the encoder is trained to map normal data from the image space to the learned latent space while the generator and discriminator are kept unaltered. Once these two phases have finished, the encoder can map test images from the image space to the latent space, and the generator can reconstruct the encoded version of the images from the latent space back to the image space. As the network is trained on normal data, it only learns to correctly encode and reconstruct normal features; thus, when processing anomalous samples, deviations of the reconstructed images from the originals can be used for anomaly detection and localization.
3.1.1. Phase 1-WGAN Training
The objective of the first training phase consists in learning the variability of normal data. For this purpose, a Wasserstein GAN (WGAN), composed of a generator and a discriminator, is optimized to learn the normal data probability distribution. The optimization is achieved using the gradient penalty-based loss shown in Equation (1), proposed by Gulrajani et al. [45], where the Wasserstein distance between the real normal data probability distribution $\mathbb{P}_r$ and the generator's synthesized data probability distribution $\mathbb{P}_g$ is minimized:

$$ L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim \mathbb{P}_r}\left[D(x)\right] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \quad (1) $$

where $\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}$, with $\epsilon \sim U(0, 1)$, and $\lambda$ is the penalty coefficient. During training, the generator is fed with a noise input vector $z$, sampled from a latent space $\mathcal{Z}$, and tries to learn the mapping from that latent space to the image space $\mathcal{X}$. The synthesized data $\tilde{x} = G(z)$ should follow the real data distribution $\mathbb{P}_r$ as closely as possible. Simultaneously, the discriminator is given the generated sample $\tilde{x}$ and the real sample $x$, and outputs a scalar measure of how close the two distributions are. The training and the components of this phase are illustrated in
Figure 3.
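As an illustration, the critic objective of Equation (1) can be sketched in a few lines of NumPy. The toy critic, its closed-form gradient, and the array shapes below are assumptions for demonstration only; a real implementation would use a convolutional critic and automatic differentiation.

```python
import numpy as np

def wgan_gp_loss(d, grad_d, x_real, x_fake, lam=10.0, rng=None):
    """Critic loss of Equation (1): Wasserstein estimate plus gradient penalty.

    `d` maps a batch of samples to scalar critic scores; `grad_d` returns the
    gradient of `d` with respect to its input (closed form here, autograd in
    a real implementation). Both are illustrative stand-ins.
    """
    rng = rng or np.random.default_rng(0)
    # Wasserstein term: E[D(x_fake)] - E[D(x_real)]
    w_term = d(x_fake).mean() - d(x_real).mean()
    # Interpolated samples: x_hat = eps * x + (1 - eps) * x_tilde
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    # Gradient penalty: lambda * E[(||grad D(x_hat)||_2 - 1)^2]
    grad_norm = np.linalg.norm(grad_d(x_hat), axis=1)
    gp = lam * ((grad_norm - 1.0) ** 2).mean()
    return w_term + gp

# Toy critic D(x) = sum(x^2) per sample, so grad D(x) = 2x (closed form).
d = lambda x: (x ** 2).sum(axis=1)
grad_d = lambda x: 2.0 * x

x_real = np.ones((4, 8))
x_fake = np.zeros((4, 8))
loss = wgan_gp_loss(d, grad_d, x_real, x_fake)
```

With `lam=0`, the function reduces to the plain Wasserstein estimate, which for this toy critic is exactly `d(x_fake).mean() - d(x_real).mean()`.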
After the first phase of training, (1) a latent space that represents the variability of the normal data, (2) a generator that can map samples from this latent space to image space, and (3) a discriminator that can detect samples that do not follow the normal data distribution are obtained.
However, at this phase, there is no network component that can perform the inverse mapping, i.e., from image space to latent space. The next phase will focus on learning this mapping.
3.1.2. Phase 2-Encoder Training
In the second training phase, illustrated in
Figure 4, the objective is to make the encoder learn to map a real image to the latent space such that the generator can map it back to the image space. During this phase, both the generator's and the discriminator's weights remain unaltered. This network configuration is denoted as $izi$ in Reference [41]. In this case, the encoder is optimized by minimizing the Mean Square Error (MSE) between the original image $x$ and the reconstructed one $G(E(x))$. Additionally, the reconstruction error of the $izi$ architecture loss is extended by including feature residuals from an intermediate layer of the discriminator, yielding the $izi_f$ architecture. By taking these residuals in the feature space into account, the reconstruction is improved [41]. The loss function of $izi_f$ is defined by Equation (2):

$$ L_{izi_f}(x) = \frac{1}{n} \lVert x - G(E(x)) \rVert^2 + \frac{k}{n_d} \lVert f(x) - f(G(E(x))) \rVert^2 \quad (2) $$

where $f(\cdot)$ corresponds to the discriminator's intermediate layer features, $n$ is the dimensionality of the image, $n_d$ is the dimensionality of the intermediate feature representation, and $k$ is a weighting factor.
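The loss in Equation (2) amounts to two mean-squared-error terms. A minimal NumPy sketch, with illustrative array shapes and stand-in feature vectors, might look as follows:

```python
import numpy as np

def izi_f_loss(x, x_rec, f_x, f_rec, k=1.0):
    """Equation (2): image MSE plus discriminator-feature MSE.

    x / x_rec are the original image and its reconstruction G(E(x));
    f_x / f_rec are the intermediate discriminator features f(.) of each.
    """
    n = x.size          # image dimensionality
    n_d = f_x.size      # feature-space dimensionality
    img_term = np.sum((x - x_rec) ** 2) / n
    feat_term = k * np.sum((f_x - f_rec) ** 2) / n_d
    return img_term + feat_term

x = np.ones((8, 8))
x_rec = np.zeros((8, 8))   # hypothetical (poor) reconstruction
f_x = np.ones(16)
f_rec = np.ones(16)        # identical features -> feature term vanishes
loss = izi_f_loss(x, x_rec, f_x, f_rec, k=0.1)
```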
3.1.3. Anomaly Detection
Once the training has finished, all the components are fixed and ready to be used for anomaly detection. At this point, the images are processed as in the encoder training: first, the encoder maps the images to the latent space, and then the generator maps them back to the image space. Finally, the deviation between the reconstructed and the original image, defined in Equation (3), is used as the anomaly score:

$$ A(x) = A_R(x) + k \cdot A_D(x) \quad (3) $$

where $A_R(x) = \frac{1}{n} \lVert x - G(E(x)) \rVert^2$, $A_D(x) = \frac{1}{n_d} \lVert f(x) - f(G(E(x))) \rVert^2$, and $k$ is the weighting factor from Equation (2).
Only defect-free cell samples have been used for training; therefore, the network will have only learned to reconstruct normal samples. In the case of defect-free samples, the network outputs an image similar to the input image; thus, there is not much deviation when subtracting one image from the other. Instead, when processing a defective cell, the output is a defect-free version of the input sample. As a consequence, the deviation between the original and reconstructed images can be used to detect anomalous parts. This behavior is shown in
Figure 5.
The absolute value of the pixel-wise difference between the original and the reconstructed image, $|x - G(E(x))|$, is used for pixel-wise anomaly detection. By applying a threshold $c$, as defined in Equation (4), to the residual image obtained from $|x - G(E(x))|$, the binary image $M$ is obtained:

$$ M_{i,j} = \begin{cases} 1, & |x - G(E(x))|_{i,j} > c \\ 0, & \text{otherwise} \end{cases} \quad (4) $$

This binary image can be considered a pixel-wise annotation of defective samples, as described in
Section 3.2.
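Equations (3) and (4) reduce to a few array operations. The sketch below uses a hypothetical 6 × 6 "cell" and a stand-in reconstruction to show the anomaly score and the binary mask; sizes and the threshold value are illustrative only.

```python
import numpy as np

def anomaly_score(x, x_rec, f_x, f_rec, k=1.0):
    """Equation (3): A(x) = A_R(x) + k * A_D(x)."""
    a_r = np.sum((x - x_rec) ** 2) / x.size
    a_d = np.sum((f_x - f_rec) ** 2) / f_x.size
    return a_r + k * a_d

def binary_mask(x, x_rec, c):
    """Equation (4): threshold the pixel-wise residuals |x - G(E(x))|."""
    return (np.abs(x - x_rec) > c).astype(np.uint8)

# Toy example: the "reconstruction" misses a bright 2x2 defect patch.
x = np.zeros((6, 6))
x[2:4, 2:4] = 1.0
x_rec = np.zeros((6, 6))       # stand-in defect-free reconstruction
mask = binary_mask(x, x_rec, c=0.5)
```

Only the four defect pixels survive the threshold, so `mask` directly marks the anomalous region.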
In this work, two modifications have been made to the original f-AnoGAN network to adapt it for anomaly detection in photovoltaic cell manufacturing.
With f-AnoGAN, the images are processed in patches of size 64 × 64 pixels, which requires multiple executions of the network and increases the time needed to process an entire cell. As a consequence, the network does not meet the industrial production cycle time (under half a second per cell). In order to reduce the inspection time, the dimensions of the encoder input and generator output layers were increased so that whole cell images are processed in a single pass, reducing the processing time drastically with respect to the original sliding window approach.
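The saving from single-pass processing can be quantified by counting the network executions a sliding window would need. The image resolution below is illustrative, not the dataset's actual size:

```python
def num_patches(img_h, img_w, patch=64, stride=64):
    """Network executions needed with a sliding window (no padding)."""
    return ((img_h - patch) // stride + 1) * ((img_w - patch) // stride + 1)

# For a hypothetical 256x256 cell image: 16 patch passes with non-overlapping
# 64x64 windows, versus a single whole-image pass after the adaptation.
patch_passes = num_patches(256, 256)
```

With overlapping windows (smaller stride) the gap widens further, which is why the single-pass modification matters for meeting the cycle time.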
In addition, the training scheme was also modified. In f-AnoGAN, the generator is frozen during the second training phase in
Section 3.1.2; thus, only the encoder weights are modified. This can limit the network's capability to reconstruct the input image. In order to maintain stable training without restricting the reconstruction capability, the generator is also trained at certain encoder training iterations with a lower learning rate, while the discriminator is kept unaltered. By training the generator, the reconstruction of defect-free samples improves; therefore, the deviation between the original and the reconstructed images of normal data is reduced. Consequently, both the anomaly score and the pixel differences will be lower for defect-free samples but higher for defective ones, improving the model's detection rate.
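The modified training scheme can be sketched as a schematic loop. The update interval and learning rates below are placeholders, not the values used in the experiments, and the logged tuples stand in for actual optimizer steps:

```python
def train_encoder_phase(n_iters, gen_update_every=5, enc_lr=1e-4, gen_lr=1e-5):
    """Schematic of the modified phase-2 scheme: the encoder is updated every
    iteration, while the generator is additionally refined only every
    `gen_update_every` iterations with a smaller learning rate.
    The discriminator stays frozen throughout."""
    log = []
    for it in range(n_iters):
        log.append(("encoder", enc_lr))        # update E on the izi_f loss
        if it % gen_update_every == 0:
            log.append(("generator", gen_lr))  # occasionally refine G
    return log

log = train_encoder_phase(10)
```

The point of the sketch is the schedule: most steps touch only the encoder, and the generator receives sparse, gentler updates so training remains stable while reconstruction quality improves.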
3.2. Supervised Model for Defect Segmentation
In anomaly detection, the model is taught to find everything that is not considered normal. In supervised training, the model is instead trained with labels to search for specific defective patterns in the data, which usually yields more precise defect detection models. Using the anomaly detection approach as an automatic labeling method, one may benefit from the precision of supervised learning models while avoiding the time-consuming, and not always trivial, pixel-level labeling task, thereby considerably reducing the effort dedicated to the setup of a new inspection system.
This way, in the first stage of the inspection system development, where lots of defect-free cell samples and few defective cell samples are available, an initial inspection model can be obtained using anomaly detection. Then, as defective cells arise, the trained anomaly model will process the samples and output pixel-wise annotations avoiding the time-consuming data annotation task. After some time, when there are enough annotated defective cell samples, a model will be trained in a supervised manner to search for specific features in the images as in our previous works [
28,
29], obtaining more accurate models.
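The auto-labeling step described above can be sketched as follows; `reconstruct` is a stand-in for the trained encoder-generator composition G(E(x)), and the threshold value is illustrative:

```python
import numpy as np

def build_labeled_dataset(images, reconstruct, c=0.5):
    """Use the anomaly model as an automatic annotator: pair each image with
    the thresholded residual of its reconstruction as a pixel-wise label."""
    dataset = []
    for x in images:
        x_rec = reconstruct(x)
        mask = (np.abs(x - x_rec) > c).astype(np.uint8)
        dataset.append((x, mask))
    return dataset

# Stand-in "model" that always returns a defect-free (all-zero) cell.
reconstruct = lambda x: np.zeros_like(x)
imgs = [np.zeros((4, 4)),   # defect-free sample -> empty mask
        np.eye(4)]          # sample with a "defect" along the diagonal
pairs = build_labeled_dataset(imgs, reconstruct)
```

The resulting `(image, mask)` pairs can then feed the supervised segmentation training directly, with no human annotation in the loop.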
For the supervised training, as in our previous work [
29], U-net [
44], an end-to-end trainable Fully Convolutional Network (FCN), was used. This network has been shown to work well on biomedical image segmentation with small amounts of data. The network follows an encoder-decoder structure where, after successive downsampling and then upsampling steps, features are extracted from the images to finally output a segmentation map of the same size as the input. Additionally, skip connections link blocks in the encoder and decoder, helping to recover fine-grained details lost during downsampling and improving the final results.
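A heavily reduced sketch of the U-Net idea (one downsampling level, one upsampling level, and a single skip connection) is shown below in PyTorch; the channel sizes and depth are illustrative and are not the configuration used in this work:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style sketch: encoder, bottleneck, decoder, and one skip
    connection; the real U-Net stacks several such levels."""

    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 8, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(16, 8, 2, stride=2)
        # After concatenating the skip connection: 8 (up) + 8 (enc) channels.
        self.dec = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                     # encoder features (skip source)
        m = self.mid(self.down(e))          # bottleneck at half resolution
        u = self.up(m)                      # upsample back to input size
        return self.dec(torch.cat([u, e], dim=1))  # skip connection + head

out = TinyUNet()(torch.zeros(1, 1, 32, 32))  # map of the same size as input
```

The concatenation in `forward` is the skip connection described above: high-resolution encoder features are merged back in during upsampling so fine details survive to the output segmentation map.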
6. Conclusions
In this work, an anomaly detection-based methodology has been proposed for the development of a quality inspection system for monocrystalline solar cells. With anomaly detection, only defect-free samples are required to obtain an inspection model that can detect and locate defects in the cells. This feature is key for the development of a PV module inspection system, as it permits companies to have an inspection model from the very beginning of a new production line setup, without waiting for defective data to appear. Furthermore, it avoids spending time on the annotation of samples, which saves considerable effort in data preparation when constructing an inspection system.
In order to apply anomaly detection to industrial inspection, a GAN originally proposed to detect and locate anomalies in the medical domain has been adapted. The adaptations are two-fold: First, the architecture has been modified so that the images can be processed in a single step instead of by patches. In this way, less time is required to process a cell, and the established inspection time target of less than half a second per cell has been met. Second, the training scheme has also been modified, which has resulted in an improvement in the defect detection capabilities of the model.
In addition, it has been experimentally demonstrated that the results from the anomaly detection are potential pixel-wise labels that can be used for supervised training. In the experiment, the defect localization results obtained from a model trained with labels generated by experts and a model trained with automatically generated labels have been compared. The comparison has shown that using automatic labels is comparable to using manual annotations; thus, it is feasible to use anomaly detection as an automatic annotator which saves time and resources.
The proposed methodology is rooted in the use of GANs, which are known for their difficult training process. In industry, most of the quality inspection cases are related to homogeneous parts that can alleviate to some extent the training instability that the network can face. However, less homogeneous parts might need some modifications in the training or in the network, in order to learn the data distribution and obtain high quality image reconstruction for the anomaly detection.
Lastly, although the experimental results already demonstrate the feasibility of the proposed method for inspection of solar cells, we plan to explore different architectures and parameters for optimizing the methodology in future works. It would also be interesting to test it in other industrial contexts.