1. Introduction
The global transition toward low-carbon energy systems has positioned photovoltaic (PV) technology as a central driver of renewable electricity generation. However, the long-term reliability and performance of PV modules are significantly hindered by latent structural and electrical defects that develop during manufacturing, installation, and field operation. Among these, microcracks remain one of the most critical degradation mechanisms, as they disrupt current flow, increase series resistance, and accelerate long-term power loss [1,2,3,4]. Other defect types, including finger interruptions, inactive areas, grain boundaries, and dislocation clusters, further degrade module performance by impairing charge transport pathways or reducing effective cell area [5,6,7]. A broad survey of PV degradation mechanisms shows that such defects contribute substantially to long-term performance decline, impacting both module reliability and lifecycle economics [8].
Electroluminescence (EL) imaging has emerged as one of the most effective non-destructive diagnostic tools. When a PV cell is forward biased, radiative recombination produces near-infrared emission whose spatial distribution reflects the internal crystal and metallization structure [9]. EL images therefore reveal microcracks, shunts, inactive regions, and metallization failures with far greater sensitivity than visual inspection or thermographic methods. Comparative studies confirm that EL imaging provides deeper defect insight than infrared thermography and electrical I-V (current-voltage) testing, making it a key technology for quality control in both laboratory and industrial environments [10].
With the expansion of PV manufacturing and global deployment, manual interpretation of EL images has become increasingly impractical. This has motivated a surge in machine learning (ML) and deep learning (DL) techniques for automated defect detection. Early ML approaches relied on handcrafted features combined with classifiers such as logistic regression, random forests, and support vector machines. Deep convolutional neural networks (CNNs) have since demonstrated substantial improvements in detection accuracy, robustness, and feature generalization. Deitsch et al. [11] established the first widely used EL classification benchmark, and in a follow-up study [12], they introduced the ELPV dataset for supervised learning. More recent advances include CNN-based IR/EL fusion for module inspection [13], DL-based EL defect detection pipelines for industrial settings [14], and generalized deep-learning frameworks for cell-level classification [15,16]. Additional studies have explored defect identification using statistical EL parameters, segmentation-based architectures, and automatic crack detection algorithms [12,17]. Collectively, these efforts confirm the suitability of DL methods for real-time PV defect diagnosis.
Despite these advances, two critical aspects remain insufficiently explored. First, most DL pipelines treat EL images as grayscale inputs duplicated across RGB channels, ignoring the fact that defect visibility is strongly intensity-dependent and spatially heterogeneous. Such simple channel replication does not explicitly encode physical defect characteristics and may limit feature separability for fine defects such as microcracks. Second, the role of input image resolution has not been systematically analyzed. The majority of existing studies adopt a fixed input size, typically dictated by backbone compatibility or dataset preprocessing, without evaluating performance trends across multiple resolutions [18]. For example, several representative works resize EL images to a single resolution (e.g., 224 × 224 or 300 × 300) and report classification performance without discussing the impact of spatial resolution on defect detectability or computational efficiency [19]. Moreover, commonly used benchmark datasets, such as ELPV, are distributed at a fixed native resolution, which further encourages single-resolution evaluation protocols.
EL acquisition systems produce images with widely varying resolutions depending on sensor type, optics, and inspection distance; however, CNN architectures impose fixed input sizes (e.g., 224 × 224 or 300 × 300). Downsampling may suppress small defects, while higher resolutions substantially increase computational cost and inspection time. The trade-off between resolution, diagnostic accuracy, and computational efficiency therefore remains an open and practically relevant problem.
Building on our preliminary work published in [20], this paper presents an extended and refined framework for EL-based PV defect classification that jointly addresses these challenges. We introduce a defect-aware RGB representation that maps physically meaningful intensity ranges to color channels, enhancing the contrast between cracks, inactive regions, healthy conduction areas, and metallization features. This representation is combined with a systematic resolution analysis across three representative CNN architectures (ResNet–50, EfficientNet–B0, and EfficientNet–B3), each evaluated at two input resolutions. Using the ELPV benchmark dataset, we demonstrate that the proposed processing enables higher accuracy than previously reported results, including outperforming the 88.42% accuracy achieved by Deitsch et al. [11] using a VGG19 regression model. Our extended pipeline, incorporating optimized preprocessing, augmentation, fine-tuning, and threshold selection, achieves up to 92.39% accuracy, offering new insights into resolution–capacity interactions and providing practical guidance for EL-based PV defect detection systems.
The remainder of this paper is organized as follows.
Section 2 describes the dataset, class definition, and the defect categories considered in this study, and summarizes the main challenges of EL-based inspection.
Section 3 presents the proposed preprocessing pipeline, including local contrast enhancement and the defect-aware RGB representation, and outlines the deep learning models used for classification.
Section 4 reports the simulation results, including the quantitative validation of the RGB representation using a lightweight CNN, the performance comparison of deeper transfer-learning models, and the analysis of computational cost under different input resolutions. Finally,
Section 5 concludes the paper with a summary of findings, limitations, and future research directions.
The objectives of this study are as follows:
Develop a physically interpretable defect-aware RGB representation of EL images that improves defect-feature separability compared to baseline grayscale-to-RGB mappings.
Quantify the impact of this representation on defect classification performance using both lightweight and deeper CNN architectures.
Analyze the trade-off between input resolution, classification performance, and computational cost under the constraints of the benchmark dataset.
2. Defect Representation in EL Imaging
The dataset used in this study is derived from a publicly available benchmark specifically prepared for photovoltaic (PV) defect detection research [11]. It contains 2624 EL images, all acquired under controlled laboratory conditions from crystalline silicon solar cells. Each image is provided in grayscale format with a native resolution of 300 × 300 pixels, ensuring uniform spatial dimensions across the dataset. The represented defect types include microcracks, inactive regions, broken fingers, grain-boundary darkening, and dislocation clusters. Each EL image corresponds to a single solar cell and is accompanied by a defect probability value, as defined in the benchmark dataset creation procedure [11].
2.1. Image Characteristics
Each raw sample is presented as a single-channel grayscale image with a spatial resolution of $300 \times 300$ pixels,
$$I_i \in [0, 255]^{300 \times 300}, \qquad i = 1, \dots, 2624,$$
where $I_i$ denotes the $i$-th image and all images share identical dimensions.
Based on the defect probability, we transformed the dataset to a binary classification problem, where images with a defect probability equal to or greater than 0.5 were labeled as defective (label 1), while those below 0.5 were labeled as functional (label 0). Mathematically, the classification rule is defined as follows:
$$y_i = \begin{cases} 1, & p_i \ge 0.5, \\ 0, & p_i < 0.5, \end{cases}$$
where $p_i$ denotes the defect probability assigned to the $i$-th image.
Generating the labeled dataset:
$$\mathcal{D} = \{(I_i, y_i)\}_{i=1}^{2624}.$$
This resulted in 1909 (72.7%) functional samples and 715 (27.3%) defective samples, reflecting a moderate class imbalance. The dataset therefore captures realistic variations in EL appearance and defect morphology and is widely recognized as a standard benchmark for evaluating automated PV defect detection algorithms.
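The labeling rule above can be sketched in a few lines of Python. The variable names are illustrative rather than taken from the original pipeline, and the example probabilities are arbitrary placeholders:

```python
# Sketch of the probability-to-label conversion described in the text:
# defective (1) if the defect probability is >= 0.5, else functional (0).

def label_from_probability(p):
    """Binary labeling rule applied to each per-cell defect probability."""
    return 1 if p >= 0.5 else 0

# Example probabilities (illustrative values, not from the ELPV metadata).
probs = [0.0, 0.3, 0.5, 0.8]
labels = [label_from_probability(p) for p in probs]  # -> [0, 0, 1, 1]
```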
2.2. Visual Defect Taxonomy in EL Imagery
Electroluminescence imaging reveals physical degradation mechanisms as variations in emitted photon intensity, where malfunctioning regions typically exhibit localized darkening.
Table 1 summarizes common defect types observed in electroluminescence (EL) images of crystalline silicon photovoltaic cells. The original database provides only two labels for PV cells, functional and defective, even though several clearly distinguishable damage types are present. We therefore inspected the images manually, one by one, and assigned a specific defect-type label to each defective cell. The defect categories and their characteristics were established, as illustrated in
Table 2, based on two complementary sources:
Descriptions reported in the existing literature on EL-based PV diagnostics.
Qualitative inspection of the original ELPV dataset, which contains these defect patterns in the images although they are not explicitly annotated by defect type.
Microcracks typically appear as thin dark fracture-like lines in EL images and are commonly attributed to mechanical stress during handling, transport, or lamination, often leading to power loss and hotspot formation [1,3]. Inactive areas manifest as extended dark regions with little or no EL emission, indicating electrically disconnected or severely degraded regions of the cell and resulting in significant performance loss [21]. Finger breaks or interruptions are characterized by dark linear discontinuities along metallization fingers, caused by grid interruptions and associated with increased series resistance and local efficiency losses [22]. Dislocation clusters appear as localized dark granular structures and are typically linked to crystallographic defects and material impurities, leading to localized degradation [9].
Based on their radiative appearance and root causes, the principal defect categories considered in this work are summarized in
Table 1 and illustrated in
Figure 1.
Although different defect types are identified and analyzed to characterize the dataset and to support the physical interpretation of electroluminescence patterns, the classification task addressed in this study is formulated as a binary problem, distinguishing between functional and defective cells. In practice, photovoltaic cells often exhibit multiple defect types simultaneously, as reflected by the observed defect combinations in the dataset. For this work, all defect manifestations—whether isolated or combined—are grouped into a single faulty class, which is consistent with industrial inspection objectives where reliable fault detection is prioritized over fine-grained defect categorization. The defect-type analysis is therefore provided to justify the proposed defect-aware RGB representation and to enhance interpretability, rather than to introduce a multi-class or multi-label classification task. Extending the framework toward defect-type classification constitutes ongoing work and is outside the scope of the present study.
2.3. Challenges in Defect Visibility in EL Images
Despite the well-defined visual taxonomy of defects in EL images (
Section 2.2), their reliable discrimination remains challenging due to intrinsic intensity and resolution constraints. In EL images, the observed grayscale intensity $I(x, y)$ reflects spatial variations in radiative recombination. Highly emissive structures such as busbars and fingers generate consistently high-intensity responses, which dominate the dynamic range and may obscure low-intensity defect signatures. In contrast, defects such as microcracks and inactive regions often produce only subtle local intensity reductions, making them difficult to distinguish from normal texture variations in raw grayscale images.
Additionally, spatial non-uniformity across the cell prevents the use of a single global threshold for reliable separation of defective and non-defective regions: pixels of identical intensity may correspond to defective or healthy areas depending on local emission conditions.
Resolution reduction further impacts defect visibility. Downsampling from the original resolution of $300 \times 300$ to a lower operational resolution such as $224 \times 224$ suppresses fine structural details, since thin cracks only one or two pixels wide may vanish entirely after interpolation.
3. Detection Procedure for Cells Based on EL
The proposed defect-classification strategy is fundamentally driven by a defect-aware image representation paradigm that tightly couples intensity-to-RGB transformation with resolution adaptation to simultaneously enhance defect discriminability and control computational cost. In contrast to conventional pipelines where color conversion and resizing are treated as auxiliary or purely technical preprocessing steps, this work explicitly elevates both operations to core design variables within the learning strategy. The methodology is therefore constructed to investigate how physically meaningful color encoding and resolution selection influence feature learning, classification robustness, and computational efficiency in EL-based photovoltaic inspection.
The complete strategy is organized into four stages: defect-oriented image transformation, dataset partitioning under realistic class imbalance, transfer learning-based model training, and performance evaluation across resolutions and architectures, as summarized in
Figure 2. By embedding image representation and resolution control directly into the learning pipeline, the proposed methodology provides a structured and reproducible framework for analyzing the trade-offs between defect visibility, model capacity, and computational burden.
All preprocessing and simulations were implemented in Python 3.12.12 (Google Colab environment). The color enhancement pipeline, including CLAHE contrast enhancement and defect-aware RGB mapping, was implemented using the OpenCV library. The convolutional neural network (CNN) modeling, training, and evaluation were performed using the TensorFlow/Keras framework. This implementation environment ensures reproducibility and allows full control over the preprocessing and learning pipeline.
3.1. Defect-Oriented Image Preprocessing
Each EL image was originally provided as a single-channel grayscale matrix, $I \in [0, 255]^{300 \times 300}$, which limits the visibility of subtle defect-related structures such as microcracks or lightly degraded regions. The transformation to a pseudo-color representation proposed here is motivated by the intensity-based pseudo-color methods widely used in medical imaging to enhance anatomical interpretability (MRI and CT false-color rendering) [23,24,25]. Hence, a similar strategy is applied to photovoltaic EL images to improve feature separation and spatial perception without changing structural information.
- (a)
Initial grayscale-to-RGB transformation
As a first attempt, a direct grayscale-to-RGB pseudo-color transformation was applied by thresholding pixel intensities into a fixed number of color ranges. This approach was initially adopted due to its simplicity and widespread use in intensity-based visualization techniques. The resulting EL images are illustrated in
Figure 3.
After applying the baseline grayscale-to-RGB pseudo-color mapping, visual inspection revealed that metallic busbars were systematically highlighted using the same color range as severe defect regions. This occurs because busbars naturally exhibit high electroluminescence intensity due to their strong conductive properties. As a consequence, conductive but non-defective structures were visually encoded in a manner indistinguishable from true defect patterns.
This qualitative observation indicates that the baseline pseudo-color mapping introduces semantic ambiguity between conductive structures and actual defect regions, which is undesirable for any learning-based diagnostic system. This limitation motivated the development of a more physically meaningful, defect-aware representation strategy.
- (b)
Defect-aware RGB mapping
To overcome this limitation, a defect-aware transformation was developed. The process begins with local contrast enhancement using Contrast Limited Adaptive Histogram Equalization (CLAHE), defined as
$$I_{\mathrm{CLAHE}} = \mathcal{C}_{\alpha, \tau}(I),$$
where $\mathcal{C}_{\alpha, \tau}$ denotes the CLAHE operator, $\alpha$ is the clip limit, and $\tau$ is the tile grid size. In this work, the parameters were fixed to $\alpha = 2.0$ and $\tau = 8 \times 8$, which partition the image into $8 \times 8$ non-overlapping tiles, apply histogram equalization independently within each tile, and clip each local histogram at $\alpha$ to limit noise amplification.
Lower clip limits ($\alpha < 2.0$) yield insufficient contrast enhancement, reducing the separation between low- and high-intensity regions and thereby suppressing fine-crack and low-emission defect signatures. Conversely, higher clip limits ($\alpha > 2.0$) excessively amplify background noise and introduce artificial texture. The chosen configuration ($\alpha = 2.0$, $\tau = 8 \times 8$) provides a stable compromise, enhancing local defect-related intensity variations while preserving homogeneous emission regions.
Next, rather than duplicating channels, pixel intensities were segmented into defect-related ranges:
$$b(x, y) = \begin{cases} 1, & I_{\mathrm{CLAHE}}(x, y) \le P_{20}, \\ 2, & P_{20} < I_{\mathrm{CLAHE}}(x, y) \le P_{40}, \\ 3, & P_{40} < I_{\mathrm{CLAHE}}(x, y) \le P_{80}, \\ 4, & I_{\mathrm{CLAHE}}(x, y) > P_{80}, \end{cases}$$
where $P_k$ is the $k$-th percentile of the enhanced intensity distribution computed over the active cell area.
To support the selection of the percentile thresholds, several configurations were quantitatively evaluated using the same preprocessing pipeline (
CLAHE, busbar/border exclusion, and percentile computation on the active area).
Table 3 summarizes the resulting pixel distributions and inter-band separability.
The configuration 10–50–90 produced the highest inter-band intensity separation (32.18 gray levels) but allocated only 9.7% of pixels to the lowest-intensity band and 10.4% to the highest-intensity band, which risks underrepresenting thin cracks and conductive structures. Conversely, the symmetric configuration 25–50–75 allocated approximately 25% of pixels to each band but reduced the overall separability (23.03) and increased the likelihood of merging healthy silicon emission into the conductive band.
The proposed configuration 20–40–80 provides a balanced compromise: it preserves sufficient pixel coverage for low-intensity defect regions (19.4%) and high-intensity conductive pathways (20.7%), maintains a physically consistent dominance of healthy emission (40.0%), and achieves stable separation between intensity bands (25.40). The low standard deviation across samples further confirms the robustness of this partitioning. This quantitative analysis supports the selection of 20–40–80 as a principled and numerically stable choice for defect-aware RGB encoding.
Pixels are then assigned to color bands modeling different physical regions, as presented in
Table 4.
Thus, the final false-color representation is obtained by mapping each intensity band to its assigned color,
$$I_{\mathrm{RGB}}(x, y) = M\big(b(x, y)\big),$$
where $M(\cdot)$ denotes the band-to-color assignment of Table 4. This step provides physically interpretable defect highlighting, enabling CNNs to better discriminate defect features and improving the distinction between finger-related and crack-related intensity patterns.
Figure 4 illustrates samples of defective PV cell EL images after applying the adopted defect-aware RGB mapping.
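A minimal NumPy sketch of the percentile-band encoding follows. The 20-40-80 thresholds come from the text, while the specific colors assigned to each band are hypothetical placeholders for the Table 4 palette, not the published assignment:

```python
import numpy as np

def defect_aware_rgb(enhanced, colors=None):
    """Map CLAHE-enhanced intensities into four percentile bands and colorize.

    `colors` lists one RGB triple per band (low to high intensity); the
    defaults below are illustrative placeholders, not the Table 4 palette.
    """
    if colors is None:
        colors = [(255, 0, 0),    # band 1: lowest emission (cracks, inactive areas)
                  (255, 165, 0),  # band 2: degraded emission
                  (0, 255, 0),    # band 3: healthy silicon emission
                  (0, 0, 255)]    # band 4: conductive pathways (busbars, fingers)
    p20, p40, p80 = np.percentile(enhanced, [20, 40, 80])
    band = np.digitize(enhanced, [p20, p40, p80])  # band index in {0, 1, 2, 3}
    rgb = np.zeros(enhanced.shape + (3,), dtype=np.uint8)
    for b, c in enumerate(colors):
        rgb[band == b] = c
    return rgb

rng = np.random.default_rng(1)
demo = rng.integers(0, 256, size=(300, 300), dtype=np.uint8)
rgb = defect_aware_rgb(demo)
```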
Visual inspection of the transformed EL images (
Figure 4) confirms that busbars are now consistently grouped within the conductive pathway class, while crack structures and inactive regions remain distinctly emphasized.
A quantitative validation of the proposed representation using a lightweight CNN, whose architecture is illustrated in
Figure 5, is presented in the
Section 4 to objectively evaluate the discriminative contribution of the RGB mapping strategy.
3.2. Resolution Standardization
In this study, image resizing is introduced as a controlled experimental variable rather than a simple architectural constraint. The objective is not limited to satisfying the input requirements of convolutional neural networks (CNNs); it is also to systematically evaluate the trade-off between defect classification performance and computational cost under different spatial resolutions.
Based on our preliminary work published in [
20] conducted on the same ELPV dataset using grayscale EL images, it was demonstrated that increasing image resolution does not necessarily lead to higher classification accuracy for machine learning and lightweight CNN models. In contrast, higher resolutions were shown to significantly increase training and inference time. This observation can be expressed as follows:
$$\frac{\partial A(r)}{\partial r} \approx 0 \quad \text{while} \quad \frac{\partial T(r)}{\partial r} > 0 \quad \text{for large } r,$$
where $r$ denotes the input resolution, $A(r)$ the classification accuracy, and $T(r)$ the computational time. These results indicate that accuracy saturates with increasing resolution, while computational cost grows monotonically.
Motivated by this finding, the present work extends the analysis to strong deep learning architectures and investigates whether this resolution–performance behavior remains valid when defect-aware RGB EL representations are employed. In addition, this analysis reflects practical deployment constraints, as EL imaging systems may operate under limited sensor resolution, acquisition time, and hardware cost.
To ensure a fair and systematic comparison across architectures, all RGB-transformed EL images are resized to the operational input resolutions associated with each model. Let $R_r(\cdot)$ be the resizing function based on bilinear interpolation:
$$I_r = R_r(I_{\mathrm{RGB}}), \qquad r \in \{224 \times 224,\ 300 \times 300\}.$$
These resolutions correspond to widely adopted configurations in modern CNN architectures and allow the assessment of classification stability under reduced spatial detail. By comparing performance and runtime across these resolutions, this study directly evaluates whether a lower-resolution RGB EL input, associated with reduced computational load and lower sensor cost, can preserve or even enhance defect classification accuracy relative to higher-resolution alternatives.
3.3. Deep Learning-Based Defect Classification
To evaluate the effectiveness of the proposed defect-aware RGB representation and to analyze the impact of image resolution on classification performance and computational cost, three representative convolutional neural network (CNN) architectures were employed: ResNet–50, EfficientNet–B0, and EfficientNet–B3. These models were selected due to their proven robustness in industrial vision tasks, scalability across resolutions, and widespread adoption in defect detection applications [24,25].
All networks were initialized with ImageNet pre-trained weights, enabling transfer learning to accelerate convergence and reduce the dependence on large-scale labeled EL datasets [26,27]. In this study, CNNs are not merely used as classifiers, but as quantitative validation tools to assess whether the proposed preprocessing strategy improves defect discriminability under different spatial resolutions.
Let $I$ denote a raw grayscale EL image. Unlike conventional approaches that replicate the grayscale channel to satisfy CNN input requirements, this work applies a defect-aware pseudo-color transformation prior to resizing. The resulting RGB image is defined as follows:
$$I_{\mathrm{RGB}} = \Phi(I) \in [0, 255]^{300 \times 300 \times 3},$$
where $\Phi$ denotes the defect-aware mapping and the three channels encode defect-related intensity information derived from CLAHE-enhanced local percentile segmentation. This transformation embeds physically meaningful defect cues directly into the color space rather than duplicating intensity values.
Each CNN learns a nonlinear mapping
$$\hat{p} = f(I_{\mathrm{RGB}}; W) \in [0, 1],$$
where $\hat{p}$ represents the predicted probability of the solar cell being defective and $W$ the network weights. A sigmoid activation function is applied at the output layer,
$$\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}},$$
and binary labels are assigned as follows:
$$\hat{y} = \begin{cases} 1, & \hat{p} \ge \theta^*, \\ 0, & \hat{p} < \theta^*, \end{cases}$$
where $\theta^*$ is a tuned decision threshold. This threshold calibration step is particularly important for the imbalanced nature of the ELPV dataset, where defective samples are underrepresented.
- (A)
ResNet–50
Figure 6 presents the architecture of ResNet–50.
To mitigate vanishing gradients in deep models, ResNet–50 employs residual learning. A residual block computes
$$\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},$$
where $\mathcal{F}$ denotes the stacked convolutional transformation and $\mathbf{x}$ the block input, allowing gradient propagation through skip connections. The required input resolution is 224 × 224 × 3. The backbone outputs a deep semantic feature vector, which is passed to a customized classification head that produces the binary defect prediction.
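A hedged Keras sketch of such a transfer-learning setup is shown below. The head layout (global average pooling followed by a single sigmoid unit) is an illustrative assumption rather than the exact head used in the paper, and `weights=None` is used here only to avoid downloading ImageNet weights; the actual pipeline would pass `weights="imagenet"`:

```python
import tensorflow as tf

def build_resnet50_classifier(input_shape=(224, 224, 3)):
    # Backbone: ResNet-50 without its ImageNet classification head.
    # (weights=None keeps this snippet offline; transfer learning would
    # instead load weights="imagenet" as described in the text.)
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    # Illustrative head: one sigmoid unit giving the defect probability p-hat.
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(backbone.input, out)

model = build_resnet50_classifier()
```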
- (B)
EfficientNet–B0
EfficientNet–B0 adopts a compound scaling strategy that uniformly scales network depth, width, and input resolution to achieve an optimal accuracy–efficiency balance. The scaling is defined as follows:
$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},$$
where $\alpha$, $\beta$, and $\gamma$ denote the scaling factors for, respectively, network depth (number of layers), width (number of channels per layer), and input image resolution $(H, W)$, and $\phi$ is the compound coefficient controlling overall model size. The factors are constrained by
$$\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1,$$
ensuring that each increment of $\phi$ approximately doubles the total computational cost (FLOPs) while preserving a balanced growth among network capacity dimensions.
EfficientNet–B0 operates at an input resolution of 224 × 224 × 3, making it particularly suitable for evaluating the performance of defect-aware RGB representations under low-resolution, low-computation constraints, which are relevant for industrial deployment scenarios.
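The compound-scaling constraint can be checked numerically with the coefficients reported in the original EfficientNet paper ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$):

```python
# Compound scaling coefficients from the original EfficientNet paper.
alpha, beta, gamma = 1.2, 1.1, 1.15

# Constraint: alpha * beta^2 * gamma^2 ~ 2, so each unit increase of the
# compound coefficient phi roughly doubles the FLOPs.
flops_factor = alpha * beta**2 * gamma**2

def scale(phi):
    """Depth, width, and resolution multipliers for a given phi."""
    return alpha**phi, beta**phi, gamma**phi

d1, w1, r1 = scale(1)  # multipliers for phi = 1
```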
- (C)
EfficientNet–B3
EfficientNet–B3 extends B0 with a higher compound scaling coefficient, enabling extraction of the finer spatial features inherent in EL images. Its native input resolution of 300 × 300 × 3 was chosen for the high-resolution configuration [28]. In addition, when downscaled to 224 × 224, the model remains operational but loses some spatial defect granularity. Thus, both resolutions were tested to assess resolution sensitivity. The architecture of the EfficientNet–B3 model is presented in
Figure 7.
3.4. Transfer Learning-Based Model Training
Each CNN architecture defines a learnable function
$$\hat{p} = f(I_{\mathrm{RGB}}; W),$$
where $\hat{p} \in [0, 1]$ indicates the probability of a cell being defective.
Binary Focal Loss is used to address class imbalance:
$$\mathcal{L}_{\mathrm{FL}} = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t),$$
with balancing weight $\alpha_t$ and focusing parameter $\gamma$, where $p_t = \hat{p}$ for defective samples and $p_t = 1 - \hat{p}$ for functional samples.
Optimization uses ADAM [29], whose parameter update is
$$W_{t+1} = W_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$
where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected first- and second-moment estimates of the gradient and $\eta$ is the learning rate. The learning rate is automatically adjusted using ReduceLROnPlateau, triggered when the validation loss stagnates.
Class imbalance is additionally compensated by class weights
$$w_c = \frac{N}{C \cdot n_c},$$
where $N$ is the total number of samples, $C$ the number of classes, and $n_c$ the number of samples in class $c$.
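The focal loss and class-weight computations can be sketched in NumPy as follows. This is a minimal scalar version for illustration; the hyperparameter values shown are the common defaults from the focal loss literature, not necessarily those tuned in this work:

```python
import numpy as np

def binary_focal_loss(p_hat, y, alpha=0.25, gamma=2.0):
    """Scalar binary focal loss: down-weights easy, well-classified samples."""
    p_t = p_hat if y == 1 else 1.0 - p_hat
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A confidently correct prediction incurs a far smaller loss than a wrong one.
easy = binary_focal_loss(0.9, 1)   # defective cell, correctly scored high
hard = binary_focal_loss(0.1, 1)   # defective cell, badly misclassified

# Class weights w_c = N / (C * n_c) for the stated split
# (1909 functional, 715 defective).
N, counts = 2624, {0: 1909, 1: 715}
weights = {c: N / (2 * n) for c, n in counts.items()}
```

The minority (defective) class receives a weight above 1, while the majority class is down-weighted, so both contribute equally in expectation.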
3.5. Threshold Calibration and Final Decision
The network score $\hat{p}$ is converted to a binary class through an optimized threshold $\theta^*$:
$$\hat{y} = \mathbb{1}\left[\hat{p} \ge \theta^*\right].$$
The threshold $\theta^*$ is selected to maximize validation accuracy:
$$\theta^* = \arg\max_{\theta \in (0, 1)} \mathrm{Acc}_{\mathrm{val}}(\theta).$$
This significantly improves the defective-class recall.
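The calibration step can be sketched as a simple grid search over candidate thresholds. The validation scores below are synthetic, and the grid granularity is an illustrative choice:

```python
import numpy as np

def calibrate_threshold(p_val, y_val, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes validation accuracy."""
    accs = [np.mean((p_val >= t).astype(int) == y_val) for t in grid]
    return grid[int(np.argmax(accs))]

# Synthetic validation scores: defective cells score high, functional low.
y_val = np.array([0, 0, 0, 0, 1, 1, 1])
p_val = np.array([0.1, 0.2, 0.35, 0.45, 0.55, 0.7, 0.9])
theta_star = calibrate_threshold(p_val, y_val)
```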
3.6. Performance Evaluation
Models are evaluated on the held-out test set using the following metrics:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN},$$
where $TP$, $TN$, $FP$, and $FN$ follow their standard meanings (true and false positives and negatives, computed with respect to the defective class).
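These metrics can be computed directly from confusion-matrix counts; a quick sanity check in Python (the counts are illustrative, not taken from the paper's confusion matrices):

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, F1, and IoU for the defective (positive) class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return accuracy, f1, iou

# Illustrative counts only.
acc, f1, iou = metrics_from_counts(tp=60, tn=280, fp=10, fn=30)
```

Note that $F_1$ and IoU are linked monotonically via $\mathrm{IoU} = F_1 / (2 - F_1)$, so IoU is always the stricter of the two.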
4. Results and Discussions
All experiments were conducted on the defect-aware RGB-mapped EL dataset comprising 2624 electroluminescence (EL) images of crystalline silicon solar cells. The task is formulated as a binary classification problem, where each sample is assigned a label $y_i \in \{0, 1\}$ such that $y_i = 1$ for defective cells and $y_i = 0$ for functional cells.
The dataset exhibits a moderate class imbalance (72.7% functional, 27.3% defective), which was intentionally preserved to reflect realistic industrial inspection conditions. A stratified partitioning strategy was applied to maintain class balance across all subsets:
$$\mathcal{D} = \mathcal{D}_{\mathrm{train}} \cup \mathcal{D}_{\mathrm{val}} \cup \mathcal{D}_{\mathrm{test}}.$$
To ensure the reliability and generalization capability of the developed classification models, a robust evaluation strategy was employed using multiple stratified random splits of the dataset. Four independent train–validation–test partitions were generated, each preserving the original class distribution to minimize sampling bias. In addition, validation and test sets were alternated across selected experiments to eliminate the possibility of favorable data partitioning affecting performance interpretation. This repeated-split evaluation enabled a stability assessment of the obtained metrics, confirming that performance, particularly that of EfficientNet–B3, remains consistent and is not contingent on a single dataset split.
Model performance was assessed using accuracy, $F_1$-score, and Intersection over Union (IoU).
4.1. Quantitative Validation of RGB Representation Strategy
Before evaluating deep architectures, a preliminary experiment was conducted to objectively assess whether the proposed defect-aware RGB mapping effectively improves the discriminative quality of EL representations.
A lightweight CNN was first trained using EL images transformed with a baseline grayscale-to-RGB pseudo-color mapping. When trained on 50% of the dataset, the model achieved an accuracy of 72.59%, while the defective-class $F_1$-score and IoU both remained at 0.00%. The corresponding confusion matrix (Figure 8) shows that the network consistently predicted the majority class (functional) and failed to identify defective samples. This behavior indicates that the classifier primarily learned features associated with visually prominent busbars, which were incorrectly encoded as defect-like structures in the baseline RGB representation. As a result, genuine defect patterns could not be effectively distinguished from conductive elements, leading to a collapse of discriminative information.
These findings confirm that the baseline pseudo-color encoding introduces semantic ambiguity by assigning similar color representations to non-defective conductive structures and true defect regions, rendering it unsuitable for reliable defect classification.
The same lightweight CNN architecture was subsequently trained using the proposed defect-aware RGB representation. In this case, performance improved substantially, reaching an accuracy of 84.01%, a defective-class $F_1$-score of 68.02%, and a defective-class IoU of 51.54%. The confusion matrix (Figure 9) demonstrates balanced predictions across both functional and defective classes, indicating that the model is now able to extract defect-relevant features effectively.
This experiment provides quantitative evidence that the proposed defect-aware transformation restores the discriminative structure lost in the baseline representation while preserving physical interpretability. On this basis, the defect-aware RGB representation was adopted as the standard input format for all subsequent experiments involving deeper neural architectures.
4.2. Quantitative Comparison of Classification Performance
Table 5 summarizes the test performance of the three models using their best-performing configurations.
EfficientNet–B3 achieves the highest overall performance across all evaluated configurations, reaching 92.39% accuracy and the strongest defective-class metrics at an input resolution of 300 × 300. When the input resolution is reduced to 224 × 224, EfficientNet–B3 still maintains high performance, with an accuracy of 91.88%, indicating limited degradation despite significant spatial downsampling.
Notably, EfficientNet–B0 also demonstrates strong performance at lower resolution. At 224 × 224, it already exceeds 89% accuracy and further improves to 89.85% at 300 × 300. These results indicate that, once an informative defect-aware RGB representation is employed, competitive classification performance can be achieved even under reduced spatial resolution, while higher resolutions provide consistent but incremental gains.
4.3. Stability Across Stratified Splits
The dataset was partitioned using a stratified split of 70% for training, 15% for validation, and 15% for testing. This allocation was chosen to ensure that the training subset contains a sufficiently large and diverse set of samples to learn defect-related patterns, while maintaining independent validation and test subsets for model tuning and unbiased performance evaluation.
The validation subset was used exclusively for hyperparameter selection, early stopping, and model selection, thereby limiting overfitting and preventing information leakage from the test data. The test subset remained completely unseen during training and optimization to provide an objective estimate of generalization performance. Stratification was applied to all subsets to preserve the original class distribution, which is particularly important given the moderate class imbalance of the dataset.
To assess the robustness and generalization capability of the proposed models, performance stability was evaluated across four independent stratified train–validation–test splits generated using different random seeds. This analysis is essential to verify that the reported results are not an artifact of a favorable data partition but instead reflect consistent model behavior under varying sampling conditions.
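Such repeated stratified splitting can be sketched with scikit-learn (an assumed tooling choice; the paper does not name its splitting utility). Splitting off 30% and then halving it yields the 70/15/15 allocation described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels mimicking the stated class ratio: 1909 functional (0), 715 defective (1).
y = np.array([0] * 1909 + [1] * 715)
X = np.arange(len(y)).reshape(-1, 1)  # stand-in features (sample indices)

splits = []
for seed in range(4):  # four independent stratified partitions
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    splits.append((y_tr, y_val, y_te))

# Stratification preserves the ~27.3% defective ratio in every test subset.
ratios = [y_te.mean() for (_, _, y_te) in splits]
```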
As illustrated in Figure 10 and Figure 11, test accuracy remains stable across splits for all three architectures. For EfficientNet–B3, accuracy ranges between approximately 88.1% and 91.4%, with a peak performance of 92.39% observed in the best-performing configuration reported in Table 5. This corresponds to a total variation of less than 3.5 percentage points across splits, indicating limited sensitivity to the specific data partitioning. Similarly, EfficientNet–B0 exhibits test accuracies between 85.8% and 87.2%, while ResNet–50 ranges between approximately 86.5% and 88.4%. Such constrained variability demonstrates that model performance is not driven by a particular choice of training or testing samples.
Defective-class metrics further support this observation. For EfficientNet–B3, the defective-class F1-score consistently remains above 0.74 and reaches values close to 0.83, while the Jaccard index varies between approximately 0.59 and 0.70 across splits. EfficientNet–B0 maintains F1-scores between 0.72 and 0.76 and Jaccard values between 0.57 and 0.62, whereas ResNet–50 achieves F1-scores in the range of 0.75 to 0.80 and Jaccard values between 0.60 and 0.67. These limited fluctuations confirm that the models preserve their ability to detect defective samples even when the underlying data partitions are modified.
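The two defective-class overlap metrics reported in this section, the F1 (Dice) score and the Jaccard index, are linked by the identity J = F1 / (2 − F1), so one determines the other; a short consistency check, assuming the rounded values quoted in the text:

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index (IoU) from the F1 (Dice) score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# The best defective-class F1 quoted for EfficientNet-B3 (~0.83)
# corresponds to a Jaccard index of roughly 0.70, matching the
# upper end of the range reported across splits.
print(round(jaccard_from_f1(0.83), 3))
```

This monotone relationship explains why the Jaccard ranges track the F1 ranges across all three architectures rather than carrying independent information.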
A similar stability pattern is observed for defective-class precision and recall. For EfficientNet–B3, precision consistently remains high, between approximately 0.89 and 0.95, while recall varies between 0.64 and 0.76, indicating a stable balance between false positives and false negatives. Comparable trends are observed for EfficientNet–B0 and ResNet–50. This consistency confirms that the models do not simply achieve high accuracy by favoring the majority class but instead demonstrate reliable defect sensitivity.
Overall, these results demonstrate that the observed performance gains are systematic, reproducible, and robust to dataset partitioning, thereby strengthening the validity of the conclusions and supporting the generalization capability of the proposed framework.
4.4. Impact of RGB Representation, Resolution, and Class Imbalance
The numerical results indicate clear performance differences between the evaluated configurations. Models trained using the defect-aware RGB-mapped EL images consistently achieve higher defective-class F1-scores and Jaccard indices compared to the baseline grayscale-to-RGB mapping evaluated in Section 4.2, where defective-class performance collapsed. Across the three evaluated architectures, defective-class F1-scores range from 76.7% (ResNet–50) to 84.54% (EfficientNet–B3), while Jaccard index values range from 62.1% to 73.21%, respectively.
The influence of spatial resolution can be observed by comparing model performance at different input sizes. EfficientNet–B3 trained at 300 × 300 achieves the highest accuracy of 92.39%, while EfficientNet–B0 and ResNet–50 trained at 224 × 224 maintain accuracies of 89.0% and 87.06%, respectively. For reference, a recent study using the same ELPV dataset reported an accuracy of 88.4% with a VGG19-based model trained on grayscale EL images at 300 × 300 resolution. This comparison indicates that competitive performance is obtained even at lower spatial resolution when using the proposed representation.
A more detailed resolution–performance comparison is provided in Table 6. EfficientNet–B0 achieves 89.00% accuracy at 224 × 224 and improves modestly to 89.85% at 300 × 300, while EfficientNet–B3 increases from 91.88% to 92.39% when moving from 224 × 224 to 300 × 300. These results indicate that resolution-related performance gains are consistent but incremental. In contrast, ResNet–50 shows slightly lower accuracy at 300 × 300 than at 224 × 224, suggesting that higher resolution does not benefit all architectures equally and may increase optimization difficulty under fixed training constraints.
The impact of resolution on computational cost is summarized in Table 6. For all architectures, reducing the input size from 300 × 300 to 224 × 224 leads to a noticeable reduction in training time. For example, EfficientNet–B0 requires 410 s at 224 × 224 compared to 690 s at 300 × 300, while ResNet–50 decreases from 980 s to 620 s. Similar behavior is observed for EfficientNet–B3. These results show that lower spatial resolution reduces computational cost while maintaining competitive classification performance.
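As a rough sanity check, per-image convolutional cost scales approximately with the number of input pixels, so moving from 224 × 224 to 300 × 300 implies a cost factor of about (300/224)² ≈ 1.79. The training times quoted above can be compared against this back-of-the-envelope bound; the observed ratios fall somewhat below it, which is expected since data loading and other fixed overheads do not scale with resolution:

```python
# Pixel-count ratio between the two evaluated input resolutions.
pixel_ratio = (300 * 300) / (224 * 224)

# Training-time ratios from the values quoted above (seconds).
effnet_b0_ratio = 690 / 410   # EfficientNet-B0, ~1.68
resnet50_ratio = 980 / 620    # ResNet-50, ~1.58

print(f"pixel ratio: {pixel_ratio:.2f}")
print(f"EfficientNet-B0: {effnet_b0_ratio:.2f}, ResNet-50: {resnet50_ratio:.2f}")
```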
All simulations were conducted using the original class distribution of the dataset without artificial balancing. Despite the moderate class imbalance, defective-class metrics remain consistently high across all evaluated models, as reflected in the F1-score and Jaccard index values reported above.
By jointly considering accuracy and computational cost, as shown in Table 6, a clear trade-off emerges between performance and efficiency. Strong classification performance is already achieved at 224 × 224 for ResNet–50 and EfficientNet–B0, supporting the practical relevance of the proposed representation for computationally efficient PV inspection.
While training time reflects computational cost during model development, inference time is the critical factor for real-time industrial deployment. In practical EL inspection systems, overall throughput is governed by the complete pipeline, including sensor exposure, data transfer, preprocessing, and inference, with reported acquisition times ranging from tens of milliseconds to several seconds depending on system configuration. From a deep learning perspective, inference latency scales with network depth, parameter count, and input resolution [30,31]. Consequently, the relative trends observed in training time across architectures and resolutions are expected to translate into similar inference-time behavior.
Overall, the results demonstrate that combining defect-aware RGB encoding with optimized resolution selection provides a robust and computationally efficient framework for EL-based PV defect detection, outperforming color-based approaches while significantly reducing processing cost.
5. Conclusions
This work proposes a novel approach for photovoltaic defect classification using electroluminescence (EL) images by integrating local contrast enhancement, defect-aware RGB false-color mapping, and transfer learning-based deep neural networks. Unlike conventional pipelines that simply replicate grayscale EL images into three channels for backbone compatibility, the proposed preprocessing encodes physically meaningful intensity ranges into distinct color channels. This design explicitly emphasizes cracks, inactive regions, healthy silicon emission, and conductive pathways, which improves defect visibility and enhances feature separability for convolutional models. As observed in the numerical results, this representation is associated with consistently higher defective-class precision, F1-score, and Jaccard index across all evaluated architectures compared to baseline mappings.
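The band-based false-color idea can be sketched as follows. This is a minimal illustration only: the band boundaries (40, 100, 180), the channel assignments, and the helper name are hypothetical placeholders, not the paper's actual preprocessing parameters:

```python
def defect_aware_rgb(intensity: int) -> tuple[int, int, int]:
    """Map an 8-bit EL pixel intensity to an RGB triple by intensity band.

    Thresholds here are illustrative; the actual mapping is defined by
    the paper's contrast-enhancement and encoding stage.
    """
    if intensity < 40:        # very dark: cracks / inactive regions -> red
        return (255 - intensity, 0, 0)
    if intensity < 100:       # intermediate emission -> green channel
        return (0, int(255 * (intensity - 40) / 60), 0)
    if intensity < 180:       # healthy silicon emission -> blue channel
        return (0, 0, int(255 * (intensity - 100) / 80))
    # brightest pixels: busbars / conductive pathways -> white
    return (255, 255, 255)

# Apply the mapping pixel-wise to a grayscale EL image (nested lists here).
gray = [[10, 70], [150, 200]]
rgb = [[defect_aware_rgb(p) for p in row] for row in gray]
```

The point of such an encoding is that each defect-relevant intensity regime lands in a different channel, so the first convolutional layer of a pretrained backbone receives the separation for free rather than having to learn it from replicated grayscale.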
Comprehensive simulations conducted on the ELPV benchmark dataset show that the proposed approach achieves strong and competitive performance relative to previously reported methods. A maximum accuracy of 92.39% is obtained using EfficientNet–B3 at 300 × 300 resolution, exceeding the 88.4% accuracy reported in the literature for a VGG19-based model trained on grayscale EL images at the same resolution. Importantly, comparable—and in some cases superior—performance is also obtained at a lower resolution of 224 × 224, where EfficientNet–B0 and ResNet–50 achieve accuracies of 89.0% and 87.06%, respectively. These results suggest that performance gains are primarily associated with improved defect representation rather than increased spatial resolution alone.
The numerical analysis further indicates that spatial resolution directly affects computational cost. Reducing the input size from 300 × 300 to 224 × 224 decreases training time by approximately 35–40% across all evaluated models while preserving competitive classification performance. In addition, all simulations were conducted using the original imbalanced data distribution to reflect realistic industrial conditions, with class imbalance handled through focal loss, class weighting, and threshold calibration rather than artificial resampling. Taken together, these results indicate that combining defect-aware representation with resolution-aware model design yields an effective and computationally efficient framework for EL-based photovoltaic defect classification, with practical relevance for scalable and cost-sensitive inspection systems.
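For reference, the focal loss mentioned above follows the standard formulation, where $p_t$ is the model's probability for the true class, $\gamma$ is the focusing parameter, and $\alpha_t$ is a class-balance weight (their specific values are set during the hyperparameter selection described earlier and are not fixed here):

```latex
\mathrm{FL}(p_t) = -\,\alpha_t \,(1 - p_t)^{\gamma}\,\log(p_t)
```

With $\gamma = 0$ and $\alpha_t = 1$ this reduces to ordinary cross-entropy; increasing $\gamma$ down-weights well-classified samples, which under imbalance are predominantly majority-class, thereby shifting gradient emphasis toward the harder defective examples without resampling the data.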
Finally, the scope of this study is constrained by the native resolution of the available benchmark dataset. While down-sampling-based resolution analysis is feasible, investigating a broader range of natively higher-resolution EL images, as commonly encountered in industrial inspection systems, represents an important next step. Extending the proposed framework to such multi-resolution datasets would enable a more comprehensive assessment of resolution effects and inference-time behavior. In addition, while the method is directly applicable to silicon-based photovoltaic technologies, adapting it to emerging PV technologies will require further investigation into technology-specific EL characteristics and defect mechanisms. These directions are identified as key avenues for future work.