1. Introduction
Solar power generation is well known as a key component among clean energy generation approaches for reducing greenhouse gas (GHG) emissions in the power and energy sector, thereby contributing to the global fight against climate change. In solar farms, solar panels accumulate dust and are partially shaded by objects in the surrounding open environment, which can significantly reduce their energy conversion efficiency. Thus, regularly scheduled cleaning tasks are needed, which are often costly. To ease such operation and maintenance costs, AI-equipped devices, e.g., drones or robots, are being employed to reduce labor costs. This, however, introduces another challenge: solar panels must be recognized precisely to save spraying water and to extend the working time of battery-powered drones and robots.
On the other hand, smaller-scale solar panels are attached to Internet of Things (IoT) and consumer electronic devices for harvesting artificial light. For example, in the recently emerging application of optical wireless power transfer (OWPT), solar panels serve as energy receivers, while LEDs or lasers serve as energy transmitters. This type of system has been employed in implanted and wearable devices [1] and in future EV wireless charging [2]. It can also be generalized to the concept of thing-to-thing OWPT systems, providing a sustainable and green energy technology for society [3,4]. In such systems, the overall system efficiency depends strongly and directly on the alignment between optical transmitters and solar panel receivers; see, e.g., [5]. Recent studies have demonstrated the feasibility of high-power and compact OWPT systems for practical IoT applications, and the performance of OWPT systems is highly dependent on accurate alignment and system optimization [6,7]. Furthermore, in realistic contexts, solar panels are subject to partial shading caused by dynamic changes in the surrounding environment. A challenge therefore arises in precisely recognizing and distinguishing solar panels from other objects in complex environments.
To cope with the challenge mentioned above, the research in [8] conducted a system-level analysis of beam alignment in OWPT systems, focusing on the interaction between optical transmitters and photovoltaic receivers, and proposed strategies to optimize beam design and receiver placement to increase alignment tolerance and overall power efficiency. Kang et al. [9] proposed a hybrid target detection and perturbation observation method to improve alignment accuracy in dynamic OWPT systems. Their approach continuously adjusts the transmitter orientation based on feedback from the receiver, enabling high-precision automatic alignment without requiring additional optical markers, and experimental results demonstrated enhanced energy conversion efficiency under varying conditions.
Meanwhile, a few studies have explored deep learning models to detect solar panels. Traditional image processing techniques often rely on hand-crafted features and struggle to handle deformable, occluded, or dynamically moving targets against complex backgrounds [10,11]. In contrast, deep learning methods offer several advantages: automatic feature extraction, strong generalization under varying illumination and in complex environments, high detection speed once trained, and broad application potential in medical imaging and other domains [12]. Such capabilities make deep learning particularly suitable for visual recognition tasks where traditional methods cannot provide reliable solutions.
Building on these advantages, the YOLO (You Only Look Once) series, specifically YOLOv8, has been selected for this paper due to its high precision, real-time performance, and robustness in detecting small, occluded, or deformable objects. By integrating YOLOv8, solar panel recognition has the potential to perform more effectively, facilitating adaptive alignment and improved energy transfer in OWPT systems in challenging operational conditions.
Accordingly, the contributions of the current study are as follows.
A deep learning method based on the YOLOv8 object detection framework is proposed for recognizing flexible solar panels under challenging conditions such as bending and partial shading, which are difficult to address using traditional image processing techniques.
The loss function is minimized by solving a multi-objective, non-convex, nonlinear optimization problem in which the loss weights are varied. As such, the Pareto front of optima can be obtained, based on which the best set of parameters can be derived.
The proposed method achieves real-time inference capability while maintaining high recognition accuracy, demonstrating its high potential for practical deployment in dynamic environments.
An extended real-time functionality is introduced to estimate the relative size of multiple detected solar cells, which provides valuable information for practical scenarios in which the largest solar panel should be selected.
The remainder of this paper is organized as follows.
Section 2 introduces the proposed method, including the YOLO network architecture, the loss function of YOLOv8, the solar panel recognition framework, the experimental platform, the training settings, and the model evaluation indicators.
Section 3 presents the experimental results, including training performance and convergence analysis, performance optimization via loss weight adjustment, real-time detection, and extended functionality, followed by a detailed discussion. Finally, the conclusions and future research directions are summarized in Section 4.
2. Methods
To effectively recognize and localize solar panels in complicated realistic environments, including OWPT systems, a fast and reliable detection method is required. YOLO’s fully convolutional design allows it to process high-resolution images efficiently. Among the various YOLO versions, YOLOv8 is selected in this study due to its favorable balance between detection accuracy and architectural flexibility. Compared with earlier YOLO versions, the architectural improvements of YOLOv8, including anchor-free detection heads [13], multi-scale feature fusion [14], and a lightweight design, make it suitable for deployment in realistic systems. These characteristics are particularly suitable for recognizing flexible solar panels that may appear bent or partially occluded. In addition, YOLOv8 provides a mature and stable implementation with efficient training and deployment support, making it well suited for real-time applications in dynamic environments.
2.1. YOLO Network Architecture
The YOLO series is built on a convolutional neural network (CNN) framework, which extracts and learns hierarchical features from input images through successive convolution, normalization, and pooling operations. In fact, recent work has demonstrated that modern CNNs can effectively learn and exploit relationships among hierarchical deep features without explicit prior knowledge, enabling more discriminative representations in complex recognition tasks [15]. The CNN structure enables the model to capture both primitive visual features (such as edges and textures) and semantic features (such as object shapes and categories). Recent surveys have demonstrated that CNNs remain a cornerstone in computer vision due to their strong feature learning capabilities and efficient end-to-end training [16]. In the architecture of YOLO, the image is passed through a series of convolutional and activation layers to encode spatial and contextual information efficiently. This fully convolutional design allows YOLO to perform object localization and classification simultaneously in a single forward pass, resulting in real-time detection performance.
YOLO algorithms divide each input image into an $S \times S$ grid, where each grid cell is responsible for predicting objects whose centers fall within it. Every cell outputs $B$ bounding boxes with confidence scores and $C$ class probabilities [17]. The bounding boxes represent the predicted position and dimensions of potential objects, while the confidence scores indicate the model’s certainty about object presence and prediction accuracy. As shown in Figure 1, the YOLO network typically consists of 24 convolutional layers and two fully connected layers. After the fully connected stage, the model produces an output tensor of dimension $S \times S \times (B \times 5 + C)$. Each tensor element encodes both geometric and semantic information, and the final detection results are obtained through bounding box regression and class probability estimation based on this tensor.
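As a quick sanity check on this dimension, the size of the output tensor can be computed directly; the values S = 7, B = 2, and C = 20 below are the classic YOLOv1 settings used purely for illustration, not this paper's single-class configuration.

```python
def yolo_output_size(S: int, B: int, C: int) -> int:
    """Number of elements in the S x S x (B * 5 + C) output tensor:
    each of the B boxes carries 4 coordinates plus 1 confidence score,
    and each cell additionally predicts C class probabilities."""
    return S * S * (B * 5 + C)

# Classic YOLOv1 settings (illustrative): 7 x 7 x 30 = 1470 elements.
print(yolo_output_size(7, 2, 20))  # 1470
```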
2.2. Loss Function of YOLOv8
In deep learning-based object detection, a loss function serves as a quantitative measure of the difference between predicted outputs and ground-truth labels. In the YOLO framework, loss functions are used to optimize bounding box localization, object classification, and confidence estimation. The overall detection performance of the model strongly depends on how well these loss components can guide training, especially under challenging conditions such as partial occlusion or deformation [18]. The loss function guides the optimization process by penalizing inaccurate predictions and rewarding correct ones, ensuring that the model learns meaningful representations of the target objects. Earlier research has recognized that careful design and configuration of loss functions in YOLO variants significantly influence model learning and robustness [19]. Hence, an effective loss function directly affects both convergence speed and detection performance, especially in complex visual tasks.
For YOLOv8, the total loss consists of two major components: bounding box loss and classification loss. For bounding box loss, YOLOv8 uses the complete intersection over union (CIoU) [20] and distribution focal loss (DFL) [21] functions, and for classification loss, it employs binary cross-entropy (BCE) [22]. These loss functions enhance object detection performance, especially when dealing with smaller objects. The combination of these complementary terms allows YOLOv8 to achieve high localization precision and robustness in identifying solar panels under complex conditions.
The total loss function of YOLOv8 can be expressed as
$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{CIoU}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{DFL}} + \lambda_{\text{cls}} \mathcal{L}_{\text{BCE}}, \tag{1}$$
where $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ are weighting factors which are predefined in Ultralytics [23]. These coefficients balance the relative contributions of each term to ensure stable and efficient training across various datasets.
The CIoU loss measures the geometric difference between the predicted and ground-truth bounding boxes, defined as
$$\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \tag{2}$$
where $b$ and $b^{gt}$ represent the predicted and ground-truth bounding boxes, respectively. IoU denotes the intersection over union between the two boxes, and $\rho(b, b^{gt})$ is the Euclidean distance between their center points. The term $c$ is the diagonal length of the smallest enclosing box that contains both $b$ and $b^{gt}$. The variable $v$ measures the difference in aspect ratios and is given by
$$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2, \tag{3}$$
where $w$, $h$ and $w^{gt}$, $h^{gt}$ are the widths and heights of the predicted and ground-truth boxes, while $\alpha$ is a positive weighting factor that balances the impact of $v$, defined as
$$\alpha = \frac{v}{(1 - \text{IoU}) + v}. \tag{4}$$
Equation (2) therefore penalizes poor overlap through the IoU term, large center displacement through $\rho^2(b, b^{gt})/c^2$, and inconsistency of the aspect ratio through $\alpha v$, resulting in faster convergence and higher localization accuracy compared to traditional losses based on IoU and GIoU.
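For concreteness, Equations (2)–(4) can be sketched as a plain-Python function for axis-aligned boxes; this is an illustrative reimplementation, not the Ultralytics tensor code.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    x1p, y1p, x2p, y2p = box_p
    x1g, y1g, x2g, y2g = box_g
    # Intersection over union.
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    iou = inter / (area_p + area_g - inter)
    # Squared center distance rho^2 and enclosing-box diagonal c^2.
    rho2 = ((x1p + x2p) / 2 - (x1g + x2g) / 2) ** 2 + \
           ((y1p + y2p) / 2 - (y1g + y2g) / 2) ** 2
    cw = max(x2p, x2g) - min(x1p, x1g)
    ch = max(y2p, y2g) - min(y1p, y1g)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio penalty v and its weight alpha (Equations (3)-(4)).
    v = (4 / math.pi ** 2) * (math.atan((x2g - x1g) / (y2g - y1g)) -
                              math.atan((x2p - x1p) / (y2p - y1p))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give zero loss; any displacement is penalized.
print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```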
The DFL refines bounding box regression by discretizing coordinates into multiple bins and predicting a probability distribution over them,
$$\mathcal{L}_{\text{DFL}} = -\sum_{i} y_i \log p_i, \tag{5}$$
where $p_i$ and $y_i$ are the predicted and target probabilities for the $i$-th bin, respectively. This formulation enables more precise localization, particularly for small or partially occluded solar panels.
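A minimal sketch of Equation (5), assuming the continuous target coordinate is softly assigned to its two neighboring bins as in the original DFL formulation; the bin values below are made-up numbers for illustration.

```python
import math

def dfl(pred_probs, target_probs):
    """Cross-entropy between target and predicted bin distributions,
    as in Equation (5) (illustrative scalar version)."""
    eps = 1e-9  # guard against log(0)
    return -sum(y * math.log(p + eps)
                for p, y in zip(pred_probs, target_probs))

# A continuous target t = 4.3 between bins 4 and 5 is split between
# those two bins with weights (0.7, 0.3); a prediction concentrated on
# the correct bins scores a lower loss than a flat one.
target = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3]
sharp = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3]
flat = [1 / 6] * 6
print(dfl(sharp, target) < dfl(flat, target))  # True
```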
The classification term uses BCE to evaluate the probability of each object class,
$$\mathcal{L}_{\text{BCE}} = -\left[ y \log p + (1 - y) \log(1 - p) \right], \tag{6}$$
where $y$ is the true label (1 for presence and 0 for absence) and $p$ is the predicted confidence. Equation (6) effectively optimizes the accuracy of the model’s classification by penalizing incorrect predictions.
Finally, minimizing $\mathcal{L}_{\text{total}}$ yields the box locating the position and size of the solar panel in the considered photo as well as its prediction probability. Note, however, that the three components of $\mathcal{L}_{\text{total}}$, namely $\mathcal{L}_{\text{CIoU}}$, $\mathcal{L}_{\text{DFL}}$, and $\mathcal{L}_{\text{BCE}}$, are nonlinear and non-convex. Therefore, the resulting optimization problem is also nonlinear and non-convex. In addition, it is a multi-objective optimization problem, making the search for its global optima even more challenging.
To obtain a tractable solution, a fixed set of weights $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ is utilized, and the commonly used stochastic gradient descent (SGD) algorithm is employed each time the above-mentioned optimization problem is solved. Then, the weights $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ are varied, and the optimization problem is solved again. As a result, the Pareto front for the optima of the considered multi-objective optimization problem can be derived.
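The weight-sweep procedure above can be sketched as follows; the objective pairs are hypothetical placeholders standing in for validation losses obtained from actual training runs with different $(\lambda_{\text{box}}, \lambda_{\text{dfl}}, \lambda_{\text{cls}})$ settings.

```python
def pareto_front(points):
    """Return the non-dominated subset of (f1, f2) objective pairs,
    where both objectives are minimized."""
    front = []
    for p in points:
        # p is dominated if some other point is at least as good in
        # both objectives; only non-dominated points are kept.
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

# Hypothetical results: each tuple is (localization error,
# classification error) for one loss-weight setting.
results = [(0.10, 0.30), (0.12, 0.20), (0.20, 0.15), (0.15, 0.25)]
print(pareto_front(results))  # [(0.1, 0.3), (0.12, 0.2), (0.2, 0.15)]
```

The best compromise (e.g., the knee point) can then be picked from the front rather than from all runs.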
2.3. Solar Panel Recognition Framework
To provide a clear and intuitive understanding of the proposed solar panel recognition method, this subsection presents the overall framework of the approach. The complete workflow covers the main stages from data acquisition and dataset construction to model training, real-time solar panel recognition, and extended functionality.
Figure 2 illustrates the overall flowchart of the proposed framework, highlighting the key processing steps and their logical relationships.
2.3.1. Data Acquisition and Dataset Construction
Due to the limited availability of public datasets for solar panels, we created our own dataset by photographing the different solar panels at our disposal. To achieve the goal of identifying flexible solar panels against complex backgrounds, our dataset includes 852 JPG photos of different panels in various settings. The diversity of the constructed dataset is illustrated in Figure 3, which covers variations in lighting conditions, capture environments, and camera distances. Such diversity aims to reflect realistic deployment scenarios in OWPT systems and to reduce the risk of overfitting to a specific environment. Moreover, Figure 4 depicts curved solar panels, and Figure 5 exhibits larger curved and partially shaded solar panels in various complicated environments.
2.3.2. Data Annotation and Augmentation
YOLOv8 officially recommends using Roboflow [
24] for dataset management; hence, it was selected to facilitate data handling throughout the entire workflow, including annotation, pre-processing, augmentation and dataset export. In the annotation section, Roboflow provides an interface for drawing bounding boxes and categorizes labels as solar panels. Furthermore, Roboflow supports flexible dataset versioning and automatic format conversion, making it suitable for YOLOv8 applications, such as in the OWPT system. After annotation, the dataset was divided into a training set and a validation set, with 90% of the images allocated for training and 10% used for validation. Before training, all images were uniformly resized to 640 × 640 pixels to meet YOLOv8 input specifications.
Since the original dataset was relatively small for effective training, four augmentation methods were applied randomly to each selected image to obtain additional images. As indicated in Figure 6, the augmentation methods used are horizontal or vertical flipping, hue adjustment, brightness modification, and noise addition. The specific augmentation types and their corresponding variation ranges are summarized in Table 1.
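A minimal sketch of the four augmentation operations, assuming images are handled as NumPy arrays; the hue shift here is a crude single-channel proxy, and the numeric ranges are placeholders standing in for the actual values of Table 1 (the production pipeline runs inside Roboflow).

```python
import random

import numpy as np

def augment(img: np.ndarray, rng: random.Random) -> np.ndarray:
    """Apply one randomly chosen augmentation to an HxWx3 uint8 image."""
    img = img.astype(np.int16)  # avoid uint8 wrap-around
    op = rng.choice(["flip", "hue", "brightness", "noise"])
    if op == "flip":
        axis = rng.choice([0, 1])  # 0: vertical flip, 1: horizontal flip
        img = np.flip(img, axis=axis)
    elif op == "hue":
        img[..., 0] += rng.randint(-15, 15)  # channel shift as hue proxy
    elif op == "brightness":
        img = img * rng.uniform(0.8, 1.2)
    else:  # additive Gaussian noise
        g = np.random.default_rng(rng.randrange(2**32))
        img = img + g.normal(0.0, 5.0, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)

frame = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy gray image
out = augment(frame, random.Random(42))
print(out.shape, out.dtype)  # (4, 4, 3) uint8
```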
These augmentation approaches help avoid overfitting of the training model, thereby improving its generalization capability. After augmentation, the dataset was increased from 852 to 1610 images, and the labels can be exported as TXT files compatible with the YOLOv8 training process. The solar panel dataset used in this study has been made publicly available on the Roboflow platform to facilitate reproducibility. (Dataset link:
https://app.roboflow.com/saki/solar-cell-p09rs/8 (accessed on 18 January 2026)).
2.3.3. Model Training Based on YOLOv8
Based on the prepared dataset, the YOLOv8 framework is employed to train the solar panel recognition model. The training strategy follows a standard object detection pipeline and aims to optimize the network parameters for accurate and robust solar panel detection, as depicted in Algorithm 1.
| Algorithm 1 Model training based on YOLOv8. |
Require: Annotated solar panel dataset D
Ensure: Trained YOLOv8 model M
1: Provide dataset D to the YOLOv8 training framework
2: Set training hyperparameters (e.g., number of epochs and input image size)
3: while training not converged do
4: Forward propagate input images through the network
5: Compute detection loss using the loss function described in Section 2.2
6: Backpropagate the loss and update network parameters
7: end while
8: Output the trained model M
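Algorithm 1 maps almost directly onto the public Ultralytics Python API; the dataset configuration file name, epoch count, and loss weights below are illustrative values, not the exact settings of this study.

```python
from ultralytics import YOLO

# Sketch of Algorithm 1 using the Ultralytics API.
model = YOLO("yolov8n.pt")       # COCO-pretrained weights
model.train(
    data="solar_panel.yaml",     # hypothetical dataset config (step 1)
    epochs=100,                  # training hyperparameters (step 2)
    imgsz=640,
    box=7.5, cls=0.5, dfl=1.5,   # loss weights of Equation (1)
)
# Steps 3-7 run inside train(); the best weights (trained model M,
# step 8) are saved automatically under runs/detect/train/weights/.
```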
The trained model is subsequently used for real-time solar panel recognition and extended functionality, as described in the following subsections.
2.3.4. Real-Time Solar Panel Detection and Extended Functionality
This subsection describes the real-time solar panel recognition process based on the trained YOLOv8 model, together with an extended functionality for estimating the relative size of detected solar cells. A video stream is continuously captured and processed frame by frame. For each input frame, the trained model performs object detection in real time, and the detection results are further analyzed to identify solar cells and estimate their relative sizes. This process enables real-time recognition and provides additional information for OWPT systems. A summary of this process is provided in Algorithm 2.
| Algorithm 2 Real-time solar panel detection and extended functionality. |
Require: Trained YOLOv8 model M, real-time video stream S
Ensure: Real-time detection results with relative size estimation
1: Initialize video stream S
2: Load trained model M
3: while video stream S is active do
4: Capture an input frame I
5: Perform object detection on I using model M
6: Extract detection results and bounding boxes
7: if no target objects are detected then
8: Output frame with real-time performance indicators
9: continue
10: end if
11: Identify detections corresponding to solar cells
12: Compute the area of each detected solar cell
13: if multiple solar cells are detected then
14: Determine the solar cell with the maximum area
15: Mark the corresponding detection as the larger target
16: end if
17: Output detection results and relative size information in real time
18: end while
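The size-comparison logic of steps 11–16 reduces to a small pure function over the detected bounding boxes, sketched here with hypothetical box coordinates:

```python
def select_largest_panel(boxes):
    """Given detected solar cell boxes as (x1, y1, x2, y2) tuples,
    return (areas, index of the largest box), or (areas, None) when
    fewer than two cells are detected (steps 11-16 of Algorithm 2)."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    if len(boxes) < 2:
        return areas, None  # relative comparison needs >= 2 cells
    return areas, max(range(len(areas)), key=areas.__getitem__)

# Two detected cells: the second one covers the larger area.
areas, largest = select_largest_panel([(0, 0, 10, 10), (20, 20, 50, 40)])
print(areas, largest)  # [100, 600] 1
```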
2.4. Experimental Platform and Training Settings
2.4.1. Hardware and Software Environment
The training process of the presented YOLOv8 model is conducted on a computer running the Windows 11 operating system. The CPU is a 13th Gen Intel(R) Core(TM) i7-13650HX with 16 GB of memory, and the GPU is an NVIDIA GeForce RTX 4060 with 8 GB of video memory. The deep learning software stack is PyTorch 2.6.0 with CUDA 11.8 and cuDNN 9.0.8.
2.4.2. Training Hyperparameter Settings
To ensure experimental reproducibility and provide a clear description of the training environment, the experimental platform and key training settings used in this study are summarized in Table 2. The YOLOv8n model was trained using COCO-pretrained weights, and the main hyperparameters, including the optimizer, learning rate, batch size, input image size, and the loss weights, were configured according to the default and recommended settings of the YOLOv8 framework.
The optimizer was automatically selected by the YOLOv8 framework, and the initial learning rate was set to 0.01 with a warm-up strategy to ensure stable convergence during training. The loss weight coefficients were configured based on the default settings of YOLOv8 and further adjusted in the performance optimization experiments described in Section 3.2.
2.5. Model Evaluation Indicators
To quantitatively evaluate the performance of the proposed YOLOv8-based solar panel recognition model, several standard object detection metrics are employed, including precision (P), recall (R), F1-score, average precision (AP), and mean average precision (mAP). These indicators collectively assess the model’s detection accuracy, completeness, and overall effectiveness.
Precision represents the proportion of correctly identified positive samples among all predicted positives,
$$P = \frac{TP}{TP + FP}, \tag{7}$$
where $TP$ denotes true positives and $FP$ represents false positives. A higher precision value indicates that fewer non-solar panel regions are incorrectly detected as solar panels.
Recall measures the proportion of correctly detected positive samples among all actual positives,
$$R = \frac{TP}{TP + FN}, \tag{8}$$
where $FN$ denotes false negatives. A higher recall value means the model can successfully detect more solar panel targets without omission.
The F1-score combines precision and recall into a single harmonic mean, balancing both detection accuracy and completeness,
$$F1 = \frac{2PR}{P + R}. \tag{9}$$
A higher F1-score indicates that the model achieves a better trade-off between precision and recall.
Average precision is calculated as the area under the precision–recall (P–R) curve,
$$AP = \int_0^1 P(R)\, dR, \tag{10}$$
where $P(R)$ is the precision at a given recall level. AP effectively evaluates how well the model performs across different confidence thresholds.
Finally, mean average precision represents the average AP value over all object classes,
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i, \tag{11}$$
where $N$ is the number of classes. In this work, $N = 1$, corresponding to solar panels. To comprehensively evaluate the detection accuracy and localization precision of the proposed model, multiple standard object detection metrics are adopted. Specifically, both mAP@0.5 and mAP@0.5:0.95 are reported. The mAP@0.5 metric measures detection performance under a relatively loose localization criterion, where a predicted bounding box is considered correct if the intersection over union (IoU) exceeds 0.5. In contrast, mAP@0.5:0.95 provides a more stringent and comprehensive evaluation by averaging mAP values over IoU thresholds ranging from 0.5 to 0.95 with a step size of 0.05. This metric is widely regarded as a more reliable indicator of localization accuracy. Reporting both metrics allows a balanced assessment of detection robustness and spatial precision.
In addition to mAP-based metrics, the average intersection over union (Average IoU) is adopted to further evaluate localization accuracy. IoU measures the spatial overlap between the predicted bounding box and the corresponding ground-truth annotation. The Average IoU is computed by averaging the IoU values over all correctly detected instances in the test set. Compared with mAP, this metric provides a more intuitive assessment of bounding box alignment quality, which is particularly relevant for realistic applications, e.g., OWPT systems, where the transmitter–receiver alignment depends on the precise localization of the receiver.
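A sketch of the per-instance IoU computation underlying the Average IoU metric, using hypothetical prediction/ground-truth box pairs:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Average IoU over matched prediction/ground-truth pairs: a perfect
# match scores 1.0; a half-overlapping prediction scores 1/3.
pairs = [((0, 0, 10, 10), (0, 0, 10, 10)), ((0, 0, 10, 10), (5, 0, 15, 10))]
avg_iou = sum(iou(p, g) for p, g in pairs) / len(pairs)
print(avg_iou)  # (1.0 + 1/3) / 2, i.e. about 0.667
```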
4. Conclusions and Outlook
4.1. Conclusions
This study presents a deep learning-based method to recognize flexible solar panels embedded in complicated environments. Employing the YOLOv8 object detection framework, the proposed approach achieves accurate and robust recognition of real flexible solar panels, including those under bent and partially shaded scenarios, which are often challenging for traditional image processing methods. In addition, this study further improves the detection accuracy by investigating the Pareto front to find the best set of loss weights. The introduced model achieves a high detection accuracy together with a precise localization performance, reaching an mAP@0.5 of 99.4% and an mAP@0.5:0.95 of 90.4% with a real-time inference performance of 32 FPS. Moreover, an extended real-time functionality is introduced, enabling the model to identify the sizes of detected solar cells, hence providing additional information for realistic applications, e.g., the adaptive OWPT alignment. All of the above-mentioned features demonstrate the high capability of the proposed deep learning approach for practical deployment in dynamic realistic environments.
4.2. Outlook
Although the presented deep learning model has very good recognition performance, several drawbacks still exist. First, the dataset used in this study is relatively small, which may limit generalization to more diverse environmental conditions, largely due to the lack of publicly available large-scale solar panel datasets. Second, the performance of the extended real-time functionality is limited for curved solar cells, since the relative size estimation is based purely on the bounding box area and is independent of the solar panel shape. In addition, although the proposed YOLOv8-based framework achieves real-time inference in the current experimental setup, its performance on resource-constrained or edge devices may be affected by hardware limitations. Hence, future work will expand the dataset scale, further improve the extended real-time recognition functionality, and explore model optimization strategies to enhance deployment efficiency on embedded hardware platforms.
It should also be noted that geometric distortions such as perspective transformation and in-plane rotation were not explicitly included in the data augmentation process. This design choice was made because the primary variability of flexible solar panels arises from the physical deformation (e.g., bending and partial curvature) rather than arbitrary rotations. Moreover, the image acquisition setup in realistic contexts, e.g., OWPT systems and drone-assisted inspection, usually maintains a relatively constrained viewing angle, reducing the necessity of aggressive geometric augmentation. Nevertheless, incorporating perspective and rotation-based augmentation remains a promising direction for future work to further improve the proposed approach’s robustness under more unconstrained deployment conditions.
Moreover, the current study focuses on single-class recognition of flexible solar panels, but the proposed framework is inherently extensible to multi-class detection. YOLOv8 naturally supports multi-class learning by introducing additional object categories during dataset annotation and training, without requiring changes to the network architecture. In practical deployment scenarios, visually similar objects such as metallic plates or reflective surfaces may act as hard-negative samples and potentially cause false detections. These confusing background objects can be explicitly incorporated as negative classes to improve discrimination capability through hard-negative mining.
In addition, the potential risk of overfitting should be carefully discussed, particularly given the single-class nature of the current recognition task. Although data augmentation and diverse acquisition conditions (Figure 3) were adopted to improve robustness, the absence of explicit cross-scene or cross-environment validation may still limit the generalization ability of the model when deployed in unseen scenarios. This risk is further amplified by the relatively limited dataset size, which may bias the model toward scene-specific visual patterns rather than intrinsic panel features. Nevertheless, the consistent convergence behavior and stable performance across the training and validation sets indicate that severe overfitting is unlikely in the current setting. In future work, cross-scene evaluation protocols and leave-one-environment-out validation strategies will be introduced to assess generalization capability under unseen deployment conditions.
Beyond panel-level detection, recent studies have explored more fine-grained photovoltaic analysis using deep learning, such as fault inspection, cell-level segmentation, and performance estimation. Representative examples include segmentation-oriented frameworks such as SEiPV-Net [25], which demonstrate the effectiveness of deep neural networks in extracting structural and semantic information from photovoltaic modules. While the present work deliberately focuses on robust and real-time panel-level recognition, these segmentation-based approaches highlight promising directions for future extensions.