1. Introduction
Solar power generation is well known as a key component among clean energy generation approaches for reducing greenhouse gas (GHG) emissions in the power and energy sector, thereby contributing to the global fight against climate change. In solar farms, solar panels accumulate dust and are partially shaded by objects in the surrounding open environment, which can significantly reduce their energy conversion efficiency. Thus, regularly scheduled cleaning tasks are needed, which are often costly. To ease such operation and maintenance costs, AI-equipped devices, e.g., drones or robots, are being employed to reduce labor costs. This, however, introduces another challenge: solar panels must be recognized precisely to save spraying water and to extend the working time of battery-powered drones and robots.
On the other hand, smaller-scale solar panels are attached to Internet of Things (IoT) and consumer electronic devices for harvesting artificial light. For example, in the recently emerging application of optical wireless power transfer (OWPT), solar panels serve as energy receivers, while LEDs or lasers serve as energy transmitters. This type of system has been employed in implanted and wearable devices [1] and in future EV wireless charging [2]. It can also be generalized to the concept of thing-to-thing OWPT systems, providing a sustainable and green energy technology for society [3,4]. In such systems, the overall system efficiency depends strongly and directly on the alignment between optical transmitters and solar panel receivers; see, e.g., [5]. Recent studies have demonstrated the feasibility of high-power and compact OWPT systems for practical IoT applications, and the performance of OWPT systems is highly dependent on accurate alignment and system optimization [6,7]. Furthermore, in realistic contexts, solar panels are subject to partial shading caused by dynamic changes in the surrounding environment. A challenge therefore arises in precisely recognizing and distinguishing solar panels from other objects in complex environments.
To cope with the challenge mentioned above, the research in [8] conducted a system-level analysis of beam alignment in OWPT systems, focusing on the interaction between optical transmitters and photovoltaic receivers, and proposed strategies to optimize beam design and receiver placement to increase alignment tolerance and overall power efficiency. Kang et al. [9] proposed a hybrid target detection and perturbation observation method to improve alignment accuracy in dynamic OWPT systems. Their approach continuously adjusts the transmitter orientation based on feedback from the receiver, enabling high-precision automatic alignment without requiring additional optical markers, and experimental results demonstrated enhanced energy conversion efficiency under varying conditions.
Meanwhile, a few studies have explored deep learning models to detect solar panels. Traditional image processing techniques often rely on hand-crafted features and struggle to handle deformable, occluded, or dynamically moving targets against complex backgrounds [10,11]. In contrast, deep learning methods offer several advantages: automatic feature extraction, strong generalization under varying illumination and in complex environments, high detection speed once trained, and broad application potential in medical imaging and other domains [12]. Such capabilities make deep learning particularly suitable for visual recognition tasks where traditional methods cannot provide reliable solutions.
Building on these advantages, the YOLO (You Only Look Once) series, specifically YOLOv8, has been selected for this paper due to its high precision, real-time performance, and robustness in detecting small, occluded, or deformable objects. By integrating YOLOv8, solar panel recognition has the potential to perform more effectively, facilitating adaptive alignment and improved energy transfer in OWPT systems in challenging operational conditions.
Accordingly, the contributions of the current study are as follows.
A deep learning method based on the YOLOv8 object detection framework is proposed for recognizing flexible solar panels under challenging conditions such as bending and partial shading, which are difficult to address using traditional image processing techniques.
The loss function is minimized by solving a multi-objective, non-convex, nonlinear optimization problem in which the loss weights are varied. As such, the Pareto front of optima can be obtained, based on which the best set of parameters can be derived.
The proposed method achieves real-time inference capability while maintaining high recognition accuracy, demonstrating its high potential for practical deployment in dynamic environments.
An extended real-time functionality is introduced to estimate the relative size of multiple detected solar cells, which provides valuable information for practical scenarios in which the largest solar panel should be selected.
The remainder of this paper is organized as follows.
Section 2 introduces the proposed method, including the YOLO network architecture, the loss function of YOLOv8, the solar panel recognition framework, the experimental platform, the training settings, and the model evaluation indicators.
Section 3 presents the experimental results, including training performance and convergence analysis, performance optimization via loss weight adjustment, real-time detection, and extended functionality, followed by a detailed discussion. Finally, the conclusions and future research directions are summarized in Section 4.
2. Methods
To effectively recognize and localize solar panels in complicated realistic environments, including OWPT systems, a fast and reliable detection method is required. YOLO’s fully convolutional design allows it to process high-resolution images efficiently. Among the various YOLO versions, YOLOv8 is selected in this study due to its favorable balance between detection accuracy and architectural flexibility. Compared with earlier YOLO versions, the architectural improvements of YOLOv8, including anchor-free detection heads [13], multi-scale feature fusion [14], and a lightweight design, make it suitable for deployment in realistic systems. These characteristics are particularly suitable for recognizing flexible solar panels that may appear bent or partially occluded. In addition, YOLOv8 provides a mature and stable implementation with efficient training and deployment support, making it well suited for real-time applications in dynamic environments.
2.1. YOLO Network Architecture
The YOLO series is built on a convolutional neural network (CNN) framework, which extracts and learns hierarchical features from input images through successive convolution, normalization, and pooling operations. In fact, recent work has demonstrated that modern CNNs can effectively learn and exploit relationships among hierarchical deep features without explicit prior knowledge, enabling more discriminative representations in complex recognition tasks [15]. The CNN structure enables the model to capture both primitive visual features (such as edges and textures) and semantic features (such as object shapes and categories). Recent surveys have demonstrated that CNNs remain a cornerstone in computer vision due to their strong feature learning capabilities and efficient end-to-end training [16]. In the architecture of YOLO, the image is passed through a series of convolutional and activation layers to encode spatial and contextual information efficiently. This fully convolutional design allows YOLO to perform object localization and classification simultaneously in a single forward pass, resulting in real-time detection performance.
YOLO algorithms divide each input image into an $S \times S$ grid, where each grid cell is responsible for predicting objects whose centers fall within it. Every cell outputs $B$ bounding boxes with confidence scores and $C$ class probabilities [17]. The bounding boxes represent the predicted position and dimensions of potential objects, while the confidence scores indicate the model’s certainty about object presence and prediction accuracy. As shown in Figure 1, the YOLO network typically consists of 24 convolutional layers and two fully connected layers. After the fully connected stage, the model produces an output tensor of dimension $S \times S \times (B \times 5 + C)$. Each tensor element encodes both geometric and semantic information, and the final detection results are obtained through bounding box regression and class probability estimation based on this tensor.
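As a quick sanity check on this dimension, the size of the output tensor can be computed directly; the values S = 7, B = 2, and C = 20 below are the classic YOLOv1 settings used purely for illustration, not this paper's single-class configuration.

```python
def yolo_output_size(S: int, B: int, C: int) -> int:
    """Number of elements in the S x S x (B * 5 + C) output tensor:
    each of the B boxes carries 4 coordinates plus 1 confidence score,
    and each cell additionally predicts C class probabilities."""
    return S * S * (B * 5 + C)

# Classic YOLOv1 settings (illustrative): 7 x 7 x 30 = 1470 elements.
print(yolo_output_size(7, 2, 20))  # 1470
```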
2.2. Loss Function of YOLOv8
In deep learning-based object detection, a loss function serves as a quantitative measure of the difference between predicted outputs and ground-truth labels. In the YOLO framework, loss functions are used to optimize bounding box localization, object classification, and confidence estimation. The overall detection performance of the model strongly depends on how well these loss components can guide training, especially under challenging conditions such as partial occlusion or deformation [18]. The loss function guides the optimization process by penalizing inaccurate predictions and rewarding correct ones, ensuring that the model learns meaningful representations of the target objects. Earlier research has recognized that careful design and configuration of loss functions in YOLO variants significantly influence model learning and robustness [19]. Hence, an effective loss function directly affects both convergence speed and detection performance, especially in complex visual tasks.
For YOLOv8, the total loss consists of two major components: bounding box loss and classification loss. For bounding box loss, YOLOv8 uses the complete intersection over union (CIoU) [20] and distribution focal loss (DFL) [21] functions, and for classification loss, it employs binary cross-entropy (BCE) [22]. These loss functions enhance object detection performance, especially when dealing with smaller objects. The combination of these complementary terms allows YOLOv8 to achieve high localization precision and robustness in identifying solar panels under complex conditions.
The total loss function of YOLOv8 can be expressed as
$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{CIoU}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{DFL}} + \lambda_{\text{cls}} \mathcal{L}_{\text{BCE}}, \tag{1}$$
where $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ are weighting factors which are predefined in Ultralytics [23]. These coefficients balance the relative contributions of each term to ensure stable and efficient training across various datasets.
The CIoU loss measures the geometric difference between the predicted and ground-truth bounding boxes, defined as
$$\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v, \tag{2}$$
where $b$ and $b^{gt}$ represent the predicted and ground-truth bounding boxes, respectively. IoU denotes the intersection over union between the two boxes, and $\rho(b, b^{gt})$ is the Euclidean distance between their center points. The term $c$ is the diagonal length of the smallest enclosing box that contains both $b$ and $b^{gt}$. The variable $v$ measures the difference in aspect ratios and is given by
$$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2, \tag{3}$$
where $w$, $h$ and $w^{gt}$, $h^{gt}$ are the widths and heights of the predicted and ground-truth boxes, while $\alpha$ is a positive weighting factor that balances the impact of $v$, defined as
$$\alpha = \frac{v}{(1 - \text{IoU}) + v}. \tag{4}$$
Equation (2) therefore penalizes poor overlap through the IoU term, large center displacement through $\rho^2(b, b^{gt})/c^2$, and inconsistency of the aspect ratio through $\alpha v$, resulting in faster convergence and higher localization accuracy compared to traditional losses based on IoU and GIoU.
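For concreteness, Equations (2)–(4) can be sketched as a plain-Python function for axis-aligned boxes; this is an illustrative reimplementation, not the Ultralytics tensor code.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    x1p, y1p, x2p, y2p = box_p
    x1g, y1g, x2g, y2g = box_g
    # Intersection over union.
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    iou = inter / (area_p + area_g - inter)
    # Squared center distance rho^2 and enclosing-box diagonal c^2.
    rho2 = ((x1p + x2p) / 2 - (x1g + x2g) / 2) ** 2 + \
           ((y1p + y2p) / 2 - (y1g + y2g) / 2) ** 2
    cw = max(x2p, x2g) - min(x1p, x1g)
    ch = max(y2p, y2g) - min(y1p, y1g)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio penalty v and its weight alpha (Equations (3)-(4)).
    v = (4 / math.pi ** 2) * (math.atan((x2g - x1g) / (y2g - y1g)) -
                              math.atan((x2p - x1p) / (y2p - y1p))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give zero loss; any displacement is penalized.
print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```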
The DFL refines bounding box regression by discretizing coordinates into multiple bins and predicting a probability distribution over them,
$$\mathcal{L}_{\text{DFL}} = -\sum_{i} y_i \log p_i, \tag{5}$$
where $p_i$ and $y_i$ are the predicted and target probabilities for the $i$-th bin, respectively. This formulation enables more precise localization, particularly for small or partially occluded solar panels.
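A minimal sketch of Equation (5), assuming the continuous target coordinate is softly assigned to its two neighboring bins as in the original DFL formulation; the bin values below are made-up numbers for illustration.

```python
import math

def dfl(pred_probs, target_probs):
    """Cross-entropy between target and predicted bin distributions,
    as in Equation (5) (illustrative scalar version)."""
    eps = 1e-9  # guard against log(0)
    return -sum(y * math.log(p + eps)
                for p, y in zip(pred_probs, target_probs))

# A continuous target t = 4.3 between bins 4 and 5 is split between
# those two bins with weights (0.7, 0.3); a prediction concentrated on
# the correct bins scores a lower loss than a flat one.
target = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3]
sharp = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3]
flat = [1 / 6] * 6
print(dfl(sharp, target) < dfl(flat, target))  # True
```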
The classification term uses BCE to evaluate the probability of each object class,
$$\mathcal{L}_{\text{BCE}} = -\left[ y \log p + (1 - y) \log(1 - p) \right], \tag{6}$$
where $y$ is the true label (1 for presence and 0 for absence) and $p$ is the predicted confidence. Equation (6) effectively optimizes the accuracy of the model’s classification by penalizing incorrect predictions.
Finally, minimizing $\mathcal{L}_{\text{total}}$ yields the box locating the position and size of the solar panel in the considered photo as well as its prediction probability. Note, however, that the three components of $\mathcal{L}_{\text{total}}$, namely $\mathcal{L}_{\text{CIoU}}$, $\mathcal{L}_{\text{DFL}}$, and $\mathcal{L}_{\text{BCE}}$, are nonlinear and non-convex. Therefore, the resulting optimization problem is also nonlinear and non-convex. In addition, it is a multi-objective optimization problem, making the search for its global optima even more challenging.
To obtain a tractable solution, a fixed set of weights $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ is utilized, and the commonly used stochastic gradient descent (SGD) algorithm is employed each time the above-mentioned optimization problem is solved. Then, the weights $\lambda_{\text{box}}$, $\lambda_{\text{dfl}}$, and $\lambda_{\text{cls}}$ are varied, and the optimization problem is solved again. As a result, the Pareto front for the optima of the considered multi-objective optimization problem can be derived.
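The weight-sweep procedure above can be sketched as follows; the objective pairs are hypothetical placeholders standing in for validation losses obtained from actual training runs with different $(\lambda_{\text{box}}, \lambda_{\text{dfl}}, \lambda_{\text{cls}})$ settings.

```python
def pareto_front(points):
    """Return the non-dominated subset of (f1, f2) objective pairs,
    where both objectives are minimized."""
    front = []
    for p in points:
        # p is dominated if some other point is at least as good in
        # both objectives; only non-dominated points are kept.
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

# Hypothetical results: each tuple is (localization error,
# classification error) for one loss-weight setting.
results = [(0.10, 0.30), (0.12, 0.20), (0.20, 0.15), (0.15, 0.25)]
print(pareto_front(results))  # [(0.1, 0.3), (0.12, 0.2), (0.2, 0.15)]
```

The best compromise (e.g., the knee point) can then be picked from the front rather than from all runs.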
2.3. Solar Panel Recognition Framework
To provide a clear and intuitive understanding of the proposed solar panel recognition method, this subsection presents the overall framework of the approach. The complete workflow covers the main stages from data acquisition and dataset construction to model training, real-time solar panel recognition, and extended functionality.
Figure 2 illustrates the overall flowchart of the proposed framework, highlighting the key processing steps and their logical relationships.
2.3.1. Data Acquisition and Dataset Construction
Due to the limited availability of public datasets for solar panels, we created our own dataset by photographing the different solar panels at our disposal. To achieve the goal of identifying flexible solar panels against complex backgrounds, our dataset includes 852 JPG photos of different panels in various settings. The diversity of the constructed dataset is illustrated in Figure 3, which covers variations in lighting conditions, capture environments, and camera distances. Such diversity aims to reflect realistic deployment scenarios in OWPT systems and to reduce the risk of overfitting to a specific environment. Moreover, Figure 4 depicts curved solar panels, and Figure 5 exhibits larger curved and partially shaded solar panels in various complicated environments.
2.3.2. Data Annotation and Augmentation
YOLOv8 officially recommends using Roboflow [
24] for dataset management; hence, it was selected to facilitate data handling throughout the entire workflow, including annotation, pre-processing, augmentation and dataset export. In the annotation section, Roboflow provides an interface for drawing bounding boxes and categorizes labels as solar panels. Furthermore, Roboflow supports flexible dataset versioning and automatic format conversion, making it suitable for YOLOv8 applications, such as in the OWPT system. After annotation, the dataset was divided into a training set and a validation set, with 90% of the images allocated for training and 10% used for validation. Before training, all images were uniformly resized to 640 × 640 pixels to meet YOLOv8 input specifications.
Since the original dataset was relatively small for effective training, four augmentation methods were applied randomly to each selected image to obtain additional images. As indicated in Figure 6, the augmentation methods used are horizontal or vertical flipping, hue adjustment, brightness modification, and noise addition. The specific augmentation types and their corresponding variation ranges are summarized in Table 1.
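A minimal sketch of the four augmentation operations, assuming images are handled as NumPy arrays; the hue shift here is a crude single-channel proxy, and the numeric ranges are placeholders standing in for the actual values of Table 1 (the production pipeline runs inside Roboflow).

```python
import random

import numpy as np

def augment(img: np.ndarray, rng: random.Random) -> np.ndarray:
    """Apply one randomly chosen augmentation to an HxWx3 uint8 image."""
    img = img.astype(np.int16)  # avoid uint8 wrap-around
    op = rng.choice(["flip", "hue", "brightness", "noise"])
    if op == "flip":
        axis = rng.choice([0, 1])  # 0: vertical flip, 1: horizontal flip
        img = np.flip(img, axis=axis)
    elif op == "hue":
        img[..., 0] += rng.randint(-15, 15)  # channel shift as hue proxy
    elif op == "brightness":
        img = img * rng.uniform(0.8, 1.2)
    else:  # additive Gaussian noise
        g = np.random.default_rng(rng.randrange(2**32))
        img = img + g.normal(0.0, 5.0, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)

frame = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy gray image
out = augment(frame, random.Random(42))
print(out.shape, out.dtype)  # (4, 4, 3) uint8
```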
These augmentation approaches help avoid overfitting of the training model, thereby improving its generalization capability. After augmentation, the dataset was increased from 852 to 1610 images, and the labels can be exported as TXT files compatible with the YOLOv8 training process. The solar panel dataset used in this study has been made publicly available on the Roboflow platform to facilitate reproducibility. (Dataset link:
https://app.roboflow.com/saki/solar-cell-p09rs/8 (accessed on 18 January 2026)).
2.3.3. Model Training Based on YOLOv8
Based on the prepared dataset, the YOLOv8 framework is employed to train the solar panel recognition model. The training strategy follows a standard object detection pipeline and aims to optimize the network parameters for accurate and robust solar panel detection, as depicted in Algorithm 1.
| Algorithm 1 Model training based on YOLOv8. |
Require: Annotated solar panel dataset D
Ensure: Trained YOLOv8 model M
1: Provide dataset D to the YOLOv8 training framework
2: Set training hyperparameters (e.g., number of epochs and input image size)
3: while training not converged do
4: Forward propagate input images through the network
5: Compute detection loss using the loss function described in Section 2.2
6: Backpropagate the loss and update network parameters
7: end while
8: Output the trained model M
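Algorithm 1 maps almost directly onto the public Ultralytics Python API; the dataset configuration file name, epoch count, and loss weights below are illustrative values, not the exact settings of this study.

```python
from ultralytics import YOLO

# Sketch of Algorithm 1 using the Ultralytics API.
model = YOLO("yolov8n.pt")       # COCO-pretrained weights
model.train(
    data="solar_panel.yaml",     # hypothetical dataset config (step 1)
    epochs=100,                  # training hyperparameters (step 2)
    imgsz=640,
    box=7.5, cls=0.5, dfl=1.5,   # loss weights of Equation (1)
)
# Steps 3-7 run inside train(); the best weights (trained model M,
# step 8) are saved automatically under runs/detect/train/weights/.
```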
The trained model is subsequently used for real-time solar panel recognition and extended functionality, as described in the following subsections.
2.3.4. Real-Time Solar Panel Detection and Extended Functionality
This subsection describes the real-time solar panel recognition process based on the trained YOLOv8 model, together with an extended functionality for estimating the relative size of detected solar cells. A video stream is continuously captured and processed frame by frame. For each input frame, the trained model performs object detection in real time, and the detection results are further analyzed to identify solar cells and estimate their relative sizes. This process enables real-time recognition and provides additional information for OWPT systems. A summary of this process is provided in Algorithm 2.
| Algorithm 2 Real-time solar panel detection and extended functionality. |
Require: Trained YOLOv8 model M, real-time video stream S
Ensure: Real-time detection results with relative size estimation
1: Initialize video stream S
2: Load trained model M
3: while video stream S is active do
4: Capture an input frame I
5: Perform object detection on I using model M
6: Extract detection results and bounding boxes
7: if no target objects are detected then
8: Output frame with real-time performance indicators
9: continue
10: end if
11: Identify detections corresponding to solar cells
12: Compute the area of each detected solar cell
13: if multiple solar cells are detected then
14: Determine the solar cell with the maximum area
15: Mark the corresponding detection as the larger target
16: end if
17: Output detection results and relative size information in real time
18: end while
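The size-comparison logic of steps 11–16 reduces to a small pure function over the detected bounding boxes, sketched here with hypothetical box coordinates:

```python
def select_largest_panel(boxes):
    """Given detected solar cell boxes as (x1, y1, x2, y2) tuples,
    return (areas, index of the largest box), or (areas, None) when
    fewer than two cells are detected (steps 11-16 of Algorithm 2)."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    if len(boxes) < 2:
        return areas, None  # relative comparison needs >= 2 cells
    return areas, max(range(len(areas)), key=areas.__getitem__)

# Two detected cells: the second one covers the larger area.
areas, largest = select_largest_panel([(0, 0, 10, 10), (20, 20, 50, 40)])
print(areas, largest)  # [100, 600] 1
```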
2.4. Experimental Platform and Training Settings
2.4.1. Hardware and Software Environment
The training process of the presented YOLOv8 model is conducted on a computer running the Windows 11 operating system. The CPU is a 13th Gen Intel(R) Core(TM) i7-13650HX with 16 GB of memory, and the GPU is an NVIDIA GeForce RTX 4060 with 8 GB of video memory. The deep learning software stack is PyTorch 2.6.0 with CUDA 11.8 and cuDNN 9.0.8.
2.4.2. Training Hyperparameter Settings
To ensure experimental reproducibility and provide a clear description of the training environment, the experimental platform and key training settings used in this study are summarized in Table 2. The YOLOv8n model was trained using COCO-pretrained weights, and the main hyperparameters, including the optimizer, learning rate, batch size, input image size, and the loss weights, were configured according to the default and recommended settings of the YOLOv8 framework.
The optimizer was automatically selected by the YOLOv8 framework, and the initial learning rate was set to 0.01 with a warm-up strategy to ensure stable convergence during training. The loss weight coefficients were configured based on the default settings of YOLOv8 and further adjusted in the performance optimization experiments described in Section 3.2.
2.5. Model Evaluation Indicators
To quantitatively evaluate the performance of the proposed YOLOv8-based solar panel recognition model, several standard object detection metrics are employed, including precision (P), recall (R), F1-score, average precision (AP), and mean average precision (mAP). These indicators collectively assess the model’s detection accuracy, completeness, and overall effectiveness.
Precision represents the proportion of correctly identified positive samples among all predicted positives,
$$P = \frac{TP}{TP + FP}, \tag{7}$$
where $TP$ denotes true positives and $FP$ represents false positives. A higher precision value indicates that fewer non-solar panel regions are incorrectly detected as solar panels.
Recall measures the proportion of correctly detected positive samples among all actual positives,
$$R = \frac{TP}{TP + FN}, \tag{8}$$
where $FN$ denotes false negatives. A higher recall value means the model can successfully detect more solar panel targets without omission.
The F1-score combines precision and recall into a single harmonic mean, balancing both detection accuracy and completeness,
$$F1 = \frac{2PR}{P + R}. \tag{9}$$
A higher F1-score indicates that the model achieves a better trade-off between precision and recall.
Average precision is calculated as the area under the precision–recall (P–R) curve,
$$AP = \int_0^1 P(R)\, dR, \tag{10}$$
where $P(R)$ is the precision at a given recall level. AP effectively evaluates how well the model performs across different confidence thresholds.
Finally, mean average precision represents the average AP value over all object classes,
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i, \tag{11}$$
where $N$ is the number of classes. In this work, $N = 1$, corresponding to solar panels. To comprehensively evaluate the detection accuracy and localization precision of the proposed model, multiple standard object detection metrics are adopted. Specifically, both mAP@0.5 and mAP@0.5:0.95 are reported. The mAP@0.5 metric measures detection performance under a relatively loose localization criterion, where a predicted bounding box is considered correct if the intersection over union (IoU) exceeds 0.5. In contrast, mAP@0.5:0.95 provides a more stringent and comprehensive evaluation by averaging mAP values over IoU thresholds ranging from 0.5 to 0.95 with a step size of 0.05. This metric is widely regarded as a more reliable indicator of localization accuracy. Reporting both metrics allows a balanced assessment of detection robustness and spatial precision.
In addition to mAP-based metrics, the average intersection over union (Average IoU) is adopted to further evaluate localization accuracy. IoU measures the spatial overlap between the predicted bounding box and the corresponding ground-truth annotation. The Average IoU is computed by averaging the IoU values over all correctly detected instances in the test set. Compared with mAP, this metric provides a more intuitive assessment of bounding box alignment quality, which is particularly relevant for realistic applications, e.g., OWPT systems, where the transmitter–receiver alignment depends on the precise localization of the receiver.
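A sketch of the per-instance IoU computation underlying the Average IoU metric, using hypothetical prediction/ground-truth box pairs:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Average IoU over matched prediction/ground-truth pairs: a perfect
# match scores 1.0; a half-overlapping prediction scores 1/3.
pairs = [((0, 0, 10, 10), (0, 0, 10, 10)), ((0, 0, 10, 10), (5, 0, 15, 10))]
avg_iou = sum(iou(p, g) for p, g in pairs) / len(pairs)
print(avg_iou)  # (1.0 + 1/3) / 2, i.e. about 0.667
```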
4. Conclusions and Outlook
4.1. Conclusions
This study presents a deep learning-based method to recognize flexible solar panels embedded in complicated environments. Employing the YOLOv8 object detection framework, the proposed approach achieves accurate and robust recognition of real flexible solar panels, including those under bent and partially shaded scenarios, which are often challenging for traditional image processing methods. In addition, this study further improves the detection accuracy by investigating the Pareto front to find the best set of loss weights. The introduced model achieves a high detection accuracy together with a precise localization performance, reaching an mAP@0.5 of 99.4% and an mAP@0.5:0.95 of 90.4% with a real-time inference performance of 32 FPS. Moreover, an extended real-time functionality is introduced, enabling the model to identify the sizes of detected solar cells, hence providing additional information for realistic applications, e.g., the adaptive OWPT alignment. All of the above-mentioned features demonstrate the high capability of the proposed deep learning approach for practical deployment in dynamic realistic environments.
4.2. Outlook
Although the presented deep learning model has very good recognition performance, several drawbacks still exist. First, the dataset used in this study is relatively small, which may limit generalization to more diverse environmental conditions, largely due to the lack of publicly available large-scale solar panel datasets. Second, the performance of the extended real-time functionality is limited for curved solar cells, since the relative size estimation is based purely on the bounding box area and is independent of the solar panel shape. In addition, although the proposed YOLOv8-based framework achieves real-time inference in the current experimental setup, its performance on resource-constrained or edge devices may be affected by hardware limitations. Hence, future work will expand the dataset scale, further improve the extended real-time recognition functionality, and explore model optimization strategies to enhance deployment efficiency on embedded hardware platforms.
It should also be noted that geometric distortions such as perspective transformation and in-plane rotation were not explicitly included in the data augmentation process. This design choice was made because the primary variability of flexible solar panels arises from the physical deformation (e.g., bending and partial curvature) rather than arbitrary rotations. Moreover, the image acquisition setup in realistic contexts, e.g., OWPT systems and drone-assisted inspection, usually maintains a relatively constrained viewing angle, reducing the necessity of aggressive geometric augmentation. Nevertheless, incorporating perspective and rotation-based augmentation remains a promising direction for future work to further improve the proposed approach’s robustness under more unconstrained deployment conditions.
Moreover, the current study focuses on single-class recognition of flexible solar panels, but the proposed framework is inherently extensible to multi-class detection. YOLOv8 naturally supports multi-class learning by introducing additional object categories during dataset annotation and training, without requiring changes to the network architecture. In practical deployment scenarios, visually similar objects such as metallic plates or reflective surfaces may act as hard-negative samples and potentially cause false detections. These confusing background objects can be explicitly incorporated as negative classes to improve discrimination capability through hard-negative mining.
In addition, the potential risk of overfitting should be carefully discussed, particularly given the single-class nature of the current recognition task. Although data augmentation and diverse acquisition conditions (Figure 3) were adopted to improve robustness, the absence of explicit cross-scene or cross-environment validation may still limit the generalization ability of the model when deployed in unseen scenarios. This risk is further amplified by the relatively limited dataset size, which may bias the model toward scene-specific visual patterns rather than intrinsic panel features. Nevertheless, the consistent convergence behavior and stable performance across the training and validation sets indicate that severe overfitting is unlikely in the current setting. In future work, cross-scene evaluation protocols and leave-one-environment-out validation strategies will be introduced to assess generalization capability under unseen deployment conditions.
Beyond panel-level detection, recent studies have explored more fine-grained photovoltaic analysis using deep learning, such as fault inspection, cell-level segmentation, and performance estimation. Representative examples include segmentation-oriented frameworks such as SEiPV-Net [25], which demonstrate the effectiveness of deep neural networks in extracting structural and semantic information from photovoltaic modules. While the present work deliberately focuses on robust and real-time panel-level recognition, these segmentation-based approaches highlight promising directions for future extensions.