1. Introduction
The global transition to digital infrastructure has resulted in a burgeoning crisis of electronic waste (e-waste), projected to reach 74 million metric tons annually by 2030 [1]. Semiconductor e-waste, specifically printed circuit boards (PCBs) and integrated circuits (ICs), represents a concentrated source of critical raw materials, including gold, silver, palladium, and rare earth elements [2]. Efficient material recovery from these sources is hampered by the heterogeneity, minute size, and occluded nature of components embedded in the waste stream. Traditional sorting relies heavily on manual labor or bulk chemical processing, both of which are environmentally intensive, hazardous, and economically inefficient due to low material purity post-sorting.
Early efforts in automated e-waste sorting relied on techniques such as X-ray fluorescence (XRF) and near-infrared (NIR) spectroscopy. While effective for elemental analysis, these techniques are typically slow and struggle to achieve the spatial resolution required for small components. The integration of computer vision has therefore become essential. Recent research has explored various deep learning models, including R-CNN variants and single-shot detectors, but few studies have rigorously addressed the trade-off between the high-speed requirements and high-accuracy demands typical of industrial e-waste sorting [3].
The demand for non-destructive, high-throughput component identification is paramount. Computer vision (CV) technologies, particularly deep learning-based object detectors, offer a viable solution for real-time automated sorting. PCB component detection presents unique challenges for automated object detection, including small component sizes, dense layouts, and high visual similarity between component types such as ICs, capacitors, and connectors. YOLO (You Only Look Once) models are well suited to this industrial application: their single-pass architecture enables faster inference than two-stage detectors and meets the stringent timing requirements of high-speed industrial conveyor belts. Recent YOLO variants, including YOLOv5, YOLOv7, YOLOv8, and YOLOv12, have been successfully applied to PCB inspection tasks, demonstrating strong performance in component localization and classification through architectural improvements such as attention mechanisms, enhanced feature pyramids, and optimized anchor strategies [4,5,6,7].
Since its introduction by Redmon et al. (2016), the You Only Look Once (YOLO) framework has recast object detection as a single regression problem, in which one convolutional neural network predicts bounding boxes and class probabilities simultaneously in a single forward pass [8]. The framework has since evolved through multiple generations of significant architectural improvements. Key advances include the Darknet-53 backbone with FPN integration (YOLOv3), migration to PyTorch (≥1.7.0) and Mosaic augmentation (YOLOv5), anchor-free detection with the C2f module (YOLOv8), Programmable Gradient Information (YOLOv9), NMS-free dual-label assignment (YOLOv10), cross-stage self-attention via C3k2 (YOLOv11), and Area Attention with FlashAttention (YOLOv12) [9,10,11,12,13,14,15,16,17,18,19]. This study evaluates YOLOv5 through YOLOv12.
Figure 1 and Table 1 illustrate the evolution of YOLO models over the years.
However, single-model detectors often struggle with the subtle visual variations (e.g., surface wear, differing component colors, and slight rotations) characteristic of a real-world e-waste stream. This reduces classification and localization accuracy, directly affecting the purity and yield of recycled material.
Ensemble learning combines predictions from multiple models to enhance overall detection performance and robustness. By integrating outputs from diverse models, this approach mitigates individual model limitations, including variance, bias, and overfitting, through techniques such as model stacking, boosting, and various forms of prediction fusion. Although YOLO models demonstrate strong efficiency and precision in object detection, they remain susceptible to missed detections and classification errors on complex or highly variable datasets. For object detection, the primary challenge lies in effectively merging bounding box predictions, both location and class information, from different models. Techniques such as Non-Maximum Suppression (NMS), Soft-NMS, and Weighted Box Fusion (WBF) have been proposed to address this challenge, with WBF often yielding superior results by merging highly overlapping boxes while retaining high-confidence predictions. Recent research has addressed these limitations by implementing ensemble strategies that aggregate predictions from multiple YOLO variants, significantly improving system reliability. Several studies have demonstrated the effectiveness of ensemble approaches in object detection. For example, Lin et al. [20] combined YOLOv5, YOLOv9, YOLOv10, YOLOv11, and YOLOv12 to detect Barrett's esophagus lesions in endoscopic images, achieving a recall of up to 97.4%; Tsai et al. [21] used YOLO ensemble learning with WBF to enhance fisheye object detection, achieving an F1-score of 64.13% and ranking second among 62 competing teams at the AI City Challenge 2025; Liu et al. [22] combined different versions of YOLOv5 for railway surveillance, achieving an overall accuracy of 85.4% and 83.4% mAP; and Hu et al. [23] combined YOLOv6, YOLOv7, and Faster R-CNN to achieve an F1-score of 90.5% in crowd surveillance. These studies collectively demonstrate that ensemble approaches, whether homogeneous or heterogeneous, improve detection accuracy and stability.
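To make the fusion step concrete, the following is a minimal single-class, WBF-style sketch: overlapping boxes are clustered by IoU, and each cluster is replaced by a confidence-weighted average box. This is a simplification of the published method, which additionally rescales confidences by the number of contributing models; the function and variable names here are illustrative, not taken from any particular implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Single-class WBF-style fusion: cluster overlapping boxes and
    replace each cluster with a confidence-weighted average box."""
    clusters = []  # each: {"fused": box, "score": s, "members": [(box, s), ...]}
    for i in np.argsort(scores)[::-1]:  # process boxes in descending confidence
        box, score = np.asarray(boxes[i], float), float(scores[i])
        for c in clusters:
            if iou(c["fused"], box) > iou_thr:
                c["members"].append((box, score))
                w = np.array([s for _, s in c["members"]])
                b = np.stack([bb for bb, _ in c["members"]])
                c["fused"] = (w[:, None] * b).sum(axis=0) / w.sum()
                c["score"] = float(w.mean())
                break
        else:  # no overlapping cluster found: start a new one
            clusters.append({"fused": box, "score": score,
                             "members": [(box, score)]})
    return [(c["fused"].tolist(), c["score"]) for c in clusters]
```

For example, boxes [0, 0, 10, 10] (score 0.9) and [1, 1, 11, 11] (score 0.7) overlap above the threshold and fuse into a single box pulled toward the higher-confidence prediction, whereas a distant box survives as its own cluster.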
This study aims to investigate the feasibility and performance of deploying YOLO models for automated PCB component detection, with emphasis on ensemble methods to improve detection accuracy and system reliability. The study pursues the following objectives:
To benchmark the component detection and localization capabilities of individual YOLO models on PCB image datasets, assessed using standard object detection metrics.
To enhance detection performance by testing multiple ensemble fusion techniques for integrating the outputs of various YOLO models. By combining predictions from multiple YOLO architectures, the proposed ensemble strategies suppress model-specific false positives and missed detections, yielding a more robust and consistent detection pipeline.
To systematically compare ensemble strategies against one another and against single-model baselines, evaluating their respective advantages and disadvantages in PCB component recognition, with the aim of identifying the most effective and practical architecture for real-time PCB component detection.
Our primary contribution is the development of an ensemble YOLO framework tailored for the recognition of semiconductor e-waste. By combining predictions from several independently trained YOLO models, we significantly enhance the accuracy of component detection, improve localization precision, and increase overall robustness across diverse component conditions. This advancement pushes the boundary of industrial informatics for recycling and provides a blueprint for integrating high-accuracy CV into automated material recovery facilities (MRFs).
The remainder of this paper is organized as follows. The Materials and Methods section describes the evolution of the YOLO models, the dataset, and the ensemble strategies. The Results section evaluates the performance of the individual models, followed by that of the proposed ensemble methods, compared across several metrics. The Discussion section examines the accuracy and efficiency of the ensemble strategies. The Conclusion summarizes the overall findings and highlights the potential impact and future directions of these methods for PCB component detection.
3. Results
3.6. Individual vs. Ensemble Model Comparison
Table 10 presents the per-class performance comparison between the best individual models and the best ensemble configurations. Ensemble methods demonstrated improvements across all component classes, with varying degrees of enhancement.
For IC detection, the Affirmative Top-6 ensemble achieved 70.65% mAP@0.5, representing an 8.1% relative improvement over YOLOv11s (65.38%). Recall improved substantially from 66.5% to 70.7%, while precision increased from 64.4% to 68.1%. Connector detection showed notable improvement with Unanimous Top-4 achieving 65.53% mAP@0.5 compared to 59.28% for YOLOv8s (10.5% relative improvement).
Electrolytic capacitor detection, already performing well individually, further improved with Consensus Top-4, achieving 93.22% mAP@0.5 compared to 90.67% for YOLOv5s. Recall improved significantly from 78.9% to 88.0%.
For the challenging capacitor class, ensemble methods achieved substantial relative improvements. The Unanimous Top-6 ensemble achieved 13.95% mAP@0.5, representing a 64.1% relative improvement over the best individual model (YOLOv12s, 8.50%). This improvement was driven primarily by increased precision (23.5% vs. 11.0%), demonstrating that ensemble agreement effectively filtered false positive detections for this difficult class.
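The relative improvements quoted above follow directly from the absolute mAP@0.5 values in Table 10; as a quick check:

```python
def relative_improvement(ensemble, baseline):
    """Relative gain (%) of an ensemble metric over a single-model baseline."""
    return 100.0 * (ensemble - baseline) / baseline

# mAP@0.5 values (%) from Table 10:
ic = relative_improvement(70.65, 65.38)   # IC: Affirmative Top-6 vs. YOLOv11s
cap = relative_improvement(13.95, 8.50)   # capacitor: Unanimous Top-6 vs. YOLOv12s
print(round(ic, 1), round(cap, 1))  # 8.1 64.1
```

Note how the small absolute gain for the capacitor class (5.45 percentage points) becomes a large relative gain because the baseline itself is low.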
Figure 8 illustrates the per-class detection performance comparison between the best individual models and best ensemble configurations, demonstrating consistent improvements across all component classes with relative gains ranging from 2.8% (electrolytic capacitor) to 64.1% (capacitor).
Figure 9 presents the per-class precision–recall (PR) curves, which better illustrate model behavior given the low detection accuracy for the capacitor class. The Consensus voting ensemble consistently achieves the highest mAP@0.5 across all classes, with improvements ranging from 7.3% to 38.2% over the baseline YOLOv8s model. The PR curves reveal distinct detection characteristics: IC, connector, and electrolytic capacitor maintain high precision over a broad recall range, whereas the capacitor class starts at low precision and declines rapidly, indicating detection difficulties that ensemble methods alone cannot completely resolve.
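For reference, mAP@0.5 averages the per-class average precision (AP) at an IoU threshold of 0.5, where each AP is the area under the PR curve. A minimal all-point-interpolation sketch is shown below; this is illustrative only, since the reported values come from the standard YOLO evaluation tooling.

```python
import numpy as np

def average_precision(precision, recall):
    """AP as the area under the PR curve, using all-point interpolation:
    precision is first made monotonically non-increasing from right to left,
    then the area is summed over the segments where recall changes."""
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    for i in range(len(p) - 2, -1, -1):  # precision envelope
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]   # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

A curve that holds precision 1.0 up to recall 0.5 and then drops to 0.5 at recall 1.0 yields AP = 0.75, matching the intuition that a rapidly declining PR curve (as for the capacitor class) produces a low AP.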
Table 11 summarizes the overall performance improvements achieved by ensemble methods relative to the best individual model (YOLOv8s). All ensemble configurations demonstrated positive mAP@0.5 improvements, with voting-based strategies consistently outperforming NMS ensembles.
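The voting-based strategies differ only in how many models must agree before a fused detection is kept. Assuming the standard definitions of these strategies (affirmative: any model suffices; consensus: at least half; unanimous: all models), the decision rule can be sketched as:

```python
def voting_filter(model_ids, n_models, strategy):
    """Keep or drop a fused detection based on which models proposed it.

    model_ids: indices of the models that contributed an overlapping box.
    Definitions assumed here (one common convention): 'affirmative' keeps a
    box proposed by any model, 'consensus' requires at least half of the
    models to agree, and 'unanimous' requires all of them.
    """
    votes = len(set(model_ids))
    if strategy == "affirmative":
        return votes >= 1
    if strategy == "consensus":
        return 2 * votes >= n_models
    if strategy == "unanimous":
        return votes == n_models
    raise ValueError(f"unknown strategy: {strategy}")
```

The stricter the rule, the fewer false positives survive, but detections found by only a subset of models are lost, which is consistent with the precision/recall trade-offs reported for the Consensus and Unanimous configurations.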
The Consensus Top-4 ensemble achieved the highest mAP@0.5 of 59.63%, representing a 10.3% relative improvement over YOLOv8s (54.04%). This configuration also achieved the best efficiency trade-off, providing substantial accuracy gains with moderate computational overhead (384.9 ms, 6.1× slowdown relative to YOLOv8s).
For applications requiring maximum precision, the Unanimous Top-4 configuration achieved 67.2% precision with 9.4% mAP@0.5 improvement.
Figure 10 presents the detection results of the best-performing models on a sample PCB image. The ensemble models detect more components than the individual models; however, they also exhibit a higher tendency toward false-positive detections, particularly misclassifying small structures such as pads and LEDs as capacitors. This effect also accounts for the low mAP in capacitor detection. The Consensus and Unanimous voting strategies partially alleviate this by requiring inter-model agreement, at the cost of reduced recall.
5. Conclusions
This study developed an ensemble learning framework using the YOLO model series (YOLOv5 through YOLOv12) for real-time detection of PCB components, including integrated circuits (ICs), capacitors, and connectors. By combining several YOLO versions, the system proved more accurate and reliable than any single model.
Among individual models, YOLOv8s achieved the highest mAP@0.5 of 54.04%, followed closely by YOLOv11s (53.96%) and YOLOv5s (53.66%). YOLOv12s achieved the highest precision (62.9%), making it a suitable choice when minimizing false positives is the priority. The Top-4 ensemble configurations, combining YOLOv8s, YOLOv11s, YOLOv5s, and YOLOv9s, consistently outperformed the individual models in mAP@0.5. The Consensus Voting Top-4 configuration achieved the highest mAP@0.5 (59.63%), a 10.3% relative improvement over the best individual model. WBF-based methods, while achieving the highest F1-score (60.55%), produced lower mAP@0.5 (51.84–52.26%) than the individual models, because WBF confidence rescaling pushes single-model detections below evaluation thresholds, reducing the area under the precision–recall curve.
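The confidence-rescaling effect described above can be illustrated with the score scaling used in the public weighted-boxes-fusion implementation (assumed here): the fused confidence is the mean member score scaled by min(T, N)/N, where T is the number of boxes in a cluster and N the number of models in the ensemble.

```python
def wbf_rescaled_score(member_scores, n_models):
    """Fused confidence with WBF score rescaling (as in the public
    weighted-boxes-fusion implementation, assumed here): the mean member
    score is scaled by min(T, N) / N, where T is the number of boxes in
    the cluster and N the number of models in the ensemble."""
    t = len(member_scores)
    return (sum(member_scores) / t) * min(t, n_models) / n_models

# A confident detection proposed by only 1 of 4 models drops from 0.80
# to 0.20, below a typical evaluation confidence threshold, while a
# detection confirmed by all 4 models keeps its mean confidence.
print(wbf_rescaled_score([0.80], 4))                              # 0.2
print(round(wbf_rescaled_score([0.80, 0.80, 0.80, 0.80], 4), 2))  # 0.8
```

This is exactly the mechanism that depresses the PR curve for detections found by a single model, lowering mAP@0.5 even while the F1-score of the surviving high-agreement detections remains high.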
Detection improved markedly for ICs (+6.6%) and connectors (+8.4%). However, small capacitors remain difficult, with mAP@0.5 still below 0.14: owing to their tiny size, these parts often resemble background noise or solder spots to the detector. In addition, while the ensemble method offers greater accuracy, it incurs higher computational overhead, requiring more processing power than a single model.
To overcome the current limitations, future research will focus on several key directions, outlined below; in addition, adaptive confidence thresholding based on component class could further enhance detection accuracy for challenging categories.
Super-resolution and attention mechanisms: We plan to incorporate super-resolution preprocessing and attention mechanisms to improve the detection of small components such as capacitors, enabling the model to focus on fine-grained features in dense layouts.
Optimized fusion strategies: We will investigate optimized ensemble fusion strategies to reduce computational overhead, aiming to achieve faster processing speeds without sacrificing detection accuracy.
Multi-scale feature pyramids: The adoption of multi-scale feature pyramids will allow the system to simultaneously recognize components across varying scales and viewing distances, improving robustness in real-world deployment scenarios.
Image preprocessing for e-waste recycling: We will explore preprocessing techniques such as decolorization to enhance model generalization, with the goal of extending the system to broader semiconductor e-waste recycling applications [31].
Multispectral detection: Combining RGB and hyperspectral imaging (HSI) data can enhance the detection of small components like capacitors. Arbash et al. [32] showed a 37.8% improvement in capacitor detection when using an RGB+HSI fusion model compared to an RGB-only model.
The YOLO-based ensemble framework provides an effective solution for automated recycling and manufacturing. Although detecting microscopic parts remains challenging, this system demonstrates strong potential for intelligent quality assurance and high-purity material recovery in the electronics sector.