1. Introduction
The global transition to digital infrastructure has resulted in a burgeoning crisis of electronic waste (e-waste), projected to reach 74 million metric tons annually by 2030 [1]. Semiconductor e-waste, specifically printed circuit boards (PCBs) and integrated circuits (ICs), represents a concentrated source of critical raw materials, including gold, silver, palladium, and rare earth elements [2]. Efficient material recovery from these sources is hampered by the heterogeneity, minute size, and occluded nature of components embedded in the waste stream. Traditional sorting relies heavily on manual labor or bulk chemical processing, both of which are environmentally intensive, hazardous, and economically inefficient due to low material purity post-sorting.
Early efforts in automated e-waste sorting relied on techniques such as X-ray fluorescence (XRF) and near-infrared (NIR) spectroscopy. While effective for elemental analysis, these techniques are typically slow and struggle to achieve the spatial resolution required for small components. The integration of computer vision has therefore become essential. Recent research has explored various deep learning models, including R-CNN variants and single-shot detectors, but few studies have rigorously addressed the trade-off between the high-speed requirements and high-accuracy demands typical of industrial e-waste sorting [3].
The demand for non-destructive, high-throughput component identification is paramount. Computer vision (CV) technologies, particularly deep learning-based object detectors, offer a viable solution for real-time automated sorting. PCB component detection presents unique challenges for automated object detection, including small component sizes, dense layouts, and high visual similarity between component types such as ICs, capacitors, and connectors. YOLO (You Only Look Once) models are well suited to this industrial application: their single-pass architecture enables faster inference than two-stage detectors and meets the stringent timing requirements of high-speed industrial conveyor belts. Recent YOLO variants, including YOLOv5, YOLOv7, YOLOv8, and YOLOv12, have been successfully applied to PCB inspection tasks, demonstrating strong performance in component localization and classification through architectural improvements such as attention mechanisms, enhanced feature pyramids, and optimized anchor strategies [4,5,6,7].
Since its introduction by Redmon et al. (2016), the You Only Look Once (YOLO) framework has recast object detection as a single regression problem, in which one convolutional neural network predicts bounding boxes and class probabilities simultaneously in a single forward pass [8]. The framework has since evolved through multiple generations of significant architectural improvements. Key advances include the Darknet-53 backbone with FPN integration (YOLOv3), migration to PyTorch (≥1.7.0) and Mosaic augmentation (YOLOv5), anchor-free detection with the C2f module (YOLOv8), Programmable Gradient Information (YOLOv9), NMS-free dual-label assignment (YOLOv10), cross-stage self-attention via C3k2 (YOLOv11), and Area Attention with FlashAttention (YOLOv12) [9,10,11,12,13,14,15,16,17,18,19]. This study evaluates YOLOv5 through YOLOv12.
Figure 1 and Table 1 illustrate the evolution of YOLO models over the years.
However, single-model detectors often struggle with the subtle visual variations (e.g., surface wear, differing component colors, and slight rotations) characteristic of a real-world e-waste stream. This reduces classification and localization accuracy, directly affecting the purity and yield of recycled material.
Ensemble learning combines predictions from multiple models to enhance overall detection performance and robustness. By integrating outputs from diverse models, this approach mitigates individual model limitations, including variance, bias, and overfitting, through techniques such as model stacking, boosting, and various forms of prediction fusion. Although YOLO models demonstrate strong efficiency and precision in object detection, they remain susceptible to missed detections and classification errors on complex or highly variable datasets. For object detection, the primary challenge lies in effectively merging bounding box predictions, both location and class information, from different models. Techniques such as Non-Maximum Suppression (NMS), Soft-NMS, and Weighted Box Fusion (WBF) have been proposed to address this challenge, with WBF often yielding superior results by merging highly overlapping boxes while retaining high-confidence predictions. Recent research has addressed these limitations by implementing ensemble strategies that aggregate predictions from multiple YOLO variants, significantly improving system reliability. Several studies have demonstrated the effectiveness of ensemble approaches in object detection. For example, Lin et al. [20] combined YOLOv5, YOLOv9, YOLOv10, YOLOv11, and YOLOv12 to detect Barrett's esophagus lesions in endoscopic images, achieving a recall of up to 97.4%; Tsai et al. [21] used YOLO ensemble learning with WBF to enhance fisheye object detection, achieving an F1-score of 64.13% and ranking second among 62 competing teams at the AI City Challenge 2025; Liu et al. [22] combined different versions of YOLOv5 for railway surveillance, achieving an overall accuracy of 85.4% and 83.4% mAP; and Hu et al. [23] combined YOLOv6, YOLOv7, and Faster R-CNN to achieve an F1-score of 90.5% in crowd surveillance. These studies collectively demonstrate that ensemble approaches, whether homogeneous or heterogeneous, improve detection accuracy and stability.
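To make the fusion step concrete, the following is a minimal single-class, WBF-style sketch: overlapping boxes are clustered by IoU, and each cluster is replaced by a confidence-weighted average box. This is a simplification of the published method, which additionally rescales confidences by the number of contributing models; the function and variable names here are illustrative, not taken from any particular implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Single-class WBF-style fusion: cluster overlapping boxes and
    replace each cluster with a confidence-weighted average box."""
    clusters = []  # each: {"fused": box, "score": s, "members": [(box, s), ...]}
    for i in np.argsort(scores)[::-1]:  # process boxes in descending confidence
        box, score = np.asarray(boxes[i], float), float(scores[i])
        for c in clusters:
            if iou(c["fused"], box) > iou_thr:
                c["members"].append((box, score))
                w = np.array([s for _, s in c["members"]])
                b = np.stack([bb for bb, _ in c["members"]])
                c["fused"] = (w[:, None] * b).sum(axis=0) / w.sum()
                c["score"] = float(w.mean())
                break
        else:  # no overlapping cluster found: start a new one
            clusters.append({"fused": box, "score": score,
                             "members": [(box, score)]})
    return [(c["fused"].tolist(), c["score"]) for c in clusters]
```

For example, boxes [0, 0, 10, 10] (score 0.9) and [1, 1, 11, 11] (score 0.7) overlap above the threshold and fuse into a single box pulled toward the higher-confidence prediction, whereas a distant box survives as its own cluster.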
This study aims to investigate the feasibility and performance of deploying YOLO models for automated PCB component detection, with emphasis on ensemble methods to improve detection accuracy and system reliability. The study pursues the following objectives:
To benchmark the component detection and localization capabilities of individual YOLO models on PCB image datasets, assessed using standard object detection metrics.
To enhance detection performance by testing multiple ensemble fusion techniques for integrating the outputs of various YOLO models. By combining predictions from multiple YOLO architectures, the proposed ensemble strategies suppress model-specific false positives and missed detections, yielding a more robust and consistent detection pipeline.
To systematically compare ensemble strategies against one another and against single-model baselines, evaluating their respective advantages and disadvantages in PCB component recognition, with the aim of identifying the most effective and practical architecture for real-time PCB component detection.
Our primary contribution is the development of an ensemble YOLO framework tailored for the recognition of semiconductor e-waste. By combining predictions from several independently trained YOLO models, we significantly enhance the accuracy of component detection, improve localization precision, and increase overall robustness across diverse component conditions. This advancement pushes the boundary of industrial informatics for recycling and provides a blueprint for integrating high-accuracy CV into automated material recovery facilities (MRFs).
The remainder of this paper is organized as follows. The Materials and Methods section describes the evolution of the YOLO models, the dataset, and the ensemble strategies. The Results section evaluates the performance of the individual models, followed by that of the proposed ensemble methods, compared across several metrics. The Discussion section examines the accuracy and efficiency of the ensemble strategies. The Conclusion summarizes the overall findings and highlights the potential impact and future directions of these methods for PCB component detection.
3. Results
3.6. Individual vs. Ensemble Model Comparison
Table 10 presents the per-class performance comparison between the best individual models and the best ensemble configurations. Ensemble methods demonstrated improvements across all component classes, with varying degrees of enhancement.
For IC detection, the Affirmative Top-6 ensemble achieved 70.65% mAP@0.5, representing an 8.1% relative improvement over YOLOv11s (65.38%). Recall improved substantially from 66.5% to 70.7%, while precision increased from 64.4% to 68.1%. Connector detection showed notable improvement with Unanimous Top-4 achieving 65.53% mAP@0.5 compared to 59.28% for YOLOv8s (10.5% relative improvement).
Electrolytic capacitor detection, already performing well individually, further improved with Consensus Top-4, achieving 93.22% mAP@0.5 compared to 90.67% for YOLOv5s. Recall improved significantly from 78.9% to 88.0%.
For the challenging capacitor class, ensemble methods achieved substantial relative improvements. The Unanimous Top-6 ensemble achieved 13.95% mAP@0.5, representing a 64.1% relative improvement over the best individual model (YOLOv12s, 8.50%). This improvement was driven primarily by increased precision (23.5% vs. 11.0%), demonstrating that ensemble agreement effectively filtered false positive detections for this difficult class.
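The relative improvements quoted above follow directly from the absolute mAP@0.5 values in Table 10; as a quick check:

```python
def relative_improvement(ensemble, baseline):
    """Relative gain (%) of an ensemble metric over a single-model baseline."""
    return 100.0 * (ensemble - baseline) / baseline

# mAP@0.5 values (%) from Table 10:
ic = relative_improvement(70.65, 65.38)   # IC: Affirmative Top-6 vs. YOLOv11s
cap = relative_improvement(13.95, 8.50)   # capacitor: Unanimous Top-6 vs. YOLOv12s
print(round(ic, 1), round(cap, 1))  # 8.1 64.1
```

Note how the small absolute gain for the capacitor class (5.45 percentage points) becomes a large relative gain because the baseline itself is low.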
Figure 8 illustrates the per-class detection performance comparison between the best individual models and best ensemble configurations, demonstrating consistent improvements across all component classes with relative gains ranging from 2.8% (electrolytic capacitor) to 64.1% (capacitor).
Figure 9 presents the per-class precision–recall (PR) curves, which better illustrate model behavior given the low detection accuracy for the capacitor class. The Consensus voting ensemble consistently achieves the highest mAP@0.5 across all classes, with improvements ranging from 7.3% to 38.2% over the baseline YOLOv8s model. The PR curves reveal distinct detection characteristics: IC, connector, and electrolytic capacitor maintain high precision over a broad recall range, whereas the capacitor class starts at low precision and declines rapidly, indicating detection difficulties that ensemble methods alone cannot completely resolve.
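For reference, mAP@0.5 averages the per-class average precision (AP) at an IoU threshold of 0.5, where each AP is the area under the PR curve. A minimal all-point-interpolation sketch is shown below; this is illustrative only, since the reported values come from the standard YOLO evaluation tooling.

```python
import numpy as np

def average_precision(precision, recall):
    """AP as the area under the PR curve, using all-point interpolation:
    precision is first made monotonically non-increasing from right to left,
    then the area is summed over the segments where recall changes."""
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    for i in range(len(p) - 2, -1, -1):  # precision envelope
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]   # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

A curve that holds precision 1.0 up to recall 0.5 and then drops to 0.5 at recall 1.0 yields AP = 0.75, matching the intuition that a rapidly declining PR curve (as for the capacitor class) produces a low AP.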
Table 11 summarizes the overall performance improvements achieved by ensemble methods relative to the best individual model (YOLOv8s). All ensemble configurations demonstrated positive mAP@0.5 improvements, with voting-based strategies consistently outperforming NMS ensembles.
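The voting-based strategies differ only in how many models must agree before a fused detection is kept. Assuming the standard definitions of these strategies (affirmative: any model suffices; consensus: at least half; unanimous: all models), the decision rule can be sketched as:

```python
def voting_filter(model_ids, n_models, strategy):
    """Keep or drop a fused detection based on which models proposed it.

    model_ids: indices of the models that contributed an overlapping box.
    Definitions assumed here (one common convention): 'affirmative' keeps a
    box proposed by any model, 'consensus' requires at least half of the
    models to agree, and 'unanimous' requires all of them.
    """
    votes = len(set(model_ids))
    if strategy == "affirmative":
        return votes >= 1
    if strategy == "consensus":
        return 2 * votes >= n_models
    if strategy == "unanimous":
        return votes == n_models
    raise ValueError(f"unknown strategy: {strategy}")
```

The stricter the rule, the fewer false positives survive, but detections found by only a subset of models are lost, which is consistent with the precision/recall trade-offs reported for the Consensus and Unanimous configurations.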
The Consensus Top-4 ensemble achieved the highest mAP@0.5 of 59.63%, representing a 10.3% relative improvement over YOLOv8s (54.04%). This configuration also achieved the best efficiency trade-off, providing substantial accuracy gains with moderate computational overhead (384.9 ms, 6.1× slowdown relative to YOLOv8s).
For applications requiring maximum precision, the Unanimous Top-4 configuration achieved 67.2% precision with 9.4% mAP@0.5 improvement.
Figure 10 presents the detection results of the best-performing models on a sample PCB image. The ensemble models detect more components than the individual models; however, they also exhibit a higher tendency toward false-positive detections, particularly misclassifying small structures such as pads and LEDs as capacitors. This effect also accounts for the low mAP in capacitor detection. The Consensus and Unanimous voting strategies partially alleviate this by requiring inter-model agreement, at the cost of reduced recall.
5. Conclusions
This study developed an ensemble learning framework using the YOLO model series (YOLOv5 through YOLOv12) for real-time detection of PCB components, including integrated circuits (ICs), capacitors, and connectors. By combining several YOLO versions, the system proved more accurate and reliable than any single model.
Among individual models, YOLOv8s achieved the highest mAP@0.5 of 54.04%, followed closely by YOLOv11s (53.96%) and YOLOv5s (53.66%). YOLOv12s achieved the highest precision (62.9%), making it a suitable choice when minimizing false positives is the priority. The Top-4 ensemble configurations, combining YOLOv8s, YOLOv11s, YOLOv5s, and YOLOv9s, consistently outperformed the individual models in mAP@0.5. The Consensus Voting Top-4 configuration achieved the highest mAP@0.5 (59.63%), a 10.3% relative improvement over the best individual model. WBF-based methods, while achieving the highest F1-score (60.55%), produced lower mAP@0.5 (51.84–52.26%) than the individual models, because WBF confidence rescaling pushes single-model detections below evaluation thresholds, reducing the area under the precision–recall curve.
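The confidence-rescaling effect described above can be illustrated with the score scaling used in the public weighted-boxes-fusion implementation (assumed here): the fused confidence is the mean member score scaled by min(T, N)/N, where T is the number of boxes in a cluster and N the number of models in the ensemble.

```python
def wbf_rescaled_score(member_scores, n_models):
    """Fused confidence with WBF score rescaling (as in the public
    weighted-boxes-fusion implementation, assumed here): the mean member
    score is scaled by min(T, N) / N, where T is the number of boxes in
    the cluster and N the number of models in the ensemble."""
    t = len(member_scores)
    return (sum(member_scores) / t) * min(t, n_models) / n_models

# A confident detection proposed by only 1 of 4 models drops from 0.80
# to 0.20, below a typical evaluation confidence threshold, while a
# detection confirmed by all 4 models keeps its mean confidence.
print(wbf_rescaled_score([0.80], 4))                              # 0.2
print(round(wbf_rescaled_score([0.80, 0.80, 0.80, 0.80], 4), 2))  # 0.8
```

This is exactly the mechanism that depresses the PR curve for detections found by a single model, lowering mAP@0.5 even while the F1-score of the surviving high-agreement detections remains high.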
Detection improved markedly for ICs (+6.6%) and connectors (+8.4%). However, small capacitors remain difficult, with mAP@0.5 still below 0.14: owing to their tiny size, these parts often resemble background noise or solder spots to the detector. In addition, while the ensemble method offers greater accuracy, it incurs higher computational overhead, requiring more processing power than a single model.
To overcome the current limitations, future research will focus on several key directions, outlined below; in addition, adaptive confidence thresholding based on component class could further enhance detection accuracy for challenging categories.
Super-resolution and attention mechanisms: We plan to incorporate super-resolution preprocessing and attention mechanisms to improve the detection of small components such as capacitors, enabling the model to focus on fine-grained features in dense layouts.
Optimized fusion strategies: We will investigate optimized ensemble fusion strategies to reduce computational overhead, aiming to achieve faster processing speeds without sacrificing detection accuracy.
Multi-scale feature pyramids: The adoption of multi-scale feature pyramids will allow the system to simultaneously recognize components across varying scales and viewing distances, improving robustness in real-world deployment scenarios.
Image preprocessing for e-waste recycling: We will explore preprocessing techniques such as decolorization to enhance model generalization, with the goal of extending the system to broader semiconductor e-waste recycling applications [31].
Multispectral detection: Combining RGB and hyperspectral imaging (HSI) data can enhance the detection of small components like capacitors. Arbash et al. [32] showed a 37.8% improvement in capacitor detection when using an RGB+HSI fusion model compared to an RGB-only model.
The YOLO-based ensemble framework provides an effective solution for automated recycling and manufacturing. Although detecting microscopic parts remains challenging, this system demonstrates strong potential for intelligent quality assurance and high-purity material recovery in the electronics sector.