1. Introduction
With technological advancements and the rapid growth of the new energy vehicle industry, graphite, as a key anode material for lithium-ion batteries, has seen continuously increasing demand in critical fields such as high-end manufacturing and energy storage systems. It has become a strategic resource essential for securing the new energy supply chain and enhancing industry competitiveness [1,2,3,4]. The International Energy Agency (IEA) estimates that achieving net-zero emissions will drive graphite demand to grow by 20 to 25 times by 2040, requiring global mineral output to rise by over 300% compared to 2020 [5]. Against this backdrop, the efficient development and green utilization of graphite mineral resources have become focal points for sustainable development worldwide [6].
Graphite ore commonly occurs in association with various other minerals; low-grade ores in particular are characterized by low fixed carbon content and high impurity levels. Insufficient pre-screening of large volumes of low-grade ore raises energy and material costs during processing and degrades end-product graphite quality, ultimately undermining industrial efficiency and product stability [7]. Therefore, developing accurate and efficient methods for identifying low-grade minerals is crucial for improving resource utilization, ensuring product quality, and advancing green mining initiatives.
Traditional ore sorting methods primarily rely on complex equipment and manual operations, often supplemented by chemical treatments for mineral separation. These approaches are characterized by intricate processing workflows and high energy consumption, while inevitably causing environmental pollution and ecological burdens due to the use of chemical reagents. Moreover, their strong dependency on manual intervention and low automation levels make them inadequate to meet the demands of modern mining industries for green, intelligent, and efficient sorting. In recent years, with the rapid advancement of deep learning and computer vision technologies, non-contact mineral identification methods based on object detection or image segmentation have emerged as critical research directions in intelligent ore sorting [8,9]. These methods enable automatic extraction of texture, color, and structural features from ore images, facilitating precise delineation and recognition of regions with varying grades. Such advancements significantly enhance the intelligence and sustainability of beneficiation systems, optimize mineral sorting processes from the source, reduce wastage of high-quality resources, and effectively support the strategic implementation of green mining and “dual carbon” initiatives. Nevertheless, existing image segmentation models still face significant limitations in terms of parameter scale, computational complexity, and real-time responsiveness, which hinder their practical deployment in mining sites, especially on edge devices [10,11].
This study addresses the aforementioned challenges by proposing a lightweight graphite ore image instance segmentation model designed for industrial production lines and deployable on edge devices. The model accurately identifies low-grade regions during the primary ore processing stage, enabling early removal of redundant materials, thereby reducing subsequent ineffective operations and significantly lowering resource consumption and carbon emissions. This approach aims to provide critical technical support and a feasible implementation pathway for building intelligent, green, and sustainable mining systems, demonstrating the broad application potential of data-driven methods in resource development and sustainable utilization.
The core contributions and methodology of this study are summarized as follows:
By employing high-resolution imaging equipment on industrial production lines to capture ore images under varying exposure conditions, followed by extensive manual annotation, a graphite ore dataset was established, comprising three exposure durations and two target grade levels.
By adjusting the network’s depth factor, the global depth was effectively compressed, resulting in a more compact YOLO11t-seg architecture compared to the original model. This modification significantly reduced redundant operations in certain components while preserving core segmentation performance, thereby achieving a substantial reduction in parameter count.
GSConv was introduced into the backbone to replace standard convolutions, effectively maintaining model accuracy while avoiding additional computational overhead. In addition, the Faster Block was integrated into the C3k2 module to construct a novel feature extraction module, termed C3k2-Faster. This design significantly improved global feature extraction efficiency through partial convolution, without incurring noticeable accuracy loss.
The segmentation head was streamlined by removing redundant depthwise separable convolution operations, leading to the development of an efficient segmentation head, termed Segment-Efficient. This optimization significantly reduced model complexity while further enhancing deployment efficiency on resource-constrained devices.
The structure of this paper is arranged as follows. Section 2 offers a comprehensive overview of existing studies related to the analysis of graphite ore. In Section 3, a refined segmentation framework tailored for graphite ore is introduced, with emphasis on its architectural improvements. Section 4 outlines the procedures for dataset compilation, elaborates on the experimental configurations, and explains the performance assessment criteria. Section 5 delivers a thorough evaluation of the experimental findings, incorporating comparative analysis with baseline models. Lastly, Section 6 encapsulates the study’s core contributions and proposes future research trajectories.
2. Related Work
Traditional approaches for analyzing the grade of graphite ore predominantly depend on manual visual assessment and physicochemical techniques, typically involving instrumentation such as XRD, SEM-EDS, and OM [12,13]. For example, Ye et al. [14] developed a method to determine fixed carbon using a high-frequency infrared carbon-sulfur analyzer, incorporating HNO₃ pretreatment and flux fusion for accurate detection of low carbon content. Li et al. [15] proposed a titration-based approach for crystalline graphite using KMnO₄, AlCl₃, and concentrated H₂SO₄ to enhance pretreatment and reduce interference. These methods are limited by destructive sampling, long testing cycles, high costs, and potential chemical pollution. Additionally, traditional manual sorting is labor-intensive, subjective, and inefficient, making it unsuitable for modern requirements of intelligent, efficient, and sustainable mineral processing [16].
Driven by the swift progress in computer vision technologies, deep learning techniques have become forefront approaches for intelligent mineral identification. These algorithms possess the capability to automatically capture intricate nonlinear correlations between visual characteristics and mineral grades directly from images. This advancement surpasses the constraints of conventional rule-based models, greatly minimizing human bias and subjectivity, and thereby enhancing the precision of mineral classification and recognition tasks [17]. For instance, Okada et al. [18] applied NCA in combination with machine learning algorithms to optimize wavelength selection for arsenic mineral detection, achieving over 91.9% accuracy with minimal spectral bands, thus enhancing the efficiency and practicality of multispectral ore sorting. Es-sahly et al. [19] utilized SWIR spectroscopy combined with machine learning models to pre-concentrate low-grade sedimentary copper ores, achieving up to 90% classification accuracy and a 43% upgrading rate, thereby validating the potential of NIR-based sorting in copper beneficiation. Theerthagiri et al. [20] developed a deep residual neural network (D-Resnet) for classifying beach sand minerals, achieving 89% accuracy across six mineral types while reducing dependency on high-resolution imagery through effective convolutional feature selection.
However, existing deep learning-based approaches for graphite ore recognition still face several limitations. For example, Krebbers et al. [21] integrated X-ray computed tomography (CT) with image processing to develop a 3D structure scanning and recognition framework for graphite ore, enabling quantitative characterization of critical mineralogical features. While this approach provides high recognition accuracy and fine-scale structural resolution, it depends on complex sample preparation and laboratory setups, with limited imaging coverage, making it unsuitable for real-time and large-scale industrial deployment. Huang et al. [22] proposed an image recognition model incorporating pyramid convolutions and spatial attention mechanisms, achieving 92.5% accuracy and significantly advancing the intelligent recognition of graphite grade. Nevertheless, the model was designed for single-piece classification and lacks the capacity for batch-level, rapid sorting in industrial settings. Similarly, Shu et al. [23] developed a hybrid ResNet-Transformer architecture that enhances both global and local feature modeling through multi-level fusion, attaining 93.4% accuracy on 19 graphite ore classes. Despite its strong performance under laboratory conditions, the model is not readily scalable to high-throughput processing and remains limited to object-level detection, which hinders pixel-wise quality assessment via segmentation.
In addition, most existing models suffer from excessive complexity and large parameter counts. For instance, the improved Inception-ResNet model proposed by Pan et al. [24] achieves over 90% accuracy but contains tens of millions of parameters, making it impractical for deployment on edge devices or in resource-constrained environments due to limited inference speed and high computational cost. These constraints substantially hinder the applicability of such models in real-world industrial scenarios.
To facilitate smart and sustainable development in graphite ore processing, it is imperative to develop a segmentation model that integrates both accuracy and lightweight architecture. The model should maintain strong segmentation performance while ensuring computational efficiency for deployment on edge devices. Accurate identification and early removal of low-grade ore can reduce unnecessary processing, lower energy and chemical usage, and improve overall resource efficiency and product quality. This provides essential support for building efficient, green, and intelligent mining systems.
3. Methods
3.1. The Original YOLO11-Seg Network
YOLO11 [25], as one of the leading object detection algorithms in terms of performance, represents a systematic enhancement and upgrade built upon the PGI framework established in its earlier versions. Its major improvements are summarized as follows: (i) Reconstruction of the feature extraction unit: The original C2f module has been replaced with the newly designed C3k2 module, which integrates the structural advantages of both C2f and C3. This redesign enhances information flow while maintaining feature richness, effectively reducing the number of parameters and improving computational efficiency. (ii) Enhanced multi-scale feature perception: A new C2PSA module is added after the original SPPF module to strengthen the model’s ability to perceive objects at various scales, particularly in cases of occlusion, thereby improving overall detection robustness. (iii) Optimization of the detection head: Depthwise separable convolutions are introduced to replace conventional convolutional layers, partially reducing computational cost and accelerating inference speed. The overall network architecture and its core components are illustrated in Figure 1.
3.2. The Improved GS-YOLO-Seg Network
Given the limited computational resources typically available in practical industrial environments, this study proposes a YOLO11-based segmentation framework tailored for low-grade graphite ore, designed to maintain high accuracy while ensuring lightweight performance. The overall architecture is illustrated in Figure 2.
The model retains the core structure of YOLO11-seg, comprising the backbone, neck, and head components, while incorporating several key optimizations. First, the global network depth was reduced to half by adjusting the depth factor, significantly decreasing the parameter count and computational burden. Second, GSConv [26] was introduced into the backbone to replace standard convolutions as a new downsampling module, thereby improving computational efficiency without compromising accuracy. Furthermore, enhancements were made to the traditional C3k2 module through the incorporation of the Faster Block [27], yielding a streamlined version named C3k2-Faster. This improved design employs partial convolution (PConv), which restricts convolutional operations to a selected subset of channels, thereby decreasing computational burden. Additionally, in light of the focused nature of the application scenario and the limited range of target mineral classes, the segmentation head was simplified by eliminating depthwise separable convolutions. A newly designed structure, named Segment-Efficient, was introduced to maintain essential segmentation capabilities while notably reducing model complexity and improving deployment performance on devices with restricted computational resources.
The upcoming subsections systematically provide an in-depth explanation of each of these optimized components and the underlying technical details.
3.3. YOLO11t-Seg
Scalability in the YOLO11 framework is controlled by adjusting width and depth parameters, which affect channel dimensions and structural repetition. Although larger models improve feature extraction, they incur higher computational costs. Given the limited class set in this industrial application, the YOLO11n-seg model remains overly complex. To improve efficiency, a globally compressed depth strategy was adopted, resulting in a lightweight variant named YOLO11t-seg.
This model maintains the original global width while reducing the depth factor to half of its original value, thereby effectively lowering the repetition of feature extraction modules and avoiding redundant operations. A 50% reduction in the repetition of modules was applied to the Backbone (layers 2, 4, 6, and 8) and Neck (layers 13, 16, 19, and 22) components of YOLO11n-seg to decrease structural redundancy. This adjustment alleviates redundant computations, decreases network depth, and significantly reduces the number of parameters.
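For illustration, the minimal sketch below shows how a global depth factor compresses per-stage repeat counts in YOLO-style configurations; the base repeat values are illustrative placeholders, not the exact YOLO11 layout.

```python
def scaled_repeats(base_repeats: int, depth_factor: float) -> int:
    """Scale a stage's repeat count by the global depth factor,
    clamping so every stage keeps at least one block."""
    return max(round(base_repeats * depth_factor), 1)

# Illustrative base repeat counts for four feature-extraction stages.
base = [2, 2, 2, 4]
print([scaled_repeats(n, 0.50) for n in base])  # original depth factor -> [1, 1, 1, 2]
print([scaled_repeats(n, 0.25) for n in base])  # halved depth factor  -> [1, 1, 1, 1]
```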
Table 1 provides a detailed comparison among YOLO11n-seg, YOLO11s-seg, and the depth-compressed variant YOLO11t-seg in terms of scaling factors, the number of repeats in feature extraction modules (Repeats), and parameter counts (Params).
3.4. GSConv
GSConv is a lightweight convolutional structure that combines the advantages of standard convolution and depthwise separable convolution (DWConv) [28]. Through sophisticated channel partitioning and information mixing fusion strategies, it can significantly reduce the number of parameters and computation while still effectively improving the model’s feature expression ability. A schematic diagram of the convolution process is shown in Figure 3.
Specifically, for an input tensor $X \in \mathbb{R}^{H \times W \times C_1}$ (height $H$, width $W$, and input channel number $C_1$), the convolution operation in the standard convolution branch can be expressed as:

$$Y_1 = W_1 * X, \qquad Y_1 \in \mathbb{R}^{H' \times W' \times C_2/2}$$

In this expression, “∗” stands for the convolution operation, while $C_2$ indicates the count of output channels. The DWConv branch can be similarly represented as follows:

$$Y_2 = W_2 \odot Y_1, \qquad Y_2 \in \mathbb{R}^{H' \times W' \times C_2/2}$$

Here, the symbol “⊙” signifies the depthwise separable convolution. Subsequently, the outputs from the two branches are fused through a concatenation (Concat) operation, followed by a channel shuffle mechanism to promote effective interaction and complementarity between features extracted by standard convolution and depthwise separable convolution. This strategy alleviates information isolation across feature channels, facilitating the generation of more uniformly distributed and richly expressed feature maps. The entire process of GSConv can be summarized as follows:

$$Y = \mathrm{Shuffle}\big(\mathrm{Concat}(Y_1,\, Y_2)\big)$$
The primary benefit of GSConv is its capacity to merge the advantages of traditional convolution and depthwise separable convolution efficiently by employing a channel shuffle mechanism. This design preserves implicit inter-channel connectivity while significantly reducing computational complexity. As a downsampling strategy, GSConv maintains model prediction accuracy without introducing additional computational overhead, making it particularly suitable for lightweight network architectures.
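A minimal PyTorch sketch of this structure is given below; the 5 × 5 depthwise kernel, SiLU activations, and stride-2 downsampling are assumptions for illustration rather than specifics stated above.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: a standard conv produces half the output channels,
    a depthwise conv refines that result, then concat + channel shuffle."""
    def __init__(self, c1: int, c2: int, k: int = 3, s: int = 2):
        super().__init__()
        c_ = c2 // 2
        self.conv = nn.Sequential(                      # standard-conv branch
            nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())
        self.dwconv = nn.Sequential(                    # depthwise branch
            nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.conv(x)
        y2 = self.dwconv(y1)                            # reuses the first branch's output
        y = torch.cat((y1, y2), dim=1)
        # Channel shuffle: interleave channels from the two branches.
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```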
3.5. C3k2-Faster
Although C3k2 enhances both feature representation and semantic fusion through a more dynamic feature flow compared to previous extraction modules, its inference efficiency can be insufficient under specific conditions, falling short of the high-performance demands of industrial edge environments. To address this limitation, this paper introduces an improved version of the original C3k2 module within the YOLO11-seg architecture, termed C3k2-Faster, specifically tailored for rapid sorting of graphite ore. The overall design of the proposed module is illustrated in Figure 4.
The C3k2-Faster module improves computational efficiency by introducing the Faster Block in the core bottleneck, while retaining a multi-branch structure to enhance feature diversity. Specifically, the input features $X$ are first preprocessed using a standard CBS module (i.e., Conv-BN-SiLU). A Split operation is then applied to divide the features into two branches, denoted as $X_1$ and $X_2$. $X_1$ is directly reserved for the final concatenation, whereas $X_2$ is sequentially passed through multiple Faster Block sub-units, denoted as $F_1, \dots, F_n$, to perform deep feature extraction. Subsequently, the features from $X_1$ and the processed output of the bottleneck path are concatenated along the channel dimension, followed by another CBS module to produce the final output $Y$. The entire process can be formally expressed as:

$$X_1,\ X_2 = \mathrm{Split}\big(\mathrm{CBS}(X)\big)$$
$$Y = \mathrm{CBS}\Big(\mathrm{Concat}\big(X_1,\ F_n(\cdots F_1(X_2)\cdots)\big)\Big)$$
Within the Faster Block bottleneck, PConv and PWConv are integrated. The main branch input $X_{\mathrm{in}}$ first undergoes a 3 × 3 PConv to effectively capture local spatial details, then a 1 × 1 PWConv to promote inter-channel communication and modify channel dimensions. The output is then normalized via Batch Normalization and passed through a ReLU activation to boost nonlinear expressiveness. After the main branch is processed through the PBR sequence (PWConv-BN-ReLU), the output features are further passed through an additional 1 × 1 PWConv and then fused with the shortcut branch $X_{\mathrm{in}}$ via residual connection, denoted as “+”. This fusion with the original input to the bottleneck improves information integration and stabilizes the gradient flow. Mathematically, the full procedure of the Faster Block is described as:

$$Y = X_{\mathrm{in}} + \mathrm{PWConv}_2\Big(\mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{PWConv}_1(\mathrm{PConv}(X_{\mathrm{in}}))\big)\big)\Big)$$
Partial convolution serves as the core operation enabling faster feature extraction in the C3k2-Faster module. Only a fraction of the input channels undergo convolution, whereas the remaining channels are transmitted straight to the output. This design helps optimize memory access and reduce parameter count. A comparison between the workflows of PConv and standard convolution is illustrated in Figure 5.
To highlight the significant reduction in parameter count achieved by PConv compared to standard convolution, consider an input tensor of size $h \times w \times c$, where $h$ and $w$ represent the height and width of the feature map, and $k$ denotes the kernel size. Let $c_p$ be the number of channels processed by standard convolution in PConv. When the ratio $r = c_p/c = 1/2$, the remaining $c - c_p$ channels bypass convolution and are directly mapped to the output. For the sake of comparability, we further assume that the output channel count $c_2$ equals the input channel count $c_1$, and the kernel size is fixed at 3. Under these assumptions, the theoretical number of parameters for standard convolution and PConv can be approximated as follows:

$$\mathrm{Params}_{\mathrm{Conv}} = k^2 \times c \times c = 9c^2$$
$$\mathrm{Params}_{\mathrm{PConv}} = k^2 \times c_p \times c_p = 9\left(\tfrac{c}{2}\right)^2 = \tfrac{1}{4} \times 9c^2$$
Under the assumed conditions, PConv achieves a parameter count that is 1/4 of that of standard convolution, significantly reducing computational complexity. Building on this, the Faster Block and C3k2-Faster structures preserve the core feature extraction capabilities of the original feature extractor, while improving efficiency during both training and inference stages without significant loss in accuracy. This makes them highly suitable for industrial scenarios with limited computational resources or stringent real-time requirements.
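A minimal PyTorch sketch of PConv and the Faster Block follows, assuming a split ratio of 1/2 (consistent with the 1/4 parameter figure derived above) and an illustrative channel expansion of 2 in the pointwise stage. As a sanity check, with $c = 64$ the PConv weight holds 9 × 32² = 9216 parameters versus 9 × 64² = 36,864 for a full 3 × 3 convolution, i.e., exactly one quarter.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first c_p channels;
    the remaining c - c_p channels pass through unchanged."""
    def __init__(self, c: int, k: int = 3, ratio: float = 0.5):
        super().__init__()
        self.c_p = max(int(c * ratio), 1)
        self.conv = nn.Conv2d(self.c_p, self.c_p, k, 1, k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.c_p], x[:, self.c_p:]
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterBlock(nn.Module):
    """PConv -> 1x1 PWConv -> BN -> ReLU -> 1x1 PWConv, plus a residual shortcut."""
    def __init__(self, c: int, expansion: int = 2):
        super().__init__()
        c_mid = c * expansion
        self.pconv = PConv(c)
        self.pw1 = nn.Conv2d(c, c_mid, 1, bias=False)   # PWConv_1
        self.bn = nn.BatchNorm2d(c_mid)
        self.act = nn.ReLU(inplace=True)
        self.pw2 = nn.Conv2d(c_mid, c, 1, bias=False)   # PWConv_2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pw2(self.act(self.bn(self.pw1(self.pconv(x)))))
```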
3.6. Segment-Efficient
Given that this study focuses on graphite ore grade segmentation in a single industrial scenario with a limited number of target categories, there is room to further optimize the original segmentation head. In order to better satisfy particular application demands, a streamlined prediction module named Segment-Efficient is introduced, which eliminates redundant processes within the segmentation head. While maintaining the decoupled design between the bounding box regression and classification tasks, the module removes all DWConvs from the classification loss branch, effectively simplifying the head structure. This approach maintains the separation of localization, classification, and segmentation tasks, prevents feature interference, lowers computational demands, boosts inference speed, and facilitates deployment on resource-limited edge devices.
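As a rough illustration of this design (the channel counts, number of mask coefficients, and distribution-focal `reg_max` value are assumptions, not the released layer layout), a decoupled per-scale head of this kind can be sketched as:

```python
import torch
import torch.nn as nn

class SegmentEfficientHead(nn.Module):
    """Sketch of a decoupled per-scale head: box-regression, classification,
    and mask-coefficient branches share one input feature map, and the
    classification branch uses plain convs (no depthwise-separable stack)."""
    def __init__(self, c_in: int, num_classes: int = 2,
                 num_masks: int = 32, reg_max: int = 16):
        super().__init__()
        self.box = nn.Sequential(            # bounding-box regression branch
            nn.Conv2d(c_in, c_in, 3, 1, 1), nn.SiLU(),
            nn.Conv2d(c_in, 4 * reg_max, 1))
        self.cls = nn.Sequential(            # classification branch, DWConvs removed
            nn.Conv2d(c_in, c_in, 3, 1, 1), nn.SiLU(),
            nn.Conv2d(c_in, num_classes, 1))
        self.mc = nn.Sequential(             # mask-coefficient branch
            nn.Conv2d(c_in, c_in, 3, 1, 1), nn.SiLU(),
            nn.Conv2d(c_in, num_masks, 1))

    def forward(self, x: torch.Tensor):
        return self.box(x), self.cls(x), self.mc(x)
```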
Following the design principles of YOLACT [29], the model head is structured into two distinct branches: one for object detection and the other for instance segmentation. The detection branch outputs class labels, bounding box coordinates, and mask coefficients, which are used to compute classification and box regression losses. The segmentation branch generates global prototype masks. After non-maximum suppression (NMS), the final instance masks are obtained by combining the predicted mask coefficients with the prototype masks through matrix multiplication, followed by binary cross-entropy (BCE) loss optimization. Final segmentation results are produced via cropping and thresholding operations. The architecture of the head incorporating the Segment-Efficient module is illustrated in Figure 6.
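In code, the prototype-coefficient combination reduces to a single matrix multiplication; the sketch below uses arbitrary shapes, with 32 prototypes assumed as a common default.

```python
import torch

# n instances, k prototype masks of size H x W.
n, k, H, W = 5, 32, 160, 160
C = torch.randn(n, k)        # per-instance mask coefficients
P = torch.randn(k, H, W)     # prototype masks from the segmentation branch
masks = torch.sigmoid(torch.einsum("nk,khw->nhw", C, P))  # linear combination + sigmoid
binary = masks > 0.5         # thresholding; cropping to each box would follow
```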
4. Data and Experimental Preparation
4.1. Dataset Construction
The image data used in this study were collected from China Minmetals Corporation (Heilongjiang) Graphite Industry Co., Ltd. (Hegang City, China), comprising medium-to-coarse graphite ores (2–15 cm) sampled from the conveyor belt post-secondary crushing. These samples exhibit clear textural and compositional features suitable for vision-based recognition. Image acquisition was performed using a Canon EOS 5D Mark II DSLR (Canon Corporation, Kunisaki City, Japan) at an original resolution of 5616 × 3744 pixels. To meet model input requirements and industrial constraints, images were cropped and resized to 3744 × 3744 pixels.
To mitigate the effects of conveyor speed, ore types, and lighting variations on texture visibility in industrial sorting, a multi-exposure imaging strategy was employed to enhance surface detail and dataset diversity. Specifically: (i) Short exposure (400 µs) reduces motion blur during rapid ore movement; (ii) Medium exposure (600 µs) offers balanced brightness for general texture capture; (iii) Long exposure (800 µs) improves texture visibility under low-light or low-reflectivity conditions. This approach enhances model robustness and sorting stability in real-world scenarios. A total of 1978 images were collected: 652 (400 µs), 730 (600 µs), and 596 (800 µs), with examples shown in Figure 7.
Ore grade is visually correlated with surface features: high-grade primary ore appears darker due to fewer impurities and stronger light absorption, while low-grade ore appears brighter due to light-colored impurities (e.g., quartz, silicate oxides) and higher surface reflectivity. Based on sorting requirements, low-grade ore was further divided into below-average (“below”) and low-grade (“low”) subcategories, annotated using Labelme [30]. Annotations were saved in JSON and converted to YOLO-compatible TXT format. The final dataset includes two grade categories under three exposure settings, totaling 19,614 low-grade instances, with 11,449 below-average samples (58.39%) and 8,165 low-grade samples (41.61%), resulting in a class ratio of approximately 1.4:1 (Below:Low). The dataset was split into training, validation, and test sets using an 8:1:1 ratio. Table 2 summarizes the instance distribution across the three subsets, along with their corresponding proportions.
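A minimal sketch of this Labelme-to-YOLO conversion step is shown below; the class-name mapping and output layout are illustrative assumptions.

```python
import json
from pathlib import Path

# Hypothetical class mapping used for illustration.
CLASS_IDS = {"below": 0, "low": 1}

def labelme_to_yolo_seg(json_path: str, out_dir: str) -> None:
    """Convert one Labelme annotation file to YOLO segmentation TXT:
    one line per instance: class_id x1 y1 x2 y2 ... (coords normalized)."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    w, h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        if shape["shape_type"] != "polygon":
            continue
        cid = CLASS_IDS[shape["label"]]
        coords = [f"{x / w:.6f} {y / h:.6f}" for x, y in shape["points"]]
        lines.append(f"{cid} " + " ".join(coords))
    out = Path(out_dir) / (Path(json_path).stem + ".txt")
    out.write_text("\n".join(lines), encoding="utf-8")
```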
4.2. Experimental Environment
All experiments were carried out on a consistent hardware setup. The system utilized a 12th-generation Intel Core i5-12490F CPU running at 3.00 GHz (Intel Corporation, Ho Chi Minh City, Vietnam) and an NVIDIA GeForce RTX 4060 GPU equipped with 8 GB of VRAM (NVIDIA Corporation, Santa Clara, CA, USA). The operating environment was Windows 11 Professional, with Python 3.8.19 as the primary programming language and PyTorch 1.12 as the deep learning framework, accelerated by CUDA version 11.3.
Training parameters were kept uniform throughout all experimental runs. The network was trained over 150 epochs using a batch size of 8. The optimization algorithm employed was stochastic gradient descent, with a 0.937 momentum factor, an initial learning rate of 0.01, and a weight decay coefficient of 0.0005. To prevent overfitting and unnecessary computation, an early stopping condition with a patience value of 50 epochs was implemented. Moreover, Mosaic data augmentation [31] was deactivated during the final 10 epochs to enhance convergence behavior.
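Using the Ultralytics API, these settings correspond approximately to the following call; the model and dataset configuration filenames are hypothetical.

```python
from ultralytics import YOLO

model = YOLO("gs-yolo-seg.yaml")      # assumed custom model config
model.train(
    data="graphite.yaml",             # assumed dataset config
    epochs=150,
    batch=8,
    optimizer="SGD",
    lr0=0.01,                         # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    patience=50,                      # early stopping
    close_mosaic=10,                  # disable Mosaic for the final 10 epochs
)
```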
4.3. Evaluation Criteria
To thoroughly assess both the segmentation effectiveness and lightweight nature of the proposed model under real-world application constraints, eight key evaluation indicators are utilized. Precision (P) and mAP50 (mean Average Precision at an IoU threshold of 0.50) serve as the key indicators of accuracy. Their definitions are mathematically outlined as follows:

$$P = \frac{TP}{TP + FP}, \qquad IoU = \frac{|A \cap B|}{|A \cup B|}$$
$$AP_i = \int_0^1 P_i(R)\,\mathrm{d}R, \qquad mAP_{50} = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

In these equations, $A$ refers to the true bounding box annotation, and $B$ represents the corresponding prediction made by the model. $TP$ denotes the count of true positive detections, whereas $FP$ indicates the number of false positives. $P_i(R)$ denotes the interpolated precision corresponding to a recall level $R$ for the $i$-th category, while $N$ signifies the total number of target categories involved.

The accuracy metrics are divided into four parts: $P^{box}$ and $P^{mask}$ measure the precision of detection and segmentation, while $mAP_{50}^{box}$ and $mAP_{50}^{mask}$ reflect the average precision for bounding boxes and masks.
The lightweight performance of the model is assessed from four dimensions: FLOPs, FPS, parameter count (Params), and model size (Size). FLOPs and FPS reflect computational load and inference efficiency, while the number of parameters and model size indicate memory usage and deployment feasibility.
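The parameter count and a crude FPS estimate can be obtained as sketched below (assuming a CUDA device and a dummy input; dedicated profilers give more precise FLOPs figures).

```python
import time
import torch

def count_params(model: torch.nn.Module) -> int:
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

def measure_fps(model: torch.nn.Module, imgsz: int = 640, n: int = 100) -> float:
    """Average single-image forward-pass throughput on dummy input."""
    model.eval().cuda()
    x = torch.randn(1, 3, imgsz, imgsz, device="cuda")
    with torch.no_grad():
        for _ in range(10):                 # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            model(x)
        torch.cuda.synchronize()
    return n / (time.perf_counter() - t0)
```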
5. Experiments and Discussion
5.1. Model Training
Under the predefined parameter settings, the GS-YOLO-seg model was trained on a custom graphite ore dataset and evaluated alongside the baseline YOLO11n-seg and the larger YOLO11s-seg model from the same series. The training process is illustrated in Figure 8, showing the changes in the mean Average Precision for segmentation masks and corresponding loss curves. In the early stages of training, all three models quickly adapted to the data characteristics, with a rapid increase in $mAP_{50}^{mask}$ and a sharp decrease in segmentation loss, indicating good initial convergence. Around the 20th epoch, the GS-YOLO-seg model demonstrated a faster and more stable performance improvement compared to the baseline, while YOLO11s-seg also exhibited strong performance due to its larger model capacity. As training progressed, the performance of all models gradually stabilized; however, the baseline model still showed noticeable fluctuations in accuracy, whereas GS-YOLO-seg achieved stronger convergence in the later stages, reflecting more stable training dynamics.
5.2. Ablation Study
To thoroughly assess the contribution of each proposed module, this work employs the channel-adjusted YOLO11t-seg as the reference architecture. Enhancements were incrementally introduced following a stepwise experimental framework. All tests were performed under uniform hyperparameter settings to maintain fair conditions and ensure that variable control was strictly enforced. A series of ablation trials based on combinatorial configurations were conducted to objectively verify the performance impact of each improvement. The architectural changes are outlined as follows:
A: Utilizing GSConv in place of the conventional downsampling layers within the backbone to improve efficiency and representation;
B: Replacing the standard C3k2 module across the network with the accelerated C3k2-Faster structure to reduce computational overhead;
C: Integrating the streamlined Segment-Efficient head to minimize segmentation complexity while preserving performance.
Table 3 presents the outcomes of various module combinations, and Figure 9 visualizes the performance trends of intermediate models incorporating different configurations. These results distinctly highlight how each individual module influences both the segmentation accuracy and the lightweight characteristics of the network.
As evidenced in Table 3, the results of the ablation analysis affirm that the introduced modules effectively boost model accuracy while facilitating lightweight optimization. First, the introduction of GSConv led to approximately a 1% improvement in both bounding box precision and segmentation average precision, while significantly reducing the number of model parameters, demonstrating its strong structural optimization capability without compromising performance. Second, the C3k2-Faster and Segment-Efficient modules also contributed to model lightweighting to varying degrees. The Faster Block reduced computational cost by performing convolution only on selected channels, and the Segment-Efficient head minimized unnecessary computation by streamlining the segmentation head. However, introducing the Segment-Efficient module alone resulted in a slight drop in segmentation average precision, indicating a trade-off between efficiency and detection accuracy.
Building upon the GSConv-only baseline, the staged integration of feature extraction modules (A + B) and segmentation decision modules (A + C) led to further performance improvements while continuing to reduce model complexity. The final model, GS-YOLO-seg, incorporates depth reduction, optimized downsampling, enhanced feature extraction, and an efficient segmentation head. It outperforms the original YOLO11n-seg in three accuracy metrics and achieves significant reductions in FLOPs (30%), parameter count (59%), and model size (56%), along with an 8% increase in FPS. These results clearly demonstrate that the proposed modules can effectively optimize the model structure for lightweight deployment while maintaining high segmentation and detection accuracy.
5.3. Comparison Study
A comprehensive benchmark comparison was performed between GS-YOLO-seg and several representative segmentation models to validate its performance superiority. To maintain experimental fairness, all models were trained without pretrained parameters and relied solely on their standard hyperparameter configurations, without any further adjustment. Comprehensive results for all evaluated models can be found in Table 4, while Figure 10 illustrates a comparison of their lightweight performance metrics.
Mainstream segmentation frameworks such as Mask R-CNN, U-Net, and the MaskFormer series, all employing ResNet-50 [40] as the backbone, demonstrate strong segmentation performance due to their deep architectures. They achieve slightly higher average precision in both object detection and instance segmentation, exceeding the proposed model’s accuracy by less than 2%. However, these models typically require hundreds of GFLOPs and tens of millions of parameters, resulting in large model sizes of several hundred megabytes and low inference speed. These high computational requirements restrict their use on edge devices and in applications requiring real-time processing.
GS-YOLO-seg shows significant advantages in inference speed compared to other YOLO-series segmentation models of similar scale. Against lightweight models like YOLOv5n-seg and YOLOv8n-seg, GS-YOLO-seg improves mean average precision by 0.4% and 2.7%, respectively, while maintaining a more compact network and faster response. Compared with the more complex YOLO11s-seg model from the same series, GS-YOLO-seg uses the efficient GSConv module, the C3k2-Faster module, and the optimized Segment-Efficient segmentation head to achieve comparable accuracy, with differences in mean average precision for detection and segmentation under 0.5%. Meanwhile, it reduces FLOPs by 80%, parameter count by 88%, and model size by 87%, while increasing FPS by 31%. These results demonstrate that GS-YOLO-seg balances accuracy with substantial model compression and acceleration, making it well suited for resource-limited deployment.
5.4. Visualization Study
For a straightforward visualization of segmentation effectiveness, four sample test images were used for comparison. Figure 11 shows the instance segmentation results of the proposed GS-YOLO-seg model alongside baseline models. Two subcategories of low-grade graphite ore are annotated with distinct colors: below-average grade (yellow) and low-grade (red). Confidence scores (0–1) within bounding boxes indicate the model’s estimated accuracy for each detection. Green arrows highlight key regions to emphasize performance differences between models.
The first row shows images at 400 µs exposure, where low-grade ore blocks display distinct visual features. All four models accurately detected and classified the instances with similar confidence scores and precise segmentation. In the second row (600 µs exposure), medium-low grade ores are densely packed. YOLO11s-seg produced overlapping bounding boxes due to duplicate detections in the center, an issue absent in other models. Both YOLO11s-seg and GS-YOLO-seg detected an edge-region ore block missed by the baseline, showing stronger performance in edge and dense target detection. The third row (600 µs exposure) highlights two medium-low grade ores; YOLO11s-seg segmented both correctly, GS-YOLO-seg detected only the larger one, and YOLO11n-seg missed both. The final row (800 µs exposure) features complex lighting with reflection interference. Two high-grade blocks with reflective areas were misclassified by YOLO11n-seg but correctly identified by GS-YOLO-seg and YOLO11s-seg, indicating better robustness under challenging conditions.
Based on the visual analysis of the results, GS-YOLO-seg demonstrates comparable performance to the baseline model in distinguishing low-grade graphite ore under different exposure conditions. This stable performance is mainly attributed to the integration of the GSConv and C3k2-Faster modules, which enhance the model’s ability to extract features across ore grades. In addition, the integration of a global depth scaling technique and a streamlined Segment-Efficient head greatly decreases both parameters and computation, while preserving strong accuracy. With respect to edge deployment needs, GS-YOLO-seg outperforms the baseline by a small margin, reflecting its enhanced competitiveness and real-world value.
5.5. Generalization Study
To assess the generalization capability of the proposed model beyond graphite ore, GS-YOLO-seg was evaluated on two external public datasets: a lithium mineral microscopy dataset [41] (1013 images, 15,682 annotated instances across feldspar, quartz, and lepidolite) and a petrographic mineral microscopy dataset [42] (1092 images, 6244 annotated instances across actinolite, garnet, and hornblende). As shown in Table 5 and Figure 12, GS-YOLO-seg achieved comparable segmentation performance to YOLO11n-seg while significantly reducing computational complexity. On both datasets, the model reduced FLOPs by over 28%, parameters by up to 61%, and model size by more than 55%, alongside a consistent improvement in inference speed. These results confirm the model’s generalization potential across different mineralogical contexts and its suitability for lightweight deployment in diverse industrial scenarios.
5.6. Limitations and Future Work
Despite the promising results achieved in this study, several limitations remain to be addressed. First, the graphite ore dataset used for model development was collected from a single mining site, which may limit the diversity and representativeness of mineral variations under different geological conditions. Second, the model occasionally exhibits false detections or missed segmentations when dealing with complex scenarios, particularly where ore grade distinctions are subtle or confounded by specular reflections. Third, although GS-YOLO-seg has been optimized for edge deployment, real-time performance and adaptability under varying production environments still require further enhancement before large-scale industrial application.
To overcome these limitations, future research will focus on the following directions: (1) expanding the training dataset to include samples from multiple mining locations and geological backgrounds, thereby enhancing the robustness and generalizability of the model; (2) incorporating multimodal data sources—such as hyperspectral or infrared imaging—in combination with visual data to improve recognition accuracy and grade discrimination; (3) further optimizing the network architecture and hardware acceleration strategy to improve processing speed and adaptability in real-world deployment; (4) extending the framework to support intelligent sorting of other mineral types, promoting broader applicability in industrial ore processing scenarios; and (5) establishing a multi-dimensional evaluation system to quantify the model’s sustainability and green impact, including metrics such as energy efficiency, reduction in manual labor and reagent use, processing throughput, and environmental benefits.
6. Conclusions
Accurate identification and removal of low-grade graphite ore is critical for maintaining product quality, improving resource efficiency, and reducing environmental impact during beneficiation and processing. Traditional sorting methods often rely on complex equipment and manual intervention, resulting in high energy consumption, excessive use of chemical reagents, and significant ecological burden—factors that hinder sustainable production. To address these challenges while enhancing edge deployment efficiency, this study proposes GS-YOLO-seg, a lightweight segmentation model based on an improved YOLO11-seg framework, designed for rapid identification and intelligent classification of low-grade graphite ore in industrial settings. The key contributions of this work include:
A high-quality graphite ore image dataset was independently constructed for real-world industrial scenarios. Images were captured on-site under various exposure conditions using high-resolution imaging equipment, and manually annotated to distinguish between medium-low and low-grade ores. The final dataset comprises diverse samples across three exposure durations and two grade levels.
The YOLO11-seg architecture was comprehensively optimized to improve lightweight characteristics and computational efficiency. First, a global depth factor adjustment was applied to compress the network and construct a lightweight YOLO11t-seg. Second, GSConv was introduced into the backbone as an efficient downsampling operator, reducing computation while enhancing representational capacity. Third, a Faster Block was incorporated into the feature extraction module, forming the C3k2-Faster structure to accelerate processing. Finally, the segmentation head was optimized by eliminating redundant operations and designing the efficient Segment-Efficient module.
Several ablation studies were performed to assess the impact of each proposed module. The GS-YOLO-seg was compared against mainstream segmentation methods and other lightweight variants across several key metrics. Visual comparisons were also carried out to demonstrate the segmentation accuracy and lightweight advantages of the improved model in practical scenarios. Further experiments on additional datasets validated the generalization and robustness of the proposed approach.
Compared with the baseline YOLO11n-seg, GS-YOLO-seg achieves comparable detection and segmentation accuracy while significantly reducing computational and memory overhead. Specifically, FLOPs decreased to 7.1 G, the parameter count dropped to 1.16 M, and model size was reduced to 2.53 MB—corresponding to reductions of approximately 30%, 59%, and 56%, respectively. In addition, inference speed improved by 8% (FPS). These results highlight the model’s well-balanced performance across accuracy, speed, and lightweight deployment, demonstrating its strong potential for industrial application and edge-device integration.
Future work will focus on refining the method for real-world production use, enabling efficient graphite ore sorting. The framework will also be adapted to other minerals to expand its industrial applications. By improving upstream intelligent sorting, the approach aims to reduce low-grade ore processing, minimizing resource waste, energy use, and emissions. These findings provide a foundation for building smart, sustainable mining systems.