Article

Iron Ore Image Recognition Through Multi-View Evolutionary Deep Fusion Method

1 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2 College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(12), 553; https://doi.org/10.3390/fi17120553
Submission received: 27 October 2025 / Revised: 18 November 2025 / Accepted: 25 November 2025 / Published: 1 December 2025
(This article belongs to the Special Issue Algorithms and Models for Next-Generation Vision Systems)

Abstract

Iron ore image classification is essential for achieving high production efficiency and classification precision in mineral processing. However, real industrial environments pose classification challenges due to small samples, inter-class similarity, and on-site noise. Existing methods are limited by single-view approaches that provide insufficient representation, by manual or fixed feature selection and fusion schemes that struggle to adaptively balance performance and complexity, and by susceptibility to overfitting and poor robustness under small-sample conditions. To address these issues, this paper proposes the evolutionary deep fusion framework EDF-NSDE. The framework introduces multi-view feature extraction that combines lightweight and classical convolutional neural networks to obtain complementary features. It further designs an evolutionary fusion stage that employs NSGA-II and differential evolution for multi-objective search, adaptively balancing accuracy and model complexity while reducing overfitting and enhancing robustness through a generalization penalty and adaptive mutation. Furthermore, to overcome data limitations, we constructed a six-class dataset covering hematite, magnetite, ilmenite, limonite, pyrite, and rock based on real production scenarios. The experimental results show that on our self-built dataset, EDF-NSDE achieves 84.86% and 88.38% accuracy on the original and augmented test sets, respectively, comprehensively outperforming other models. On a public seven-class mineral dataset, it achieves 92.51% accuracy, validating its generalization capability across different mineral types and imaging conditions. In summary, EDF-NSDE provides an automated feature fusion solution that automates the mineral classification workflow, contributing to the development of intelligent manufacturing technology and the industrial internet ecosystem.

1. Introduction

Ore classification represents a critical stage in mineral processing, directly influencing production efficiency, resource utilization, and processing costs [1,2,3]. Traditional manual, experience-driven sorting methods suffer from inherent limitations, including low efficiency, high costs, and strong subjectivity, rendering them inadequate for modern industrial scenarios [4]. Accurate and reliable classification serves not only as the foundation for ore quality monitoring and grade assessment but also as a prerequisite for flowsheet optimization and efficient resource utilization. With the advent of intelligent manufacturing, modern mineral processing imposes increasingly stringent technical requirements on classification models, particularly emphasizing strong adaptability under small sample conditions, an essential capability to address the challenges of complex sample distributions and limited data availability in practical industrial settings [5,6].
As a major branch of machine learning, deep learning has emerged as a core technology for intelligent recognition of ore images, demonstrating stable applicability and strong representational capacity across multiple minerals and scenarios. In multi-class mineral recognition, improved convolutional neural networks have achieved high classification accuracy on public datasets containing seven classes such as malachite and bornite, confirming the ability of deep features to capture fine-grained mineral differences [7]. For the discrimination of polymetallic sulfide tailings, multimodal strategies that fuse imagery and spectroscopy significantly improve reliability while maintaining efficiency, meeting field requirements for both timeliness and accuracy [8]. In coal-related ore classification, attention mechanisms enhance responses to key textures and color regions, thereby improving feature extraction quality and classification performance [9].
Despite significant progress in iron ore image classification—including optical texture descriptors supporting particle-level recognition [10], deep learning methods significantly improving accuracy [11], and multi-feature fusion networks further optimizing both accuracy and robustness [12]—practical deployment still faces several critical bottlenecks that can be categorized into three main aspects:
Limited feature views: Single networks or single-scale approaches struggle to simultaneously capture local textures and global morphology while handling background interference, resulting in insufficient robustness in complex field conditions. This limitation becomes particularly pronounced in iron ore classification, where subtle inter-class differences and significant intra-class variations require comprehensive feature representation.
Feature selection and fusion: Current methods predominantly rely on manual or fixed fusion architectures, overlooking the complementary and interactive relationships between different architectures. This often leads to redundancy and conflicts, thereby limiting feature utilization efficiency and final performance [13]. The lack of adaptive fusion strategies becomes a critical barrier when dealing with the heterogeneous nature of iron ore samples.
Insufficient data foundation: The absence of standardized datasets that closely approximate field conditions and are specifically designed for fine-grained iron ore subcategories constrains algorithm validation, generalization assessment, and reproducible comparisons. Existing approaches demonstrate limited generalization capabilities under small-sample conditions [14]. Furthermore, the lack of specialized image datasets for multi-class iron ore makes it difficult to develop and validate multi-class iron ore image classification algorithms under small-sample conditions.
To address these limitations, we propose an evolutionary deep fusion method named EDF-NSDE for multi-class iron ore image classification under small-sample conditions. The method centers on multi-view feature selection and fusion: it integrates complementary representations from lightweight and classical convolutional neural networks and employs multi-objective search based on NSGA-II and differential evolution to adaptively balance accuracy and model complexity. Additionally, it introduces a generalization penalty and adaptive mutation to mitigate overfitting and enhance robustness. To provide a reliable data foundation, we constructed a six-class iron ore image dataset based on real production scenarios, comprising hematite, magnetite, ilmenite, limonite, pyrite, and rock categories, addressing the shortage of specialized data resources for this field.
The main contributions of this work are as follows:
Multi-view feature extraction: We propose a comprehensive feature extraction framework that integrates complementary representations from heterogeneous network backbones (ResNet, MobileNet, etc.) to capture both local textures and global morphology, enhancing robustness against background interference in complex field conditions.
Evolutionary deep fusion optimization: We introduce an EDF-NSDE method that employs multi-objective search based on NSGA-II and differential evolution to adaptively balance accuracy and model complexity while incorporating a generalization penalty and adaptive mutation to mitigate overfitting and enhance robustness under small-sample conditions.
Dataset construction: We constructed a six-class iron ore image dataset based on real production scenarios, providing a reliable data foundation for rigorous algorithm validation, generalization assessment, and reproducible comparisons in the field of iron ore image classification.

2. Related Work

2.1. Multi-View Feature Extraction

Feature extraction is a key component in many machine learning systems. A number of deep neural networks (DNNs), such as Inception [15], ResNet [16], and DenseNet [17], have been used for this purpose [18]. The development of multi-view feature extraction methods has been driven by the need to capture rich, multi-dimensional information more efficiently, evolving from manual feature engineering to automated, learning-based extraction. In the traditional machine learning era, feature construction relied heavily on expert knowledge: Researchers manually designed texture, morphology, and color descriptors and then trained classifiers such as Support Vector Machines (SVMs) or random forests. In ore classification, for instance, iron ore types have been identified by coupling SVMs with handcrafted color and texture features [2], and ore phases have been recognized via optical image analysis with manually set parameters to extract particle descriptors [19].
With the advent of deep learning, CNNs, by virtue of hierarchical representation learning, can capture multi-scale cues from local to global and thus have emerged as core tools for multi-feature extraction. These models exhibit tangible potential in ore classification. Chen et al. combined CNNs with Long Short-Term Memory (LSTM) to couple spatial feature extraction with temporal dependency modeling, effectively handling high-dimensional sequential correlations [20]. Huang et al. developed multi-scale convolutional attention networks that aggregate features via parallel kernels and emphasize salient signals through attention, improving both focus and noise suppression [21]. Jiang et al. enhanced multi-feature extraction by adopting an Asymptotic Feature Pyramid Network (AFPN) for multi-scale feature fusion and introducing BiFormer blocks into the backbone, further verifying the effectiveness of multi-scale fusion in deep learning feature extraction [22].
To address the limitations of single-view feature extraction, researchers have developed various multi-view feature fusion strategies that leverage complementary information from different network architectures and scales. Fang et al. [23] employed transfer learning and data fusion for hyperspectral image classification using three-dimensional asymmetric spatial networks, demonstrating the effectiveness of multi-view feature integration. Bi et al. [14] achieved data-efficient image classification via genetic-programming-driven evolutionary deep learning, while Wei et al. [24] proposed a hybrid evolutionary algorithm to search key hyperparameters of DenseNet-121 for improving accuracy and generalization stability. Kaur and Singh [25] used multi-objective differential evolution to optimize multimodal fusion in deep networks, and Yasar and Golcuk [26] combined whale optimization with minimum redundancy–maximum relevance for feature selection and fusion in crop variety identification. These approaches demonstrate the potential of combining different network architectures and optimization strategies to enhance feature representation capabilities.
Multi-view feature extraction has shown particular promise in ore classification tasks, where the complementary nature of different feature representations becomes crucial for accurate mineral identification. The single-view limitation of individual CNNs becomes particularly pronounced in this domain: ResNet excels at capturing global morphology but may miss fine local texture details in ore samples [27], while MobileNetV2 captures local cues effectively but struggles to establish long-range dependencies that are crucial for classes like magnetite, where both global compactness and local luster inform the decision boundary. However, applying multi-feature extraction to iron ore classification faces notable challenges: feature selection is often manually curated and not dynamically adapted to ore-specific differences—e.g., emphasizing global features for hematite versus local features for ilmenite—leading to cross-class generalization gaps when fixed combinations are used. Additionally, feature dimensionality and scaling mismatch are pronounced; descriptors from different methods can vary widely in magnitude and dimension, and standard normalization is insufficient to eliminate scale dominance, causing large-feature spaces to overshadow informative small-feature spaces during fusion [28]. Moreover, small-sample representativeness remains limited: data scarcity undercuts coverage of the feature space, especially for minority classes, impairing their classification accuracy relative to majority classes [29].

2.2. Evolutionary Deep Fusion

Realizing the full utility of multi-feature extraction hinges on effective fusion. Traditional fusion often relies on fixed rules and manual heuristics, constraining adaptability. Evolutionary deep fusion reframes the end-to-end process—feature selection, operator choice, and weight allocation—as an evolutionary search problem, leveraging GA, PSO, or DE to automate fusion design. For example, Fu et al. proposed a GA-based Trusted Evolutionary Fusion framework, enhancing cross-view interaction and boosting classification accuracy across datasets [30]. Liang et al.’s evolutionary deep fusion (EDF) encodes fusion designs as chromosomes, improving chemical structure recognition accuracy by 3.08–3.35% over manual fusion [31]. Yang et al. applied DE to optimize multi-omics fusion weights, lifting agricultural phenotypic prediction accuracy significantly [32], while Zhang et al. used PSO for adaptive multimodal fusion in gesture recognition, outperforming manual strategies [33].
Single-objective evolutionary fusion is limited in complex tasks; multi-objective evolutionary fusion has emerged to balance accuracy and complexity. Kaur and Singh [25] introduced a multi-objective DE-based medical image fusion technique that uses non-dominated sorting to select solutions with high discriminability and low fusion loss, obviating handcrafted rules and outperforming single-objective counterparts in preserving modality-specific information. Huang et al. [34] proposed a DRL-guided multi-objective evolutionary algorithm with a serial–parallel mechanism, achieving 2.5–5.9% improvements in IGD+ over traditional methods on IDMP benchmarks.
Despite these advances, current evolutionary deep fusion methods face several fundamental challenges that limit their effectiveness in complex classification tasks, particularly when applied to iron ore classification. First, search efficiency becomes inadequate when dealing with the large design spaces induced by multi-feature inputs. With $n$ pretrained models, the search encompasses $2^n$ model subsets and continuous fusion weights, creating an exponentially complex optimization landscape in which baseline evolutionary strategies are prone to local minima and exhibit runtimes that far exceed real-time constraints. Second, encoding schemes lack adaptability: binary encodings combining feature selection and weights fail to accommodate view-wise dimensional heterogeneity, undermining complementarity—e.g., global features from ResNet-50 blend optimally via averaging with local features from MobileNetV2, whereas DenseNet-121’s dense features may fuse more effectively via max pooling with InceptionV3’s multi-scale features. These limitations become particularly pronounced in iron ore classification tasks, where the unique characteristics of ore images—including complex mineralogical variations, diverse visual characteristics across different iron ore types, and the need to discriminate subtle differences in texture, color, and morphology—exacerbate the challenges of search efficiency and encoding adaptability. Additionally, recent advances in multi-modal fusion have demonstrated significant improvements across various domains; for example, electroencephalography-based emotion recognition has benefited from spatio-temporal representation fusion learning networks (STRFLNet) that employ hierarchical transformer fusion modules, while systematic analysis reveals persistent challenges, including noise susceptibility, individual differences, and limited dataset diversity, that highlight the need for adaptive fusion mechanisms [35,36].
To address these limitations, we propose EDF-NSDE, an evolutionary deep fusion framework that integrates multi-objective NSGA-II with differential evolution. First, we introduce DE-based vector-difference mutation to enhance population diversity and optimize the fusion weight matrix, effectively avoiding local minima while improving search efficiency in large design spaces. Second, we design an adaptive encoding scheme that jointly represents model-selection vectors, operator sequences, and weight matrices, accommodating cross-view dimensional heterogeneity and enabling end-to-end fusion of multi-feature representations. This approach directly addresses the computational complexity challenges while maintaining the complementarity benefits of diverse feature representations for iron ore classification.

3. Methodology

3.1. Overall Framework

To address the challenge of effective multi-view feature fusion in iron ore image classification tasks, this paper presents an intelligent feature fusion approach based on evolutionary deep fusion with NSGA-II and Differential Evolution (EDF-NSDE). This approach draws on the core framework of multi-view feature extraction and evolutionary search for optimal fusion strategies from the literature [31,37], with three critical distinctions from the existing evolutionary deep fusion (EDF) paradigm. First, unlike the original EDF method that optimizes for classification accuracy as a single objective, our approach incorporates multi-objective optimization that simultaneously maximizes classification accuracy and minimizes model complexity. Second, it integrates differential evolution (DE) to boost search efficiency, a strategy not highlighted in the EDF framework. Third, it develops a heuristic population initialization strategy tailored to the feature diversity of iron ore images, an element absent from EDF’s general multi-view fusion design. Building on these enhancements, EDF-NSDE further expands the multi-objective optimization dimensions, strengthens search efficiency through DE, and refines the initialization strategy to meet the generalization requirements of classification tasks, thereby enabling automated feature fusion and classification.
The overall framework of the method is illustrated in Figure 1 and consists of two core stages: multi-view feature extraction and preprocessing, and evolutionary fusion network search. Specifically, EDF-NSDE operates through these two interconnected stages to realize end-to-end feature fusion and classification. In the first stage of multi-view feature extraction, a number of deep neural networks (DNNs), such as Inception, ResNet, and DenseNet (the dots in Figure 1 represent the other selected networks, each corresponding to a different color), are utilized to characterize the same iron ore image, generating feature views with distinct focuses and thus forming a complementary candidate feature set. In the second stage of feature selection and fusion, we unify three key elements as optimizable variables: the selection of view features, the fusion operator for combining these features, and the allocation of fusion weights across different views. These variables are solved through multi-objective evolutionary search, which integrates the non-dominated sorting selection mechanism of NSGA-II and the vector differential mutation strategy of differential evolution. This integrated search mechanism not only improves classification accuracy but also suppresses overfitting under small-sample conditions, thereby obtaining more stable fused representations for multi-category discrimination of iron ore.
The core objective of EDF-NSDE is to automatically search for the optimal feature subset selection–fusion operator combination–fusion weight allocation scheme from iron ore image features extracted by multiple pre-trained deep models while simultaneously optimizing two critical objectives: classification accuracy (maximization) and model complexity (minimization of selected model count).
The EDF-NSDE method operates through two primary stages. First, multi-view feature extraction utilizes eight pre-trained deep models, namely, ResNet50, DenseNet121, EfficientNet-B0, MobileNetV2, MobileNetV3, ShuffleNetV2, AlexNet, and InceptionV3, to extract complementary representations from iron ore images. The features from each model $m$ are represented as $F_m \in \mathbb{R}^{d_m \times N}$, where $d_m$ denotes the feature dimension and $N$ represents the sample count across the training, validation, and test sets. Second, evolutionary optimization and fusion employs NSGA-II-based multi-objective optimization combined with differential evolution to automatically search for the optimal feature subset selection, fusion operator sequence, and weight allocation scheme. This process ultimately produces fused features $F_{fused} \in \mathbb{R}^{D \times N}$ that are evaluated using logistic regression for iron ore classification performance assessment, where $D$ represents the dimensionality of the fused feature representation.

3.2. Multi-View Feature Extraction and Preprocessing

3.2.1. Feature Extraction

To capture the multi-dimensional characteristics of iron ore images, eight pre-trained deep learning models (ResNet50, DenseNet121, EfficientNet-B0, MobileNetV2, MobileNetV3, ShuffleNetV2, AlexNet, and InceptionV3) are employed as complementary feature extractors.
Each model is fine-tuned on the iron ore dataset: the original classification layer is replaced to adapt to $K$ categories, with a learning rate $\eta = 0.001$ and early stopping (10 epochs without validation-loss improvement). The penultimate-layer activations are extracted as multi-view features:
$$F_m = f_m(X)$$
where $m = 1, 2, \ldots, 8$, $f_m$ denotes the feature extraction function of model $m$, and $X$ represents the input iron ore images. Feature dimensions satisfy $d_m \in \{1280, 2048\}$ (e.g., ResNet50: $d_m = 2048$; MobileNetV2: $d_m = 1280$).

3.2.2. Feature Preprocessing

Standardization: Each view is Z-Score normalized using training set statistics. For training set features $F_m^{train} = \{f_{m,i}^{train}\}_{i=1}^{N_{tr}}$ ($N_{tr}$: number of training samples),
$$f_{m,i}^{norm} = \frac{f_{m,i}^{train} - \mu_m^{train}}{\sigma_m^{train}}$$
where $\mu_m^{train}$ and $\sigma_m^{train}$ are the mean and standard deviation of $F_m^{train}$, respectively.
Dimensional Alignment: Zero-padding is used for dynamic alignment. For two features to be fused, $V_1 \in \mathbb{R}^{d_1 \times N}$ and $V_2 \in \mathbb{R}^{d_2 \times N}$, the maximum dimension is defined as
$$D = \max(d_1, d_2)$$
The padded features are
$$V_1^{pad} = \mathrm{ZeroPad}(V_1, D), \quad V_2^{pad} = \mathrm{ZeroPad}(V_2, D)$$
where $\mathrm{ZeroPad}(V, D)$ pads $D - d$ zeros to $V$ (if $d < D$) for compatibility with element-wise fusion operators.
The pseudocode of this process is given in Algorithm 1.
Algorithm 1: Multi-view Feature Extraction and Preprocessing.
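To make the preprocessing pipeline concrete, the following is a minimal PyTorch sketch of the per-model steps summarized in Algorithm 1 (penultimate-layer feature extraction, Z-Score standardization with training-set statistics, and zero-padding alignment); the helper names and the torchvision backbones shown are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_extractor(name: str) -> nn.Module:
    """Return a pretrained backbone with its classification head removed,
    so the forward pass yields penultimate-layer activations."""
    if name == "resnet50":
        net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        net.fc = nn.Identity()          # 2048-d features
    elif name == "mobilenet_v2":
        net = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
        net.classifier = nn.Identity()  # 1280-d features
    else:
        raise ValueError(f"unsupported backbone: {name}")
    return net.eval()

@torch.no_grad()
def extract_features(net: nn.Module, loader) -> torch.Tensor:
    """Stack penultimate-layer features for every batch of a DataLoader -> (N, d_m)."""
    return torch.cat([net(x) for x, _ in loader], dim=0)

def zscore(train_feats: torch.Tensor, other_feats: torch.Tensor):
    """Standardize both splits with training-set mean/std only."""
    mu, sigma = train_feats.mean(0), train_feats.std(0) + 1e-8
    return (train_feats - mu) / sigma, (other_feats - mu) / sigma

def zero_pad(v: torch.Tensor, target_dim: int) -> torch.Tensor:
    """Pad a feature matrix (N, d) with zeros up to (N, D) for element-wise fusion."""
    if v.shape[1] >= target_dim:
        return v
    return torch.nn.functional.pad(v, (0, target_dim - v.shape[1]))
```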

3.3. Evolutionary Fusion Network Design

3.3.1. Individual Encoding and Decoding

The evolutionary individual is encoded as a three-component structure $p = [MS, OS, WM]$, representing feature selection, the fusion operator sequence, and weight matrix allocation, as illustrated in Figure 2. The individual components are defined as follows:
(1) Encoding Structure
The evolutionary individual $p = [MS, OS, WM]$ consists of three components. The model selection vector $MS$ is a binary vector of length 8, i.e., $MS \in \{0, 1\}^8$, where $MS[m] = 1$ indicates that the features of the $m$-th model are selected, and 0 indicates exclusion. The number of selected models $k$ satisfies $2 \le k \le 8$, i.e.,
$$\sum_{m=1}^{8} MS[m] = k$$
The fusion operator sequence $OS$ is an integer vector of length $k-1$, i.e., $OS \in \{0, 1, 2, 3\}^{k-1}$, where each element corresponds to one of four basic fusion operators: 0: element-wise addition (add), 1: element-wise maximum (max), 2: element-wise minimum (min), and 3: weighted average (avg).
The fusion weight matrix $WM$ is a floating-point matrix of shape $(k-1, 2)$, i.e., $WM \in \mathbb{R}^{(k-1) \times 2}$, where each row contains the weights of the two input features for one fusion operation, and the weights in each row sum to 1:
$$\sum_{j=1}^{2} WM[i, j] = 1$$
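For concreteness, the encoding above can be represented as a small data structure; the following Python sketch, with illustrative field names and a random initializer, is an assumed realization rather than the paper's exact code.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Individual:
    MS: np.ndarray   # binary model-selection vector, shape (8,)
    OS: np.ndarray   # operator codes in {0: add, 1: max, 2: min, 3: avg}, shape (k-1,)
    WM: np.ndarray   # fusion weights, shape (k-1, 2), each row sums to 1

def random_individual(n_models: int = 8, rng=np.random) -> Individual:
    """Sample a valid individual: 2 <= k <= 8 selected models, row-normalized weights."""
    k = rng.randint(2, n_models + 1)                  # number of selected models
    MS = np.zeros(n_models, dtype=int)
    MS[rng.choice(n_models, size=k, replace=False)] = 1
    OS = rng.randint(0, 4, size=k - 1)                # one operator per fusion step
    WM = rng.rand(k - 1, 2)
    WM /= WM.sum(axis=1, keepdims=True)               # enforce sum_j WM[i, j] = 1
    return Individual(MS=MS, OS=OS, WM=WM)
```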
(2) Decoding Process
Individual p is mapped to a specific fusion network through the following steps:
Feature Filtering: Extract the standardized features of the selected models according to $MS$, denoted as $\{U_i\}_{i=1}^{k}$ (where $i$ is the selected model index);
Dimensional Alignment: Perform zero-padding on the filtered features to obtain aligned features $\{U_i^{pad}\}_{i=1}^{k}$, where $U_i^{pad} \in \mathbb{R}^{D \times N}$ and $D = \max(d_1, d_2, \ldots, d_k)$;
Sequential Fusion: Starting from $U_1$, sequentially fuse $U_{i+1}$ through $OS[i]$ and $WM[i]$, defining the fusion iteration formula:
If $OS[i] = 0$ (add),
$$C_i = C_{i-1} + U_{i+1}$$
If $OS[i] = 1$ (max),
$$C_i = \max(C_{i-1}, U_{i+1})$$
If $OS[i] = 2$ (min),
$$C_i = \min(C_{i-1}, U_{i+1})$$
If $OS[i] = 3$ (avg),
$$C_i = WM[i, 1] \cdot C_{i-1} + WM[i, 2] \cdot U_{i+1}$$
where $C_0 = U_1$, $i = 1, 2, \ldots, k-1$, and the final fused features are
$$C = C_{k-1}$$
Classification Mapping: Input the fused features $C$ into a logistic regression classifier, outputting iron ore category probabilities:
$$P = \mathrm{softmax}(W C + b)$$
where $W \in \mathbb{R}^{K \times D}$ is the weight matrix, $b \in \mathbb{R}^{K}$ is the bias term, $K$ is the number of iron ore categories, and $P \in \mathbb{R}^{K \times N}$ is the category probability distribution.
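The decoding steps can be sketched as follows, assuming standardized per-model feature matrices of shape (N, d_m) and the hypothetical `Individual` structure from the previous sketch; the scikit-learn logistic-regression head stands in for the classification mapping.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def decode_and_fuse(ind, feature_views):
    """Map an individual to fused features C, given a list of eight
    standardized feature matrices, each of shape (N, d_m)."""
    selected = [f for f, keep in zip(feature_views, ind.MS) if keep]
    D = max(f.shape[1] for f in selected)
    padded = [np.pad(f, ((0, 0), (0, D - f.shape[1]))) for f in selected]  # zero-pad to D
    C = padded[0]                                        # C_0 = U_1
    for i, U_next in enumerate(padded[1:]):
        op = ind.OS[i]
        if op == 0:                                      # element-wise addition
            C = C + U_next
        elif op == 1:                                    # element-wise maximum
            C = np.maximum(C, U_next)
        elif op == 2:                                    # element-wise minimum
            C = np.minimum(C, U_next)
        else:                                            # weighted average
            C = ind.WM[i, 0] * C + ind.WM[i, 1] * U_next
    return C                                             # fused features, shape (N, D)

def classify(C_train, y_train, C_eval):
    """Logistic-regression head on the fused representation."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(C_train, y_train)
    return clf.predict(C_eval)
```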

3.3.2. Multi-Objective Fitness Function Design

The objectives are to maximize the classification accuracy with generalization penalty, $F_1$, and to minimize the number of selected models, $F_2$:
$$\max F_1, \quad \min F_2$$
(1) Classification Accuracy with Generalization Penalty
The performance evaluation framework employs a composite scoring mechanism that integrates cross-validation accuracy with validation set accuracy as the primary performance metric. To address model generalization concerns, a penalty term is incorporated that combines the training–validation performance gap with cross-validation standard deviation. Additionally, a minimum threshold criterion is established to filter out invalid individuals that fail to meet basic performance requirements.
Basic Performance Metric: The basic performance metric $P_{base}$ integrates 5-fold cross-validation accuracy and validation set accuracy through weighted fusion, thereby balancing training stability and validation-set generalization capability:
$$P_{base} = 0.7 \cdot CV_{acc} + 0.3 \cdot Val_{acc}$$
where $CV_{acc}$ is the average cross-validation accuracy of the fused features on the training set, defined as
$$CV_{acc} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{Accuracy}\!\left(C_{train}^{(k)}, Y_{train}^{(k)}\right)$$
where $C_{train}^{(k)}$ denotes the fused training features in the $k$-th fold of cross-validation, $Y_{train}^{(k)}$ the corresponding sample labels, and $\mathrm{Accuracy}(\cdot)$ the classification accuracy calculation function.
$Val_{acc}$ is the accuracy of the fused features on the independent validation set and is defined as
$$Val_{acc} = \frac{1}{N_{val}} \sum_{i=1}^{N_{val}} \mathbb{I}(\hat{y}_i = y_i)$$
where $N_{val}$ is the total number of validation samples, $\hat{y}_i$ is the predicted category of the $i$-th validation sample, $y_i$ is the true category, and $\mathbb{I}(\cdot)$ is the indicator function (1 when the condition holds, 0 otherwise).
Generalization Penalty Term: The generalization penalty term $P_{penalty}$ is designed to mitigate overfitting by jointly considering the training–validation accuracy gap and the cross-validation standard deviation:
$$P_{penalty} = |Train_{acc} - Val_{acc}| + \sigma_{CV}$$
$Train_{acc}$ is the accuracy of the fused features on the full training set, defined as
$$Train_{acc} = \frac{1}{N_{tr}} \sum_{i=1}^{N_{tr}} \mathbb{I}(\hat{y}_i^{train} = y_i^{train})$$
where $N_{tr}$ is the total number of training samples, and $\hat{y}_i^{train}$ and $y_i^{train}$ are the predicted and true categories of the training samples, respectively.
$\sigma_{CV}$ is the standard deviation of the cross-validation accuracy, measuring training stability, and is defined as
$$\sigma_{CV} = \sqrt{\frac{1}{5} \sum_{k=1}^{5} \left( CV_{acc}^{(k)} - CV_{acc} \right)^2}$$
Final Accuracy Metric: The final accuracy metric $F_1$ is computed as the basic performance minus the generalization penalty, with a minimum threshold of 0.5 applied to filter invalid individuals:
$$F_1 = \max\!\left(0.5,\ P_{base} - \alpha \cdot P_{penalty}\right)$$
(2) Model Count
The model count metric $F_2$ directly quantifies the number of selected pre-trained models, reflecting the computational complexity of the fusion scheme, and is defined as
$$F_2 = \sum_{m=1}^{M} MS[m]$$
where $M = 8$ is the total number of pre-trained models and $MS \in \{0, 1\}^8$ is the binary selection vector ($MS[m] = 1$ indicates that the $m$-th model is selected, and $MS[m] = 0$ indicates that it is not).
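Assuming the `decode_and_fuse` helper sketched above and a fixed training/validation split, the two objectives could be computed roughly as follows; the penalty coefficient `alpha` and the use of scikit-learn utilities are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(ind, views_train, y_train, views_val, y_val, alpha=1.0):
    """Return (F1, F2): penalized accuracy to maximize, model count to minimize."""
    C_train = decode_and_fuse(ind, views_train)
    C_val = decode_and_fuse(ind, views_val)

    clf = LogisticRegression(max_iter=1000)
    cv_scores = cross_val_score(clf, C_train, y_train, cv=5)   # 5-fold CV accuracy
    clf.fit(C_train, y_train)
    train_acc = clf.score(C_train, y_train)
    val_acc = clf.score(C_val, y_val)

    P_base = 0.7 * cv_scores.mean() + 0.3 * val_acc            # basic performance metric
    P_penalty = abs(train_acc - val_acc) + cv_scores.std()     # generalization penalty
    F1 = max(0.5, P_base - alpha * P_penalty)                  # thresholded accuracy objective
    F2 = int(ind.MS.sum())                                     # number of selected models
    return F1, F2
```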

3.3.3. Multi-Objective Optimization and Differential Evolution Strategy

The NSGA-II algorithm is adopted to achieve multi-objective screening of accuracy and model count, introducing DE to enhance the search capability of mutation operators and balancing population convergence speed and diversity.
(1) NSGA-II Multi-Objective Non-Dominated Sorting
The dominance relationship is defined based on weighted accuracy and model count, with generalization capability indirectly constrained through the penalty term of $F_1$.
Dominance Relationship Definition: For any two individuals $i$ and $j$ in the population, $i$ dominates $j$ if the following condition is satisfied:
$$\left( F_1(i) \ge F_1(j) \wedge F_2(i) \le F_2(j) \right) \wedge \left( F_1(i) > F_1(j) \vee F_2(i) < F_2(j) \right)$$
The dynamic accuracy weight $w_{acc}$, which increases linearly with the iteration count to strengthen attention to classification performance in later stages, is defined as
$$w_{acc} = 0.5 + 0.5 \cdot \frac{gen}{G_{max}}$$
where $gen$ is the current iteration count and $G_{max} = 30$ is the maximum iteration count.
Sorting Process: First-front screening: the set of individuals not dominated by any other individual in the population is identified, serving as the current optimal candidate set;
Crowding distance calculation: for individuals within the same front, their neighborhood density in the two-dimensional space of weighted accuracy and model count is calculated as
$$CD_i = \sum_{d=1}^{2} \frac{f_d(i+1) - f_d(i-1)}{f_d^{max} - f_d^{min} + \epsilon}$$
where $f_d(i)$ is the value of the $d$-th objective dimension ($d = 1$ for weighted accuracy, $d = 2$ for model count); $f_d^{max}$ and $f_d^{min}$ are the maximum and minimum values of that dimension, respectively; and $\epsilon$ is a small constant that avoids a zero denominator. Boundary individuals are assigned an infinite crowding distance ($\infty$) to ensure diversity-priority retention.
Selection Strategy: Individuals are sorted by front priority in descending order, with crowding distance used for secondary sorting within the same front. This strategy ultimately retains individuals that exhibit high accuracy, low model count, and high diversity.
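A compact, library-independent sketch of the dominance test, first-front screening, and crowding-distance computation described above is shown below; objective values are assumed to be stored as (F1, F2) pairs, with F1 maximized and F2 minimized.

```python
import numpy as np

def dominates(a, b):
    """a dominates b if it is no worse on both objectives (maximize F1, minimize F2)
    and strictly better on at least one of them."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def first_front(objectives):
    """Indices of individuals not dominated by any other individual."""
    return [i for i, oi in enumerate(objectives)
            if not any(dominates(oj, oi) for j, oj in enumerate(objectives) if j != i)]

def crowding_distance(front_objs):
    """Crowding distance over the two objective dimensions; boundary points get +inf."""
    front_objs = np.asarray(front_objs, dtype=float)
    n = len(front_objs)
    dist = np.zeros(n)
    for d in range(2):
        order = np.argsort(front_objs[:, d])
        dist[order[0]] = dist[order[-1]] = np.inf        # preserve extreme solutions
        span = front_objs[order[-1], d] - front_objs[order[0], d] + 1e-12
        for k in range(1, n - 1):
            dist[order[k]] += (front_objs[order[k + 1], d] - front_objs[order[k - 1], d]) / span
    return dist
```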
(2) Differential Evolution-Enhanced Mutation Operator
To prevent the population from becoming trapped in local optima, the DE strategy is introduced on the basis of traditional mutation. The core steps and formulas are as follows:
Parent Selection: Parents are selected based on the distance between $MS$ vectors to ensure parent diversity: the distance $d(x, y)$ between individual $x$ and every other individual in the population is calculated, and the 3 individuals with the largest distances are selected as parents $\{p_1, p_2, p_3\}$.
Adaptive DE Weight: The DE weight $F$ is dynamically adjusted according to population diversity to balance global exploration and local exploitation:
$$F = F_{base} \cdot (1 + \beta \cdot div)$$
where $F_{base}$ is the base weight and $div$ is the population diversity, defined as $div = \frac{1}{N_{pop}} \sum_{i=1}^{N_{pop}} d(x_i, \bar{x})$ with $\bar{x} = \frac{1}{N_{pop}} \sum_{i=1}^{N_{pop}} x_i$ ($N_{pop}$ is the population size).
Weight Matrix Mutation: The fusion weight matrix $WM$ is updated based on the DE strategy:
$$WM_{new} = WM_{p_1} + F \cdot (WM_{p_2} - WM_{p_3})$$
where $WM_{p_1}$, $WM_{p_2}$, and $WM_{p_3}$ are the weight matrices of the three parents; if their row counts are inconsistent, they are first aligned to the maximum row count through zero-padding.
Post-processing: $WM_{new}$ is clipped to $[0, 1]$ and normalized so that the weights remain in a reasonable range and each row sums to 1.
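The weight-matrix mutation and its post-processing can be sketched as below; the zero-padding row alignment and the diversity-adaptive weight follow the description above, while the default parameter values are illustrative assumptions.

```python
import numpy as np

def pad_rows(wm, rows):
    """Zero-pad a weight matrix to the requested number of rows (truncate if longer)."""
    if wm.shape[0] >= rows:
        return wm[:rows]
    return np.vstack([wm, np.zeros((rows - wm.shape[0], wm.shape[1]))])

def de_mutate_weights(wm_p1, wm_p2, wm_p3, F_base=0.9, beta=0.5, diversity=0.0):
    """Differential-evolution mutation of the fusion weight matrix."""
    F = F_base * (1.0 + beta * diversity)                 # adaptive DE weight
    rows = max(wm_p1.shape[0], wm_p2.shape[0], wm_p3.shape[0])
    p1, p2, p3 = (pad_rows(w, rows) for w in (wm_p1, wm_p2, wm_p3))
    wm_new = p1 + F * (p2 - p3)                           # vector-difference mutation
    wm_new = np.clip(wm_new, 0.0, 1.0)                    # keep weights in [0, 1]
    row_sums = wm_new.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                         # avoid division by zero
    return wm_new / row_sums                              # re-normalize each row to sum to 1
```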
(3) Single-Point Crossover Operator
Single-point crossover is adopted to generate offspring with crossover probability $P_c$. The crossover logic for the core components is as follows:
  • Model selection crossover: for $MS$, parent 2’s value is selected with probability $P_c$; otherwise, parent 1’s value is used, and $MS$ is corrected after crossover;
  • Operator sequence crossover: for $OS$, parent 2’s entry is selected with probability $P_c$, with random completion when the length is insufficient;
  • Weight matrix crossover: for $WM$, parent 2’s weights are selected with probability $P_c$, with random generation and normalization (row sum equal to 1) when the length is insufficient.
The pseudocode of this process is given in Algorithm 2.
Algorithm 2: EDF-NSDE Evolutionary Fusion Process.
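Complementing Algorithm 2, the following sketch illustrates the component-wise crossover listed above, reusing the hypothetical `Individual` structure from Section 3.3.1; the repair step for $MS$ and the helper `_take` are simplified assumptions.

```python
import numpy as np

def crossover(parent1, parent2, pc=0.8, rng=np.random):
    """Component-wise crossover: each gene is taken from parent 2 with probability pc."""
    # Model-selection vector, followed by a simple repair to keep at least 2 models.
    MS = np.where(rng.rand(len(parent1.MS)) < pc, parent2.MS, parent1.MS)
    if MS.sum() < 2:
        MS[rng.choice(len(MS), size=2, replace=False)] = 1
    k = int(MS.sum())

    # Operator sequence: inherit up to k-1 entries, fill any shortfall randomly.
    OS = np.where(rng.rand(k - 1) < pc,
                  _take(parent2.OS, k - 1, rng),
                  _take(parent1.OS, k - 1, rng))

    # Weight matrix: inherit whole rows, then re-normalize so each row sums to 1.
    WM = np.where(rng.rand(k - 1, 1) < pc,
                  _take(parent2.WM, k - 1, rng, rows=True),
                  _take(parent1.WM, k - 1, rng, rows=True))
    WM = WM / WM.sum(axis=1, keepdims=True)
    return Individual(MS=MS, OS=OS, WM=WM)

def _take(arr, length, rng, rows=False):
    """Truncate to `length`, or pad with random entries when the parent is too short."""
    if rows:
        extra = rng.rand(max(0, length - arr.shape[0]), 2)
        return np.vstack([arr, extra])[:length]
    extra = rng.randint(0, 4, size=max(0, length - len(arr)))
    return np.concatenate([arr, extra])[:length]
```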

4. Experiments

To validate the effectiveness of the proposed EDF-NSDE method in iron ore image classification tasks, a series of systematic experiments were conducted in this study. This section provides detailed descriptions of experimental settings, dataset characteristics, comparative methods, evaluation metrics, and in-depth result analysis, with emphasis on validating the superiority of EDF-NSDE in multi-view feature fusion, multi-objective optimization, and generalization capability.

4.1. Dataset Description

The experiments and evaluations in this study were conducted based on two datasets. The first is a self-constructed iron ore dataset, designed to simulate multi-category iron ore identification tasks in real production environments. The second is a public mineral identification dataset, used to verify the generalization ability and transferability of the proposed method across different mineral types and imaging conditions.
(1) Self-Constructed Iron Ore Dataset
We first built a dataset tailored for multi-category iron ore classification tasks, with the detailed sample distribution shown in Table 1. This dataset is based on the public MineralImage5k dataset [38] and further supplemented with on-site images captured by cameras at open-pit mines in Anshan City, Liaoning Province, China. The original public dataset contains five typical iron ore categories: hematite, ilmenite, limonite, magnetite, and pyrite. However, it exhibits three critical limitations that hinder its applicability in practical research: firstly, severe class imbalance exists as categories like ilmenite and limonite have insufficient samples (fewer than 100 images per class), which is inadequate for stable training of deep learning models; secondly, the imaging conditions are singular, with most images captured in controlled laboratory lighting environments, failing to cover complex factors in real mining scenarios such as variable illumination, contamination, and crushed morphologies; thirdly, it lacks the rock category—a non-ore class essential for real production processes, where distinguishing between ore and non-ore materials is crucial for improving production efficiency.
To address these limitations, we expanded and reconstructed the dataset through two key approaches: we systematically collected supplementary images from the mindat.org platform for ore categories with insufficient samples (e.g., ilmenite and limonite) and captured on-site images of iron ore and rock samples using cameras at open-pit mines in Anshan City, incorporating images under various practical working conditions, including different ore batches, varying crushing sizes, different contamination attachment states, and complex background interferences. Meanwhile, we added the rock category to align the dataset with the actual needs of mining processes. A flowchart of iron ore dataset construction and comparison of the class distribution is shown in Figure 3. This flowchart intuitively visualizes two core contents: first, the sequential workflow of dataset construction, from the input of three data sources to the implementation of key processing steps (quality filtering, deduplication, cross-source verification, standardization, and integration), and second, the comparative distribution of categories before and after enhancement, clearly highlighting the sample increment of ilmenite and limonite as well as the newly added rock category.
Subsequently, all images underwent rigorous cleaning and integration: duplicate samples were removed, low-quality images with poor resolution or unclear mineral features were filtered out, and image formats and dimensions were standardized. Cross-verification was performed on samples from different sources to ensure consistency in category labeling. Manual verification of mineral authenticity was conducted for samples collected online, comparing them with geological reference maps to confirm their accuracy. After these processes, the final self-constructed iron ore dataset consisted of 1420 images across six categories: hematite, magnetite, ilmenite, limonite, pyrite, and rock. Representative images from this dataset are shown in Figure 4.
(2) Public Mineral Dataset
To further validate the effectiveness and transferability of the proposed method across a broader range of mineral types and imaging conditions, we introduced a public mineral image dataset (available on the Kaggle platform: https://www.kaggle.com/datasets/asiedubrempong/minerals-identification-dataset, accessed on 20 September 2025), with the detailed sample distribution presented in Table 2. This dataset includes seven mineral categories: biotite, bornite, chrysocolla, malachite, muscovite, pyrite, and quartz. Compared with the self-constructed iron ore dataset, this public dataset exhibits greater cross-mineral diversity in terms of mineral species, colors, and texture morphologies. Additionally, its imaging conditions and collection backgrounds differ from those of the iron ore scenario, making it a powerful supplement for evaluating the model’s cross-scenario generalization ability. For consistency, we adopted the same training strategies and data augmentation methods as those used for the self-constructed iron ore dataset on this public dataset.
(3) Data Splitting and Data Augmentation
Both datasets follow the same splitting strategy: samples are divided into training, validation, and test sets in a 6:2:2 ratio. Despite the expansion of the self-constructed dataset, 1420 images still constitute a small-sample scale for deep learning, with residual class imbalance. To enhance model generalization and mitigate overfitting, data augmentation (DA) was applied to the training sets of both datasets.
The augmentation strategy retains the original images to preserve basic features while generating variants with appearance perturbations to enrich data distribution diversity. Specifically, it includes horizontal and vertical flipping to increase perspective variations, random small-angle rotation ranging from −15° to 15° to simulate the appearance of minerals with different placement or tilt states, and random optical perturbations with brightness and contrast adjusted within the range of 0.7–1.3 to simulate mineral appearances under varying lighting conditions.
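The augmentation settings described above (flips, rotation within ±15°, and brightness/contrast factors in 0.7–1.3) could be expressed, for example, with torchvision transforms; the library choice and the 224 × 224 input resolution are assumptions, as the paper does not specify its augmentation implementation.

```python
from torchvision import transforms

# Appearance-perturbing training pipeline matching the described ranges.
train_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),               # random rotation in [-15, 15] degrees
    transforms.ColorJitter(brightness=(0.7, 1.3),
                           contrast=(0.7, 1.3)),          # optical perturbation
    transforms.Resize((224, 224)),                        # assumed input resolution
    transforms.ToTensor(),
])
```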
After augmentation, the self-constructed iron ore dataset had 3642 training images, 284 validation images, and 284 test images, totaling 8520 images. The public mineral dataset had 5718 augmented training images, with the validation and test sets remaining unchanged from the original split. The comprehensive dataset construction and augmentation strategy addresses the key challenges of limited sample size and class imbalance commonly encountered in mineral classification tasks. Through systematic data collection, rigorous quality control, and targeted data augmentation, the final datasets provide a robust foundation for training deep learning models capable of accurate mineral identification across diverse geological samples and imaging conditions.

4.2. Experimental Settings

To ensure experimental reproducibility and efficiency, the following software and hardware configuration, conforming to mainstream deep learning research standards, was adopted: an NVIDIA RTX 4090 graphics processing unit (NVIDIA Corporation, Santa Clara, CA, USA), an Intel Xeon Gold 6330 central processing unit (Intel Corporation, Santa Clara, CA, USA), 128 GB of memory, and the Ubuntu 22.04 LTS operating system. The software implementation relied on PyTorch 2.1.0, Scikit-learn 1.3.2, and Python 3.9.16.
The key parameters of EDF-NSDE were determined based on the method description in Section 3 and heuristic tuning to balance performance and computational efficiency. The hyperparameter configuration includes a population size of 30 individuals, maximum iterations of 30 generations, mutation probability of 0.25, crossover probability of 0.8, differential evolution weight of 0.9, and accuracy weight of 1.5. These parameters were carefully selected to ensure optimal convergence speed while maintaining population diversity and search efficiency.
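For reference, these reported hyperparameters can be gathered into a single configuration mapping; the field names are illustrative.

```python
EDF_NSDE_CONFIG = {
    "population_size": 30,       # individuals per generation
    "max_generations": 30,       # G_max
    "mutation_prob": 0.25,       # mutation probability
    "crossover_prob": 0.8,       # P_c
    "de_weight": 0.9,            # differential evolution weight (F_base)
    "accuracy_weight": 1.5,      # accuracy weighting reported in Section 4.2
    "learning_rate": 1e-3,       # backbone fine-tuning (Section 3.2.1)
    "early_stop_patience": 10,   # epochs without validation-loss improvement
}
```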

4.3. Evaluation Metrics

Using logistic regression as the classifier to validate the effectiveness of the optimal fusion scheme, the core metrics are as follows:
$$Accuracy = \frac{1}{N_{te}} \sum_{i=1}^{N_{te}} \mathbb{I}(\hat{y}_i = y_i)$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
where $N_{te}$ is the number of test samples, $TP$ (true positives) is the number of positive instances correctly predicted by the model, $FP$ (false positives) is the number of negative instances incorrectly predicted as positive, and $FN$ (false negatives) is the number of positive instances incorrectly predicted as negative.
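A minimal scikit-learn sketch for computing these metrics on test predictions is given below; macro averaging is an assumption, as the averaging scheme for the multi-class setting is not stated in the text.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the four reported metrics; macro averaging treats all classes equally."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro"),
    }
```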
Given the practical requirements of mineral classification in mining operations, where accurate identification is crucial for production efficiency and resource optimization, this study primarily focuses on accuracy as the primary evaluation metric. While precision, recall, and F1-score provide valuable insights into model performance, accuracy serves as the most direct indicator of overall classification effectiveness for the multi-class mineral identification task.

4.4. Experimental Results and Analysis

4.4.1. Performance Comparison Between EDF-NSDE and Other Fusion Baselines

As shown in Table 3, EDF-NSDE demonstrates superior performance across all evaluation metrics. On the original test set, EDF-NSDE achieves an accuracy of 84.86% and F1-score of 84.76%, outperforming all eight other models. On the data-augmented test set, EDF-NSDE attains an accuracy of 88.38%, precision of 87.05%, recall of 88.76%, and F1-score of 87.86%, achieving optimal or tied optimal performance across all four metrics. These results indicate that evolutionary-driven multi-view feature fusion effectively enhances classification reliability and robustness.
Regarding model complexity, EDF-NSDE requires 11.5 GFLOPs and 29.31M parameters, comparable to high-capacity models like InceptionV3 (11.5 GFLOPs and 29.29M) and ResNet50 (8.27 GFLOPs and 26.14M). Despite this moderate complexity, its performance gains are substantial: a 2.11% accuracy improvement over DenseNet121 (the best single model on original data) and a 1.70% F1-score improvement over InceptionV3. This trade-off is justified as evolutionary fusion integrates complementary representations from diverse base learners while mitigating single-model biases. For industrial real-time systems, EDF-NSDE’s complexity is deployable on edge devices with moderate computing resources, making it feasible for on-site mineral detection where precise classification outweighs marginal complexity increases.
On the original dataset, EDF-NSDE achieves significant improvements over individual models. Compared to the best single model DenseNet121 (accuracy 82.75%), EDF-NSDE demonstrates an accuracy improvement of 2.11%. Relative to InceptionV3 (F1-score 83.06%), EDF-NSDE achieves an F1-score improvement of 1.70%. Compared to lightweight models, EDF-NSDE shows substantial gains: 5.63% over MobileNetV3 (accuracy 79.23%), 4.93% over ResNet50 (accuracy 79.93%), and 7.75% over AlexNet (accuracy 77.11%). These improvements demonstrate that evolutionary search effectively integrates complementary representations from diverse base learners while mitigating performance bottlenecks caused by single-model bias.
After data augmentation, EDF-NSDE maintains its competitive advantage. Compared to the best single model ResNet50 (accuracy 87.32% and F1-score 86.62%), EDF-NSDE achieves improvements of 1.06 and 1.24% in accuracy and F1-score, respectively. Relative to DenseNet121 (accuracy 86.97% and F1-score 86.77%), the improvements are 1.41 and 1.09%, respectively. The recall improvement is particularly notable, with EDF-NSDE achieving 88.76% compared to ResNet50’s 86.74%, indicating more robust detection of minority classes. These results suggest that the complementarity of multi-view features is further amplified after augmentation as evolutionary search automatically identifies sub-models with different sensitivity to augmentation and adaptively configures operators and weights.
The performance analysis reveals distinct advantages across different model categories. Compared to lightweight models (ShuffleNetV2 and MobileNetV3), EDF-NSDE achieves larger F1-score improvements, indicating better inter-class separation on decision boundaries of easily confused classes. Compared to high-capacity models (ResNet50 and InceptionV3), EDF-NSDE demonstrates more stable advantages in recall and F1-score, reflecting superior generalization with data distribution changes. The method’s ability to maintain higher consistency (precision) and coverage (recall) under noise perturbation conditions and appearance variations demonstrates the effectiveness of the evolutionary multi-objective optimization framework in balancing performance and complexity constraints.
Unlike conventional large convolutional neural networks (CNNs) that demand computationally intensive end-to-end fine-tuning, EDF-NSDE operates exclusively at the feature level and requires only a lightweight linear classifier to render the final decision. This design effectively minimizes computational and storage overhead, making it highly suitable for online inference and rapid iteration in industrial environments with limited resources. Figure 5 illustrates how the selected backbones first extract features in parallel, after which the features undergo the sequential element-wise fusion steps detailed in the figure caption. This figure depicts the optimal fusion architecture identified on the original dataset and demonstrates how EDF-NSDE automatically acquires efficient feature combination strategies.
To quantitatively evaluate the fused-feature classifier, we analyzed its confusion matrix (refer to Figure 6). The model attains an overall accuracy of approximately 86.62%, indicating strong capability in distinguishing the six classes.
At the class level, Class 0 achieves a precision of 89.66% and a recall of 87.64%, reflecting robust recognition. Class 3 also performs well with a precision of 88.04% and a recall of 84.38%. Classes 2, 4, and 5 exhibit even higher precision (85.71%, 92.31%, and 91.67%) and recall (96.00%, 88.89%, and 95.65%), suggesting very high separability for these categories. In contrast, Class 1 shows comparatively lower precision (73.91%) and recall (70.83%), with notable confusion against Classes 0 and 3—likely due to intrinsic feature similarities. Minor misclassifications are also observed between Classes 2 and 3, as well as among Classes 0, 1, and 3, pointing to avenues for enhancing the feature discriminability of these specific class pairs. Overall, the fused-feature model delivers strong and balanced performance across most classes.
To intuitively evaluate whether the feature representations learned by each model have good class separability, we used t-SNE to reduce high-dimensional test features to two dimensions and compare the distributions of AlexNet, DenseNet121, EfficientNet, and EDF-NSDE features in the same plot. This visualization can demonstrate intra-class compactness and inter-class separation, thereby verifying whether the fusion strategy effectively improves representation quality and helping to identify major confusion classes for subsequent model improvement and ablation analysis.
As observed from Figure 7, for AlexNet (Figure 7a), the feature distribution is relatively scattered with significant overlaps between clusters, and inter-class boundaries are unclear. Specifically, hematite and ilmenite, as well as magnetite, show obvious overlapping phenomena; pyrite and rock are relatively independent but still have some scattered points close to other classes. For DenseNet121 (Figure 7b), the class structure begins to separate, and tighter clusters emerge. Magnetite, pyrite, and rock form relatively tight clusters with clearer boundaries, while hematite and ilmenite still overlap, remaining a pair of difficult-to-distinguish classes. For EfficientNet (Figure 7c), the overall separability is further improved. Multiple classes are distributed along manifolds, and the clusters become tighter compared with DenseNet121. Pyrite and rock are relatively concentrated with good inter-class separation; the boundaries between limonite and magnetite are significantly clearer than those in AlexNet.
For our EDF-NSDE model (Figure 7d), the features present the tightest class clusters and the best inter-class separation. Pyrite (purple) and rock (brown) are completely independent, and limonite (green) forms clear and compact clusters on the right side of the figure. The aggregation of hematite (blue) is significantly improved, but there is still local overlap with ilmenite (orange) and some magnetite (red), which is the main source of classification confusion. Visually, the features of EDF-NSDE have the largest inter-class intervals and the smallest intra-class variance, indicating that the proposed fusion strategy effectively improves the separability and robustness of feature representations.

4.4.2. Ablation Study

From Table 4, EDF-NSDE achieves the highest performance on the original dataset with accuracy of 84.86% and F1-score of 84.76%, demonstrating the effectiveness of the integrated optimization framework. The ablation study reveals the individual contributions of each component to the overall performance.
Single-objective optimization results in performance degradation of 2.82 and 2.95% in accuracy and F1-score, respectively, compared to the complete EDF-NSDE. This substantial decline demonstrates that multi-objective optimization, which simultaneously considers accuracy, model count, and the generalization gap, is significantly superior to pursuing accuracy alone, confirming the effective balancing role of NSGA-II in the optimization process.
Removing differential evolution leads to the largest performance decline, with accuracy and F1-score decreasing by 3.52% and 3.65%, respectively. This result indicates that differential evolution contributes most substantially to exploring solution diversity and enhancing search efficiency, making it a critical component of the optimization framework.
The removal of the generalization penalty results in relatively modest performance decreases of 1.76 and 1.73% in accuracy and F1-score, respectively. Although the quantitative impact is smaller, the generalization penalty plays a crucial role in improving model generalization and effectively suppressing overfitting, particularly important for robust performance in real-world scenarios.
Random initialization leads to performance degradation of 2.74 and 1.81% in accuracy and F1-score, respectively. This result demonstrates that heuristic initialization provides a superior starting point for the evolutionary process, accelerating convergence and improving the quality of the final solution.
The fixed mutation strategy results in performance decreases of 3.17% and 3.19% in accuracy and F1-score, respectively. This outcome confirms that adaptive mutation is superior to fixed strategies as it dynamically adjusts the exploration–exploitation balance according to different evolutionary stages.
In summary, all components contribute substantially to the overall performance, with differential evolution and adaptive mutation having the greatest impact, followed by multi-objective optimization and heuristic initialization. The generalization penalty, while showing smaller quantitative improvements, stably enhances generalization capability. The complete EDF-NSDE achieves optimal performance through the synergistic integration of multi-objective optimization, differential evolution, generalization regularization, heuristic initialization, and adaptive mutation strategies.

4.4.3. Training Dynamics, Overfitting Analysis, and Generalization Performance

Small-sample scenarios tend to exacerbate the risk of overfitting, especially for fusion strategies integrating multiple high-capacity backbones. Although EDF-NSDE enhances recognition accuracy, its training behavior requires verification to confirm that the performance gains do not originate from memorization of the limited training set. Therefore, we monitored the joint evolution of loss and accuracy to characterize the convergence and generalization performance at the given data scale.
Figure 8a demonstrates that the training loss declines sharply in the initial epochs and stabilizes near 0.04, while the validation loss reaches its minimum at approximately epoch 15 (around 0.52) before oscillating within a narrow range of 0.55 ± 0.03 . The limited separation between the two curves indicates that EDF-NSDE rapidly internalizes the training distribution while mitigating the generalization gap; no divergence in the validation loss is detected after the early optimum, suggesting that extreme overfitting is avoided despite the small sample size.
Figure 8b corroborates these findings in terms of accuracy. The training accuracy saturates at close to 100 % within five epochs, whereas the validation accuracy improves more gradually and stabilizes between 87 % and 89 % . The persistent gap of approximately 10 percentage points quantifies the extent of residual overfitting, but the plateaued validation curve illustrates stable predictive capability on unseen data. Taken together, the two trajectories confirm that EDF-NSDE converges efficiently while retaining acceptable generalization performance in the small sample regime.
To address the generalization success of the proposed method for different sample sizes, we conducted comprehensive experiments using training ratios of 100%, 60%, and 40% of the original training set. The experiments were performed with stratified sampling to maintain the class distribution, ensuring fair comparison across different sample sizes. As illustrated in Figure 9, the method demonstrates robust performance across all sample sizes.
As shown in Figure 9a, when reducing the training samples from 100% (3642 samples) to 40% (1456 samples), the test accuracy decreased by 0.0300 (3.40%), from 0.8827 to 0.8527. Similarly, the test F1-score decreased from 0.8781 to 0.8513, a reduction of 0.0268 (3.05%). The test accuracy at the 60% training ratio (2185 samples) reached 0.8692, demonstrating that the method maintains strong performance even with moderate sample reduction. These results demonstrate the strong robustness of the proposed method across different training sample sizes.

4.4.4. Performance Comparison Between EDF-NSDE and Fusion Baseline Methods

Different fusion operators exhibit distinct inductive biases for feature statistics, including amplitude distribution, outlier robustness, and discriminative peak preservation, which directly influence downstream classification boundaries and generalization performance. To systematically evaluate operator compatibility and performance differences, we conducted controlled experiments by replacing fusion operators while maintaining identical model combinations and training configurations.
We selected four representative sub-models as base classifiers: DenseNet121, MobileNetV2, EfficientNet-B0, and MobileNetV3. Following the existing multi-operator fusion baseline with the operator sequence (Max, Min, Max) and the corresponding weight matrix [[0.192, 0.808], [0.684, 0.316], [0.583, 0.417]], we conducted single-operator fusion control experiments. These experiments isolate the effect of the fusion operator from that of the optimizer and weight search: only the terminal fusion operator is changed to Add, Max, Min, or Avg, while all other training and validation settings remain identical.
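The sequential element-wise fusion described above (cf. Figure 5) can be sketched as follows: the four sub-model feature vectors are folded left to right, each step applying one weighted binary operator. This is one plausible reading of the baseline configuration and is offered as an illustration, not the authors' exact implementation; the feature vectors below are random stand-ins.

import numpy as np

OPS = {
    "Add": lambda a, b: a + b,
    "Max": np.maximum,
    "Min": np.minimum,
    "Avg": lambda a, b: (a + b) / 2.0,
}

def fuse_features(features, operators, weights):
    """Sequentially fuse equal-length feature vectors with weighted binary operators.

    features  : list of np.ndarray sub-model feature vectors (here four backbones)
    operators : list of operator names, one per fusion step (len(features) - 1)
    weights   : list of [w_left, w_right] pairs scaling the two inputs of each step
    """
    fused = features[0]
    for op_name, (w_left, w_right), feat in zip(operators, weights, features[1:]):
        fused = OPS[op_name](w_left * fused, w_right * feat)
    return fused

rng = np.random.default_rng(0)
feats = [rng.standard_normal(128) for _ in range(4)]  # stand-ins for the four backbone features

# Multi-operator baseline from the text: operators (Max, Min, Max) with the given weight matrix
baseline = fuse_features(feats, ["Max", "Min", "Max"],
                         [[0.192, 0.808], [0.684, 0.316], [0.583, 0.417]])

# Single-operator control: only the terminal operator is replaced (here by Avg)
avg_control = fuse_features(feats, ["Max", "Min", "Avg"],
                            [[0.192, 0.808], [0.684, 0.316], [0.583, 0.417]])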
The results in Table 5 reveal clear performance differences among fusion operators. The Min and Avg operators consistently outperform Add and Max in both accuracy and F1-score. Specifically, Min achieves 82.92% accuracy and an 83.17% F1-score, while Avg attains 82.89% accuracy and an 82.86% F1-score. In contrast, Max and Add reach 82.29%/81.98% and 81.75%/80.48% in accuracy and F1-score, respectively.
These results indicate that for the current sub-model output distribution, more conservative aggregation strategies (Min) and smoother mean aggregation (Avg) better suppress individual model overconfidence and noise bias. Conversely, Max and Add operators are more susceptible to influence from single high-confidence but erroneous sub-models, leading to more significant F1-score degradation.
Although single-operator fusion already provides solid performance gains, the evolutionary search-based EDF-NSDE scheme achieves superior results, with 84.51% accuracy and an 84.39% F1-score, exceeding the best single operator (Min) by approximately 1.6 and 1.2 percentage points in accuracy and F1-score, respectively. This advantage indicates that the optimal fusion strategy is sample-adaptive across categories and difficulty levels, favoring conservative aggregation for easily confused categories while moderately amplifying the contribution of advantageous sub-models for high-confidence samples. Such adaptive behavior is difficult to achieve with a fixed single operator, highlighting the effectiveness of the evolutionary optimization approach.

4.4.5. Comparative Experiments on Public Datasets

To further validate the generalization capability of the proposed EDF-NSDE method, we conducted comprehensive comparisons with other models on the data-augmented Kaggle seven-class mineral identification dataset. This cross-dataset evaluation demonstrates the method’s ability to adapt to mineral types and imaging conditions beyond the original iron ore classification task. Evaluation metrics include accuracy, precision, recall, and F1-score, ensuring that all classes contribute equally under imbalanced conditions. As shown in Table 6, EDF-NSDE selected DenseNet121, ResNet50, and MobileNetV2 with fusion operators Max and Min and fusion weights [[0.624, 0.376], [0.871, 0.129]], achieving the best performance on all four metrics: 92.51% accuracy, 92.62% precision, 91.83% recall, and a 92.44% F1-score.
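Macro averaging, which underlies the precision, recall, and F1 values above, weights every class equally and therefore reflects minority-class behaviour under imbalance. A minimal computation sketch using scikit-learn is shown below; the library choice and the toy labels are assumptions for illustration, not details confirmed by the paper.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def macro_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (each class weighted equally)."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}

# Toy example with seven classes (placeholder labels, not the dataset itself)
print(macro_report([0, 1, 2, 3, 4, 5, 6, 6], [0, 1, 2, 3, 4, 5, 6, 5]))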
Compared to the best single model, MobileNetV2 (91.98% accuracy and 91.03% F1-score), EDF-NSDE improves accuracy by 0.53 percentage points and F1-score by 1.41 percentage points. Compared to EfficientNet-B0 (89.84% accuracy and 89.35% F1-score), the improvements are 2.67 and 3.09 percentage points in accuracy and F1-score, respectively. Against lightweight models and earlier architectures the advantage is larger still, with an F1-score improvement of 9.37 percentage points over AlexNet (83.07%). These results indicate that the fusion strategy significantly mitigates single-model performance degradation on minority classes, demonstrating the effectiveness of the evolutionary multi-objective optimization approach under class imbalance.
The above benefits stem mainly from two points: first, the complementarity between multi-view features is explicitly modeled and fully exploited during evolutionary fusion, forming robust discriminative representations across texture, morphology, and color dimensions; second, the joint search mechanism of NSGA-II and differential evolution automatically selects more representative sub-model sets and fusion operator sequences under multi-objective constraints of performance and complexity, thereby maintaining stable advantages across different class distributions and interference conditions. Overall, EDF-NSDE not only improves average accuracy but also raises macro precision and recall, indicating more balanced recognition and stronger generalization across all classes.
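To make this search mechanism more concrete, the skeleton below illustrates the kind of procedure described: each candidate encodes a sub-model subset, a fusion operator sequence, and fusion weights; candidates are scored on two minimized objectives (validation error, complexity); differential-evolution mutation perturbs the continuous weights; and a Pareto filter keeps the non-dominated trade-off front. This is an illustrative sketch, not the paper's EDF-NSDE implementation: the evaluate callable is a placeholder fitness function, and the crowding-distance, crossover, and adaptive-mutation components of the full algorithm are omitted.

import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

MODELS = ["DenseNet121", "ResNet50", "MobileNetV2", "EfficientNet-B0", "MobileNetV3"]
OPERATORS = ["Add", "Max", "Min", "Avg"]

@dataclass
class Candidate:
    models: List[str]            # selected sub-model subset
    operators: List[str]         # one fusion operator per fusion step
    weights: List[List[float]]   # [w_left, w_right] pair per fusion step

def random_candidate(k: int = 3) -> Candidate:
    steps = k - 1
    weights = []
    for _ in range(steps):
        w = random.random()
        weights.append([w, 1.0 - w])
    return Candidate(random.sample(MODELS, k),
                     [random.choice(OPERATORS) for _ in range(steps)],
                     weights)

def de_mutate(target: Candidate, a: Candidate, b: Candidate, f: float = 0.5) -> Candidate:
    """DE-style mutation applied to the continuous fusion weights only."""
    new_weights = []
    for wt, wa, wb in zip(target.weights, a.weights, b.weights):
        w = min(max(wt[0] + f * (wa[0] - wb[0]), 0.0), 1.0)
        new_weights.append([w, 1.0 - w])
    return Candidate(list(target.models), list(target.operators), new_weights)

def dominates(o1: Tuple[float, float], o2: Tuple[float, float]) -> bool:
    """Pareto dominance with both objectives minimized: (validation error, complexity)."""
    return all(x <= y for x, y in zip(o1, o2)) and any(x < y for x, y in zip(o1, o2))

def pareto_front(population: List[Candidate],
                 evaluate: Callable[[Candidate], Tuple[float, float]]) -> List[Candidate]:
    """Keep only the non-dominated candidates under the two objectives."""
    scored = [(c, evaluate(c)) for c in population]
    return [c for c, o in scored if not any(dominates(o_other, o) for _, o_other in scored)]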

5. Conclusions

This work addresses the practical industrial demand for multi-category iron ore image classification under small-sample conditions, targeting limitations such as insufficient expressive power of single-view features and poor adaptability of manual/fixed-rule fusion. We propose an evolutionary deep fusion framework (EDF-NSDE) that jointly optimizes multi-view feature selection, fusion operators, and fusion weights, achieving a balance between classification accuracy and model complexity. By extracting complementary features from multiple deep neural networks and leveraging multi-objective search integrating NSGA-II and differential evolution, the framework adaptively fuses feature representations while suppressing overfitting in small-sample scenarios. Compared to other methods, EDF-NSDE exhibits superior performance and more stable classification consistency across diverse data regimes, thereby advancing smart manufacturing and industrial internet technologies.
To support rigorous evaluation, we constructed a six-category iron ore dataset derived from real open-pit mining environments, supplementing the scarcity of public iron ore data resources. Combined with targeted data augmentation to mitigate limited sample size and class imbalance, this dataset lays a solid foundation for method validation. The experimental results confirm that EDF-NSDE achieves higher classification accuracy than other models on the self-constructed dataset, with more stable performance on confusing ore types; its good generalization on a public mineral dataset further verifies that the multi-view feature and evolutionary fusion strategy effectively improve the overall reliability of iron ore image classification.
Future work will center on three directions oriented toward commercial applications: developing an end-to-end evolutionary fusion pipeline to simplify integration into commercial on-site mineral detection equipment; introducing reinforcement learning-based adaptive multi-objective weighting to meet the customized demands of diverse commercial scenarios (e.g., accuracy-prioritized high-value sorting and latency-optimized real-time monitoring); and extending the framework to multimodal sensor fusion with enriched data sources, aiming to build a versatile commercial tool for mineral exploration, sorting, and quality control across the global mining industry.

Author Contributions

Conceptualization, X.Q. and D.Z.; methodology, D.Z., Y.Q. and S.Z.; software, D.Z.; validation, Y.Z.; formal analysis, D.Z. and Y.Q.; data curation, Y.Z. and C.S.; writing—original draft preparation, C.S. and D.Z.; visualization, D.Z.; supervision, X.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62522309) and the Major Program of the National Natural Science Foundation of China (Grant No. 72192835).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Method framework diagram.
Figure 2. Encoding and decoding.
Figure 3. Flowchart of iron ore dataset construction and comparison of class distribution.
Figure 4. Image examples from iron ore dataset.
Figure 5. EDF-NSDE optimal fusion tree on original dataset: backbones extract features in parallel, followed by sequential element-wise fusion.
Figure 6. EDF-NSDE confusion matrix.
Figure 7. T-SNE visualization of feature separability across different models. (a) AlexNet: dispersed clusters with significant overlaps. (b) DenseNet121: emergence of tighter clusters. (c) EfficientNet: further improved separability. (d) EDF-NSDE (ours): the tightest clusters and best separation.
Figure 8. (a) Loss analysis for EDF-NSDE. (b) Accuracy analysis for EDF-NSDE.
Figure 9. Generalization analysis for different training sample sizes. (a) Test accuracy and F1-score across different training ratios. (b) Number of training samples used for each training ratio.
Table 1. Iron ore dataset sample distribution.
Category | ID | Original | Merged | DA
Hematite | 0 | 454 | 472 | 2832
Ilmenite | 1 | 82 | 114 | 684
Limonite | 2 | 94 | 131 | 786
Magnetite | 3 | 402 | 456 | 2736
Pyrite | 4 | 89 | 122 | 732
Rock | 5 | - | 125 | 750
Total | - | - | 1420 | 8520
Table 2. Public ore dataset sample distribution.
Category | ID | Original | DA
Biotite | 0 | 68 | 408
Bornite | 1 | 171 | 1026
Chrysocolla | 2 | 162 | 972
Malachite | 3 | 235 | 1410
Muscovite | 4 | 77 | 462
Pyrite | 5 | 98 | 588
Quartz | 6 | 142 | 852
Total | - | 953 | 5718
Table 3. Performance comparison between EDF-NSDE and other models.
Methods | Original Data Accuracy (%) | Original Data F1 (%) | Augmented Accuracy (%) | Augmented Precision (%) | Augmented Recall (%) | Augmented F1 (%) | FLOPs (G) | Params (M)
ResNet50 | 79.93 | 79.62 | 87.32 | 86.58 | 86.74 | 86.62 | 8.27 | 26.14
DenseNet121 | 82.75 | 82.72 | 86.97 | 87.05 | 86.61 | 86.77 | 5.80 | 8.27
EfficientNet-B0 | 81.34 | 81.22 | 82.39 | 82.02 | 81.13 | 81.54 | 0.82 | 5.65
MobileNetV2 | 81.69 | 81.53 | 78.17 | 79.30 | 71.90 | 74.00 | 0.64 | 3.87
MobileNetV3 | 79.23 | 79.17 | 83.80 | 83.68 | 82.56 | 83.01 | 0.46 | 4.21
ShuffleNetV2 | 78.17 | 78.40 | 83.10 | 84.30 | 83.05 | 83.58 | 0.31 | 2.57
AlexNet | 77.11 | 76.80 | 77.46 | 77.35 | 75.19 | 76.10 | 1.32 | 2.81
InceptionV3 | 82.10 | 83.06 | 79.93 | 80.79 | 78.54 | 79.12 | 11.52 | 9.29
EDF-NSDE | 84.86 | 84.76 | 88.38 | 87.05 | 88.76 | 87.86 | 11.52 | 9.31
Table 4. Ablation study results.
Methods | Accuracy (%) | F1 (%)
Single-objective optimization | 82.04 | 81.81
Without differential evolution | 81.34 | 81.11
Without generalization penalty | 83.10 | 83.03
Random initialization | 82.12 | 82.95
Fixed mutation | 81.69 | 81.57
EDF-NSDE | 84.86 | 84.76
Table 5. Single operator fusion performance comparison.
Methods | Accuracy (%) | F1 (%)
Add | 81.75 | 80.48
Max | 82.29 | 81.98
Min | 82.92 | 83.17
Avg | 82.89 | 82.86
EDF-NSDE | 84.51 | 84.39
Table 6. Performance comparison between EDF-NSDE and other models on public dataset.
Methods | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
ResNet50 | 88.24 | 88.60 | 85.50 | 85.50
DenseNet121 | 87.70 | 88.89 | 84.53 | 84.92
EfficientNet-B0 | 89.84 | 91.23 | 88.95 | 89.35
MobileNetV2 | 91.98 | 91.71 | 91.05 | 91.03
MobileNetV3 | 87.70 | 87.51 | 86.78 | 86.56
ShuffleNetV2 | 89.30 | 88.65 | 87.54 | 87.54
AlexNet | 85.56 | 83.11 | 83.95 | 83.07
InceptionV3 | 86.10 | 86.48 | 85.22 | 85.43
EDF-NSDE | 92.51 | 92.62 | 91.83 | 92.44
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
