Article

Correlation and Knowledge-Based Joint Feature Selection for Copper Flotation Backbone Process Design

Haipei Dong, Fuli Wang, Dakuo He and Yan Liu
1 College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China
2 College of Information Science and Engineering, Northeastern University, Shenyang 110004, China
* Author to whom correspondence should be addressed.
Minerals 2025, 15(4), 353; https://doi.org/10.3390/min15040353
Submission received: 17 February 2025 / Revised: 21 March 2025 / Accepted: 24 March 2025 / Published: 27 March 2025

Abstract

The intelligentization of flotation design plays a crucial role in enhancing industrial competitiveness and resource efficiency. Our previous work established a mapping relationship between the flotation backbone process graph and label vectors, enabling an intelligent design of the copper flotation backbone process through multilabel classification. Due to the insufficient quantity of training samples in historical databases, traditional feature selection methods perform poorly, owing to insufficient learning. To address the label-specific feature selection problem for this design, this study proposes correlation and knowledge-based joint feature selection (CK-JFS). In this proposed method, label correlations ensure that features specific to strongly related labels are prioritized, while domain knowledge further refines the selection process by applying specialized knowledge to copper flotation. This mode of data and knowledge integration significantly reduces the reliance of label-specific feature selection on the number of training samples. The results demonstrate that CK-JFS achieves significantly higher accuracy and computational efficiency compared to traditional multilabel feature selection algorithms in the context of copper flotation backbone process design.

1. Introduction

Copper flotation plays a vital role in mineral processing and has long been a key focus of academic research [1,2]. Our previous studies [3,4] defined copper flotation backbone process design as training a multilabel classifier on a copper flotation history database such that the classifier can convert the input features of a new copper flotation sample into a predicted label vector, which can then be transformed into a flotation backbone process diagram. The offline modeling and online application process of the intelligent design system is shown in Figure 1. A multilabel classifier usually suffers from “the curse of dimensionality”, which increases computational costs and model complexity [5]. Feature selection is an effective tool for preventing overfitting and improving model accuracy [6,7,8,9,10]. The input features of a new copper flotation sample are obtained by detecting the natural properties of the copper ore; thus, screening out unnecessary features saves the human labor and financial resources consumed in the detection process. This study filters the original features before training the multilabel model, which is highlighted in the blue box in Figure 1.
Previously, multilabel feature selection primarily selected a common feature space shared by all sublabels [11,12,13,14,15], but the discriminative features differ for each label. Therefore, label-specific features have been proposed. Label-specific feature selection refers to learning discriminative features for each label from the original features. Efficient and robust feature selection (ERFS) [16] introduces the L2,1 norm into the loss-based objective function to induce feature sparsity and thereby select label-specific features. Studies [17,18,19] have reported that exploiting label correlation can improve multilabel classifier performance, as in learning label-specific features (LLSF) [20] and joint feature selection and classification (JFSC) [21]. By incorporating a feature-sharing function into the objective function of a linear classifier, LLSF prevents two weakly correlated labels from sharing similar label-specific features. JFSC introduces the Fisher discriminant method on the basis of LLSF to realize a large inter-class distance between the positive and negative examples and a small intra-class distance within the positive and negative examples. However, the label correlation functions of LLSF and JFSC cannot penalize the case in which the label-specific features of two strongly correlated labels are dissimilar. Furthermore, current multilabel feature selection algorithms cannot use domain knowledge. In copper flotation process design, although a certain amount of domain knowledge has been accumulated, it is not exhaustive, and the number of cases in the historical database is limited. The combination of data and knowledge is expected to overcome the difficulty of “insufficient learning” and significantly improve the effect of label-specific feature selection.
In response to the deficiencies of the existing methods described above, the novel contributions of this paper are as follows: (1) the proposed CK-JFS utilizes the label correlation of training set samples through a correlation-based function, which renders the label-specific features of two strongly correlated labels similar; (2) the proposed CK-JFS fully utilizes domain knowledge. The knowledge-based function makes the label-specific features conform to the domain knowledge.
The remainder of this paper is organized as follows: In Section 2, we introduce the application background and theoretical basis of the proposed CK-JFS. In Section 3, we elaborate on the objective function and optimization method of the proposed CK-JFS. In Section 4, empirical evaluations are presented to demonstrate the superiority of the proposed CK-JFS by comparing it with benchmark multilabel feature selection algorithms. Finally, the conclusions are presented in Section 5.

2. Preliminaries

2.1. Notations

The original features are the detected natural properties of the copper ores in the historical copper flotation samples. Suppose that N samples and M original features are present. Denote the feature matrix as X = [x_1:, x_2:, …, x_N:]^T ∈ R^(N×M), where x_n: = [X_n1, X_n2, …, X_nM] ∈ R^(1×M) is the feature vector of sample n. Treating the design of the copper flotation backbone process as a multilabel problem, each copper flotation sample can be assigned multiple sublabels simultaneously [22,23]. Let Y = [y_1:, y_2:, …, y_N:]^T ∈ {−1,1}^(N×K) be the label matrix, where K is the number of labels and y_n: = [Y_n1, Y_n2, …, Y_nK] ∈ {−1,1}^(1×K) is the label vector of sample n. Y_nk is the value of sample n at label k: if sample n is positive at label k, Y_nk = 1; otherwise, Y_nk = −1. The purpose of the proposed CK-JFS is to optimize the coefficient matrix W = [w_:1, w_:2, …, w_:K] ∈ R^(M×K), where w_:k = [W_1k, W_2k, …, W_Mk]^T ∈ R^(M×1) is the coefficient vector of label k and W_mk is the coefficient of feature m with respect to label k. A larger absolute value of W_mk means that feature m is more important for label k.
The historical database of N = 228 copper flotation samples was provided by a certain research institute. The samples have M = 85 original features and K = 15 sublabels.
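To make the notation concrete, the following minimal Python/NumPy sketch sets up arrays with the shapes described above; the random values are placeholders, since the historical database itself is not publicly available.

```python
import numpy as np

N, M, K = 228, 85, 15                        # samples, original features, sublabels

rng = np.random.default_rng(0)
X = rng.normal(size=(N, M))                  # feature matrix; row n is sample n
Y = rng.choice([-1, 1], size=(N, K))         # label matrix in {-1, +1}
W = np.zeros((M, K))                         # coefficient matrix to be learned

# |W[m, k]| measures how important feature m is for label k
importance = np.abs(W)
```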

2.2. Sublabels

Three tasks must be completed in the design: the product scheme, the flotation scheme, and the grinding scheme. The product scheme includes four sublabels, which indicate the presence of molybdenum, lead, zinc, and sulfur concentrates in the product. The flotation scheme includes six sublabels, which indicate whether copper requires asynchronous flotation and whether molybdenum, lead, zinc, sulfur, and copper require mixed flotation. The grinding scheme includes five sublabels, which indicate whether the number of grinding stages of molybdenum exceeds two and whether the numbers of grinding stages of lead, zinc, sulfur, and copper exceed one. As mentioned above, the three tasks contain 15 sublabels [4]. Because each sublabel is a binary classification problem, samples that do not belong to the positive class of a sublabel are treated as its negative class.

2.3. Original Features

The eighty-five original features are listed in Table 1. These features include the chemical composition, phase analysis, ore content, gangue content, symbiosis relationship, and particle size. Among these, the symbiosis relationship is not numerical and must be converted into a numerical feature; thus, we used 0 to denote symbiosis between the ore and gangue and 1 to denote symbiosis between the ore and other ores.

3. Proposed Approach

3.1. Objective Function

To exploit label correlation and domain knowledge, we optimize W such that the objective function shown in Equation (1) is minimized, where λ1, λ2, λ3, and λ4 are hyperparameters. The larger |W_mk| is, the more important feature m is to label k.
$$\mathrm{Obj}(W) = \frac{1}{2} L(W) + \lambda_{1} \left\| W \right\|_{1} + \frac{\lambda_{2}}{2} F(W) + \frac{\lambda_{3}}{4} \Psi(W) + \lambda_{4} \Phi(W) \qquad (1)$$

3.1.1. Non-Unique Terms

The first three terms in Equation (1) are consistent with JFSC. The least-squares loss is widely used as a loss function and can be written as L(W) = ‖XW − Y‖_F², where ‖·‖_F denotes the Frobenius norm of a matrix. The term ‖W‖_1 models the sparsity of the label-specific features, where ‖·‖_1 denotes the L1 norm. The Fisher discriminant F(W) is used to maximize the inter-class distance and minimize the intra-class distances. Suppose that P_k = {x_n: | Y_nk = 1} is the positive set and N_k = {x_n: | Y_nk = −1} is the negative set of label k. x̄_k^+ = [x̄_k1^+, x̄_k2^+, …, x̄_kM^+] and x̄_k^− = [x̄_k1^−, x̄_k2^−, …, x̄_kM^−] are the mean vectors of the positive and negative sets in the original feature space, where x̄_km^+ = (1/|P_k|) Σ_{x_n: ∈ P_k} X_nm and x̄_km^− = (1/|N_k|) Σ_{x_n: ∈ N_k} X_nm. S_k^{+2} = w_:k^T diag(s_k^+) w_:k and S_k^{−2} = w_:k^T diag(s_k^−) w_:k are the intra-class distances of the positive and negative samples, respectively, where s_k^+ and s_k^− are calculated via Equations (2) and (3). S_k^2 = w_:k^T diag(s_k) w_:k is the inter-class distance between the positive and negative samples, where s_k is calculated using Equation (4).
$$s_{k}^{+} = \sum_{x_{n:} \in P_{k}} \left( x_{n:} - \bar{x}_{k}^{+} \right) \odot \left( x_{n:} - \bar{x}_{k}^{+} \right) \qquad (2)$$
$$s_{k}^{-} = \sum_{x_{n:} \in N_{k}} \left( x_{n:} - \bar{x}_{k}^{-} \right) \odot \left( x_{n:} - \bar{x}_{k}^{-} \right) \qquad (3)$$
$$s_{k} = \left( \bar{x}_{k}^{+} - \bar{x}_{k}^{-} \right) \odot \left( \bar{x}_{k}^{+} - \bar{x}_{k}^{-} \right) \qquad (4)$$
where ⊙ denotes the element-wise (Hadamard) product.
The derivative of L(W) with respect to W can be calculated using Equation (5), and the derivative of F(W) with respect to W can be calculated using Equation (6), where C ∈ R M×K with column vector c:k is defined in Equation (7).
$$\frac{\partial L(W)}{\partial W} = 2 \left( X^{\mathrm{T}} X W - X^{\mathrm{T}} Y \right) \qquad (5)$$
$$\frac{\partial F(W)}{\partial W} = 2\, C \odot W \qquad (6)$$
$$c_{:k} = \frac{ s_{k}^{+} + r_{k}\, s_{k}^{-} }{ \left| P_{k} \right|\, s_{k} } \qquad (7)$$
with the division taken element-wise.
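For illustration, the sketch below computes the per-label scatter vectors of Equations (2)–(4) and the loss gradient of Equation (5) with NumPy; it assumes the shapes of Section 2.1 and non-empty positive and negative sets, and the function names are my own.

```python
import numpy as np

def fisher_vectors(X, Y, k):
    """Scatter vectors of Eqs. (2)-(4) for label k (each of length M)."""
    pos = X[Y[:, k] == 1]                          # P_k: positive samples of label k
    neg = X[Y[:, k] == -1]                         # N_k: negative samples of label k
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)
    s_plus = ((pos - mean_pos) ** 2).sum(axis=0)   # Eq. (2): element-wise squares, summed
    s_minus = ((neg - mean_neg) ** 2).sum(axis=0)  # Eq. (3)
    s_between = (mean_pos - mean_neg) ** 2         # Eq. (4)
    return s_plus, s_minus, s_between

def grad_L(X, Y, W):
    """Derivative of the least-squares loss, Eq. (5)."""
    return 2.0 * (X.T @ X @ W - X.T @ Y)
```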

3.1.2. Unique Terms

The last two terms of Equation (1) are unique to the proposed CK-JFS. For machine learning with small samples, weak label correlations often originate from noise and are not worth exploiting. The correlation-based function Ψ(W) is therefore used to replace the correlation-based function in JFSC: the proposed Ψ(W) exploits strong correlations, whereas the latter relies on weak correlations. In addition, the knowledge-based function Φ(W) is introduced into the objective function to fully utilize the domain knowledge.

3.2. Correlation-Based Function

To render the prediction of two strongly positively correlated labels consistent to the extent possible, and the prediction of two strongly negatively correlated labels opposite to the extent possible, the correlation-based function is introduced.
To filter out weak label correlations, a correlation threshold θ is employed as a hyperparameter. Set the prediction matrix F = XW and a matrix A^(1) ∈ R^(K×K) with element A^(1)_kk' defined as in Equation (8), where Cor(k, k') is the Pearson correlation coefficient between labels k and k', as shown in Equation (9).
$$A^{(1)}_{kk'} = \begin{cases} \operatorname{Cor}(k,k'), & k \neq k' \ \text{and} \ \operatorname{Cor}(k,k') > \theta \\ -\operatorname{Cor}(k,k'), & k \neq k' \ \text{and} \ \operatorname{Cor}(k,k') < -\theta \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
$$\operatorname{Cor}(k,k') = \operatorname{Cor}(k',k) = \frac{\sum_{n=1}^{N} (Y_{nk} - \bar{y}_{:k})(Y_{nk'} - \bar{y}_{:k'})}{\sqrt{\sum_{n=1}^{N} (Y_{nk} - \bar{y}_{:k})^{2}} \sqrt{\sum_{n=1}^{N} (Y_{nk'} - \bar{y}_{:k'})^{2}}} \qquad (9)$$
When Cor(k, k') > θ, labels k and k' are regarded as positively correlated, their corresponding prediction vectors f_:k and f_:k' should be similar, and the value of F_nk − F_nk' is expected to be close to 0; when Cor(k, k') < −θ, labels k and k' are regarded as negatively correlated, their corresponding prediction vectors f_:k and f_:k' should be opposite, and the value of F_nk + F_nk' is expected to be close to 0. Therefore, the correlation-based function Ψ(W) can be formulated as Equations (10)–(12), where the matrix A^(2) ∈ R^(K×K) with element A^(2)_kk' is defined via Equation (12).
$$\Psi(W) = \sum_{k,k'=1}^{K} \psi_{kk'}(W) \qquad (10)$$
$$\psi_{kk'}(W) = A^{(1)}_{kk'} \left\| f_{:k} + A^{(2)}_{kk'} f_{:k'} \right\|_{2}^{2} \qquad (11)$$
$$A^{(2)}_{kk'} = \begin{cases} -1, & \operatorname{Cor}(k,k') \geq \theta \\ 1, & \text{otherwise} \end{cases} \qquad (12)$$
The correlation-based function can be rewritten as Equation (13), where the diagonal matrix D^(A1) ∈ R^(K×K) with element D^(A1)_kk is defined in Equation (14), and A^(3) = A^(1) ⊙ A^(2) is the element-wise product of A^(1) and A^(2).
$$\begin{aligned}
\Psi(W) &= \sum_{k,k'=1}^{K} \psi_{kk'}(W) = \sum_{k,k'=1}^{K} A^{(1)}_{kk'} \left\| f_{:k} + A^{(2)}_{kk'} f_{:k'} \right\|_{2}^{2} \\
&= \sum_{k,k'=1}^{K} A^{(1)}_{kk'} \left( f_{:k} + A^{(2)}_{kk'} f_{:k'} \right)^{\mathrm{T}} \left( f_{:k} + A^{(2)}_{kk'} f_{:k'} \right) \\
&= \sum_{k,k'=1}^{K} A^{(1)}_{kk'} \left( f_{:k}^{\mathrm{T}} f_{:k} + A^{(2)}_{kk'} f_{:k}^{\mathrm{T}} f_{:k'} + A^{(2)}_{kk'} f_{:k'}^{\mathrm{T}} f_{:k} + \big( A^{(2)}_{kk'} \big)^{2} f_{:k'}^{\mathrm{T}} f_{:k'} \right) \\
&= 2 \left( \sum_{k=1}^{K} f_{:k}^{\mathrm{T}} f_{:k} D^{(A1)}_{kk} + \sum_{k,k'=1}^{K} A^{(3)}_{kk'} f_{:k}^{\mathrm{T}} f_{:k'} \right) \\
&= 2 \left( \operatorname{Tr}\!\big( F^{\mathrm{T}} F D^{(A1)} \big) + \operatorname{Tr}\!\big( F^{\mathrm{T}} F A^{(3)} \big) \right)
\end{aligned} \qquad (13)$$
$$D^{(A1)}_{kk} = \sum_{k'=1}^{K} A^{(1)}_{kk'} \qquad (14)$$
The derivative of the correlation-based function with respect to the coefficient matrix is calculated according to Equation (15).
$$\frac{\partial \Psi(W)}{\partial W} = 2\left( \frac{\partial \operatorname{Tr}\!\big(F^{\mathrm{T}} F D^{(A1)}\big)}{\partial W} + \frac{\partial \operatorname{Tr}\!\big(F^{\mathrm{T}} F A^{(3)}\big)}{\partial W} \right) = 2\left( 2 X^{\mathrm{T}} X W D^{(A1)} + 2 X^{\mathrm{T}} X W A^{(3)} \right) = 4 X^{\mathrm{T}} X W \big( D^{(A1)} + A^{(3)} \big) \qquad (15)$$
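One possible NumPy rendering of Equations (8), (9), (12), (14), and (15) is sketched below: it builds A^(1), A^(2), A^(3), and D^(A1) from the label matrix Y and the threshold θ and then forms the gradient of the correlation-based function. Labels with zero variance would make the Pearson coefficient undefined and are not handled here.

```python
import numpy as np

def correlation_matrices(Y, theta):
    K = Y.shape[1]
    Cor = np.corrcoef(Y, rowvar=False)           # Pearson correlations, Eq. (9)
    A2 = np.where(Cor >= theta, -1.0, 1.0)       # Eq. (12)
    A1 = np.zeros((K, K))
    for k in range(K):
        for kk in range(K):
            if k == kk:
                continue
            if Cor[k, kk] > theta:               # strongly positively correlated pair
                A1[k, kk] = Cor[k, kk]
            elif Cor[k, kk] < -theta:            # strongly negatively correlated pair
                A1[k, kk] = -Cor[k, kk]          # Eq. (8)
    A3 = A1 * A2                                 # element-wise product
    D_A1 = np.diag(A1.sum(axis=1))               # Eq. (14)
    return A1, A2, A3, D_A1

def grad_psi(X, W, A3, D_A1):
    """Gradient of the correlation-based function, Eq. (15)."""
    return 4.0 * X.T @ X @ W @ (D_A1 + A3)
```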

3.3. Knowledge-Based Function

Some domain knowledge of copper flotation has been accumulated. For example, if the zinc concentrate is subjected to bulk flotation, the sulfur concentrate is also subjected to bulk flotation. This rule can be written with k = 9 and k' = 8: if Y_n8 = 1, then F_n9 is expected to be close to 1.
The knowledge-based function Φ(W) can be formulated as Equations (16) and (17). ω_nkk' indicates whether the condition for the impact of label k' on label k is satisfied for sample n: if the condition is satisfied, ω_nkk' = 1; otherwise, ω_nkk' = 0. ν_nkk'(W) measures the gap between the predicted value and the domain knowledge.
$$\Phi(W) = \sum_{n=1}^{N} \sum_{k,k'=1}^{K} \phi_{nkk'}(W) \qquad (16)$$
$$\phi_{nkk'}(W) = \omega_{nkk'}\, \nu_{nkk'}(W) \qquad (17)$$
ω_nkk' is defined in Equations (18) and (19). The knowledge-based impacts can be divided into the following two situations: (a) the impact is triggered when label k' is a positive class, i.e., ω_nkk' = 1 when Y_nk' = 1 and ω_nkk' = 0 otherwise; (b) the impact is triggered when label k' is a negative class, i.e., ω_nkk' = 1 when Y_nk' = −1 and ω_nkk' = 0 otherwise. Accordingly, we set B^(1)_kk' = 1 in situation (a) and B^(1)_kk' = −1 in situation (b). In the example above, B^(1)_98 = 1 and ω_n98 = (1 + Y_n8)/2. Notably, B^(1) ∈ R^(K×K) with element B^(1)_kk' is defined in Equation (19).
$$\omega_{nkk'} = \frac{1}{2}\left( 1 + B^{(1)}_{kk'} Y_{nk'} \right) \qquad (18)$$
$$B^{(1)}_{kk'} = \begin{cases} 1, & \text{situation (a)} \\ -1, & \text{situation (b)} \end{cases} \qquad (19)$$
ν_nkk'(W) is defined by Equations (20) and (21). If the impact renders label k positive, we set B^(2)_kk' = −1 so that F_nk is pushed toward 1; if the impact renders label k negative, we set B^(2)_kk' = 1 so that F_nk is pushed toward −1. In the example, B^(2)_98 = −1 and ν_n98(W) = (1 − F_n9)². Notably, B^(2) ∈ R^(K×K) with element B^(2)_kk' is defined in Equation (21).
$$\nu_{nkk'}(W) = \left( 1 + B^{(2)}_{kk'} F_{nk} \right)^{2} \qquad (20)$$
$$B^{(2)}_{kk'} = \begin{cases} -1, & \text{the impact makes label } k \text{ positive} \\ 1, & \text{otherwise} \end{cases} \qquad (21)$$
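As an illustration, the sketch below encodes the single zinc/sulfur rule from the text with Equations (18)–(21); filling the remaining entries of B^(1) and B^(2) with zeros (so that pairs without a rule contribute only a constant) is my own convention, and the rule list is a placeholder rather than the paper's full knowledge base.

```python
import numpy as np

K = 15
B1 = np.zeros((K, K))              # condition sign of each rule, Eq. (19); 0 = no rule
B2 = np.zeros((K, K))              # impact sign of each rule, Eq. (21); 0 = no rule

k, kp = 9 - 1, 8 - 1               # example rule: if label 8 is positive, label 9 is positive
B1[k, kp] = 1.0                    # situation (a): the rule fires when Y_{nk'} = +1
B2[k, kp] = -1.0                   # the impact makes label k positive, pushing F_{nk} toward +1
B3 = B1 * B2                       # element-wise product

def omega(Y, n, k, kp):
    """Eq. (18): 1 if the rule's condition holds for sample n, 0 if it does not."""
    return 0.5 * (1.0 + B1[k, kp] * Y[n, kp])

def nu(F, n, k, kp):
    """Eq. (20): squared gap between the prediction F_{nk} and the rule's target."""
    return (1.0 + B2[k, kp] * F[n, k]) ** 2
```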
The knowledge-based function can be expressed as Equation (22), where B^(3) = B^(1) ⊙ B^(2) is the element-wise product of B^(1) and B^(2).
$$\begin{aligned}
\Phi(W) &= \frac{1}{2} \sum_{k,k'=1}^{K} \left( \mathbf{1}_{N \times 1} + B^{(1)}_{kk'} y_{:k'} \right)^{\mathrm{T}} \left( \left( \mathbf{1}_{N \times 1} + B^{(2)}_{kk'} f_{:k} \right) \odot \left( \mathbf{1}_{N \times 1} + B^{(2)}_{kk'} f_{:k} \right) \right) \\
&= \frac{1}{2} \sum_{k,k'=1}^{K} \left( \mathbf{1}_{N \times 1} + B^{(1)}_{kk'} y_{:k'} \right)^{\mathrm{T}} \left( \mathbf{1}_{N \times 1} + 2 B^{(2)}_{kk'} f_{:k} + f_{:k} \odot f_{:k} \right) \\
&= \frac{1}{2} \sum_{k,k'=1}^{K} \Big( N + 2 B^{(2)}_{kk'} f_{:k}^{\mathrm{T}} \mathbf{1}_{N \times 1} + \mathbf{1}_{1 \times N} \left( f_{:k} \odot f_{:k} \right) + B^{(1)}_{kk'} y_{:k'}^{\mathrm{T}} \mathbf{1}_{N \times 1} + 2 B^{(3)}_{kk'} f_{:k}^{\mathrm{T}} y_{:k'} + B^{(1)}_{kk'} y_{:k'}^{\mathrm{T}} \left( f_{:k} \odot f_{:k} \right) \Big) \\
&= \frac{1}{2} \Big( N K^{2} + \operatorname{Tr}\!\big( Y^{\mathrm{T}} \mathbf{1}_{N \times K} B^{(1)\mathrm{T}} \big) + 2 \operatorname{Tr}\!\big( F^{\mathrm{T}} \mathbf{1}_{N \times K} B^{(2)\mathrm{T}} \big) + K \operatorname{Tr}\!\big( (F \odot F)\, \mathbf{1}_{K \times N} \big) + 2 \operatorname{Tr}\!\big( F^{\mathrm{T}} Y B^{(3)\mathrm{T}} \big) + \operatorname{Tr}\!\big( (F \odot F) B^{(1)} Y^{\mathrm{T}} \big) \Big)
\end{aligned} \qquad (22)$$
Set G = 1_{K×N}(F ⊙ F) ∈ R^(K×K); its diagonal element G_kk is calculated using Equation (23), and the element of ∂Tr(G)/∂W ∈ R^(M×K) is calculated using Equation (24). Set a matrix H^(1) ∈ R^(M×K) whose row vector h^(1)_m: ∈ R^(1×K) is defined by Equation (25). Thus, the derivative of the term K·Tr((F ⊙ F)1_{K×N}) in Equation (22) with respect to the coefficient matrix is obtained using Equation (26).
$$G_{kk} = \sum_{n=1}^{N} F_{nk}^{2} \qquad (23)$$
$$\left( \frac{\partial \operatorname{Tr}(G)}{\partial W} \right)_{mk} = \frac{\partial \sum_{k'=1}^{K} G_{k'k'}}{\partial W_{mk}} = \frac{\partial \sum_{n=1}^{N} X_{nm}^{2} W_{mk}^{2}}{\partial W_{mk}} = 2 \sum_{n=1}^{N} X_{nm}^{2} W_{mk} \qquad (24)$$
$$h^{(1)}_{m:} = \left( \sum_{n=1}^{N} X_{nm}^{2} \right) \mathbf{1}_{1 \times K} \qquad (25)$$
$$K \frac{\partial \operatorname{Tr}(G)}{\partial W} = 2 K\, H^{(1)} \odot W \qquad (26)$$
Set J = B^(1) Y^T (F ⊙ F) ∈ R^(K×K); its diagonal element J_kk is calculated using Equation (27), and the element of ∂Tr(J)/∂W ∈ R^(M×K) is calculated as Equation (28). Set a matrix H^(2) ∈ R^(M×K) whose element is defined in Equation (29). Thus, the derivative of the term Tr((F ⊙ F)B^(1)Y^T) in Equation (22) with respect to the coefficient matrix is obtained using Equation (30).
$$J_{kk} = \sum_{n=1}^{N} \sum_{k'=1}^{K} B^{(1)}_{kk'} Y_{nk'} F_{nk}^{2} \qquad (27)$$
$$\left( \frac{\partial \operatorname{Tr}(J)}{\partial W} \right)_{mk} = \frac{\partial \sum_{k''=1}^{K} J_{k''k''}}{\partial W_{mk}} = \frac{\partial \sum_{n=1}^{N} \sum_{k'=1}^{K} B^{(1)}_{kk'} Y_{nk'} X_{nm}^{2} W_{mk}^{2}}{\partial W_{mk}} = 2 \sum_{n=1}^{N} \sum_{k'=1}^{K} B^{(1)}_{kk'} Y_{nk'} X_{nm}^{2} W_{mk} \qquad (28)$$
$$H^{(2)}_{mk} = \sum_{n=1}^{N} \sum_{k'=1}^{K} B^{(1)}_{kk'} Y_{nk'} X_{nm}^{2} \qquad (29)$$
$$\frac{\partial \operatorname{Tr}(J)}{\partial W} = 2\, H^{(2)} \odot W \qquad (30)$$
The derivative of the knowledge-based function with respect to the coefficient matrix is obtained using Equation (31), with ∂Φ(W)/∂W ∈ R^(M×K).
$$\frac{\partial \Phi(W)}{\partial W} = \frac{\partial \operatorname{Tr}\!\big(F^{\mathrm{T}} \mathbf{1}_{N\times K} B^{(2)\mathrm{T}}\big)}{\partial W} + \frac{\partial \operatorname{Tr}\!\big(F^{\mathrm{T}} Y B^{(3)\mathrm{T}}\big)}{\partial W} + \frac{1}{2}\left( K \frac{\partial \operatorname{Tr}(G)}{\partial W} + \frac{\partial \operatorname{Tr}(J)}{\partial W} \right) = X^{\mathrm{T}} \mathbf{1}_{N\times K} B^{(2)\mathrm{T}} + X^{\mathrm{T}} Y B^{(3)\mathrm{T}} + \big( K H^{(1)} + H^{(2)} \big) \odot W \qquad (31)$$
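The following sketch assembles H^(1), H^(2), and the gradient of Equation (31) with NumPy, taking the rule matrices B1, B2, and B3 from the previous sketch as inputs.

```python
import numpy as np

def grad_phi(X, Y, W, B1, B2, B3):
    """Gradient of the knowledge-based function, Eq. (31)."""
    N, K = Y.shape
    col_sq = (X ** 2).sum(axis=0)                # sum_n X_{nm}^2, shape (M,)
    H1 = np.outer(col_sq, np.ones(K))            # Eq. (25): every column equals col_sq
    H2 = (X ** 2).T @ Y @ B1.T                   # Eq. (29)
    ones_NK = np.ones((N, K))
    return (X.T @ ones_NK @ B2.T                 # derivative of Tr(F^T 1 B2^T)
            + X.T @ Y @ B3.T                     # derivative of Tr(F^T Y B3^T)
            + (K * H1 + H2) * W)                 # (K H1 + H2) ∘ W
```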

3.4. Optimization via Accelerated Proximal Gradient

The objective function is convex but nonsmooth because of the L1 norm. We address this issue using the accelerated proximal gradient method [24], which can be expressed using Equation (32), where g(W) is a smooth part, and h(W) is a non-smooth part. g(W) and h(W) can be defined as Equations (33) and (34), respectively.
$$\min_{W} \left\{ \mathrm{Obj} = g(W) + h(W) \right\} \qquad (32)$$
$$g(W) = \frac{1}{2} L(W) + \frac{\lambda_{2}}{2} F(W) + \frac{\lambda_{3}}{4} \Psi(W) + \lambda_{4} \Phi(W) \qquad (33)$$
$$h(W) = \lambda_{1} \left\| W \right\|_{1} \qquad (34)$$
The derivative of the smooth part with respect to the coefficient matrix is obtained using Equation (35), with ∇g(W) ∈ R^(M×K).
$$\nabla g(W) = \frac{1}{2} \frac{\partial L(W)}{\partial W} + \frac{\lambda_{2}}{2} \frac{\partial F(W)}{\partial W} + \frac{\lambda_{3}}{4} \frac{\partial \Psi(W)}{\partial W} + \lambda_{4} \frac{\partial \Phi(W)}{\partial W} = X^{\mathrm{T}} X W - X^{\mathrm{T}} Y + \lambda_{2}\, C \odot W + \lambda_{3}\, X^{\mathrm{T}} X W \big( D^{(A1)} + A^{(3)} \big) + \lambda_{4} \Big( X^{\mathrm{T}} \mathbf{1}_{N \times K} B^{(2)\mathrm{T}} + X^{\mathrm{T}} Y B^{(3)\mathrm{T}} + \big( K H^{(1)} + H^{(2)} \big) \odot W \Big) \qquad (35)$$
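Putting the pieces together, a sketch of the smooth gradient of Equation (35) could look as follows; the Fisher matrix C of Equation (7) is taken as a precomputed input, and grad_phi is the function from the previous sketch.

```python
def grad_g(X, Y, W, C, A3, D_A1, B1, B2, B3, lam2, lam3, lam4):
    """Gradient of the smooth part g(W), Eq. (35)."""
    return (X.T @ X @ W - X.T @ Y                    # from (1/2) L(W)
            + lam2 * C * W                           # from (lam2/2) F(W): C ∘ W
            + lam3 * X.T @ X @ W @ (D_A1 + A3)       # from (lam3/4) Psi(W)
            + lam4 * grad_phi(X, Y, W, B1, B2, B3))  # from lam4 * Phi(W)
```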
Furthermore, the Lipschitz continuity of ∇g is established via Equation (36), where ΔW = W1 − W2.
$$\begin{aligned}
\left\| \nabla g(W_{1}) - \nabla g(W_{2}) \right\|_{F}^{2} &= \left\| X^{\mathrm{T}} X \Delta W + \lambda_{2}\, C \odot \Delta W + \lambda_{3}\, X^{\mathrm{T}} X \Delta W \big( D^{(A1)} + A^{(3)} \big) + \lambda_{4} \big( K H^{(1)} + H^{(2)} \big) \odot \Delta W \right\|_{F}^{2} \\
&\leq \left\| \Delta W \right\|_{F}^{2} \Big( \left\| X^{\mathrm{T}} X \right\|_{F}^{2} + \lambda_{2} \max\{ C_{mk}^{2} \} + \lambda_{3} \left\| X^{\mathrm{T}} X \right\|_{F}^{2} \left\| D^{(A1)} + A^{(3)} \right\|_{F}^{2} + \lambda_{4} \big( \max\{ (K H^{(1)}_{mk})^{2} \} + \max\{ (H^{(2)}_{mk})^{2} \} \big) \Big)
\end{aligned} \qquad (36)$$
Therefore, the Lipschitz constant Lf is obtained via Equation (37).
$$L_{f} = \left\| X^{\mathrm{T}} X \right\|_{F}^{2} + \lambda_{2} \max\{ C_{mk}^{2} \} + \lambda_{3} \left\| X^{\mathrm{T}} X \right\|_{F}^{2} \left\| D^{(A1)} + A^{(3)} \right\|_{F}^{2} + \lambda_{4} \left( \max\{ (K H^{(1)}_{mk})^{2} \} + \max\{ (H^{(2)}_{mk})^{2} \} \right) \qquad (37)$$
In the accelerated proximal gradient method, the iterations proceed as shown in Equations (38) and (39). The proximal operator associated with h(W) = λ1‖W‖1 is the soft-thresholding operator expressed in Equation (40).
$$W^{(t)} = W_{t} + \frac{b_{t-1} - 1}{b_{t}} \left( W_{t} - W_{t-1} \right) \qquad (38)$$
$$W_{t+1} = \operatorname{prox}_{L_{f}, h}\!\left( W^{(t)} - \frac{1}{L_{f}} \nabla g\big(W^{(t)}\big) \right) \qquad (39)$$
$$\left( \operatorname{prox}_{L_{f}, h}(W) \right)_{mk} = \begin{cases} W_{mk} - \dfrac{\lambda_{1}}{L_{f}}, & W_{mk} > \dfrac{\lambda_{1}}{L_{f}} \\ 0, & \left| W_{mk} \right| \leq \dfrac{\lambda_{1}}{L_{f}} \\ W_{mk} + \dfrac{\lambda_{1}}{L_{f}}, & W_{mk} < -\dfrac{\lambda_{1}}{L_{f}} \end{cases} \qquad (40)$$
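Equation (40) is the standard element-wise soft-thresholding operator; a compact NumPy version is shown below.

```python
import numpy as np

def prox_l1(W, tau):
    """Soft-thresholding of Eq. (40): proximal operator of tau * ||W||_1, with tau = lambda_1 / L_f."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)
```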
Algorithm 1 provides the optimization process of the proposed CK-JFS.
Algorithm 1 Optimization process of CK-JFS
Input: training set {X, Y}; hyperparameters λ1, λ2, λ3, λ4, θ, η.
Initialization:
t ← 1
b_0 ← 0, b_1 ← 1
W_0, W_1 ← (X^T X + ηI)^(−1) X^T Y
calculate the matrix C as Equation (7)
calculate the diagonal matrix A(1) as Equation (8)
calculate the diagonal matrix A(2) as Equation (12)
calculate the diagonal matrix D(A1) as Equation (14)
calculate the matrix B(1) as Equation (19)
calculate the matrix B(2) as Equation (21)
calculate the gradient ∇g(W) as Equation (35)
repeat
        update W^(t) and W_{t+1} as Equations (38) and (39)
         b_{t+1} ← (1 + √(4b_t² + 1))/2
         t ← t + 1
until stop criterion reached
W ← W_t
Output: W
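The compact sketch below mirrors Algorithm 1, reusing grad_g and prox_l1 from the earlier sketches. The precomputed matrices, the Lipschitz constant Lf, and the hyperparameters are assumed to be available, and a fixed iteration budget stands in for the unspecified stopping criterion.

```python
import numpy as np

def ck_jfs(X, Y, C, A3, D_A1, B1, B2, B3,
           lam1, lam2, lam3, lam4, eta, Lf, max_iter=200):
    M = X.shape[1]
    W_prev = W_cur = np.linalg.solve(X.T @ X + eta * np.eye(M), X.T @ Y)  # initialization
    b_prev, b_cur = 0.0, 1.0
    for _ in range(max_iter):
        V = W_cur + (b_prev - 1.0) / b_cur * (W_cur - W_prev)             # Eq. (38)
        G = grad_g(X, Y, V, C, A3, D_A1, B1, B2, B3, lam2, lam3, lam4)
        W_next = prox_l1(V - G / Lf, lam1 / Lf)                           # Eqs. (39) and (40)
        b_prev, b_cur = b_cur, (1.0 + np.sqrt(4.0 * b_cur ** 2 + 1.0)) / 2.0
        W_prev, W_cur = W_cur, W_next
    return W_cur
```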
In addition to being a multilabel feature selection algorithm, CK-JFS can be used as a multilabel classifier, because the trained coefficient matrix W can be used directly for multilabel classification. For a new copper flotation sample i, if F_ik = x_i: w_:k exceeds the classification threshold of label k, the predicted value Pre_ik = 1; otherwise, Pre_ik = −1.

4. Experiments

Owing to the small number of samples in the copper flotation database, leave-one-out cross-validation was adopted: each sample in the database was used as the only test sample in turn, and the remaining samples were used as the training set. The number of train–test splits was therefore equal to the number of samples in the database. Because the knowledge-based function was constructed from the domain knowledge of copper flotation, the experiments used only the copper flotation database.
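A leave-one-out loop of this kind can be sketched with scikit-learn's LeaveOneOut splitter; train_model and predict below are placeholders standing in for Algorithm 1 and the thresholding step.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def loocv_predictions(X, Y, train_model, predict):
    preds = np.zeros(Y.shape, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        W = train_model(X[train_idx], Y[train_idx])   # fit on all samples but one
        preds[test_idx] = predict(X[test_idx], W)     # score the single held-out sample
    return preds
```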

4.1. Label-Specific Feature Selection

In this section, the area under the curve (AUC), i.e., the area under the receiver operating characteristic (ROC) curve, was employed as the label-specific feature selection metric. The ROC curve presents the relationship between the true positive rate and the false positive rate under different classification thresholds. A large AUC implies high classifier performance.
The label-specific features of each label were selected in parallel using Algorithm 1. For label k, |W_mk| represents the importance of feature m to label k. Hence, all elements in the coefficient vector w_:k are sorted according to their absolute values, and the features with higher importance are preferred. As the number of selected features increases, the AUC first increases and then decreases, because too few features provide insufficient useful information, whereas too many features introduce redundancy. The feature number with the maximum AUC is considered the optimal feature number of label k, and the corresponding features are taken as the label-specific features of label k. The results for different feature numbers of each label are shown in Figure 2. The selection results of the label-specific features are presented in Table 2, where the numbers in the third column are the serial numbers m of the features, and features listed earlier are more important to the label.
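The per-label search over feature numbers described above can be sketched as follows; loocv_scores is a placeholder for a routine that returns leave-one-out prediction scores for label k on the chosen feature subset.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def select_label_specific_features(X, y_k, w_k, loocv_scores):
    """Rank features by |w_k| and keep the prefix size with the best AUC."""
    order = np.argsort(-np.abs(w_k))            # most important features first
    best_auc, best_m = -np.inf, 1
    for m in range(1, len(order) + 1):
        scores = loocv_scores(X[:, order[:m]], y_k)
        auc = roc_auc_score(y_k, scores)        # y_k in {-1, +1} is accepted by sklearn
        if auc > best_auc:
            best_auc, best_m = auc, m
    return order[:best_m], best_auc
```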
As presented in Table 2, the union of the label-specific features is [1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 36, 37, 42, 43, 45, 46, 55, 56, 57, 59, 60, 63, 66, 69, 72, 74, 75, 77, 78, 81]. The number of union features is 44, 41 less than the number of original features. For a new copper flotation sample, only 44 natural properties must be detected and input into the trained multilabel classifier, reducing manpower and saving financial resources consumed in the detection.

4.2. Benchmark Multilabel Feature Selection Algorithm

To evaluate the feature selection effect of the proposed CK-JFS on the copper flotation backbone flowchart recommendation, we compared it with the following multilabel feature selection algorithms. Except for the F-score, the others are ablation experiments of CK-JFS.
AllFea: the label-specific features of each label are original features without any selection.
F-score [25]: for each label, a high F-score indicates that the feature exhibits a large difference between the positive and negative samples and a small difference within the positive samples and within the negative samples.
LLSF [20]: by incorporating label correlations into the objective function of a linear classifier, two weakly correlated labels cannot share similar label-specific features.
JFSC [21]: a Fisher discriminant is introduced based on LLSF to realize a large inter-class distance between positive and negative samples and a small intra-class distance for the positive and negative samples.
C-JFS: The correlation-based function is introduced into the objective function. Unlike CK-JFS, C-JFS has no knowledge-based function.
K-JFS: The knowledge-based function is introduced into the objective function. Unlike CK-JFS, K-JFS has no correlation-based function.

4.3. Experimental Result

The performances of the multilabel feature selection algorithms on five evaluation metrics are displayed in Table 3, Table 4, Table 5, Table 6 and Table 7, where the columns indicate the different multilabel feature selection algorithms, and the rows indicate the different classifiers and decision tasks. The classifiers include multilabel k-nearest neighbor (MLKNN) [26], multilabel support vector machine (MLSVM) [27], CK-JFS, classification tree (CT) [28], and extreme gradient boosting (XGBoost) [29]; the latter two are transformed into multilabel classifiers using a classifier chain (CC) [30]. We rank the performance of the algorithms for each metric and highlight the best performance in bold.
Using 75 cases (five metrics × five classifiers × three tasks), as presented in Table 3, Table 4, Table 5, Table 6 and Table 7, the following conclusions are drawn.
(1) AllFea performs best in 1 case and worst in 51 cases. This is because the curse of dimensionality reduces the performance of the classifiers, indicating the necessity of feature selection.
(2) F-score performs best in 1 case and worst in 12 cases. Although the F-score reduces the feature dimensionality, it does not use the correlation between labels.
(3) LLSF, JFSC, and C-JFS perform best in two, five, and eight cases, respectively, because they use label correlations; however, they do not use domain knowledge.
(4) K-JFS performs best in 17 cases because it uses domain knowledge, but it does not use label correlation.
(5) The proposed CK-JFS achieves the best performance in 41 cases and is ranked second in 20 cases. This is because the algorithm utilizes not only domain knowledge but also label correlation. The fusion of data and domain knowledge notably mitigates the reliance of label-specific feature selection on the quantity of training samples, enabling the proposed method to achieve superior performance under small-sample conditions.
(6) Thus, the proposed CK-JFS outperforms the benchmark multilabel feature selection algorithms.

4.4. Hypothesis Test

The Bonferroni–Dunn test [31] and t-test [32] are used in this paper to analyze the performance of the benchmark algorithms.

4.4.1. Bonferroni–Dunn Test

To determine the superiority of the proposed CK-JFS, the Bonferroni–Dunn test is executed. The critical distance (CD) is defined by Equation (41), where G is the number of compared algorithms and D is the number of cases. With G = 7 and D = 15, we obtain qα = 2.394 and CDα = 1.888 at the significance level α = 0.1.
$$CD_{\alpha} = q_{\alpha} \sqrt{ \frac{G(G+1)}{6D} } \qquad (41)$$
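With G = 7 compared algorithms and D = 15 cases (five classifiers × three decision tasks) and the quoted qα = 2.394, Equation (41) reproduces the reported critical distance:

```python
import math

q_alpha, G, D = 2.394, 7, 15
CD = q_alpha * math.sqrt(G * (G + 1) / (6 * D))
print(round(CD, 3))   # ~1.888, matching the value reported above
```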
Using the proposed CK-JFS as the control algorithm, the CD diagrams for each metric are obtained, as shown in Figure 3. In each subfigure, the average ranking of each algorithm is indicated along the axis, and the red line represents CDα.
From Figure 3, the following conclusions are drawn:
(1) The average rankings of the proposed CK-JFS are marked at the rightmost side of the axis for all metrics, indicating that the proposed CK-JFS achieves a better average ranking than the benchmark algorithms on every metric.
(2) CK-JFS achieves statistically superior performance to that of the benchmark algorithms AllFea, F-score, and LLSF.
(3) CK-JFS does not achieve statistically superior performance to that of JFSC in the recall rate, but it does in the other four metrics.
(4) CK-JFS achieves statistically superior performance to that of C-JFS in terms of the Hamming loss, but not in the other four metrics.
(5) The statistical difference between CK-JFS and K-JFS cannot be proven for any metric.

4.4.2. T-Test

A paired t-test was adopted for the algorithm–metric pairs that were not proven to be significantly different from CK-JFS in the Bonferroni–Dunn test. The null hypothesis of the paired t-test is that “the two algorithms have equal performance”. We calculate τ_t using Equation (42), where μ and σ are the mean and standard deviation, respectively, of the differences in the metric between the proposed CK-JFS and a benchmark algorithm over the D cases.
$$\tau_{t} = \frac{\sqrt{D}\, \mu}{\sigma} \qquad (42)$$
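The same test can be run with SciPy, as sketched below; metric_ckjfs and metric_other would each hold the D = 15 metric values (five classifiers × three tasks) of CK-JFS and one benchmark algorithm.

```python
from scipy import stats

def paired_test(metric_ckjfs, metric_other, alpha=0.1):
    """Paired t-test of Eq. (42); True means the null of equal performance is rejected."""
    t_stat, p_value = stats.ttest_rel(metric_ckjfs, metric_other)
    return t_stat, p_value, p_value < alpha
```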
The results of the paired t-test between the proposed CK-JFS and the benchmark algorithms are presented in Table 8, where a blank cell indicates that the algorithm has already been proven to be significantly different from CK-JFS for that metric in the Bonferroni–Dunn test. Assuming a significance level of α = 0.1, the null hypothesis is rejected when the p-value < 0.1.
Table 8 indicates that, except for K-JFS on the recall rate (p = 0.4405 > 0.1), all remaining algorithm–metric pairs are significantly different from CK-JFS.

5. Conclusions

This study has presented a novel approach to addressing the challenge of multilabel feature selection in copper flotation backbone process design. The proposed correlation and knowledge-based joint feature selection (CK-JFS) method integrates two critical components: the correlation-based function that prioritizes features associated with strongly related labels and the knowledge-based function that incorporates domain-specific insights for flotation optimization. The results demonstrate the effectiveness of CK-JFS in enhancing feature selection processes within the context of flotation design.
From a theoretical perspective, the proposed method overcomes the challenge of insufficient learning for small-sample cases. The integration of data and knowledge serves to alleviate the dependence of label-specific feature selection on the volume of training samples. Furthermore, it establishes a novel framework for enhancing the efficacy of small-sample machine learning. From an applied perspective, the proposed method contributes to the broader field of artificial intelligence applications in mineral processing, offering a promising framework for improving operational efficiency and decision-making in complex industrial systems.
However, this study only applies to multilabel classification and not to multilabel regression. Overcoming this limitation will be the subject of our future work.

Author Contributions

Conceptualization, F.W.; methodology, F.W.; software, H.D.; validation, H.D.; formal analysis, H.D.; investigation, F.W.; resources, D.H.; data curation, D.H.; writing—original draft preparation, H.D.; writing—review and editing, H.D.; visualization, F.W.; supervision, Y.L.; project administration, D.H.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Technologies Research and Development Program of China (2021YFF0602404; 2021YFC2902703), the National Natural Science Foundation of China (Grant No. 62073060; 61973057; 62173078), and the Innovative Research Group Project of the National Natural Science Foundation of China (Grant No. 61621004), for which the authors express their appreciation.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Feng, Q.; Zhang, Y.; Zhang, G.; Han, G.; Zhao, W. A novel sulfidization system for enhancing hemimorphite flotation through Cu/Pb binary metal ions. Int. J. Min. Sci. Technol. 2024, 34, 1741–1752. [Google Scholar] [CrossRef]
  2. Shen, Q.; Wen, S.; Hao, J.; Feng, Q. Flotation separation of chalcopyrite from pyrite using mineral fulvic acid as selective depressant under weakly alkaline conditions. Trans. Nonferrous Met. Soc. China 2025, 35, 313–325. [Google Scholar] [CrossRef]
  3. Dong, H.; Wang, F.; He, D.; Liu, Y. The intelligent decision-making of copper flotation backbone process based on CK-XGBoost. Knowl.-Based Syst. 2022, 243, 108429. [Google Scholar] [CrossRef]
  4. Dong, H.; Wang, F.; He, D.; Liu, Y. Decision system for copper flotation backbone process. Eng. Appl. Artif. Intell. 2023, 123, 106410. [Google Scholar] [CrossRef]
  5. Siblini, W.; Kuntz, P.; Meyer, F. A Review on Dimensionality Reduction for Multi-label Classification. IEEE Trans. Knowl. Data Eng. 2019, 33, 839–857. [Google Scholar] [CrossRef]
  6. Dong, H.; Sun, J.; Sun, X.; Ding, R. A many-objective feature selection for multi-label classification. Knowl.-Based Syst. 2020, 208, 106456. [Google Scholar] [CrossRef]
  7. Liu, J.; Li, Y.; Weng, W.; Zhang, J.; Chen, B.; Wu, S. Feature selection for multi-label learning with streaming label. Neurocomputing 2020, 387, 268–278. [Google Scholar] [CrossRef]
  8. Liu, J.; Lin, Y.; Li, Y.; Weng, W.; Wu, S. Online multi-label streaming feature selection based on neighborhood rough set. Pattern Recognit. 2018, 84, 273–287. [Google Scholar] [CrossRef]
  9. Huang, J.; Li, G.; Huang, Q.; Wu, X. Learning label specific Features and Class-Dependent Labels for Multi-Label Classification. IEEE Trans. Knowl. Data Eng. 2016, 28, 3309–3323. [Google Scholar] [CrossRef]
  10. Zhao, Z.; Morstatter, F.; Sharma, S.; Alelyani, S.; Anand, A.; Liu, H. Advancing Feature Selection Research–ASU Feature Selection Repository; Technical Report; Arizona State University: Tempe, AZ, USA, 2011. [Google Scholar]
  11. Yu, K.; Yu, S.-P.; Tresp, V. Multi-label informed latent semantic indexing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, 15–19 August 2005; pp. 258–265. [Google Scholar]
  12. Ji, S.-W.; Tang, L.; Yu, S.-P.; Ye, J.-P. Extracting shared subspace for multi-label classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 381–389. [Google Scholar]
  13. Zhang, M.-L.; Pe, J.M.; Robles, V. Feature selection for multilabel naive bayes classification. Inf. Sci. 2009, 179, 3218–3229. [Google Scholar] [CrossRef]
  14. Kong, X.-N.; Yu, P.S. Multi-label feature selection for graph classification. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW), Sydney, Australia, 13 December 2010; pp. 274–283. [Google Scholar]
  15. Zhang, Y.; Zhou, Z.-H. Multilabel dimensionality reduction via dependence maximization. Knowl. Discov. Data ACM 2010, 4, 14. [Google Scholar] [CrossRef]
  16. Nie, F.; Huang, H.; Ding, C. Efficient and Robust Feature Selection via Joint L2,1-Norms Minimization. In Advances in Neural Information Processing Systems 23: Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A., Eds.; Curran Associates, Inc.: San Francisco, CA, USA, 2010; pp. 1813–1821. [Google Scholar]
  17. Jalali, A.; Sanghavi, S.; Ruan, C.; Ravikumar, P.K. A Dirty Model for Multi-Task Learning. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; pp. 964–972. [Google Scholar]
  18. Kim, S.; Sohn, K.A.; Xing, E.P. A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 2009, 25, 204–212. [Google Scholar]
  19. Pan, S.; Wu, J.; Zhu, X.; Long, G.; Zhang, C. Task sensitive feature exploration and learning for multitask graph classification. IEEE Trans. Cybern 2016, 47, 744–758. [Google Scholar]
  20. Huang, J.; Li, G.-R.; Huang, Q.-M.; Wu, X.-D. Learning label specific features for multi-label classification. In Proceedings of the IEEE International Conference on Data Mining, Atlantic City, NJ, USA, 14–17 November 2015; pp. 181–190. [Google Scholar]
  21. Huang, J.; Li, G.-R.; Huang, Q.-M.; Wu, X.-D. Joint Feature Selection and Classification for Multilabel Learning. IEEE Trans. Cybern. 2018, 48, 876–889. [Google Scholar] [CrossRef]
  22. Zhang, J.; Luo, Z.; Li, C.; Zhou, C.; Li, S. Manifold regularized discriminative feature selection for multi-label learning. Pattern Recognit. 2019, 95, 136–150. [Google Scholar]
  23. Li, F.; Miao, D.; Pedrycz, W. Granular multi-label feature selection based on mutual information. Pattern Recognit. 2017, 67, 410–423. [Google Scholar]
  24. Lin, Z.-C.; Ganesh, A.; Wright, J.; Wu, L.-Q.; Chen, M.-M.; Ma, Y. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix; UIUC Technical Report UILU-ENG-09-2214; University of Illinois at Urbana-Champaign: Urbana, IL, USA, 2009. [Google Scholar]
  25. Duda, R.; Hart, P.; Stork, D. Pattern Classification, 2nd ed.; Wiley-Interscience: New York, NY, USA, 2001. [Google Scholar]
  26. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi label learning. Pattern Recognit. 2007, 40, 2038–2048. [Google Scholar] [CrossRef]
  27. Elisseeff, A.; Weston, J. A kernel method for multi labelled classification. Adv. Neural Inf. Process. Syst. 2002, 14, 681–687. [Google Scholar]
  28. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees (CART). Biometrics 1984, 40, 358. [Google Scholar]
  29. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  30. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier chains for multi label classification. Mach. Learn. 2011, 85, 333–359. [Google Scholar] [CrossRef]
  31. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  32. Dunn, O.J. Multiple Comparisons among Means. J. Am. Stat. Assoc. 1961, 56, 52–64. [Google Scholar]
Figure 1. The position of this study in the intelligent design of copper flotation backbone processes.
Figure 2. Results of feature selection.
Figure 3. Comparison of CK-JFS against the benchmark algorithms with the Bonferroni–Dunn test.
Table 1. Original features.
Feature Grouping | m | Feature Meanings | m | Feature Meanings | m | Feature Meanings
Chemical composition content | 1 | Mo | 2 | Cu | 3 | Pb
 | 4 | Zn | 5 | S | 6 | Co
 | 7 | Fe | 8 | Au | 9 | Ag
 | 10 | Mn | 11 | SiO2 | 12 | Al2O3
 | 13 | CaO | 14 | MgO | 15 | K2O
 | 16 | Na2O | 17 | C | -- | --
Chemical phase analysis | 18 | Molybdenum oxide content | 19 | Molybdenum sulfide content | 20 | Free copper oxide content
 | 21 | Bound copper oxide content | 22 | Total copper oxide content | 23 | Primary copper sulfide content
 | 24 | Secondary copper sulfide content | 25 | Total copper sulfide content | 26 | Lead oxide content
 | 27 | Lead sulfide content | 28 | Zinc oxide content | 29 | Zinc sulfide content
Ore content | 30 | Chalcopyrite | 31 | Bornite | 32 | Chalcocite
 | 33 | Malachite | 34 | Cuprite | 35 | Tetrahedrite
 | 36 | Pyrite and marcasite | 37 | Hematite | 38 | Magnetite
 | 39 | Pyrrhotite | 40 | Phosphorite | 41 | Ilmenite
 | 42 | Galena | 43 | Sphalerite | 44 | Molybdenite
Gangue content | 45 | Quartz | 46 | Muscovite | 47 | Biotite
 | 48 | Feldspar | 49 | Talc | 50 | Calcite
 | 51 | Chlorite | 52 | Apatite | 53 | Garnet
 | 54 | Kaolinite | 55 | All gangue contents | -- | --
Symbiosis relationship | 56 | Molybdenum ore | 57 | Copper ore | 58 | Lead ore
 | 59 | Zinc ore | 60 | Sulfur ore | 61 | Iron ore
Particle size (upper limit) | 62 | Molybdenum ore | 63 | Copper ore | 64 | Lead ore
 | 65 | Zinc ore | 66 | Sulfur ore | 67 | Iron ore
Particle size (lower limit) | 68 | Molybdenum ore | 69 | Copper ore | 70 | Lead ore
 | 71 | Zinc ore | 72 | Sulfur ore | 73 | Iron ore
Particle size (proportion of +0.074 mm) | 74 | Molybdenum ore | 75 | Copper ore | 76 | Lead ore
 | 77 | Zinc ore | 78 | Sulfur ore | 79 | Iron ore
Particle size (proportion of −0.074 mm) | 80 | Molybdenum ore | 81 | Copper ore | 82 | Lead ore
 | 83 | Zinc ore | 84 | Sulfur ore | 85 | Iron ore
Table 2. Label-specific features of each label.
k | Optimal Feature Number | Label-Specific Features
1 | 17 | [2, 1, 56, 4, 3, 11, 21, 66, 55, 16, 25, 5, 57, 69, 8, 9, 17]
2 | 5 | [3, 2, 26, 5, 22]
3 | 18 | [4, 2, 60, 9, 8, 57, 28, 59, 69, 11, 22, 12, 5, 42, 29, 20, 36, 55]
4 | 5 | [5, 3, 75, 22, 2]
5 | 12 | [5, 3, 13, 25, 24, 78, 45, 8, 2, 75, 72, 1]
6 | 8 | [2, 11, 5, 1, 56, 3, 21, 16]
7 | 6 | [3, 4, 26, 25, 20, 77]
8 | 9 | [5, 2, 77, 59, 29, 3, 20, 36, 69]
9 | 16 | [5, 9, 12, 3, 24, 36, 2, 75, 81, 69, 22, 13, 37, 59, 46, 29]
10 | 13 | [3, 12, 2, 14, 17, 5, 9, 43, 7, 22, 36, 13, 24]
11 | 13 | [2, 74, 5, 16, 12, 25, 9, 14, 55, 1, 3, 72, 8]
12 | 5 | [3, 27, 28, 29, 13]
13 | 11 | [60, 5, 28, 2, 21, 13, 4, 8, 12, 55, 14]
14 | 11 | [5, 12, 9, 4, 25, 3, 24, 2, 20, 75, 63]
15 | 16 | [5, 1, 8, 4, 2, 23, 12, 22, 36, 11, 75, 3, 55, 21, 14, 7]
Table 3. Hamming loss (↓) of seven multilabel feature selection algorithms.
Classifier | Decision Task | AllFea | F-Score | LLSF | JFSC | C-JFS | K-JFS | CK-JFS
MLKNN | Product scheme | 0.1721 | 0.1250 | 0.1042 | 0.0932 | 0.1151 | 0.0877 | 0.0998
 | Flotation scheme | 0.3648 | 0.3092 | 0.2719 | 0.2463 | 0.2215 | 0.2113 | 0.2251
 | Grinding scheme | 0.3675 | 0.3439 | 0.3298 | 0.2368 | 0.2237 | 0.2360 | 0.2123
MLSVM | Product scheme | 0.1228 | 0.1140 | 0.1162 | 0.1075 | 0.1096 | 0.1009 | 0.0921
 | Flotation scheme | 0.3136 | 0.2734 | 0.2478 | 0.2332 | 0.2427 | 0.2251 | 0.2018
 | Grinding scheme | 0.3614 | 0.3404 | 0.2719 | 0.2342 | 0.2368 | 0.2219 | 0.2026
CK-JFS | Product scheme | 0.1151 | 0.1228 | 0.1020 | 0.1217 | 0.1031 | 0.0932 | 0.0899
 | Flotation scheme | 0.2975 | 0.2610 | 0.2558 | 0.2434 | 0.2244 | 0.2135 | 0.1981
 | Grinding scheme | 0.3596 | 0.3447 | 0.2623 | 0.2360 | 0.2272 | 0.2272 | 0.1974
CC-CT | Product scheme | 0.1173 | 0.1184 | 0.0844 | 0.0943 | 0.0976 | 0.0746 | 0.0910
 | Flotation scheme | 0.2617 | 0.2259 | 0.2222 | 0.2230 | 0.2003 | 0.2230 | 0.2010
 | Grinding scheme | 0.3114 | 0.2465 | 0.2333 | 0.2289 | 0.2061 | 0.1904 | 0.1781
CC-XGBoost | Product scheme | 0.1020 | 0.0987 | 0.1009 | 0.0910 | 0.0866 | 0.0822 | 0.0855
 | Flotation scheme | 0.2018 | 0.2083 | 0.2003 | 0.1981 | 0.2105 | 0.1879 | 0.1601
 | Grinding scheme | 0.2132 | 0.2298 | 0.1956 | 0.1702 | 0.1772 | 0.1456 | 0.1649
Average ranking |  | 6.600 | 6.067 | 4.533 | 3.667 | 3.600 | 1.867 | 1.600
Table 4. Ranking loss (↓) of seven multilabel feature selection algorithms.
Classifier | Decision Task | AllFea | F-Score | LLSF | JFSC | C-JFS | K-JFS | CK-JFS
MLKNN | Product scheme | 0.1179 | 0.0736 | 0.0767 | 0.0712 | 0.0731 | 0.0677 | 0.0655
 | Flotation scheme | 0.2682 | 0.2080 | 0.2013 | 0.2056 | 0.1973 | 0.2027 | 0.1770
 | Grinding scheme | 0.2523 | 0.2174 | 0.1966 | 0.2078 | 0.1998 | 0.1814 | 0.1720
MLSVM | Product scheme | 0.0769 | 0.0778 | 0.0732 | 0.0596 | 0.0667 | 0.0679 | 0.0627
 | Flotation scheme | 0.2317 | 0.2037 | 0.2068 | 0.2007 | 0.1897 | 0.1921 | 0.1640
 | Grinding scheme | 0.2191 | 0.2012 | 0.2099 | 0.2094 | 0.1887 | 0.1787 | 0.1660
CK-JFS | Product scheme | 0.0756 | 0.0714 | 0.0749 | 0.0676 | 0.0691 | 0.0694 | 0.0637
 | Flotation scheme | 0.2199 | 0.2085 | 0.2010 | 0.1965 | 0.1962 | 0.1836 | 0.1679
 | Grinding scheme | 0.2156 | 0.1959 | 0.1996 | 0.2009 | 0.1839 | 0.1739 | 0.1503
CC-CT | Product scheme | 0.0727 | 0.0716 | 0.0596 | 0.0641 | 0.0688 | 0.0694 | 0.0587
 | Flotation scheme | 0.1989 | 0.1985 | 0.1956 | 0.1859 | 0.1757 | 0.1605 | 0.1539
 | Grinding scheme | 0.2061 | 0.1879 | 0.1860 | 0.1595 | 0.1550 | 0.1543 | 0.1501
CC-XGBoost | Product scheme | 0.0693 | 0.0656 | 0.0571 | 0.0592 | 0.0535 | 0.0654 | 0.0577
 | Flotation scheme | 0.1929 | 0.1852 | 0.1578 | 0.1575 | 0.1471 | 0.1505 | 0.1420
 | Grinding scheme | 0.1717 | 0.1652 | 0.1593 | 0.1619 | 0.1474 | 0.1547 | 0.1449
Average ranking |  | 6.933 | 5.600 | 4.533 | 3.933 | 2.800 | 3.000 | 1.200
Table 5. Average precision (↑) of seven multilabel feature selection algorithms.
Classifier | Decision Task | AllFea | F-Score | LLSF | JFSC | C-JFS | K-JFS | CK-JFS
MLKNN | Product scheme | 0.7525 | 0.8079 | 0.8369 | 0.8180 | 0.8331 | 0.8486 | 0.8417
 | Flotation scheme | 0.4244 | 0.4355 | 0.4667 | 0.4423 | 0.4422 | 0.4454 | 0.4915
 | Grinding scheme | 0.3476 | 0.4322 | 0.4595 | 0.4602 | 0.4582 | 0.4851 | 0.5064
MLSVM | Product scheme | 0.7994 | 0.7936 | 0.8282 | 0.8593 | 0.8413 | 0.8260 | 0.8480
 | Flotation scheme | 0.4042 | 0.4354 | 0.4290 | 0.4400 | 0.4946 | 0.4625 | 0.4924
 | Grinding scheme | 0.4355 | 0.4490 | 0.4361 | 0.4615 | 0.4781 | 0.4888 | 0.4973
CK-JFS | Product scheme | 0.8036 | 0.8085 | 0.8376 | 0.8482 | 0.8234 | 0.8229 | 0.8460
 | Flotation scheme | 0.4360 | 0.4288 | 0.4390 | 0.4453 | 0.4848 | 0.5005 | 0.4925
 | Grinding scheme | 0.4310 | 0.4698 | 0.4665 | 0.4733 | 0.4795 | 0.4922 | 0.5370
CC-CT | Product scheme | 0.8398 | 0.8354 | 0.8458 | 0.8345 | 0.8269 | 0.8332 | 0.8497
 | Flotation scheme | 0.4438 | 0.4421 | 0.4436 | 0.4614 | 0.4843 | 0.5347 | 0.5499
 | Grinding scheme | 0.4410 | 0.4620 | 0.4922 | 0.5019 | 0.5342 | 0.5303 | 0.5329
CC-XGBoost | Product scheme | 0.8463 | 0.8525 | 0.8715 | 0.8626 | 0.8716 | 0.8373 | 0.8634
 | Flotation scheme | 0.4653 | 0.5100 | 0.5098 | 0.5662 | 0.5291 | 0.5799 | 0.5522
 | Grinding scheme | 0.4952 | 0.5167 | 0.4962 | 0.5000 | 0.5453 | 0.5463 | 0.5712
Average ranking |  | 6.400 | 5.600 | 4.400 | 3.600 | 3.333 | 3.000 | 1.667
Table 6. Precision rate (↑) of seven multilabel feature selection algorithms.
Classifier | Decision Task | AllFea | F-Score | LLSF | JFSC | C-JFS | K-JFS | CK-JFS
MLKNN | Product scheme | 0.9575 | 0.9546 | 0.9467 | 0.9420 | 0.9561 | 0.9406 | 0.9496
 | Flotation scheme | 0.9317 | 0.9416 | 0.9375 | 0.9372 | 0.9311 | 0.9320 | 0.9442
 | Grinding scheme | 0.8907 | 0.9041 | 0.9127 | 0.9126 | 0.9096 | 0.9245 | 0.9213
MLSVM | Product scheme | 0.9467 | 0.9475 | 0.9523 | 0.9436 | 0.9530 | 0.9527 | 0.9500
 | Flotation scheme | 0.9334 | 0.9320 | 0.9331 | 0.9316 | 0.9407 | 0.9443 | 0.9395
 | Grinding scheme | 0.9130 | 0.9094 | 0.9127 | 0.9133 | 0.9208 | 0.9177 | 0.9335
CK-JFS | Product scheme | 0.9462 | 0.9538 | 0.9440 | 0.9567 | 0.9524 | 0.9497 | 0.9516
 | Flotation scheme | 0.9335 | 0.9312 | 0.9373 | 0.9386 | 0.9360 | 0.9373 | 0.9424
 | Grinding scheme | 0.9079 | 0.9161 | 0.9164 | 0.9134 | 0.9148 | 0.9356 | 0.9245
CC-CT | Product scheme | 0.9529 | 0.9585 | 0.9378 | 0.9495 | 0.9544 | 0.9466 | 0.9600
 | Flotation scheme | 0.9331 | 0.9386 | 0.9411 | 0.9436 | 0.9361 | 0.9563 | 0.9530
 | Grinding scheme | 0.9188 | 0.9251 | 0.9227 | 0.9220 | 0.9227 | 0.9252 | 0.9313
CC-XGBoost | Product scheme | 0.9539 | 0.9530 | 0.9585 | 0.9557 | 0.9536 | 0.9568 | 0.9593
 | Flotation scheme | 0.9310 | 0.9374 | 0.9439 | 0.9449 | 0.9557 | 0.9535 | 0.9543
 | Grinding scheme | 0.9121 | 0.9367 | 0.9293 | 0.9243 | 0.9293 | 0.9189 | 0.9338
Average ranking |  | 5.667 | 4.400 | 4.333 | 4.533 | 3.667 | 3.333 | 2.067
Table 7. Recall rate (↑) of seven multilabel feature selection algorithms.
Classifier | Decision Task | AllFea | F-Score | LLSF | JFSC | C-JFS | K-JFS | CK-JFS
MLKNN | Product scheme | 0.7907 | 0.8688 | 0.9012 | 0.9290 | 0.8785 | 0.9398 | 0.9163
 | Flotation scheme | 0.6236 | 0.6792 | 0.7240 | 0.7538 | 0.7922 | 0.8048 | 0.7749
 | Grinding scheme | 0.6770 | 0.6836 | 0.6937 | 0.7649 | 0.7957 | 0.7541 | 0.8034
MLSVM | Product scheme | 0.8902 | 0.8997 | 0.8832 | 0.9203 | 0.8918 | 0.8991 | 0.9172
 | Flotation scheme | 0.6793 | 0.7330 | 0.7616 | 0.7812 | 0.7620 | 0.7657 | 0.8100
 | Grinding scheme | 0.6315 | 0.6860 | 0.7241 | 0.7655 | 0.7598 | 0.7875 | 0.7750
CK-JFS | Product scheme | 0.8997 | 0.8740 | 0.9188 | 0.8723 | 0.8961 | 0.9155 | 0.9185
 | Flotation scheme | 0.6994 | 0.7473 | 0.7454 | 0.7562 | 0.7838 | 0.7968 | 0.8088
 | Grinding scheme | 0.6617 | 0.6735 | 0.7346 | 0.7703 | 0.7888 | 0.7395 | 0.8139
CC-CT | Product scheme | 0.8785 | 0.8700 | 0.9526 | 0.9138 | 0.9014 | 0.9512 | 0.9049
 | Flotation scheme | 0.7476 | 0.7701 | 0.7734 | 0.7712 | 0.8156 | 0.7562 | 0.7880
 | Grinding scheme | 0.6765 | 0.7411 | 0.7647 | 0.7837 | 0.8092 | 0.8192 | 0.8271
CC-XGBoost | Product scheme | 0.8949 | 0.9022 | 0.8998 | 0.9126 | 0.9224 | 0.9238 | 0.9164
 | Flotation scheme | 0.8182 | 0.8018 | 0.8040 | 0.8048 | 0.7729 | 0.8053 | 0.8342
 | Grinding scheme | 0.8167 | 0.7534 | 0.8010 | 0.8374 | 0.8249 | 0.8813 | 0.8386
Average ranking |  | 6.200 | 5.733 | 4.600 | 3.400 | 3.600 | 2.533 | 1.933
Table 8. The paired t-test results of CK-JFS compared with benchmark algorithms.
Metric | Statistic | JFSC | C-JFS | K-JFS
Hamming loss | τt |  |  | 1.8156
Hamming loss | p-value |  |  | 0.0909
Ranking loss | τt |  | 4.7252 | 5.6296
Ranking loss | p-value |  | 0.0003 | 0.0001
Average precision | τt |  | 3.9182 | 3.1113
Average precision | p-value |  | 0.0015 | 0.0077
Precision rate | τt |  | 3.0950 | 1.8145
Precision rate | p-value |  | 0.0079 | 0.0911
Recall rate | τt | 3.7129 | 2.8005 | 0.7939
Recall rate | p-value | 0.0023 | 0.0142 | 0.4405
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
