Article

Feature Ranking on Small Samples: A Bayes-Based Approach

School of Translational Information Technologies, ITMO University, 197101 St. Petersburg, Russia
* Author to whom correspondence should be addressed.
Entropy 2025, 27(8), 773; https://doi.org/10.3390/e27080773
Submission received: 14 May 2025 / Revised: 4 July 2025 / Accepted: 11 July 2025 / Published: 22 July 2025
(This article belongs to the Section Multidisciplinary Applications)

Abstract

In the modern world, people need a better understanding of the importance, or relevance, of the available descriptive features for predicting target attributes, that is, they need to solve the feature ranking problem. Among the published works, the vast majority are devoted to the problems of feature selection and extraction rather than to feature ranking. In this paper, we propose a novel method based on the Bayesian approach that allows us not only to build a methodically justified way of ranking features on small datasets, but also to methodically solve the problem of benchmarking the results obtained by various ranking algorithms. The proposed method is also model-free, since no restrictions are imposed on the model. We carry out an experimental comparison of our proposed method with the classical frequency method, using two synthetic datasets and two public medical datasets. As a result, we show that the proposed ranking method reaches a high level of self-consistency (stability) already at a sample size of 50, which is a substantial improvement over classical logistic regression and SHAP ranking. All the experiments performed confirm our theoretical conclusions: as the sample grows, an increasing trend of mutual consistency is observed, and our method demonstrates at least comparable, and often superior, values of self-consistency and monotonicity compared to other methods. The proposed method can be applied to a wide class of problems of ranking influence factors on small samples, including industrial tasks, forensics, psychology, etc.

1. Introduction

Today, people have to act in an uncertain environment and cannot be fully aware of each situation, let alone control it. Therefore, they need predictive modeling for a better assessment of the available descriptive attributes to predict target attributes, that is, to solve the feature ranking problem [1]. The feature ranking (FR) algorithm forms a list (also called feature ranking) of descriptive attributes ordered by their importance (relevance) in terms of the target attribute(s).
Although the FR problem has been reported for decades [2], it has been constantly expanding to new areas, including geological exploration [3,4], assessing safety risks and staff well-being in industry [5,6], organizing supply chains [7], protecting the network infrastructure of an enterprise [8,9], compensating for environmental damage caused by a particular enterprise [10,11], and the management of patients with comorbidities [12]. Here, it is the small size of the available dataset that becomes a critical limitation.
The FR task implies an adequate choice of the most effective statistical evaluation algorithm. Various criteria have been suggested for this choice, commonly related to the filter, embedding, and wrapper approaches.
Filter methods are data-driven; that is, the criteria for ranking attributes are based on the properties of the data selected by some external axiomatics or heuristics rather than on the results of a model. Filter methods would normally be applicable to large datasets. In particular, ANOVA, the Mann–Whitney test, the R2 test, and similar methods of descriptive statistics are widely used in medical statistics [13,14,15]. In practice, the well-known trade-off between the stability and the accuracy of results in medical applications of descriptive statistics is generally resolved in favor of accuracy. This stipulates using only large samples (cohorts) of patients for analysis. For example, Ref. [15] used a sample of 78,000 patients to rank features. The methods of the Feature Screening group [16] use heuristics aimed at processing ultra-high-dimensional datasets, which is beyond the scope of our work. Thus, Deep Feature Screening [17] uses a neural network for this processing. In terms of processing small samples, the method based on Vendi Score Importance [18] is worth discussing. It groups features by their level of influence on the target class, with subsequent selection of the most effective features. However, as the dataset size decreases, the grouping may become less informative. In addition, with a larger set of features, this method may experience “the curse of dimensionality” and result in insufficient and unstable separation of features by importance. The method described in [19] uses a permutation test, one of the few types of tests in classical statistics that can be adequately applied to small samples. On the other hand, the above tests can be overly conservative; that is, they might be biased towards the irrelevance of features.
The construction of filter methods also uses indicators based on information theory [20,21,22], including Information Gain (IG) [23], mutual information (MI) [24], and the Maximal Information Coefficient (MIC) [25]. To some extent, they exploit the idea of estimating the difference between the probability distributions p of the features x = {x_i}, i = 1, …, d, for different classes y = {y_k}, k = 1, …, l, requiring no training of classifiers. However, the distribution is estimated directly from the labeled training dataset, which fundamentally reduces their applicability on small datasets. In addition, this estimate is indirect; that is, the calculated value p(x|y) indicates the contribution of a feature to a particular class. For example, in the MRMR method [24], which is one of the filter methods based on MI, the mutual information is estimated through a Parzen window, which can become suboptimal on small samples. Additionally, feature selection is performed using a greedy algorithm, which can also lead to suboptimal feature selection. As experimentally shown in [20], the results still tend to degrade with a decreasing sample size, although this is less pronounced than in descriptive statistics.
Embedded methods are also a common choice for FR. They are based on the parameters or structure of regressions, decision trees, SVMs (Support Vector Machines), and other algorithms, providing an easily interpretable assessment of the variables' importance. For example, in the case of logistic regression, y = σ(wx + b), the “organic” measure of the importance of feature f_i is the absolute value of its weight coefficient, rel(f_i) = |w_i|. Due to this methodological transparency, embedded methods are widely used in applied medical solutions [26,27,28,29,30]. These methods appear to be simple; however, most often they are based on classical algorithms that may have problems with overfitting on small samples.
The effectiveness of embedded feature ranking methods depends on how adequately the models themselves work with small samples. For example, the Lasso [31] and ElasticNet [32] methods use regularization, increasing convergence on small samples. On the other hand, since only regression coefficients are used as a ranking feature, these methods do not appear to properly account for the relationships between variables. Tree-based models, such as Extreme Gradient Boosting (XGBoost) [33] or Random Forest [34], form a sample estimate of the data distribution in the subspace based on the frequency of using individual features, and the estimate may lose stability on small samples [35].
Wrapper methods evaluate the data distribution that results from applying the model that generates the target variable to the original data. Wrapper methods implement genetic algorithms [36,37,38], the particle swarm optimization algorithm [39], the Boruta method [40], the top-down greedy search algorithm [41], and other searching algorithms [42]. Recent publications have shown that wrapper methods can successfully be applied on small samples, especially in eliciting significant groups of genes [43]. However, their fundamental feature is their heuristic nature, which implies that the methods underlying them tend to have no strict mathematical basis. Thus, the results of ranking the features obtained by these methods are compared empirically rather than formally, which may be undesirable in terms of generalizability within the framework of evidence-based medicine.
It is the SHAP (SHapley Additive exPlanation) method that stands out in this respect. The SHAP feature selection method, proposed in [44], is based on Shapley values, which were first used in game theory to determine how much each player in a collaborative super-additive game contributed to its success. The SHAP method has been increasingly applied in feature ranking problems [45,46,47]. However, SHAP values depend not only on the model, but also on the input data distribution, and even features that are not used by the model in any way can have non-zero SHAP values. In particular, SHAP values are sensitive to high correlations among different features [48]. The applicability of SHAP to small samples does not appear to have been amply studied. The definition of SHAP is not clear about how exactly the average value E[f(x)|S] of the prediction algorithm f(x) is calculated when fixing the subset of features S under the conditions of a data sample limited in size. Additionally, Ref. [49] also showed that SHAP values and the variable rankings based on them fluctuate when using different background datasets acquired from random sampling, and this fluctuation increases as the background dataset size decreases.
Most of the methods discussed above are frequentist methods, in that they consider all parameters to be fixed and the data to be random. Conversely, Bayesian methods consider that both all parameters and the data are random (and therefore will have distributions). The very definition of Bayesian statistical methods reveals their significant advantage: they are not based on the assumption of big sample sizes or on the theorems about the limiting behavior of distributions. Therefore, they can be used with any sample size and take into account some additional information about the problem, which makes their conclusions more reliable (in the case of useful inductive bias), or estimates the degree of uncertainty of conclusions in the case when there is no additional information.
For example, Ref. [50] builds a generic approach for ensemble FR based on Bayesian models, which in many ways resembles the Boruta wrapper method [40]. In [51], the authors compare three Bayesian ranking methods for categorical predictors using a specialized Gibbs sampler. The authors in [52] built the minimum submodel from all possible predictors, with the performance matching the reference model within a standard error. To avoid overfitting, the submodels are compared to the reference model in cross-validated prediction accuracy, via the efficient Bayesian approximation of leave-one-out cross-validation. Ref. [53] developed a modification of the Bayesian variable selection method, the performance of which was compared to the Lasso method [31] through an extensive simulation study.
The fundamental results of mathematical statistics [54] suggest two ways to organize Bayesian methods for ranking features. First, almost any algorithm for ranking variables can basically be applied to Bayesian models. Second, any classical parametric model can be made Bayesian by introducing some distribution of the parameters p(w), as well as formalizing the model itself in the form of a likelihood function p(y|x,w). The latter is obtained by analyzing the formula that specifies the prediction and error functions. Recent papers related to the extraction of features on smaller samples using Bayesian methods have shown the vast majority of researchers to follow the first option, that is, to apply the algorithm for ranking variables to a ready-made Bayesian model.
In our work, we propose a wrapper method in which we convert the model to a Bayesian model as the first stage of the feature ranking pipeline.
The task of the method is to rank the features of a specific dataset processed by a specific model. We train the selected model on a specific dataset. The result is a trained model M_θ̂. This is followed by the Bayesianization procedure, which implies the following: each prediction of the model M_θ̂ is calculated while adding Gaussian noise to the model parameters (should the model parameters allow this addition) or with another randomization method. For example, when using a tree-type model, a non-deterministic transition to descendant nodes is performed in randomly selected ancestor nodes. The result is a Bayesian model M_{θ_ε}, θ_ε = θ̂ + ε, which provides sampling from the probability distributions p(y|x) and p(y|x_{-i}) instead of the deterministic outputs M_θ̂(x). The resulting samples are used to calculate the feature ranking based on a Kullback–Leibler divergence estimate.
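To make the pipeline concrete, the sketch below shows one minimal way to implement it for a linear binary classifier, assuming a fitted scikit-learn-style model that exposes coef_ and intercept_; the helper names (bayesianize_predict, rank_features) and the permutation-based marginalization of x_i are our simplifications for illustration, not the authors' exact procedure.

```python
import numpy as np

def bayesianize_predict(model, X, sigma=1e-2, n_draws=200, rng=None):
    """Sample class-1 probabilities from the model with Gaussian-perturbed parameters."""
    rng = np.random.default_rng(rng)
    w, b = model.coef_.ravel(), model.intercept_[0]
    probs = np.empty((n_draws, X.shape[0]))
    for s in range(n_draws):
        w_s = w + rng.normal(0.0, sigma, size=w.shape)
        b_s = b + rng.normal(0.0, sigma)
        probs[s] = 1.0 / (1.0 + np.exp(-(X @ w_s + b_s)))
    return probs  # shape: (n_draws, n_examples)

def rank_features(model, X, sigma=1e-2, eps=1e-9, rng=0):
    """Rank features by the mean KL(p(y|x) || p(y|x_-i)) estimated over the dataset."""
    p_full = bayesianize_predict(model, X, sigma, rng=rng).mean(axis=0)
    scores = []
    for i in range(X.shape[1]):
        # Marginalize feature i by replacing it with values drawn from the data itself.
        X_marg = X.copy()
        X_marg[:, i] = np.random.default_rng(rng).permutation(X[:, i])
        p_marg = bayesianize_predict(model, X_marg, sigma, rng=rng).mean(axis=0)
        kl = (p_full * np.log((p_full + eps) / (p_marg + eps))
              + (1 - p_full) * np.log((1 - p_full + eps) / (1 - p_marg + eps)))
        scores.append(kl.mean())
    return np.argsort(scores)[::-1]  # feature indices, most relevant first
```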
Our contribution is as follows:
(1)
Our method, unlike the alternatives described above, such as the Vendi Score [18] and MRMR [24], enables feature ranking based on a direct measure, the Kullback–Leibler divergence KL(p(y|x) ‖ p(y|x_{-i})), of the effect of feature absence on the model prediction.
(2)
Since our method uses a sampling of predictions based on randomized model parameters, it is significantly less dependent on the amount of data.
(3)
Due to model randomization, our method is applicable to both Bayesian and non-Bayesian models.
(4)
In our method, unlike RFE [41], the model is trained only once.
The rest of the paper is organized as follows. In Section 2, we present the theoretical substantiation of the developed method, the accepted metrics for evaluating its effectiveness, and describe the datasets used in the experiments. In Section 3, we present and discuss experimental estimates of the statistical stability of the developed method on small samples, as well as the results of its comparison with other methods. Section 4 concludes the work and presents the prospects for further research.

2. Materials and Methods

2.1. Theoretical Substantiation of the Developed Method

First, we introduce some notations and conventions. As we consider supervised learning problems, we let D stand for a dataset containing pairs of the form (feature vector, target value). For a given example from the dataset D, x_i denotes the i-th value of the corresponding feature vector or the i-th feature, depending on the context; x_{-i} denotes the feature vector without the i-th value; and y stands for the target value, the class label, or a real number. Our method is a wrapper method for supervised learning models; thus, we have a training sample D = {(x_i, y_i)}, i = 1, …, N_d. Our method uses Bayesian models; hence, there is a distinction between the distribution p(y|x), which stands for the output distribution of the algorithm given the input features x, and the distribution p(y|x_{-i}), which refers to the same, but without considering the feature with index i. In this analysis, p(y|x) and p(y|x_{-i}) will always refer to distributions of Bayesian model predictions, the full forms of which are p(y|x, M) and p(y|x_{-i}, M). However, including M for the model would make the formulas hard to read. Also, for some derivations, we may change this to p(y|x, θ) to explicitly highlight the model parameters θ. The prior distribution of the parameters will be denoted as p(θ), and the posterior one as p(θ|D).
To specify the feature relevance, we consider the axiomatics from [55]: the feature x i is considered relevant if there is such a value x i = c of that feature that for any value of the target variable y = b , the following is true:
$$ p(y = b \mid x_i = c, S) \neq p(y = b \mid S) \tag{1} $$
where S is some assignment of values to some subset of variables that does not contain the attribute x i .
The above condition is intuitively clear, as it points out that a feature is relevant if it somehow affects the prediction, but it appears to have limitations: it does not give any quantitative description of feature relevance, and the conditions imposed on the features are not strong enough to implement reasonable quantitative estimates. To overcome these limitations, we propose a method for ranking variables in Bayesian models, which is uniformly suitable for arbitrary Bayesian models while fully corresponding to the above condition. We will also show that the model having to be Bayesian is not a limitation, as we can convert classically trained models into Bayesian ones. The method consists of identifying the differences between the predictive distribution p(y|x) and the incomplete predictive distribution p(y|x_{-i}).
We use the Kullback–Leibler divergence KL(p ‖ q) as a difference measure [56]:
$$ \mathrm{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} \tag{2} $$
where p and q are probability distributions.
In the case of a discrete distribution at the output of the model, KL(p(y|x) ‖ p(y|x_{-i})) refers to the amount of information lost when replacing the distribution p(y|x) with p(y|x_{-i}), which is a fairly intuitive criterion for the information content of a feature. This interpretation can be directly applied to the case of a continuous output distribution because the difference in the entropies is still defined correctly (unlike the entropy itself).
To estimate KL(p(y|x) ‖ p(y|x_{-i})), we first obtain p(y|x) and p(y|x_{-i}). In practice, even when we have a complete description of the model, the exact analytical formula for p(y|x) can often be intractable or computationally hard, and such cases require Monte Carlo sampling. For p(y|x_{-i}), we can also use a Monte Carlo estimate by replacing x_i in the feature vector x with its alternative values from the dataset. The effect of a smaller dataset is compensated by the fact that we sample not only over its values, but also over the model parameter distribution.
The scheme of the method is shown in Figure 1.
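As a minimal sketch of the two Monte Carlo estimates described above (sample_predictions is a placeholder for a routine that returns class-1 probabilities sampled from the randomized model, one row of draws per call; it is not a library function):

```python
import numpy as np

def predictive_pair(sample_predictions, x, X_train, i, n_param_draws=200):
    # p(y|x): average over draws of the randomized model parameters.
    p_full = sample_predictions(x[None, :], n_param_draws).mean()

    # p(y|x_-i): additionally average over alternative values of feature i
    # taken from the training data (Monte Carlo marginalization).
    X_alt = np.repeat(x[None, :], len(X_train), axis=0)
    X_alt[:, i] = X_train[:, i]
    p_marg = sample_predictions(X_alt, n_param_draws).mean()
    return p_full, p_marg
```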
For such models as logistic regression and ensembles of decision trees, we can use shortcuts to avoid relying solely on Monte Carlo methods, which reduces computational costs and further improves the stability on smaller samples. These shortcuts are discussed below.

2.2. Analytical Shortcuts for Logistic Regression

Exact Bayesian derivation of the posterior distribution of the parameters is impossible for logistic regression, as the resulting integral cannot be expressed in closed form. Hence, we use the method of variational approximation; namely, we approximate p(w|D) by a multidimensional Gaussian distribution q(w).
Thus, we perform an analytical derivation of KL(p(y|x) ‖ p(y|x_{-i})) for Bayesian logistic regression. First, we need an expression for the incomplete predictive distribution p(y|x_{-i}). It is obtained from the elementary properties of conditional probabilities:
$$ p(y, x_i \mid x_{-i}) = p(y \mid x_i, x_{-i})\, p(x_i) = p(y \mid x)\, p(x_i) \tag{3} $$
Marginalization of p(y, x_i | x_{-i}) over x_i provides us with the following:
$$ p(y \mid x_{-i}) = \int p(y \mid x)\, p(x_i)\, dx_i \tag{4} $$
In logistic regression we have
$$ p(y = 1 \mid x, \theta) = \sigma(x^{\top}\theta) \tag{5} $$
where σ is the sigmoid function, and θ is the weight vector of the logistic regression.
To estimate the integral (4), we use the following probit approximation:
$$ \sigma(x^{\top}\theta) \approx \Phi(x^{\top}\theta) \tag{6} $$
Then, we get the following derivation:
$$ p(y = 1 \mid x) = \int p(y = 1 \mid x, \theta)\, q(\theta)\, d\theta = \int \Phi(\theta^{\top}x)\, \mathcal{N}(\theta \mid \theta_{MAP}, \Lambda)\, d\theta = \int \Phi(a)\, \mathcal{N}(a \mid \theta_{MAP}^{\top}x,\ x^{\top}\Lambda x)\, da = \Phi\!\left(\frac{\theta_{MAP}^{\top}x}{\sqrt{1 + x^{\top}\Lambda x}}\right) \tag{7} $$
where θ M A P is the value of the weights vector obtained after variational Bayesian inference, called the maximum a posteriori estimate, and Λ is a corresponding covariance matrix.
For marginalization, we also perform analytic inference:
$$ \hat{p} = p(y = 1 \mid x_{-i}) = \int p(y = 1 \mid x)\, p(x_i)\, dx_i = \int\!\!\int \Phi(a)\, \mathcal{N}(a \mid \theta_{MAP}^{\top}x,\ x^{\top}\Lambda x)\, p(x_i)\, da\, dx_i \approx \int \Phi(a)\, \mathcal{N}(a \mid \theta_{MAP}^{\top}\hat{x},\ \hat{x}^{\top}\Lambda \hat{x})\, da = \Phi\!\left(\frac{\theta_{MAP}^{\top}\hat{x}}{\sqrt{1 + \hat{x}^{\top}\Lambda \hat{x}}}\right) \tag{8} $$
where x̂ is the feature vector x in which x_i is replaced by E[x_i].
In cases where the marginalization (4) is not tractable, but a derivation like (7) is, p(y|x_{-i}) can be estimated by the Monte Carlo method:
$$ p(y \mid x_{-i}) \approx \frac{1}{|D|} \sum_{x' \in D} p\big(y \mid x'_i, x_{-i}\big) \tag{9} $$
where |D| is the size of the dataset D.
Substituting (7) and (8) into (2), we get the following expression for the KL divergence:
$$ \mathrm{KL}\big(p(y \mid x) \,\|\, p(y \mid x_{-i})\big) = p \log\frac{p}{\hat{p}} + (1 - p)\log\frac{1 - p}{1 - \hat{p}} \tag{10} $$
where p = p(y = 1 | x) and p̂ = p(y = 1 | x_{-i}) are computed by Formulas (7) and (8).
The resulting estimate is of independent significance, as it allows calculating the relevance of a feature for the prediction at a specific value of x, which opens up opportunities for using our method for local explainability. To derive a criterion for the overall relevance of a feature, we average as follows:
$$ \mathrm{rel}(i) = \mathbb{E}_x\big[\mathrm{rel}(i, x)\big] \tag{11} $$
where rel(i, x) refers to Formula (10).
The resulting criterion avoids sampling the predictions of logistic regression and proceeds directly to averaging over the dataset D .
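A compact sketch of how Formulas (7), (8), (10), and (11) can be evaluated in practice, assuming the Gaussian approximation q(w) = N(θ_MAP, Λ) has already been obtained (e.g., by a variational or Laplace fit); the function names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def probit_predict(x, theta_map, Lam):
    """p(y=1|x) via the probit approximation, Formula (7)."""
    mu = theta_map @ x
    s2 = x @ Lam @ x
    return norm.cdf(mu / np.sqrt(1.0 + s2))

def relevance(X, theta_map, Lam, i, eps=1e-9):
    """Average KL(p(y|x) || p(y|x_-i)) over the dataset, Formulas (10) and (11)."""
    x_mean_i = X[:, i].mean()
    rels = []
    for x in X:
        p = probit_predict(x, theta_map, Lam)
        x_hat = x.copy()
        x_hat[i] = x_mean_i          # marginalize x_i by its mean value, as in Formula (8)
        p_hat = probit_predict(x_hat, theta_map, Lam)
        rels.append(p * np.log((p + eps) / (p_hat + eps))
                    + (1 - p) * np.log((1 - p + eps) / (1 - p_hat + eps)))
    return float(np.mean(rels))
```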

2.3. Shortcuts for Decision Tree Ensembles

Decision trees can be made Bayesian in a large variety of ways, but we choose here the minimalistic way inspired by the missing value imputation method in the C4.5 algorithm [23].
First, we have the following formula:
$$ p(y \mid x, T) = p_{\mathrm{leaf}_T(x)} \tag{12} $$
where p_{leaf_T(x)} is the probability assigned, as a result of training, to the leaf that x reaches along its decision path.
To obtain p(y | x_{-i}, T), we use the procedure from the C4.5 algorithm. Instead of following the full decision path for an example x, both directions are chosen in the nodes where the feature x_i is used, with each direction taken with a probability proportional to the number of samples in the node:
$$ p_{\mathrm{left}} = \frac{n_{\mathrm{left}}}{n_{\mathrm{left}} + n_{\mathrm{right}}}, \qquad p_{\mathrm{right}} = 1 - p_{\mathrm{left}} \tag{13} $$
By considering all such paths, we obtain the set of all the leaves of the tree that are consistent with the values of the features of x except x_i, weighted proportionally to their typicality, and we arrive at the following estimate for p(y | x_{-i}, T):
$$ p(y \mid x_{-i}, T) = \sum_{l \in \mathrm{leaves}(x_{-i})} p_l\, P(l) \tag{14} $$
where p_l is the probability assigned to leaf l, P(l) is the probability of the path leading to l, computed as the product of the expressions in (13) for each node that splits on x_i, and leaves(x_{-i}) is the set of all leaves consistent with x except for x_i. An example of such a computation is shown in Figure 2 below.
Following the above manipulations, Expression (10) can be used directly.
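The following sketch illustrates Formulas (12)–(14) on a fitted scikit-learn DecisionTreeClassifier; the helper is ours and relies only on the public tree_ attributes, following both branches of every node that splits on feature i with weights proportional to the training-sample counts:

```python
import numpy as np

def p_y_without_feature(tree_clf, x, i):
    """Estimate p(y | x_-i, T) for one example x and a fitted DecisionTreeClassifier."""
    t = tree_clf.tree_
    out = np.zeros(t.value.shape[2])  # accumulated class distribution

    def descend(node, weight):
        nonlocal out
        if t.children_left[node] == -1:            # leaf node
            counts = t.value[node][0]
            out += weight * counts / counts.sum()  # p_l weighted by the path probability P(l)
            return
        feat, thr = t.feature[node], t.threshold[node]
        left, right = t.children_left[node], t.children_right[node]
        if feat == i:
            n_l, n_r = t.n_node_samples[left], t.n_node_samples[right]
            descend(left,  weight * n_l / (n_l + n_r))   # Formula (13)
            descend(right, weight * n_r / (n_l + n_r))
        else:
            descend(left if x[feat] <= thr else right, weight)

    descend(0, 1.0)
    return out
```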

2.4. Proposed Method as Related to SHAP on Larger Datasets

To relate the obtained criterion to the conditions of classical statistics, we estimate its asymptotic behavior in the limit of large samples. Using the Bernstein–von Mises theorem [57], we estimate the posterior distribution (and, hence, its variational approximation) as follows:
$$ p(w \mid D) \to \delta(w - w_{MAP}) \tag{15} $$
where w_MAP = argmax_w p(w|D). Notably, on the other hand,
$$ p(y \mid x_{-i}) \to \int \Phi(a)\, \delta\big(a - w_{MAP}^{\top}\hat{x}\big)\, da = \Phi\big(w_{MAP}^{\top}\hat{x}\big) \tag{16} $$
Thus, we get a result close to the classical one for logistic regression, while marginalization becomes equivalent to filling in the missing feature with its mean value.
For further evaluation, we note that H(p(y|x_{-i})) does not depend on x_i, so (11) can be written in a more convenient form for estimating the asymptotics:
$$ \mathrm{rel}(i) = \mathbb{E}_x\big[H(p(y \mid x_{-i})) - H(p(y \mid x))\big] \tag{17} $$
We evaluate the relevance of a particular feature x_i using three methods, namely classical logistic regression, SHAP applied to it, and Bayesian logistic regression. The relevance for Bayesian logistic regression is calculated by Expression (17). The native expression for relevance based on classical logistic regression is
$$ \mathrm{rel}_{cl}(i) = |w_i| \tag{18} $$
and, for SHAP applied to classical logistic regression, it looks as follows:
$$ \mathrm{rel}_{SHAP}(i) = \sum_{S \subseteq \{1, \dots, N\} \setminus \{i\}} \frac{|S|!\,(N - |S| - 1)!}{N!}\, \Delta f(i, S) \tag{19} $$
where
$$ \Delta f(i, S) = \mathbb{E}\big[\Phi(w^{\top}x) \mid x_{S \cup \{i\}}\big] - \mathbb{E}\big[\Phi(w^{\top}x) \mid x_S\big] \tag{20} $$
Notably, the derivative of H(p(y|x)) with respect to x_i is proportional to w_{MAP,i} and is non-zero almost everywhere, where |w_{MAP,i}| is the modulus of the i-th coordinate of w_MAP (21).
From (21), it follows that
$$ \mathrm{rel}(i) = \mathbb{E}_x\big[H(p(y \mid x_{-i})) - H(p(y \mid x))\big] = \mathbb{E}_x\big[H(p(y \mid x_{-i}))\big] - \mathbb{E}_x\big[H(p(y \mid x))\big] $$
increases with |w_{MAP,i}|; because H(·) is non-negative everywhere, E_x[rel(i)] is non-negative as well. Thus, in the limit, (18) positively correlates with (17).
It is noteworthy that (17) positively correlates with (20) for similar reasons.
The analysis performed confirms that, in the limit of large samples, the ranking results obtained using the proposed algorithm correlate with those of classical logistic regression and SHAP.
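For illustration, Formulas (19) and (20) can be evaluated by brute force for a small linear-probit model, approximating the conditional expectations by mean-imputing the features outside S over a background sample (a common simplification, assumed here for clarity; not the exact estimator used elsewhere in the paper):

```python
import numpy as np
from itertools import combinations
from math import factorial
from scipy.stats import norm

def rel_shap(i, w, X):
    """Brute-force Shapley relevance of feature i for f(x) = Phi(w^T x)."""
    N = len(w)
    others = [j for j in range(N) if j != i]
    x_mean = X.mean(axis=0)

    def value(S):  # E[Phi(w^T x) | x_S], features outside S replaced by their means
        X_s = np.tile(x_mean, (len(X), 1))
        X_s[:, list(S)] = X[:, list(S)]
        return norm.cdf(X_s @ w).mean()

    total = 0.0
    for k in range(N):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(N - len(S) - 1) / factorial(N)
            total += weight * (value(S + (i,)) - value(S))  # Delta f(i, S), Formula (20)
    return total
```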

2.5. Bayesianization Procedure Analysis

Essentially, the Bayesianization method randomizes the parameters of the model in order to turn its predictions into a distribution. This is done both for models with additive parameters (generalized linear models, neural networks, etc.) and for models with non-additive ones (Random Forest), with some changes. The main goal at this stage is to obtain, instead of p(y|x) = δ(y − f(x)), a distribution that does not take us out of the “training” zone of the model, but for which the KL divergence can be reliably computed.
In the additive case with the model y = f(x, θ), we add to the parameters Gaussian noise ε ~ N(0, σ_ε² I), where σ_ε is a selected noise scale and I is the n × n identity matrix, n being the dimension of θ, resulting in y | x ~ f(x, θ + ε). The validity of the method follows from the Taylor expansions for the moments of functions of random variables [58]:
(1)
θ + ε ~ N(θ, σ_ε² I).
(2)
$ f(x, \theta + \varepsilon) \approx f(x, \theta) + \frac{\partial f}{\partial \theta}(x, \theta)\,\varepsilon + \frac{1}{2}\frac{\partial^2 f}{\partial \theta^2}(x, \theta)\,\varepsilon^2 $ — the local behavior of the function based on its Taylor expansion.
(3)
$ \mathbb{E}\big[f(x, \theta + \varepsilon)\big] \approx f(x, \theta) + \frac{1}{2}\frac{\partial^2 f}{\partial \theta^2}(x, \theta)\,\sigma_\varepsilon^2 $ — the bias introduced into the output is quadratic in σ_ε, which guarantees its smallness for small σ_ε.
(4)
$ \mathrm{var}\big[f(x, \theta + \varepsilon)\big] \approx \left(\frac{\partial f}{\partial \theta}(x, \theta)\right)^2 \sigma_\varepsilon^2 + \frac{1}{2}\left(\frac{\partial^2 f}{\partial \theta^2}(x, \theta)\right)^2 \sigma_\varepsilon^4 + \frac{\partial f}{\partial \theta}(x, \theta)\,\frac{\partial^3 f}{\partial \theta^3}(x, \theta)\,\sigma_\varepsilon^4 $ — the quadratic and biquadratic dependencies on σ_ε guarantee that the scatter of the predictions around E[f(x, θ + ε)] will be small for small σ_ε.
We choose values of σ_ε around 10⁻², so both the bias and the variance are small, but not negligible.
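A small numerical check of this argument for a scalar case f(θ) = σ(θx) is given below (the values are purely illustrative and are not taken from the paper): for σ_ε of order 10⁻², the empirical bias stays of order σ_ε² and the standard deviation of order σ_ε.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, x, sigma_eps = 0.8, 1.5, 1e-2
f = lambda t: 1.0 / (1.0 + np.exp(-t * x))   # scalar logistic "model"

eps = rng.normal(0.0, sigma_eps, size=100_000)
samples = f(theta + eps)

bias = samples.mean() - f(theta)   # expected to be O(sigma_eps**2)
std = samples.std()                # expected to be O(sigma_eps)
print(f"bias ~ {bias:.2e}, std ~ {std:.2e}")
```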
In the non-additive case, randomization becomes more diverse; for tree-like models, we use the method from the C4.5 algorithm, which simply averages the predictions over all leaves consistent with x except for x_i. A decision tree is a locally constant function, so averaging over subsets of leaves does not take its predictions out of the domain.

2.6. Datasets and Metrics

In our experiment, we used two synthetic datasets, two publicly available datasets describing the symptoms of COVID-19 disease, and three standard datasets for benchmarking feature ranking.
Both synthetic datasets were generated with the function “make_classification” from the Python 3.12 library sklearn, which is a standard choice for checking the quality of feature ranking [59]. The first dataset, which we refer to as “sklearn_small”, contains 50 features, of which 15 are informative, 8 are redundant, and 3 are duplicates of other features. The remaining features are essentially Gaussian noise. The second dataset, which we denote “sklearn_large”, consists of 300 features, of which only 25 are informative, 10 are redundant, and 5 are duplicates of other features.
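For reference, the structure of “sklearn_small” can be reproduced roughly as follows; the sample size and random state below are our assumptions, not values taken from the paper:

```python
from sklearn.datasets import make_classification

X_small, y_small = make_classification(
    n_samples=10_000,   # assumed; the experiments subsample to the required sizes
    n_features=50,      # total features
    n_informative=15,   # truly informative features
    n_redundant=8,      # linear combinations of the informative ones
    n_repeated=3,       # duplicates of other features
    random_state=42,
)
```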
The first public dataset [60] is based on publicly released data from the Israeli Ministry of Health, and is composed of the records from 51,831 tested individuals (of whom 4769 were confirmed to have COVID-19), from 22 March 2020 through 31 March 2020. Based on these data, the authors [60] developed a model that predicts COVID-19 test results using eight binary features (sex, age below or above 60 y.o., any known contact with an infected individual, and five initial clinical symptoms). In our experiments, we used all 16 features from the dataset instead of eight.
The second public dataset [61], provided by the Mexican government, contains anonymized patient-related information, including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. The target variable was the information about the death of a patient, based on the column “date of death”.
The first standard dataset [62] is a heart disease dataset which contains 11 features about patients. The second standard dataset [63] is a heart failure prediction dataset with 13 clinical features. The third dataset [64] refers to red wine quality. It contains 11 features about wine and its quality. All datasets contain both categorical and continuous features. Table 1 summarizes the properties of all datasets.
A smaller dataset size was emulated by randomly sampling the original datasets to the desired number of samples. If the size required for the experiment exceeded the size of the dataset (which was only the case in Heart1), then bootstrapping was used.
As there appears to be no gold standard in ranking problems, and in accordance with the condition (1), an essential indicator for assessing the quality of the proposed method is its statistical stability compared to other ranking methods. For this assessment, we used the following metrics:
Self-consistency was defined as follows:
$$ SC(n) = \mathbb{E}_{X_{m<n},\, X_n}\, \mathrm{Corr}\big(R[X_{m<n}],\, R[X_n]\big) \tag{22} $$
where R[X_{m<n}] and R[X_n] are the rankings obtained from the subsamples X_{m<n} and X_n of sizes m and n, respectively. This metric evaluates the mean correlation between the rankings obtained from independent subsamples of size n and of sizes m < n. This implies that, when working with different samples from the general population of the given size or smaller, the resulting rankings should not differ radically. The closer the value of this metric is to 1, the more similar the obtained rankings are.
Monotonicity, defined as
$$ M(n) = \mathbb{E}_{X_{m<n} \subset X_n}\, \mathrm{Corr}\big(R[X_{m<n}],\, R[X_n]\big) \tag{23} $$
implies that, when working with samples from the general population where one is an extension of the other (and thus they have different sizes), the resulting rankings should not, in general, be crucially different. Moreover, the differences should decrease as the volumes of the pairs increase. Basically, this metric measures the extent to which the feature ranking stays the same when we expand the data.
Consistency with classical methods, defined as
$$ \mathrm{MutualConsistency}(1, 2, n) = \mathbb{E}_{X_n}\, \mathrm{Corr}\big(R_1[X_n],\, R_2[X_n]\big) \tag{24} $$
means that under the most suitable conditions for classical methods (that is, with sufficiently large samples), the results obtained using the proposed method should be consistent with those of classical methods.
Verification of the presence of the above properties requires a criterion Corr for comparing the lists of ranks. For this criterion, we chose the Kendall correlation coefficient as one of the most common and natural ways to evaluate ranking similarity; thus, we can evaluate the rank consistency.
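A sketch of how the self-consistency estimate (22) can be computed with the Kendall coefficient is given below; rank_features is a placeholder for any ranking method under test and is assumed to return one relevance score per feature:

```python
import numpy as np
from scipy.stats import kendalltau

def self_consistency(rank_features, X, y, n, m, n_trials=100, rng=None):
    """Mean Kendall correlation between rankings from independent subsamples of sizes m < n."""
    rng = np.random.default_rng(rng)
    taus = []
    for _ in range(n_trials):
        idx_n = rng.choice(len(X), size=n, replace=False)
        idx_m = rng.choice(len(X), size=m, replace=False)
        r_n = rank_features(X[idx_n], y[idx_n])
        r_m = rank_features(X[idx_m], y[idx_m])
        tau, _ = kendalltau(r_m, r_n)
        taus.append(tau)
    return float(np.mean(taus))
```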
To quantify the applicability of our method, we designed an experiment for evaluating quality improvements and maintenance, following the benchmarking methodology [65]. It is performed on all dataset–model pairs as follows:
(1)
A total of 20% of the data goes into the validation sample. From the remaining part, a fixed number of samples (10, 100, or 1000) is selected randomly without replacement into the training dataset.
(2)
The model is trained on the training dataset from step (1), and the f1-score is calculated on the validation sample.
(3)
The list of the top n relevant features is obtained with the trained model and the training dataset (in the case of filter methods—only with the dataset).
(4)
The model with the same hyperparameters is trained in the same way as in step (2), but using only the features selected in step (3). Then, the f1-score on the validation sample is calculated.
(5)
The ratio of the f1-score from step (4) to that from step (2) is calculated.
This experiment was conducted for each model and each dataset, for 10, 100, and 1000 examples in the training data, and with 1, 3, 5, 7, 9, and 12 features left. To ensure statistical stability, we performed the experiment 30 times in each configuration and averaged the results.
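The protocol can be summarized by the following sketch for a single dataset/model/ranking-method combination; ranker is a placeholder returning feature indices ordered by relevance, and the hyperparameter choices and the averaging over 30 repetitions are omitted for brevity:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def f1_ratio(model, ranker, X, y, n_train, top_n, seed=0):
    # Step 1: 20% validation split, then a fixed-size training subset without replacement.
    X_rest, X_val, y_rest, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    idx = np.random.default_rng(seed).choice(len(X_rest), size=n_train, replace=False)
    X_tr, y_tr = X_rest[idx], y_rest[idx]

    full = clone(model).fit(X_tr, y_tr)                          # step 2
    f1_full = f1_score(y_val, full.predict(X_val))

    top = ranker(full, X_tr, y_tr)[:top_n]                       # step 3
    reduced = clone(model).fit(X_tr[:, top], y_tr)               # step 4
    f1_reduced = f1_score(y_val, reduced.predict(X_val[:, top]))

    return f1_reduced / f1_full                                  # step 5
```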

3. Experimental Results and Discussion

3.1. Quality Improvement and Maintenance Experiments

The total results of the experiment for quality improvement evaluation are summarized in a table of 75 × 35 cells. Therefore, we present only the most illustrative parts in Table 2, Table 3 and Table 4. The complete results are included in Appendix A, Table A1, Table A2, Table A3 and Table A4.
For brevity, the tables below show only the top three methods by model quality on the test sample. Some abbreviations are used: “embed” instead of “embedded_ranking”, “our” instead of “our_method”; for the remaining names, the “_ranking” suffix is omitted.
As Table 2 shows, in most cases, our method and SHAP provide the highest ratios of the metric after filtering to the original one in the case of tree models, which means that both methods tend to be the most efficient at selecting the most relevant features for a particular model. However, the tables show that, while in the case of XGBoost SHAP is comparable to our method, on Random Forest our method appears to be more efficient in many more cases. This might mean that our proposed Bayesianization method is preferable on ensemble models where the predictors are independent rather than on boosting, where they are dependent.
As Table 3 shows, when our method is used on linear models, it manifests the best results in most cases, showing its power in selecting the features specifically relevant for the model.
As Table 4 shows, the top performance was lost on the dataset “sklearn_small”, which has a high number of redundant features, where the embedded methods together with MRMR ranking prevailed. This may indicate a direction for improving our model, because it shows that redundancy in the features tends to affect the performance. The effectiveness of the embedded methods is explained by the high robustness of tree-like ensembles to noisy redundant features, which reduces their influence. The effectiveness of MRMR on large datasets is explained by the fact that the correlation between noise features and the target will be near-zero, so those features will not be selected. On smaller samples, its effectiveness is more surprising but can be partially explained by the fast convergence of correlation estimation with Gaussian noise. On the other hand, our method focuses on the influence of features on the model predictions, and the model might capture spurious relations. This is partially mitigated by the Bayesianization scheme, which reduces the influence of irrelevant features, but it seems that the current scheme is too simple for the case of purely random features.

3.2. Consistency Experiments

As SHAP appears to be the closest to the proposed method, we conducted consistency experiments with it. For ablation, we used the embedded feature ranking methods of classical logistic regression and Bayesian logistic regression.
Using Expressions (17)–(19), we calculated the mean percentage of features in the top of the ranking that are truly relevant for our synthetic datasets. The results are shown in Figure 3a,b, where our method is denoted as “bayes”. The mean percentage was evaluated by generating 100 examples of synthetic datasets of different sizes (horizontal axis), ranking them, and then averaging the percentage of truly relevant features at the top of the ranking. The variability of the results fluctuated but never exceeded a 2% deviation.
All methods appear to provide effective results; however, our method performed slightly better than the others on both datasets, which appears to confirm the fundamental validity of our method.
We computed the mean values, 25th and 75th percentiles for self-consistency (22), monotonicity (23), and mutual consistency (24) for the rankings obtained by the following methods: classical linear regression, Bayesian linear regression, and SHAP-based ranking for classical and Bayesian regressions. The corresponding expressions were evaluated using the Monte Carlo method based on 100 iterations of the procedure for sampling the subsamples of corresponding sizes, training models on them, and calculating the corresponding rankings and the correlations between them.
The self-consistency results are presented in Figure 4a,d. The horizontal axis shows the dataset size, while the vertical one shows the value of the self-consistency metric (22).
Figure 4 shows that, on the first synthetic dataset, at a sample size of 50 the self-consistency value is extremely small compared to the larger sample sizes. This can be explained by the fact that, for a synthetic dataset of such volume, the sample may be unrepresentative with respect to the formal relationship with the target variable. Our method appears to outperform the others on average in terms of self-consistency values. Figure 4b shows similar results in terms of values; however, it does not exhibit the anomaly at the extremely small size of 50.
Figure 4c demonstrates a relatively high self-consistency value at a sample size of 50 compared to the larger sample sizes. This can be explained by the fact that, at such volumes, the sample may contain too little information, which results in the model failing to produce sufficiently biased conclusions.
For the classical logistic regression, a dip manifests in the region of 500–5000 samples, which clearly shows the problem of smaller samples: data volumes have to be large enough to obtain a high level of self-consistency.
SHAP, as a conventional method for quantifying the importance of factors, shows superior results in conjunction with classical regression, and it does not have a dip in the region of 500–1000, which may indicate a higher level of information accumulation than classical logistic regression.
Our method and SHAP, in conjunction with Bayesian regression, appear to show similar results, although our method demonstrates, on average, slightly higher values of the metrics. Both methods are also proven to outperform the others when starting with a sample size of 500.
Importantly, on the second public dataset, all methods show lower self-consistency values compared to the first one; however, our method outperforms the others by at least 0.5–1 units of the correlation coefficient.
The classical logistic regression shows extremely low self-consistency values compared to the other methods, while SHAP with both regressions and our method show similar results.
Figure 5 compares the top 15 features selected on samples of different sizes from the same dataset. The figure visually demonstrates the self-consistency of our method.
Figure 5a,d presents the results on monotonicity. Figure 5a,b shows that all methods demonstrate similar results on monotonicity, which can be explained by the synthetic nature of the corresponding datasets. Figure 5c manifests that, in terms of monotonicity, the same pattern is observed as with self-consistency: the smallest value is observed for logistic regression, then comes SHAP in relation to classical regression and Bayesian regression, and our method gives the highest value. In terms of monotonicity, the results of applying SHAP to both types of regression are not essentially different.
Our method appeared to demonstrate a significantly greater level of monotonicity, which means that it allows more consistent conclusions about a larger sample based on its subsamples. According to the results on the second public dataset that are presented in Figure 5d, all methods show a significantly lower level of monotonicity compared to the first one. However, the order in terms of the level of monotonicity remains the same, which confirms the previous conclusions. Table 5 presents the comparison of top 15 features selected on samples of different sizes from the same dataset.
Figure 6a,b presents the results on mutual consistency.
Figure 6a shows that, on small sample sizes (less than 1000), the mutual consistency of our method is always positive, which means that the methods do not contradict each other. With the growing sample size, the mutual consistency for all methods starts to drop, but then increases again. We suggest the following explanation: on smaller samples, the results of all algorithms can be similar because of the lack of information contained within the sample; however, with the sample size growing, new details may appear and algorithms start to differ.
The agreement with classical regression appears to remain low; however, as the sample grows, the metric's value eventually tends to increase. The results correlate most closely with SHAP applied to Bayesian regression, which shows that the methods have some features in common. With SHAP applied to classical regression, the consistency is somewhere “in the middle”, which confirms some similarity of our method with SHAP.
Figure 6b shows a similar pattern of correlations: the method correlates least of all with classical regression, and the closest of all with SHAP in relation to Bayesian regression, whereas SHAP with classical regression is “in the middle”. This also confirms the similarity of the results obtained using our method and using SHAP.
We have shown on both synthetic and public datasets that the new ranking method has a slightly higher level of self-consistency on both types of datasets compared to other methods, which is shown in Figure 4a,d. Figure 5a,d demonstrates its significantly higher level of monotonicity on the public datasets and a comparable level on synthetic data. Additionally, within the studied sample sizes, our method proves to be the most consistent with the results of applying SHAP to Bayesian regression, less so with applying SHAP to classical regression, and least of all with classical regression. However, in all three cases, it manifests an increase in consistency as the sample size grows, as shown in Figure 6a,b. On the synthetic datasets, the percentage of truly relevant features among the top of the ranking was computed, and all methods showed similar results; however, our method was slightly better on smaller samples. It is noteworthy that on different datasets the methods showed different quality, although the average order was always the same: classical regression, SHAP applied to classical regression, SHAP applied to Bayesian regression, and our method.

4. Conclusions and Future Work

Based on the Bayesian approach, this paper proposes a solution that allows not only building a methodically justified way of ranking features on small datasets, but also methodically solving the problem of benchmarking the results obtained by various ranking algorithms.
In our work, we propose a wrapper method in which we convert the model to a Bayesian model as the first stage of the feature ranking pipeline. The result is a Bayesian model M_{θ_ε}, θ_ε = θ̂ + ε, which provides sampling from the probability distributions p(y|x) and p(y|x_{-i}) instead of the deterministic outputs M_θ̂(x). The resulting samples are used to calculate the feature ranking based on a Kullback–Leibler divergence estimate. In our method, the Kullback–Leibler divergence is applied in a special way: instead of quantifying the differences in the distribution of a predictor between different values of the target, it quantifies the differences in the target distribution on a particular input example, depending on whether a particular predictor is removed or not. This provides a more model-specific feature ranking procedure, while remaining within the framework of the model-agnostic approach.
We have theoretically justified the validity of our method. We have demonstrated shortcuts for different types of models, improving the computational efficiency of the proposed method. We have theoretically confirmed the asymptotic agreement of the proposed approach with the classical frequentist approach.
We have carried out an experimental evaluation of our proposed approach with SOTA methods on a wide experimental base in terms of quality improvement or maintenance after the feature selection procedure, with the top n relevant features left. In most cases, our method manifested the best results, except on the data with a high number of redundant features. We consider this case to be the subject of our further research.
We have carried out an experimental comparison of our proposed approach with the classical method. We have experimentally evaluated the self-consistency, monotonicity, and mutual consistency of the rankings obtained by our methods and the closest SOTA method. All the experiments performed have confirmed our theoretical conclusions: with the growth of the sample, an increasing trend of mutual consistency was observed, and our method demonstrated at least comparable, and often superior, values of self-consistency and monotonicity to other methods.

Author Contributions

Conceptualization, A.V. and N.G.; methodology, I.T.; software, I.T.; investigation, A.V. and I.T.; writing—original draft preparation, N.G.; writing—review and editing, N.G.; visualization, I.T.; project administration, A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Science and Higher Education of the Russian Federation, Goszadanie (State Assignment) No. FSER-2025-0013.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Results of the experiments for tree-based models on the standard datasets.
Columns (left to right): Dataset Name, Data Size, Num Features, then the six ranking methods (Embedded Ranking, VSI Ranking, MRMR Ranking, PermTest Ranking, SHAP Ranking, Our Method) for each of the three models XGBoost, Random Forest, and Decision Tree.
Heart1010.9980.7730.8470.7220.981.17 *0.7790.8070.7930.7830.7760.888 *1.0860.8570.8870.8071.0861.119 *
31.0611.0610.8820.9771.0351.156 *0.8380.8490.8380.9110.8540.992 *1.096 *0.9290.9520.991.071.07
61.0890.9531.0240.991.123 *1.1160.8750.9110.9040.880.891.023 *1.0460.9481.0021.0031.053 *1.034
91.0741.0591.0351.0711.096 *1.0580.9330.9610.9440.9480.9681.033 *1.0531.0490.9611.0071.056 *1.049
120.9930.9910.9811.0321.057 *1.0261.021 *1.0121.0011.0171.0011.0141.058 *1.0171.0061.0111.0461.058 *
10010.7690.8150.7450.7240.8380.922 *0.7270.7430.6990.6770.6940.892 *0.8730.8630.8130.7150.8590.995 *
30.8970.8820.8740.8690.8950.991 *0.8340.8280.8440.850.8160.939 *0.9670.8830.8580.860.9521.015 *
60.9430.9240.9540.979 *0.9470.9530.9430.9090.9190.954 *0.920.9530.9790.9580.9541.0081.015 *0.98
90.9750.9690.9611.0 *0.9830.9680.9920.994 *0.9620.9790.970.9670.9931.0120.9711.031 *1.0120.959
120.9980.990.9840.9891.004 *0.9931.002 *0.9910.9841.0010.9930.9980.9881.0010.9841.02 *0.9960.992
100010.7230.7010.740.6790.7830.804 *0.7170.7010.7350.6650.7310.774 *0.7190.6670.6860.6480.7460.774 *
30.8450.8030.8440.8620.830.87 *0.8910.790.8770.899 *0.8430.8560.870.8380.90.921 *0.8340.854
60.9150.9180.9370.947 *0.9360.9460.9750.9550.9550.978 *0.9460.9740.9970.9990.981.002 *0.9890.994
90.9560.9690.960.994 *0.9870.9710.9920.9981.001 *0.9940.9930.9931.0061.01.0031.007 *1.01.003
121.002 *0.9940.9870.9950.9940.9851.001 *0.9990.9961.01.00.9951.0031.0031.006 *1.0041.006 *1.005
Heart11010.9890.7731.0460.7890.9951.186 *0.7820.6070.8120.7070.7551.015 *0.9650.8270.8340.7790.9651.093 *
31.0790.8990.9250.8221.129 *1.0920.7830.7910.9020.7710.7430.984 *1.079 *0.8971.0120.8561.0341.074
61.1221.0160.9850.9321.138 *1.1220.9690.9430.9190.8630.976 *0.9441.0240.9850.9930.9241.0211.047 *
91.0391.0651.0070.9991.0621.155 *0.9860.9370.9640.9230.9760.988 *1.0191.0310.9590.951.031.054 *
121.042 *1.0190.9860.9561.0221.0380.9760.9910.9750.9730.9851.014 *1.0551.010.9650.9311.059 *1.021
10010.7730.7680.8780.7690.791.005 *0.7050.7120.7830.7280.70.975 *0.7340.7320.8940.7420.7521.068 *
30.8940.8030.9660.8140.8681.034 *0.8010.80.9140.7740.8060.973 *0.9160.7830.9350.7670.8951.06 *
60.9520.950.9710.880.961.0 *0.956 *0.8980.9420.8850.9530.9530.9750.9330.9760.8740.9711.049 *
90.9840.950.9840.9440.9670.996 *0.9730.9490.9830.9290.987 *0.9730.9690.9860.9790.9060.961.038 *
120.992 *0.9770.9850.9910.9880.9790.9850.996 *0.9930.9940.990.9870.9730.9790.9720.9740.9781.0 *
100010.7720.7720.7840.7720.7720.941 *0.7440.750.7540.7620.7540.935 *0.8280.8280.8280.8280.8281.047 *
30.8490.7820.950.8660.8290.959 *0.8250.8080.930.8460.810.947 *0.9860.7820.9870.8920.9341.052 *
60.9530.9470.965 *0.8830.9530.9520.9530.954 *0.9440.8930.9440.9491.0050.960.9830.8440.9591.041 *
90.9670.986 *0.9770.8980.9780.9740.9560.981 *0.9770.9240.981 *0.9630.9950.9930.9860.8930.9941.049 *
120.9830.9940.995 *0.9540.9920.990.9770.9830.9870.9660.993 *0.9820.9761.01 *1.0070.9630.9950.989
winequality-red1011520.0331729.6520.5475769.8971519.8695770.483 *13377.53117197.868 *9040.5527993.6239548.1397337.7920.474825.1621651.445 *952.6120.4740.574
35770.666 *0.9120.8685769.9345770.4431.3620.00.02443.174 *952.418357.329900.10.2750.4771651.739 *0.3570.690.549
61.2671072.711.4751.1371.5951072.948 *1111.2851322.3711663.641 *476.29465.216869.6650.4560.538 *0.4630.5210.4860.421
91072.6071072.3320.9821073.438 *1072.9541072.5451608.405 *0.0761321.305357.1820.10.00.5090.7460.5610.769 *0.3790.42
121.1710.7381.0791.293 *1.2790.933487.926 *0.1350.1357.2370.1360.10.5460.6340.8350.5590.5360.872 *
10010.6050.4280.4630.5840.4280.767 *0.4820.5310.5790.6130.5890.922 *0.4670.3350.4030.5830.4210.719 *
30.8110.7240.6751.0330.7581.039 *0.840.7660.5440.7870.7741.091 *0.987 *0.6520.5790.8210.8750.983
60.9060.9230.8850.9340.8091.038 *0.931.0750.8460.8060.8851.224 *0.9960.8720.8381.068 *0.9921.054
90.9550.990.9840.997 *0.9840.9771.081.0770.9441.0581.111.194 *1.074 *0.9590.9921.0351.0390.945
120.9941.0221.01.0210.9651.024 *1.041.081 *1.0781.0150.9611.0741.0111.0320.9320.9790.9731.058 *
100010.2290.2290.1980.2980.2730.426 *0.2090.1990.2080.2480.1770.429 *0.2130.1870.2660.363 *0.1870.292
30.7510.4290.540.5490.6940.822 *0.5870.4160.3320.330.5640.864 *0.7520.8210.7690.7310.8530.947 *
60.920.8960.7950.80.8780.937 *0.8820.9840.7680.7660.9450.994 *1.0051.01 *0.9680.9580.9760.986
90.9410.965 *0.9150.9390.930.9421.041 *1.0191.0060.9060.9861.0151.0361.063 *1.0481.0511.0091.003
120.9940.9970.9961.003 *0.9981.01.0250.9970.9931.0281.047 *1.0170.9951.011.029 *1.0220.9921.019
Note: In Table A1, Table A2, Table A3 and Table A4 the asterisk (*) indicates the element that corresponds to the maximum value in the group.
Table A2. Results of the experiments for linear models on the standard datasets.
Columns (left to right): Dataset Name, Data Size, Num Features, then the six ranking methods (Embedded Ranking, VSI Ranking, MRMR Ranking, PermTest Ranking, SHAP Ranking, Our Method) for each of the four models Logistic Regression, Lasso, Ridge, and Elastic Net.
Heart1010.7470.8770.7730.8140.7610.896 *0.7110.8120.6610.841 *0.7460.8110.5640.7330.530.4760.5640.752 *0.530.7730.7530.6930.821 *0.812
30.9680.9260.9340.8310.9151.024 *0.9150.9530.920.9030.9020.998 *1.0151.0160.9360.8131.0051.034 *0.9550.8770.9070.7830.8710.99 *
60.9350.8760.9670.9170.9120.998 *0.8680.9490.9570.8890.9091.037 *1.0370.9821.0070.9641.0451.05 *0.980.9530.9780.9260.9621.031 *
91.012 *0.9780.9461.0070.9770.9840.9410.9450.9620.9570.9460.989 *1.0631.0481.0210.9971.069 *1.0411.0090.9810.9990.9421.0121.013 *
121.013 *0.9940.9941.0061.013 *1.013 *0.980.980.9820.9930.9711.009 *1.056 *1.0061.0141.0071.056 *1.0291.0031.015 *1.0041.0071.0031.014
10010.7840.820.7670.7690.7750.926 *0.6960.860.8490.7820.7180.932 *0.8110.830.8410.7490.8410.949 *0.7550.8020.8280.790.7950.932 *
30.8470.8660.9020.8490.8530.942 *0.8730.8770.9110.880.8470.941 *0.9040.9190.8910.8930.9010.966 *0.8650.8720.9220.8960.90.945 *
60.9330.9080.9430.9340.9170.977 *0.9310.9140.9480.981 *0.910.9730.9510.9660.9740.984 *0.9510.970.9460.9660.9730.988 *0.940.983
90.9690.983 *0.9730.9810.9640.9710.9810.9620.9721.006 *0.9710.9630.9830.998 *0.998 *0.998 *0.9930.9710.9910.9960.9741.005 *0.9930.975
120.9930.9920.9911.0060.9941.008 *1.0120.9910.9821.0121.0121.016 *0.9921.008 *0.9990.9980.9921.0050.9881.0020.9861.01 *0.9881.005
100010.7280.7480.7830.7540.7110.882 *0.7610.8370.8540.7570.6680.898 *0.7240.7940.8250.7540.7260.902 *0.7490.7410.8240.7470.7330.896 *
30.8720.7560.8270.8610.8690.928 *0.8650.7910.8440.8720.890.925 *0.8910.7950.8420.8640.8940.928 *0.8690.7610.8580.880.8810.922 *
60.8910.9050.910.9530.8760.959 *0.8790.9060.930.9370.890.966 *0.9060.90.9280.9570.920.963 *0.8940.9010.9140.960.8970.968 *
90.9440.9690.9350.983 *0.9380.9490.9220.9630.9330.967 *0.9340.9490.9550.9590.9460.976 *0.9580.9490.9280.981 *0.9430.970.9360.947
120.9730.9940.9751.004 *0.9761.0020.9640.9890.9660.9860.9640.997 *0.9860.9970.9931.00.9871.005 *0.9731.00.9760.9930.9731.004 *
Heart11010.870.8660.7750.8310.8081.136 *0.680.7950.8340.8120.7720.979 *0.9210.4970.6780.5720.9211.193 *0.8930.8510.9140.8150.8790.935 *
30.8740.9271.0580.9010.9151.103 *0.7970.8190.9260.8440.7940.958 *1.1130.8161.0360.71.111.135 *0.8890.8750.950.8280.8771.033 *
61.0071.0150.940.8730.9421.035 *0.9370.953 *0.9420.880.9190.9490.9961.071.0040.9791.0031.08 *1.006 *0.9710.9790.9581.0021.004
90.9771.0140.9450.9630.9311.028 *0.9620.9660.972 *0.9250.950.9551.0121.0371.0271.0661.0111.101 *0.9950.9890.9880.9661.0051.029 *
120.9731.023 *0.9770.9340.971.0020.9860.9750.9840.9640.987 *0.9781.0091.0311.0211.038 *1.0081.0121.0030.9761.0010.9920.9841.016 *
10010.8490.8440.8690.8430.830.993 *0.8190.7920.9150.810.8360.986 *0.7960.8050.8650.8350.8471.004 *0.8990.8080.8420.8120.8830.977 *
30.9030.8570.9690.8520.8840.993 *0.8540.8130.9650.7990.8710.983 *0.9310.7720.9490.8190.9141.008 *0.9520.8390.9730.8170.9430.992 *
60.9620.960.9880.9360.970.989 *0.9540.9490.9670.9260.9510.974 *0.9920.9180.9720.8841.006 *0.9840.9730.950.9910.9060.9760.996 *
90.9891.0030.9850.9430.991.019 *0.9770.9810.9850.9310.9730.991 *0.9910.9890.9890.9380.9931.02 *0.9850.9770.9980.9140.9871.01 *
120.9970.991.009 *0.9841.0031.0030.9810.9940.9920.9810.9830.997 *0.9941.0010.9980.9861.0011.015 *0.9930.9890.9920.9940.9941.002 *
100010.7660.7850.8390.7850.7330.949 *0.8180.7850.8250.7850.7990.957 *0.780.7950.8430.7950.6760.957 *0.7470.7960.8330.7960.670.95 *
30.8540.7970.9440.8640.8480.952 *0.8730.8050.954 *0.8790.8710.9520.8580.8190.9350.8620.8540.953 *0.8740.8090.9230.8540.850.953 *
60.9530.9540.963 *0.9060.9380.9570.9520.9550.97 *0.9180.9360.9590.9450.967 *0.9520.940.9450.9620.97 *0.9610.960.9280.9490.956
90.9710.982 *0.9740.9090.9580.9810.9590.9860.9830.9250.9510.987 *0.9710.970.9880.9380.9650.99 *0.9690.9890.9790.9270.9760.992 *
120.9910.9960.9850.9590.9890.999 *0.990.9971.0 *0.9590.9841.0 *0.9870.9861.0 *0.9490.9820.9940.9820.998 *0.9890.9630.9910.997
winequality-red1010.5990.4350.780.9580.5991.495 *0.0890.3150.0840.1680.0890.344 *0.00.0160.0210.2120.0470.287 *3714.2860.00.0351584.3993714.2863714.554 *
30.9730.7920.7841.2281.0811.835 *465.489455.053425.803408.755465.579 *455.3980.30.033408.198 *0.3990.2810.5491970.529416.792416.9642178.225 *1970.529385.13
61.1030.80.9761.0061.0941.487 *0.593889.758 *0.5830.650.56435.8080.471454.704 *0.4410.4030.5130.6250.6472105.8 *1887.2741569.1860.6490.471
91.0490.7561.0180.870.961.089 *0.8460.951 *0.7350.880.8740.9350.571454.9370.580.5540.613851.681 *0.6171177.1461539.193 *0.6310.6170.565
121.0 *1.0 *1.0 *1.0 *1.0 *1.0 *0.9 *0.9 *0.9 *0.9 *0.9 *0.9 *0.60.601 *0.60.60.601 *0.60.8 *0.8 *0.8 *0.8 *0.8 *0.8 *
10010.060.0120.0780.1660.060.542 *0.0650.00.0430.0880.0650.342 *0.00.0290.0250.0890.00.479 *0.0230.00.00.0310.0230.245 *
30.3080.2260.3170.6750.3480.923 *0.290.1680.1280.4120.3150.518 *0.3120.1670.1150.3550.320.614 *0.160.0590.1020.5050.2440.629 *
60.6510.8460.7050.878 *0.6140.8650.718 *0.7020.6210.6590.6160.7040.810.4610.3670.6830.862 *0.7580.6360.7580.5210.6080.6380.881 *
90.8850.9220.9140.9530.9321.04 *0.8961.024 *0.9010.9180.8620.8921.006 *0.9210.7641.0030.9890.870.8381.035 *0.6970.9630.8980.962
121.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *
100010.0240.00.0090.00.0080.485 *0.00.00.0120.00.00.501 *0.00.00.00.00.00.513 *0.00.00.00.00.00.559 *
30.4690.2830.0890.1010.4810.708 *0.5030.2510.1170.2060.4780.765 *0.4450.1350.0360.0370.4390.77 *0.3660.1780.0920.0420.3160.744 *
60.7360.944 *0.5390.5130.7620.7820.8070.993 *0.6540.5540.8610.8110.7210.922 *0.4970.3810.7080.8050.580.969 *0.5720.530.4880.776
90.9310.961 *0.9210.8350.9290.9220.9580.973 *0.8960.8930.9550.9480.9130.988 *0.8340.8520.9060.8521.0091.038 *0.7540.8461.0350.936
121.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.01.002 *1.002 *1.002 *1.002 *1.0
Table A3. Results of the experiments for tree-based models on sklearn-generated datasets and on the public datasets.
Model Name: XGBoost | Random Forest | Decision Tree
Method Name (for each model): Embedded Ranking | Vsi Ranking | Mrmr Ranking | Permtest Ranking | Shap Ranking | Our Method
Dataset Name | Data Size | Num Features
covid1010.9891.0031.0011.0061.055 *1.0440.9980.9950.9960.9981.048 *1.0410.9941.0121.0061.0091.044 *1.042
30.9880.990.9930.981.037 *1.037 *0.9960.9950.9960.9931.0451.046 *0.9750.9960.9731.01.024 *1.024 *
60.9880.9910.9840.9831.037 *1.037 *0.9980.9960.9950.9961.048 *1.0460.9710.9881.0040.9951.02 *1.003
90.9830.9880.9880.9781.029 *1.0280.9970.9960.9950.9961.047 *1.0450.9580.991.0020.9781.0171.023 *
120.9820.9910.9960.991.0221.033 *0.9980.9970.9990.9971.048 *1.0460.9550.9961.004 *0.9851.0021.003
10010.9930.9910.9930.991.041 *1.041 *0.990.990.990.9881.038 *1.0370.9940.9941.0010.9981.047 *1.044
30.9870.9870.9920.9851.034 *1.0320.9840.9890.9890.991.0311.034 *0.9890.9931.0010.9891.045 *1.032
60.9860.9860.9920.9831.032 *1.032 *0.9830.9860.9860.9861.035 *1.0320.9810.980.9960.991.035 *1.03
90.9880.9850.9920.9831.033 *1.030.9870.9880.9850.9891.035 *1.0320.9790.9860.9960.9861.035 *1.029
120.9860.9910.9910.9891.032 *1.0290.9870.990.9860.9921.036 *1.0330.9760.9860.9910.9891.033 *1.026
100010.9690.9670.9670.9661.0141.015 *0.9760.9750.9750.9751.026 *1.0240.9720.9730.9730.9721.022 *1.021
30.9670.9650.9660.9711.015 *1.015 *0.9710.9730.9750.9771.024 *1.024 *0.9720.9710.9710.9771.02 *1.02 *
60.9670.9640.9680.9711.014 *1.0130.9730.9740.9790.9771.025 *1.0240.9690.9690.9720.9731.0161.017 *
90.9680.9660.9680.9741.015 *1.0140.9760.9750.9820.9791.027 *1.0230.9680.9620.9690.9711.016 *1.015
120.9680.9670.970.9751.016 *1.0130.9770.9770.9820.9841.029 *1.0240.9660.9630.9650.9721.0131.014 *
withmeds1010.0820.0930.2140.785 *0.0860.0864762.011 *4000.1950.1464467.6870.0372863.8810.120.3320.5261.13 *0.1260.126
30.1260.2210.5180.785 *0.1320.1323636.922 *1818.6972222.5713350.8852210.9070.2590.3510.3430.4831.044 *0.3690.369
60.6720.4660.5140.785 *0.7170.7243636.9383334.1313333.7463350.8632211.1275250.59 *0.9170.4290.9011.045 *0.9450.964
90.727 *0.5350.6050.7180.6860.6792222.8792667.2944615.69 *2978.6881615.9233818.7360.8530.4161.0351.045 *0.820.817
120.5490.570.6270.689 *0.5730.4995455.078 *2857.8992857.6653350.821500.6153000.7120.950.9441.065 *1.0250.9130.715
10010.00.1040.0890.73 *0.00.00.2740.0650.0750.73 *0.1290.00.00.0180.1350.734 *0.00.0
30.1180.2880.2680.73 *0.1170.1520.3620.2830.1920.691 *0.4140.1370.2320.3950.3820.734 *0.2440.247
60.5650.4010.3820.730.6370.866 *0.3960.2850.3580.7210.5890.864 *0.7610.50.5060.7340.7990.908 *
90.7320.4710.4720.730.7450.753 *0.4530.5080.3890.7080.680.771 *0.8850.5280.5210.7340.916 *0.874
120.7780.4790.5130.730.835 *0.7510.4180.5860.5610.6730.7070.744 *0.8210.6060.6420.730.858 *0.819
100010.00.0680.0680.693 *0.00.00.0670.1460.0790.73 *0.190.00.00.140.0440.736 *0.00.0
30.0720.2510.3070.693 *0.1180.0910.340.1740.310.687 *0.2490.1750.2640.3480.4760.736 *0.2770.277
60.3270.5510.4260.6930.2480.785 *0.2980.370.4340.6870.3190.849 *0.8290.3440.5410.7360.870.932 *
90.5550.550.5070.6930.6480.729 *0.4010.4860.5010.6750.3390.679 *0.7570.4340.7070.7360.80.81 *
120.7670.610.6150.6930.81 *0.7180.3570.5210.6150.676 *0.4990.6710.7320.5630.6980.7360.779 *0.761
sklearn_smal -red1010.7020.780.6770.809 *0.7040.7360.797 *0.7360.7520.730.7890.7090.8790.8720.8830.893 *0.8790.842
30.7770.7260.847 *0.8160.7020.7220.894 *0.8040.8430.8180.8510.6650.8661.014 *0.9640.9630.8840.796
60.7940.8310.913 *0.8180.7960.7520.946 *0.8650.930.9090.8860.7410.9341.0211.0021.108 *0.9280.878
90.8910.8950.973 *0.820.8750.8820.94 *0.9160.9390.930.9240.8820.971.125 *0.9591.0211.0290.972
120.920.90.958 *0.8480.9140.880.965 *0.9610.9530.9340.9430.9380.9731.085 *1.0261.0740.9390.914
10010.661 *0.6260.661 *0.6580.6550.6060.630.6360.6910.701 *0.6460.6130.6530.6510.7130.75 *0.6260.631
30.7740.740.7970.807 *0.7660.6130.7410.7390.801 *0.7920.7640.6080.6730.7460.845 *0.8280.6530.661
60.8450.8160.881 *0.8550.850.6970.8620.8310.881 *0.860.8450.7370.7880.8680.911 *0.8770.7930.763
90.909 *0.9010.9070.8960.8720.8780.9070.8960.914 *0.8940.8820.9090.8680.9070.9270.947 *0.8750.88
120.9240.9380.956 *0.9320.9220.9140.9320.943 *0.9260.943 *0.9140.930.9210.9440.9670.969 *0.9250.929
100010.6390.6360.6190.653 *0.6150.5520.5930.6290.723 *0.6810.6230.5650.628 *0.6250.620.6150.6170.577
30.7160.7410.773 *0.7660.7240.5620.7240.7530.81 *0.7510.7360.5560.7360.6910.738 *0.7240.7250.596
60.8230.8230.858 *0.8330.8060.7180.8530.8630.877 *0.8250.8480.7290.843 *0.7830.8190.8070.8390.715
90.8880.8920.9020.8840.8780.915 *0.9080.931 *0.9120.9180.9170.9230.8790.8560.850.907 *0.8890.887
120.9360.9410.9350.9350.9390.954 *0.9290.955 *0.9540.9320.9410.9460.9110.9150.9160.93 *0.9140.92
sklearn_large1010.870.9490.8560.957 *0.870.870.8330.8330.820.838 *0.7560.7940.8420.895 *0.7570.7570.8420.842
30.9140.928 *0.8580.9160.8820.9160.7830.8270.8340.875 *0.7710.7790.930.8990.9040.8730.920.965 *
60.9760.9550.9560.9130.984 *0.980.8750.860.8770.9010.8450.959 *0.995 *0.9860.9810.8960.9830.982
90.9050.9940.999 *0.9530.8970.9350.927 *0.9120.8970.9150.8590.9241.0050.9560.9761.01.007 *0.933
121.107 *1.0230.940.9681.0761.1010.9190.9240.8760.9130.890.996 *1.0560.9720.9621.0211.0781.117 *
10010.670.6710.6850.6680.691 *0.6660.6620.6550.6540.6740.676 *0.670.6780.714 *0.670.6530.6780.678
30.7340.7090.766 *0.7210.7380.7210.6920.6870.7220.6690.7170.732 *0.7510.759 *0.7470.7170.7440.742
60.7340.740.8070.7650.7830.886 *0.7780.7130.7560.7280.7390.883 *0.830.7760.8190.8090.8340.906 *
90.7530.7620.8150.7960.8010.879 *0.8040.7670.7880.7530.8030.871 *0.908 *0.8160.8270.8620.8930.878
120.7910.810.8390.8140.8070.961 *0.8190.7790.8350.8110.8020.939 *0.9540.8240.8510.8660.9340.955 *
100010.5910.65 *0.6090.5710.620.5920.6170.5850.5890.6180.632 *0.5770.6440.646 *0.6220.6120.6120.616
30.6360.6640.69 *0.610.6530.6460.6340.5940.6340.656 *0.6480.6480.685 *0.6580.6830.6450.6560.681
60.6710.6610.7080.6980.7110.837 *0.6750.6710.7540.7530.6770.85 *0.7060.7030.770.7510.6810.849 *
90.6970.6990.7790.7160.7360.836 *0.7130.7220.7780.7860.7150.848 *0.7490.7620.790.7610.7260.841 *
120.7040.7120.7830.7720.770.898 *0.7260.7780.8110.8090.7650.904 *0.8020.7990.8280.7760.7780.921 *
Table A4. Results of the experiments for linear models on sklearn-generated datasets and on the public datasets.
Model Name: Logistic Regression | Lasso | Ridge | Elastic Net
Method Name (for each model): Embedded Ranking | Vsi Ranking | Mrmr Ranking | Permtest Ranking | Shap Ranking | Our Method
Dataset Name | Data Size | Num Features
covid1011.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.01.002 *1.002 *1.002 *1.002 *1.0
31.0121.0131.0221.0141.057 *1.057 *1.0131.0111.0131.0081.063 *1.0441.0171.0111.0191.0131.068 *1.0641.0011.0011.0051.0051.052 *1.044
60.9930.9981.0130.9951.0551.058 *1.0020.9991.0031.0031.051 *1.0481.0131.0081.0130.9921.064 *1.064 *0.9941.0020.9971.0051.0441.05 *
91.0050.9981.00.9871.054 *1.0520.9991.0010.9991.0031.049 *1.0471.0051.0061.0071.011.055 *1.0540.9970.9981.00.9961.0471.05 *
121.010.9970.9950.9981.059 *1.0291.01.0051.0040.9971.05 *1.0491.0011.0081.0061.0041.0511.053 *0.9970.9980.9930.9971.0471.05 *
10011.0090.9880.9940.9991.056 *1.0310.9990.9981.0070.9991.0491.05 *1.0051.0041.0091.0041.055 *1.0510.9980.9970.9970.9961.0471.05 *
31.0061.0071.0061.0051.0521.055 *0.9980.9990.9980.9981.048 *1.048 *0.9970.9990.9970.9981.047 *1.047 *0.9980.9980.9980.9981.048 *1.048 *
61.0061.0051.0061.0021.054 *1.0520.9980.9980.9971.01.046 *1.046 *0.9990.9970.9981.01.05 *1.0470.9980.9980.9981.01.048 *1.046
91.0041.0021.0081.0031.051.051 *0.9960.9980.9971.01.047 *1.0440.9990.9970.9981.01.053 *1.0450.9980.9980.9981.01.049 *1.047
121.0021.0051.0071.0041.05 *1.0490.9970.9990.9990.9981.047 *1.0441.0010.9980.9981.0011.051 *1.0440.9970.9990.9981.01.048 *1.045
100010.9991.01.01.01.048 *1.0470.9960.9980.9981.01.047 *1.0431.00.9970.9981.0011.05 *1.0440.9980.9991.01.01.049 *1.044
30.9790.9790.9780.9791.028 *1.0270.9830.9830.9820.9831.032 *1.0310.9810.9790.9790.9791.03 *1.0280.9840.9820.9820.9821.034 *1.031
60.9790.9790.9790.9821.03 *1.0270.9830.9830.9830.9871.032 *1.0310.980.9790.9790.9811.031 *1.0280.9850.9820.9850.9861.034 *1.031
90.9790.9790.9810.9841.03 *1.0270.9830.9830.9860.991.032 *1.0310.9820.9790.9830.9841.03 *1.0270.9850.9820.9860.9891.034 *1.031
120.980.9790.9820.9871.03 *1.0270.9850.9840.9860.9911.033 *1.0310.9820.980.9860.9861.03 *1.0270.9860.9840.9880.991.034 *1.031
withmeds1010.9810.9830.9830.9871.03 *1.0270.9850.9870.9880.9911.034 *1.0310.9830.9830.9860.9881.031 *1.0270.9860.9850.9890.991.034 *1.031
30.0470.1180.0750.771 *0.0860.2170.01538.4620.05025.8045250.0 *0.00.1460.10.00.691 *0.1530.1530.00.8740.01.428 *0.00.0
60.4120.4160.3110.66 *0.4640.3291250.0681538.5260.0365025.726 *4200.120.0430.2120.2390.1270.68 *0.2230.1520.6730.9680.3841.23 *0.7070.033
90.5480.6630.6290.6410.721 *0.4252222.5611538.7920.2885518.314 *5250.1142000.2290.2030.3920.4860.695 *0.2130.2130.471.303 *0.7331.0990.4930.184
120.7940.807 *0.7370.5450.7940.5916316.189 *1538.950.6062365.3716176.9334667.2730.5390.5690.6310.699 *0.5660.5780.8071.578 *0.5981.1360.7390.448
10010.88 *0.7060.6280.5950.8610.6783750.4451539.2650.5932365.3226300.474 *0.6280.5070.793 *0.710.6420.5280.5031.1861.635 *1.0431.0191.0650.72
30.0330.0620.1060.486 *0.0670.00.00.0750.1390.532 *0.00.00.00.1520.0160.528 *0.00.00.00.0390.060.526 *0.00.0
60.1870.0890.2330.498 *0.2090.1220.2160.0750.350.541 *0.0660.1970.2170.1870.0160.561 *0.1070.0980.2330.1010.220.531 *0.20.099
90.3210.2280.4450.511 *0.3480.1120.2980.1240.4470.532 *0.1910.1970.2180.2750.1860.525 *0.1540.0930.3630.2350.3420.524 *0.3480.099
120.3160.2540.578 *0.5190.5370.4260.3690.2320.4860.517 *0.4380.4970.5020.4350.3810.565 *0.4810.5090.4390.2780.4280.547 *0.4440.411
100010.3540.3570.618 *0.5380.5980.5790.420.4140.5310.5770.5380.598 *0.5610.5560.470.5910.625 *0.5410.4930.4360.5060.5270.559 *0.461
30.00.0350.00.43 *0.00.00.0840.00.0390.458 *0.00.00.00.00.0950.508 *0.00.00.00.0460.2250.458 *0.00.0
60.1810.0810.2380.472 *0.2470.270.2170.2040.1880.507 *0.060.1720.2420.1190.160.514 *0.2990.2250.1060.1980.2150.476 *0.1420.137
90.3380.260.3880.457 *0.3750.2590.4210.3630.2480.482 *0.1930.2030.2850.2910.330.517 *0.3490.2160.3910.3640.2830.462 *0.2590.137
120.5640.2880.5310.5030.681 *0.5870.4820.4050.4980.5040.460.555 *0.5080.4040.573 *0.5170.470.4570.5270.5160.3440.4980.537 *0.442
sklearn_smal -red1010.520.4360.6050.4960.63 *0.6070.5440.450.646 *0.5270.5370.610.5620.4660.647 *0.5620.5670.510.6120.6960.4160.5340.705 *0.516
30.8 *0.7840.790.7430.7990.7030.770.80.8620.8460.877 *0.6420.3820.689 *0.5980.5610.3820.4220.5740.883 *0.7370.7810.5750.495
60.7630.8170.931 *0.8170.7730.6820.8850.8970.8940.933 *0.9070.6820.6120.8240.9340.942 *0.6120.6240.760.9220.9380.961 *0.760.728
90.8650.8360.962 *0.8640.9350.7960.9220.9410.988 *0.9540.9370.7880.8840.9760.9610.999 *0.8840.8430.7990.9051.013 *0.9470.8510.777
120.8980.8690.964 *0.9420.9270.9090.9370.9390.9360.9590.968 *0.9280.8841.025 *0.951.0010.8840.8950.8630.9550.9961.016 *0.9190.947
10010.9320.8710.9780.9850.99 *0.9150.950.995 *0.930.970.9840.9390.9320.9650.9751.047 *0.9010.9560.9680.9651.0231.032 *0.9940.94
30.747 *0.70.6750.7090.7040.6350.6980.6730.7250.729 *0.70.650.6280.5520.724 *0.6030.6280.5310.7220.6540.74 *0.7020.7210.618
60.84 *0.7690.84 *0.7750.7840.6030.8130.750.8150.8420.862 *0.6430.8030.7670.826 *0.810.8220.5890.7610.8020.861 *0.8060.7630.609
90.8750.8710.928 *0.9040.8890.7310.9170.8360.8930.9170.933 *0.7570.8480.885 *0.8820.8770.8640.7220.8450.8710.897 *0.8770.8440.744
120.9170.954 *0.940.954 *0.9180.9390.9540.9270.9040.950.962 *0.9430.8830.9180.9340.936 *0.8860.9310.8810.964 *0.9340.9260.8780.933
100010.9540.990.9850.9840.9581.021 *0.9740.9690.9390.996 *0.9780.9690.9170.9450.9660.9610.9210.967 *0.940.979 *0.9780.9560.9430.978
30.5650.5960.694 *0.6630.5650.5450.6790.5990.696 *0.6890.6720.5560.6590.6660.6760.701 *0.6590.5750.630.6210.722 *0.570.630.569
60.7390.770.7750.792 *0.7390.5530.7140.7740.7940.8 *0.7060.580.7880.7560.819 *0.7720.7880.5740.7890.7280.802 *0.7530.7830.575
90.8690.8540.872 *0.8660.8490.7260.860.850.8470.8450.866 *0.7270.8560.8490.8610.8250.866 *0.720.878 *0.8390.878 *0.8120.8710.728
120.9120.934 *0.9280.9330.8950.9160.9270.9060.8760.9070.928 *0.9140.8710.933 *0.8890.90.8740.9160.8910.91 *0.9040.8730.8790.91 *
sklearn_large1010.9540.9630.9690.9560.9570.995 *0.9660.940.9460.9390.9650.994 *0.9190.9560.9720.9470.9240.993 *0.9340.9430.9490.9260.9280.991 *
30.8570.909 *0.8810.870.8560.8270.828 *0.8020.7740.8130.7840.8110.4960.7130.3880.951 *0.4960.4860.5610.6610.8350.7290.888 *0.825
60.8610.9120.8450.8880.8730.943 *0.8140.8590.8160.872 *0.7470.8540.881.138 *0.8871.0430.880.8930.911.06 *0.9820.8490.8890.922
90.9080.8860.8370.8980.9470.97 *0.8540.8430.8190.8060.820.919 *1.2941.1611.111.1331.2971.322 *0.9081.033 *1.0060.9510.9140.922
120.8970.920.9450.949 *0.9070.9030.8940.7960.8850.820.8170.913 *1.2921.1551.2311.0531.2921.307 *0.9011.0341.058 *0.9330.9070.909
10010.8890.9480.974 *0.9410.9150.9540.9070.7980.8730.8370.8260.963 *1.3821.121.231.11.3821.383 *0.9371.085 *1.0110.940.931.014
30.7380.808 *0.780.7480.770.7220.6780.6920.753 *0.6840.7080.6820.6810.510.6210.704 *0.6810.6210.7070.650.756 *0.7480.7070.675
60.8270.8630.8320.7810.7960.867 *0.7720.7540.823 *0.7160.8040.8120.7480.7660.84 *0.7950.7490.8220.7460.8340.8010.8040.7440.851 *
90.8640.890.8880.8650.8150.971 *0.8180.7720.8430.7680.8420.934 *0.8630.8350.8740.9120.8570.929 *0.8490.8550.820.8790.8490.942 *
120.9280.8810.8790.8950.9440.963 *0.8660.850.8910.8440.8910.92 *0.8820.8730.9140.976 *0.880.920.9070.860.8520.9120.8940.921 *
100010.9590.8870.9330.9360.9641.072 *0.8950.8520.9080.8850.9011.013 *0.940.9110.9130.9880.9311.025 *0.9340.8850.8780.9420.9231.023 *
30.6090.6820.660.688 *0.6030.6240.6680.679 *0.6490.6420.6410.5740.6280.6170.5980.5970.642 *0.4860.6130.659 *0.6170.6460.6260.568
60.7020.6630.7370.6820.6890.741 *0.7370.670.765 *0.6840.7620.7460.6410.6710.7380.6830.6450.74 *0.6470.6520.748 *0.6680.6570.738
90.7740.6890.7840.7590.7680.847 *0.7880.6870.8120.7530.8010.856 *0.6680.6790.8270.810.6830.843 *0.6880.6710.8170.7640.6920.831 *
120.80.7440.8040.7930.7710.845 *0.8390.7490.8190.7730.882 *0.8560.740.7690.848 *0.8350.7540.8410.7330.760.852 *0.8240.7340.832
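For the linear models summarized in Table A4, an embedded ranking is conventionally obtained by sorting features by the magnitude of their fitted coefficients. The sketch below is a minimal illustration under assumed preprocessing and regularization settings; it is not the exact configuration used in the experiments.

```python
# Minimal sketch: an "embedded" feature ranking for linear models, obtained by
# sorting features by the absolute value of their fitted coefficients.
# Standardization and the regularization strengths are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X = StandardScaler().fit_transform(X)  # put coefficients on a comparable scale

logreg = LogisticRegression(max_iter=1000).fit(X, y)
lasso = Lasso(alpha=0.01).fit(X, y)  # regression form used here as a scoring model

# Rank features from most to least important by |coefficient|.
rank_logreg = np.argsort(np.abs(logreg.coef_).ravel())[::-1]
rank_lasso = np.argsort(np.abs(lasso.coef_))[::-1]
print(rank_logreg[:5], rank_lasso[:5])
```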

References

  1. Jong, K.; Mary, J.; Cornuéjols, A.; Marchiori, E.; Sebag, M. Ensemble feature ranking. In Lecture Notes in Computer Science, Proceedings of Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 20–24 September 2004; Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3202, pp. 267–278. [Google Scholar]
  2. Petković, M.; Džeroski, S.; Kocev, D. Feature ranking for semi-supervised learning. Mach. Learn. 2022, 112, 4379–4408. [Google Scholar] [CrossRef]
  3. Halotel, J.; Demyanov, V.; Gardiner, A. Value of Geologically Derived Features in Machine Learning Facies Classification. Math. Geosci. 2020, 52, 5–29. [Google Scholar] [CrossRef]
  4. Miah, M.I.; Zendehboudi, S.; Ahmed, S. Log data-driven model and feature ranking for water saturation prediction using machine learning approach. J. Pet. Sci. Eng. 2020, 194, 107921. [Google Scholar] [CrossRef]
  5. La Fata, C.M.; Giallanza, A.; Micale, R.; La Scalia, G. Ranking of occupational health and safety risks by a multi-criteria perspective: Inclusion of human factors and application of VIKOR. Saf. Sci. 2021, 138, 105234. [Google Scholar] [CrossRef]
  6. Ak, M.F.; Yucesan, M.; Gul, M. Occupational health, safety and environmental risk assessment in textile production industry through a Bayesian BWM-VIKOR approach. Stoch. Environ. Res. Risk Assess. 2022, 36, 629–642. [Google Scholar] [CrossRef]
  7. Abdulla, A.; Baryannis, G.; Badi, I. Weighting the Key Features Affecting Supplier Selection using Machine Learning Techniques. In Proceedings of the 7th International Conference on Transport and Logistics, Niš, Serbia, 6 December 2019. [Google Scholar] [CrossRef]
  8. Zhang, Y.; Zhang, H.; Zhang, B. An Effective Ensemble Automatic Feature Selection Method for Network Intrusion Detection. Information 2022, 13, 314. [Google Scholar] [CrossRef]
  9. Megantara, A.A.; Ahmad, T. Feature Importance Ranking for Increasing Performance of Intrusion Detection System. In Proceedings of the 3rd International Conference on Computer and Informatics Engineering (IC2IE), Yogyakarta, Indonesia, 15–16 September 2020. [Google Scholar]
  10. Masmoudi, S.; Elghazel, H.; Taieb, D.; Yazar, O.; Kallel, A. A machine-learning framework for predicting multiple air pollutants’ concentrations via multi-target regression and feature selection. Sci. Total Environ. 2020, 715, 136991. [Google Scholar] [CrossRef] [PubMed]
  11. Fang, L.; Jin, J.; Segers, A.; Lin, H.X.; Pang, M.; Xiao, C.; Deng, T.; Liao, H. Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China. Geosci. Model Dev. 2022, 15, 7791–7807. [Google Scholar] [CrossRef]
  12. Vatian, A.S.; Golubev, A.A.; Gusarova, N.F.; Dobrenko, N.V. Intelligent Clinical Decision Support for Small Patient Datasets. Sci. Tech. J. Inf. Technol. Mech. Opt. 2023, 23, 595–607. [Google Scholar] [CrossRef]
  13. Remeseiro, B.; Bolon-Canedo, V. A review of feature selection methods in medical applications. Comput. Biol. Med. 2019, 112, 103375. [Google Scholar] [CrossRef] [PubMed]
  14. Saqlain, S.M.; Sher, M.; Shah, F.A.; Khan, I.; Ashraf, M.U.; Awais, M.; Ghani, A. Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl. Inf. Syst. 2019, 58, 139–167. [Google Scholar] [CrossRef]
  15. Wu, L.; Hu, Y.; Liu, X.; Zhang, X.; Chen, W.; Yu, A.S.L.; Kellum, J.A.; Waitman, L.R.; Liu, M. Feature Ranking in Predictive Models for Hospital-Acquired Acute Kidney Injury. Sci. Rep. 2018, 8, 17298. [Google Scholar] [CrossRef] [PubMed]
  16. Zhu, L.; Li, L.; Li, R.; Zhu, L. Model-Free Feature Screening for Ultrahigh Dimensional Data. J. Am. Stat. Assoc. 2011, 106, 1464–1475. [Google Scholar] [CrossRef] [PubMed]
  17. Li, K.; Wang, F. Deep Feature Screening: Feature Selection for Ultra High-Dimensional Data via Deep Neural Networks. Neurocomputing 2023, 538, 126186. [Google Scholar] [CrossRef]
  18. Mousavi, M.; Khalili, N. VSI: An Interpretable Bayesian Feature Ranking Method Based on Vendi Score. Knowl.-Based Syst. 2025, 311, 112973. [Google Scholar] [CrossRef]
  19. Radivojac, P.; Obradovic, Z.; Dunker, A.K.; Vucetic, S. Feature selection filters based on the permutation test. In Lecture Notes in Computer Science, Proceedings of the Machine Learning: ECML 2004: 15th European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  20. Gao, L.; Wu, W. Relevance assignation feature selection method based on mutual information for machine learning. Knowl.-Based Syst. 2020, 209, 106439. [Google Scholar] [CrossRef]
  21. Bhadra, T.; Mallik, S.; Hasan, N.; Zhao, Z. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinform. 2022, 23, 153. [Google Scholar] [CrossRef] [PubMed]
  22. Barraza, N.; Moro, S.; Ferreyra, M.; de la Peña, A. Mutual information and sensitivity analysis for feature selection in customer targeting: A comparative study. J. Inf. Sci. 2019, 45, 53–67. [Google Scholar] [CrossRef]
  23. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers, Inc.: San Francisco, CA, USA, 1993. [Google Scholar]
  24. Peng, H.; Long, F.; Ding, C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
  25. Reshef, D.N.; Reshef, Y.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.S. Detecting novel associations in large data sets. Science 2011, 334, 1518–1524. [Google Scholar] [CrossRef] [PubMed]
  26. Li, X.; Xu, S.; Yu, M.; Wang, K.; Tao, Y.; Zhou, Y.; Shi, J.; Zhou, M.; Wu, B.; Yang, Z.; et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J. Allergy Clin. Immunol. 2020, 146, 110–118. [Google Scholar] [CrossRef] [PubMed]
  27. Liu, X.; Xue, S.; Xu, J.; Ge, H.; Mao, Q.; Xu, X.-H.; Jiang, H.-D. Clinical characteristics and related risk factors of disease severity in 101 COVID-19 patients hospitalized in Wuhan, China. Acta Pharmacol. Sin. 2021, 43, 64–75. [Google Scholar] [CrossRef] [PubMed]
  28. Phelps, M.; Christensen, D.M.; Gerds, T.; Fosbøl, E.; Torp-Pedersen, C.; Schou, M.; Køber, L.; Kragholm, K.; Andersson, C.; Biering-Sørensen, T.; et al. Cardiovascular comorbidities as predictors for severe COVID-19 infection or death. Eur. Heart J. 2021, 7, 172–180. [Google Scholar] [CrossRef] [PubMed]
  29. Alam, Z.; Rahman, S.; Rahman, S. A Random Forest based predictor for medical data classification using feature ranking. Inform. Med. Unlocked 2019, 15, 100180. [Google Scholar] [CrossRef]
  30. Joloudari, J.H.; Joloudari, E.H.; Saadatfar, H.; Ghasemigol, M.; Razavi, S.M.; Mosavi, A.; Nabipour, N.; Shamshirband, S.; Nadai, L. Coronary Artery Disease Diagnosis; Ranking the Significant Features Using Random Trees Model. Int. J. Environ. Res. Public Health 2020, 17, 731. [Google Scholar] [CrossRef] [PubMed]
  31. Muthukrishnan, R.; Rohini, R. LASSO: A feature selection technique in predictive modeling for machine learning. In Proceedings of the 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Coimbatore, India, 24 October 2016; pp. 18–20. [Google Scholar] [CrossRef]
  32. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320. [Google Scholar] [CrossRef]
  33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  35. Rice, J. Mathematical Statistics and Data Analysis; Duxbury Press: Pacific Grove, CA, USA, 1995; ISBN 0-534-20934-3. [Google Scholar]
  36. Schulte, R.V.; Prinsen, E.C.; Hermens, H.J.; Buurke, J.H. Genetic Algorithm for Feature Selection in Lower Limb Pattern Recognition. Front. Robot. AI 2021, 8, 710806. [Google Scholar] [CrossRef] [PubMed]
  37. Ali, W.; Saeed, F. Hybrid Filter and Genetic Algorithm-Based Feature Selection for Improving Cancer Classification in High-Dimensional Microarray Data. Processes 2023, 11, 562. [Google Scholar] [CrossRef]
  38. Abdollahi, J.; Nouri-Moghaddam, B. Feature selection for medical diagnosis: Evaluation for using a hybrid stacked-genetic approach in the diagnosis of heart disease. arXiv 2021. [Google Scholar] [CrossRef]
  39. Xie, H.; Zhang, L.; Lim, C.P.; Yu, Y.; Liu, H. Feature Selection Using Enhanced Particle Swarm Optimisation for Classification Models. Sensors 2021, 21, 1816. [Google Scholar] [CrossRef] [PubMed]
  40. Shen, X.; Yang, F.; Yang, P.; Yang, M.; Xu, L.; Zhuo, J.; Wang, J.; Lu, D.; Liu, Z.; Zheng, S.; et al. A Contrast-Enhanced Computed Tomography Based Radiomics Approach for Preoperative Differentiation of Pancreatic Cystic Neoplasm Subtypes: A Feasibility Study. Front. Oncol. 2020, 10, 248. [Google Scholar] [CrossRef] [PubMed]
  41. Awad, M.; Fraihat, S. Recursive Feature Elimination with Cross-Validation with Decision Tree: Feature Selection Method for Machine Learning-Based Intrusion Detection Systems. J. Sens. Actuator Netw. 2023, 12, 67. [Google Scholar] [CrossRef]
  42. Nizami, I.F.; Majid, M.; Khurshid, K. Efficient feature selection for Blind Image Quality Assessment based on natural scene statistics. In Proceedings of the 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 10–14 January 2017; pp. 318–322. [Google Scholar]
  43. Alelyani, S. Stable bagging feature selection on medical data. J. Big Data 2021, 8, 11. [Google Scholar] [CrossRef]
  44. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017. [Google Scholar] [CrossRef]
  45. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502. [Google Scholar] [CrossRef]
  46. Gramegna, A.; Giudici, P. Shapley Feature Selection. FinTech 2022, 1, 72–80. [Google Scholar] [CrossRef]
  47. Verhaeghe, J.; Van Der Donckt, J.; Ongenae, F.; Van Hoeck, S. Powershap: A Power-full Shapley Feature Selection Method. In Lecture Notes in Computer Science, Proceedings of the Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2022, Grenoble, France, 19–23 September 2022; Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
  48. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S.A. Problems with Shapley-value-based explanations as feature importance measures. arXiv 2020, arXiv:2002.11097v2. [Google Scholar]
  49. Yuan, H.; Liu, M.; Kang, L.; Miao, C.; Wu, Y. An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models. arXiv 2022, arXiv:2204.11351v1. [Google Scholar]
  50. Jenul, A.; Schrunner, S.; Pilz, J.; Tomic, O. A user-guided Bayesian framework for ensemble feature selection in life science applications (UBayFS). Mach. Learn. 2022, 111, 3897–3923. [Google Scholar] [CrossRef]
  51. Jreich, R.; Hatte, C.; Parent, E. Review of Bayesian selection methods for categorical predictors using JAGS. J. Appl. Stat. 2021, 49, 2370–2388. [Google Scholar] [CrossRef] [PubMed]
  52. Bartonicek, A.; Wickham, S.R.; Pat, N.; Conner, T.S. The value of Bayesian predictive projection for variable selection: An example of selecting lifestyle predictors of young adult well-being. BMC Public Health 2021, 21, 695. [Google Scholar] [CrossRef] [PubMed]
  53. Boulet, S.; Ursino, M.; Thall, P.; Jannot, A.-S.; Zohar, S. Bayesian variable selection based on clinical relevance weights in small sample studies—Application to colon cancer. Stat. Med. 2019, 38, 1–20. [Google Scholar] [CrossRef] [PubMed]
  54. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: New York, NY, USA, 2010; 458p; ISBN 978-1-4419-2322-6. [Google Scholar]
  55. Bell, D.A.; Wang, H. A Formalism for Relevance and Its Application in Feature Subset Selection. Mach. Learn. 2000, 41, 175–195. [Google Scholar] [CrossRef]
  56. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006; 774p. [Google Scholar]
  57. van der Vaart, A.W. Asymptotic Statistics; Section 10.2: Bernstein–von Mises Theorem; Cambridge University Press: Cambridge, UK, 1998; ISBN 0-521-49603-9. [Google Scholar]
  58. Hendeby, G.; Gustafsson, F. On Nonlinear Transformations of Gaussian Distributions. January 2003. Available online: https://users.isy.liu.se/en/rt/fredrik/reports/07SSPut.pdf (accessed on 10 July 2025).
  59. Guyon, I. Design of Experiments for the NIPS 2003 Variable Selection Benchmark. 2003. Available online: https://archive.ics.uci.edu/ml/machine-learning-databases/madelon/Dataset.pdf (accessed on 10 July 2025).
  60. Zoabi, Y.; Deri-Rozov, S.; Shomron, N. Machine learning-based prediction of COVID-19 diagnosis based on symptoms. Npj Digit. Med. 2021, 4, 3. [Google Scholar] [CrossRef] [PubMed]
  61. COVID-19. COVID-19 Patient’s Symptoms, Status, and Medical History. 2022. Available online: https://www.kaggle.com/datasets/meirnizri/covid19-dataset (accessed on 10 July 2025).
  62. Lapp, D. Heart Disease Dataset. Available online: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset (accessed on 10 July 2025).
  63. Fedesoriano. Heart Failure Prediction Dataset. Available online: https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction (accessed on 10 July 2025).
  64. UCI Machine Learning and 1 Collaborator. Red Wine Quality. Available online: https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009 (accessed on 10 July 2025).
  65. Overschie, J.G.S.; Alsahaf, A.; Azzopardi, G. Fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms. J. Open Source Softw. 2022, 7, 4611. [Google Scholar] [CrossRef]
Figure 1. The scheme of the proposed method.
Figure 2. An example of the computation of a Bayesianized tree.
Figure 3. Results of models on the datasets: (a) first synthetic, and (b) second synthetic.
Figure 4. Self-consistency values on the datasets: (a) first synthetic, (b) second synthetic, (c) first public, and (d) second public.
Figure 5. Monotonicity values on the datasets: (a) first synthetic, (b) second synthetic, (c) first public, and (d) second public.
Figure 6. Consistency of the proposed method with the others on the datasets: (a) first public, and (b) second public.
Table 1. Summary of the properties of datasets.
Dataset Name | Number of Features | Number of Examples
sklearn_small | 50 | 30,000
sklearn_large | 300 | 30,000
Covid | 16 | 51,831
Withmeds | 21 | 1,048,576
Heart | 11 | 1026
Heart1 | 13 | 919
Winequality-red | 11 | 1600
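As an illustration of the setup summarized in Table 1, the two synthetic datasets (sklearn_small and sklearn_large) can be produced with scikit-learn's make_classification. The exact generation parameters are not restated here, so the informative/redundant feature counts and the random seed below are illustrative assumptions only.

```python
# A minimal sketch of how the sklearn-generated datasets could be created.
# The specific settings (n_informative, n_redundant, random_state) are
# assumptions for illustration, not the authors' reported configuration.
from sklearn.datasets import make_classification

def make_synthetic(n_features: int, n_samples: int = 30_000, seed: int = 0):
    """Generate a binary-classification dataset with the given feature count."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=min(15, n_features),  # assumption: only a subset is informative
        n_redundant=5,                      # assumption
        random_state=seed,
    )
    return X, y

# Feature/sample counts match Table 1.
X_small, y_small = make_synthetic(n_features=50)
X_large, y_large = make_synthetic(n_features=300)
```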
Table 2. The results of the feature ranking methods with tree ensemble models.
Dataset | Data Size | Num Features | XGBoost Top 1 | XGBoost Top 2 | XGBoost Top 3 | Random Forest Top 1 | Random Forest Top 2 | Random Forest Top 3
Heart | 10 | 1 | our | embed | shap | our | vsi | mrmr
Heart | 10 | 6 | shap | our | embed | our | vsi | mrmr
Heart | 10 | 12 | shap | permtest | our | embed | shap | our
Heart | 1000 | 1 | our | shap | mrmr | our | mrmr | shap
Heart | 1000 | 6 | permtest | our | mrmr | permtest | embed | our
Heart | 1000 | 12 | embed | permtest | shap | embed | permtest | shap
Heart1 | 10 | 1 | our | mrmr | shap | our | mrmr | embed
Heart1 | 10 | 6 | shap | our | embed | shap | embed | our
Heart1 | 10 | 12 | embed | our | shap | our | vsi | shap
Heart1 | 1000 | 1 | our | mrmr | others | our | permtest | shap
Heart1 | 1000 | 6 | mrmr | shap | embed | vsi | embed | our
Heart1 | 1000 | 12 | mrmr | vsi | shap | shap | mrmr | vsi
Table 3. The results of the feature ranking methods with linear models.
Dataset | Data Size | Num Features | Ridge Top 1 | Ridge Top 2 | Ridge Top 3 | ElasticNet Top 1 | ElasticNet Top 2 | ElasticNet Top 3
Heart | 10 | 1 | our | vsi | shap | shap | our | vsi
Heart | 10 | 6 | our | shap | embed | our | embed | mrmr
Heart | 10 | 12 | shap | embed | our | vsi | our | permtest
Heart | 1000 | 1 | our | mrmr | vsi | our | mrmr | embed
Heart | 1000 | 6 | our | permtest | mrmr | our | permtest | mrmr
Heart | 1000 | 12 | our | permtest | vsi | our | vsi | permtest
Heart1 | 10 | 1 | our | embed | shap | our | mrmr | embed
Heart1 | 10 | 6 | our | vsi | mrmr | embed | our | shap
Heart1 | 10 | 12 | permtest | vsi | mrmr | our | embed | mrmr
Heart1 | 1000 | 1 | our | mrmr | permtest | our | mrmr | vsi
Heart1 | 1000 | 6 | vsi | our | mrmr | embed | vsi | mrmr
Heart1 | 1000 | 12 | mrmr | our | embed | vsi | our | shap
Table 4. The results of the feature ranking methods with ensemble models on the dataset “sklearn_small”.
Dataset | Data Size | Num Features | XGBoost Top 1 | XGBoost Top 2 | XGBoost Top 3 | Random Forest Top 1 | Random Forest Top 2 | Random Forest Top 3
sklearn_small | 10 | 1 | permtest | vsi | our | embed | shap | mrmr
sklearn_small | 10 | 6 | mrmr | vsi | permtest | embed | mrmr | permtest
sklearn_small | 10 | 12 | mrmr | embed | shap | embed | vsi | mrmr
sklearn_small | 1000 | 1 | permtest | embed | vsi | mrmr | permtest | embed
sklearn_small | 1000 | 6 | mrmr | permtest | vsi | mrmr | vsi | embed
sklearn_small | 1000 | 12 | our | vsi | shap | vsi | mrmr | our
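As an illustration of how the baseline rankings compared in Tables 2–4 can be obtained for a tree-ensemble model, the sketch below derives embedded, SHAP, and permutation-based rankings with standard library calls. The hyperparameters are placeholders, and plain mutual information is used only as a simplified relevance proxy; the mRMR and VSI rankings themselves are not reproduced here.

```python
# Illustrative sketch of baseline feature rankings for a tree ensemble.
# Hyperparameters and the mutual-information proxy are assumptions for
# illustration, not the paper's exact protocol.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.inspection import permutation_importance
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
model = XGBClassifier(n_estimators=100, random_state=0).fit(X, y)

# Embedded ranking: sort features by the model's built-in importances.
embedded_rank = np.argsort(model.feature_importances_)[::-1]

# SHAP ranking: mean absolute SHAP value per feature
# (for binary XGBoost, shap_values has shape (n_samples, n_features)).
shap_values = shap.TreeExplainer(model).shap_values(X)
shap_rank = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]

# Permutation-based ranking: drop in score when a feature is shuffled.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

# Relevance-only proxy in place of an mRMR-style filter ranking (assumption).
mi_rank = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
```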
Table 5. Comparison of top 15 features selected on samples of different sizes from the same dataset.
Rank | 50 Samples | 1000 Samples | 10,000 Samples
1 | PQ in lead II | P in lead II | Lung surfactant
2 | Lung surfactant | Lung surfactant | P in lead II
3 | P in lead II | PQ in lead II | PQ in lead II
4 | Signs of right-sided heart overload | Signs of right-sided heart overload | Arrhythmia by rate
5 | Anticoagulants | Nonspecific intraventricular block | Nonspecific intraventricular block
6 | QTc lengthening | QTc lengthening | QTc lengthening
7 | Arrhythmia by rate | Arrhythmia by rate | Lopinavir/ritonavir
8 | ST segment ischemic depression | Lopinavir/ritonavir | Signs of right-sided heart overload
9 | Angle alpha x | Anticoagulants | Anticoagulants
10 | Lopinavir/ritonavir | ST segment ischemic depression | Enlargement of the left atrium
11 | Enlargement of the left atrium | Enlargement of the left atrium | ST segment ischemic depression
12 | Nonspecific intraventricular block | Angle alpha x | Angle alpha x
13 | Atrioventricular block (degree) | Atrioventricular block (degree) | Atrioventricular block (degree)
14 | Chloroquine/hydroxychloroquine | Bradycardia (1), tachycardia (2), … | Bradycardia (1), tachycardia (2), …
15 | Tocilizumab | Tocilizumab | Tocilizumab
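One way to quantify how strongly the orderings in Table 5 agree across sample sizes is a rank-correlation coefficient computed over the features shared by two truncated lists. The sketch below uses Kendall's tau purely for illustration; it is not necessarily the self-consistency measure used in the experiments, and only the first few features from Table 5 are included.

```python
# Illustrative rank-agreement check between the 50-sample and 10,000-sample
# orderings from Table 5, restricted to features present in both short lists.
from scipy.stats import kendalltau

ranking_50 = [
    "PQ in lead II", "Lung surfactant", "P in lead II",
    "Signs of right-sided heart overload", "Anticoagulants", "QTc lengthening",
]
ranking_10000 = [
    "Lung surfactant", "P in lead II", "PQ in lead II",
    "Arrhythmia by rate", "Nonspecific intraventricular block", "QTc lengthening",
]

# Compare positions only for features that appear in both truncated lists.
common = [f for f in ranking_50 if f in ranking_10000]
pos_50 = [ranking_50.index(f) for f in common]
pos_10000 = [ranking_10000.index(f) for f in common]
tau, p_value = kendalltau(pos_50, pos_10000)
print(f"Kendall tau over shared features: {tau:.2f} (p={p_value:.2f})")
```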