Article

Exploiting Data Distribution: A Multi-Ranking Approach

by Beata Zielosko *, Kamil Jabloński and Anton Dmytrenko
Institute of Computer Science, University of Silesia in Katowice, Będzińska 39, 41-200 Sosnowiec, Poland
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 278; https://doi.org/10.3390/e27030278
Submission received: 1 December 2024 / Revised: 2 March 2025 / Accepted: 3 March 2025 / Published: 7 March 2025
(This article belongs to the Section Signal and Data Analysis)

Abstract

Data heterogeneity is the result of increasing data volumes, technological advances, and growing business requirements in the IT environment. It means that data comes from different sources, may be dispersed in terms of location, and may be stored in different structures and formats. As a result, the management of distributed data requires special integration and analysis techniques to ensure coherent processing and a global view. Distributed learning systems often use entropy-based measures to assess the quality of local data and its impact on the global model. One important aspect of data processing is feature selection. This paper proposes a research methodology for multi-level attribute ranking construction for distributed data. The research was conducted on a publicly available dataset from the UCI Machine Learning Repository. In order to disperse the data, a table division into subtables was applied using reducts, which is a very well-known method from the rough sets theory. So-called local rankings were constructed for local data sources using an approach based on machine learning models, i.e., the greedy algorithm for the induction of decision rules. Two types of classifiers relating to explicit and implicit knowledge representation, i.e., gradient boosting and neural networks, were used to verify the research methodology. Extensive experiments, comparisons, and analysis of the obtained results show the merit of the proposed approach.

1. Introduction

Technological advances and the global nature of the activities of many companies and institutions necessitate data processing in a distributed form. This applies not only to the location but also to the nature of the data, which may cover different parts of the company’s operations. Moreover, modern technologies generate huge amounts of data that are difficult to store and process in one place. Data are generated in multiple locations simultaneously and can take different forms depending on the structures in which they are stored.
A domain that plays an important role in this context is distributed data mining (DDM). It is the process of discovering knowledge or patterns from data stored in different locations, using distributed processing techniques [1,2]. The key features regarding DDM are (i) distributed data sources, which may be related to physical, organizational, or legal constraints; (ii) local processing, i.e., data are processed locally in each of the distributed locations to reduce the need to transmit large datasets over the network; (iii) consolidation of results where the results of local analyses are combined to produce a global picture; and (iv) global analysis, i.e., based on the combined findings, the system performs a final analysis for discovering global patterns or relationships. Examples of applications include distributed e-commerce systems analysing user preferences from different regions, for instance, Netflix’s platform that processes user data from different localisations to deliver personalised services [3,4].
Information theory is also used within DDM. Dispersed data are often characterised by different statistical distributions. Information theory, through measures such as entropy, allows us to analyse the diversity of these data and determine how much information is contained in the various data sources [5].
Feature selection plays an important role in the data mining domain, especially in the stages of data preprocessing and analysis. The main aim of this process is to identify, from the available set of features, those that are most relevant and have the greatest impact on the decisions to be taken [6]. The main goals of feature selection include enhancing the predictive performance of models, creating faster and more cost-efficient predictors, and offering deeper insight into the underlying process that generated the data [7,8]. Feature selection is particularly important in distributed systems that handle large datasets. Attribute selection methods help to reduce the number of analysed features, simplifying models and speeding up processing [9]. They also reduce the amount of data transferred between nodes, decreasing communication costs. When models are built in different locations, limiting the analysis to the most relevant attributes improves the consistency of results between nodes and facilitates the consolidation of results from different nodes [10,11,12]. In the framework of information theory and distributed data, measures such as information gain or mutual information can be used to select locally relevant attributes in local sources and globally relevant features in the whole system.
Feature selection can be approached in two main ways. The first group of methods involves ranking features based on a specific criterion and selecting the top k features. The second one focuses on identifying the smallest subset of features that maintains the performance of the learning model. In the case of ranking methods, each attribute is assigned a weight to reflect its relevance; then, attributes are sorted, usually in order from most to least relevant, and the top-ranked attributes are used in the analysis. Feature ranking methods employ various metrics, such as similarity scores, statistical measures, information-theoretic approaches, or functions derived from the outputs of classifiers [13,14]. These techniques aim to prioritize features based on their relevance or contribution to a given model, helping to improve interpretability, reduce dimensionality, and enhance the overall performance of the learning algorithm [15,16,17].
The motivation for the proposed approach is the need to discover knowledge and global patterns from distributed data. Processing and analysing such sets is much more difficult than in the case of centralized data. A popular field that is being developed in this context is federated learning, which aims to create a global classification model taking into account the parameters and results of local models. It can be said that federated learning enables collaborative training of machine learning models by sharing models’ updates with a central server for aggregation [18]. This work does not use the federated learning technique to construct a global classifier; however, the goal is to create a global ranking of attributes based on local data sources and obtain from them local rankings. Weights of attributes and their ordering at the intermediate and global levels can be considered a source of knowledge about the most important features in a distributed data environment. Examples of applications include systems that analyse user preferences for a specific problem at the level of individual regions and then at the country level. Another application is systems aimed at supporting resource management, which allow for more effective management of data, processes, and allocation of these resources in distributed environments. For example, attributes with a higher position in the ranking can be treated as corresponding to the key tasks for the operation of the system; therefore, resources such as computing power, memory, and network bandwidth are allocated to them first.
The main contribution of this paper is a research methodology for creating a global ranking of attributes in a distributed environment. Taking into account a knowledge representation perspective, the proposed rankings are created using decision rules. In [19] weighting of attributes based on the greedy algorithm was considered, but only at the local level, and it was applied in the stylometry domain. In this work, the application of such ranking in the hierarchical approach for global ranking construction was proposed. The distributed data environment was obtained using reducts construction as a popular feature selection method in the framework of rough set theory [20,21]. In the proposed methodology, reducts were used as a method to obtain different (in terms of attributes) subsets of the dataset, which are considered as local data sources. Experiments were performed on the dataset related to predicting students’ dropout and academic success issues from the UCI ML Repository [22]. The analysis of the classification accuracy and comparison at the different stages of the global ranking construction is included. Informativeness of features, which consists of attribute rankings, was also studied. In this direction of our research, it is the first time that the verification of local rankings derived from the greedy algorithm for decision rule induction (and global ranking) has been carried out using classification models that are not based on decision rules. This process did not use decision rules filtering but sequentially constructed subtables based on the studied attributes from the ranking. The contribution contains a proposed methodology for the development of a global attribute ranking using the greedy algorithm for the induction of decision rules and the verification of this approach based on classifiers related to decision trees and neural networks.
The methodology was verified through extensive experiments. The dataset was divided into sets $k$, $k = 1, \ldots, 9$, each of which included $i$ subtables, $i = 2, \ldots, 10$. The subtables were obtained by the selection of attributes driven by reducts induced from the entire dataset. For each subtable in the set $k$, local rankings were obtained by using the greedy algorithm for decision rule induction [23]. Then, for each set $k$, a strategy for intermediate ranking construction was proposed, taking into account the properties of the greedy algorithm and the characteristics of the retrieved local rankings. Then, global weights of attributes were obtained. All rankings at the intermediate and global levels were verified from the point of view of classification, i.e., using the gradient boosting approach [24] and a neural network in the form of an MLP (multi-layer perceptron) [25]. The entropy of the features that make up the rankings was also calculated. The constructed classifiers were controlled by the attributes’ ranking positions based on a backward elimination approach. The results of the experiments were validated against the test part of the entire dataset and analysed and compared for intermediate and global levels of ranking construction.
The paper consists of six sections. Section 2 presents the research background related to feature selection and employed methods. Section 3 describes the framework of the proposed research methodology. Experimental results are presented in Section 4. Section 5 includes a comparison and analysis of findings. Conclusions and future research directions are provided in Section 6.

2. Background

In this section, information related to feature selection, ranking construction, and selected classifiers is presented.

2.1. Feature Selection

The aim of feature selection is to remove irrelevant or redundant attributes [26,27] from the set of available features. Selecting relevant attributes allows the model to focus on the most influential features, leading to improved model efficiency and accuracy. Removing irrelevant variables reduces the risk of “noise” in the data, resulting in more accurate predictions. With large datasets, processing all features can be costly in terms of time and computing resources. Feature selection helps to reduce the number of variables, which reduces model training time and computing power requirements. Models trained on datasets with many features are more prone to overfitting, i.e., fitting the training data too closely. With fewer features, the model generalizes better to new data, improving its predictive ability. The number of features also plays an important role from the point of view of knowledge representation [28,29,30]. Models with fewer features are more comprehensible, which is particularly important in fields such as medicine. A smaller number of features makes it easier to interpret and understand how a model works.
This approach coincides with information theory, which provides mathematical tools for assessing which features in the data are most relevant for predicting the target variable [31,32]. Examples include mutual information (MI), entropy, information gain (IG), and minimal redundancy maximal relevance (mRMR) methods. MI measures how much information about one variable (e.g., target variable Y) is provided by another variable (e.g., feature X) and identifies the attributes most related to predicting the target variable. Entropy measures uncertainty or “heterogeneity” in the data, and features with high entropy contain more potential information. Information gain measures how much uncertainty (entropy) of the target variable Y has been removed by including a feature X. It is widely used in decision tree construction algorithms, such as ID3 and C4.5. Methods related to mRMR balance maximising the relevance of features relative to the target variable and minimising redundancy between attributes.
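The entropy and information gain measures described above can be illustrated with a minimal sketch for discrete variables. The function names and the toy data are assumptions made for illustration and are not taken from the paper. For discrete X and Y, the information gain computed here coincides with the mutual information between X and Y.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and target Y."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: entropy of Y within each value of X, weighted by frequency
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one feature determines the class, the other does not
y = ["Dropout", "Dropout", "Graduate", "Graduate"]
x_informative = [0, 0, 1, 1]
x_uninformative = [0, 1, 0, 1]

ig_informative = information_gain(x_informative, y)
ig_uninformative = information_gain(x_uninformative, y)
```

A feature that fully determines the class removes all uncertainty about Y, while a feature independent of the class removes none.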
Feature selection methods can be divided into three main categories: filter, wrapper, and embedded [33]. Filtering methods assess the importance of features independently of the machine learning model. Variables are selected or discarded before the model training process, which makes these methods considered fast and scalable. Attributes can be selected based on statistical properties, e.g., the Pearson correlation coefficient. Filtering methods often evaluate features individually without taking their interactions into account. This means that features that may be relevant in combination with others may be omitted. As filtering methods do not directly consider the impact of features on the final accuracy of the model, they may not always lead to an optimal set of features in terms of the model.
Wrapper methods select a subset of features based on their impact on the accuracy of a particular predictive model. The selection process is iterative and involves training the model with different subsets of features to assess which features optimize model performance. Feature selection in wrapper methods is closely linked to the model, meaning that features selected in this way maximize the performance of a given algorithm. This leads to better performance than filter methods, especially for complex data with non-linear relationships. However, there is a risk that the set of features may be overfitted to the training set, which will worsen the model’s ability to generalize. If a different algorithm is chosen, feature selection may need to be performed again. Wrapper methods are used when model accuracy is a priority and computational resources are not a significant constraint.
Embedded methods are feature selection algorithms that are integrated into the model learning process. In contrast to filter techniques, embedded methods perform feature selection directly while training the model. These methods are popular in situations where computational efficiency is important and models need to be optimized for accuracy while maintaining simplicity. Examples are decision trees and their extensions, e.g., Random Forest, Gradient Boosting, or SVM methods, with appropriate modifications.

2.2. Ranking Construction

Attribute rankings are methods of ordering features (assigning weights) according to their importance in a given context, for example, machine learning models. The aim is to identify which attributes have the greatest impact on the outcome or are most important in predicting a given variable [14].
There are many approaches to assessing and ranking attributes [6,34,35,36], which can be divided into several main categories: (i) statistically based methods that use correlation, variance, and other measures to analyse the significance of features based on their distribution; (ii) machine learning-based methods that use, e.g., decision tree ensembles such as Random Forest; (iii) feature selection-based methods, e.g., the Relief algorithm or an approach based on reducts; and (iv) information theory-based methods that use, for example, entropy or information gain.
In this work, we will use a decision rules-based approach for weighting attributes at the local level, i.e., for each subtable in the set k. Local rankings are obtained based on the greedy algorithm, which is known for its application to the set cover problem [37]. The motivation for selecting this algorithm is the issue of knowledge representation in the intuitive form of decision rules. Based on previous studies [23], it has been proven that this algorithm allows for the construction of short decision rules and, by making certain assumptions on the class NP, this algorithm allows for obtaining results close to those obtained by the best polynomial approximate algorithms. Short decision rules are easy to understand and interpret. In addition, the length of the rules, i.e., the number of descriptors (attribute = value pairs) forming the premises of the rule, is an important indicator of the quality of the rule. Another popular measure of rule quality is support, which is the number of objects in the dataset for which the left and right sides of the rule are met. Rules with high support allow the discovery of relevant patterns from the data.
An important property of the greedy algorithm is that the attributes forming the rule are characterised by the high separability of objects from other decision classes. This is due to the nature of this algorithm, which, in each iteration during the process of rule construction, selects an attribute that separates a maximum number of rows with a different decision. This algorithm works sequentially for each row $row$ of the dataset represented by table $T$. $U(T, row)$ denotes the set of rows from $T$ that are labeled with a class label different from the class $d$ attached to the considered $row$. Algorithm 1 presents the pseudocode of the greedy algorithm for the construction of the decision rule for $row$ of $T$.
Algorithm 1 Greedy algorithm for the construction of decision rules
  • Require: Dataset $T$ with attributes $a_1, \ldots, a_m$, row $row = (b_1, \ldots, b_m)$ of $T$ labeled by $d$.
  • Ensure: Decision rule for $row$ of $T$.
  • $Q \leftarrow \emptyset$;
  • while attributes from $Q$ separate from $row$ fewer than $|U(T, row)|$ rows do
  •    select $a_i \in \{a_1, \ldots, a_m\}$ with minimal index $i$ such that $a_i$ separates from $row$ the maximal number of rows unseparated by attributes from $Q$
  •    $Q \leftarrow Q \cup \{a_i\}$.
  • end while
  • $\bigwedge_{a_i \in Q} (a_i = b_i) \rightarrow d$.
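The greedy procedure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the table is a list of tuples, attributes are identified by their column index, and the function name `greedy_rule` and the toy table are hypothetical. The early exit for inconsistent tables is also an added assumption.

```python
def greedy_rule(rows, decisions, row_index):
    """Build one decision rule for rows[row_index], following Algorithm 1.

    rows: list of equal-length tuples of attribute values (table T)
    decisions: list of class labels, one per row
    Returns (premises, decision); premises is a list of (attribute_index, value).
    """
    row, d = rows[row_index], decisions[row_index]
    # U(T, row): indices of rows labeled with a decision different from d
    unseparated = {k for k, dec in enumerate(decisions) if dec != d}
    chosen = []  # the set Q, kept in order of selection
    while unseparated:
        best_attr, best_sep = None, set()
        for a in range(len(row)):
            if a in chosen:
                continue
            sep = {k for k in unseparated if rows[k][a] != row[a]}
            # Strict '>' keeps the attribute with the minimal index on ties
            if len(sep) > len(best_sep):
                best_attr, best_sep = a, sep
        if best_attr is None:
            break  # assumption: stop if no attribute separates the remaining rows
        chosen.append(best_attr)
        unseparated -= best_sep
    return [(a, row[a]) for a in chosen], d


# Hypothetical toy table with two attributes and three rows
rows = [(0, 0), (0, 1), (1, 0)]
decisions = ["A", "B", "B"]
rule_first = greedy_rule(rows, decisions, 0)
rule_last = greedy_rule(rows, decisions, 2)
```

For the first row, no single attribute separates both rows of class B, so the rule needs two descriptors; for the last row, the first attribute alone is sufficient.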
In the presented approach, local rankings were constructed taking into account: (i) the number of rules in which the given attribute exists, (ii) the number of rows separated by the attribute, and (iii) the maximum support of the rule including the considered variable.
The construction of rankings at higher levels takes into account the number of rankings in which the attribute appears. If these values for two or more attributes are the same, the highest weight of the attribute at the lower ranking level is taken into account.
In the paper, for rankings created at higher levels, i.e., the so-called intermediate and global levels, entropy was calculated. This measure, determined for a set of attributes, assesses how much information (or uncertainty) these features contain. A high value indicates that the attribute (or set of features) is more diverse and potentially more informative [38]. In the context of data analysis, the entropy of a set of features $A = \{a_1, \ldots, a_n\}$ is defined by the formula:
$$H(A) = -\sum_{i=1}^{n} p(a_i) \log_2 p(a_i),$$
where $p(a_i) = \frac{w_i}{\sum_{j=1}^{n} w_j}$, and $w_i$ is a weight assigned to a given attribute. The maximum entropy $H_{max}$ for an $n$-element set is $\log_2 n$, and it is achieved when all elements have equal probability $\frac{1}{n}$. A small difference $H_{max} - H(A)$ suggests that the features are highly informative.
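As a small worked sketch of the formula above, the following code normalises attribute weights into probabilities and computes H(A) together with the difference between the maximum entropy and H(A). The function names and the example weights are illustrative assumptions; which weight component enters the calculation in the experiments is not specified here.

```python
from math import log2


def ranking_entropy(weights):
    """H(A) for attribute weights w_i, with p(a_i) = w_i / sum_j w_j."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]  # zero weights contribute nothing
    return -sum(p * log2(p) for p in probs)


def entropy_gap(weights):
    """H_max - H(A), where H_max = log2(n) for n attributes."""
    return log2(len(weights)) - ranking_entropy(weights)


# Hypothetical weights: equal weights reach H_max, skewed weights do not
uniform_gap = entropy_gap([1, 1, 1, 1])
skewed_gap = entropy_gap([8, 1, 1])
```

With equal weights the gap is zero, and it grows as the weight mass concentrates on fewer attributes.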

2.3. Selected Classifiers

Two different types of classifiers were used to verify the proposed research procedure: gradient boosting and neural networks.
Gradient Boosting is one of the most popular and efficient machine learning techniques, particularly used in regression and classification tasks [39]. The method is based on the idea of iteratively building an ensemble of models in the form of decision trees in such a way that each successive model corrects the errors of its predecessor. Thus, gradient boosting achieves high accuracy and is widely used in the field of data analysis and artificial intelligence. The cost function in gradient boosting is a very important element that determines how the model learns from the data. Log-loss and cross-entropy are the most popular in classification due to their ability to handle probabilistic predictions and penalise errors in a way that is proportional to the confidence of the predictions [40]. Logarithmic Loss (Log-Loss) is the most commonly used cost function for binary and multi-class classification tasks [41]. The name of the “gradient boosting” method derives from the use of the gradient of the cost function, whose negative gives the direction of steepest decrease, for optimisation [42]. In the research performed, the method was implemented using the XGBoost (eXtreme Gradient Boosting) library.
The second classifier used in the verification of the proposed methodology is based on implicit knowledge representation, i.e., a neural network in the form of a multi-layer perceptron, or MLP [43]. It is a versatile supervised learning algorithm used for classification and regression tasks. It is constructed of multiple interconnected layers of artificial neurons, where each layer transforms the input data through non-linear activation functions, which the operator can select manually. MLPs are trained by iteratively adjusting the weights between neurons to minimize a loss function, effectively learning the underlying patterns within the data for classification purposes. The loss function can be either preset or selected automatically, depending on the operator’s needs.

3. Framework of Multi-Ranking Construction

In this section, detailed descriptions of all steps of the proposed research methodology and performed tasks are presented.

3.1. Framework of Developed Methodology

The research conducted within the framework of the proposed approach includes the following steps:
  • Data preparation;
  • Induction of reducts;
  • Construction of sets of subtables and data scattering;
  • Local rankings construction based on the decision rules induced by the greedy algorithm for each set $k$, $k = 1, \ldots, 9$, with $i$ subtables, $i = 2, \ldots, 10$;
  • Intermediate ranking construction, for each set k;
  • Global ranking construction;
  • Verification of the importance of attributes at the intermediate and global levels using a backward elimination approach driven by the attribute’s ranking position and using gradient boosting and multi-layer perceptron methods;
  • Analysis and comparison of obtained results.
Figure 1 presents a general overview of the developed methodology applied to distributed data.

3.2. Data Description and Preparation

The Predict Students Dropout and Academic Success dataset [22] consists of anonymised data collected from a higher education institution, i.e., Polytechnic University of Portalegre, Portugal. It is designed to predict student dropout risk and academic performance based on various attributes of students. The dataset includes 37 features, including a class label, both categorical and numerical, representing various factors such as (i) demographics (for example, age, gender, nationality), (ii) academic information (for example, grades, GPA, education), and (iii) socio-economic factors (such as scholarship holder, tuition fees up to date, parental education level, and others). There are three values for the class label: “Dropout”, “Enrolled”, and “Graduate”, which refer to the student’s status at the end of the normal term. The dataset consists of 4424 rows, and for the purpose of experiments and validation of the proposed approach, it was divided into a training part (70%) and a testing part (30%). Table 1 presents information about the attributes included in the dataset, i.e., column Id contains the code of the attribute (which will be used later), and column Attribute contains the name of the attribute.
The values of five attributes in the set were discretised, namely: a7—previous qualification (grade), a13—admission grade, a20—age at enrollment, a26—curricular units 1st sem (grade), and a32—curricular units 2nd sem (grade). For this purpose, the Fayyad and Irani algorithm [44] was used as a supervised discretisation method with default settings available in the WEKA software [45].

3.3. Reducts and Data Distribution

In rough set theory, reducts are a popular feature selection method belonging to the group of filter category algorithms. There are different types of reducts, different definitions depending on the adopted criteria, and different algorithms for reduct construction [23,26]. From a classification perspective, a reduct is a minimal subset of attributes that has the same power to distinguish objects with different class labels as the full set of attributes. A reduct can also be defined as a minimal set of attributes that preserves the degree of dependency on the full set of attributes. The problem of finding different versions of reducts in data is NP-hard [37], so heuristic approaches are often used.
In the paper, reducts were constructed by using the genetic algorithm implemented in the RSES (i.e., Rough Set Exploration System) [46]. This algorithm enables the construction of a sufficiently large number of reducts within a reasonable timeframe [47]. It utilizes a binary genetic algorithm, incorporating traditional binary operators such as mutation and crossover, along with the “roulette wheel” selection method. The computation process has been optimized using an additional structure known as the “discernibility matrix” [48]. This is a binary matrix where each column represents an attribute and each row corresponds to a pair of distinct objects. If an attribute has different values for a pair of objects, a value of 1 is placed at the intersection of the corresponding column and row. Finding a reduct involves identifying the smallest subset of columns that covers the entire matrix.
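The discernibility matrix described above can be sketched as follows. This is a simplified illustration, not the RSES implementation: the function names and the toy table are assumptions, every pair of distinct objects gets a row as in the description, and pairs that agree on all attributes are skipped because no column could cover them.

```python
from itertools import combinations


def discernibility_matrix(rows):
    """Binary matrix: one line per pair of objects, one column per attribute.

    An entry is 1 when the two objects differ on that attribute.
    """
    matrix = []
    for r1, r2 in combinations(rows, 2):
        line = [1 if v1 != v2 else 0 for v1, v2 in zip(r1, r2)]
        if any(line):  # assumption: skip pairs that are identical on all attributes
            matrix.append(line)
    return matrix


def covers(matrix, columns):
    """True if every line of the matrix has a 1 in at least one chosen column."""
    return all(any(line[c] for c in columns) for line in matrix)


# Hypothetical toy table with three attributes
rows = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
matrix = discernibility_matrix(rows)
single_column_covers = covers(matrix, [1])
two_columns_cover = covers(matrix, [0, 1])
```

In this toy table no single attribute discerns all pairs, while attributes 0 and 1 together do, so a minimal covering set of columns has two elements.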
In the framework of the performed experiments, 100 reducts were induced for the considered dataset by the genetic algorithm. From this set, 54 were selected for the construction of subtables and divided into sets $k$, $k = 1, \ldots, 9$. Table 2 presents the characteristics of the obtained reducts, i.e., their lengths per set $k$. All reducts generated by the genetic algorithm contain between 10 and 13 attributes, which represents a relatively large percentage reduction in the number of attributes relative to the full set of features.
Table 3 presents all attributes included in reducts from the set $k = 9$. For all ten reducts, the first three attributes are the same, and then the number of different attributes increases with the length of the reduct.
Based on the reducts from the input data table, subtables were created in such a way that each subtable contains only the columns corresponding to the attributes present in the given reduct. Table 4 describes the characteristics of the obtained subtables, i.e., the number of rows after removing duplicates (row rows) and the number of columns (row attr), for sets $k$, $k = 1, \ldots, 9$. Columns of Table 4 are labeled by numbers from 2 to 10, which correspond to the number of subtables in the set $k$. These are average values.
In the subtables created, the number of columns corresponds to the cardinalities of reducts presented in Table 2. The number of rows corresponds to the training part of the dataset and is about 70% of the rows from the entire dataset.
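Creating a subtable from a reduct amounts to projecting the table onto the reduct's columns and dropping duplicate rows. A minimal sketch follows; the function name, the use of column indices for attributes, and the choice to treat two rows as duplicates only when both the projected values and the class label agree are assumptions made for illustration.

```python
def build_subtable(rows, decisions, reduct):
    """Project rows onto the attribute indices in `reduct` and remove duplicates."""
    seen = set()
    sub_rows, sub_decisions = [], []
    for row, d in zip(rows, decisions):
        projected = tuple(row[a] for a in reduct)
        key = (projected, d)  # assumption: the class label is part of the duplicate check
        if key not in seen:
            seen.add(key)
            sub_rows.append(projected)
            sub_decisions.append(d)
    return sub_rows, sub_decisions


# Hypothetical toy table; the "reduct" keeps attributes 0 and 1
rows = [(0, 0, 1), (0, 1, 1), (0, 0, 0)]
decisions = ["A", "B", "A"]
sub_rows, sub_decisions = build_subtable(rows, decisions, [0, 1])
```

After projection, the first and third rows become identical and only one copy is kept, which is why the number of rows in a subtable can be lower than in the training part.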

3.4. Importance of Attributes at Intermediate and Global Levels

For all 54 subtables, local rankings based on decision rules induced by the greedy algorithm were constructed. For each subtable, a set of unique decision rules was induced, and then the weights of attributes were calculated, taking into account the number of rules in which the attribute appears, $w_{a,1}$. If two or more attributes have the same value, then the number of separated rows with different class labels, $w_{a,2}$, was calculated as the ratio of the rows separated by the given attribute-value pair and decision class to the total number of objects in the set with a different decision. An attribute with a higher ratio was assigned a higher rank. If there were attributes with the same ratio values, then the maximum rule support among the rules for which the ratio of separated rows is maximal, $w_{a,3}$, was taken into account.
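The three-stage tie-breaking described above corresponds to a lexicographic ordering of the weight triples. The sketch below assumes the three weights have already been computed for each attribute; the function name and the numeric values are hypothetical and serve only to show the ordering.

```python
def local_ranking(weights):
    """Order attributes from most to least important.

    weights: dict mapping attribute id -> (w1, w2, w3), where
      w1 = number of rules containing the attribute,
      w2 = ratio of separated rows with a different decision,
      w3 = maximum rule support.
    Python compares tuples lexicographically, so w2 is used only when
    w1 ties, and w3 only when both w1 and w2 tie.
    """
    return sorted(weights, key=lambda a: weights[a], reverse=True)


# Hypothetical weights for four attributes
weights = {
    "a2": (5, 0.4, 10),
    "a4": (5, 0.6, 8),
    "a1": (3, 0.9, 20),
    "a7": (5, 0.6, 12),
}
ranking = local_ranking(weights)
```

Here a7 and a4 tie on the first two weights, so the maximum rule support decides between them, while a1 is last despite its high ratio because it appears in fewer rules.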
Considering the way the data are distributed, i.e., into k sets, each containing i subtables, higher-level rankings, called intermediate rankings RI, were constructed based on local rankings. Each intermediate ranking is assigned to the set k, and they were created using i local rankings. Both intermediate and global level rankings follow the principle of the number of occurrences of attributes in the lower-level rankings.
Let $W_{L_a}(n)$ be the weight of attribute $a$ in the local ranking $n$, represented as a vector:
$$W_{L_a}(n) = \left[ w_{a,1}(n),\; w_{a,2}(n),\; w_{a,3}(n) \right].$$
For $i$ local rankings in the set $k$, the weight $W_{I_a}(k)$ of attribute $a$ in the intermediate ranking $k$ is defined as:
$$W_{I_a}(k) = \max_{n=1}^{i} W_{L_a}^{j}(n), \quad j = 1, 2, 3,$$
where $j$ indexes the components of the weight vector.
Let $w_{a,4}(k)$ denote the number of occurrences of a given attribute $a$ in the set $k$ with $i$ local rankings:
$$w_{a,4}(k) = \sum_{n=1}^{i} \mathbb{1}\left(a \in W_{L_a}(n)\right),$$
where $\mathbb{1}(a \in W_{L_a})$ is an indicator function that equals 1 if attribute $a$ is present in the given ranking and 0 otherwise.
Similarly, for $k$ intermediate rankings, the global weight $W_{G_a}$ of attribute $a$ in the global ranking is defined as:
$$W_{G_a} = \max_{s=1}^{k} W_{I_a}^{j}(s), \quad j = 1, 2, 3,$$
and respectively
$$w_{a,4}(G) = \sum_{s=1}^{k} \mathbb{1}\left(a \in W_{I_a}(s)\right).$$
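The aggregation from local to intermediate rankings, and from intermediate to global, can be sketched as below. This is one possible reading rather than the authors' implementation: the maximum is taken separately for each of the three weight components, attributes are ordered first by their number of occurrences and then by the aggregated weights, and all names and numbers are hypothetical.

```python
def aggregate(rankings):
    """Combine lower-level rankings into a higher-level one.

    rankings: list of dicts, attribute id -> (w1, w2, w3)
    Returns a dict, attribute id -> (w4, (max w1, max w2, max w3)),
    where w4 counts the lower-level rankings containing the attribute.
    """
    result = {}
    for ranking in rankings:
        for a, w in ranking.items():
            count, best = result.get(a, (0, w))
            # Assumption: the maximum is taken component by component
            result[a] = (count + 1, tuple(max(x, y) for x, y in zip(best, w)))
    return result


def order(aggregated):
    """Sort by occurrence count first, then by the aggregated weight vector."""
    return sorted(aggregated, key=lambda a: aggregated[a], reverse=True)


# Hypothetical local rankings for two sets
set_1 = [{"a2": (5, 0.4, 10), "a4": (3, 0.9, 7)}, {"a2": (4, 0.5, 12)}]
set_2 = [{"a4": (6, 0.3, 9)}, {"a4": (2, 0.8, 5), "a1": (1, 0.2, 3)}]

intermediate = [aggregate(set_1), aggregate(set_2)]
# Strip the counts so that the same function can build the global level
global_weights = aggregate([{a: w for a, (_, w) in r.items()} for r in intermediate])
global_order = order(global_weights)
```

At the global level, the count for each attribute refers to the number of intermediate rankings in which it appears, not to the number of local rankings.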
The general scheme for creating rankings is presented in Figure 2.
Table 5 presents the attributes appearing in the intermediate and global rankings. Column Nr indicates the position of the attribute in the ranking. Columns from RI_2 to RI_10 present the attributes that make up the intermediate rankings, and the number from 2 to 10 corresponds to the number of local rankings in the set $k$. Column RG denotes the global ranking.
On the basis of the results obtained, it can be seen that three attributes, namely a8—nationality, a15—educational special needs, and a21—international, did not feature in any of the intermediate rankings. The rankings presented in Table 5 contain a varying number of attributes, and usually, this number increases as the number of local rankings in set $k$ increases. The global ranking contains 33 attributes, and the 10 with the highest weights are: course, application mode, inflation rate, mother’s qualification, mother’s occupation, application order, father’s qualification, curricular units 2nd sem (grade), previous qualification (grade), and curricular units 1st sem (grade). The lowest positions are assigned to the attributes: admission grade, daytime/evening attendance, debtor, and curricular units 1st sem (without evaluations).
Figure 3 presents, for each attribute, the number of local rankings constructed by the greedy algorithm in which the attribute occurs.
It can be seen that the attribute a2 is present in all local rankings, while the attribute a4 is present in 45 rankings out of 54; however, in the global ranking (see Table 5), a4 is in the first position. It should be noted that this figure shows the frequency of occurrence of the attributes in the rankings and not the weights, which determine the order of the attributes, as shown in Table 5 for intermediate and global levels. Attributes a8, a15, and a21 did not appear in any of the local rankings, so they are not presented in Figure 3; they also do not appear in the intermediate and global levels.
Table 6 presents the differences between $H_{max}$ and $H(A)$ for the intermediate and global rankings.
This visualisation is also shown in Figure 4.
The biggest difference is visible for RI_10, RI_9, and RI_8, which means that the data in the set are diverse; this may indicate greater informational value. Taking into account the number of local rankings on which the intermediate rankings are based, it can be observed that as the number of local rankings increases, the difference between $H_{max}$ and $H(A)$ usually increases as well, which indicates that the data is structured and informative.
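As a rough illustration of the quantity reported in Table 6, the sketch below computes the gap between a maximum entropy and a Shannon entropy. It rests on assumptions that are not confirmed by this section: that $H(A)$ is the Shannon entropy of a discrete distribution obtained by normalising non-negative weights, and that $H_{max}$ is $\log_2$ of the number of items. The function names and the input weights are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_gap(weights):
    """H_max - H(A) for weights normalised to a probability distribution (assumed definition)."""
    total = sum(weights)
    probs = [w / total for w in weights]
    h_max = math.log2(len(probs))  # entropy of the uniform distribution
    return h_max - shannon_entropy(probs)


# A uniform distribution has no gap; a skewed one has a positive gap.
uniform_gap = entropy_gap([1, 1, 1, 1])
skewed_gap = entropy_gap([3, 1])
print(uniform_gap, skewed_gap)
```

Under these assumptions, a larger gap means the distribution is further from uniform.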

4. Experimental Results

This section presents the performance of classifiers obtained based on intermediate and global rankings.
Training sets for the construction of classifiers were created in the form of subtables containing the attributes corresponding to the attributes included in the given ranking at the intermediate and global levels, respectively. For the constructed classification models, a backward elimination technique was applied, driven by the attributes included in a given ranking. Starting with the attribute in the lowest position in the ranking, the number of attributes in the subtable was sequentially reduced by removing them, and the classification accuracy was evaluated on the test set. This process was repeated until the accuracy was relatively low or the attributes were exhausted. To evaluate the effectiveness of a model, the accuracy of classification was used. It indicates the number of correctly classified objects relative to all objects in the test set.
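The ranking-driven backward elimination described above can be sketched as follows. This is a simplified illustration, not the code used in the experiments: it always runs until the attributes are exhausted, and `toy_evaluate` is a hypothetical stand-in for training a classifier on the selected columns and scoring it on the test set.

```python
def accuracy(y_true, y_pred):
    """Share of correctly classified objects among all objects in the test set."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def backward_elimination(ranking, evaluate):
    """Drop attributes from the bottom of the ranking one at a time.

    Returns a list of (number_of_attributes, accuracy) pairs, starting with
    the full ranking and ending with the single top-ranked attribute.
    """
    results = []
    for n in range(len(ranking), 0, -1):
        subset = ranking[:n]  # keep the n highest-ranked attributes
        results.append((n, evaluate(subset)))
    return results


def toy_evaluate(subset):
    """Hypothetical evaluator; a real one would fit a model and call accuracy()."""
    return len(subset) / 10


ranking = ["a4", "a2", "a35", "a9"]  # first four positions of RG in Table 5
curve = backward_elimination(ranking, toy_evaluate)
print(curve)
```

Passing the evaluator as a callback keeps the elimination loop independent of the classifier, so the same loop could serve both model types.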
The classification accuracy of a model trained on the dataset containing all attributes from the ranking is treated as a reference point in the local context; the accuracy of the model trained on the dataset containing all 36 attributes is considered the reference point in the global context.
In the research, XGBoost and MLP algorithms were used for the construction of classifiers. Default parameters were used for the XGBoost classifier. The MLP classifier was used with the following parameters: random_state = 3, max_iter = 1000; the remaining parameters were kept at the defaults of the sklearn implementation. It is important to note that the maximum number of iterations in this case is the number of iterations at which the solver stops if convergence has not been reached earlier.
The computational complexity of the main algorithms, with $m$ as the number of attributes and $n$ as the number of instances, is as follows: the greedy algorithm for the induction of decision rules: $O(m \cdot n^2)$; XGBoost for the construction of the model: $O(n \cdot \log(n) \cdot m)$; MLP for the construction of the model: $O(m \cdot n)$.
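For orientation, the MLP settings named above correspond to the following scikit-learn call. This is a configuration sketch only; the variable name `mlp` is an assumption, and no data loading or training is shown.

```python
from sklearn.neural_network import MLPClassifier

# Only the two parameters named in the text are set; everything else
# stays at scikit-learn's defaults.
mlp = MLPClassifier(random_state=3, max_iter=1000)

# max_iter is an upper bound: the solver stops earlier if it converges.
params = mlp.get_params()
print(params["random_state"], params["max_iter"])

# The XGBoost model is described as using default parameters, which would
# correspond to xgboost.XGBClassifier() with no arguments (not imported here).
```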
Table 7 presents reference values in the global context for XGBoost and MLP. In this case, XGBoost allows us to obtain slightly better results than MLP.
Table 8 and Table 9 present the accuracy of classifiers at the intermediate and global levels. The results in Table 8 relate to XGBoost classifiers and those in Table 9 to MLP classifiers. In both tables, column Nr indicates the attribute’s position in the ranking. Columns RI_2 to RI_10 present the accuracy related to the intermediate rankings; column RG denotes the global ranking. The values presented in bold indicate accuracy that is equal to or greater than the reference value in the local context, i.e., the accuracy obtained by taking into account all attributes included in the given ranking.
Among the intermediate rankings, the biggest improvement is seen for the RI_7 ranking, where 16 attributes instead of 23 are enough to achieve a higher classification accuracy. This number of attributes represents almost 50% of the attributes in the entire dataset. Improvements are also visible for rankings RI_2, RI_3, and RI_6. Considering the global ranking, the highest classification accuracy, surpassing the local reference point, was obtained using only 26 attributes instead of 33. The performance of the classifier at this level is the same as at position 26 in ranking RI_8; however, the local reference point was not exceeded in that case. Overall, the obtained results demonstrate a trend where, for attributes occupying the top positions in the rankings, the accuracy of classification generally increases as the number of attributes grows.
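The comparison against the local reference point can be reproduced from one column of Table 8. The sketch below uses the RG column of the XGBoost results; the variable names are illustrative only.

```python
# Accuracy of XGBoost on the global ranking RG (Table 8), positions 1..33.
rg_xgb = [
    0.528, 0.555, 0.562, 0.547, 0.553, 0.560, 0.556, 0.652, 0.675, 0.681,
    0.676, 0.682, 0.684, 0.686, 0.686, 0.706, 0.734, 0.755, 0.765, 0.770,
    0.774, 0.766, 0.773, 0.773, 0.770, 0.782, 0.781, 0.779, 0.776, 0.772,
    0.774, 0.769, 0.774,
]

# Local reference point: accuracy with all attributes of the ranking.
reference = rg_xgb[-1]

# Smaller subsets (by number of attributes kept) that match or beat the reference.
at_least_reference = [
    n for n, acc in enumerate(rg_xgb[:-1], start=1) if acc >= reference
]

# Number of attributes at which the accuracy peaks.
best_n = max(range(1, len(rg_xgb) + 1), key=lambda n: rg_xgb[n - 1])

print(at_least_reference, best_n, rg_xgb[best_n - 1])
```

The peak at 26 attributes with an accuracy of 0.782 agrees with the value discussed in the text.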
Table 9 presents the accuracy of MLP classifiers at the intermediate and global levels.
The results in Table 9 are much more varied than in Table 8, as confirmed by the visualizations in Figure 5 and Figure 6. It can be seen that the results obtained for the neural network-based classifier show that for each intermediate and global ranking, an accuracy of classification equal to or greater than the reference point in the local context can be indicated for fewer attributes than the number of attributes in the ranking. In the case of rankings RI_6, RI_7, RI_8, a single attribute in the table is sufficient; however, note the relatively low classification value obtained for the full set of attributes, especially in ranking RI_6. In the case of the global ranking, instead of 33, 15 attributes in the table are sufficient to obtain a classification accuracy above the local reference value, which is 0.575.
Figure 5 shows the accuracy of the XGBoost classifier for each intermediate ranking with the $j$ last features of that ranking removed. It can be interpreted as a visual representation of Table 8. Each line and color represents a subset of $n_k - j$ attributes, where for each intermediate ranking $n_k = |RI\_k|$, $k \in [2, 10]$, and $j \in [1, n_k - 1]$. The line labelled “accuracy” shows the classifier performance on all attributes of the $k$-th ranking, in other words, the “base” performance for each intermediate ranking.
Figure 6 is a visual representation of Table 9 and shows the accuracy of the MLP classifier for each intermediate ranking, where “accuracy” represents the classifier performance on all attributes of each $RI\_k$.
Taking into account the classification accuracy of MLP and XGBoost, the low results obtained by the MLP model are related to the use of a single model, where boosting and the gradient approach were omitted. It should also be noted that in the case of the MLP, the investigation into the provision of hidden layers, nodes, and hyperparameters was omitted, as they are not the subject of this article.
The experiments were conducted using Google Colab, a cloud-based platform that provides a Jupyter Notebook environment with access to GPU and TPU acceleration. The software environment included Python 3.13, along with libraries such as NumPy version 2.1.2, Pandas version 2.2.3, Scikit-learn version 1.5.2, and xgb version 2.1.2.

5. Summary of Results

In this section, the best subsets of features for each ranking are presented for both classifiers, XGBoost and MLP. To keep the structure intact, we start with the results obtained with intermediate rankings, outlining the best performing subsets for each ranking, followed by the performance levels obtained for the global ranking.
Table 10 presents the improvement (if any) for each intermediate and global ranking with both classifiers used in this research. We can observe a tendency for MLP to severely underperform compared to XGBoost on the full set of features per ranking, leading to smaller subset selections for all rankings. However, only a few came close to the performance level of XGBoost, which directly points to the advantage of gradient boosting tree-based classifiers when applied to the selected dataset. In this table, acc. denotes the reference accuracy in the local context for each $RI\_k$, and $n_k - j$ feat. denotes the number of last attributes removed from the ranking at which the maximum accuracy, denoted as max(acc.), was obtained. The absolute difference in accuracy between acc. and max(acc.) is denoted as $\Delta$acc.
In the context of the classifiers and the dataset used, one can observe that only XGBoost yielded measurable improvement in performance on both intermediate and global rankings, with MLP not being able to keep up. This is clearly visible in Figure 7, which shows a side-by-side comparison of the performance of XGBoost and MLP on the global ranking across each $n_k - j$ subset. In this figure, the notation $n_k - j$ is replaced with the actual position of the attribute in the global ranking RG.
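The entries of Table 10 can be derived from a single accuracy column. As a worked example, the sketch below recomputes the MLP entry for the global ranking from the RG column of Table 9; the helper name `summarize` is hypothetical.

```python
# Accuracy of MLP on the global ranking RG (Table 9), positions 1..33.
rg_mlp = [
    0.483, 0.511, 0.481, 0.535, 0.486, 0.486, 0.531, 0.356, 0.530, 0.340,
    0.383, 0.495, 0.340, 0.489, 0.606, 0.650, 0.419, 0.493, 0.483, 0.340,
    0.377, 0.610, 0.684, 0.492, 0.616, 0.701, 0.715, 0.344, 0.703, 0.489,
    0.690, 0.377, 0.575,
]


def summarize(column):
    """Return (acc, j, max_acc, delta) for one ranking column.

    acc     - accuracy with all attributes of the ranking (last entry)
    j       - number of lowest-ranked attributes removed at the maximum
    max_acc - highest accuracy in the column
    delta   - absolute difference between max_acc and acc, rounded to 3 places
    """
    acc = column[-1]
    best_index = max(range(len(column)), key=column.__getitem__)
    j = len(column) - (best_index + 1)
    max_acc = column[best_index]
    return acc, j, max_acc, round(abs(max_acc - acc), 3)


print(summarize(rg_mlp))
```

The result corresponds to the RG entry for MLP in Table 10: a reference accuracy of 0.575, the maximum of 0.715 after removing the last 6 attributes, and a difference of 0.14.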
Figure 8 shows the number of attributes in the intermediate rankings and the global rankings that achieved a classification accuracy equal to or greater than the reference point in the global context, i.e., 0.773, in the case of the XGBoost classifier.
In order to obtain a classification accuracy of at least the reference level, 21 attributes, instead of 36, in the rankings RI_6 and RG are sufficient, while the greatest difference from the reference value was obtained in the ranking RI_8, for the whole set of attributes, i.e., for 28 out of 36. Unfortunately, the results obtained for the MLP classifiers are much more divergent than in the case of XGBoost and did not exceed the global reference value.

6. Conclusions

In the paper, a research methodology for ranking construction from distributed data was proposed. The main contributions consist of (i) a data distribution approach based on reducts as a feature selection method, (ii) procedures for ranking construction at different levels, i.e., 54 rankings at the local level, 9 at the intermediate level, and 1 at the global level, and (iii) a method for ranking verification using a backward elimination strategy driven by attributes included in the rankings. Two different types of classifiers were used, related to implicit (MLP) and explicit (XGBoost) knowledge representation.
For the analysed dataset, it was observed that when employing the XGBoost classifier, even the attributes positioned at the lower end of the ranking contribute meaningfully to the construction of the classification model. In contrast, the application of the MLP classifier does not exhibit such a pattern, with the classification results demonstrating greater variability and less consistent reliance on the ranking order of attributes. It should be noted that in many cases, improvements were visible, which allowed for a reduction in the number of attributes and greater classification accuracy than the referenced values at the intermediate and global levels. The extensive experiments performed show the merit of the proposed methodology for multi-ranking construction. In the future, other classifiers with knowledge representation properties will be studied, and other strategies for ranking construction in a framework of dispersed data will be investigated.

Author Contributions

Conceptualization, B.Z.; methodology, B.Z., K.J. and A.D.; software, K.J. and A.D.; validation, B.Z., K.J. and A.D.; formal analysis, B.Z. and K.J.; investigation, B.Z.; data curation, B.Z. and K.J.; writing—original draft preparation, B.Z., K.J. and A.D.; writing—review and editing, B.Z. and K.J.; visualization, B.Z., K.J. and A.D.; supervision, B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Datasets used during the experiments are downloaded from UCI Machine Learning Repository https://archive.ics.uci.edu (accessed on October 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Manss, C.; Shutin, D. Global-Entropy Driven Exploration with Distributed Models Under Sparsity Constraints. Appl. Sci. 2018, 8, 1722. [Google Scholar] [CrossRef]
  2. Moshkov, M.; Zielosko, B.; Tetteh, E.T. Selected Data Mining Tools for Data Analysis in Distributed Environment. Entropy 2022, 24, 1401. [Google Scholar] [CrossRef]
  3. Guste, R.R.A.; Ong, A.K.S. Machine Learning Decision System on the Empirical Analysis of the Actual Usage of Interactive Entertainment: A Perspective of Sustainable Innovative Technology. Computers 2024, 13, 128. [Google Scholar] [CrossRef]
  4. Denham, B.; Pears, R.; Naeem, M.A. HDSM: A distributed data mining approach to classifying vertically distributed data streams. Knowl.-Based Syst. 2020, 189, 105114. [Google Scholar] [CrossRef]
  5. Cover, T.M.; Thomas, J.A. Entropy, Relative Entropy and Mutual Information. In Elements of Information Theory; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2006; pp. 12–49. [Google Scholar] [CrossRef]
  6. Liu, H.; Motoda, H. Computational Methods of Feature Selection; Chapman & Hall/Crc Data Mining and Knowledge Discovery Series; Chapman & Hall/CRC: Boca Raton, FL, USA, 2007. [Google Scholar]
  7. Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L. (Eds.) Feature Extraction: Foundations and Applications; Studies in Fuzziness and Soft Computing; Physica-Verlag, Springer: Heidelberg, Germany, 2006; Volume 207. [Google Scholar]
  8. Stanczyk, U.; Zielosko, B.; Jain, L.C. (Eds.) Advances in Feature Selection for Data and Pattern Recognition: An Introduction. In Advances in Feature Selection for Data and Pattern Recognition; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2018; Volume 138, pp. 1–9. [Google Scholar] [CrossRef]
  9. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Recent advances and emerging challenges of feature selection in the context of big data. Knowl.-Based Syst. 2015, 86, 33–45. [Google Scholar] [CrossRef]
  10. Holloway, R.; Ho, D.; Delotavo, C.; Xie, W.Y.; Rahimi, I.; Nikoo, M.R.; Gandomi, A.H. Optimal location selection for a distributed hybrid renewable energy system in rural Western Australia: A data mining approach. Energy Strategy Rev. 2023, 50, 101205. [Google Scholar] [CrossRef]
  11. Liu, W.; Wang, J. Recursive elimination current algorithms and a distributed computing scheme to accelerate wrapper feature selection. Inf. Sci. 2022, 589, 636–654. [Google Scholar] [CrossRef]
  12. Qin, L.; Wang, X.; Yin, L.; Jiang, Z. A distributed evolutionary based instance selection algorithm for big data using Apache Spark. Appl. Soft Comput. 2024, 159, 111638. [Google Scholar] [CrossRef]
  13. Bolón-Canedo, V.; Sechidis, K.; Sánchez-Maroño, N.; Alonso-Betanzos, A.; Brown, G. Insights into distributed feature ranking. Inf. Sci. 2019, 496, 378–398. [Google Scholar] [CrossRef]
  14. Petković, M.; Ceci, M.; Pio, G.; Škrlj, B.; Kersting, K.; Džeroski, S. Relational tree ensembles and feature rankings. Knowl.-Based Syst. 2022, 251, 109254. [Google Scholar] [CrossRef]
  15. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2017, 50, 1–45. [Google Scholar] [CrossRef]
  16. Mafarja, M.M.; Mirjalili, S. Hybrid Whale Optimization Algorithm with simulated annealing for feature selection. Neurocomputing 2017, 260, 302–312. [Google Scholar] [CrossRef]
  17. Xu, H.; Ma, S.; Wang, W. An ordered feature recognition method based on ranking separability. Inf. Sci. 2023, 648, 119518. [Google Scholar] [CrossRef]
  18. Pekala, B.; Szkola, J.; Grochowalski, P.; Gil, D.; Kosior, D.; Dyczkowski, K. A Novel Method for Human Fall Detection Using Federated Learning and Interval-Valued Fuzzy Inference Systems. J. Artif. Intell. Soft Comput. Res. 2025, 15, 77–90. [Google Scholar] [CrossRef]
  19. Zielosko, B.; Stanczyk, U.; Jablonski, K. Construction of Features Ranking—Global Approach. In Proceedings of the Harnessing Opportunities: Reshaping ISD in the Post-COVID-19 and Generative AI Era (ISD2024 Proceedings), Gdańsk, Poland, 26–28 August 2024; Marcinkowski, B., Przybylek, A., Jarzebowicz, A., Iivari, N., Insfrán, E., Lang, M., Linger, H., Schneider, C., Eds.; University of Gdańsk/Association for Information Systems: Gdańsk, Poland, 2024. [Google Scholar]
  20. Pawlak, Z.; Skowron, A. Rudiments of rough sets. Inf. Sci. 2007, 177, 3–27. [Google Scholar] [CrossRef]
  21. Artiemjew, P.; Ropiak, K. A Novel Ensemble Model—The Random Granular Reflections. Fundam. Inform. 2021, 179, 183–203. [Google Scholar] [CrossRef]
  22. Realinho, V.; Martins, M.V.; Machado, J.; Baptista, L. Predict Students’ Dropout and Academic Success; UCI Machine Learning Repository: Noida, India, 2021. [Google Scholar] [CrossRef]
  23. Zielosko, B.; Piliszczuk, M. Greedy Algorithm for Attribute Reduction. Fundam. Inform. 2008, 85, 549–561. [Google Scholar] [CrossRef]
  24. Andrade, L.A.C.; Cunha, C.B. Disaggregated retail forecasting: A gradient boosting approach. Appl. Soft Comput. 2023, 141, 110283. [Google Scholar] [CrossRef]
  25. Akama, S. Neural Networks. In Artificial Life: How to Create a Life Computationally; Springer Nature: Cham, Switzerland, 2024; pp. 31–51. [Google Scholar] [CrossRef]
  26. Grzegorowski, M.; Ślȩzak, D. On resilient feature selection: Computational foundations of r-C-reducts. Inf. Sci. 2019, 499, 25–44. [Google Scholar] [CrossRef]
  27. Matusiewicz, Z.; Mroczek, T. Attribute reduction method based on fuzzy relational equations and inequalities. Int. J. Approx. Reason. 2025, 178, 109355. [Google Scholar] [CrossRef]
  28. da Costa, N.L.; de Lima, M.D.; Barbosa, R. Analysis and improvements on feature selection methods based on artificial neural network weights. Appl. Soft Comput. 2022, 127, 109395. [Google Scholar] [CrossRef]
  29. Zielosko, B. Application of Dynamic Programming Approach to Optimization of Association Rules Relative to Coverage and Length. Fundam. Inform. 2016, 148, 87–105. [Google Scholar] [CrossRef]
  30. Stanczyk, U. Pruning Decision Rules by Reduct-Based Weighting and Ranking of Features. Entropy 2022, 24, 1602. [Google Scholar] [CrossRef]
  31. Estevez, P.A.; Tesmer, M.; Perez, C.A.; Zurada, J.M. Normalized Mutual Information Feature Selection. IEEE Trans. Neural Netw. 2009, 20, 189–201. [Google Scholar] [CrossRef] [PubMed]
  32. Vinh, L.T.; Lee, S.; Park, Y.T.; d’Auriol, B.J. A novel feature selection method based on normalized mutual information. Appl. Intell. 2012, 37, 100–120. [Google Scholar] [CrossRef]
  33. Zielosko, B.; Żabiński, K. Optimization of Decision Rules Relative to Length Based on Modified Dynamic Programming Approach. In Advances in Feature Selection for Data and Pattern Recognition; Intelligent Systems Reference Library; Stańczyk, U., Zielosko, B., Jain, L.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; Volume 138, pp. 73–93. [Google Scholar] [CrossRef]
  34. Teisseyre, P. Feature ranking for multi-label classification using Markov networks. Neurocomputing 2016, 205, 439–454. [Google Scholar] [CrossRef]
  35. Garchery, M.; Granitzer, M. On the influence of categorical features in ranking anomalies using mixed data. Procedia Comput. Sci. 2018, 126, 77–86. [Google Scholar] [CrossRef]
  36. Paja, W. Application of the Fuzzy Approach for Evaluating and Selecting Relevant Objects, Features, and Their Ranges. Entropy 2023, 25, 1223. [Google Scholar] [CrossRef]
  37. Moshkov, M.J.; Piliszczuk, M.; Zielosko, B. Greedy Algorithm for Construction of Partial Association Rules. Fundam. Inform. 2009, 92, 259–277. [Google Scholar] [CrossRef]
  38. Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  39. Adler, A.I.; Painsky, A. Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection. Entropy 2022, 24, 687. [Google Scholar] [CrossRef] [PubMed]
  40. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of Gradient Boosting Algorithms. In Artificial Intelligence Review; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1937–1967. [Google Scholar] [CrossRef]
  41. Baran, K. ShuffleNet and XGBoost classifier for stress reactions detection. Procedia Comput. Sci. 2024, 246, 3771–3780. [Google Scholar] [CrossRef]
  42. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
  43. Ertel, W. Neural Networks. In Introduction to Artificial Intelligence; Springer Fachmedien: Wiesbaden, Germany, 2025; pp. 253–309. [Google Scholar] [CrossRef]
  44. Fayyad, U.M.; Irani, K.B. Multi-Interval Discretization of Continuousvalued Attributes for Classification Learning; Morgan Kaufmann: Burlington, MA, USA, 1993; Volume 2, pp. 1022–1027. [Google Scholar]
  45. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. The WEKA Data Mining Software: An Update. SIGKDD Explor. 2009, 11, 10–18. [Google Scholar] [CrossRef]
  46. Bazan, J.; Szczuka, M. The Rough Set Exploration System. In Transactions on Rough Sets III; Lecture Notes in Computer, Science; Peters, J.F., Skowron, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3400, pp. 37–56. [Google Scholar]
  47. Wróblewski, J. Theoretical Foundations of Order-Based Genetic Algorithms. Fundam. Inform. 1996, 28, 423–430. [Google Scholar] [CrossRef]
  48. Wróblewski, J. Covering with Reducts—A Fast Algorithm for Rule Generation. In Proceedings of the Rough Sets and Current Trends in Computing, Warsaw, Poland, 22–26 June 1998; Polkowski, L., Skowron, A., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 402–407. [Google Scholar]
Figure 1. Framework of developed methodology.
Figure 2. General scheme for rankings construction.
Figure 3. Occurrence of attributes in the local rankings.
Figure 4. Informativeness of attribute rankings.
Figure 5. Accuracy of XGBoost on intermediate rankings with $n_k - j$ attributes.
Figure 6. Accuracy of MLP on intermediate rankings with $n_k - j$ attributes.
Figure 7. Performance of XGBoost and MLP on RG.
Figure 8. Number of attributes in rankings with higher accuracy than global reference value.
Table 1. Attributes in dataset related to students dropout and academic success.
| Id | Attributes | Id | Attributes |
|---|---|---|---|
| a1 | Marital Status | a19 | Scholarship holder |
| a2 | Application mode | a20 | Age at enrollment |
| a3 | Application order | a21 | International |
| a4 | Course | a22 | Curricular units 1st sem (credited) |
| a5 | Daytime/evening attendance | a23 | Curricular units 1st sem (enrolled) |
| a6 | Previous qualification | a24 | Curricular units 1st sem (evaluations) |
| a7 | Previous qualification (grade) | a25 | Curricular units 1st sem (approved) |
| a8 | Nationality | a26 | Curricular units 1st sem (grade) |
| a9 | Mother’s qualification | a27 | Curricular units 1st sem (without evaluations) |
| a10 | Father’s qualification | a28 | Curricular units 2nd sem (credited) |
| a11 | Mother’s occupation | a29 | Curricular units 2nd sem (enrolled) |
| a12 | Father’s occupation | a30 | Curricular units 2nd sem (evaluations) |
| a13 | Admission grade | a31 | Curricular units 2nd sem (approved) |
| a14 | Displaced | a32 | Curricular units 2nd sem (grade) |
| a15 | Educational special needs | a33 | Curricular units 2nd sem (without evaluations) |
| a16 | Debtor | a34 | Unemployment rate |
| a17 | Tuition fees up to date | a35 | Inflation rate |
| a18 | Gender | a36 | GDP |
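Table 2. Cardinalities of reducts.
| Set k | Length of Reducts |
|---|---|
| 1 | 10, 10 |
| 2 | 11, 12, 12 |
| 3 | 15, 11, 11, 11 |
| 4 | 12, 12, 12, 12, 11 |
| 5 | 12, 12, 12, 15, 10, 10 |
| 6 | 10, 11, 11, 11, 12, 13, 13 |
| 7 | 14, 12, 12, 12, 12, 13, 12, 12 |
| 8 | 11, 11, 11, 11, 11, 12, 12, 12, 12 |
| 9 | 12, 12, 12, 12, 13, 10, 10, 10, 10, 10 |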
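Table 3. Attributes included in reducts from the set $k = 9$.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| reduct1 | a2 | a3 | a4 | a6 | a7 | a10 | a12 | a18 | a19 | a28 | a30 | a36 | |
| reduct2 | a2 | a3 | a4 | a6 | a7 | a10 | a12 | a18 | a19 | a28 | a30 | a34 | |
| reduct3 | a2 | a3 | a4 | a6 | a7 | a10 | a12 | a17 | a19 | a25 | a29 | a36 | |
| reduct4 | a2 | a3 | a4 | a7 | a9 | a10 | a11 | a12 | a18 | a26 | a31 | a33 | |
| reduct5 | a2 | a3 | a4 | a7 | a9 | a10 | a11 | a14 | a19 | a22 | a26 | a33 | a35 |
| reduct6 | a2 | a3 | a4 | a7 | a9 | a10 | a11 | a31 | a32 | a35 | | | |
| reduct7 | a2 | a3 | a4 | a7 | a9 | a10 | a11 | a26 | a31 | a35 | | | |
| reduct8 | a2 | a3 | a4 | a7 | a9 | a10 | a11 | a30 | a31 | a34 | | | |
| reduct9 | a2 | a3 | a4 | a7 | a9 | a12 | a19 | a30 | a31 | a36 | | | |
| reduct10 | a2 | a3 | a4 | a7 | a9 | a12 | a19 | a30 | a31 | a34 | | | |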
Table 4. Characteristics of distributed subtables.
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| rows | 3074.0 | 3086.7 | 3079.2 | 3086.2 | 3079.5 | 3075.7 | 3075.8 | 3081.7 | 3077.6 |
| attr | 10.0 | 11.7 | 12.0 | 11.8 | 11.8 | 11.6 | 12.4 | 11.4 | 11.1 |
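Table 5. Intermediate and global rankings of attributes for distributed data.
| Nr | RI_2 | RI_3 | RI_4 | RI_5 | RI_6 | RI_7 | RI_8 | RI_9 | RI_10 | RG |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | a4 | a4 | a4 | a4 | a4 | a2 | a2 | a4 | a4 | a4 |
| 2 | a24 | a2 | a2 | a2 | a2 | a3 | a1 | a2 | a2 | a2 |
| 3 | a2 | a3 | a9 | a9 | a10 | a10 | a11 | a12 | a3 | a35 |
| 4 | a9 | a9 | a3 | a3 | a3 | a1 | a34 | a7 | a7 | a9 |
| 5 | a3 | a11 | a7 | a7 | a7 | a11 | a10 | a11 | a10 | a11 |
| 6 | a11 | a32 | a1 | a26 | a1 | a26 | a7 | a10 | a9 | a3 |
| 7 | a10 | a10 | a35 | a1 | a24 | a12 | a9 | a1 | a12 | a10 |
| 8 | a26 | a7 | a10 | a24 | a11 | a9 | a3 | a24 | a31 | a32 |
| 9 | a1 | a1 | a19 | a10 | a19 | a7 | a32 | a32 | a19 | a7 |
| 10 | a34 | a35 | a24 | a19 | a12 | a4 | a12 | a26 | a30 | a26 |
| 11 | a35 | a18 | a11 | a35 | a35 | a24 | a20 | a35 | a11 | a1 |
| 12 | | a36 | a32 | a34 | a9 | a35 | a4 | a9 | a35 | a12 |
| 13 | | a23 | a26 | a28 | a26 | a36 | a31 | a14 | a36 | a24 |
| 14 | | a22 | a25 | a12 | a30 | a29 | a29 | a34 | a34 | a34 |
| 15 | | a6 | a31 | a11 | a34 | a19 | a6 | a3 | a26 | a19 |
| 16 | | | a12 | a32 | a17 | a31 | a19 | a19 | a6 | a25 |
| 17 | | | a27 | a29 | a16 | a25 | a18 | a20 | a18 | a17 |
| 18 | | | a17 | a17 | a31 | a32 | a25 | a18 | a28 | a31 |
| 19 | | | a16 | | a25 | a28 | a24 | a6 | a33 | a36 |
| 20 | | | a14 | | a32 | a5 | a23 | a25 | a25 | a29 |
| 21 | | | | | a29 | a34 | a26 | a36 | a32 | a18 |
| 22 | | | | | a27 | a23 | a17 | a22 | a22 | a6 |
| 23 | | | | | a20 | a33 | a33 | a17 | a29 | a28 |
| 24 | | | | | a18 | | a35 | a13 | a17 | a14 |
| 25 | | | | | | | a30 | | a14 | a30 |
| 26 | | | | | | | a36 | | | a23 |
| 27 | | | | | | | a28 | | | a22 |
| 28 | | | | | | | a14 | | | a33 |
| 29 | | | | | | | | | | a20 |
| 30 | | | | | | | | | | a27 |
| 31 | | | | | | | | | | a16 |
| 32 | | | | | | | | | | a5 |
| 33 | | | | | | | | | | a13 |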
Table 6. Informativeness of the set of attributes.
| RI_2 | RI_3 | RI_4 | RI_5 | RI_6 | RI_7 | RI_8 | RI_9 | RI_10 | RG |
|---|---|---|---|---|---|---|---|---|---|
| 0.037 | 0.115 | 0.201 | 0.220 | 0.245 | 0.213 | 0.254 | 0.272 | 0.361 | 0.168 |
Table 7. Performance of classifiers including all attributes in dataset.
| Classifier | Accuracy |
|---|---|
| XGBoost | 0.773 |
| MLP | 0.728 |
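Table 8. Performance of XGBoost classifiers for the intermediate and global rankings of attributes.
| Nr | RI_2 | RI_3 | RI_4 | RI_5 | RI_6 | RI_7 | RI_8 | RI_9 | RI_10 | RG |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.528 | 0.528 | 0.528 | 0.528 | 0.528 | 0.544 | 0.544 | 0.528 | 0.528 | 0.528 |
| 2 | 0.585 | 0.555 | 0.555 | 0.555 | 0.555 | 0.553 | 0.541 | 0.555 | 0.555 | 0.555 |
| 3 | 0.598 | 0.550 | 0.569 | 0.569 | 0.562 | 0.538 | 0.548 | 0.569 | 0.550 | 0.562 |
| 4 | 0.586 | 0.542 | 0.542 | 0.542 | 0.563 | 0.536 | 0.540 | 0.555 | 0.547 | 0.547 |
| 5 | 0.581 | 0.551 | 0.553 | 0.553 | 0.548 | 0.544 | 0.544 | 0.566 | 0.552 | 0.553 |
| 6 | 0.578 | 0.644 | 0.541 | 0.639 | 0.558 | 0.613 | 0.544 | 0.560 | 0.548 | 0.560 |
| 7 | 0.581 | 0.648 | 0.552 | 0.636 | 0.596 | 0.617 | 0.535 | 0.570 | 0.557 | 0.556 |
| 8 | 0.631 | 0.664 | 0.554 | 0.641 | 0.599 | 0.606 | 0.544 | 0.591 | 0.721 | 0.652 |
| 9 | 0.637 | 0.660 | 0.563 | 0.630 | 0.614 | 0.603 | 0.652 | 0.681 | 0.724 | 0.675 |
| 10 | 0.645 | 0.665 | 0.602 | 0.648 | 0.606 | 0.637 | 0.648 | 0.689 | 0.726 | 0.681 |
| 11 | 0.645 | 0.673 | 0.619 | 0.645 | 0.614 | 0.648 | 0.659 | 0.686 | 0.719 | 0.676 |
| 12 | | 0.666 | 0.679 | 0.648 | 0.624 | 0.665 | 0.676 | 0.696 | 0.736 | 0.682 |
| 13 | | 0.678 | 0.669 | 0.671 | 0.672 | 0.669 | 0.720 | 0.678 | 0.735 | 0.684 |
| 14 | | 0.678 | 0.718 | 0.674 | 0.681 | 0.684 | 0.728 | 0.682 | 0.736 | 0.686 |
| 15 | | 0.677 | 0.741 | 0.684 | 0.678 | 0.693 | 0.742 | 0.680 | 0.745 | 0.686 |
| 16 | | | 0.737 | 0.683 | 0.716 | 0.741 | 0.748 | 0.687 | 0.741 | 0.706 |
| 17 | | | 0.745 | 0.700 | 0.724 | 0.745 | 0.739 | 0.686 | 0.737 | 0.734 |
| 18 | | | 0.752 | 0.730 | 0.766 | 0.750 | 0.750 | 0.697 | 0.748 | 0.755 |
| 19 | | | 0.758 | | 0.770 | 0.752 | 0.743 | 0.698 | 0.741 | 0.765 |
| 20 | | | 0.761 | | 0.767 | 0.737 | 0.744 | 0.714 | 0.749 | 0.770 |
| 21 | | | | | 0.775 | 0.744 | 0.746 | 0.721 | 0.738 | 0.774 |
| 22 | | | | | 0.771 | 0.745 | 0.766 | 0.727 | 0.747 | 0.766 |
| 23 | | | | | 0.776 | 0.739 | 0.766 | 0.748 | 0.751 | 0.773 |
| 24 | | | | | 0.773 | | 0.777 | 0.751 | 0.773 | 0.773 |
| 25 | | | | | | | 0.778 | | 0.774 | 0.770 |
| 26 | | | | | | | 0.782 | | | 0.782 |
| 27 | | | | | | | 0.769 | | | 0.781 |
| 28 | | | | | | | 0.785 | | | 0.779 |
| 29 | | | | | | | | | | 0.776 |
| 30 | | | | | | | | | | 0.772 |
| 31 | | | | | | | | | | 0.774 |
| 32 | | | | | | | | | | 0.769 |
| 33 | | | | | | | | | | 0.774 |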
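Table 9. Performance of MLP classifiers for the intermediate and global rankings of attributes.
| Nr | RI_2 | RI_3 | RI_4 | RI_5 | RI_6 | RI_7 | RI_8 | RI_9 | RI_10 | RG |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.483 | 0.483 | 0.483 | 0.483 | 0.483 | 0.538 | 0.538 | 0.483 | 0.483 | 0.483 |
| 2 | 0.332 | 0.511 | 0.511 | 0.511 | 0.511 | 0.483 | 0.484 | 0.511 | 0.511 | 0.511 |
| 3 | 0.338 | 0.292 | 0.541 | 0.541 | 0.483 | 0.534 | 0.500 | 0.331 | 0.292 | 0.481 |
| 4 | 0.337 | 0.488 | 0.355 | 0.355 | 0.489 | 0.485 | 0.544 | 0.332 | 0.337 | 0.535 |
| 5 | 0.482 | 0.484 | 0.489 | 0.489 | 0.489 | 0.516 | 0.547 | 0.531 | 0.328 | 0.486 |
| 6 | 0.487 | 0.407 | 0.338 | 0.488 | 0.488 | 0.600 | 0.535 | 0.490 | 0.392 | 0.486 |
| 7 | 0.528 | 0.533 | 0.338 | 0.488 | 0.340 | 0.612 | 0.522 | 0.535 | 0.196 | 0.531 |
| 8 | 0.486 | 0.342 | 0.453 | 0.211 | 0.487 | 0.614 | 0.510 | 0.483 | 0.649 | 0.356 |
| 9 | 0.355 | 0.488 | 0.538 | 0.489 | 0.203 | 0.596 | 0.640 | 0.504 | 0.616 | 0.530 |
| 10 | 0.392 | 0.486 | 0.493 | 0.494 | 0.487 | 0.338 | 0.636 | 0.549 | 0.492 | 0.340 |
| 11 | 0.503 | 0.483 | 0.404 | 0.334 | 0.489 | 0.486 | 0.640 | 0.487 | 0.618 | 0.383 |
| 12 | | 0.489 | 0.261 | 0.490 | 0.341 | 0.222 | 0.203 | 0.548 | 0.344 | 0.495 |
| 13 | | 0.484 | 0.488 | 0.208 | 0.232 | 0.489 | 0.472 | 0.450 | 0.693 | 0.340 |
| 14 | | 0.487 | 0.349 | 0.492 | 0.434 | 0.340 | 0.495 | 0.450 | 0.578 | 0.489 |
| 15 | | 0.495 | 0.644 | 0.203 | 0.531 | 0.344 | 0.617 | 0.343 | 0.666 | 0.606 |
| 16 | | | 0.243 | 0.488 | 0.354 | 0.206 | 0.366 | 0.559 | 0.564 | 0.650 |
| 17 | | | 0.574 | 0.584 | 0.203 | 0.490 | 0.549 | 0.490 | 0.208 | 0.419 |
| 18 | | | 0.581 | 0.518 | 0.696 | 0.635 | 0.494 | 0.345 | 0.521 | 0.493 |
| 19 | | | 0.349 | | 0.702 | 0.627 | 0.668 | 0.491 | 0.622 | 0.483 |
| 20 | | | 0.545 | | 0.663 | 0.420 | 0.357 | 0.436 | 0.373 | 0.340 |
| 21 | | | | | 0.512 | 0.381 | 0.491 | 0.344 | 0.517 | 0.377 |
| 22 | | | | | 0.690 | 0.511 | 0.648 | 0.491 | 0.349 | 0.610 |
| 23 | | | | | 0.684 | 0.490 | 0.295 | 0.489 | 0.601 | 0.684 |
| 24 | | | | | 0.232 | | 0.605 | 0.489 | 0.464 | 0.492 |
| 25 | | | | | | | 0.705 | | 0.675 | 0.616 |
| 26 | | | | | | | 0.642 | | | 0.701 |
| 27 | | | | | | | 0.633 | | | 0.715 |
| 28 | | | | | | | 0.506 | | | 0.344 |
| 29 | | | | | | | | | | 0.703 |
| 30 | | | | | | | | | | 0.489 |
| 31 | | | | | | | | | | 0.690 |
| 32 | | | | | | | | | | 0.377 |
| 33 | | | | | | | | | | 0.575 |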
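Table 10. Highest performing feature subsets of intermediate and global rankings.
| | RI_2 | RI_3 | RI_4 | RI_5 | RI_6 | RI_7 | RI_8 | RI_9 | RI_10 | RG |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | | | | | | | | | | |
| acc. | 0.645 | 0.677 | 0.761 | 0.729 | 0.773 | 0.739 | 0.785 | 0.751 | 0.774 | 0.774 |
| $n_k - j$ feat. | | $n - 2$ | | | $n - 1$ | $n - 4$ | | | | $n - 7$ |
| max(acc.) | | 0.678 | | | 0.776 | 0.752 | | | | 0.782 |
| $\Delta$acc. | | 0.0015 | | | 0.003 | 0.013 | | | | 0.008 |
| MLP | | | | | | | | | | |
| acc. | 0.503 | 0.495 | 0.545 | 0.518 | 0.232 | 0.49 | 0.506 | 0.489 | 0.675 | 0.575 |
| $n_k - j$ feat. | $n - 4$ | $n - 8$ | $n - 5$ | $n - 1$ | $n - 5$ | $n - 5$ | $n - 3$ | $n - 8$ | $n - 12$ | $n - 6$ |
| max(acc.) | 0.528 | 0.533 | 0.644 | 0.584 | 0.702 | 0.635 | 0.705 | 0.56 | 0.693 | 0.715 |
| $\Delta$acc. | 0.025 | 0.038 | 0.099 | 0.066 | 0.47 | 0.145 | 0.2 | 0.071 | 0.018 | 0.14 |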
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zielosko, B.; Jabloński, K.; Dmytrenko, A. Exploiting Data Distribution: A Multi-Ranking Approach. Entropy 2025, 27, 278. https://doi.org/10.3390/e27030278
