Optimizing Attribute Reduction in Multi-Granularity Data through a Hybrid Supervised–Unsupervised Model

Abstract: Attribute reduction is a core technique in the rough set domain and an important step in data preprocessing. Researchers have proposed numerous innovative methods to enhance the capability of attribute reduction, such as multi-granularity rough set models, which can effectively process distributed and multi-granularity data. However, these methods still have notable shortcomings, such as difficulty in handling complex constraints and in conducting multi-angle effectiveness evaluations. Based on the multi-granularity model, this study proposes a new attribute reduction method that uses the multi-granularity neighborhood information gain ratio as the measurement criterion. The method combines supervised and unsupervised perspectives and, by integrating multi-granularity technology with neighborhood rough set theory, constructs a model that can adapt to multi-level data features. It stands out by addressing complex constraints and facilitating multi-perspective effectiveness evaluations, and has several advantages: (1) it combines supervised and unsupervised learning, allowing for nuanced data interpretation and enhanced attribute selection; (2) by incorporating multi-granularity structures, it can analyze data at various levels of granularity, yielding a more detailed understanding of data characteristics at each level, which is crucial for complex datasets; and (3) by using neighborhood relations instead of indiscernibility relations, it effectively handles uncertain and fuzzy data, making it suitable for real-world datasets that often contain imprecise or incomplete information. The method not only selects the optimal granularity level or attribute set according to specific requirements, but also demonstrates its versatility and robustness through extensive experiments on 15 UCI datasets.
Comparative analyses against six established attribute reduction algorithms confirm the superior reliability and consistency of the proposed method. This research not only enhances the understanding of attribute reduction mechanisms, but also sets a new benchmark for future explorations in the field.


Introduction
In this era of information explosion, data are growing exponentially in both dimension and volume, which leads to the attributes of data becoming redundant and vague. How to find valuable information from massive data has become challenging. Rough set theory, introduced by Pawlak [1] in 1982 as a simple and efficient method for data mining, can deal with fuzzy, incomplete, and inaccurate data [2].
The traditional rough set model mainly describes the uncertainty and fuzziness of data through a single binary relation [3]. In recent years, multi-granularity rough set models have been proposed to fully mine the multiple granularity levels of target information, extending the traditional single binary relation to multiple binary relations, with the work of Qian et al. [4] being representative. This model has provided a new solution for rough set theory in dealing with distributed data and multi-granularity data. Afterward, researchers continuously improved Qian's multi-granularity rough set model. Some improvements combine multi-granularity rough sets with decision-theoretic rough sets to form a multi-granularity decision-theoretic rough set model [5]. In addition, there is research combining multi-granularity rough sets with the three-way decision model, proposing a multi-granularity three-way decision model [6]. Targeting the granulation of attributes and attribute values, Xu proposed an improved multi-granularity rough set model [7]. To expand the applicability of multi-granularity rough sets, Lin et al. integrated the neighborhood relation into the multi-granularity rough set model, proposing the neighborhood multi-granularity rough set. The introduction of this model has made the multi-granularity rough set research branch a hot topic of study [8]. These rough set models can effectively reduce data dimensionality, which is achieved by attribute reduction [9].
Attribute reduction can be achieved through supervised or unsupervised constraints, and constraints from both perspectives have been extensively explored [10]. Specifically, some studies propose attribute reduction constraints based on measures from only one perspective, using these constraints to find qualified reductions. For instance, Jiang et al. [11] and Yuan et al. [12] concentrated on attribute reduction through the lens of supervised information granulation and related supervised metrics, respectively; Yang et al. [13] proposed a concept known as fuzzy complementary entropy for attribute reduction within an unsupervised model; Jain and Som [14] introduced a sophisticated multigranular rough set model that utilizes an intuitionistic fuzzy beta covering approach; and Ji et al. [15] developed an extended rough set model based on fuzzy granular balls to enhance attribute reduction effectiveness. However, whether supervised or unsupervised, single-perspective measures exhibit inherent constraints. Firstly, measures relying on a single perspective may overlook the multifaceted evaluation of data, leading to the neglect of some important attributes [16]. This is because when only one fixed measure is used for attribute reduction, the importance of each attribute is judged solely by that criterion; if other measures are needed for evaluation, relying only on that criterion may no longer yield accurate results. Secondly, a single-perspective measure may not fully capture the characteristics of data under complex conditions, resulting in the selection of attributes that are neither accurate nor complete. For instance, if conditional entropy is used as the measure to evaluate attributes [17], the derived reduction may only possess the single feature required by that evaluation, without fully considering other types of uncertainty features and learning capabilities.
To overcome the limitations of attribute reduction mentioned above, this paper introduces a new measure that merges supervised and unsupervised perspectives, leading to a novel rough set model. The proposed model has the following advantages: (1) it integrates multi-granularity and neighborhood rough sets, making the model more adaptable to data features at different levels; and (2) for attribute sets of different granularities, it introduces a fusion strategy, selecting the optimal granularity level or attribute set according to the needs of different tasks and datasets, which can be flexibly adjusted based on specific circumstances.
The rest of this paper is organized as follows. Section 2 reviews related basic concepts. Section 3 provides a detailed introduction to the basic framework and algorithm design of the proposed method. In Section 4, the accuracy of our method is calculated and discussed through experiments. Finally, Section 5 concludes this paper and outlines some future work.

Neighborhood Rough Sets
Neighborhood rough sets were proposed by Hu et al. as an improvement over traditional rough sets [18]. The key distinction lies in that neighborhood rough sets are established on the basis of neighborhood relations, as opposed to relations of indiscernibility [19]. Hence, the neighborhood rough set model is capable of processing both discrete and continuous data [20]. Moreover, the partitioning of neighborhoods granulates the sample space, which can reflect the discriminative power of different attributes on the samples [21].
Within the framework of rough set theory, a decision system is characterized by a tuple DS = (U, AT), where U denotes a finite collection of samples and AT encompasses a suite of conditional attributes together with a decision attribute d [22]. The attribute d captures the sample labels [23]. For every x in U and every a in AT, a(x) signifies the value of x on the conditional attribute a, and d(x) represents the label of x. Utilizing d, one can derive an equivalence relation on U: IND(d) = {(x, y) ∈ U × U : d(x) = d(y)}. IND(d) induces a partition U/IND(d) = {X_1, X_2, ..., X_q} (q ≥ 2). Each X_k within U/IND(d) is recognized as the k-th decision category. Notably, the decision category that includes the sample x is also denoted [x]_d.
In rough set methods, binary relations are often used for information granulation, among which neighborhood relations, as one of the most effective binary relations, have received extensive attention. The neighborhood relation is formed as follows: N_A = {(x, y) ∈ U × U : r_A(x, y) ≤ r}, where r_A is a distance function regarding A ⊆ AT and r ≥ 0 is a radius.
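As an illustration, the neighborhood induced by this relation can be computed directly from pairwise distances. The following is a minimal Python sketch (the paper's experiments use MATLAB, so this is not the authors' implementation), assuming the Euclidean distance as r_A and attribute values normalized to [0, 1]; the data and function names are illustrative only.

```python
import numpy as np

def neighborhood(X, i, attrs, r):
    """Indices j with r_A(x_i, x_j) <= r over the attribute subset `attrs`
    (Euclidean distance assumed as the distance function r_A)."""
    d = np.sqrt(((X[:, attrs] - X[i, attrs]) ** 2).sum(axis=1))
    return set(np.flatnonzero(d <= r))

# toy data: 4 samples, 2 attributes, values normalized to [0, 1]
X = np.array([[0.10, 0.20],
              [0.12, 0.22],
              [0.80, 0.90],
              [0.82, 0.88]])
print(neighborhood(X, 0, [0, 1], 0.1))  # x0 together with its close neighbor x1
```

Other distance functions (e.g., Manhattan) can be substituted for r_A without changing the structure of the computation.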
In alignment with Equation (2), the neighborhood of a sample x is established as δ_A(x) = {y ∈ U : r_A(x, y) ≤ r}. From the perspective of granular computing [24,25], both IND(d) and N_δA are derivations of information granules [26]. The most significant difference between these two types of information granules lies in their intrinsic mechanisms, i.e., the binary relations used. Based on the outcomes of these information granules, the concepts of lower and upper approximations within the context of neighborhood rough sets, as the fundamental units, were also proposed by Cheng et al.: for X ⊆ U, the lower approximation is {x ∈ U : δ_A(x) ⊆ X} and the upper approximation is {x ∈ U : δ_A(x) ∩ X ≠ ∅}.
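The two approximations can be sketched as follows in Python; the Euclidean distance and the toy data are assumptions for illustration, and x2 is placed on the boundary of the target concept to show the difference between the two sets.

```python
import numpy as np

def neighborhoods(X, attrs, r):
    """delta-neighborhood of every sample (Euclidean distance assumed)."""
    Xa = X[:, attrs]
    D = np.sqrt(((Xa[:, None, :] - Xa[None, :, :]) ** 2).sum(-1))
    return [set(np.flatnonzero(row <= r)) for row in D]

def lower_upper(X, attrs, r, target):
    """Neighborhood lower/upper approximations of a target sample set:
    lower = samples whose whole neighborhood fits inside `target`,
    upper = samples whose neighborhood touches `target`."""
    nbrs = neighborhoods(X, attrs, r)
    lower = {i for i, nb in enumerate(nbrs) if nb <= target}
    upper = {i for i, nb in enumerate(nbrs) if nb & target}
    return lower, upper

# one attribute; x2 sits on the boundary of the target concept {x0, x1}
X = np.array([[0.10], [0.15], [0.22], [0.90]])
lo, up = lower_upper(X, [0], 0.1, target={0, 1})
print(lo, up)
```

Samples in the lower approximation certainly belong to the concept; samples between the two approximations form the boundary region.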
Let [x]_{A_i} denote the equivalence class of x under A_i. For any X ⊆ U, the optimistic multi-granularity lower and upper approximations of X with respect to A_1, A_2, ..., A_m are defined as follows: the lower approximation ∑_{i=1}^{m} A_i^O(X) = {x ∈ U : [x]_{A_1} ⊆ X or [x]_{A_2} ⊆ X or ... or [x]_{A_m} ⊆ X}, and the upper approximation is its dual, {x ∈ U : [x]_{A_i} ∩ X ≠ ∅ for every i}. The pair of approximations is called an optimistic multi-granularity rough set.
Likewise, with [x]_{A_i} the equivalence class of x under A_i, for any X ⊆ U the pessimistic multi-granularity [30] lower and upper approximations of X are defined as follows: the lower approximation ∑_{i=1}^{m} A_i^P(X) = {x ∈ U : [x]_{A_i} ⊆ X for every i}, and the upper approximation {x ∈ U : [x]_{A_i} ∩ X ≠ ∅ for at least one i}. The pair of approximations is called a pessimistic multi-granularity rough set.
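To make the optimistic/pessimistic distinction concrete, the following Python sketch computes both pairs of approximations over equivalence-class granularities; the two toy granularity structures are assumptions for illustration only.

```python
def eq_class(values, i):
    """Equivalence class of sample i under one granularity (equal values)."""
    return {j for j, v in enumerate(values) if v == values[i]}

def multigranular(granularities, target, n, pessimistic=False):
    """Optimistic: lower uses OR over granularities, upper uses AND (its dual).
    Pessimistic: lower uses AND, upper uses OR."""
    comb_lower = all if pessimistic else any
    comb_upper = any if pessimistic else all
    lower = {i for i in range(n)
             if comb_lower(eq_class(g, i) <= target for g in granularities)}
    upper = {i for i in range(n)
             if comb_upper(eq_class(g, i) & target for g in granularities)}
    return lower, upper

# two granularity structures over 4 samples, described by attribute values
g1 = ['a', 'a', 'b', 'b']
g2 = ['x', 'y', 'y', 'z']
lo_o, up_o = multigranular([g1, g2], {0, 1}, 4)
lo_p, up_p = multigranular([g1, g2], {0, 1}, 4, pessimistic=True)
print(lo_o, up_o, lo_p, up_p)
```

As expected, the pessimistic lower approximation is contained in the optimistic one, and the optimistic upper approximation is contained in the pessimistic one.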
In the pursuit of refining data analysis, particularly when addressing complex and heterogeneous datasets, the application of multi-granularity rough sets provides a transformative framework. This approach offers a flexible methodology for representing data across various levels of granularity, allowing analysts to dissect large and diverse datasets into more comprehensible and manageable segments. This adaptability is crucial in environments where data exhibit varying degrees of precision, stemming from different sources or capturing differing phenomena.

Multi-Granularity Neighborhood Rough Sets
In the literature [31], Lin et al. proposed two types of neighborhood multi-granularity rough sets, which can be applied to deal with incomplete systems containing numerical and categorical attributes [32]. To simplify the problem, when dealing with incomplete systems, only the application of neighborhood multi-granularity rough sets to numerical data is considered.
Given DS = (U, AT), where AT = {A_k | k ∈ {1, 2, ..., m}}, U = {x_i | i ∈ {1, 2, ..., n}}, and X ⊆ U, in the optimistic neighborhood multi-granularity rough sets the lower and upper approximations of X are defined as {x_i ∈ U : δ_{A_1}(x_i) ⊆ X or ... or δ_{A_m}(x_i) ⊆ X} and {x_i ∈ U : δ_{A_1}(x_i) ∩ X ≠ ∅ and ... and δ_{A_m}(x_i) ∩ X ≠ ∅}, where δ_{A_k}(x_i) is the neighborhood granule of x_i based on the granularity structure A_k. In the pessimistic neighborhood multi-granularity rough sets, the approximations are defined analogously with the roles of "or" and "and" interchanged: the lower approximation requires δ_{A_k}(x_i) ⊆ X for every k, and the upper approximation requires δ_{A_k}(x_i) ∩ X ≠ ∅ for at least one k. The incorporation of multi-granularity neighborhood rough sets extends this concept by emphasizing local contexts and the spatial or temporal relationships inherent within the data. By focusing on the neighborhoods around each data point, these sets are particularly adept at mitigating the influence of noise and anomalies, significantly enhancing the robustness of the analysis. The neighborhood-based approach also facilitates adaptive threshold settings, crucial for accurately defining the granularity level in datasets where this parameter is not readily apparent.

Supervised Attribute Reduction
It is well known that neighborhood rough sets are often used in supervised learning tasks, especially for enhancing generalization performance and reducing classifier complexity [33]. The advantage of attribute reduction lies in its easy adaptation to different practical application requirements, and hence a variety of forms of attribute reduction have emerged in recent years. For neighborhood rough sets, information gain and split information value are two metrics that can be used to further explore the forms of attribute reduction.
Given the data DS = ⟨U, AT, d, δ⟩, for any A ⊆ AT, the neighborhood information gain of d based on A is defined as IG_NRS(d, A) = H_NRS(d) − H_NRS(d, A). Here, H_NRS(d) is the entropy of the entire dataset, calculated from the distribution of decision classes under the neighborhood lower or upper approximation [34], and H_NRS(d, A) is the expected uncertainty that remains once attribute set A is considered, i.e., the neighborhood conditional entropy of d given A. Given the data DS = ⟨U, AT, d, δ⟩, for any A ⊆ AT, the neighborhood split information value of d based on A is defined as SI_NRS(d, A) = −∑_{j=1}^{n} (|δ_A(X_j)| / |U|) log_2 (|δ_A(X_j)| / |U|), where δ_A(X_j) represents the sample set X_j within the neighborhood formed by attribute set A, and n is the number of different neighborhoods formed by A [35].
The combination of neighborhood information gain and split information value helps to more comprehensively assess the impact of attributes on dataset classification, thereby making more effective decisions in attribute reduction.
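A simplified Python sketch of the two measures follows. As a simplification (an assumption, not the paper's exact formulation), samples whose neighborhoods coincide are grouped into one granule, the conditional entropy is averaged over these granules, and the split information is the entropy of the granule-size distribution.

```python
import math
import numpy as np
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def neighborhood_granules(X, attrs, r):
    """Group samples whose delta-neighborhoods coincide (simplified granulation)."""
    Xa = X[:, attrs]
    D = np.sqrt(((Xa[:, None, :] - Xa[None, :, :]) ** 2).sum(-1))
    granules = {}
    for i, row in enumerate(D):
        granules.setdefault(frozenset(np.flatnonzero(row <= r)), []).append(i)
    return list(granules.values())

def info_gain_and_split(X, y, attrs, r):
    """IG = H(d) - H(d | A); SI = entropy of the granule-size distribution."""
    granules = neighborhood_granules(X, attrs, r)
    n = len(y)
    h_cond = sum(len(g) / n * entropy([y[i] for i in g]) for g in granules)
    ig = entropy(y) - h_cond
    si = entropy([j for j, g in enumerate(granules) for _ in g])
    return ig, si

X = np.array([[0.10], [0.12], [0.80], [0.82]])
y = [0, 0, 1, 1]
ig, si = info_gain_and_split(X, y, [0], 0.1)
print(ig, si)  # the single attribute separates the two classes perfectly
```

High gain with low split information marks an attribute that discriminates the classes without fragmenting the sample space.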

Unsupervised Attribute Reduction
It is widely recognized that supervised attribute reduction necessitates the use of sample labels, which are time-consuming and expensive to obtain in many practical tasks [36]. In contrast, unsupervised attribute reduction does not require these labels, and hence it has received more attention recently.
In unsupervised attribute reduction, if it is necessary to measure the importance of attributes, one can construct models by introducing pseudo-label strategies and using information gain and split information as metrics.
Given unsupervised data IS = ⟨U, AT⟩ and a radius δ, for any A ⊆ AT, the unsupervised information gain based on A is defined as IG_NRS(d_a, A) = H_NRS(d_a) − H_NRS(d_a, A), where H_NRS(d_a, A) is the expected uncertainty considering attribute set A, and d_a denotes the pseudo-label decision for samples generated using conditional attribute a.
Given unsupervised data IS = ⟨U, AT⟩ and a radius δ, for any A ⊆ AT, the unsupervised split information based on A is defined analogously to SI_NRS, with the pseudo-label decision d_a in place of d. Here, d_a is the pseudo-label decision recorded by using conditional attribute a to generate sample pseudo-labels.
These definitions provide a new method for evaluating attribute importance in an unsupervised setting. Information gain reflects the contribution of an attribute to data classification, while split information measures the degree of confusion introduced by an attribute in the division of the dataset. This approach helps in more effective attribute selection and reduction in unsupervised learning.
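The pseudo-label step can be sketched as follows (the paper's experiments use MATLAB's k-means; here a tiny 1-D k-means is hand-rolled for illustration, with k = 2 assumed). The resulting d_a can then be substituted for d in the supervised measures.

```python
import numpy as np

def kmeans_1d(v, k=2, iters=20, seed=0):
    """Tiny 1-D k-means: turns one attribute's values into pseudo-labels d_a."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(v, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        centers = np.array([v[labels == c].mean() if (labels == c).any()
                            else centers[c] for c in range(k)])
    return labels

# pseudo-label decision d_a generated from a single conditional attribute
X = np.array([[0.1, 5.0],
              [0.2, 5.1],
              [0.8, 9.0],
              [0.9, 9.2]])
d_a = kmeans_1d(X[:, 0])
print(d_a)  # x0/x1 fall in one cluster, x2/x3 in the other
```

In practice, k would be set to the number of decision categories expected in the data, matching the experimental setup described later.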

Definition of Multi-Granularity Neighborhood Information Gain Ratio
Consider a dataset DS = ⟨U, AT, d, δ⟩, with U representing the sample set, AT the attribute set, d the decision attribute, and δ the neighborhood radius.
For any A ⊆ AT, the multi-granularity neighborhood information gain ratio is defined as ϵ_A(d) = W_A · e^{|IG_NRS(d, A)|} / SI_NRS(d, A), where SI_NRS(d, A) is the neighborhood split information quantity based on A, e^{|IG_NRS(d, A)|} is the exponential (base e) of the absolute information gain for decision attribute d based on attribute set A, and W_A is the granularity space coefficient of attribute A in the multi-granularity structure, reflecting its importance in that structure.
For the calculation of the granularity space coefficient, given a set of granularities G_1, G_2, ..., G_n, the performance of attribute A under each granularity can be measured by a quantitative indicator P_{G_i}(A). The granularity space coefficient W_A is then defined as W_A = ∑_{i=1}^{n} β_i · P_{G_i}(A), where β_i is the granularity space allocated to each granularity G_i, reflecting the importance of different granularities. These granularity spaces are usually determined based on background knowledge of the specific problem or by experimental verification. The granularities G_1, G_2, ..., G_n in the multi-granularity structure are determined according to the data characteristics, problem requirements, etc. [37], and each granularity reflects a different level or amount of detail of the data. When calculating the granularity space coefficient, the performance of the attribute under different granularities is considered in order to more accurately reflect its importance in the multi-granularity structure.
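Taken together, these quantities can be sketched numerically. The Python snippet below assumes the gain ratio has the form ϵ_A(d) = W_A · e^{|IG|} / SI described in this section; all numbers are illustrative.

```python
import math

def granularity_coefficient(perf, betas):
    """W_A = sum_i beta_i * P_{G_i}(A); betas assumed to sum to 1."""
    return sum(b * p for b, p in zip(betas, perf))

def epsilon(ig, si, w):
    """Gain ratio epsilon = W_A * exp(|IG|) / SI (form assumed from the text)."""
    return w * math.exp(abs(ig)) / si

# attribute A measured under two granularities with weights beta
w_A = granularity_coefficient(perf=[0.9, 0.6], betas=[0.7, 0.3])
eps = epsilon(ig=0.4, si=1.0, w=w_A)
print(w_A, eps)
```

An attribute that performs well at the heavily weighted granularities receives a larger W_A and hence a larger ϵ, all else being equal.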
The neighborhood rough set is a method for dealing with uncertain and fuzzy data, which uses neighborhood relations instead of the indiscernibility relations of traditional rough sets. In this method, data are decomposed into different granularities, each representing different levels or details of the data. The information gain ratio is a method for measuring the importance of attributes in data classification. It is based on the concept of information entropy and evaluates the classification capability of an attribute by comparing the entropy change in the dataset with and without the attribute.
Therefore, ϵ combines these two concepts, i.e., neighborhood information gain at different granularities and the split information value of attributes, to evaluate the importance of attributes in multi-granularity data analysis. The structure of the ϵ-reduct part is shown in Figure 1. This method considers not only the information gain of attributes, but also their performance at different granularities, thus providing a more comprehensive method of attribute evaluation. Given a decision system DS and a threshold θ ∈ [0, 1], a subset A ⊆ AT is considered an ϵ-reduct if its multi-granularity neighborhood information gain ratio satisfies the threshold θ and no proper subset A′ ⊂ A satisfies it as well. Under this definition, significant attributes are determined based on their contribution to the information gain ratio, aiming to select attributes that are informative yet not redundant for the decision-making process. The method is based on greedy search techniques for attribute reduction and helps identify attributes that significantly impact the decision outcome.
Given a dataset DS = ⟨U, AT, d⟩, where U is the set of objects, AT is the set of conditional attributes, and d is the decision attribute, for any attribute subset A ⊆ AT and any a ∈ AT − A (i.e., any attribute not in A), the significance of attribute a regarding the multi-granularity neighborhood information gain ratio is defined as Sig^ϵ_a(A, d) = ϵ_{A∪{a}}(d) − ϵ_A(d). The significance function suggests that a larger value indicates a more important conditional attribute, making it more likely to be included in the reduct. For example, if Sig^ϵ_{a_1}(A, d) < Sig^ϵ_{a_2}(A, d), where a_1, a_2 ∈ AT − A, then ϵ_{A∪{a_1}}(d) < ϵ_{A∪{a_2}}(d). Such a result indicates that choosing a_2 to join A would lead to a higher multi-granularity neighborhood information gain ratio than choosing a_1. Given the foregoing, it is not difficult to conclude that the ϵ-reduct has the following benefits.

1. Multi-level data analysis: by incorporating multi-granularity structures, the algorithm can analyze data at various levels of granularity. This allows for a more detailed understanding of data characteristics at each level, which can be crucial for complex datasets.
2. Comprehensive attribute evaluation: the algorithm evaluates attributes not only based on information gain, but also considering their performance across different granularities through the granularity space coefficient. This provides a holistic measure of attribute importance that accounts for varied data resolutions and contexts.
3. Handling uncertainty and fuzziness: by using neighborhood relations instead of indiscernibility relations, the method effectively handles uncertain and fuzzy data, making it suitable for real-world datasets that often contain imprecise or incomplete information.
However, alongside these advantages, the method also has certain limitations, such as the computational cost of computing the neighborhood information gain ratio for each attribute across multiple granularities. At the same time, this leaves considerable potential and room for further development.

Detailed Algorithm
Based on the significance function, Algorithm 1 is designed to find the ϵ-reduct.
To streamline the analysis of the computational complexity of Algorithm 1, we first apply k-means clustering to generate pseudo labels for the samples. With T denoting the iteration count for k-means clustering and k the cluster count, the complexity of creating pseudo labels is O(k · T · |U| · |AT|), where |U| is the total number of samples and |AT| is the attribute count. Subsequently, the calculation of ϵ_{A∪{a}}(d) occurs no more than (1 + |AT|) · |AT| / 2 times. In conclusion, the computational complexity of Algorithm 1 is O(|U|^2 · |AT|^3 / 2).
Algorithm 1: Forward greedy searching for ϵ-reduct with neighborhood rough set (NRS-ϵ). Input: a decision system DS = (U, AT, d), a neighborhood radius δ, a significance threshold θ. Output: an ϵ-reduct A.
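A compact Python sketch of the forward greedy search follows. It is a simplified stand-in for Algorithm 1, not the paper's implementation: the granularity coefficient W_A is omitted, coinciding neighborhoods form the granules, and the stopping rule keeps adding the most significant attribute until the gain ratio reaches θ times the full-attribute value; all of these simplifications are assumptions.

```python
import math
import numpy as np
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(X, y, attrs, r):
    """Toy epsilon: exp(|IG|) / SI over coinciding-neighborhood granules."""
    Xa = X[:, attrs]
    D = np.sqrt(((Xa[:, None, :] - Xa[None, :, :]) ** 2).sum(-1))
    granules = {}
    for i, row in enumerate(D):
        granules.setdefault(frozenset(np.flatnonzero(row <= r)), []).append(i)
    granules = list(granules.values())
    n = len(y)
    h_cond = sum(len(g) / n * entropy([y[i] for i in g]) for g in granules)
    ig = entropy(y) - h_cond
    si = entropy([j for j, g in enumerate(granules) for _ in g]) or 1.0
    return math.exp(abs(ig)) / si

def greedy_reduct(X, y, r=0.1, theta=0.95):
    """Forward greedy search: repeatedly add the attribute with the highest
    significance until theta times the full-attribute gain ratio is reached."""
    all_attrs = list(range(X.shape[1]))
    target = theta * gain_ratio(X, y, all_attrs, r)
    A, remaining = [], set(all_attrs)
    while remaining:
        best = max(remaining, key=lambda a: gain_ratio(X, y, A + [a], r))
        A.append(best)
        remaining.discard(best)
        if gain_ratio(X, y, A, r) >= target:
            break
    return A

# attribute 0 separates the classes; attribute 1 is noise
X = np.array([[0.10, 0.50], [0.12, 0.90], [0.80, 0.48], [0.82, 0.91]])
y = [0, 0, 1, 1]
print(greedy_reduct(X, y))  # a single-attribute reduct suffices
```

The quadratic number of candidate evaluations in the outer loop is what drives the |AT|-dependent factor in the complexity analysis above.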

Dataset Description
To evaluate the performance of the proposed measure, 15 UCI datasets are used in this experiment. These datasets were carefully selected after a thorough review to meet the multigranular criteria required by our method, accommodating both supervised and unsupervised learning scenarios. Table 1 summarizes the statistical information of these datasets.

Experimental Configuration
The experiment was performed on a personal computer running Windows 11, featuring an Intel Core i5-12500H processor (2.50 GHz) with 16.00 GB RAM. MATLAB R2023a served as the development environment.
In this experiment, a double-means algorithm was adopted to recursively allocate attribute granularity space, the k-means clustering method [38] was used to generate pseudo labels for samples, and the information gain ratio served as the criterion for evaluating attribute reduction. Notably, the selected k-value needs to match the number of decision categories in the dataset. Moreover, the effect of the neighborhood rough set is significantly influenced by the preset radius size. To demonstrate the effectiveness and applicability of the proposed method, a series of experiments was designed using 20 different radius values, incremented by 0.02 and ranging from 0.02 to 0.40. The derived reducts were validated through 10-fold cross-validation. Specifically, for each radius, the dataset was divided into ten subsets, nine for training and one for testing. This cross-validation process was repeated 10 times so that each subset had the opportunity to serve as the test set, thereby evaluating classification performance and ensuring the reliability and stability of the model.
In the experiment, the proposed measure is compared with six advanced attribute reduction algorithms, as well as with no attribute reduction at all (no reduct), using Classification and Regression Trees (CART) [20], K-Nearest Neighbors (KNN, K = 3) [39], and Support Vector Machines (SVM) [40]. Performance is evaluated in terms of the stability, accuracy, and timeliness of classification, as well as the stability of reduction. The attribute reduction algorithms included for comparison are: MapReduce-Based Attribute Reduction Algorithm (MARA) [41]; Robust Attribute Reduction Based on Rough Sets (RARR) [42]; Bipolar Fuzzy Relation System Attribute Reduction Algorithms (BFRS) [43]; Attribute Group (AG) [44]; Separability-Based Evaluation Function (SEF) [45]; and Genetic Algorithm-based Attribute Reduction (GAAR) [46].

Comparison of Classification Accuracy
In this part, the classification accuracy of each algorithm is evaluated using KNN, SVM, and CART for predicting test samples. Regarding attribute reduction algorithms, within a decision system DS, the post-reduction classification accuracy is defined as Acc(red) = |{x_i ∈ U : Pre_red(x_i) = d(x_i)}| / |U|, where Pre_red(x_i) is the predicted label for x_i using the reduced attribute set red. Table 2 and Figure 2 present the specific classification accuracy outcomes for each algorithm across the 15 datasets. From these observations, several insights can be readily inferred:
1.

2. Examining the average classification accuracy per algorithm reveals that the accuracy associated with NRS-ϵ is on par with, if not exceeding, that of MARA, RARR, BFRS, AG, SEF, and GAAR. When using the CART classifier, the average classification accuracy of NRS-ϵ is 0.8012, up to 29.28% higher than the other algorithms; when using the KNN classifier, it is 0.8169, up to 34.48% higher; and when using SVM, it is 0.80116, up to 36.38% higher.
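The accuracy measure defined above reduces to the fraction of matching predictions on the test samples; a trivial Python sketch with illustrative values:

```python
import numpy as np

def reduct_accuracy(pred, labels):
    """Fraction of test samples whose prediction under the reduced
    attribute set matches the true decision label."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    return float((pred == labels).mean())

print(reduct_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 3 of 4 correct -> 0.75
```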

Comparison of Classification Stability
Similar to the evaluation of classification accuracy, this section explores the classification stability obtained by analyzing the classification results of the seven algorithms, again with the CART, KNN, and SVM classifiers. In a decision system DS = ⟨U, AT, d, δ⟩, assume the set U is equally divided into z mutually exclusive groups of the same size (10-fold cross-validation is used, so z = 10), that is, U_1, ..., U_τ, ..., U_z (1 ≤ τ ≤ z). Then, the classification stability based on the reducts red_τ (each obtained by removing U_τ from the set U) can be represented as the average of Exa(red_τ, red_τ′) over all pairs 1 ≤ τ < τ′ ≤ z, where Exa(red_τ, red_τ′) measures the consistency between two classification results and can be defined according to Table 3.
Table 3. Joint distribution of classification results.
In Table 3, Pre_{red_τ}(x) represents the predicted label of x obtained by red_τ. The symbols ψ_1, ψ_2, ψ_3, and ψ_4 respectively denote the numbers of samples that satisfy the corresponding conditions in Table 3. Based on this, Exa(red_τ, red_τ′) is defined as the proportion of samples on which the two results agree, i.e., Exa(red_τ, red_τ′) = (ψ_1 + ψ_4) / (ψ_1 + ψ_2 + ψ_3 + ψ_4). The classification stability index reflects the degree of deviation of prediction labels when data perturbation occurs: higher values mean more stable prediction labels, indicating a higher-quality reduct and reduced sensitivity to perturbations of the training samples. After analyzing the 15 datasets with these three classifiers, Table 4 and Figure 3 present the classification stability achieved by each algorithm.
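A Python sketch of the consistency and stability computation; the interpretation of ψ_1, ..., ψ_4 (both predictions correct, exactly one correct, both wrong) and the pairwise averaging are assumptions based on the description in this section.

```python
import numpy as np

def exa(pred_a, pred_b, labels):
    """Consistency of two classification results: psi1 = both correct,
    psi4 = both wrong (cell interpretation assumed); Exa = (psi1 + psi4) / n,
    the fraction of samples where the two results agree in correctness."""
    a_ok = np.asarray(pred_a) == np.asarray(labels)
    b_ok = np.asarray(pred_b) == np.asarray(labels)
    psi1 = int((a_ok & b_ok).sum())
    psi4 = int((~a_ok & ~b_ok).sum())
    return (psi1 + psi4) / len(labels)

def stability(all_preds, labels):
    """Classification stability: average pairwise Exa over the z reducts."""
    z = len(all_preds)
    pairs = [(i, j) for i in range(z) for j in range(i + 1, z)]
    return sum(exa(all_preds[i], all_preds[j], labels) for i, j in pairs) / len(pairs)

labels = [0, 1, 0, 1]
preds = [[0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 1]]  # results from 3 reducts
print(stability(preds, labels))
```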
Regarding average classification stability, NRS-ϵ markedly surpasses the competing algorithms. Specifically, when using the CART classifier, the classification stability of NRS-ϵ was 0.8228, up to 12.51% higher than the other methods; when using the KNN classifier, it was 0.8972, up to 25.14% higher; and with the SVM classifier, it was 0.9295, up to 14.61% higher.

Comparisons of Elapsed Time
In this section, the time required for attribute reduction by different algorithms is compared.The results are shown in Table 5.
An increase in dimensionality reduction stability correlates with a longer reduct. From an in-depth analysis of Table 5, the following findings can be derived. The reduct produced by NRS-ϵ is longer, suggesting that the algorithm's time efficiency throughout the reduction process needs to be enhanced.
When analyzing the average processing time of the algorithms, it is noteworthy that the value for NRS-ϵ is reduced by 97.23% and 48.86% compared to RARR and GAAR, respectively. Taking the dataset "Car Evaluation (ID: 6)" as an example, the times consumed by NRS-ϵ, MARA, RARR, BFRS, AG, SEF, and GAAR are 122.1212 seconds, 6.9838 seconds, 421.1056 seconds, 154.8219 seconds, 31.4599 seconds, 33.3661 seconds, and 54.0532 seconds, respectively. Hence, under certain conditions, the time NRS-ϵ takes for attribute reduction is less than that of RARR and BFRS.
Based on the discussion, it is evident that while our novel algorithm exhibits better time efficiency compared to RARR and BFRS on certain datasets, the speed of NRS-ϵ requires further enhancement.

Comparison of Attribute Dimensionality Reduction Stability
In this section, the attribute dimensionality reduction stability on the 15 datasets is presented. Table 6 shows that the dimensionality reduction stability of NRS-ϵ is slightly lower than that of GAAR and SEF, but still maintains a leading position. Compared to MARA, RARR, BFRS, and AG, the average dimensionality reduction stability of NRS-ϵ increased by 100.2%, 49.89%, 27.19%, and 14.15%, respectively, while it decreased by only 19.323% and 6.677% compared to GAAR and SEF.
Although NRS-ϵ falls slightly short of the results of GAAR and SEF in terms of dimensionality reduction stability on many datasets, in some cases its attribute dimensionality reduction results are superior to all six advanced algorithms. For example, on the "Letter Recognition (ID: 15)" dataset, the dimensionality reduction stabilities of NRS-ϵ, MARA, RARR, BFRS, AG, SEF, and GAAR were 0.8608, 0.6001, 0.4011, 0.7882, 0.6549, 0.7723, and 0.7442, respectively. Compared to the other algorithms, the result of NRS-ϵ improved by 43.47%, 115.2%, 9.211%, 31.44%, 11.46%, and 15.67%, respectively. Thus, it is important to recognize that employing NRS-ϵ favors the selection of attributes better aligned with variations in samples.

Conclusions and Future Expectations
In this study, we introduced a novel attribute reduction strategy designed to address the challenges associated with high-dimensional data analysis. This strategy innovatively combines multi-granularity modeling with both supervised and unsupervised learning frameworks, enhancing its adaptability and effectiveness across various levels of data complexity. The model's integration of multi-granularity aspects distinguishes it from conventional attribute reduction methods by providing enhanced flexibility and adaptability to different data feature levels. This allows for more precise and effective handling of complex, high-dimensional datasets. The application of our proposed strategy across 15 UCI datasets has demonstrated not only exceptional classification performance, but also robust stability during the dimensionality reduction process. These results substantiate the practical utility and effectiveness of our approach in diverse data scenarios. While the strategy marks a significant advancement in attribute reduction, it does present challenges, primarily related to computational efficiency. The sophisticated nature of the integrated measurement methods, though beneficial for attribute selection quality, substantially increases the computational time required. This aspect can be particularly limiting in time-sensitive applications. To enhance the practicality and efficiency of our attribute reduction strategy, future research efforts could focus on:
1. Implementing acceleration technologies, which could significantly reduce the computational burden and make the strategy more feasible for larger or more complex datasets.
2. Exploring alternative rough set-based fundamental measurements, which could provide deeper insights into their impact on classification performance and may lead to the discovery of even more effective attribute reduction techniques.
By addressing these limitations and exploring these suggested future research directions, we can further refine our attribute reduction strategy, potentially setting a new benchmark in the field.Our findings not only contribute to the existing body of knowledge, but also pave the way for future explorations aimed at enhancing data preprocessing techniques in the era of big data.

Table 2. The comparisons of the classification accuracies.

Table 5. The elapsed time of all seven algorithms.