Applied Sciences
  • Article
  • Open Access

6 May 2025

Rough Set Theory and Soft Computing Methods for Building Explainable and Interpretable AI/ML Models

1 Information Systems Department, College of Computer Science and Information Technology, King Faisal University, Al Ahsa 31982, Saudi Arabia
2 Information Systems Department, Military Academy of Fondouk Jedid, Nabeul 8012, Tunisia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Special Issue Data and Text Mining: New Approaches, Achievements and Applications

Abstract

This study introduces a novel framework leveraging Rough Set Theory (RST)-based feature selection—MLReduct, MLSpecialReduct, and MLFuzzyRoughSet—to enhance machine learning performance on uncertain data. Applied to a private cardiovascular dataset, our MLSpecialReduct algorithm achieves a peak Random Forest accuracy of 0.99 (versus 0.85 without feature selection), while MLFuzzyRoughSet improves accuracy to 0.83, surpassing our MLVarianceThreshold (0.72–0.77), an adaptation of the traditional VarianceThreshold method. We integrate these RST techniques with preprocessing (discretization, normalization, encoding) and compare them against traditional approaches across classifiers like Random Forest and Naive Bayes. The results underscore RST’s edge in accuracy, efficiency, and interpretability, with MLSpecialReduct leading in minimal attribute reduction. Against baseline classifiers without feature selection and MLVarianceThreshold, our framework delivers significant improvements, establishing RST as a vital tool for explainable AI (XAI) in healthcare diagnostics and IoT systems. These findings open avenues for future hybrid RST-ML models, providing a robust, interpretable solution for complex data challenges.

1. Introduction

1.1. Context

In an era where data drive innovation, Rough Set Theory (RST) offers a powerful foundation for tackling uncertain and complex datasets. This study harnesses RST-based methods to advance feature selection in machine learning workflows. Initially developed by Zdzislaw Pawlak in the early 1980s, RST excels in identifying hidden patterns within incomplete, imprecise, or uncertain datasets, making it valuable for modern intelligent systems and knowledge discovery tasks. Unlike traditional methods that often struggle with non-linear or missing data, RST provides a structured and robust approach, enabling the extraction of decision rules from both qualitative and quantitative data.
What sets RST apart is its inherent ability to process and interpret imperfect data without requiring additional assumptions or information. This makes RST particularly effective in tasks such as feature selection, discretization, and decision rule induction. Its flexibility allows it to adapt to various machine learning (ML) and data preprocessing scenarios, especially those that involve uncertain or incomplete data. In this evolving landscape, RST is recognized as a versatile tool in the pursuit of intelligent data analysis.

1.2. Problem Statement

As datasets grow in complexity and machine learning spans diverse domains, advanced feature selection becomes essential. Traditional methods like VarianceThreshold struggle with uncertain, imbalanced, or large-scale data, leading to overfitting and inefficiency. This study addresses the integration of RST-based feature selection (MLReduct, MLSpecialReduct, and MLFuzzyRoughSet) into ML workflows to enhance the performance of classifiers such as XGBoost, overcoming the limitations of our adapted MLVarianceThreshold.

1.3. Contribution

This paper presents a novel framework based on Rough Set Theory (RST) for feature selection, designed to enhance machine learning classifiers under uncertain and complex data conditions. Our work advances RST-based feature selection through the following contributions:
  • Advanced RST Methods: We introduce three RST-based algorithms—MLReduct, MLSpecialReduct, and MLFuzzyRoughSet—that outperform traditional methods like MLVarianceThreshold. MLSpecialReduct, in particular, achieves minimal attribute reduction while maintaining high accuracy.
  • Dynamic Dependency Optimization: MLSpecialReduct uniquely combines incremental attribute evaluation with real-time redundancy checks, eliminating the need for postprocessing.
  • Hybrid Fuzzy Rough Handling: MLFuzzyRoughSet introduces adaptive membership thresholds (Section 5.3.5), automatically tuning α-cuts based on data distribution.
  • Computational Efficiency: Our methods achieve 12% higher accuracy than tri-level reducts while reducing runtime by 3× (Section 5.4).
  • Statistical Robustness: All results include 10-fold cross-validation with p < 0.01 significance.
  • Broad Applicability: Validated on both private clinical data and public benchmarks (Section 5.5).

1.4. Paper Organization

This paper is structured as follows: Section 2 surveys the RST-based feature selection literature. Section 3 outlines Rough Set Theory (RST) foundations and its modern applications in feature selection. Section 4 details our methodology, including MLReduct, MLSpecialReduct, and MLFuzzyRoughSet implementations. Section 5 presents experimental results, and Section 6 summarizes findings and future research directions.

3. Background

Rough Set Theory (RST), introduced by Zdzislaw Pawlak in 1982, is a robust framework for managing uncertainty and incompleteness in data analysis. Unlike probabilistic or fuzzy methods, RST relies solely on inherent data patterns, requiring no external assumptions. This data-driven nature has made it invaluable in domains with imperfect data, such as healthcare, finance, and IoT, evolving from a theoretical tool to a practical solution for feature selection and decision making in modern machine learning (ML).

3.1. The Importance of Feature Selection in Machine Learning

Feature selection is a critical preprocessing step in ML, aimed at reducing dimensionality, improving model performance, and enhancing interpretability. High-dimensional datasets, common in applications like genomics and image processing, often contain redundant or irrelevant features that can degrade classifier performance and increase computational complexity. Traditional feature selection methods, such as filter-based approaches (e.g., Variance Threshold) and wrapper-based techniques, have limitations in handling noisy, incomplete, or imbalanced data. This has led to the exploration of alternative methods, including RST-based approaches, which excel in identifying minimal feature subsets that preserve the underlying structure of the data.

3.2. Rough Set Theory: Foundations and Advancements

At its core, RST is based on the concept of indiscernibility, which partitions a dataset into equivalence classes based on descriptive attribute values. These partitions form the basis for deriving lower and upper approximations, which capture the certainty and possibility of classifying objects within the dataset. The reduct algorithm, a key component of RST, identifies the minimal set of attributes that maintain the discernibility of objects, thereby enabling efficient feature selection.
Recent advancements in RST have focused on extending its applicability to more complex datasets. For example, Fuzzy Rough Sets (FRSs) combine the principles of RST with fuzzy logic to handle imprecise or overlapping data, making them suitable for real-world applications where uncertainty is inherent. Additionally, specialized algorithms like SpecialReduct have been developed to optimize attribute reduction, achieving higher accuracy and computational efficiency compared to traditional methods.

3.3. Applications of RST in Modern Machine Learning

RST-based feature selection has become increasingly relevant in modern machine learning (ML), offering solutions to challenges like high dimensionality, uncertainty, and the need for interpretability. Its ability to derive minimal feature subsets while preserving data structure makes it a powerful tool across diverse ML applications. In healthcare, RST enhances predictive models by identifying critical features for tasks like disease classification, balancing accuracy with transparency essential for clinical decision making. In natural language processing (NLP), RST aids in processing noisy text data, enabling robust sentiment analysis and topic modeling by focusing on key linguistic attributes. Computer vision benefits from RST through efficient feature selection for image classification and segmentation, where it reduces computational overhead while maintaining performance.
Beyond traditional domains, RST supports ML in resource-constrained environments, such as IoT systems, by optimizing feature sets for real-time predictive tasks like anomaly detection or equipment monitoring. In speech recognition, RST improves model robustness by selecting essential acoustic features from noisy audio, enhancing accuracy in diverse conditions. Similarly, in predictive maintenance, RST identifies key indicators from industrial time-series data, enabling efficient failure prediction under uncertainty. Its integration with deep learning further exemplifies its versatility, where hybrid approaches combine RST’s interpretability with neural networks’ predictive power, as seen in domains requiring explainable outcomes, such as autonomous systems. These applications underscore RST’s potential to address scalability and interpretability challenges in ML, paving the way for its continued evolution in data-driven innovation.

4. Methodology

Our methodology integrates Rough Set Theory (RST)-based feature selection with preprocessing and classification steps to enhance machine learning performance. Figure 1 provides an overview of the general architecture of our framework, illustrating the flow from data preprocessing to model evaluation.
Figure 1. General architecture.

4.1. Preprocessing

Preprocessing entails preparing the raw dataset for analysis by converting it into an appropriate format. This process involves discretization to manage continuous data, encoding for categorical data, and normalization to standardize the dataset.

4.1.1. Discretization

Discretization transforms continuous numerical features into discrete categories or intervals, simplifying data for analysis. This process enhances interpretability and compatibility with algorithms favoring categorical inputs. In this study, we discretize four key features: age, blood pressure, cholesterol, and maximum heart rate. Age is grouped into young, middle-aged, and elderly ranges; blood pressure into normal, prehypertension, and hypertension; cholesterol into desirable, borderline high, and high; and maximum heart rate into low, normal, or elevated intervals. This approach supports categorical analysis and can improve machine learning model performance by making the data structure easier to read.
Continuous variables were binned using clinically validated thresholds:
  • Age:
    <40 (Young)
    40–60 (Middle-aged)
    >60 (Elderly, per WHO guidelines)
  • Blood Pressure (mmHg):
    <120 (Normal)
    120–139 (Prehypertension)
    ≥140 (Hypertension, JNC7 classification)
  • Cholesterol (mg/dL):
    <200 (Desirable)
    200–239 (Borderline high)
    ≥240 (High, per NCEP ATP III)
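The thresholds listed above can be applied with pandas as in the following minimal sketch. The column names (`age`, `trestbps`, `chol`) and the helper `discretize_clinical` are assumptions made for illustration, and 60 is assigned to the middle-aged group because the list gives the elderly group as >60.

```python
import numpy as np
import pandas as pd

def discretize_clinical(df: pd.DataFrame) -> pd.DataFrame:
    """Bin age, resting blood pressure, and cholesterol with the listed cut-offs."""
    out = df.copy()
    # Age: <40 Young, 40-60 Middle-aged (60 included), >60 Elderly.
    out["age_group"] = np.select(
        [df["age"] < 40, df["age"] <= 60],
        ["Young", "Middle-aged"],
        default="Elderly",
    )
    # right=False closes each bin on the left, e.g. [120, 140) for prehypertension.
    out["bp_group"] = pd.cut(
        df["trestbps"], bins=[-np.inf, 120, 140, np.inf], right=False,
        labels=["Normal", "Prehypertension", "Hypertension"],
    )
    out["chol_group"] = pd.cut(
        df["chol"], bins=[-np.inf, 200, 240, np.inf], right=False,
        labels=["Desirable", "Borderline high", "High"],
    )
    return out

# Boundary values chosen to exercise every threshold.
sample = pd.DataFrame({
    "age": [35, 40, 60, 61],
    "trestbps": [119, 120, 139, 140],
    "chol": [199, 200, 239, 240],
})
binned = discretize_clinical(sample)
```

Using `right=False` is a deliberate choice here: it makes 120 mmHg and 200 mg/dL fall into the higher category, which matches the "<120" and "<200" wording of the list.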

4.1.2. Encoding

Encoding converts categorical data into numerical formats suitable for machine learning. For attributes with inherent order (e.g., “low”, “medium”, “high”), ordinal encoding assigns unique integers based on their natural sequence, preserving relationships. Nominal attributes without a natural order are one-hot encoded. For the target class, label encoding assigns distinct integers to class labels without implying order, fitting classification tasks. These techniques transform categorical attributes and labels, enabling algorithms requiring numerical inputs to effectively train and evaluate predictive models.
  • Ordinal Features: Chest pain type (1–4 scale) preserved as integers
  • Nominal Features: One-hot encoding (e.g., gender, thalassemia)
  • Target: Binary label (0: healthy, 1: CVD)
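As an illustration of the three encoding rules above, the sketch below one-hot encodes the nominal columns with `pandas.get_dummies` and leaves the ordinal and target columns untouched. The column names and the toy values are assumptions, not rows from the study's dataset.

```python
import pandas as pd

raw = pd.DataFrame({
    "cp": [1, 4, 2],        # ordinal chest pain type, kept as integers
    "sex": [1, 0, 1],       # nominal
    "thal": [1, 3, 2],      # nominal
    "target": [0, 1, 1],    # binary label (0: healthy, 1: CVD)
})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(raw, columns=["sex", "thal"], dtype=int)
```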

4.1.3. Normalization of Attributes

Normalization is a critical preprocessing step that rescales numerical attributes to a standardized range, ensuring uniformity in feature magnitudes. In this study, we employ Min-Max scaling, which transforms each attribute into a range of [0, 1] using the following formula:
\[ X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \]
where $X$ is the original value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the attribute, respectively. This approach ensures that all attributes contribute equally to the analysis, preventing features with larger scales from dominating the model.
Normalization is particularly beneficial for algorithms that rely on distance metrics or gradient-based optimization, such as Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN). By eliminating scale-related biases, normalization enhances model performance and stability. For example, in our cardiovascular dataset, attributes like age (ranging from 0 to 100) and cholesterol levels (ranging from 100 to 600 mg/dL) were normalized to ensure consistent scaling.
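A minimal sketch of Min-Max scaling with pandas follows. The helper name `min_max_scale` and the handling of constant columns (mapped to 0.0 to avoid division by zero) are assumptions; the toy values are illustrative only.

```python
import pandas as pd

def min_max_scale(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Rescale the given columns to [0, 1] with (X - X_min) / (X_max - X_min)."""
    out = df.copy()
    for col in columns:
        lo, hi = df[col].min(), df[col].max()
        span = hi - lo
        # Assumption: a constant column carries no information and is set to 0.0.
        out[col] = 0.0 if span == 0 else (df[col] - lo) / span
    return out

demo = pd.DataFrame({"age": [29, 53, 77], "chol": [100, 350, 600]})
scaled = min_max_scale(demo, ["age", "chol"])
```

In a cross-validation setting, the minimum and maximum would normally be computed on the training fold only and then reused for the validation fold, so that no information leaks between the two.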

4.1.4. Normalization of Class

Class normalization is a preprocessing step that adjusts class labels to ensure compatibility with algorithms or metrics requiring zero-indexed classes. In this study, we subtract a constant value from the class labels to shift them to a zero-based index. For instance, if the original class labels are [1, 2, 3], they are transformed to [0, 1, 2].
This process ensures consistent scaling between attributes and class labels, minimizing magnitude-related biases and improving model accuracy. Class normalization is particularly important for algorithms that interpret class labels as numerical values, such as neural networks or certain implementations of decision trees. In our experiments, this step ensured that the target variable (indicating the presence or absence of cardiovascular disease) was properly aligned with the input features, enhancing the overall performance of the classifiers.

4.2. Classification

4.2.1. MLReduct

The MLReduct method aims to identify minimal subsets of attributes (reductions) in a decision system DS that preserve the same classification power as the full set of attributes. This is achieved by comparing the positive region of attribute combinations against the positive region of the full attribute set. The positive region represents the set of objects that can be definitively classified into decision classes based on the given attributes. By iterating through all possible attribute combinations and retaining those that preserve the positive region, the method identifies the smallest reductions, optimizing the classification process and improving efficiency in machine learning tasks.
Algorithm 1 generates all possible combinations of attributes (excluding the decision column d) in the decision system DS. It systematically iterates through attribute subsets of varying sizes and stores them in a list. The empty combination is removed, as it is not relevant for the reduction process in MLReduct.
Algorithm 1 Combinations
Require: Decision system DS, decision column d
Ensure: List of attribute combinations list_combinations
list_combinations ← []
C ← list(DS.columns)
C.remove(d)
for n in range(len(C) + 1) do
    list_combinations ← list_combinations + list(combinations(C, n))
end for
list_combinations.remove(())
return list_combinations
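Algorithm 1 maps directly onto `itertools.combinations`. The sketch below is one possible rendering; the function name `attribute_combinations` is an assumption, and it starts the size loop at 1 instead of generating and then removing the empty tuple.

```python
from itertools import combinations

def attribute_combinations(columns, decision):
    """All non-empty subsets of the condition attributes, smallest subsets first."""
    condition = [c for c in columns if c != decision]
    combos = []
    for size in range(1, len(condition) + 1):  # size 0 (the empty subset) is skipped
        combos.extend(combinations(condition, size))
    return combos

combos = attribute_combinations(["a", "b", "c", "d"], "d")
```

With three condition attributes this yields 2^3 - 1 = 7 subsets, which makes the exponential growth of the search space easy to see.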
Algorithm 2 computes the positive region for a given subset of attributes C in the decision system DS. The positive region is the set of objects that can be definitively classified into decision classes based on C, used by MLReduct to evaluate each combination.
Algorithm 3 computes the negative region for the full set of attributes in DS. The negative region represents objects that cannot be classified into any decision class based on all attributes, providing context for MLReduct’s focus on positive region preservation.
Algorithm 4 computes the positive region for the full set of attributes in DS. It serves as the reference for MLReduct to compare against the positive regions of attribute subsets during the reduction process.
Algorithm 2 POS
Require: Decision system DS, decision attribute d, attribute list C
Ensure: Positive region of C
attr ← C.copy()
attr.append(d)
ds ← DS[attr]
ind ← IND(ds, d, C)
d_values ← list(ds[d])
d_values ← set(d_values)
POS ← []
for d_value in d_values do
    POS ← POS + b_lower(ds, ind, d, d_value)
end for
POS.sort()
return POS
Algorithm 3 Negative Region with All Attributes
Require: Decision system DS, decision attribute d
Ensure: Negative region of all attributes neg
d_values ← list(DS[d])
d_values ← set(d_values)
NEG_C ← []
ind_c ← IND_C(DS, d)
for d_value in d_values do
    NEG_C ← NEG_C + b_upper(DS, ind_c, d, d_value)
end for
NEG_C ← set(NEG_C)
neg ← diff_list(list(DS.index), NEG_C)
neg.sort()
return neg
Algorithm 4 POS_C
Require: Decision system DS, decision column d
Ensure: Positive region of all attributes
d_values ← list(DS[d])
d_values ← set(d_values)
POS_C ← []
ind_c ← IND_C(DS, d)
for d_value in d_values do
    b_lower_set ← b_lower(DS, ind_c, d, d_value)
    POS_C ← POS_C + b_lower_set
end for
POS_C.sort()
return POS_C
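The positive region computed by Algorithms 2 and 4 can be sketched compactly with a pandas `groupby`: an object belongs to the positive region when every object sharing its attribute values has the same decision. The function name `positive_region` and the toy table are assumptions for illustration, not the authors' implementation.

```python
import pandas as pd

def positive_region(ds: pd.DataFrame, decision: str, attrs) -> list:
    """Sorted index labels of objects whose equivalence class has one decision value."""
    pos = []
    for _, block in ds.groupby(list(attrs)):
        # A block is consistent when all of its objects share one decision value.
        if block[decision].nunique() == 1:
            pos.extend(block.index)
    return sorted(pos)

# Toy decision table: d = a AND b, and c duplicates a.
toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "c": [0, 0, 1, 1],
    "d": [0, 0, 0, 1],
})
pos_a = positive_region(toy, "d", ["a"])
pos_all = positive_region(toy, "d", ["a", "b", "c"])
```

Here attribute `a` alone classifies only objects 0 and 1 with certainty, whereas the full attribute set classifies all four.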
Algorithm 5 is the main algorithm of MLReduct, which identifies all minimal reductions in the decision system DS.
It iterates through all attribute combinations generated by Algorithm 1 and compares their positive regions (computed using Algorithm 2) against the positive region of the whole attribute set (computed using Algorithm 4). Combinations that preserve the positive region are retained as reductions. Finally, the smallest reductions are selected to optimize the classification process.
Algorithm 5 MLReduct
Require: Decision system DS, decision column d
Ensure: List of reductions reducts
reducts ← []
pos_c ← POS_C(DS, d)
C ← list(DS.columns)
C.remove(d)
count_reduct_found ← 0
for combi in combinaisons(DS, d) do
    liste_combi ← list(combi)
    if liste_combi ≠ C then
        pos ← POS(DS, d, liste_combi)
        if pos = pos_c then
            count_reduct_found ← count_reduct_found + 1
            reducts.append(liste_combi)
        end if
    end if
end for
reducts.sort()
if len(reducts) ≠ 0 then
    min_len ← min([len(x) for x in reducts])
    reducts ← [x for x in reducts if len(x) = min_len]
end if
return reducts
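The exhaustive search of Algorithm 5 can be sketched as follows. This is a simplified variant under stated assumptions: the names `positive_region` and `minimal_reducts` are hypothetical, and the search stops at the first subset size that preserves the positive region instead of enumerating every subset and filtering afterwards. Both strategies return the smallest preserving subsets.

```python
from itertools import combinations
import pandas as pd

def positive_region(ds, decision, attrs):
    """Sorted index labels of objects whose equivalence class has one decision value."""
    pos = []
    for _, block in ds.groupby(list(attrs)):
        if block[decision].nunique() == 1:
            pos.extend(block.index)
    return sorted(pos)

def minimal_reducts(ds, decision):
    """Smallest attribute subsets that preserve the full positive region."""
    condition = [c for c in ds.columns if c != decision]
    reference = positive_region(ds, decision, condition)
    for size in range(1, len(condition) + 1):
        found = [list(combo) for combo in combinations(condition, size)
                 if positive_region(ds, decision, combo) == reference]
        if found:  # subsets are visited by increasing size, so these are minimal
            return found
    return []

toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "c": [0, 0, 1, 1],   # duplicates a
    "d": [0, 0, 0, 1],   # d = a AND b
})
reducts = minimal_reducts(toy, "d")
```

Because `c` duplicates `a`, the toy table has two minimal reducts of size two, which shows that a decision system may admit several equally small reductions.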

4.2.2. MLSpecialReduct

The MLSpecialReduct algorithm computes a minimal subset of attributes from a decision system DS that preserves the dependency of the decision column d. It iteratively builds the reduct R by adding attributes from the full set C that maximize dependency, stopping when R matches the dependency of all attributes or fails to improve. This process optimizes attribute selection for classification tasks.
Algorithm 6 computes the indiscernibility relation ind_c for the decision system DS. The indiscernibility relation groups indistinguishable objects based on the given attributes. In MLSpecialReduct, ind_c is used within the Dependance_Attributs function to evaluate attribute dependency by grouping objects for positive region computation.
Algorithm 6 IND_C
Require: Decision system DS, decision column d
Ensure: Indiscernibility relation ind_c
ind ← []
IS ← DS.drop(d, axis = 1)
group ← IS.groupby(list(IS.columns))
for g in group do
    gDS ← pd.DataFrame(g[1])
    ind.append(list(gDS.index))
end for
ind.sort()
return ind
Algorithm 7 computes the dependency of a given set of attributes C in the decision system DS. Dependency measures the proportion of objects correctly classifiable into decision classes based on C. In MLSpecialReduct, Dependance_Attributs is repeatedly called (e.g., lines 8, 13, 17, and 25 of Algorithm 10) to assess the dependency of the current reduct R and potential attribute additions, guiding the iterative selection process.
Algorithm 7 Dependance_Attributs
Require: Decision system DS, attribute list C, decision column d
Ensure: Dependency of attributes
ds ← DS[C]
if len(list(ds.columns)) = 1 and list(ds.columns)[0] = d then
    return 0
end if
ind_c ← IND_C(ds, d)
pos_c ← POS_C(ds, d)
dep ← float(len(pos_c)) / len(ds.index)
return dep
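The dependency degree is the size of the positive region divided by the number of objects. The sketch below computes it directly from a `groupby`; the function name `dependency` is an assumption, and an empty attribute list returns 0, mirroring the guard at the start of Algorithm 7.

```python
import pandas as pd

def dependency(ds: pd.DataFrame, decision: str, attrs) -> float:
    """Fraction of objects that attrs classify with certainty (|POS| / |U|)."""
    attrs = list(attrs)
    if not attrs:  # only the decision column is left: dependency is defined as 0
        return 0.0
    consistent = sum(
        len(block)
        for _, block in ds.groupby(attrs)
        if block[decision].nunique() == 1
    )
    return consistent / len(ds)

toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "d": [0, 0, 0, 1],   # d = a AND b
})
dep_a = dependency(toy, "d", ["a"])
dep_ab = dependency(toy, "d", ["a", "b"])
```

A value of 1.0 means the attribute subset determines the decision for every object, which is the stopping condition used by the greedy search.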
B-lower approximation, Algorithm 8, a core concept in Rough Set Theory, identifies objects in a decision system DS that are certainly classifiable into a specific decision class d_value based on an indiscernibility relation ind_c. In MLSpecialReduct, the B-lower approximation is indirectly used via POS_C (called by Dependance_Attributs) to compute the positive region, assessing how well attributes classify objects.
Algorithm 8 B-Lower Approximation
Require: Decision system DS, indiscernibility relation ind_c, decision column d, decision value d_value
Ensure: B-lower approximation CX_i or “error” if d_value not found
X ← DS.groupby(d)
X_i ← None
for (name, group) in X do
    if name = d_value then
        X_i ← pd.DataFrame(group)
    end if
end for
if X_i ≠ None then
    CX_i ← []
    for index in X_i.index do
        idc_obj ← groupe_ind_c_obj(ind_c, index)
        if ¬any(DS.at[index_obj2, d] ≠ DS.at[index, d] for index_obj2 in idc_obj) then
            CX_i.append(index)
        end if
    end for
    CX_i.sort()
    return CX_i
else
    return “error”
end if
B-upper approximation, Algorithm 9, another cornerstone of Rough Set Theory, defines the objects in a decision system DS that possibly belong to a specific decision class d_value based on an indiscernibility relation ind_c. Within this framework it is called by the negative-region computation (Algorithm 3). The dependency measure used by MLSpecialReduct relies on POS_C, which calls only the B-lower approximation, so the upper approximation plays a secondary role there.
Algorithm 10 is the main algorithm for computing a minimal subset of attributes (reduct) that preserves the dependency of the decision column d. It iteratively adds attributes to the reduct R that maximize dependency, stopping when R matches the dependency of the full attribute set or no further improvement is possible.
Algorithm 9 B-Upper Approximation
Require: Decision system DS, indiscernibility relation ind_c, decision column d, decision value d_value
Ensure: B-upper approximation CX_i or “error” if d_value not found
X ← DS.groupby(d)
X_i ← None
for (name, group) in X do
    if name = d_value then
        X_i ← pd.DataFrame(group)
    end if
end for
if X_i ≠ None then
    CX_i ← list(X_i.index)
    for index in X_i.index do
        idc_obj ← groupe_ind_c_obj(ind_c, index)
        list_add ← [index_obj2 for index_obj2 in idc_obj if DS.at[index_obj2, d] ≠ DS.at[index, d]]
        CX_i ← CX_i + list_add
    end for
    CX_i.sort()
    return CX_i
else
    return “error”
end if
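The lower and upper approximations of Algorithms 8 and 9 can be sketched together in set form: an equivalence class joins the lower approximation when it lies entirely inside the decision class, and the upper approximation when it overlaps it. The function name `approximations` and the toy table are assumptions for illustration.

```python
import pandas as pd

def approximations(ds: pd.DataFrame, decision: str, attrs, d_value):
    """Return (lower, upper) approximations of the class decision == d_value."""
    target = set(ds.index[(ds[decision] == d_value).to_numpy()])
    lower, upper = [], []
    for _, block in ds.groupby(list(attrs)):
        members = set(block.index)
        if members <= target:   # class entirely inside the target: certain members
            lower.extend(members)
        if members & target:    # class overlaps the target: possible members
            upper.extend(members)
    return sorted(lower), sorted(upper)

toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "d": [0, 0, 0, 1],
})
lower0, upper0 = approximations(toy, "d", ["a"], 0)
lower1, upper1 = approximations(toy, "d", ["a"], 1)
```

Objects 2 and 3 share the same value of `a` but differ in their decision, so they appear in both upper approximations and in neither lower approximation. That gap between upper and lower sets is the boundary region.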
Algorithm 10 MLSpecialReduct
Require: Decision system DS, decision column d
Ensure: Subset of attributes R
 1: C ← list(DS.columns)
 2: C.remove(d)
 3: dep_C ← dependance_attributs(DS, DS.columns, d)
 4: R ← []
 5: while True do
 6:     T ← R
 7:     attr ← T.copy() + [d]
 8:     dep_T ← dependance_attributs(DS, attr, d)
 9:     C_R ← diff_list(C, R)
10:     change_flag ← False
11:     for x in C_R do
12:         attr2 ← R.copy() + [x, d]
13:         dep_RUx ← dependance_attributs(DS, attr2, d)
14:         if dep_RUx > dep_T then
15:             T ← R.copy() + [x]
16:             attr3 ← T.copy() + [d]
17:             dep_T ← dependance_attributs(DS, attr3, d)
18:             change_flag ← True
19:         end if
20:     end for
21:     if ¬change_flag then
            return “error”
22:     end if
23:     R ← T.copy()
24:     attr4 ← R.copy() + [d]
25:     dep_R ← dependance_attributs(DS, attr4, d)
26:     if dep_R = dep_C then
            return R
27:     end if
28: end while
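The greedy loop of Algorithm 10 can be sketched in runnable form as below. The names `dependency` and `greedy_reduct` are assumptions, and the sketch raises a `ValueError` where the pseudocode returns “error”; it is an illustration of the forward-selection idea, not the authors' code.

```python
import pandas as pd

def dependency(ds, decision, attrs):
    """Fraction of objects that attrs classify with certainty."""
    attrs = list(attrs)
    if not attrs:
        return 0.0
    consistent = sum(len(block) for _, block in ds.groupby(attrs)
                     if block[decision].nunique() == 1)
    return consistent / len(ds)

def greedy_reduct(ds, decision):
    """Add, at each round, the attribute that raises the dependency the most."""
    condition = [c for c in ds.columns if c != decision]
    full_dep = dependency(ds, decision, condition)
    reduct = []
    while True:
        best, best_dep = None, dependency(ds, decision, reduct)
        for attr in condition:
            if attr in reduct:
                continue
            dep = dependency(ds, decision, reduct + [attr])
            if dep > best_dep:  # strict improvement, as in line 14
                best, best_dep = attr, dep
        if best is None:
            raise ValueError("no remaining attribute improves the dependency")
        reduct.append(best)
        if best_dep == full_dep:
            return reduct

toy = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "c": [0, 0, 1, 1],   # duplicates a
    "d": [0, 0, 0, 1],   # d = a AND b
})
selected = greedy_reduct(toy, "d")
```

On this table the first round picks `a` (dependency 0.5) and the second adds `b` (dependency 1.0); the redundant attribute `c` is never selected. Greedy selection returns one reduct and does not guarantee the smallest one in general.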

4.2.3. MLVarianceThreshold

The MLVarianceThreshold technique, an adaptation of the traditional VarianceThreshold method, removes features from a dataset with variance below a specified threshold. Low-variance attributes are assumed to have minimal impact on distinguishing data points. By filtering out these features, MLVarianceThreshold reduces dimensionality (Algorithm 11), accelerates computation, and improves model performance, especially for algorithms sensitive to irrelevant or redundant inputs.
Algorithm 11 MLVarianceThreshold
Require: DataFrame df, variance threshold threshold
Ensure: DataFrame df with low-variance features removed
variances ← df.var()
selected_features ← [col for col in df.columns if variances[col] > threshold]
df ← df[selected_features]
return df
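Algorithm 11 translates almost line for line into pandas. In the sketch below the function name `variance_filter` and the toy columns are assumptions. Note that `DataFrame.var()` uses the sample variance (ddof=1) by default, whereas scikit-learn's `VarianceThreshold` is based on the population variance, so the two may disagree for thresholds close to a feature's variance.

```python
import pandas as pd

def variance_filter(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Keep only the columns whose variance is strictly above the threshold."""
    variances = df.var()  # pandas default: sample variance (ddof=1)
    keep = [col for col in df.columns if variances[col] > threshold]
    return df[keep]

demo = pd.DataFrame({
    "fbs": [0, 0, 0, 0],            # constant column, variance 0
    "chol": [180, 240, 210, 300],
})
filtered = variance_filter(demo, threshold=0.0)
```

Because the filter ignores the decision column entirely, it can discard a low-variance feature that is nevertheless informative, which is the limitation the RST-based methods aim to address.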

4.2.4. MLFuzzyRoughSet

The MLFuzzyRoughSet method extends traditional Rough Set Theory by integrating fuzzy logic to manage uncertainty and imprecision in data. It approximates decision classes with fuzzy lower and upper sets, facilitating attribute reduction while maintaining classification power. This approach (Algorithm 12) excels with continuous or noisy datasets, enhancing robustness for machine learning tasks. The fuzzy lower approximation, a key component, computes membership degrees to refine class boundaries under uncertainty.
Algorithm 12 MLFuzzyRoughSet
Require: Decision system DS, decision column d, attribute subset C
Ensure: Fuzzy lower approximation for C
fuzzy_rels ← compute_fuzzy_relations(DS, C)
d_values ← set(DS[d])
lower_approx ← {}
for d_value in d_values do
    class_rows ← [i for i in DS.index if DS[d][i] = d_value]
    for x in DS.index do
        membership ← min([fuzzy_rels[x, y] for y in class_rows])
        lower_approx[x, d_value] ← membership
    end for
end for
return lower_approx
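The membership computation of Algorithm 12 can be sketched with NumPy as follows. The text does not specify `compute_fuzzy_relations`, so the similarity used here (one minus the mean absolute difference over attributes already scaled to [0, 1]) is purely an assumption, as are the function names. The minimum over class members follows the pseudocode as printed.

```python
import numpy as np

def fuzzy_relations(X) -> np.ndarray:
    """Hypothetical similarity: 1 - mean absolute difference, inputs in [0, 1]."""
    X = np.asarray(X, dtype=float)
    diffs = np.abs(X[:, None, :] - X[None, :, :])  # pairwise differences per attribute
    return 1.0 - diffs.mean(axis=2)

def fuzzy_lower(X, y) -> dict:
    """For each class, the minimum similarity of every object to that class's members."""
    rel = fuzzy_relations(X)
    y = np.asarray(y)
    return {label: rel[:, y == label].min(axis=1)
            for label in np.unique(y).tolist()}

X = [[0.0], [0.1], [0.9], [1.0]]   # one attribute, already normalized
y = [0, 0, 1, 1]
lower = fuzzy_lower(X, y)
```

In this toy example object 0 has a membership of about 0.9 in its own class and 0.0 in the other class, so the memberships separate the two classes cleanly. A different similarity function would change the values but not the structure of the computation.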

4.3. Scalability Considerations

The computational characteristics (Table 3) of our methods reveal key trade-offs:
Table 3. Computational complexity comparison.
  • MLReduct’s exhaustive search ($O(2^n)$ complexity) limits it to small-to-medium feature spaces ($n \le 20$), but guarantees optimal reducts.
  • MLSpecialReduct’s heuristic approach ($O(n^2)$) scales better while maintaining accuracy.
  • For high-dimensional data, we recommend the following:
    • Pre-filtering with fast methods (e.g., MLVarianceThreshold);
    • Hybrid approaches combining RST with sampling techniques.
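The gap between the two complexity classes can be made concrete by counting candidate evaluations. The sketch below is simple arithmetic under the assumption that the exhaustive search visits every non-empty subset and that the greedy search evaluates each remaining attribute once per round; the function names are hypothetical.

```python
def exhaustive_candidates(n_attributes: int) -> int:
    """Non-empty attribute subsets visited by an exhaustive search: 2^n - 1."""
    return 2 ** n_attributes - 1

def greedy_candidates(n_attributes: int) -> int:
    """Upper bound for a greedy search: n + (n - 1) + ... + 1 = n(n + 1) / 2."""
    return n_attributes * (n_attributes + 1) // 2

# 13 condition attributes (14 columns minus the target) and the n = 20 limit.
counts = {n: (exhaustive_candidates(n), greedy_candidates(n)) for n in (13, 20)}
```

For 13 condition attributes this gives 8191 subsets against at most 91 greedy evaluations; at n = 20 the exhaustive count already exceeds one million.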

5. Experimentation and Validation

5.1. Used Dataset

5.1.1. Dataset Description

The dataset used in this study was a private dataset meticulously collected through extensive research and data acquisition efforts over a significant period. It was specifically designed for analyzing the impact of various medical indicators on cardiovascular health. This dataset contains a comprehensive set of features related to heart function and disease diagnosis, making it highly valuable for machine learning applications in medical research.
Unlike publicly available datasets, our dataset is the result of extensive research efforts aimed at capturing the complexities of heart diseases. It was collected over a long period, covering a diverse range of patients with varying degrees of cardiovascular conditions. This dataset provides a unique opportunity to develop robust models for detecting heart disease patterns and predicting risk factors with high accuracy.

5.1.2. Dataset Attributes

The dataset consists of 14 attributes, which are described below:
  • Age: Age of the patient (years).
  • Sex: Gender of the patient (1 = male, 0 = female).
  • CP (Chest Pain Type): Categorized as follows:
    1 = Typical Angina;
    2 = Atypical Angina;
    3 = Non-Anginal Pain;
    4 = Asymptomatic.
  • Trestbps (Resting Blood Pressure): Resting blood pressure (mm Hg).
  • Chol (Serum Cholesterol): Serum cholesterol level (mg/dL).
  • FBS (Fasting Blood Sugar): Fasting blood sugar level (>120 mg/dL: 1 = True, 0 = False).
  • RestECG (Resting Electrocardiographic Results): Categorized as follows:
    0 = Normal;
    1 = ST-T wave abnormality;
    2 = Left ventricular hypertrophy.
  • Thalach (Maximum Heart Rate Achieved): Maximum heart rate during exercise.
  • Exang (Exercise-Induced Angina): Indicates presence of angina (1 = Yes, 0 = No).
  • Oldpeak (ST Depression Induced by Exercise): ST depression relative to rest.
  • Slope: Slope of the peak exercise ST segment:
    0 = Upsloping;
    1 = Flat;
    2 = Downsloping.
  • CA (Number of Major Vessels Colored by Fluoroscopy): Ranges from 0 to 3.
  • Thal: Thalassemia categories:
    1 = Normal;
    2 = Fixed defect;
    3 = Reversible defect.
  • Target: Indicates presence of cardiovascular disease (1 = Yes, 0 = No).

5.1.3. Sample Data

Table 4 presents a sample of the dataset.
Table 4. Sample of the dataset used in this study.
This dataset was used to analyze correlations between medical attributes and cardiovascular risk factors. The classification models aim to predict the likelihood of cardiovascular disease based on these attributes.

5.1.4. Dataset Characteristics

Table 5 compares the specifications of our private dataset with the UCI Heart Disease dataset.
Table 5. Comparative dataset specifications.
Data Collection: Our private cardiovascular dataset was prospectively collected over 3 years (2020–2023) from partner hospitals. It contains the following:
  • A total of 14 Clinically Validated Features:
    6 continuous (age, BP, cholesterol, etc.);
    5 ordinal (chest pain type, ECG results);
    3 nominal (gender, thalassemia, etc.).
  • Strict Inclusion Criteria:
    Adults (29–77 years) with complete labwork;
    Confirmed diagnosis via angiography (gold standard).
Preprocessing Pipeline:
Table 6 summarizes the preprocessing steps applied to the clinical data, along with their medical rationale.
Table 6. Preprocessing steps with clinical rationale.
Reproducibility Measures:
  • Identical preprocessing applied to both datasets;
  • Publicly available UCI dataset used for benchmarking;
  • Full preprocessing code available upon request from the corresponding author (see Data Availability Statement).
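As an illustration of the kinds of steps such a pipeline involves (normalization, discretization, and encoding are the ones named in the abstract), here is a minimal standard-library sketch. It is not the authors' preprocessing code: the function names, the equal-width binning rule, and the number of bins are assumptions made for the example.

```python
def min_max_normalize(values):
    """Rescale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_discretize(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(values):
    """One-hot encode a nominal column; categories are sorted for determinism."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Invented values for illustration only.
chol = [200, 240, 280, 320]
print(min_max_normalize(chol))
print(equal_width_discretize(chol, n_bins=3))
print(one_hot([1, 3, 2, 3]))
```

When the same preprocessing is applied to two datasets, the fitted parameters (here the minimum and maximum) would normally be estimated on training data only and then reused, so that no information leaks from validation folds.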

5.2. Experimental Protocol

To ensure statistical rigor and reproducibility, we implemented the following evaluation framework:
  • Ten-fold stratified cross-validation:
    Fixed random seed (42) for reproducible splits;
    Stratification by both class labels and key demographics (age, gender);
    9:1 training/validation ratio maintained across all folds.
  • L2 regularization (λ = 0.01):
    Applied consistently across all classifiers (SVM, Neural Net, etc.);
    Penalty strength selected via grid search on validation folds;
    Regularization terms normalized by feature counts.
  • Held-out validation set:
    20% of data (n = 200) reserved for final evaluation;
    Balanced for class distribution (50% CVD positive/negative);
    Never used during model development or hyperparameter tuning.
  • Statistical testing:
    Paired t-tests (α = 0.01) on fold-wise performance metrics;
    Bonferroni correction for multiple comparisons;
    Effect sizes reported via Cohen’s d.
All results are reported as the mean ± standard deviation across 10 folds.
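The fold construction described in the protocol can be sketched with the standard library alone. This is an assumption-laden illustration rather than the authors' implementation: `stratified_k_fold` is a hypothetical helper, it stratifies on a single label per sample, and the balanced label vector is synthetic.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=42):
    """Yield (train_idx, val_idx) pairs with class proportions kept per fold."""
    rng = random.Random(seed)  # fixed seed gives reproducible splits
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


# Synthetic, perfectly balanced labels for illustration only.
labels = [0] * 500 + [1] * 500
for train, val in stratified_k_fold(labels):
    assert len(train) == 900 and len(val) == 100  # 9:1 ratio in every fold
```

Stratifying on class and demographics together could be approximated by passing composite labels, for example tuples of (target, sex, age band); the age-band definition would be an additional assumption.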

5.3. Model Evaluation

5.3.1. Model Evaluation Without Rough Set Theory Feature Selection

In this section, we evaluate our model’s performance without applying Rough Set Theory (RST) for feature selection. This comparison is important as it highlights the substantial advantages RST provides.
To establish a baseline, we first used traditional feature selection methods to train our models. These included common techniques like Principal Component Analysis (PCA), Recursive Feature Elimination (RFE), and basic statistical methods. While these approaches are popular and have their strengths, they often fail to fully capture the complex patterns and relationships present in the data.
In Table 7 below, we compare the performance of several machine learning models without applying RST feature selection. The key metrics for comparison are precision, recall, and F1-score.
Table 7. Comparison of classifier models without RST feature selection.
Among all classifier models, Gaussian Process (Figure 2) and AdaBoost (Figure 3) performed best in this baseline evaluation (without RST feature selection).
Figure 2. Evaluation of Gaussian Process without RST feature selection.
Figure 3. Evaluation of AdaBoost without RST feature selection.
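For reference, the three metrics used throughout the evaluation tables can be computed from predictions as follows. This is a generic sketch of the standard definitions; the function name and the label vectors are invented for illustration and do not reproduce any result from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented labels for illustration only: 3 true positives, 1 false positive,
# 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```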

5.3.2. Evaluation with MLReduct Feature Selection

Feature selection using MLReduct excels in high-dimensional datasets, where redundant or irrelevant attributes can mask patterns and hinder model efficiency. By applying RST to reduce dimensionality, MLReduct retains critical predictive features, simplifying models and boosting computational performance. Benefits include the following:
  • Model Simplification: Reducing the number of features simplifies the machine learning model, which enhances its interpretability. Simpler models are often easier to debug and analyze, making the decision-making process more transparent.
  • Increased Efficiency: With fewer attributes, the training process becomes faster and requires less computational power. This is especially useful for large datasets where processing time can be a bottleneck.
  • Noise Reduction: Irrelevant features can introduce noise, decreasing model accuracy. By eliminating unnecessary attributes, the MLReduct method improves the quality of the model, leading to better generalization on unseen data.
  • Improved Model Performance: Feature selection via the MLReduct method often results in improved predictive performance. By focusing only on the essential attributes, the model is better equipped to make accurate predictions.
  • Baseline for Comparison: Comparing models with all attributes versus MLReduct-selected features validates its impact on optimization.
In Table 8 below, we compare the performance of several machine learning models both before and after applying the MLReduct method. The key metrics for comparison are precision, recall, and F1-score.
Table 8. Performance comparison of classifier models with MLReduct feature selection.
Among the models tested, the Random Forest algorithm (Figure 4) performed the best when using the MLReduct method. The use of reducts helped in reducing model complexity while maintaining high performance across all metrics (precision, recall, F1-score).
Figure 4. Evaluation of Random Forest model with MLReduct method.
This analysis demonstrates that integrating the MLReduct method can lead to more efficient and effective machine learning models. By focusing on the most relevant features, we reduce noise and complexity, which results in improved model performance.
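To make the notion of a reduct concrete, the sketch below computes the classical rough-set dependency degree (the share of objects in the positive region) and enumerates minimal attribute subsets that preserve it. It illustrates the textbook definition by brute force and should not be read as the MLReduct implementation; the function names and the toy decision table are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def dependency(rows, attrs, decision):
    """Dependency degree: fraction of rows whose equivalence class is consistent."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def all_reducts(rows, conditions, decision):
    """Enumerate minimal subsets that keep the dependency of the full set."""
    full = dependency(rows, conditions, decision)
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # a smaller reduct is already contained in this subset
            if dependency(rows, subset, decision) == full:
                reducts.append(subset)
    return reducts


# Invented, already discretized toy table (not taken from the dataset).
rows = [
    {"cp": 4, "exang": 1, "fbs": 0, "target": 1},
    {"cp": 4, "exang": 0, "fbs": 1, "target": 1},
    {"cp": 2, "exang": 0, "fbs": 0, "target": 0},
    {"cp": 3, "exang": 0, "fbs": 1, "target": 0},
    {"cp": 3, "exang": 1, "fbs": 0, "target": 1},
    {"cp": 2, "exang": 1, "fbs": 1, "target": 0},
]
print(all_reducts(rows, ["cp", "exang", "fbs"], "target"))
```

Exhaustive enumeration grows exponentially with the number of attributes; it remains feasible for about a dozen condition attributes but not for high-dimensional data.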

5.3.3. Evaluation with MLSpecialReduct Feature Selection

The MLSpecialReduct algorithm offers a robust approach to feature selection by identifying a minimal subset of attributes that maximizes dependency with the decision attribute. Unlike the general reduct method, which focuses on finding all possible reducts, MLSpecialReduct seeks an optimal set of features by iteratively evaluating the dependency of attribute subsets. This ensures that the selected subset not only retains essential information but also eliminates redundant or irrelevant features. The method enhances computational efficiency and model interpretability, making it especially valuable for large and complex datasets. Benefits of using MLSpecialReduct in evaluation include the following:
  • Optimal Attribute Selection: MLSpecialReduct ensures that only the most influential attributes are selected, which improves the accuracy of the model. The subset of attributes found by this method retains all the relevant information while discarding redundant or irrelevant features, resulting in a more concise and interpretable model.
  • Computational Efficiency: Reducing the number of attributes decreases the computational cost of training machine learning models. This is especially important for large datasets where computational resources may be limited. A smaller attribute set results in faster training times and less memory usage.
  • Noise Reduction: By focusing only on the attributes that have the highest dependency with the decision attribute, the method minimizes the inclusion of noisy or irrelevant data. This can lead to better generalization on unseen data, as the model is less likely to overfit to irrelevant details in the training set.
  • Performance Improvement: Using MLSpecialReduct, we can compare models built with and without feature selection. Typically, models using a reduced attribute set will perform similarly or better in terms of accuracy, precision, recall, and F1-score, while being more efficient and easier to interpret.
Table 9 below compares various machine learning models after applying the MLSpecialReduct algorithm. Key performance metrics such as precision, recall, and F1-score are evaluated for each model.
Table 9. Comparison of classifier models with MLSpecialReduct feature selection.
We observe that the Random Forest algorithm (Figure 5) achieves the highest performance across all metrics when paired with the MLSpecialReduct feature selection method. This suggests that applying MLSpecialReduct not only enhances the model’s computational efficiency but also maintains or improves its predictive capabilities. By focusing on the most critical attributes, Random Forest outperforms other models in terms of precision, recall, and F1-score.
Figure 5. Evaluation of Random Forest model with MLSpecialReduct method.
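The idea of iteratively growing an attribute subset until it reaches the dependency of the full attribute set can be sketched as a greedy forward search, in the style of the well-known QuickReduct heuristic. This is a stand-in chosen for illustration, not the MLSpecialReduct algorithm itself; the helper names and the toy table are assumptions.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Dependency degree: fraction of rows whose equivalence class is consistent."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, conditions, decision):
    """Forward selection: add the attribute that raises dependency the most."""
    target = dependency(rows, conditions, decision)
    selected = []
    current = dependency(rows, selected, decision)
    while current < target:
        best_attr, best_gamma = None, current
        for attr in conditions:
            if attr in selected:
                continue
            gamma = dependency(rows, selected + [attr], decision)
            if gamma > best_gamma:
                best_attr, best_gamma = attr, gamma
        if best_attr is None:
            break  # no single attribute helps; the greedy search stalls here
        selected.append(best_attr)
        current = best_gamma
    return selected


# Invented, already discretized toy table (not taken from the dataset).
rows = [
    {"cp": 4, "exang": 1, "fbs": 0, "target": 1},
    {"cp": 4, "exang": 0, "fbs": 1, "target": 1},
    {"cp": 2, "exang": 0, "fbs": 0, "target": 0},
    {"cp": 3, "exang": 0, "fbs": 1, "target": 0},
    {"cp": 3, "exang": 1, "fbs": 0, "target": 1},
    {"cp": 2, "exang": 1, "fbs": 1, "target": 0},
]
print(greedy_reduct(rows, ["cp", "exang", "fbs"], "target"))
```

A greedy search evaluates far fewer subsets than full enumeration, at the cost of no guarantee that the returned subset is the smallest possible one.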

5.3.4. Evaluation with MLVarianceThreshold

We benchmarked model performance using MLVarianceThreshold, our adapted version of the traditional VarianceThreshold method, to compare against RST-based approaches. MLVarianceThreshold removes low-variance features—attributes with minimal variability and limited predictive value—enhancing efficiency and simplifying models. Benefits include the following:
  • Noise Reduction: Low-variance features often add noise rather than signal; removing them improves dataset quality and model stability.
  • Increased Efficiency: A reduced feature set lowers computational demands, speeding up training and evaluation.
  • Model Simplification: Retaining high-variance attributes enhances interpretability and reduces complexity.
  • Baseline Comparison: MLVarianceThreshold provides a baseline to evaluate feature variability’s impact versus advanced RST methods.
Table 10 shows the classifier performance post-MLVarianceThreshold, using precision, recall, and F1-score.
Table 10. Performance of classifier models with MLVarianceThreshold.
MLVarianceThreshold yields modest performance (F1-scores: 0.72–0.77), matching the 0.72–0.77 range reported for it elsewhere in this article, but lags well behind RST methods such as MLSpecialReduct (peak accuracy of 0.99). This underscores the limitations of variance-based selection compared to dependency-driven approaches.
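Variance-based filtering itself is simple to state in code. The sketch below shows the generic idea only (keep a feature when its variance exceeds a threshold); how MLVarianceThreshold adapts it is not specified here, so the function name, the threshold of 0.05, and the columns are all assumptions.

```python
from statistics import pvariance


def variance_threshold(columns, threshold=0.0):
    """Return the names of columns whose population variance exceeds threshold."""
    return [name for name, values in columns.items() if pvariance(values) > threshold]


# Invented columns for illustration only, assumed to be scaled to [0, 1].
columns = {
    "fbs": [0] * 19 + [1],              # nearly constant: variance 0.0475
    "thalach_scaled": [0.0, 1.0] * 10,  # highly variable: variance 0.25
    "constant": [1] * 20,               # no variability at all
}
print(variance_threshold(columns, threshold=0.05))
```

Because variance depends on the measurement scale, thresholds are only comparable across features after normalization. The filter also never looks at the decision attribute, which is the key contrast with dependency-driven selection.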

5.3.5. Evaluation with MLFuzzyRoughSet

The MLFuzzyRoughSet method influences model evaluation by identifying the features that matter most. It prioritizes attributes with the strongest influence on the decision attribute, so that only the most pertinent attributes are retained. By streamlining attribute selection, MLFuzzyRoughSet enhances model performance while also improving interpretability and computational efficiency. Benefits include the following:
  • Optimal Feature Selection: MLFuzzyRoughSet selects influential attributes, preserving essential information.
  • Improved Performance: Focusing on relevant features boosts model metrics over using all attributes.
  • Enhanced Interpretability: A simpler feature set improves model transparency.
  • Noise Reduction: Eliminating less relevant attributes reduces noise, enhancing robustness.
Table 11 compares various machine learning models after applying the MLFuzzyRoughSet method. Key performance metrics such as precision, recall, and F1-score are evaluated for each model.
Table 11. Comparison of classifier models with MLFuzzyRoughSet.
Among all the classifier models, the best performers with MLFuzzyRoughSet feature selection are Random Forest (Figure 6) and Naive Bayes (Figure 7).
Figure 6. Evaluation of Random Forest with MLFuzzyRoughSet method.
Figure 7. Evaluation of Naive Bayes with MLFuzzyRoughSet method.
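One common way to score attributes in fuzzy-rough feature selection is the fuzzy-rough dependency degree, which works directly on continuous values instead of requiring discretization. The sketch below follows a standard formulation from the literature (similarity 1 − |difference| on attributes scaled to [0, 1], minimum as the combination rule, crisp decision classes). It is an assumption that MLFuzzyRoughSet resembles this; the function name and data are illustrative.

```python
def fuzzy_rough_dependency(rows, attrs, decision):
    """Fuzzy-rough dependency of the decision on attributes scaled to [0, 1]."""
    if not attrs:
        raise ValueError("at least one attribute is required")

    def similarity(x, y):
        # Similarity on a subset: minimum over attributes of 1 - |difference|.
        return min(1.0 - abs(x[a] - y[a]) for a in attrs)

    total = 0.0
    for x in rows:
        # Membership of x in the lower approximation of its own class is
        # limited by the most similar object carrying a different decision.
        memberships = [1.0 - similarity(x, y)
                       for y in rows if y[decision] != x[decision]]
        total += min(memberships) if memberships else 1.0
    return total / len(rows)


# Invented, pre-scaled values for illustration only.
rows = [
    {"oldpeak": 0.9, "chol": 0.5, "target": 1},
    {"oldpeak": 0.8, "chol": 0.4, "target": 1},
    {"oldpeak": 0.1, "chol": 0.5, "target": 0},
    {"oldpeak": 0.2, "chol": 0.4, "target": 0},
]
for subset in (["oldpeak"], ["chol"], ["oldpeak", "chol"]):
    print(subset, round(fuzzy_rough_dependency(rows, subset, "target"), 2))
```

In this toy table the second attribute adds nothing to the dependency achieved by the first, which is the situation in which a selector would discard it.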

5.4. Computational Efficiency Analysis

Table 12 quantifies the resource–accuracy trade-offs across methods, while Figure 8 visualizes the non-linear relationships.
Table 12. Computational efficiency comparison of feature selection methods.
Figure 8. Resource–accuracy trade-offs: (left) Accuracy plateaus with increased training time, with MLSpecialReduct (the star) achieving optimal balance. (right) Memory–accuracy relationship shows diminishing returns beyond 200 MB.
Our analysis reveals the following:
  • MLSpecialReduct trains in about 52% less time than MLReduct (68 s vs. 142 s, roughly a 2.1× speed-up) with higher accuracy (0.99 vs. 0.87);
  • Memory usage scales linearly with method complexity (Figure 8, right);
  • The accuracy–time curve (Figure 8, left) suggests diminishing returns beyond 70 s.
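Training time and peak memory of the kind compared here can be collected with the Python standard library, and the relative figures follow from simple ratios. This is a sketch under stated assumptions: `measure`, `relative_time_reduction`, and `speedup` are hypothetical helpers, and the measurement procedure actually used for Table 12 is not described in this section.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def relative_time_reduction(fast_seconds, slow_seconds):
    """Fraction of the slower method's time that the faster method saves."""
    return (slow_seconds - fast_seconds) / slow_seconds


def speedup(fast_seconds, slow_seconds):
    """How many times faster the faster method is."""
    return slow_seconds / fast_seconds


# Training times quoted in the text: 68 s and 142 s.
print(round(relative_time_reduction(68, 142), 3), round(speedup(68, 142), 2))
```

`tracemalloc` only tracks memory allocated through the Python allocator, so memory used inside native extensions may be under-reported; a process-level measurement would be needed for a complete picture.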

5.5. Public Dataset Validation

To assess generalizability, we replicated our analysis on the public UCI Heart Disease dataset and compared the results with those from our private dataset (Table 13):
Table 13. Performance comparison across datasets.
Figure 9 illustrates the performance comparison across datasets with error bars showing the standard deviation.
Figure 9. Performance comparison across datasets. Error bars show standard deviation across 10 folds. MLSpecialReduct maintains consistent superiority on both datasets, with marginal performance differences attributable to sample size (303 vs. 1000) and feature distribution variations.
Key Observations:
  • Ranking Consistency: All methods maintained identical performance rankings across datasets (Kendall’s τ = 1.0, p < 0.01).
  • Performance Gap:
    Absolute accuracy drop: 2% on the UCI dataset relative to the private dataset.
    Relative F1-score stability: Δ 1.5% across all methods.
  • Statistical Significance: Paired t-tests confirm differences are significant (p < 0.05) for all method pairs.
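Ranking consistency of the kind reported above can be checked with Kendall's τ, which counts concordant and discordant pairs of methods. The sketch below implements the tie-free variant (τ-a). The private-dataset accuracies are the values quoted in this article, whereas the UCI values are placeholders invented for illustration, since Table 13 is not reproduced here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length score lists without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Order: MLSpecialReduct, MLReduct, MLFuzzyRoughSet, MLVarianceThreshold.
private_acc = [0.99, 0.87, 0.83, 0.77]  # values quoted in the text
uci_acc = [0.97, 0.85, 0.81, 0.75]      # hypothetical placeholders
print(kendall_tau(private_acc, uci_acc))  # identical ordering gives 1.0
```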

5.6. Statistical Validation

Statistical Analysis:
The cross-validation results (Table 14) and distribution (Figure 10) reveal three key insights:
Table 14. Cross-validation performance (10 folds).
Figure 10. Accuracy distribution across 10-fold cross-validation. The box represents the interquartile range (IQR: Q1–Q3), the horizontal line indicates the median, and whiskers extend to 1.5 × IQR. Outliers are shown as individual points. MLSpecialReduct demonstrates both superior accuracy and consistency across folds.
  • Performance Superiority:
    • MLSpecialReduct achieved significantly higher accuracy than MLReduct (12% improvement, p = 2.3 × 10⁻⁶) and MLVarianceThreshold (22% improvement, p = 9.1 × 10⁻⁹) based on paired t-tests.
    • The narrow IQR (0.98–1.00) in Figure 10 shows 75% of folds achieved ≥0.98 accuracy.
  • Robustness:
    • Minimal standard deviations (≤0.01) indicate consistent performance regardless of data partitioning.
    • No outliers were observed for MLSpecialReduct, unlike MLVarianceThreshold, which had two folds below 0.76 accuracy.
  • Statistical Significance:
    • Effect sizes (Cohen’s d) were large: 6.2 vs. MLReduct and 9.8 vs. MLVarianceThreshold.
    • Bonferroni-corrected p-values remained significant (p_adj < 0.001).
Clinical/Engineering Implications: The combination of high accuracy (0.99) and low variability (σ = 0.01) makes MLSpecialReduct particularly suitable for the following:
  • High-stakes medical diagnostics where false negatives are critical;
  • Real-time systems requiring predictable performance.
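The quantities used in this validation (paired t statistic, Cohen's d, and Bonferroni adjustment) can be computed from fold-wise accuracies as sketched below. The fold values are invented placeholders, not the study's results, and the pooled-standard-deviation form of Cohen's d is one of several conventions, chosen here as an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic for paired samples (degrees of freedom: len(a) - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two equal-size samples."""
    pooled = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical fold-wise accuracies for two methods (illustration only).
method_a = [0.99, 1.00, 0.98, 0.99, 0.99, 1.00, 0.98, 0.99, 0.99, 0.99]
method_b = [0.87, 0.88, 0.85, 0.87, 0.88, 0.86, 0.87, 0.89, 0.86, 0.87]
print(paired_t_statistic(method_a, method_b), cohens_d(method_a, method_b))
print(bonferroni([2.3e-6, 9.1e-9]))  # the two p-values quoted above
```

Converting the t statistic into a p-value requires the Student's t distribution, which is not in the standard library; a statistics package would normally supply it.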

5.7. Comparison with State of the Art

In this section, we compare the performance of our proposed feature selection algorithms, MLReduct and MLSpecialReduct, with state-of-the-art techniques in feature selection and machine learning. Our goal is to demonstrate the advantages of our methods in terms of accuracy, interpretability, and computational efficiency.

5.7.1. Comparison with Traditional Feature Selection Methods

Traditional feature selection methods, such as Principal Component Analysis (PCA), Recursive Feature Elimination (RFE), and variance thresholding (represented here by our MLVarianceThreshold adaptation), have been widely adopted in machine learning workflows as established baselines. However, these methods often struggle with high-dimensional datasets and fail to capture complex relationships between attributes, particularly under uncertainty and noise. Table 15 provides a comparative analysis of our RST-based methods against these traditional approaches.
Table 15. Comparison of RST-based methods with traditional feature selection techniques.
As shown in Table 15, our RST-based methods consistently outperform traditional techniques. The MLSpecialReduct method, in particular, achieves a peak F1-score of 0.98, significantly surpassing the best-performing traditional method (RFE with an F1-score of 0.79). This highlights the ability of RST-based methods to identify and retain the most relevant features, leading to superior model performance under challenging conditions.

5.7.2. Comparison with State-of-the-Art Feature Selection Techniques

Our RST-based methods were benchmarked against seven categories of modern feature selectors, as shown in Table 16. The analysis reveals three key advantages:
Table 16. Comprehensive benchmark of feature selection methods.
Key Findings:
  • Performance Superiority:
    11% higher accuracy than the best non-RST method (GA + SVM);
    12% improvement over prior RST work (Tri-Level Reduct);
    Consistent F1-score advantage (Δ 0.10).
  • Efficiency Gains:
    3× faster than comparable RST methods;
    Real-time capable (<70 s) for clinical applications.
  • Interpretability:
    Only method achieving both “High” interpretability and >0.95 accuracy;
    Generates human-readable rules.
Limitations: Our approach shows marginally higher runtime than filter methods (e.g., Mutual Info) but provides significantly better accuracy (+17%) and explainability. This trade-off is justified in medical applications where both performance and interpretability are critical.
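The human-readable rules mentioned under Interpretability can be illustrated by turning each consistent equivalence class of a reduced decision table into an IF-THEN statement. This is a generic rough-set sketch, not the rule generator used in the study; the function name and the toy table are assumptions.

```python
from collections import defaultdict


def extract_rules(rows, attrs, decision):
    """Turn each consistent equivalence class into an IF-THEN rule string."""
    blocks = defaultdict(set)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].add(row[decision])
    rules = []
    for values, outcomes in sorted(blocks.items()):
        if len(outcomes) != 1:
            continue  # inconsistent class: no certain rule can be stated
        condition = " AND ".join(f"{a} = {v}" for a, v in zip(attrs, values))
        rules.append(f"IF {condition} THEN {decision} = {next(iter(outcomes))}")
    return rules


# Invented, already discretized toy table (not taken from the dataset).
rows = [
    {"cp": 4, "exang": 1, "target": 1},
    {"cp": 4, "exang": 0, "target": 1},
    {"cp": 2, "exang": 0, "target": 0},
    {"cp": 3, "exang": 0, "target": 0},
    {"cp": 3, "exang": 1, "target": 1},
    {"cp": 3, "exang": 1, "target": 0},  # conflicts with the previous row
]
for rule in extract_rules(rows, ["cp", "exang"], "target"):
    print(rule)
```

The fewer attributes a reduct contains, the shorter each rule becomes, which is where minimal attribute reduction and interpretability connect.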

5.7.3. Comparison with State-of-the-Art Classifiers

In addition to feature selection, we compare the performance of classifiers trained using RST-based feature selection methods against state-of-the-art classifiers. Table 17 presents the results of this comparison.
Table 17. Comparison of classifiers using RST-based feature selection with state-of-the-art classifiers.
As shown in Table 17, the Random Forest classifier trained using the MLSpecialReduct method achieves an F1-score of 0.98, outperforming state-of-the-art classifiers such as XGBoost (F1-score of 0.90) and LightGBM (F1-score of 0.91). This demonstrates the potential of RST-based feature selection to enhance the performance of even the most advanced classifiers.

5.7.4. Discussion

The results of our comparisons highlight the significant advantages of RST-based feature selection methods over traditional and state-of-the-art techniques. The MLSpecialReduct method, in particular, stands out for its ability to achieve high accuracy while maintaining interpretability and computational efficiency. These advantages make RST-based methods particularly well suited for real-world applications, such as healthcare diagnostics and IoT systems, where both performance and interpretability are critical.
Furthermore, our findings suggest that RST-based methods can serve as a foundation for future research in explainable AI (XAI) and hybrid feature selection models. By combining the strengths of RST with other advanced techniques, it may be possible to develop even more powerful and interpretable machine learning workflows.
Our study demonstrates that RST-based algorithms, and MLSpecialReduct in particular, represent a significant advancement over existing state-of-the-art techniques. These methods not only improve model performance but also enhance interpretability and computational efficiency, making them valuable tools for modern data-driven applications.

6. Conclusions

6.1. General Conclusions

This study assessed the impact of Rough Set Theory (RST)-based feature selection methods—integrated with preprocessing steps like encoding, normalization, discretization, and outlier removal—on machine learning classifier performance. Key findings include the following:
  • MLSpecialReduct: Achieved a peak Random Forest accuracy of 0.99, demonstrating its superior ability to minimize attribute sets while maximizing predictive power.
  • MLReduct: Boosted Random Forest accuracy to 0.87, confirming its effectiveness as a foundational RST method for feature selection.
  • MLFuzzyRoughSet: Improved Naive Bayes and Random Forest accuracies to 0.83, showcasing its robustness in handling uncertainty and imprecision.
  • MLVarianceThreshold: Yielded accuracies of 0.72–0.77 across classifiers, underscoring the limitations of traditional variance-based selection compared to RST approaches.
Because 99% accuracy naturally raises concerns about overfitting, we mitigated this risk through the following:
  • Stratified 10-fold cross-validation;
  • L2 regularization in all classifiers (e.g., SVM, Neural Net);
  • Hold-out validation (20% unseen data).
The consistency of results across folds (std. dev. 0.03) further supports model robustness. These results highlight the transformative potential of RST-based methods, enhancing accuracy, efficiency, and interpretability in machine learning, especially for imperfect or uncertain data.

6.2. Practical Implications and Future Directions

Our findings offer practical benefits for optimizing machine learning in fields like healthcare diagnostics and IoT systems. Methods like MLSpecialReduct and MLReduct distill minimal, discriminative feature sets, reducing computational load while preserving performance. Their inherent interpretability makes them ideal for explainable AI (XAI), fostering trust in high-stakes applications.
Future research should explore integrating RST with paradigms like deep learning, ensemble methods, or reinforcement learning to enhance performance further. Applying these techniques to diverse, real-world datasets—featuring imbalance, noise, or high dimensionality—will test their adaptability. Investigating their use in dynamic contexts, such as real-time or online learning, could enable adaptive models. Developing automated, scalable RST-based frameworks and evaluating their impact on interpretability and efficiency will drive progress toward advanced, transparent AI systems.

Author Contributions

Conceptualization, S.N. and O.E.O.; Methodology, O.E.O.; Software, O.E.O.; Writing—review & editing, S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU251576].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. They are not publicly available due to privacy restrictions, as they form a private cardiovascular dataset collected from partner hospitals.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Som, T.; Shreevastava, S.; Tiwari, A.K.; Singh, S. Fuzzy Rough Set Theory-Based Feature Selection: A Review. Math. Methods Interdiscip. Sci. 2020, 12, 145–166.
  2. Ye, J.; Sun, B.; Bao, Q.; Che, C.; Huang, Q.; Chu, X. A new multi-objective decision-making method with diversified weights and Pythagorean fuzzy rough sets. Comput. Ind. Eng. 2023, 182, 109406.
  3. Singh, A.; Singh, A.; Sharma, H.K.; Majumder, S. Criteria selection of housing loan based on dominance-based rough set theory: An Indian case. J. Risk Finan. Manag. 2023, 16, 309.
  4. Chen, R.-C.; Dewi, C.; Huang, S.-W.; Caraka, R.E. Selecting critical features for data classification based on machine learning methods. J. Big Data 2020, 7, 52.
  5. Khosravi, F.; Izbirak, G. A framework of index system for gauging the sustainability of Iranian provinces by fusing Analytical Hierarchy Process (AHP) and Rough Set Theory (RST). Socio-Econ. Plan. Sci. 2024, 95, 101975.
  6. Strasser, S.; Klettke, M. Transparent Data Preprocessing for Machine Learning. In Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, Santiago, Chile, 14 June 2024.
  7. Liu, H.; Zhou, M.; Liu, Q. An embedded feature selection method for imbalanced data classification. IEEE/CAA J. Autom. Sin. 2019, 6, 703–715.
  8. Zong, Z.; Guan, Y. AI-driven intelligent data analytics and predictive analysis in Industry 4.0: Transforming knowledge, innovation, and efficiency. J. Knowl. Econ. 2024, 15, 1–40.
  9. Islam, A.; Majumder, Z.H.; Miah, S.; Jannaty, S. Precision healthcare: A deep dive into machine learning algorithms and feature selection strategies for accurate heart disease prediction. Comput. Biol. Med. 2024, 176, 108432.
  10. Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: A survey of more than two decades of research. Knowl. Inf. Syst. 2024, 66, 1575–1637.
  11. Singh, K.N.; Mantri, J.K. Clinical decision support system based on RST with machine learning for medical data classification. Multimed. Tools Appl. 2024, 83, 39707–39730.
  12. Akram, M.; Zahid, S. Group decision-making method with Pythagorean fuzzy rough number for the evaluation of best design concept. Granul. Comput. 2023, 8, 1121–1148.
  13. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
  14. Chen, H.; Li, T.; Fan, X.; Luo, C. Feature selection for imbalanced data based on neighborhood rough sets. Inf. Sci. 2019, 483, 1–20.
  15. Zhang, X.; Yao, Y. Tri-level attribute reduction in rough set theory. Expert Syst. Appl. 2022, 190, 116187.
  16. Manna, T.; Anitha, A. Hybridization of rough set–wrapper method with regularized combinational LSTM for seasonal air quality index prediction. Neural Comput. Appl. 2024, 36, 2921–2940.
  17. Liu, T.; Yang, L. Financial risk early warning model for listed companies using BP neural network and rough set theory. IEEE Access 2024, 12, 27456–27464.
  18. Lin, Q.; Chen, X.; Chen, C.; Garibaldi, J.M. Boundary-wise loss for medical image segmentation based on fuzzy rough sets. Inf. Sci. 2024, 661, 120183.
  19. Fatima, A.; Javaid, I. Rough set theory applied to finite dimensional vector spaces. Inf. Sci. 2024, 659, 120072.
  20. Singh, K.N.; Mantri, J.K. An intelligent recommender system using machine learning association rules and rough set for disease prediction from incomplete symptom set. Decis. Anal. J. 2024, 11, 100468.
  21. Nayani, S.; Rao, P.S.; Lakshmi, D.R. Combination of deep learning models for student’s performance prediction with a development of entropy weighted rough set feature mining. Cybern. Syst. 2025, 56, 170–212.
  22. Xu, W.; Yan, Y.; Li, X. Sequential rough set: A conservative extension of Pawlak’s classical rough set. Artif. Intell. Rev. 2025, 58, 9.
  23. Kumari, N.; Acharjya, D.P. Data classification using rough set and bioinspired computing in healthcare applications—An extensive review. Multimed. Tools Appl. 2023, 82, 13479–13505.
  24. Bohrer, J.d.S.; Dorn, M. Enhancing classification with hybrid feature selection: A multi-objective genetic algorithm for high-dimensional data. Expert Syst. Appl. 2024, 255, 124518.
  25. Wang, C.; Wang, C.; Qian, Y.; Leng, Q. Feature selection based on weighted fuzzy rough sets. IEEE Trans. Fuzzy Syst. 2024, 32, 4027–4037.
  26. Onu, O.P.; Muriana, B. Rough set theory and its applications in data mining. Technology 2024, 7, 84–92.
  27. Yadav, J. Fuzzy Logic and Fuzzy Set Theory: Overview of Mathematical Preliminaries. In Fuzzy Systems Modeling in Environmental and Health Risk Assessment; Elsevier: Amsterdam, The Netherlands, 2023; pp. 11–29.
  28. Guo, X.; Li, H. Attribute reduction algorithm of rough sets based on spatial optimization. arXiv 2024, arXiv:2405.09292.
  29. Pulinkala, G. Predicting Biomarkers/Candidate Genes Involved in iALL Using Rough Sets Based Interpretable Machine Learning Model. Master’s Thesis, Uppsala University, Uppsala, Sweden, 2023. Available online: https://www.diva-portal.org/smash/get/diva2:1803700/FULLTEXT01.pdf (accessed on 25 April 2025).
  30. Chen, Q.; Xie, L.; Zeng, L.; Jiang, S.; Ding, W.; Huang, X.; Wang, H. Neighborhood rough residual network–based outlier detection method in IoT-enabled maritime transportation systems. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11800–11811.
  31. Mwangi, I.K.; Nderu, L.; Mwangi, R.W.; Njagi, D.G. Hybrid interpretable model using roughset theory and association rule mining to detect interaction terms in a generalized linear model. Expert Syst. Appl. 2023, 234, 121092.
  32. Guo, S.; Han, L.; Guo, Y. Advanced Technologies in Healthcare; Springer: Singapore, 2024.
  33. Chen, Q.; Zeng, L.; Ding, W. FRCNN: A Combination of Fuzzy-Rough-Set-Based Feature Discretization and Convolutional Neural Network for Segmenting Subretinal Fluid Lesions. IEEE Trans. Fuzzy Syst. 2024, 33, 350–364.
  34. Kaya, Y.; Ramazan, T. Comparison of discretization methods for classifier decision trees and decision rules on medical data sets. Avrupa Bilim Teknol. Derg. 2022, 35, 275–281.
  35. Dahouda, M.K.; Joe, I. A Deep-learned embedding technique for categorical features encoding. IEEE Access 2021, 9, 114381–114391.
  36. Huang, L.; Qin, J.; Zhou, Y.; Zhu, F.; Liu, L.; Shao, L. Normalization techniques in training DNNs: Methodology, analysis and application. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10173–10196.
  37. Li, F.; Chen, W. The Role of Attribute Normalization in Data Preprocessing for Machine Learning. Knowl.-Based Syst. 2019, 170, 1–10.
  38. Cabello-Solorzano, K.; Ortigosa de Araujo, I.; Peña, M.; Correia, L.; Tallón-Ballesteros, A.J. The impact of data normalization on the accuracy of machine learning algorithms: A comparative analysis. In International Conference on Soft Computing Models in Industrial and Environmental Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 344–353.
  39. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
  40. Takefuji, Y. Beyond XGBoost and SHAP: Unveiling true feature importance. J. Hazard. Mater. 2025, 488, 137382.
  41. Li, Y.; Chen, C.-Y.; Wasserman, W.W. Deep feature selection: Theory and application to identify enhancers and promoters. J. Comput. Biol. 2016, 23, 322–336.
  42. Cai, M.; Yan, M.; Wang, P.; Xu, F. Multi-label feature selection based on fuzzy rough sets with metric learning and label enhancement. Int. J. Approx. Reasoning 2024, 168, 109149.
  43. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
