Article

Breast Cancer Classification Using an Adapted Bump-Hunting Algorithm

Equipe AMIPS-Ecole Mohammadia d’Ingénieurs, Mohammed V University in Rabat, Avenue Ibn Sina, Agdal, Rabat BP765, Morocco
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(3), 136; https://doi.org/10.3390/a18030136
Submission received: 18 December 2024 / Revised: 19 February 2025 / Accepted: 20 February 2025 / Published: 3 March 2025
(This article belongs to the Special Issue Artificial Intelligence Algorithms for Medicine (2nd Edition))

Abstract

The Patient Rule Induction Method (PRIM) is a data mining technique used for identifying patterns in datasets, focusing on discovering regions of a chosen input space where the response variable is unusually high or low. It belongs to the subgroup discovery field, where finding small groups is especially relevant for the explainability of the results, although it is not a classification technique per se. In this paper, we introduce a new framework for breast cancer classification based on the PRIM. The method involves, first, the random choice of different input spaces for each class label; second, the organization and pruning of the rules using metarules; and finally, a way to handle class overlapping and, hence, define the final classifier. The framework is tested on five real-life breast cancer datasets against three algorithms often used for breast cancer classification: XGBoost, Logistic Regression, and Random Forest. Across the four metrics and all datasets, both our PRIM-based framework and Random Forest demonstrate robust performance, with our framework showing notable accuracy and recall. XGBoost maintains strong F1-scores across the board, indicating balanced precision and recall. Logistic Regression, while competent, generally underperforms compared to the other algorithms, especially in terms of accuracy and recall, achieving 94.1% accuracy against 96.8% and 85.4% recall against 94.2% for the PRIM-based framework on the Wisconsin dataset.

1. Introduction

Machine learning has been applied to many fields, but the medical field is among the most delicate since it directly affects the lives of patients. Breast cancer is the most prevalent cancer among women and has a comparatively high mortality rate [1]. It also has the greatest fatality rate of any cancer in women because of its high incidence [2]. Early diagnosis is essential for survival. A lack of funding and underdeveloped healthcare infrastructures, especially in developing countries, make it challenging for patients to see a doctor. The patient survival rate can be improved by creating early diagnosis programs based on early signs and symptoms, hence the need for explainable supervised machine learning models for the classification of breast cancer patients and the identification of conditions that explain the different patient classes. The objective of this paper is thus to introduce a new framework for explainable breast cancer classification based on an adapted version of the PRIM, a bump-hunting algorithm [3].
Even though scientists have worked on improving the accuracy of ML models, explainability remains necessary to help practitioners in their decision-making process. Current research has focused on making the models understandable and usable [4]. In [5], the authors give a review on explainable artificial intelligence (AI) and accentuate the focus lately on this field involving the interaction with humans.
In this context, it is essential to address both sides: the high accuracy of the model and the explainability and interpretability of its results. One approach designed to search an input space for interesting groups while behaving well with high-dimensional data is bump hunting. Introduced by Friedman and Fisher [3], the Patient Rule Induction Method is a bump-hunting algorithm that searches, at each iteration, for the best rule. Since it peels one dimension at a time, it ultimately constructs a rule shaped like a box, hence the term box induction used in the literature. The PRIM is not a classification method: its output is a set of rules corresponding to one class. It is, therefore, a powerful data mining procedure for exploring a dataset. We have chosen to adapt this method so that it builds an interpretable and explainable classifier for different types of breast cancer classification problems.
To this end, we introduce, in this paper, a framework for breast cancer classification based on an adaptation of the Patient Rule Induction Method handling both categorical and numeric features based on the Random PRIM Classifier developed in [6]. The suggested framework allows us to build classification models that are as accurate as cutting-edge ensemble techniques while still being easy to understand and explain. Indeed, discovering new subgroups in the data space could lead, in this medical field, to new insights into breast cancer diagnosis and prognosis problems.
In the proposed framework for breast cancer classification, we run, for each class, several iterations of the PRIM algorithm on random subsets of features to find all the rules. Then, cross-validation is used to select the rules to produce the correct classifier for each dataset. Finally, the resulting classification rules are organized and pruned using metarules [7]. At this level, we identify containment between the rules of the same class and identify the overlapping regions between the rules representing different classes.
The rest of the paper is organized as follows. In the next section, we briefly review the prediction of breast cancer using machine learning. We then present the three classification algorithms selected for comparison with the adapted PRIM, namely, Random Forest, XGBoost, and Logistic Regression, and explain the Patient Rule Induction Method, how it works, and why a bump-hunting method is of particular interest in the medical field. In Section 3, we detail the framework proposed for breast cancer classification: we review the methods to validate a classifier, the way to handle rule conflicts, and how metarules allow for pruning redundant and irrelevant rules. An empirical evaluation follows in Section 4 with the results, Section 5 contains the discussion, Section 6 the limitations of the work, and Section 7 the conclusion.

2. Material and Methods

In this section, we start by explaining how breast cancer classification is handled in the literature. This allowed us to identify three major supervised learning algorithms used for this classification problem, namely Random Forest [8], XGBoost [9], and Logistic Regression [10], which we overview in Section 2.2. Finally, we present the PRIM algorithm, an essential component of our framework, and provide a literature review of the PRIM’s applications, which indicates its popularity in medical applications.

2.1. State of the Art of Breast Cancer Classification

Breast cancer is among the most frequently addressed prediction problems in machine learning. Several researchers have developed and implemented techniques over the years to look for new patient profiles or feature combinations that can lead medical teams to better breast cancer detection [11]. In this section, we present several relevant papers on using machine learning algorithms to predict breast cancer in order to determine the algorithms to adopt for our empirical study. More detailed reviews can be found in [12,13,14].
The authors in [15] investigated breast cancer in Chinese women by looking at risk indicators before symptoms even appear, noting that such screening is only as successful as the prediction model it relies on. The article evaluates and compares the performance of four ML algorithms, XGBoost, Random Forest, Deep Neural Network, and Logistic Regression, in predicting breast cancer. The models were trained on a dataset consisting of 7127 breast cancer cases and 7127 matched healthy controls, and performance was measured with the AUC, sensitivity, specificity, and accuracy following a repeated 5-fold cross-validation procedure. The results showed that the three other ML algorithms outperformed Logistic Regression in terms of discriminatory accuracy in identifying women at high risk of cancer. XGBoost also proved to be the best algorithm for developing a breast cancer model based on risk factors.
Five ML algorithms were applied to the Breast Cancer Wisconsin Diagnostic dataset in [16] to determine which is the most effective in terms of accuracy, confusion matrix, AUC, sensitivity, and precision for predicting and diagnosing breast cancer. The algorithms used in the study are SVM, Random Forest, Logistic Regression, Decision Tree, and K-NN. The comparison showed that SVM outperformed the rest of the algorithms, with the highest accuracy of 97.2%, a precision of 97.5%, and an AUC of 96.6%.
The authors in [17] conducted a study to evaluate and compare 8 ML algorithms: Gaussian Naïve Bayes (GNB), K-Nearest Neighbors (K-NN), SVM, RF, AdaBoost, Gradient Boosting (GB), XGBoost, and Multi-Layer Perceptron (MLP). The algorithms were applied to the Breast Cancer Wisconsin dataset, using 5-fold cross-validation. The XGBoost results showed great performance and accuracy, concluding that this algorithm is the best to predict breast cancer in the Wisconsin dataset, with an accuracy of 97.1%, recall of 96.75%, a precision of 97.28%, an F1-score of 96.99%, and an AUC of 99.61%.
Three machine learning techniques for predicting breast cancer were used in [18]. The article aims to develop predictive models for breast cancer recurrence in patients followed up for two years by applying data mining techniques. The main goal was to compare the performance, through sensitivity, specificity, and accuracy, of well-known ML algorithms: Decision Tree, Support Vector Machine, and Artificial Neural Network. The study was applied to a dataset of 1189 records, 22 predictor variables, and 1 outcome variable, using 10-fold cross-validation to measure the unbiased prediction accuracy. The results showed that the SVM classification model predicted breast cancer with the lowest error rate and the highest accuracy of 0.957, while the DT model scored the lowest accuracy of all at 0.936, and the ANN scored 0.947.
To predict breast cancer metastasis in [19], the researchers used 4 ML algorithms, RF, SVM, LR, and Bayesian classification, to evaluate serum human epidermal growth factor receptor 2 (sHER2) as part of a combination of clinicopathological features used to predict breast cancer metastasis. These algorithms were applied to a sample cohort that comprised 302 patients. The results showed that the Random Forest-based model outperformed the rest of the algorithms, with an AUC value of 0.75 (p < 0.001), an accuracy of 0.75, a sensitivity of 0.80, and a specificity of 0.71.
Other articles with systematic and literature reviews exist in the literature, such as [20,21,22,23,24]. The aim of this section is to identify the three ML algorithms to which we will compare our PRIM-based framework.

2.2. Overview of Major Supervised Algorithms Used for Breast Cancer Classification

We have chosen to use three algorithms to compare with our approach, namely, Random Forest, Logistic Regression, and XGBoost, due to their recurring use in the literature for breast cancer classification and because they would give us an accurate evaluation of our algorithm.
XGBoost [9] is a supervised learning algorithm whose principle is to combine the results of a set of simpler, weaker models to provide a better prediction, a strategy called model aggregation. The idea is simple: instead of using a single model, the algorithm builds several sequentially and then combines them to obtain an aggregated model. It is, above all, a pragmatic approach that can handle both regression and classification problems. Because the algorithm works sequentially, contrary to Random Forest, for example, it is slower, but this allows it to improve itself by capitalizing on previous executions. It starts by building and evaluating an initial model; based on this first evaluation, each observation is then weighted according to the performance of the prediction.
Random Forest [8] is a tree-based classifier consisting of many decision trees that operate as an ensemble. For classification tasks, the output of the Random Forest is the class selected by the most trees; for regression tasks, the mean prediction of the individual trees is returned. It mostly generates black box models and is known for its lack of interpretability.
Logistic Regression [10] is a supervised method that is used for classification problems. Logistic Regression predicts the probability of an event or class that is dependent on other factors; therefore, the output is always between 0 and 1. It is a very commonly used algorithm for constructing predictive models in the medical field.
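To make the comparison concrete, the snippet below is a minimal sketch of how such baselines are typically trained and scored in Python. It uses scikit-learn's bundled copy of the Wisconsin dataset and near-default parameters, which are illustrative assumptions rather than the exact setup of the experiments reported later (those were run in Orange).

```python
# Minimal sketch: training the three baseline classifiers on the
# scikit-learn copy of the Wisconsin dataset (illustrative setup only).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.3f}")
```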

2.3. The Patient Rule Induction Method

2.3.1. Overview

In a supervised learning environment, the Patient Rule Induction Method [3] is a bump-hunting algorithm that locates areas in the input variables subspace, selected by the decision maker, that are connected to the highest or lowest occurrence of a target label of a class variable.
Two steps make up the Patient Rule Induction Method. The first phase is top-down peeling, where, given a subset of the input variables subspace, boxes are built, each bound by the variables defining the selected subspace. The data analyst may control how much data is eliminated at each peel; typically, this amount is 5%, hence the name “patient”. Because of earlier suboptimal greedy decisions, the final box discovered after the peeling process might not be ideal.
In the second PRIM phase, known as bottom-up pasting, the boxes are expanded by repeatedly extending their bounds as long as the density of the result grows. Figure 1 shows the PRIM’s procedure for locating a box in the initial top-down peeling stage. These two procedures are repeated iteratively to locate further regions until stopping criteria are met; the PRIM is thus a greedy algorithm. The output of the PRIM is a collection of boxes, or regions, providing rules that connect the targeted label of the dependent variable to the explanatory attributes in the data subspace initially selected by the analysts.
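To make the peeling phase concrete, here is a minimal sketch of top-down peeling on numeric features, assuming the data sit in a pandas DataFrame with a binary target column. The α = 0.05 peel fraction and 30% support floor follow the values used later in the paper; the helper names are ours.

```python
import pandas as pd

def peel_once(box_df, target, alpha=0.05):
    """One peeling step (sketch): for each numeric feature, try removing
    the alpha-quantile slice from either end; keep the peel that yields
    the highest mean of the target in the remaining box."""
    best, best_mean = None, box_df[target].mean()
    for col in box_df.columns.drop(target):
        lo, hi = box_df[col].quantile(alpha), box_df[col].quantile(1 - alpha)
        for candidate in (box_df[box_df[col] >= lo], box_df[box_df[col] <= hi]):
            if len(candidate) > 0 and candidate[target].mean() > best_mean:
                best, best_mean = candidate, candidate[target].mean()
    return best  # None when no peel improves the box mean

def top_down_peeling(df, target, alpha=0.05, min_support=0.3):
    """Peel 'patiently' until no peel helps or support hits the threshold."""
    box = df
    while len(box) / len(df) > min_support:
        peeled = peel_once(box, target, alpha)
        if peeled is None:
            break
        box = peeled
    return box
```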
Even if the resulting regions and rules are actionable, the selection of variables can substantially alter the answer to the original problem: since the search dimensions are decided by a human, important rules may be missed while attempting to forecast a phenomenon.
Here is a more formal formulation of the problem to help better understand the search in the input space, depending on whether the variables are categorical or numeric:
Given a dataset, let $y$ be a target variable and let $x_1, x_2, \ldots, x_p$ be the $p$ input variables. If $y$ is a multiclass variable, the problem must be reframed as binary subproblems. Let $S_j$ represent the range of values that $x_j$ ($j = 1, \ldots, p$) can take. The input space is as follows:

$$S = S_1 \times S_2 \times \cdots \times S_p$$

The objective is to find subregions $R \subset S$ for which

$$\bar{y}_R \gg \bar{y}$$

where $\bar{y}$ is the global mean of $y$ and $\bar{y}_R$ is the mean of $y$ in $R$.

Letting $s_i \subseteq S_i$, we define a box

$$B = s_1 \times s_2 \times \cdots \times s_p$$

where $x \in B \Leftrightarrow \bigwedge_{j=1}^{p} (x_j \in s_j)$. When $s_i = S_i$, we leave $x_i$ out of the box definition since it may take any value in its domain. Figure 2a shows an example of a box defined on two numeric variables, where $x \in B \Leftrightarrow x_1 \in [a,b] \wedge x_2 \in [c,d]$. Figure 2b shows the definition of a box on categorical variables, where $x \in B \Leftrightarrow x_1 \in \{a,b\} \wedge x_2 \in \{c,d\}$. A box can also be defined by numeric and categorical variables at the same time.
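As a small illustration of this definition, the sketch below (our own notation, not from the paper) represents a box as a conjunction of conditions, with intervals for numeric variables and value sets for categorical ones; variables with $s_i = S_i$ are simply omitted.

```python
# Sketch: a box as a conjunction of conditions. Numeric variables get an
# interval [lo, hi]; categorical variables get an allowed set of values.
box = {
    "x1": ("interval", (0.2, 0.8)),   # x1 in [0.2, 0.8]
    "x2": ("set", {"a", "b"}),        # x2 in {a, b}
}

def in_box(instance, box):
    """True iff the instance satisfies every condition of the box."""
    for var, (kind, cond) in box.items():
        value = instance[var]
        if kind == "interval":
            lo, hi = cond
            if not (lo <= value <= hi):
                return False
        elif value not in cond:  # kind == "set"
            return False
    return True

print(in_box({"x1": 0.5, "x2": "a", "x3": 42}, box))  # True: x3 unrestricted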

2.3.2. The PRIM’s Metrics

Four metrics should be examined when assessing a box: the box’s dimension, density, coverage, and support.
First, we have the coverage. In relation to the total number of target cases in the dataset, it is the percentage of target cases (cases of interest) that are in the box. The following is the mathematical expression:
$$\text{coverage} = \frac{\text{number of target points in the box}}{\text{total number of target points in the dataset}}$$
The coverage begins at one and decreases as the algorithm advances because the PRIM starts from the unrestricted full set of cases, which comprises all outcomes of interest. A box with a high coverage rate captures a greater percentage of the target cases and thus identifies the behavior of interest. Low coverage, although it may miss more general patterns or apply to only a small number of cases, can yield more precise and useful insights. The coverage thus assesses the box’s ability to isolate the intended patterns or responses within the dataset.
Second, we have the support. The percentage of points from the dataset that fall inside a box is known as its support; in the Python `prim` package, this is called the mass, and it can be expressed mathematically as follows:
$$\text{mass} = \frac{\text{number of points in the box}}{\text{total number of points in the dataset}}$$
The mass helps determine the box’s size. While a lower mass indicates a smaller amount of data captured, thereby focusing more on a subset that satisfies the box’s criteria, a higher mass indicates a greater amount of data contained in the box, meaning it is less specific but more general. When identifying interesting regions within the data space, this statistic is essential for understanding the trade-off between the box’s specificity and coverage.
Third, we have the density of the box. In PRIM, the density is the same as the confidence level. It gauges the target cases’ concentration inside the box, giving information about the box’s purity with regard to the target cases. The density makes more sense with the PRIM because we work with rectangles or hyper-rectangles. The density can be expressed as follows:
$$\text{density} = \frac{\text{number of target points in the box}}{\text{total number of points in the box}}$$
A high density indicates that a greater percentage of the points in the box are target cases, indicating that the box is specifically targeted to the desired behavior or result. On the other hand, a lower density indicates that there are a lot of non-target cases in the box, which reduces its precision.
The final metric is known as the restricted dimensions (res dim). It is the number of variables that define a box and thus measures the box’s complexity. Knowing how many features are used to identify the target cases inside the box is useful because it provides information about the specificity and interpretability of the patterns that are generated.
Knowing how mass, density, and coverage relate to one another in the PRIM is essential to understanding how well a box represents the target cases. In contrast to the coverage, which calculates the proportion of target cases included in the box compared to all target cases in the dataset, the density displays the concentration of those target cases within the box, indicating their purity. A box with high coverage captures a lot of target cases, but its density might be low if it also contains a lot of non-target cases. This is the trade-off between capturing a lot of targets and maintaining specificity. The mass, on the other hand, shows what proportion of the entire dataset, including target and non-target cases, is contained in the box.
The relationship between mass and coverage shows that a high mass does not necessarily indicate high coverage because the mass encompasses all data points inside the box, not just the targets. A box with high coverage but low mass is ideal for efficiently isolating the target cases and reducing the inclusion of non-target cases. As a result, the box is effective and targeted. To determine the proper balance between specificity and generality, it is useful to comprehend these relationships when defining the boundaries of a box in PRIM.
As an example, suppose that we have a dataset with 2000 instances and a two-dimensional input space where each dimension ranges from 0 to 150, and that we found two boxes. Box 1 contains 400 points and has a support of 20% and a coverage of 50%, whereas Box 2 contains 100 points with a support of 5% and a coverage of 0.03%. These measures show that Box 1 is more general, since it has both higher coverage and mass than Box 2, which is more specific. Hence, the selection of the boxes depends on the domain knowledge and the search interest.
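These three quantities are straightforward to compute; the helper below is a minimal sketch assuming NumPy boolean masks marking box membership and target cases over the same dataset.

```python
import numpy as np

def box_metrics(in_box, is_target):
    """Coverage, mass, and density of a box, following the definitions
    above; `in_box` and `is_target` are boolean arrays over the dataset."""
    n, n_box = len(in_box), in_box.sum()
    n_target_in_box = np.logical_and(in_box, is_target).sum()
    coverage = n_target_in_box / is_target.sum()
    mass = n_box / n
    density = n_target_in_box / n_box if n_box else 0.0
    return coverage, mass, density
```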

2.3.3. Related Works

Numerous applications of the PRIM are available in the literature. One of them is the Patient Rule Induction Method for parameter estimation (PRIM-PE) [25]. According to the authors, because hydrologic models suffer from over-parameterization and a lack of knowledge, it is advantageous to locate every behavioral parameter vector that might possibly exist. PRIM-PE is a novel method that identifies all areas of the parameter space whose model behavior is acceptable. The parameter sample created by the authors, which has a uniform distribution inside the “good-enough” area, adequately represents the response surface. The region of the parameter space where the acceptable parameters are found is then delineated using the PRIM. PRIM-PE sampling was contrasted with Markov chain Monte Carlo sampling; the approach worked well, successfully capturing the desirable areas of the parameter space.
Another work [26] proposes a many-objective optimization strategy for finding the multidimensional tradeoffs between coverage, density, and interpretability and contrasts it with an upgraded usage of the PRIM. Although this method is only marginally superior to the enhanced PRIM version, it qualitatively identifies the same subspace, according to the article. The method addresses other pertinent issues, including consistency and variety, prevents overfitting, and facilitates the use of more complex metaheuristic optimization techniques for scenario discovery.
The authors of [27] have suggested a technique for analyzing huge amounts of data using the PRIM. The first stage entails using the PRIM to create an area containing a subset with the greatest output variable, using chosen input and output variables. Aggregating, sorting, comparing, and calculating new metrics make up the second stage. A different option is to reduce the subset using the weighted itemset function and then use the smaller subset in online analytical processing to look for patterns. In [28], the researchers use a modified PRIM to try to statistically identify significant subgroups based on clinical and demographic characteristics. The results were satisfactory enough to conclude that the data can be used to understand why the clinical trial failed and to aid the design of subsequent studies. Other literature reviews in [29,30] show the advantage of working with the PRIM: exploring all the feature spaces until discovering the smallest subgroups that can create a tendency for the target variable, as well as large groups that can quickly explain a phenomenon. The literature has also shown that the PRIM performs better than CART in several cases, since CART can peel up to 50% of the data at each split. The use of a bump-hunting procedure is not widespread in the medical community, and this work aims to show the potential of a new interpretable approach.
To evaluate the interpretability of ML models, the authors in [31] introduced four levels of interpretability. Level 1 regroups all the approaches that give only black box models, without any means to interpret the results, such as those induced by Random Forest. The second level concerns approaches in which domain knowledge takes part in the construction of the models; it participates in the design of the model and leads the algorithm toward the optimal model for the problem encountered. The third level concerns not only the participation of domain knowledge in the construction of the final model but also the integration of tools to further improve interpretability. At this level, interpretability concerns the clarity of the model, and explainability concerns the domain knowledge required. The final level is the best level of interpretability, where the model is crystal clear to the experts. According to this paper, the PRIM is situated at level 4 because, by choosing the input research space and tuning the parameters according to domain knowledge, the PRIM provides the user with non-complex rules.

3. PRIM-Based Framework for Breast Cancer Classification and Explanation

This section aims to present the proposed approach. Indeed, we detail the steps of the framework by explaining the procedure and giving, each time, a review of the literature that guided us through the elaboration of the framework.

3.1. Presentation of the Framework

As covered in the literature review section, the PRIM is well adapted to medical field prediction through its detection of interesting subgroups. We therefore introduce a PRIM-based framework to build accurate, interpretable, and explainable classifiers for breast cancer classification. The suggested framework runs the PRIM algorithm on random selections of features. This allows the extraction of rules for all the class labels in the breast cancer dataset under consideration, according to predefined support, peeling, and pasting thresholds. In addition, the framework contains a method to detect regions of the space where conflicting rules may overlap, a method to handle these rule conflicts, and a method to prune irrelevant rules, which are usually redundant or contained in one another. Finally, the last step consists of validating the breast cancer classifier. Figure 3 displays the different steps of the proposed framework. Step 1 consists of the data preprocessing and the establishment of the learning objectives. Step 2 generates the rules produced by the PRIM on random features. These first two steps are mandatory. Steps 3 and 4 are optional depending on the dataset and the objectives of use, but we highly recommend them to increase the interpretability and explainability of the model. Finally, Step 5 is where the classifier, hence the model, is validated. Hereafter, we develop the main steps of the proposed framework.
Here is a formal statement and explanation of the Random Feature Selection PRIM-based framework:
Input:
  • Feature space X = {x1, x2, ..., xn} with categorical and numeric features
  • Target variable y = {0,1}
  • Training data D = {(x1,y1), (x2,y2), ..., (xn,yn)}
  • Minimum support threshold β
  • Peeling parameter α
Procedure:
  • Random Feature Selection phase, drawing a random number of features for each feature combination
  • Box construction phase for each subspace, applying the peeling criterion α and the minimum support β
  • Metarules applied to all boxes to find associations and overlaps between boxes
  • Assessing the quality of boxes using the density, the coverage, and the support
  • Selection of the final boxes for the classifier
Output:
  • Final classifier for prediction
  • Metarules for detecting overlapping regions
  • Weak boxes for subgroup and knowledge discovery
The metarules phase is optional in case the experts wish to detect new regions in the input space to discover insights in the data to explain some phenomena.
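The sketch below illustrates the Random Feature Selection and box construction phases under stated assumptions: `run_prim` is a hypothetical wrapper around a PRIM implementation (such as the Python `prim` package) that returns the boxes found on a feature subset together with their metrics, and the number of runs is arbitrary.

```python
import random

def random_prim_rules(df, features, target, n_runs=10, alpha=0.05, beta=0.3):
    """Sketch of the Random Feature Selection phase: run the PRIM several
    times per class label, each time on a random subset of features, and
    collect every box (rule) found."""
    rules = {0: [], 1: []}
    for label in (0, 1):
        y = (df[target] == label).astype(int)  # one-vs-rest target
        for _ in range(n_runs):
            k = random.randint(2, len(features))
            subset = random.sample(features, k)
            # run_prim is a hypothetical wrapper around a PRIM library,
            # applying the peeling criterion alpha and minimum support beta
            boxes = run_prim(df[subset], y, peel_alpha=alpha, min_support=beta)
            rules[label].extend(boxes)
    return rules
```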

3.1.1. Step 1: Data Preparation and Defining Learning Objectives

Preparing the data and defining the learning objectives constitute the first stage of the framework. To choose the thresholds optimally and, if necessary, to better orient feature engineering, the learning objectives must be established first. Although preprocessing data is an essential part of machine learning, our framework only handles data that have a binary target variable, y = {0,1}; it is not designed for survival analysis. As per the various applications in the literature review, there are two kinds of breast cancer datasets for classification: one whose target variable indicates whether the patient is alive or dead and another whose target variable indicates whether the tumor is benign or malignant. Data preparation, on the other hand, concerns cleaning the data of missing values, encoding the categorical features, and splitting the data into the training, validation, and testing sets as follows. Let $D$ be the dataset:

$$D = D_{\text{train}} \cup D_{\text{test}}, \quad |D_{\text{train}}| = 0.8\,|D|, \quad |D_{\text{test}}| = 0.2\,|D|$$

and

$$D_{\text{train}} = D_{\text{fit}} \cup D_{\text{validation}}, \quad |D_{\text{fit}}| = 0.8\,|D_{\text{train}}|, \quad |D_{\text{validation}}| = 0.2\,|D_{\text{train}}|$$
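In code, this two-stage split can be done with scikit-learn; the sketch below assumes stratified sampling on y, which the paper does not specify.

```python
from sklearn.model_selection import train_test_split

# X, y: features and binary target of the dataset D.
# 80/20 split into training and test sets, then a further 80/20 split of
# the training set to hold out a validation set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=0)
```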

3.1.2. Step 2: Building the Boxes with the PRIM on Random Feature Selection

As previously noted, a significant limitation of employing the PRIM for classification tasks lies in its reliance on feature selection. Specifically, selecting a fixed input space may preclude the identification of all potential boxes (i.e., rules) corresponding to interesting subgroups. To address this challenge, we propose executing the PRIM multiple times, each with a randomly chosen subset of features for every class. In each iteration, the PRIM will generate boxes along with associated metrics that evaluate the suitability of the selected features in representing the underlying real-world problem. This approach not only broadens the discovery of relevant subgroups but, more importantly, enhances the interpretability of the model. Unlike many high-performing algorithms that operate as black boxes, our method strives to yield transparent and actionable insights, thereby facilitating more effective optimization and decision-making by domain experts.
This framework advocates for constructing a classifier within a randomized feature space. The input dataset may include both categorical and numerical attributes, as the PRIM is designed to handle both data types. The target variable is expected to be binary; if it is not, the problem can be decomposed into multiple binary sub-problems. The algorithm further requires the definition of parameters such as minimum support, a peeling threshold, and a pasting threshold.
Initially, the algorithm extracts rules for each class from the training set, after which the framework proceeds to the next phase. Depending on the output, one can either advance directly to a validation stage, employing techniques like cross-validation on the validation set, or further refine the model by detecting overlapping regions and pruning the rules via metarules. This additional step is recommended to enhance the interpretability and explainability of the resulting model.

3.1.3. Step 3: Handling Rule Conflict

Rule conflict refers to the rules of different classes that can overlap in a region of the input space. The handling of the rule conflict, also known as overlapping classes, is a big part of machine learning research since it changes the classification of some instances that fall into overlapping regions [32,33,34,35].
Going through the literature in this area, we have found that the most common way to resolve the conflict is to analyze the problem depending on the support of the conflicted region. If the region Rc is very small in comparison to the biggest rule, as in Figure 4a, then all the data that fall into it are classified according to the biggest rule, which has higher confidence and coverage. If the region of conflict Rc has the biggest support, meaning more data points fall into it than into either of the rules that constitute the conflicted region, as in Figure 4c, then we should reconsider the classification and analyze the input space. And if the region has an average size in comparison with the rules that constitute it, as in Figure 4b, then we determine which of the classes has the majority in the conflict region Rc.
The problem of overlapping classes often depends on the data and the sparsity of the points. There are no means to predict whether we will have class conflict or not. In the medical field, since classification takes on a human dimension, we suggest providing the experts with the overlapping regions and letting them decide, according to the issue, how to handle the conflict and if another classification is necessary.
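The heuristic described above can be sketched as follows; the 10% size ratio used to call a region “too small” is our own illustrative threshold, not a value from the paper.

```python
from collections import Counter

def resolve_overlap(labels_in_rc, support_rc, support_r1, support_r2):
    """Sketch of the rule-conflict heuristic for a region Rc where two
    rules of different classes overlap (cf. Figure 4a-c)."""
    if support_rc < 0.1 * max(support_r1, support_r2):
        # Rc is tiny: classify its points with the bigger, more
        # confident rule (Figure 4a)
        return "assign to dominant rule"
    if support_rc > max(support_r1, support_r2):
        # Rc dominates both rules: reconsider the classification
        # and re-analyze the input space (Figure 4c)
        return "re-analyze input space"
    # Average-sized Rc: majority vote among the points inside it (Figure 4b)
    return Counter(labels_in_rc).most_common(1)[0][0]
```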

3.1.4. Step 4: Organizing and Pruning Rules Using Metarules

Pruning in machine learning consists of reducing and/or organizing the ruleset to make it more interpretable. It is often applied to black box models such as those induced by Random Forest. There are several pruning techniques: optimization, minimum support, and metarules; others can be found in [36,37]. We chose metarules [6] because they are the best adapted to our classifier. Indeed, we intend to organize the whole ruleset without removing any rules; what matters for our classifier is giving interpretable results.
A metarule is an association between one rule in the antecedent and another rule in the consequent. It can be discovered using association rule mining algorithms from a set of rules that share the same consequent of interest. To find all the associations between rules that share the same consequent, the support for metarules can be set to 0%; the confidence is specified by the user, as in association rule mining. The uncovered metarules divide the rules into disjoint clusters, each covering a local region of the data space, which reveals containment and overlap between the rules belonging to each cluster. Figure 5 displays an example of how the rules are organized using metarules: Rule 3 is contained in Rule 2, Rule 4 is associated with Rule 5, and Rule 6 is associated with Rule 7. Therefore, Rules 3, 4, and 6 can be pruned because they are already represented, respectively, by Rules 2, 5, and 7.

3.1.5. Step 5: Selecting the Final Classifier

As we intend to create a classifier based on the boxes generated for each class by the PRIM, which belongs to the family of subgroup discovery algorithms, we investigated the statistical research field for criteria and tools to build a classifier. According to the original paper on Classification and Regression Trees by Breiman et al. [38], cross-validation is the tool for validating the classifier. Hence, we use it in Step 5 to reinforce our model.
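A minimal sketch of this validation step is given below, assuming a hypothetical `predict_with_rules` function that applies a retained ruleset to instances; pandas indexing and 5 folds are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

def cross_validate_rules(X, y, rules, n_splits=5):
    """Score the rule-based classifier fold by fold; rules whose removal
    does not hurt the average score can then be dropped from the model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for _, val_idx in skf.split(X, y):
        # predict_with_rules is a hypothetical helper applying the boxes
        preds = predict_with_rules(X.iloc[val_idx], rules)
        scores.append(accuracy_score(y.iloc[val_idx], preds))
    return np.mean(scores)
```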

3.2. Illustrative Example of the PRIM-Based Classification Framework

To illustrate how our framework works, we have applied it to the Diabetes Dataset available in the UCI Machine Learning Repository. The dataset consists of 768 instances, 8 attributes, and 2 classes: 1 if the patient is diabetic and 0 if the patient is not diabetic.
Step 1: For this first step, we generated random combinations of the features. As a proof of concept, we generated only 10 random combinations of attributes.
Step 2: The second step is to run the PRIM on each combination, first for the diabetics and then for the non-diabetics. We obtained a total of 56 rules for class 1 and 37 for class 0, using the `prim` library in Python. Table 1 displays all the rules found for both classes and their characteristics: coverage, density (confidence), support, and the dimension of the rules. A lower rule dimension yields less complex rules, which makes them more actionable.
Steps 3 and 4: The third step is to detect the overlapping classes using metarules. For this, we first set up the matrix of instances, where we cross the instances with the rules, and then applied association rules between the rules, as explained in Figure 6. Thanks to this method, we detect not only the rules of class 1 that are associated with the rules of class 0 but also the contained rules for each class label. To implement the metarules, we worked in Orange [39] with the package “association rule mining”. For the 56 rules of class = 1, we set the confidence threshold of the metarules at 90% and their support threshold at 0%, and we found:
  • Ten rules that were contained within each other;
  • Five rules that were associated with another five rules.
This pruned the ruleset for class 1 from 56 rules to 48 rules.
The same procedure was carried out for the class = 0 ruleset, moving from 37 rules to 21 rules because of the redundancy of some of them.
To apply metarules, we first create a binary matrix with the instances as rows and the rules as columns. If the instance is covered by a rule, we code it as 1; if not, we code it as 0. For example, in Figure 6a, instance 1 does not belong to rule 1 but belongs to rule 2. We can then apply the Apriori algorithm and find the associations between the rules. In our example, because of the containment and redundancy among the rules, there is no major rule conflict. As stated before, the class that has the majority in the region wins the conflict. A minimal sketch of this matrix construction and metarule mining is given below.
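This sketch assumes pandas plus the mlxtend implementations of Apriori and association rules as a stand-in for Orange's miner, and it reuses the `in_box` helper from the box sketch in Section 2.3.1; the thresholds follow the text (support near 0%, confidence 90%).

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

def mine_metarules(df, boxes, min_confidence=0.9):
    """Encode each instance by the rules (boxes) covering it, then mine
    associations between rules: the metarules."""
    matrix = pd.DataFrame(
        {f"rule_{i}": df.apply(lambda row: in_box(row, box), axis=1)
         for i, box in enumerate(boxes)})
    # support threshold close to 0% so every co-occurrence is kept
    itemsets = apriori(matrix, min_support=1e-6, use_colnames=True)
    return association_rules(itemsets, metric="confidence",
                             min_threshold=min_confidence)
```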
Step 5: The final step in constructing the classifier is to perform cross-validation to retain the important rules that build an accurate model. We performed a 5-fold cross-validation and obtained an average accuracy of 95.63%, an average recall of 89.43%, an average precision of 92.8%, and an average F1-score of 91.06%. The rules and metarules retained after pruning do not exceed four features, meaning the maximum dimension is four, which makes them interpretable and readable. It is important to note that none of the rules are removed: they are organized in the metarules to enhance interpretability, and experts have the possibility to look inside a rule that covers a small region for subgroup discovery. In total, 69 rules were retained across both classes.

4. Results

4.1. Experimental Setup

This section aims to present the setting for the experiment with details about the datasets and the materials used to compute the empirical evaluation.
To evaluate the PRIM-based framework, we conducted a comparative study of the classifier produced by our PRIM-based framework against three classification algorithms on five real-life breast cancer datasets, based on four performance measures: accuracy, precision, recall, and F1-score. We also provide the ROC and AUC of the models for each dataset to show performance across all possible thresholds, which is particularly valuable for medical applications where the sensitivity–specificity trade-off may need adjustment.
The five breast cancer datasets used are displayed in Table 2. Wisconsin is taken from the UCI Machine Learning Repository, while the SEER and ISPY1-clinica datasets are both taken from the National Cancer Institute of the USA; their target variable indicates whether the patient is dead or alive according to a set of features. The NKI dataset was published by the Netherlands Cancer Institute and has 1570 columns. Finally, the Mammographic masses dataset was published by the Image Processing and Medical Engineering Department (BMT) at the Fraunhofer Institute for Integrated Circuits (IIS) in Germany. The datasets needed little preprocessing, since the PRIM can handle both categorical and numeric features; we therefore only cleaned the missing values.
For the four algorithms, we used 10-fold cross-validation and relied on accuracy to choose the best model for each dataset. The other measures then allow us to assess the performance of our algorithm in classifying the different instances in comparison with the other methods.
To generate the XGBoost, Random Forest, and Logistic Regression models, we worked in Python with the Orange framework [39], keeping the default parameter settings. For the PRIM, we used the `prim` package in Python. The parameters of the PRIM were set according to the literature: the peeling criterion was 5%, the pasting criterion was 5%, and the support threshold was 30%.
The imbalance in the SEER and ISPY1-clinica datasets calls for the use of recall, precision, and F1-score. The recall is the proportion of the actual positive cases the model predicts correctly. It is a useful metric in medical cases where raising a false alarm does not matter much, but actual positive cases should not go undetected. Precision is the proportion of the predicted positive cases that actually turn out to be positive; it is useful when false positives are a greater concern than false negatives. When there is no clear distinction between whether precision or recall is more important, we combine both into the F1-score. In practice, when we try to increase precision, recall goes down, and vice versa; the F1-score captures both trends in a single value. More explanations of machine learning metrics can be found in [40].
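For reference, the four scores can be computed as follows with scikit-learn; the label vectors here are purely illustrative.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # illustrative ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # illustrative predictions

scores = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),
    "F1-score":  f1_score(y_true, y_pred),
}
print(scores)
```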

4.2. Empirical Results

In this section, we present the empirical results to determine which model is best for each dataset and to situate our framework, followed by the limitations of the study to guide future improvements of the framework.
Table 3 and Table 4 summarize the results of the empirical comparison. They display the datasets and the four scores measured for the classifiers produced by Random Forest, XGBoost, Logistic Regression, and the R-PRIM framework. Three scores (recall, precision, F1-score) reflect the importance of classifying positive examples; accuracy, a widely used score for evaluating the overall effectiveness of classifiers on problems with approximately similar proportions of samples per class label, is shown alongside them in Figure 7. Additionally, Figure 8 provides the ROC and AUC for every dataset.
The results given in Table 3 and Table 4 show that the classifiers produced by the PRIM-based framework perform as well as those produced by the three other algorithms. Being able to detect isolated regions in the input spaces, the resulting classifier handled the imbalanced data quite well, since the boxes found were small. Figure 7 offers a visualization of the performance of the four algorithms on the five datasets for each measure: Figure 7a shows the accuracy, Figure 7b the recall, Figure 7c the precision, and Figure 7d the F1-score.
Across all datasets, R-PRIM-CL’s performance exhibits exceptional consistency. The PRIM framework outperforms both Random Forest and XGBoost in terms of accuracy in the SEER dataset, achieving 98.4%. This is particularly noteworthy because, with 4024 samples, SEER is the largest dataset, indicating that the PRIM framework scales well to larger medical datasets.
The PRIM framework exhibits well-balanced performance when looking at precision and recall metrics. For instance, the PRIM framework maintains a good balance between false positives and false negatives in the Wisconsin dataset, achieving 95.6 percent precision and 94.2 percent recall. In medical applications, where minimizing missed diagnoses and needless procedures is essential, this balance is essential. The PRIM framework consistently performs above 94 percent across all datasets according to the F1-scores, which give a harmonic mean of precision and recall. The PRIM framework’s robustness in handling complex medical data is demonstrated by its 96.9 percent F1-score in the NKI dataset, which is especially impressive given that it deals with high-dimensional data (1570 features).
The consistency of the PRIM framework’s performance across datasets with different characteristics is what makes it so remarkable. Whether working with high-dimensional genetic data (NKI), imaging results (Mammogram masses), survival data (SEER), or clinical measurements (Wisconsin), the PRIM framework consistently maintains strong performance metrics that are competitive with, and occasionally surpass, those of well-known algorithms like Random Forest and XGBoost. Even though the PRIM framework is sometimes marginally better than traditional ensemble methods, the PRIM framework’s performance is still dependable and clinically relevant. When taking into account the PRIM framework’s interpretability advantage over “black box” techniques like Random Forest and XGBoost, this becomes even more important. The framework is especially useful for medical applications where it is essential for patients and healthcare providers to comprehend the decision-making process because it can maintain such high performance while producing results that are interpretable.
The number of rules generated per class depends on the dimensionality of the data. The more dimensions we have, the more random feature combinations we obtain and the more subgroups we mine. For the NKI dataset, for instance, even if the number of instances is 272, the number of attributes is very high at 1570, so the number of subgroups selected for the final classifier before going through the metarules process can be perceived as high. One should keep in mind that our approach aims at finding subgroups, so the high number is entirely normal in this case, since the boxes are very dense and small in comparison with traditional classification procedures. We give the number of rules per class before and after the metarules for transparency and also to provide the experts with the whole ruleset if they wish to analyze the output carefully, in case a new set of features defines a subgroup they never suspected. In the metarules process, as stated in Section 3, the association between the rules shows the containment and overlap of rules, which makes the number decrease considerably in some datasets; the rules for class 1 in the SEER dataset, for example, went from 72 to 44 after the metarules. This analysis, displayed in Table 5, shows that the higher the dimensionality, the fewer of the rules found will be contained or overlapped, because the framework selects different subspaces each time.
The variety of the datasets provides good insight into the performance of the framework, given that they differ considerably from each other. The Mammographic masses dataset contains low-dimensional data, with only six attributes from which to draw random combinations for the search space, but its data are nearly balanced, giving the model a high accuracy of 97.2%. The NKI dataset is more complex to work with because it has only about 272 cases for 1570 attributes, making it a small, high-dimensional dataset, on which the framework also performed well, with an accuracy of 95.6%.
After going over the fundamentals of our model’s performance, we now focus on the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) in Figure 8 to assess the model’s overall efficacy and discriminative capacity.
The ROC curves for the Wisconsin dataset in Figure 8a demonstrate superior classification performance for every algorithm, with AUC values above 0.95. XGBoost and Random Forest perform especially well, with AUC values around 0.98, suggesting almost flawless differentiation between benign and malignant cases. The PRIM classifier follows closely behind, while Logistic Regression exhibits marginally poorer discriminative ability despite still doing well. The high AUC values imply that these models would be trustworthy in supporting clinical breast cancer diagnosis.
All classifiers consistently perform well on the SEER dataset in Figure 8b. With AUC values greater than 0.97, XGBoost and the PRIM framework demonstrate exceptional discrimination capabilities. Random Forest maintains similar performance levels, while Logistic Regression again performs marginally worse but still maintains clinically meaningful accuracy above 0.90. The tight grouping of the curves in the high-sensitivity region is especially important for survival prediction.
The models continue to perform well on the ISPY1-clinica dataset in Figure 8c, which has a smaller sample size (168 patients). With an AUC close to 0.98, Random Forest shows the best ability to distinguish between survival outcomes. With AUC values above 0.95, XGBoost and the PRIM framework exhibit strikingly similar performance patterns. Logistic Regression shows greater separation from the other algorithms while maintaining acceptable clinical performance. The curves’ steeper initial rise indicates good sensitivity at low false positive rates.
All algorithms show good performance for the Mammographic masses dataset in Figure 8d. With an AUC above 0.97, XGBoost produces especially remarkable results, closely followed by Random Forest and the PRIM framework. The curves’ outstanding early progression suggests that the models can maintain low false positive rates while achieving high true positive rates. For mammography screening applications, where false positives may result in needless procedures, this is essential.
Finally, with AUC values above 0.96, Random Forest and XGBoost once more show superior performance for the relatively small but feature-rich NKI dataset in Figure 8e. In terms of discriminative ability, the PRIM framework maintains competitive performance, whereas Logistic Regression displays more noticeable disparities. The high AUC values indicate strong prognostic ability in spite of the dataset’s difficulty, especially for the best-performing algorithms.
Across all datasets, we observe several consistent patterns:
  • The best results are consistently obtained by Random Forest and XGBoost.
  • The PRIM framework maintains strong performance close to the best algorithms.
  • Despite its clinical viability, Logistic Regression usually exhibits a lower capacity for discrimination.
  • The models generally perform better in the high-sensitivity region, which is crucial for medical applications.
  • The performance remains robust across varying dataset sizes and feature dimensions.
These results suggest that ensemble methods (RF and XGBoost) are particularly well-suited for these medical classification tasks, while the PRIM framework offers a strong alternative that might be preferred when model interpretability is important because it focuses on small groups.
According to the purpose of the study, the experts could choose the model with good accuracy and the highest recall or precision or the highest accuracy and a good F1-score. In this case, the PRIM-based classifier is the best model to classify the SEER dataset. XGBoost is the best model for the Mammographic masses dataset with the highest accuracy of 98.6% and an F1-score of 97.4%, nearly the same as Random Forest with 97.6%. For the three other datasets, we find small differences between the values, so it will depend on the decision-makers.

5. Discussion

First, the PRIM framework showed outstanding performance on the Breast Cancer Wisconsin dataset, achieving high accuracy (96.8%) and balanced precision (95.6%) and recall (94.2%). The confusion matrix demonstrates that the PRIM framework had a remarkably low false positive rate, misclassifying only 37 out of 569 cases. This implies that the PRIM framework is a very accurate method for differentiating between benign and malignant tumors.
The PRIM framework demonstrated even more remarkable results with 98.4% accuracy in the larger SEER dataset (4024 samples). Only 45 misclassifications (18 false positives and 27 false negatives) out of 4024 cases were found by the confusion matrix. This performance is especially remarkable considering the size of the dataset and how important survival prediction is. While the PRIM’s strong recall (95.6%) indicates that it captures the majority of critical cases, its high precision (97.1%) indicates it rarely issues false alarms.
Even though the ISPY1-clinica dataset had 168 samples, the PRIM framework still performed well, achieving an accuracy of 95%. With seven false positives and seven false negatives, the confusion matrix displayed balanced errors, indicating that the algorithm effectively manages class imbalance. Harmonious precision and recall are indicated by the F1-score of 94.7%, which is significant for clinical applications.
With strong precision (96.3%) and recall (95.8%), the PRIM framework obtained 92.4 percent accuracy in the Mammographic masses dataset (961 samples). In image-based diagnosis, the confusion matrix reveals consistent performance with only 35 misclassifications (16 false positives and 19 false negatives) out of 961 cases.
For the NKI dataset, despite its high dimensionality (1570 features), the PRIM framework maintained excellent performance with 95.6% accuracy. The confusion matrix shows only 12 misclassifications (6 false positives and 6 false negatives) out of 272 cases, suggesting that the PRIM handles high-dimensional data effectively.
The consistency of the PRIM framework’s performance across various medical data types is especially noteworthy. Its interpretability advantage over black-box models and its versatility make it especially useful for medical applications where comprehension of the decision-making process is essential. The ROC curves for all datasets demonstrate that the PRIM framework consistently achieves AUC values above 0.95, demonstrating exceptional discrimination ability. Even though its values occasionally fall just short of the best performers (RF and XGBoost), the differences are frequently negligible, particularly in the high-sensitivity areas that are most crucial for medical applications.
Concretely, our approach proved to be adaptable to different sorts of datasets. It also provides decision-makers with two important elements: a classifier to predict and rules for knowledge discovery. This framework would be particularly useful in medical healthcare because it is a sensitive domain where practitioners make decisions consciously and carefully. Indeed, bringing the possibility of discovering new subgroups can help in discovering factors and correlations in some diseases while in the process of classifying and predicting.

6. Limitations

One of this work’s shortcomings is the absence of expert validation of the explainability. The goal is to achieve a high level of explainability, but since assessing it necessitates domain knowledge, we cannot be certain of it without expert validation. In fact, our work was not validated by medical professionals.
The comparison against only three algorithms is the second drawback: to demonstrate the power and appeal of the proposed classifier, we will need to expand the comparison to other algorithms. Another drawback is the absence of new datasets; it would be interesting to examine recent data and offer some innovative models utilizing PRIM-based classifiers.
Additionally, we recommend investigating alternative pruning algorithms and contrasting them with the metarules to select the optimal one for the classifier.
The final limitation is related to the computational cost of the modified PRIM version. Generating the classifier for the high-dimensional NKI dataset took time, especially without any feature preprocessing, but we did not investigate the computational cost because the aim of this work is to introduce a new classification procedure. For other big data or high-dimensional datasets, testing cloud computing or parallel computing will be necessary to strengthen this point.

7. Conclusions

We presented a novel framework, based on the Patient Rule Induction Method and its bump-hunting process, for predicting and understanding the underlying predictive structure of breast cancer. We used the traditional PRIM process to determine the rules, but instead of letting an expert choose the input space, we selected feature subspaces at random. This random feature space selection uncovered other interesting subgroups. The interpretability of our classifier was improved by using metarules to identify the overlapping regions and prune the rules.
This PRIM-based framework emerges as a highly reliable algorithm for medical classification tasks, offering a compelling combination of accuracy, balanced performance metrics, and interpretability. Its consistent performance across diverse medical datasets suggests it could be a valuable tool for clinical decision support systems.
It also performs well in comparison with powerful classification algorithms, namely Random Forest, XGBoost, and Logistic Regression.
Finally, we note that our approach is versatile, since it can handle both categorical and numeric features, and it is highly robust to high-dimensional data.
However, the lack of expert validation weakens the explainability dimension of the study. In addition, the behavior of the algorithm on balanced versus imbalanced datasets should be studied to make it even more versatile, since imbalanced datasets can cause severe overlapping between the classes, which can lead to misclassifying new instances.
Future work will investigate other methods for pruning the rules, compare other supervised algorithms such as SVM to the PRIM-based classifier, and implement it on big data to evaluate its performance. Finally, we will have the results validated by experts.

Author Contributions

Both authors have contributed equally to the research and writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data are publicly available in the UCI Machine Learning Repository and on the NKI website.

Acknowledgments

This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA), and the CNRST of Morocco (Alkhawarizmi/2020/12).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 2024, 74, 229–263. [Google Scholar] [CrossRef]
  2. Worldwide Cancer Data | World Cancer Research Fund International. Available online: http://wcrf.org (accessed on 19 February 2025).
  3. Friedman, J.H.; Fisher, N.I. Bump hunting in high-dimensional data. Stat. Comput. 1999, 9, 123–143. [Google Scholar] [CrossRef]
  4. Oviedo, F.; Ferres, J.L.; Buonassisi, T.; Butler, K.T. Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 2022, 3, 597–607. [Google Scholar] [CrossRef]
  5. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable ai: A review of machine learning interpretability methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef]
  6. Nassih, R.; Berrado, A. A Random PRIM Based Algorithm for Interpretable Classification and Advanced Subgroup Discovery. Algorithms 2024, 17, 565. [Google Scholar] [CrossRef]
  7. Berrado, A.; Runger, G.C. Using metarules to organize and group discovered association rules. Data Min. Knowl. Discov. 2007, 14, 409–431. [Google Scholar] [CrossRef]
  8. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  9. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Academic Press: Cambridge, MA, USA, 2016; pp. 785–794. [Google Scholar]
  10. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1. [Google Scholar] [CrossRef]
  11. Asri, H.; Mousannif, H.; Al Moatassime, H.; Noel, T. Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Comput. Sci. 2016, 83, 1064–1069. [Google Scholar] [CrossRef]
  12. Fatima, N.; Liu, L.; Hong, S.; Ahmed, H. Prediction of Breast Cancer, Comparative Review of Machine Learning Techniques, and Their Analysis. IEEE Access 2020, 8, 150360–150376. [Google Scholar] [CrossRef]
  13. Jacob, D.S.; Viswan, R.; Manju, V.; PadmaSuresh, L.; Raj, S. A survey on breast cancer prediction using data mining techniques. In Proceedings of the 2018 Conference on Emerging Devices and Smart Systems (ICEDSS), Tiruchengode, India, 2–3 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 256–258. [Google Scholar]
  14. Zand, H.K.K. A comparative survey on data mining techniques for breast cancer diagnosis and prediction. Indian J. Fundam. Appl. Life Sci. 2015, 5, 4330–4339. [Google Scholar]
  15. Hou, C.; Zhong, X.; He, P.; Xu, B.; Diao, S.; Yi, F.; Zheng, H.; Li, J. Predicting breast cancer in Chinese women using machine learning techniques: Algorithm development. JMIR Med. Inform. 2020, 8, e17364. [Google Scholar] [CrossRef] [PubMed]
  16. Naji, M.A.; El Filali, S.; Aarika, K.; Benlahmar, E.H.; Abdelouhahid, R.A.; Debauche, O. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Comput. Sci. 2021, 191, 487–492. [Google Scholar] [CrossRef]
  17. Prastyo, P.H.; Paramartha, I.G.Y.; Pakpahan, M.S.M.; Ardiyanto, I. Predicting Breast Cancer: A Comparative Analysis of Machine Learning Algorithms. In Proceedings of the International Conference on Science and Engineering, Antalya, Turkey, 21–25 October 2020; IEEE: Piscataway, NJ, USA, 2020; Volume 3, pp. 455–459. [Google Scholar]
  18. Ahmad, L.G.; Eshlaghy, A.T.; Poorebrahimi, A.; Ebrahimi, M.; Razavi, A.R. Using three machine learning techniques for predicting breast cancer recurrence. J. Health Med. Inf. 2013, 4, 3. [Google Scholar]
  19. Tseng, Y.-J.; Huang, C.-E.; Wen, C.-N.; Lai, P.-Y.; Wu, M.-H.; Sun, Y.-C.; Wang, H.-Y.; Lu, J.-J. Predicting breast cancer metastasis by using serum biomarkers and clinicopathological data with machine learning technologies. Int. J. Med. Inform. 2019, 128, 79–86. [Google Scholar] [CrossRef]
  20. Gupta, S.; Kumar, D.; Sharma, A. Data mining classification techniques applied for breast cancer diagnosis and prognosis. Indian J. Comput. Sci. Eng. (IJCSE) 2011, 2, 188–195. [Google Scholar]
  21. Li, J.; Zhou, Z.; Dong, J.; Fu, Y.; Li, Y.; Luan, Z.; Peng, X. Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS ONE 2021, 16, e0250370. [Google Scholar] [CrossRef]
  22. Nassif, A.B.; Talib, M.A.; Nasir, Q.; Afadar, Y.; Elgendy, O. Breast cancer detection using artificial intelligence techniques: A systematic literature review. Artif. Intell. Med. 2022, 127, 102276. [Google Scholar] [CrossRef]
  23. Abreu, P.H.; Santos, M.S.; Abreu, M.H.; Andrade, B.; Silva, D.C. Predicting breast cancer recurrence using machine learning techniques: A systematic review. ACM Comput. Surv. (CSUR) 2016, 49, 1–40. [Google Scholar] [CrossRef]
  24. Houfani, D.; Slatnia, S.; Kazar, O.; Zerhouni, N.; Merizig, A.; Saouli, H. Machine learning techniques for breast cancer diagnosis: Literature review. In Proceedings of the International Conference on Advanced Intelligent Systems for Sustainable Development, Tangier, Morocco, 21–26 December 2020; Springer: Cham, Switzerland, 2020; pp. 247–254. [Google Scholar]
  25. Shokri, A.; Walker, J.P.; van Dijk, A.I.; Wright, A.J.; Pauwels, V.R. Application of the patient rule induction method to detect hydrologic model behavioral parameters and quantify uncertainty. Hydrol. Process. 2018, 32, 1005–1025. [Google Scholar] [CrossRef]
  26. Kwakkel, J.H. A generalized many-objective optimization approach for scenario discovery. Futures Foresight Sci. 2019, 1, e8. [Google Scholar] [CrossRef]
  27. Su, H.C.; Sakata, T.; Herman, C.; Dolins, S. Analysis of Massive Data Accumulations Using Patient Rule Induction Method and Online Analytical Processing. U.S. Patent 6,643,646, 4 November 2003. [Google Scholar]
  28. Dyson, G. An Application of the Patient Rule-Induction Method to Detect Clinically Meaningful Subgroups from Failed Phase III Clinical Trials. Int. J. Clin. Biostat. Biom. 2021, 7, 38. [Google Scholar] [CrossRef]
  29. Nassih, R.; Berrado, A. Towards a patient rule induction method-based classifier. In Proceedings of the 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 3–4 October 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar]
  30. Nassih, R.; Berrado, A. Potential for PRIM based classification: A literature review. In Proceedings of the International Conference on Industrial Engineering and Operations Management, Pilsen, Czech Republic, 23–26 July 2019. [Google Scholar]
  31. Nassih, R.; Berrado, A. State of the art of Fairness, Interpretability and Explainability in Machine Learning: Case of PRIM. In Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, Rabat, Morocco, 23–24 September 2020. [Google Scholar]
  32. Sáez, J.A.; Galar, M.; Krawczyk, B. Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy. IEEE Access 2019, 7, 83396–83411. [Google Scholar] [CrossRef]
  33. Das, B.; Krishnan, N.C.; Cook, D.J. Handling class overlap and imbalance to detect prompt situations in smart homes. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Washington, DC, USA, 7–10 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 266–273. [Google Scholar]
  34. Lindgren, T. On handling conflicts between rules with numerical features. In Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France, 23–27 April 2006; pp. 37–41. [Google Scholar]
  35. Lindgren, T. Methods for rule conflict resolution. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2004; pp. 262–273. [Google Scholar]
  36. Reed, R. Pruning algorithms-a survey. IEEE Trans. Neural Netw. 1993, 4, 740–747. [Google Scholar] [CrossRef]
  37. Fürnkranz, J. Pruning Algorithms for Rule Learning. Mach. Learn. 1997, 27, 139–172. [Google Scholar] [CrossRef]
  38. Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Taylor & Francis: London, UK, 1984. [Google Scholar]
  39. Demsar, J.; Curk, T.; Erjavec, A.; Gorup, C.; Hocevar, T.; Milutinovic, M.; Mozina, M.; Polajnar, M.; Toplak, M.; Staric, A.; et al. Orange: Data Mining Toolbox in Python. J. Mach. Learn. Res. 2013, 14, 2349–2353. [Google Scholar]
  40. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
Figure 1. The top-down peeling method used in the first phase of the PRIM to locate a single box. The algorithm peels the space’s dimensions one at a time, checking the provided thresholds and whether the target variable is satisfied, until it reaches the bump, i.e., the region whose size exceeds the threshold. On the left, the successive iterations can be seen, leading to the interesting box on the right. The process is then repeated for the next box, starting with another dimension.
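To make the peeling phase of Figure 1 concrete, the following minimal sketch finds one box over numeric features, following our reading of Friedman and Fisher’s procedure [3] with the usual parameters (peeling fraction alpha, minimum box support beta_min); it is illustrative, not the authors’ implementation.

```python
# Simplified top-down peeling for a single box, assuming numeric features.
import numpy as np

def peel_one_box(X: np.ndarray, y: np.ndarray, alpha: float = 0.05,
                 beta_min: float = 0.05):
    """X: (n, p) numeric array, y: 0/1 target. Returns the box and its mask."""
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    box = {j: (X[:, j].min(), X[:, j].max()) for j in range(p)}
    improved = True
    while improved:
        improved = False
        best = None  # (target mean after peel, feature, new bounds, new mask)
        for j in range(p):
            lo, hi = box[j]
            candidates = [
                (np.quantile(X[inside, j], alpha), hi),      # peel from below
                (lo, np.quantile(X[inside, j], 1 - alpha)),  # peel from above
            ]
            for new_lo, new_hi in candidates:
                mask = inside & (X[:, j] >= new_lo) & (X[:, j] <= new_hi)
                if mask.sum() < beta_min * n:
                    continue  # peel would drop the box below the support floor
                mean = y[mask].mean()
                if best is None or mean > best[0]:
                    best = (mean, j, (new_lo, new_hi), mask)
        if best is not None and best[0] > y[inside].mean():
            _, j, bounds, inside = best  # accept the peel that raises the mean
            box[j] = bounds
            improved = True
    return box, inside
```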
Figure 2. An illustration of a box. (a) A box defined by numeric features. (b) A box defined by categorical features.
Figure 3. The 5 steps in the R-PRIM framework for breast cancer classification.
Figure 4. An illustration of the overlap between class labels and its different cases. (a) The region of conflict Rc is very small compared to the other rules. (b) The region of conflict Rc has an average size relative to the support. (c) The region of conflict Rc is large, and the classification should be reconsidered.
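A minimal sketch of the logic in Figure 4, under assumed thresholds: the region of conflict Rc is the intersection of two boxes from different classes, and its support decides which of the three cases applies. The 5% and 30% cut-offs below are placeholders for illustration, not values taken from the paper.

```python
# Hedged sketch: building the region of conflict Rc and mapping its
# support to the three cases of Figure 4.
import math

def intersect_boxes(box_a: dict, box_b: dict):
    """Boxes as {feature: (lo, hi)}; unmentioned features are unbounded."""
    rc = {}
    for f in set(box_a) | set(box_b):
        lo_a, hi_a = box_a.get(f, (-math.inf, math.inf))
        lo_b, hi_b = box_b.get(f, (-math.inf, math.inf))
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:
            return None  # empty intersection: the rules do not conflict
        rc[f] = (lo, hi)
    return rc

def conflict_case(rc_support: float, small: float = 0.05, large: float = 0.30):
    """Placeholder thresholds; the paper does not publish the cut-offs."""
    if rc_support < small:
        return "(a) Rc is negligible relative to the rules"
    if rc_support < large:
        return "(b) Rc has an average size"
    return "(c) Rc is large: reconsider the classification"
```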
Figure 5. An illustration of the organization of the rule space using metarules.
Figure 6. Example of the construction of instance matrices. For every instance in the table, we put 1 if the instance is covered by the rule and 0 if it is not, so that, for every class, the matrix of instances is the input of the association rule mining. We can thus find all the associations between the rules and select the most important ones to reorganize our ruleset according to confidence and support. (a,b) The procedure is the same for both classes.
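The construction in Figure 6 amounts to building one binary column per rule. A sketch is given below; the commented apriori call (using the third-party mlxtend package) is only one possible way to mine the associations that the metarules step of [7] requires.

```python
# Sketch of the instance-matrix construction of Figure 6: one 0/1 column
# per rule, with a 1 where the instance satisfies the rule's box.
import pandas as pd

def instance_matrix(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """rules: {'R1': {feature: (lo, hi)}, ...} for one class label."""
    cols = {
        name: df[list(box)].apply(
            lambda row: all(lo <= row[f] <= hi for f, (lo, hi) in box.items()),
            axis=1,
        ).astype(int)
        for name, box in rules.items()
    }
    return pd.DataFrame(cols, index=df.index)

# One possible miner for the resulting matrix (assumes mlxtend is installed):
# from mlxtend.frequent_patterns import apriori, association_rules
# itemsets = apriori(instance_matrix(df, rules).astype(bool),
#                    min_support=0.05, use_colnames=True)
# metarules = association_rules(itemsets, metric="confidence",
#                               min_threshold=0.8)
```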
Figure 7. Visualization of the four measures obtained in the experiment: (a) the accuracy of each model; (b) the recall of each model; (c) the precision of each model; and (d) the F1-score of each model.
Figure 8. Visualization of the ROC and AUC for each dataset.
Table 1. Rules constructed by the modified version of the PRIM for “Class = 1” and “Class = 0”, with their corresponding coverage, density, dimension, and support.

Rules for Class = 1 | Coverage | Density | Dimension | Support
R1: 128.0 < Glucose < 199.0 AND 17.0 < SkinThickness < 99.0 AND 0.0 < Insulin < 520.0 AND 0.257 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 57.0 | 0.29 | 0.77 | 5 | 0.13
R2: 111.0 < Glucose < 199.0 AND 56.0 < BloodPressure < 122.0 AND 0.0 < SkinThickness < 43.0 AND 0.078 < DiabetesPedigreeFunction < 1.37 AND 32.0 < Age < 54.0 | 0.24 | 0.63 | 5 | 0.13
R3: 100.0 < Glucose < 199.0 AND 0.253 < DiabetesPedigreeFunction < 1.16 AND 29.0 < Age < 62.0 | 0.16 | 0.52 | 3 | 0.10
R4: 90.0 < Glucose < 199.0 AND 0.1495 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 81.0 | 0.24 | 0.22 | 3 | 0.38
R5: 0.0 < BloodPressure < 82.0 AND 12.0 < SkinThickness < 99.0 AND 0.0 < Insulin < 99.0 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 24.0 < Age < 81.0 | 0.02 | 0.15 | 5 | 0.05
R6: 89.0 < Glucose < 199.0 AND 0.1265 < DiabetesPedigreeFunction < 2.42 | 0.03 | 0.13 | 2 | 0.08
R7: 128.0 < Glucose < 199.0 AND 17.0 < SkinThickness < 99.0 AND 25.0 < Age < 56.0 | 0.34 | 0.75 | 3 | 0.16
R8: 101.0 < Glucose < 199.0 AND 60.0 < BloodPressure < 85.0 AND 0.0 < SkinThickness < 26.0 AND 33.0 < Age < 52.0 | 0.14 | 0.63 | 4 | 0.08
R9: 109.0 < Glucose < 199.0 AND 12.0 < SkinThickness < 99.0 AND 31.0 < Age < 59.0 | 0.08 | 0.53 | 3 | 0.05
R10: 124.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 0.0 AND 25.0 < Age < 53.0 | 0.11 | 0.71 | 3 | 0.05
R11: 95.0 < Glucose < 199.0 AND 22.0 < Age < 62.0 | 0.23 | 0.22 | 2 | 0.37
R12: 0.0 < BloodPressure < 85.0 AND 7.0 < SkinThickness < 99.0 AND 26.0 < Age < 56.0 | 0.03 | 0.25 | 3 | 0.05
R13: 93.0 < Glucose < 199.0 AND 60.0 < BloodPressure < 92.0 | 0.03 | 0.16 | 2 | 0.08
R14: 130.0 < Glucose < 199.0 AND 30.05 < BMI < 67.1 | 0.51 | 0.73 | 2 | 0.24
R15: 109.0 < Glucose < 199.0 AND 27.85 < BMI < 67.1 | 0.26 | 0.39 | 2 | 0.22
R16: 95.0 < Glucose < 199.0 AND 22.79 < BMI < 67.1 | 0.18 | 0.22 | 2 | 0.28
R17: 7.0 < Pregnancies < 9.0 AND 145.0 < Glucose < 199.0 AND 0.0 < Insulin < 495.0 | 0.14 | 0.88 | 3 | 0.05
R18: 128.0 < Glucose < 199.0 AND 18.0 < SkinThickness < 99.0 AND 74.0 < Insulin < 478.0 | 0.22 | 0.64 | 3 | 0.12
R19: 109.0 < Glucose < 199.0 | 0.48 | 0.39 | 1 | 0.42
R20: 7.0 < Pregnancies < 17.0 AND 84.0 < Glucose < 199.0 | 0.06 | 0.4 | 2 | 0.05
R21: 0.0 < Pregnancies < 3.0 AND 77.0 < Glucose < 199.0 AND 13.0 < SkinThickness < 45.0 AND 36.0 < Insulin < 846.0 | 0.04 | 0.12 | 4 | 0.12
R22: 4.0 < Pregnancies < 6.0 AND 0.0 < Glucose < 105.0 AND 0.0 < SkinThickness < 42.0 AND 0.0 < Insulin < 156.0 | 0.02 | 0.14 | 4 | 0.06
R23: 0.0 < Pregnancies < 2.0 AND 90.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 42.0 AND 0.0 < Insulin < 15.0 | 0.01 | 0.09 | 4 | 0.05
R24: 8.0 < Pregnancies < 17.0 AND 24.0 < SkinThickness < 43.0 AND 31.0 < BMI < 45.90 | 0.11 | 0.75 | 3 | 0.05
R25: 30.05 < BMI < 67.1 | 0.69 | 0.43 | 1 | 0.55
R26: 7.0 < Pregnancies < 17.0 AND 0.0 < BloodPressure < 94.0 AND 23.15 < BMI < 67.1 | 0.08 | 0.45 | 3 | 0.05
R27: 4.0 < Pregnancies < 17.0 AND 0.0 < BloodPressure < 80.0 AND 0.0 < SkinThickness < 24.0 | 0.06 | 0.27 | 3 | 0.07
R28: 3.0 < Pregnancies < 17.0 AND 0.0 < SkinThickness < 33.0 | 0.03 | 0.14 | 2 | 0.07
R29: 0.0 < BloodPressure < 86.0 AND 23.15 < BMI < 29.5 | 0.03 | 0.09 | 2 | 0.12
R30: 28.1 < BMI < 67.1 AND 0.20 < DiabetesPedigreeFunction < 2.42 AND 31.0 < Age < 60.0 | 0.5 | 0.62 | 3 | 0.27
R31: 29.0 < Insulin < 846.0 AND 26.1 < BMI < 67.1 AND 0.1275 < DiabetesPedigreeFunction < 2.42 AND 28.0 < Age < 53.0 | 0.1 | 0.49 | 4 | 0.07
R32: 26.9 < BMI < 67.1 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 62.0 | 0.21 | 0.36 | 3 | 0.20
R33: 0.0 < Insulin < 194.0 AND 22.79 < BMI < 67.1 AND 0.1195 < DiabetesPedigreeFunction < 0.817 AND 23.0 < Age < 81.0 | 0.11 | 0.22 | 4 | 0.17
R34: 22.0 < Age < 54.0 | 0.059 | 0.11 | 1 | 0.18
R35: 24.75 < BMI < 67.1 | 0.018 | 0.11 | 1 | 0.05
R36: 30.85 < BMI < 67.1 | 0.74 | 0.46 | 1 | 0.56
R37: 23.25 < BMI < 67.1 | 0.24 | 0.24 | 1 | 0.34
R38: 0.0 < BMI < 23.05 | 0.01 | 0.04 | 1 | 0.08
R39: 7.0 < Pregnancies < 12.0 AND 110.0 < Insulin < 846.0 AND 0.188 < DiabetesPedigreeFunction < 2.42 | 0.12 | 0.82 | 3 | 0.05
R40: 0.3235 < DiabetesPedigreeFunction < 2.42 | 0.56 | 0.37 | 1 | 0.52
R41: 7.0 < Pregnancies < 12.0 AND 64.0 < BloodPressure < 122.0 AND 0.1215 < DiabetesPedigreeFunction < 0.2825 | 0.067 | 0.43 | 3 | 0.05
R42: 0.11 < DiabetesPedigreeFunction < 0.2825 | 0.20 | 0.25 | 1 | 0.27
R43: 0.0 < Insulin < 140.0 AND 0.086 < DiabetesPedigreeFunction < 2.42 | 0.03 | 0.16 | 2 | 0.07
R44: 7.0 < Pregnancies < 9.0 AND 145.0 < Glucose < 199.0 AND 0.0 < Insulin < 495.0 | 0.14 | 0.88 | 3 | 0.06
R45: 134.0 < Glucose < 199.0 AND 0.0 < Insulin < 478.0 | 0.40 | 0.59 | 2 | 0.23
R46: 109.0 < Glucose < 199.0 | 0.31 | 0.34 | 1 | 0.31
R47: 7.0 < Pregnancies < 17.0 AND 84.0 < Glucose < 199.0 | 0.05 | 0.4 | 2 | 0.05
R48: 0.0 < Pregnancies < 3.0 AND 78.0 < Glucose < 199.0 AND 36.0 < Insulin < 846.0 | 0.04 | 0.11 | 3 | 0.13
R49: 4.0 < Pregnancies < 6.0 AND 0.0 < Glucose < 104.0 AND 0.0 < Insulin < 156.0 | 0.02 | 0.14 | 3 | 0.06
R50: 0.0 < Pregnancies < 2.0 AND 90.0 < Glucose < 199.0 AND 0.0 < Insulin < 15.0 | 0.01 | 0.09 | 3 | 0.05
R51: 28.1 < BMI < 67.1 AND 0.20 < DiabetesPedigreeFunction < 2.42 AND 31.0 < Age < 60.0 | 0.5 | 0.62 | 3 | 0.27
R52: 26.70 < BMI < 35.45 AND 0.1275 < DiabetesPedigreeFunction < 2.42 AND 30.0 < Age < 53.0 | 0.09 | 0.53 | 3 | 0.06
R53: 29.95 < BMI < 67.1 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 81.0 | 0.21 | 0.38 | 3 | 0.18
R54: 23.35 < BMI < 67.1 AND 0.1275 < DiabetesPedigreeFunction < 0.6535 AND 28.0 < Age < 61.0 | 0.05 | 0.32 | 3 | 0.05
R55: 0.1195 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 60.0 | 0.12 | 0.14 | 2 | 0.29
R56: 27.85 < BMI < 67.1 AND 21.0 < Age < 62.0 | 0.02 | 0.17 | 2 | 0.05

Rules for Class = 0 | Coverage | Density | Dimension | Support
R1: 94.0 < Glucose < 157.0 AND 0.0 < BloodPressure < 88.0 AND 60.0 < Insulin < 228.0 AND 0.078 < DiabetesPedigreeFunction < 0.899 AND 21.0 < Age < 49.0 | 0.25 | 0.76 | 5 | 0.22
R2: 89.0 < Glucose < 183.0 AND 0.0 < BloodPressure < 90.0 AND 0.0 < SkinThickness < 41.0 AND 0.0 < Insulin < 190.0 AND 0.078 < DiabetesPedigreeFunction < 1.1855 AND 21.0 < Age < 59.0 | 0.42 | 0.66 | 6 | 0.42
R3: 80.0 < Glucose < 189.0 AND 52.0 < BloodPressure < 82.0 AND 12.0 < SkinThickness < 39.0 AND 49.0 < Insulin < 394.0 AND 0.259 < DiabetesPedigreeFunction < 2.42 | 0.07 | 0.84 | 5 | 0.06
R4: 70.0 < BloodPressure < 106.0 AND 16.0 < SkinThickness < 50.0 AND 0.0 < Insulin < 145.0 AND 0.1535 < DiabetesPedigreeFunction < 0.712 | 0.06 | 0.71 | 4 | 0.06
R5: 52.0 < BloodPressure < 122.0 AND 0.0 < Insulin < 485.0 AND 0.239 < DiabetesPedigreeFunction < 2.42 | 0.14 | 0.54 | 3 | 0.16
R6: 0.0 < Glucose < 189.0 AND 0.11 < DiabetesPedigreeFunction < 1.143 | 0.05 | 0.43 | 2 | 0.07
R7: 93.0 < Glucose < 137.0 AND 54.0 < BloodPressure < 88.0 AND 7.0 < SkinThickness < 40.0 AND 21.0 < Age < 52.0 | 0.32 | 0.71 | 4 | 0.30
R8: 90.0 < Glucose < 157.0 AND 23.25 < BMI < 41.65 | 0.61 | 0.68 | 2 | 0.58
R9: 19.20 < BMI < 47.34 | 0.36 | 0.62 | 1 | 0.37
R10: 0.0 < Pregnancies < 0.0 AND 13.0 < SkinThickness < 45.0 AND 63.0 < Insulin < 291.0 | 0.07 | 0.86 | 3 | 0.05
R11: 2.0 < Pregnancies < 7.0 AND 92.0 < Glucose < 133.0 AND 0.0 < SkinThickness < 39.0 AND 73.0 < Insulin < 267.0 | 0.10 | 0.79 | 4 | 0.09
R12: 1.0 < Pregnancies < 8.0 AND 105.0 < Glucose < 169.0 AND 0.0 < SkinThickness < 47.0 AND 74.0 < Insulin < 846.0 | 0.14 | 0.70 | 4 | 0.13
R13: 1.0 < Pregnancies < 17.0 AND 80.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 41.0 AND 0.0 < Insulin < 220.0 | 0.51 | 0.63 | 4 | 0.53
R14: 56.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 51.0 AND 0.0 < Insulin < 474.0 | 0.15 | 0.57 | 3 | 0.17
R15: 0.0 < BloodPressure < 88.0 AND 21.45 < BMI < 43.55 | 0.84 | 0.66 | 2 | 0.83
R16: 0.0 < Pregnancies < 10.0 AND 17.0 < SkinThickness < 46.0 AND 20.6 < BMI < 46.15 | 0.06 | 0.78 | 3 | 0.05
R17: 0.0 < BloodPressure < 94.0 AND 0.0 < SkinThickness < 47.0 AND 0.0 < BMI < 51.15 | 0.08 | 0.57 | 3 | 0.09
R18: 40.0 < Insulin < 215.0 AND 25.1 < BMI < 41.65 AND 0.078 < DiabetesPedigreeFunction < 1.18 AND 21.0 < Age < 46.0 | 0.31 | 0.75 | 4 | 0.27
R19: 20.6 < BMI < 43.34 AND 0.078 < DiabetesPedigreeFunction < 0.9155 | 0.55 | 0.63 | 2 | 0.56
R20: 15.0 < Insulin < 846.0 AND 0.0 < BMI < 46.6 AND 0.247 < DiabetesPedigreeFunction < 2.2125 | 0.06 | 0.67 | 3 | 0.06
R21: 0.0 < Insulin < 14.0 | 0.07 | 0.56 | 1 | 0.08
R22: 0.0 < BloodPressure < 88.0 AND 21.45 < BMI < 43.55 | 0.84 | 0.66 | 2 | 0.83
R23: 0.0 < BloodPressure < 106.0 AND 19.20 < BMI < 46.15 | 0.11 | 0.66 | 2 | 0.11
R24: 0.0 < BloodPressure < 108.0 | 0.04 | 0.54 | 1 | 0.05
R25: 0.0 < Pregnancies < 1.0 AND 62.0 < BloodPressure < 84.0 AND 60.0 < Insulin < 265.0 | 0.12 | 0.85 | 3 | 0.09
R26: 2.0 < Pregnancies < 7.0 AND 70.0 < BloodPressure < 88.0 AND 56.0 < Insulin < 160.0 AND 0.078 < DiabetesPedigreeFunction < 0.69 | 0.07 | 0.81 | 4 | 0.05
R27: 0.0 < BloodPressure < 90.0 AND 0.0 < Insulin < 220.0 AND 0.1405 < DiabetesPedigreeFunction < 1.1855 | 0.64 | 0.64 | 3 | 0.65
R28: 0.0 < Pregnancies < 3.0 AND 52.0 < BloodPressure < 106.0 AND 14.0 < Insulin < 540.0 AND 0.094 < DiabetesPedigreeFunction < 2.42 | 0.06 | 0.78 | 4 | 0.05
R29: 1.0 < Pregnancies < 17.0 AND 64.0 < BloodPressure < 108.0 AND 0.098 < DiabetesPedigreeFunction < 2.42 | 0.08 | 0.57 | 3 | 0.10
R30: 0.0 < Pregnancies < 2.0 AND 65.0 < Insulin < 291.0 | 0.24 | 0.78 | 2 | 0.20
R31: 0.0 < Pregnancies < 10.0 AND 89.0 < Glucose < 169.0 AND 56.0 < Insulin < 846.0 | 0.18 | 0.66 | 3 | 0.18
R32: 1.0 < Pregnancies < 17.0 AND 75.0 < Glucose < 199.0 AND 0.0 < Insulin < 0.0 | 0.38 | 0.63 | 3 | 0.40
R33: 0.0 < Pregnancies < 6.0 AND 56.0 < Glucose < 187.0 | 0.16 | 0.61 | 2 | 0.17
R34: 0.0 < Glucose < 195.0 | 0.04 | 0.48 | 1 | 0.06
R35: 23.25 < BMI < 42.5 AND 0.078 < DiabetesPedigreeFunction < 1.09 AND 21.0 < Age < 58.0 | 0.77 | 0.67 | 3 | 0.74
R36: 19.45 < BMI < 49.65 AND 0.2355 < DiabetesPedigreeFunction < 1.31 | 0.16 | 0.67 | 2 | 0.16
R37: 0.10 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 81.0 | 0.07 | 0.49 | 2 | 0.09
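Each row of Table 1 can be recomputed from the data: support is the fraction of all instances falling in the box, density is the fraction of those instances carrying the class label, coverage (on our reading, consistent with the values of R19 and R25 for Class = 1) is the fraction of the label’s instances captured by the box, and dimension is the number of constrained features. A hedged sketch, with hypothetical column names:

```python
# Not the authors' code: recomputing the Table 1 statistics for one rule.
import pandas as pd

def rule_stats(df: pd.DataFrame, box: dict, target: str, label) -> dict:
    """box: {feature: (lo, hi)}. Returns the Table 1 statistics for the rule."""
    inside = pd.Series(True, index=df.index)
    for feature, (lo, hi) in box.items():
        inside &= df[feature].between(lo, hi)
    is_label = df[target] == label
    return {
        "coverage": (inside & is_label).sum() / is_label.sum(),
        "density": is_label[inside].mean(),
        "dimension": len(box),
        "support": inside.mean(),
    }

# e.g., R19 for Class = 1 (assuming columns named as in Table 1):
# rule_stats(df, {"Glucose": (109.0, 199.0)}, target="Class", label=1)
# should give roughly coverage 0.48, density 0.39, dimension 1, support 0.42
```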
Table 2. Major properties of the datasets considered in the evaluation.

Datasets | No. of Instances | No. of Attributes | Class Labels | Class Distribution
Wisconsin | 569 | 32 | Malignant: 1; Benign: 0 | 359; 210
SEER | 4024 | 12 | Alive: 0; Dead: 1 | 3408; 616
ISPY1-clinica | 168 | 18 | No (not dead): 0; Yes (dead): 1 | 32; 136
Mammographic masses | 961 | 6 | 1: malignant; 0: benign | 445; 516
NKI dataset | 272 | 1570 | 1: dead; 0: alive | 195; 77
Table 3. Results of the accuracy and F1-score of every algorithm on the five datasets (in %).

Accuracy | RF | XGB | LG | R-PRIM-Cl
Wisconsin | 97.6 | 95.4 | 94.1 | 96.8
SEER | 94.4 | 97.5 | 88.7 | 98.4
ISPY1-clinica | 98.7 | 96.3 | 89.8 | 95.3
Mammographic masses | 95.8 | 98.6 | 92.4 | 97.2
NKI dataset | 98 | 97.5 | 85.6 | 95.6

F1-score | RF | XGB | LG | R-PRIM-Cl
Wisconsin | 96 | 97.3 | 89.9 | 94.9
SEER | 96.7 | 97.3 | 87.4 | 96.3
ISPY1-clinica | 98 | 95.3 | 88.2 | 94.7
Mammographic masses | 97.6 | 97.4 | 92.8 | 96.1
NKI dataset | 97.8 | 95.6 | 93.7 | 96.9
Table 4. Results of the precision and recall of every algorithm on the five datasets (in %).

Precision | RF | XGB | LG | R-PRIM-Cl
Wisconsin | 98.2 | 98 | 94.8 | 95.6
SEER | 95.6 | 97.6 | 88.1 | 97.1
ISPY1-clinica | 98.9 | 95.2 | 86.5 | 94.6
Mammographic masses | 97.7 | 96.3 | 92.4 | 96.3
NKI dataset | 97.9 | 94.5 | 93.3 | 96.7

Recall | RF | XGB | LG | R-PRIM-Cl
Wisconsin | 93.9 | 96.7 | 85.4 | 94.2
SEER | 98 | 97.2 | 86.7 | 95.6
ISPY1-clinica | 97.2 | 95.4 | 89.9 | 94.8
Mammographic masses | 97.5 | 98.7 | 93.2 | 95.8
NKI dataset | 97.8 | 96.7 | 94.1 | 97.1
Table 5. Number of rules generated by the framework per class, before and after the metarules step.

Datasets | Before (Class 0) | Before (Class 1) | After (Class 0) | After (Class 1)
Wisconsin | 36 | 45 | 19 | 34
SEER | 28 | 72 | 15 | 44
ISPY1-clinica | 5 | 12 | 5 | 11
Mammographic masses | 14 | 12 | 9 | 9
NKI dataset | 38 | 54 | 12 | 33