Exploring CBC Data for Anemia Diagnosis: A Machine Learning and Ontology Perspective

Awaad, Amira S.; Elbarawy, Yomna M.; Mancy, H.; Ghannam, Naglaa E.

doi:10.3390/biomedinformatics5030035

¹ Department of Mathematics, College of Science and Humanities, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia

² Department of Mathematics, Faculty of Science (Girls), Al-Azhar University, Cairo 11754, Egypt

³ Department of Computer Science, College of Engineering and Computer Sciences, Prince Sattam Bin Abdulaziz University, Al-kharj 11942, Saudi Arabia

⁴ Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam bin Abdulaziz University, Wadi Alddawasir 16278, Saudi Arabia

* Author to whom correspondence should be addressed.

BioMedInformatics 2025, 5(3), 35; https://doi.org/10.3390/biomedinformatics5030035

Submission received: 17 May 2025 / Revised: 26 June 2025 / Accepted: 30 June 2025 / Published: 2 July 2025

Abstract

Background: Anemia, a common health disorder affecting populations globally, demands timely and accurate diagnosis for treatment to be effective. The aim of this paper is to detect and classify four types of anemia: hgb, iron-deficiency, folate-deficiency, and B12-deficiency anemia. Methods: This paper proposes an ontology-enhanced machine learning (ML) framework to classify types of anemia from CBC data obtained from Kaggle, which contains 15,300 patient records. It evaluates the effects of classical versus deep classifiers on imbalanced and oversampled training samples. Tests include KNN, SVM, DT, RF, CNN, CNN+SVM, CNN+RF, and XGBoost. Another interesting contribution is the use of ontological reasoning via SPARQL queries to semantically enrich clinical features with categories like “Low Hemoglobin” or “Macrocytic MCV”. These semantic features were then used in both classical (SVM) and deep hybrid models (CNN+SVM). Results: Ontology-enhanced and CNN hybrid models perform competitively when paired with ROS or ADASYN, but their performance degrades significantly on the original dataset. There were tremendous performance gains with ontology-enhanced models in that Onto-CNN+SVM achieved an F1-score (1.00) for all the four types of anemia under ROS sampling, while Onto-SVM exhibited more than 20% improvement in F1-scores for minority categories like folate and B12 when compared to baseline models, except XGBoost. Conclusions: Ontology-driven knowledge coalescence has been shown to improve classification results; however, XGBoost consistently outperformed all other classifiers across all data conditions, making it the most robust and reliable model for clinically relevant decision-support systems in anemia diagnosis.

Keywords: anemia classification; complete blood count; diagnostics; iron deficiency; machine learning; balancing data; ontology; SPARQL query; XGBoost

1. Introduction

Anemia is a worldwide health issue caused by a lack of healthy red blood cells or hemoglobin necessary for oxygen transport throughout the body [1]. Its causes include substantial hemorrhage, chronic disease, genetic disorders, and nutrient deficiency [2]. Types of anemia that play a more significant role in diagnosis include iron-deficiency anemia [3,4], vitamin B12-deficiency anemia [5,6], and folate-deficiency anemia [7]. Collectively, these types of anemia affect millions across the globe, particularly at-risk groups like pregnant women and children under five years of age in developing nations [8,9].

Recognizing the pathology of anemia is vital for an accurate diagnosis or treatment plan. Physicians need to be proficient in the clinical aspects of anemia and decide on the relevant laboratory investigations to attempt the correction of nutritional deficits, where applicable [10]. However, understanding anemia can be complex due to the wide range of symptoms one might experience and the lack of necessary resources to spend on specialized equipment [11].

The recent advances in machine learning present promising possibilities to enhance anemia-diagnosis methods [12,13]. This paper tries to analyze complete blood count (CBC) data for the diagnosis of anemia by implementing various machine learning algorithms [14], including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Convolutional Neural Network (CNN). The present study emphasizes the need for balancing data using methods like Random Oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic sampling (ADASYN) to solve the imbalanced data problem and improve the performance of these classification methods [15].

However, the clinical adoption of ML models has been hampered frequently by issues like data imbalance and limited explainability. Because many anemia datasets are skewed and underrepresent specific subtypes of anemia, such as B12- or folate-deficiency anemia, resampling methods such as SMOTE, ROS, and ADASYN were invoked to achieve a rebalance of datasets and improve classifier generalization [16].

Ontology-driven methods are increasingly used to improve the performance and interpretability of AI models in healthcare. Ontologies are structured representations of domain knowledge that formalize relationships between medical concepts, laboratory parameters, and clinical conditions. In anemia diagnosis, ontologies such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), Human Phenotype Ontology (HPO), and Logical Observation Identifiers Names and Codes (LOINC) provide standardized vocabularies and semantic rules for enriching CBC features with clinically relevant annotations [17]. For example, hemoglobin levels could be categorized as low, normal, or high by applying ontology-driven thresholds, abstracting the features into a form better suited to human-interpretable AI models [18].

Bringing ontological knowledge into machine learning pipelines offers several advantages: it improves explainability, helps handle edge cases, semantically validates predictions, and makes it possible to infer missing data through reasoning [19]. Earlier work has shown that ontology-based feature engineering and rule-based enrichment can increase classification accuracy and transparency in a variety of clinical applications, from sepsis detection to screening for diabetes and phenotyping for rare diseases [20,21]. This suggests a possible pathway for anemia detection, where augmentation addresses the shortcomings of data-only models by incorporating domain knowledge into training [22].

The proposed framework combines ontology-enhanced feature engineering with conventional and deep-learning classification models to improve the diagnosis and classification of anemia. Specifically, the input space of the models (SVM and CNN+SVM) is extended with semantic features obtained through categorical transformations based on medical ontologies. These models were trained and evaluated on both imbalanced and oversampled datasets, and performance was measured using multiple metrics. The ontology-based enhancements not only increased diagnostic accuracy, especially in the minority classes, but also enabled the creation of more interpretable and clinically trustworthy AI systems.

The results indicate that machine learning integrated with ontology methods can help physicians make informed diagnostic decisions, which would benefit patients through better treatment. By drawing on historical data and advanced algorithms, the study emphasizes the potential for AI to transform the diagnosis of anemia, especially where traditional diagnostic methods are not feasible or are too expensive.

The rest of this paper is organized as follows: Section 2 presents related work about anemia diagnosis. Section 3 demonstrates the methodology and explains the integration of ML with ontology. Section 4 presents the dataset description, analysis, and SPARQL queries used in the integration process. Experimental findings are presented in Section 5, including data preprocessing, system specifications, performance analysis, and discussion. The conclusions are presented in Section 6.

2. Related Works

A 2020 study introduced a method for detecting microcytic hypochromia by integrating features from blood smear images and clinical data from CBC tests. This approach utilized a deep CNN to extract image features, which were then combined with clinical features, resulting in a comprehensive dataset. The study achieved high accuracy, sensitivity, and specificity using classifiers like KNN, SVM, and NN, demonstrating the effectiveness of feature fusion in improving diagnostic accuracy, even with limited patient samples [23].

In 2023, Saleem et al. conducted a study on machine learning techniques for predicting thalassemia, focusing on feature selection methods such as the chi-square test and recursive feature elimination. The study found that combining feature selection with oversampling techniques like SMOTE significantly improved prediction accuracy, with the gradient boosting classifier achieving the highest accuracy of 93.46% [24].

Another study explored the use of Extreme Learning Machine (ELM) to predict various types of anemia, including beta thalassemia trait (BTT) and iron-deficiency anemia. The authors developed a model based on historical data, achieving high performance metrics (99.21% accuracy, 98.44% sensitivity). This research highlighted the potential of AI in enhancing diagnostic processes, particularly in resource-limited settings [25].

In 2021, a review emphasized the clinical utility of interpreting full blood count (FBC) parameters in diagnosing conditions like anemia and infections. It discussed the importance of correlating FBC results with clinical history and the advantages of automated hematology analyzers, while also stressing the need for technical validation of results [26].

A recent study developed innovative techniques, achieving high accuracy in detecting anemia from textual and image datasets. The authors proposed an AlexNet-based Multiple Spatial Attention (AMSA) model, which demonstrated exceptional performance, reaching 99.58% accuracy, 99.97% precision, 99.95% sensitivity, and a 99.97% F1-score. Combining textual and image data, this integrated approach outperformed individual modalities and existing methods, showcasing the transformative potential of machine learning in automated anemia detection [27].

Integrating ontology with machine learning frameworks offers a promising way to improve the accuracy and efficiency of diagnosing and classifying anemia based on CBC data. By defining relations between data elements, ontology provides a structured representation of knowledge that has the potential to improve interpretability and performance in machine learning models.

While machine learning does provide appropriate tools to diagnose anemia, one prominent issue that complex models face is interpretability—a so-called “black box” scenario, which refers to trusting the output of a given model but scarcely being able to explain how it was produced. Medical applications require understanding model decisions as a pre-requisite to gaining clinician acceptance, deliberating on errors, and calibrating their predictions with present knowledge of pathophysiology [28]. Recently, attempts have been made to incorporate domain ontologies during the earlier stages of ML modelling, such as data preprocessing or feature engineering. An ontology in the biomedical realm is a formal representation of knowledge describing concepts (such as diseases, symptoms, or lab tests) and the relationships between those concepts.

Ontologies encode relationships and domain rules, allowing the ML model to reason with a layer of domain context that would otherwise be lost in one-hot encoding or raw numerical values [29]. As a simple example, an ontology might encode that “low hemoglobin” is related to “anemia”. Feature engineering can incorporate this so that the model directly captures the relationship between hemoglobin levels and the diagnosis of anemia rather than inferring it from scratch. In this way, ontology-based encoding acts like built-in domain knowledge. Sun and Dumontier [30] introduced Onto-CGAN, a generative framework that combines structured knowledge from rare disease ontologies with conditional GANs to produce synthetic electronic health records (EHRs) for diseases that are absent from the training set [31]. By embedding domain-specific ontologies such as ORDO and HPO, the model addresses the challenge of data scarcity in rare disease research, enabling the creation of realistic and semantically consistent data. This ontology-driven approach significantly improves the quality of synthetic data and enhances the performance of machine learning models in unseen disease classification tasks.

Research has shown that incorporating ontologies can improve the accuracy of machine learning models in clinical tasks. Sahoo et al. [32] demonstrated this in an unrelated domain by using an ontology to inform feature creation; their ontology-enriched models significantly outperformed models using raw text features, boosting recall and balanced accuracy by over 30–50%. Ontologies help in machine learning by providing a structured framework, where relations between data elements are defined, which increases algorithm accuracy and interpretability. Ontology can further enhance these models by providing a comprehensive framework for understanding the complex relationships between different anemia types and their symptoms.

In summary, incorporating an ontology-based approach into ML for anemia diagnosis has several potential benefits: (1) The model accuracy can be boosted by the introduction of expert knowledge along with the set of features, directing the model to concentrate on medically meaningful patterns. (2) It permits transparency and interpretability: the features and their relations may be explained in terms of the ontology’s concepts and relations, within the framework of clinical reasoning. (3) It enhances the consistency and generalizability of the model, and because decisions are grounded in known medical logic, the model may be less likely to latch onto spurious correlations in the data (a common risk in purely data-driven models). Given these advantages, recent works advocate the integration of biomedical ontologies with AI. Indeed, making AI models more explainable and trustworthy is a top priority in the healthcare ML community [33], and ontology-guided feature engineering is a promising strategy to achieve this. Building an ontology of anemia for types, causes, lab findings, and clinical symptoms—and applying this in the ML pipeline—may allow an anemia-diagnosis model to predict anemia accurately and provide reasoning that a clinician can follow and validate.

3. Methodology

This model applied several classification methods, including classic and deep learning, to investigate and diagnose four types of anemia. The selected methods were implemented separately on the imbalanced and oversampled datasets. As illustrated in Figure 1, we employ KNN, SVM, DT, RF, and XGBoost as interpretable, well-established baseline classifiers widely adopted in medical diagnostic tasks. We also use CNN, CNN+SVM, and CNN+RF to leverage feature extraction capabilities and compare hybrid model effectiveness. Each method has its tuning parameters; all use 80% of data for training and 20% for testing. Input data are illustrated in detail in the next section.

The model used KNN with its default parameters: n_neighbors = 5, uniform weights (all neighbors have equal weight), leaf_size = 30, and the Euclidean distance metric. The SVM was a multiclass model with a linear kernel and a One-vs-Rest decision function. The Decision Tree method uses the Gini impurity metric for splitting nodes. In Random Forest, the n_estimators parameter, which specifies the number of decision trees in the forest, equals 100. In this study, the XGBoost classifier was implemented with the parameters use_label_encoder = False, eval_metric = 'mlogloss', and random_state = 42. These settings disable the deprecated label encoder, specify multiclass logarithmic loss as the evaluation metric appropriate for multiclass classification, and ensure the reproducibility of results. All other hyperparameters, such as n_estimators, max_depth, and learning_rate, were left at their default values as provided by the XGBoost library.

Regarding the CNN model, it consists of seven layers, structured as follows: first, a Conv1D Layer, with 64 filters and a kernel size of three, applies 64 convolution filters, each with a kernel size of three, to the input data with shape (15,300, 25). This reduces the data’s height (sequence length), but the width (number of features) remains unchanged. Second, the MaxPooling1D layer with a kernel size of two reduces the output size by half (downsampling), which helps reduce overfitting and computation costs. Third, the second Conv1D layer has 128 filters and a kernel size of three; it applies 128 filters to the pooled output, further extracting features. Fourth, Second MaxPooling1D applies another downsampling operation to reduce the height of the feature map. Fifth, there is a Flatten Layer that converts the output from the convolutional layers into a 1D vector, which is fed into the fully connected layers. The sixth layer is a Dense Layer—a fully connected layer with 128 neurons for higher-level reasoning. Finally, the seventh layer is the Dense Output Layer, which has five neurons (one for each class) and SoftMax activation for multiclass classification.
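The layer sequence described above can be traced with simple length arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes each sample is a sequence of 25 features with a single channel, that the convolutions use no padding and a stride of one, and that each pooling layer uses a stride equal to its pool size. None of these settings is stated in the text, and the function names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding (assumed setting)."""
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size):
    """Output length of 1D max pooling with stride equal to pool_size (assumed)."""
    return length // pool_size


def trace_cnn_shapes(seq_len=25):
    """Return (layer name, output shape) pairs for the seven described layers."""
    shapes = []
    length = conv1d_out_len(seq_len, 3)
    shapes.append(("Conv1D(64, kernel=3)", (length, 64)))
    length = pool1d_out_len(length, 2)
    shapes.append(("MaxPooling1D(2)", (length, 64)))
    length = conv1d_out_len(length, 3)
    shapes.append(("Conv1D(128, kernel=3)", (length, 128)))
    length = pool1d_out_len(length, 2)
    shapes.append(("MaxPooling1D(2)", (length, 128)))
    shapes.append(("Flatten", (length * 128,)))
    shapes.append(("Dense(128)", (128,)))
    shapes.append(("Dense(5, softmax)", (5,)))
    return shapes


for name, shape in trace_cnn_shapes():
    print(f"{name:24s} -> {shape}")
```

Under these assumptions the sequence length shrinks from 25 to 23, 11, 9, and 4, so the flattened vector passed to the dense layers has 4 × 128 = 512 entries. A different padding or stride choice would change these numbers.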

Other models, CNN+RF and CNN+SVM, have the same earlier architecture as CNN but replace the SoftMax activation function with RF and SVM classifiers.

This study used ontology-driven feature augmentation to enhance the accuracy, semantic sense, and clinical interpretability of the machine learning models employed for anemia classification. Anemia classification using traditional models, trained on raw numerical values from complete blood count (CBC) tests, does not discriminate based on clinical significance or whether a value for hemoglobin or MCV is considered medically low or high. To fill this gap, several additional semantic features were generated based on ontologies of hematology, which led to capturing domain knowledge in a structured and machine-readable form. Key laboratory parameters including hemoglobin (HGB), mean corpuscular volume (MCV), ferritin, vitamin B12, and folate were categorized into medically meaningful intervals based on widely accepted clinical guidelines, including World Health Organization (WHO) standards. For instance, hemoglobin values were categorized as “Low”, “Normal”, or “High” depending on gender-specific reference thresholds (e.g., HGB < 12 g/dL for females or <13 g/dL for males was labeled as “Low”). Similarly, MCV values below 80 fL were categorized as “Low” (microcytic), those between 80 and 100 fL were “Normal”, and those above 100 fL were “High” (macrocytic). This was translated from categorical interpretations into ontology concepts like LowHemoglobin, MicrocyticIndicator, or LowFerritin to adequately encompass the medical reasoning patterns a clinician might use to diagnose anemia.

To make these semantic features usable by machine learning algorithms, each categorical value was converted into a binary vector using one-hot encoding. For example, a sample with “HGB_Low” would have the corresponding binary feature set to one, while “HGB_Normal” and “HGB_High” would be set to zero. This process produced a set of ontology-based binary semantic features that were then concatenated with the original numerical features. In this way, each sample in the dataset contained not only the raw CBC values but also some clinically annotated indicators that carried more semantic meaning. For example, a numerical hemoglobin value of 11.5 g/dL would be accompanied by the binary indicator “HGB_Low = 1”, signaling to the model that this feature falls below the diagnostic threshold.
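The encoding step can be illustrated with a short sketch. It assumes the semantic level of each test has already been determined, and it represents a sample as a dictionary; the helper names and the MCV value of 85.0 are hypothetical and serve only as illustration.

```python
LEVELS = ("Low", "Normal", "High")


def one_hot_level(test_name, level):
    """Return binary indicators such as HGB_Low, HGB_Normal, and HGB_High."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return {f"{test_name}_{name}": int(name == level) for name in LEVELS}


def augment_sample(raw_values, levels):
    """Concatenate raw numeric values with their one-hot semantic indicators."""
    features = dict(raw_values)
    for test_name, level in levels.items():
        features.update(one_hot_level(test_name, level))
    return features


# Hemoglobin of 11.5 g/dL labeled "Low", as in the example in the text.
sample = augment_sample(
    {"HGB": 11.5, "MCV": 85.0},
    {"HGB": "Low", "MCV": "Normal"},
)
print(sample)
```

The resulting sample keeps the two raw values and gains six binary indicators, with HGB_Low and MCV_Normal set to one and the rest set to zero.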

Including ontology-informed features helped the models reason from patterns that follow medical logic. For example, the presence of “Low HGB”, “Low MCV”, and “Low Ferritin” would indicate iron-deficiency anemia, whereas “Low HGB”, “High MCV”, and “Low B12” would point to vitamin B12-deficiency anemia. Embedding such domain knowledge into the feature space not only strengthened the model’s discriminative power, particularly for minority anemia classes, but also provided interpretability in line with clinical heuristics.

These semantic features were incorporated into both classical and deep-learning pipelines. With the enriched feature set for the SVM classifier, more concrete decision boundaries were formulated for improving the classification efficiency among overlapping types of anemia. In the CNN, the semantic and numerical features were reshaped for input into Conv1D layers. Hence, the network learns spatial correlations between raw and ontology-driven inputs. Upon the hybridization of CNN+SVM or CNN+RF, the semantic features entered the CNN before being convolved, pooled, and then flattened for secondary model classification. This architectural design provided the competitive edge of both low-level pattern extraction and high-level semantic abstractions in a single application using a parallel model setup.

Figure 2 presents the workflow of this ontology-enhanced machine learning framework for anemia classification. In this workflow, the raw CBC data is combined with semantic feature information extracted from the anemia ontology through SPARQL queries. These semantic features are merged and then encoded into a format to be ingested by the ML models (SVM and CNN). Thus, this semantic integration increases the accuracy of the diagnosis and the interpretability of the model inputs in terms of clinical knowledge.

The ontology visualization in Figure 3 shows a structured knowledge representation simulating the semantic augmentation applied in the proposed anemia-diagnosis framework. By explicitly modeling clinical entities, such as lab tests, types of anemia, and value ranges as classes and relationships, the ontology allows for machine reasoning over the medical data. Object properties like hasLevel and hasAnemiaType encode diagnostic logic, while hasValue captures quantitative input used in classification. This structured schema supports SPARQL-based feature annotation and facilitates alignment between raw CBC values and medically meaningful categories (e.g., “Low HGB” → Iron-Deficiency Anemia). As a result, ontology enhances model interpretability and improves performance, particularly in low-resource or class-imbalanced scenarios, by guiding the model with clinically validated semantic constraints.

Details of the SPARQL query logic, mappings, and integration are further illustrated in Section 4, which shows how semantic categories were programmatically derived and incorporated into the learning pipeline.

To compare the extracted results, an average rank method was used. Since this study used only one dataset, formal statistical tests (e.g., Friedman test with post-hoc analysis) would not produce meaningful output. The average rank method is a non-parametric, simplified approach often used in machine learning and statistical comparisons to compare algorithms or methods across multiple metrics or datasets. Although it does not test whether observed differences are statistically significant, it is widely used for its simplicity and interpretability [34,35]. For each class and metric, methods are ranked from one (best) to four (worst), with ties resolved by averaging ranks. Ranks are then summed across all 20 combinations (five classes × four metrics) and averaged. Algorithm 1 illustrates the average rank method steps.

Algorithm 1: Average Rank Method

Input: A list of numerical values.
Sort the values in ascending order, keeping track of their original indices.
Assign ranks to each value based on their sorted positions.
Handle ties (duplicate values) by assigning them the average of their positions.
Calculate the average rank by summing all ranks and dividing by the total number of elements.
Return the average rank.
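One possible reading of Algorithm 1, combined with the description in the text, is sketched below. It assumes that higher scores are better, that rank one is the best, and that each method has one score per class and metric combination. The function names and the scores in the usage example are hypothetical, not values from the study.

```python
def rank_with_ties(scores, higher_is_better=True):
    """Rank scores from 1 (best); tied scores share the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(table):
    """table maps method -> list of scores; returns method -> mean rank."""
    methods = list(table)
    n_cols = len(table[methods[0]])
    totals = {m: 0.0 for m in methods}
    for col in range(n_cols):
        col_ranks = rank_with_ties([table[m][col] for m in methods])
        for m, r in zip(methods, col_ranks):
            totals[m] += r
    return {m: totals[m] / n_cols for m in methods}


# Hypothetical scores for three methods over two class/metric combinations.
demo = average_ranks({"A": [0.9, 0.8], "B": [0.7, 0.8], "C": [0.8, 0.6]})
print(demo)
```

A lower average rank indicates a method that ranks well more consistently across the combinations.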

4. Data Description and Analysis

The original data [36] contain 15,300 patient records collected over the 5-year interval between 2013 and 2018. These records represent the complete blood count test results for these patients. Table 1 shows dataset attributes and their descriptions. Figure 4 illustrates the feature correlation heatmap for the dataset attributes. The colors in the figure portray strong positive correlations (closer to 1) in red and strong negative correlations (closer to −1) in blue. The heatmap shows that some features have high positive correlations, such as HGB and HCT, and MCH and MCHC, while others have high negative correlations, such as RDW and MCV.

To support ontology-based feature engineering and improve the semantic interpretation of clinical values, this study incorporated SPARQL queries for the structured retrieval of medical concepts and their relationships from the anemia ontology. SPARQL (SPARQL Protocol and RDF Query Language) is a standardized query language for extracting information from RDF/OWL-based ontologies. The annotation of categorical lab features (e.g., “Low Hemoglobin”, “Microcytic Indicator”) and dynamic integration of semantic rules in the preprocessing pipeline are two tasks that are automated by using SPARQL. By this means, the numerical values in the dataset are linked to clinically relevant terms defined in the ontology, thus improving model interpretability and performance.

Below are three example SPARQL queries used in the system:

Query 1: Retrieve all anemia types and their diagnostic indicators

PREFIX : <http://www.example.org/anemia#>
SELECT ?anemiaType ?indicator
WHERE {
?anemiaType a :AnemiaType .
?anemiaType :hasIndicator ?indicator .
}

This query was used to identify the feature combinations related to each anemia type. This query retrieved mappings like:

IronDeficiencyAnemia → LowFerritin, LowMCV, LowHemoglobin.
B12DeficiencyAnemia → HighMCV, LowB12.

These mappings were used to encode rules during preprocessing to assign categorical features like IronDeficiency_Pattern = 1.

Query 2: Get threshold values for each lab test used in categorization

PREFIX : <http://www.example.org/anemia#>
SELECT ?test ?lowThreshold ?highThreshold
WHERE {
?test a :LabTest .
?test :hasLowThreshold ?lowThreshold .
?test :hasHighThreshold ?highThreshold .
}

This helped programmatically label features such as HGB or MCV levels as “Low”, “Normal”, or “High”. For example, the query returned:

HGB → 12.0 (low), 17.0 (high).
MCV → 80.0 (low), 100.0 (high).

These were used to discretize continuous values into ontology-aligned semantic categories (MCV_Low, HGB_Normal, etc.).
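The discretization step can be sketched as a simple threshold lookup. The thresholds below are the example values reported for Query 2. Treating values exactly at a threshold as "Normal" is an assumption, and the function name and the test values are hypothetical.

```python
# Example thresholds as reported for Query 2 (HGB in g/dL, MCV in fL).
THRESHOLDS = {"HGB": (12.0, 17.0), "MCV": (80.0, 100.0)}


def categorize(test_name, value, thresholds=THRESHOLDS):
    """Map a numeric lab value to an ontology-aligned category label."""
    low, high = thresholds[test_name]
    if value < low:
        return f"{test_name}_Low"
    if value > high:
        return f"{test_name}_High"
    # Assumption: boundary values count as Normal.
    return f"{test_name}_Normal"


print(categorize("HGB", 11.5))   # below the low threshold
print(categorize("MCV", 104.2))  # above the high threshold
```

In a full pipeline the dictionary would be filled from the query results rather than hard-coded, so a change to the ontology would carry through to the labels.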

Query 3: Infer possible anemia type given a combination of abnormal lab values

PREFIX : <http://www.example.org/anemia#>
SELECT ?type
WHERE {
?type a :AnemiaType .
?type :hasIndicator :LowMCV .
?type :hasIndicator :LowFerritin .
?type :hasIndicator :LowHemoglobin .
}

When a patient’s CBC profile matched all three of these indicators, the label Likely_IronDeficiencyAnemia = 1 was assigned as a derived semantic feature.
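The matching logic behind this derived feature amounts to a subset check. The sketch below assumes the indicator sets returned by Query 1 are available as Python sets. The feature name for the B12 pattern follows the naming used for iron deficiency and is an assumption, as is the function name.

```python
# Indicator sets as listed for Query 1.
PATTERNS = {
    "IronDeficiencyAnemia": {"LowFerritin", "LowMCV", "LowHemoglobin"},
    "B12DeficiencyAnemia": {"HighMCV", "LowB12"},
}


def derived_pattern_features(observed, patterns=PATTERNS):
    """Set Likely_<type> to 1 when every indicator of that type is observed."""
    observed = set(observed)
    return {
        f"Likely_{name}": int(required <= observed)
        for name, required in patterns.items()
    }


profile = {"LowMCV", "LowFerritin", "LowHemoglobin"}
print(derived_pattern_features(profile))
```

Because the check requires all indicators of a pattern, a profile that matches only some of them leaves the corresponding feature at zero.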

With semantic reasoning using SPARQL introduced into the preprocessing pipeline, the ML models were supplied with ontology-aligned features that would represent some clinically important patterns in the data. This had an enrichment effect on the input space, especially concerning the identification of subtypes like B12- or folate-deficiency anemia that generally had a much lesser representation in the dataset. Further, unlike black-box processes, adopting these queries for feature generation makes the process rule-based and explainable—extremely important aspects in medical AI systems. This explanation provides physicians with the ability to comprehend the medical logic that gave rise to a certain prediction, thereby increasing their willingness to rely on and employ this tool in practice.

5. Experimental Results

The experiments in this paper follow two directions, both of which investigate the diagnosis and classification of anemia. The first applies different classification methods (KNN, SVM, DT, RF, CNN, CNN+SVM, CNN+RF, Onto-SVM, Onto-CNN+SVM, XGBoost) to the original data. The second applies the same classifiers to the data after oversampling with different methods (ADASYN, SMOTE, and ROS), to check whether this improves on the performance of the first direction. The main objective of this work is to compare several methods and find the one with the highest performance, so the framework is applied to a single dataset throughout. Five measures were used to evaluate the results: accuracy, precision, recall (sensitivity), F1-score, and specificity [37].
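For reference, the per-class measures can be computed in a one-vs-rest fashion from true and predicted labels. This is a minimal sketch using the standard definitions. Returning 0.0 when a denominator is zero is an assumption, and the labels in the usage example are hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest precision, sensitivity, specificity, and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)

    def ratio(num, den):
        # Assumption: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical labels for six samples.
y_true = ["Iron", "Iron", "No-Anemia", "No-Anemia", "B12", "No-Anemia"]
y_pred = ["Iron", "No-Anemia", "No-Anemia", "No-Anemia", "B12", "Iron"]
print(per_class_metrics(y_true, y_pred, "Iron"))
```

Accuracy is the share of samples whose predicted label equals the true label, computed over all classes at once.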

5.1. Data Preprocessing

The original data contain records of 15,300 patients distributed over five classes, as shown in Figure 5. The figure shows a strong class imbalance, most severe for the Folate and B12 classes, with a distribution percentage of 63.71:6.66:27.33:1.00:1.30. Oversampling techniques were therefore used to overcome this issue.

Three methods—ADASYN (adaptive synthetic sampling), SMOTE (Synthetic Minority Oversampling Technique), and ROS (Random Oversampling)—were used to solve the minority class imbalance problem. A fourth method, Crossover, which randomly mixed features from two records, was also used, but it was excluded as it did not completely solve the problem, as shown in Table 2.

Table 2 illustrates the class distribution before and after applying the oversampling techniques. It shows that Hgb, Iron, Folate, and B12 were treated as minority classes for the ADASYN, SMOTE, and ROS techniques, whereas only Folate and B12 were treated as minority classes for the Crossover technique. The ADASYN technique distributed the data among classes as 19.59:19.98:21.09:19.59:19.75. The SMOTE and ROS techniques distributed the data among classes as 20:20:20:20:20. SMOTE generates synthetic samples for the minority class by interpolating between existing minority class samples. ADASYN uses density distribution to determine the number of synthetic samples needed for each minority sample. ROS does not generate synthetic samples; it randomly duplicates existing samples from the minority class until the desired class balance is achieved.
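The difference between duplication and interpolation can be illustrated with a small standard-library sketch. It is not the implementation used in the study: the function names are hypothetical, the toy data are invented, and the interpolation helper shows only the core SMOTE step, leaving out the nearest-neighbor selection and the random choice of the gap.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=42):
    """ROS: duplicate randomly chosen samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


def interpolate(a, b, gap):
    """SMOTE-style synthetic point on the segment between two minority samples."""
    return [x + gap * (y - x) for x, y in zip(a, b)]


# Toy data: three majority samples and one sample in each of two minority classes.
X = [[1.0], [2.0], [3.0], [4.0], [10.0]]
y = ["No-Anemia", "No-Anemia", "No-Anemia", "Hgb", "B12"]
X_ros, y_ros = random_oversample(X, y)
print(Counter(y_ros))
print(interpolate([0.0, 2.0], [4.0, 6.0], 0.5))
```

After random oversampling every class has as many samples as the largest one, which corresponds to the 20:20:20:20:20 split reported for five classes. The interpolated point lies between its two parent samples, which is why SMOTE adds new values while ROS only repeats existing ones.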

5.2. System Specification

The framework was run on a system with an 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 GHz processor, 8 GB RAM, and a 64-bit operating system (Windows 11), using Python 8.25.

5.3. Analysis and Discussion

Table 3 reports the precision, sensitivity, F1-score, and specificity for each class when applying KNN to the data before and after oversampling. ADASYN and SMOTE perform strongly, especially for B12 and Folate, where precision and sensitivity are near 1.00. The Hgb and Iron classes have balanced precision and sensitivity, resulting in good F1-scores of about 0.88. ADASYN slightly outperforms SMOTE in some cases, but both methods work similarly. ROS also performs similarly to ADASYN and SMOTE for most classes, but with somewhat lower precision for the No-Anemia and Iron classes. It produces higher results for Folate and B12, achieving nearly perfect scores; Folate reaches 0.97, 1.00, and 0.99 for precision, sensitivity, and F1-score, respectively. Without oversampling, the results show poor performance, particularly for Hgb (F1-score = 0.23), Folate (F1-score = 0.06), and B12 (F1-score = 0.30), indicating that it is hard for KNN to identify these classes accurately. Moreover, resampling improves specificity for imbalanced classes (No-Anemia, Iron) and slightly reduces it for Hgb compared with the original dataset, while specificity for Folate and B12 remains near perfect across methods.

Table 4 shows the performance of the SVM model, which depends strongly on the oversampling method; ROS performs best. Under ROS, performance approaches the ideal for No-Anemia (precision 1.00, sensitivity 0.98, F1-score 0.99, specificity 1.00), Folate (precision 0.99, sensitivity 1.00, F1-score 0.99, specificity 1.00), and B12 (precision 0.97, sensitivity 1.00, F1-score 0.98, specificity 0.99). The model also performs solidly on the Hgb and Iron classes, with F1-scores of 0.96 and 0.93, respectively. SMOTE gives satisfactory results but trails ROS (precision 0.94, sensitivity 0.92, and F1-score 0.93 for No-Anemia; F1-scores of 0.99 and 0.97 for Folate and B12, respectively). ADASYN performs well for Folate (precision 0.98, sensitivity 0.99, F1-score 0.99, specificity 1.00) but worse for Hgb (precision 0.81, sensitivity 0.60, F1-score 0.69, specificity 0.97) and Iron (precision 0.67, sensitivity 0.85, F1-score 0.75, specificity 0.93). Without oversampling, the model struggles with Folate (F1-score = 0.86) and B12 (F1-score = 0.70), which shows the need for oversampling of the minority classes. In this setting, only the No-Anemia and Iron classes are handled well: precision 0.99, sensitivity 1.00, F1-score 1.00, and specificity 1.00 for No-Anemia, and precision 0.94, sensitivity 0.97, F1-score 0.95, and specificity 0.99 for Iron.

Table 5 reports the results for RF. ROS remains the best method, with an F1-score of 1.00 and a specificity of 1.00 for all five classes (No-Anemia, Hgb, Iron, Folate, and B12). ADASYN and SMOTE also perform very well, with F1-scores generally in the 0.97–1.00 range; both are excellent for Folate and B12, with an F1-score of 1.00, and the two methods differ only minimally. In contrast, without oversampling the model had difficulty with Folate and B12, where sensitivity was low (approximately 0.76), leading to F1-scores of 0.86 and 0.85, respectively. For Iron and No-Anemia it performs well, with F1-scores of about 1.00. The main difficulty therefore lies in the minority classes (Folate and B12).

The Decision Tree model performs exceptionally well with all oversampling techniques (ADASYN, SMOTE, and ROS), achieving high precision, sensitivity, and F1-scores across most classes. Table 6 shows that ROS stands out, with a perfect F1-score of 1.00 and specificity of 1.00 for all five classes. ADASYN and SMOTE also perform excellently, with F1-scores of 0.96–1.00; the results for Folate and B12 are especially strong, with perfect scores (1.00) in all cases. There is little difference between ADASYN and SMOTE, so the two are equally effective. On the original dataset, the model performs well for No-Anemia (F1 = 1.00) and Iron (F1 = 1.00) but is slightly weaker on the smaller classes: sensitivity drops to 0.98 for Folate and 0.96 for B12, giving F1-scores of 0.98 for both.

Table 7 reports the per-class precision, sensitivity, F1-score, and specificity obtained with the CNN before and after oversampling. The CNN performs best with ROS, with high scores across all classes: Folate and B12 reach a perfect F1-score of 1.00, and No-Anemia, Hgb, and Iron also achieve excellent F1-scores (0.98–0.99). Specificity is a perfect 1.00 for all five classes. Because ROS balances the dataset effectively, performance improves for both the majority and the minority classes. ADASYN and SMOTE fall slightly behind ROS but still give convincing results, with F1-scores between 0.96 and 0.99. Training on the original data without oversampling, by contrast, severely degrades performance, especially for Folate and B12, whose F1-scores drop to 0.27 and 0.20, respectively; Hgb also has a weak F1-score of 0.60.

Table 8 shows that CNN+RF performs very well under all oversampling techniques (ADASYN, SMOTE, and ROS), especially for Folate and B12, for which precision, sensitivity, F1-score, and specificity are consistently high. ROS gives a perfect specificity of 1.00 for all five classes, and its F1-scores are nearly perfect: 1.00 for Folate and B12 and 0.99 for the remaining classes (No-Anemia, Hgb, and Iron). For example, Hgb reaches a precision of 0.98, a sensitivity of 1.00, and an F1-score of 0.99, a good balance between precision and sensitivity. ADASYN and SMOTE are also highly satisfactory, with F1-scores between 0.94 and 0.99. For instance, Folate reaches an F1-score of 0.98 with ADASYN (precision 0.97, sensitivity 0.99) and 0.99 with SMOTE (precision 0.99, sensitivity 1.00). Without oversampling, performance on Folate and B12 is poor, with low precision and sensitivity. Folate has an F1-score of only 0.23 (precision and sensitivity both 0.23), and B12 follows with F1 = 0.19, indicating that the rare classes are predicted very badly without oversampling. Hgb is similarly affected, with a modest F1-score of 0.66 (precision 0.63, sensitivity 0.68).

As shown in Table 9, the CNN+SVM model performs well with all oversampling techniques (ADASYN, SMOTE, and ROS), with ROS performing best. ROS gives a perfect specificity of 1.00 for all five classes and nearly flawless precision, sensitivity, and F1-scores in every class, including Folate and B12, where precision and sensitivity of 1.00 yield an F1-score of 1.00. The Iron class likewise reaches 0.99 for precision, sensitivity, and F1-score. ADASYN and SMOTE perform well but marginally worse than ROS, with F1-scores between 0.94 and 0.99. For example, the Folate class achieves an F1-score of 0.99 with both methods, with a sensitivity of 0.99 under ADASYN and 1.00 under SMOTE. Hgb and Iron keep good sensitivity (0.95–0.96) and precision (0.92–0.98), giving balanced F1-scores. On the original dataset, however, performance drops sharply, especially for the minority classes: Folate has an F1-score of 0.19 (precision 0.23, sensitivity 0.16), and B12 has an F1-score of 0.18. Hgb also performs poorly, with an F1-score of 0.66, which underlines the importance of oversampling.

The usefulness of domain knowledge for classification was tested by adding ontology-enhanced versions of SVM and CNN+SVM. With semantic features drawn from the anemia ontology, such as thresholds and diagnostic indicators for hemoglobin, MCV, ferritin, B12, and folate, the models learned clinically meaningful patterns and became easier to interpret. SVM and CNN+SVM were chosen for ontology integration because they are compatible with structured, rule-based semantic features and offer clinical interpretability. SVM handles well-defined decision boundaries and therefore works efficiently with categorical features that encode ontology thresholds, such as “Low MCV”. It is also relatively simple to understand and explain, in line with the transparency required of medical AI. CNN+SVM combines the deep feature extraction of CNNs with the fine-grained classification of SVM, so that semantic knowledge can influence both feature learning and the final prediction. Other models, such as KNN or Random Forest, had drawbacks stemming from their sensitivity to the scaling of semantic features and their lack of interpretability; SVM-type models were therefore the most productive and explainable choice for ontological reasoning in anemia diagnosis.
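The idea of turning ontology thresholds into categorical features can be sketched in a few lines of Python. This is only a stand-in for the SPARQL-based reasoning used in the paper: the cut-offs are the reference ranges listed in Table 1, the labels “Low Hemoglobin”, “Low MCV”, and “Macrocytic MCV” follow the wording of the article, and all other label names, the `RULES` table, the helper functions, and the example patient are hypothetical.

```python
# (parameter, lower bound, upper bound, label if below, label if above)
# Bounds are the reference ranges from Table 1; the rule set itself is an
# illustrative assumption, not the authors' ontology.
RULES = [
    ("HGB", 13.5, 16.9, "Low Hemoglobin", "High Hemoglobin"),
    ("MCV", 81.8, 95.5, "Low MCV", "Macrocytic MCV"),
    ("FERRITE", 30.0, 400.0, "Low Ferritin", "High Ferritin"),
    ("B12", 200.0, 503.0, "Low B12", "High B12"),
    ("FOLATE", 4.0, 175.0, "Low Folate", "High Folate"),
]


def semantic_flags(record):
    """Map raw CBC values to binary semantic features (1 = concept applies)."""
    flags = {}
    for name, low, high, low_label, high_label in RULES:
        value = record[name]
        flags[low_label] = int(value < low)
        flags[high_label] = int(value > high)
    return flags


def enrich(record):
    """Return the numeric features together with the semantic flags."""
    return {**record, **semantic_flags(record)}


# Hypothetical patient record using the parameter names of Table 1.
patient = {"HGB": 10.2, "MCV": 101.3, "FERRITE": 85.0,
           "B12": 150.0, "FOLATE": 6.1}
features = enrich(patient)
active = sorted(k for k, v in semantic_flags(patient).items() if v == 1)
```

For this record the active concepts are “Low B12”, “Low Hemoglobin”, and “Macrocytic MCV”, a pattern that a classifier can use directly instead of having to rediscover the thresholds from raw values. Binary flags of this kind also sidestep the feature-scaling issue mentioned above, since every semantic feature lies in {0, 1}.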

Table 10 shows the performance of the ontology-enhanced SVM (Onto-SVM) model under the different sampling conditions. Compared with the SVM results in Table 4, Onto-SVM scored higher in most classes under oversampling, with a larger margin for the minority classes of Folate and B12. Using ROS, Onto-SVM attained perfect precision and sensitivity (1.00) for both of these classes. In addition, the Hgb and Iron F1-scores under ROS rose by 0.02 and 0.03 over the non-ontology version, to 0.98 and 0.96, respectively.
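The size of these gains can be read off directly from the ROS rows of Tables 4 and 10. The short sketch below computes the absolute and relative F1-score differences per class; the helper name `f1_gains` is an illustrative assumption, while the scores are copied from the two tables.

```python
# F1-scores under ROS, copied from Table 4 (SVM) and Table 10 (Onto-SVM).
svm_ros_f1 = {"No-Anemia": 0.99, "Hgb": 0.96, "Iron": 0.93,
              "Folate": 0.99, "B12": 0.98}
onto_svm_ros_f1 = {"No-Anemia": 1.00, "Hgb": 0.98, "Iron": 0.96,
                   "Folate": 1.00, "B12": 1.00}


def f1_gains(baseline, enhanced):
    """Per-class (absolute gain, relative gain in percent)."""
    gains = {}
    for cls, base in baseline.items():
        diff = enhanced[cls] - base
        gains[cls] = (round(diff, 2), round(100 * diff / base, 1))
    return gains


gains = f1_gains(svm_ros_f1, onto_svm_ros_f1)
```

Under ROS the baseline SVM is already strong, so the remaining headroom is small: the absolute gains range from 0.01 to 0.03, which corresponds to relative gains of roughly 1–3%.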

Table 11 illustrates the performance of the ontology-enhanced CNN+SVM (Onto-CNN+SVM) model. Including semantic layers, mapped from SPARQL-inferred concepts, improved the model’s ability to distinguish anemia subtypes, particularly under SMOTE and ROS sampling. Across all categories, F1-scores improved by 1–3% compared to the non-ontology-enhanced CNN+SVM model (Table 9). For the ROS-enhanced dataset, every class achieved perfect precision, recall, and specificity.

Table 12 presents the performance of the XGBoost model. It consistently achieved high performance, with precision, recall, F1-score, and specificity values close to or equal to 1. Notably, ROS and the original dataset yielded perfect scores (1.00) for all classes, except for minor variations in the original dataset’s Hgb, Folate, and B12 classes. Both ADASYN and SMOTE also showed nearly optimal results, with only slight drops in recall and F1-score for certain classes, particularly Iron and B12.

Table 13 gives the average rank of each oversampling method for all 10 models; ROS achieves the best (lowest) average rank for every model. The lowest average ranks overall are obtained by the XGBoost and Onto-CNN+SVM models, and XGBoost is the top performer regardless of the sampling strategy used.
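An average-rank comparison of this kind can be sketched as follows. The exact ranking protocol behind Table 13 (which metrics are ranked and how ties are handled) is not spelled out in this section, so the sketch makes its own assumptions: it ranks the four data conditions by F1-score only, per class, gives tied entries the mean of the ranks they occupy, and averages over classes. The F1-scores are those of the SVM in Table 4; the function names are hypothetical, and the output is not expected to reproduce the SVM row of Table 13.

```python
def rank_descending(values):
    """Rank so that the highest value gets rank 1; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def average_ranks(scores):
    """scores maps each method to a list of scores aligned by class."""
    methods = list(scores)
    n_items = len(scores[methods[0]])
    totals = {m: 0.0 for m in methods}
    for j in range(n_items):
        ranks = rank_descending([scores[m][j] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: totals[m] / n_items for m in methods}


# SVM F1-scores from Table 4, ordered No-Anemia, Hgb, Iron, Folate, B12.
svm_f1 = {
    "ROS":      [0.99, 0.96, 0.93, 0.99, 0.98],
    "ADASYN":   [0.82, 0.69, 0.75, 0.99, 0.91],
    "SMOTE":    [0.93, 0.83, 0.89, 0.99, 0.97],
    "Original": [1.00, 0.81, 0.95, 0.86, 0.70],
}
svm_avg_rank = average_ranks(svm_f1)
```

With these assumptions ROS obtains the lowest average rank (1.6), followed by SMOTE (2.4), the original data (2.6), and ADASYN (3.4), which agrees with Table 13 in placing ROS first for the SVM.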

The ontology-driven enhancements show measurable improvements in classification performance, especially for underrepresented classes like Folate and B12 deficiency. The added semantic features, derived from structured medical knowledge using SPARQL queries, enabled the classifiers to distinguish nuanced clinical patterns that may be overlooked in raw numerical input. ROS consistently proved to be the most effective oversampling method across all metrics. These results reinforce the importance of integrating ontological reasoning into AI systems for clinical diagnosis to improve accuracy and offer interpretability aligned with clinical guidelines.

Figure 6 compares F1-scores across the machine learning models for each anemia class. XGBoost consistently outperformed the other models for all anemia types. Onto-CNN+SVM achieved nearly perfect F1-scores for all classes, which supports the view that semantic feature representation combined with a strong deep learning technique is powerful. Classical models such as KNN and DT performed weakly on Hgb and the minority classes. CNN and CNN+RF performed reasonably well, and the addition of ontology features brought further gains. Although integrating clinical knowledge through the ontology framework, as the interpretation layer of the anemia-diagnosis system, contributes to generalization and accuracy in multiclass anemia diagnosis, XGBoost remains the top performer across all models. Figure 7 compares the F1-score performance of all models on the original and oversampled data. XGBoost outperforms the other models for almost all oversampling techniques, and ROS yields the largest gains, with most models reaching their best F1-score of 1.00.

Figure 8 shows a heatmap of F1-scores per class for all the tested models. XGBoost stands out with consistently high scores, especially for hard-to-detect categories such as Folate and B12 deficiency. On the contrary, traditional classifiers like KNN and DT perform rather weakly, especially in minority classes. This visual comparison strengthens the argument that XGBoost brings reliability and granularity to multiclass classification.

6. Conclusions

This research investigated ontology-augmented machine learning frameworks for multiclass anemia classification from CBC data. Infusing semantic knowledge into the classification pipeline allowed the Onto-SVM and Onto-CNN+SVM models to outperform their regular counterparts, although not XGBoost. The Onto-CNN+SVM model achieved perfect classification scores (F1 = 1.00) for all anemia classes in combination with ROS oversampling, while Onto-SVM improved precision and recall mainly for underrepresented classes such as B12 and Folate deficiency. The enhanced versions were less sensitive to class imbalance and more interpretable, owing to the clinical, SPARQL-driven semantic features incorporated into the framework. Overall, the results demonstrate the robustness and effectiveness of XGBoost in classifying the various anemia types, even under different data-balancing scenarios, with ROS appearing to maximize model performance across all metrics.

These results bear witness to the importance of domain knowledge and machine learning algorithms in enhancing diagnostic performance and clinical credibility. This framework sets a precedent for the development of competent decision-support tools in hematological diagnostics.

Author Contributions

Conceptualization, N.E.G. and Y.M.E.; methodology, A.S.A. and Y.M.E.; software, H.M. and Y.M.E.; validation, H.M. and Y.M.E.; formal analysis, N.E.G. and A.S.A.; investigation, Y.M.E.; resources, N.E.G.; data curation, H.M.; writing—original draft preparation, N.E.G. and H.M.; writing—review and editing, Y.M.E. and A.S.A.; visualization, A.S.A.; supervision, H.M.; project administration, Y.M.E.; funding acquisition, A.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Prince Sattam bin Abdulaziz University through the project number (PSAU/2024/01/31644).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available at https://www.kaggle.com/code/serhathoca/anemia-disease-dataset?select=SKILICARSLAN_Anemia_DataSet.xlsx, accessed on 1 July 2025.

Acknowledgments

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2024/01/31644).

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Diagnosis model for anemia classification integrating ontology and data balancing.

Figure 2. Workflow of the ontology-enhanced anemia-diagnosis framework.

Figure 3. Ontology schema for anemia diagnosis structured in OWL.

Figure 4. The feature correlation heatmap.

Figure 5. Class distribution of original data.

Figure 6. F1-score comparison across models for each anemia class.

Figure 7. Comparison of F1-score performance across four classification models.

Figure 8. Heatmap of F1-scores across different machine learning models for each anemia class.

Table 1. Dataset attributes and their descriptions.

Parameter	Desc.	Unit	Reference Range
B12	B12	ng/mL	200–503
BA	Basophils	10³/µL	0.01–0.07
EO	Eosinophils	10³/µL	0.03–0.59
FERRITE	Ferritin	ng/mL	30–400
FOLATE	Folate	ng/mL	4–175
GENDER	Female/Male	-	0–1
HCT	Hematocrit	%	35–45
HGB	Hemoglobin	g/dL	13.5–16.9
LY	Lymphocytes	10³/µL	1.26–3.31
MCH	Mean Corpuscular Hemoglobin	pg	27–32.3
MCHC	Mean Corpuscular Hemoglobin Concentration	g/dL	32.35
MCV	Mean Corpuscular Volume	fL	81.8–95.5
MO	Monocytes	10³/µL	0.29–0.95
MPV	Mean Platelet Volume	fL	9.3–12.1
NE	Neutrophils	10³/µL	1.8–6.98
PCT	Plateletcrit	K/µL	0.17–0.32
PDW	Platelet Distribution Width	fL	10.1–16.1
PLT	Platelets	K/µL	166–308
RBC	Red Blood Cells	million/µL	4.44–5.61
RDW	Red Cell Distribution Width	%	12–13.6
SD	Serum Iron	µg/dL	20–50
SDTSD	(SD/TSD) × 100	µg/dL	20–50
TSD	Total Serum Iron	µg/dL	250–450
WBC	White Blood Cells	10³/µL	3.91–10.2

Table 2. Distribution of classes before and after applying oversampling techniques.

	No. of Records	No-Anemia	Hgb	Iron	Folate	B12
ADASYN	49,749	9747	9942	10,492	9744	9823
SMOTE	48,736	9747	9747	9747	9747	9747
ROS	48,736	9747	9747	9747	9747	9747
Crossover	24,895	9747	1019	4182	9747	199
Original	15,300	9747	1019	4182	153	199

Table 3. KNN performance metrics per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.86	0.44	0.58	0.99
	Hgb	0.8	0.98	0.88	0.96
	Iron	0.78	0.88	0.83	0.96
	Folate	0.92	0.99	0.96	0.99
	B12	0.95	1	0.98	0.99
SMOTE	No-Anemia	0.81	0.47	0.6	0.98
	Hgb	0.92	0.81	0.98	0.96
	Iron	0.78	0.85	0.81	0.96
	Folate	0.92	0.99	0.95	0.99
	B12	0.95	1	0.97	0.99
ROS	No-Anemia	0.76	0.53	0.63	0.97
	Hgb	0.81	0.99	0.89	0.96
	Iron	0.77	0.78	0.78	0.96
	Folate	0.97	1	0.99	1
	B12	0.96	1	0.98	0.99
Original	No-Anemia	0.73	0.87	0.79	0.89
	Hgb	0.36	0.17	0.23	0.98
	Iron	0.65	0.47	0.55	0.94
	Folate	1	0.03	0.06	1
	B12	0.64	0.20	0.3	1

Table 4. SVM performance metrics per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.83	0.8	0.82	0.97
	Hgb	0.81	0.6	0.69	0.97
	Iron	0.67	0.85	0.75	0.93
	Folate	0.98	0.99	0.99	1
	B12	0.91	0.92	0.91	0.98
SMOTE	No-Anemia	0.94	0.92	0.93	0.99
	Hgb	0.92	0.92	0.83	0.99
	Iron	0.86	0.92	0.89	0.98
	Folate	0.98	1	0.99	1
	B12	0.96	0.99	0.97	0.86
ROS	No-Anemia	1	0.98	0.99	1
	Hgb	0.93	0.98	0.96	0.99
	Iron	0.98	0.9	0.93	1
	Folate	0.99	1	0.99	1
	B12	0.97	1	0.98	0.99
Original	No-Anemia	0.99	1	1	1
	Hgb	0.85	0.78	0.81	0.99
	Iron	0.94	0.97	0.95	0.99
	Folate	0.93	0.81	0.86	1
	B12	0.93	0.57	0.7	1

Table 5. Random Forest performance per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.98	0.95	0.96	0.99
	Hgb	0.97	1	0.98	1
	Iron	0.99	0.98	0.98	1
	Folate	1	1	1	1
	B12	0.99	1	1	1
SMOTE	No-Anemia	0.98	0.96	0.97	0.99
	Hgb	0.92	0.97	1	1
	Iron	0.99	0.97	0.98	1
	Folate	1	1	1	1
	B12	0.99	1	1	1
ROS	No-Anemia	1	0.99	1	1
	Hgb	1	1	1	1
	Iron	1	1	1	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	0.99	1	1	1
	Hgb	0.95	0.97	0.96	1
	Iron	1	1	1	1
	Folate	1	0.76	0.86	1
	B12	0.98	0.76	0.85	1

Table 6. Decision Tree performance per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.96	0.95	0.96	0.99
	Hgb	0.98	0.98	0.98	0.99
	Iron	0.98	0.98	0.98	0.99
	Folate	1	1	1	1
	B12	1	1	1	1
SMOTE	No-Anemia	0.96	0.96	0.96	0.99
	Hgb	0.92	0.98	0.98	1
	Iron	0.98	0.98	0.98	1
	Folate	1	1	1	1
	B12	1	1	1	1
ROS	No-Anemia	1	1	1	1
	Hgb	1	1	1	1
	Iron	1	1	1	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	1	1	1	1
	Hgb	0.99	0.99	0.99	1
	Iron	1	1	1	1
	Folate	0.98	0.98	0.98	1
	B12	1	0.96	0.98	1

Table 7. CNN performance per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.96	0.96	0.96	0.99
	Hgb	0.97	0.96	0.96	0.99
	Iron	0.97	0.96	0.97	0.99
	Folate	0.98	1	0.99	1
	B12	0.98	0.99	0.98	0.99
SMOTE	No-Anemia	0.97	0.96	0.97	1
	Hgb	0.92	0.95	0.96	1
	Iron	0.98	0.95	0.96	0.99
	Folate	0.99	0.99	0.99	1
	B12	0.97	1	0.98	1
ROS	No-Anemia	0.99	0.99	0.99	1
	Hgb	0.99	1	0.99	1
	Iron	0.99	0.98	0.98	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	0.98	0.98	0.98	1
	Hgb	0.6	0.61	0.6	0.98
	Iron	0.93	0.93	0.93	0.99
	Folate	0.24	0.32	0.27	0.99
	B12	0.24	0.17	0.2	0.99

Table 8. CNN+RF performance per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.97	0.94	0.95	0.99
	Hgb	0.95	0.94	0.94	0.99
	Iron	0.96	0.96	0.96	0.99
	Folate	0.97	0.99	0.98	1
	B12	0.96	0.98	0.97	0.99
SMOTE	No-Anemia	0.97	0.96	0.97	1
	Hgb	0.92	0.97	0.96	1
	Iron	0.97	0.96	0.97	0.99
	Folate	0.99	1	0.99	1
	B12	0.98	1	0.99	1
ROS	No-Anemia	1	0.98	0.99	1
	Hgb	0.98	1	0.99	1
	Iron	0.99	0.99	0.99	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	0.99	0.99	0.99	1
	Hgb	0.63	0.68	0.66	0.98
	Iron	0.95	0.96	0.95	0.99
	Folate	0.23	0.23	0.23	0.99
	B12	0.27	0.15	0.19	0.99

Table 9. CNN+SVM performance per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.96	0.97	0.96	0.99
	Hgb	0.96	0.95	0.96	0.99
	Iron	0.98	0.95	0.96	1
	Folate	0.98	0.99	0.99	1
	B12	0.97	0.99	0.98	1
SMOTE	No-Anemia	0.97	0.96	0.96	0.99
	Hgb	0.92	0.96	0.94	0.99
	Iron	0.97	0.95	0.96	0.99
	Folate	0.98	1	0.99	1
	B12	0.97	1	0.98	0.99
ROS	No-Anemia	1	0.98	0.99	1
	Hgb	0.99	1	0.99	1
	Iron	0.99	0.99	0.99	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	0.99	0.99	0.99	1
	Hgb	0.62	0.71	0.66	0.98
	Iron	0.95	0.96	0.95	0.99
	Folate	0.23	0.16	0.19	0.99
	B12	0.27	0.13	0.18	0.99

Table 10. Onto-SVM performance metrics per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.91	0.92	0.92	0.98
	Hgb	0.87	0.83	0.85	0.97
	Iron	0.92	0.89	0.90	0.98
	Folate	0.99	1.00	0.99	1.00
	B12	0.98	0.99	0.99	0.99
SMOTE	No-Anemia	0.94	0.94	0.94	0.99
	Hgb	0.91	0.94	0.92	0.98
	Iron	0.93	0.91	0.92	0.98
	Folate	1.00	1.00	1.00	1.00
	B12	0.99	1.00	0.99	1.00
ROS	No-Anemia	1.00	0.99	1.00	1.00
	Hgb	0.98	0.99	0.98	1.00
	Iron	0.97	0.95	0.96	0.99
	Folate	1.00	1.00	1.00	1.00
	B12	1.00	1.00	1.00	1.00
Original	No-Anemia	0.95	0.97	0.96	0.99
	Hgb	0.78	0.72	0.75	0.96
	Iron	0.91	0.89	0.90	0.98
	Folate	0.87	0.85	0.86	0.98
	B12	0.82	0.77	0.79	0.98

Table 11. Onto-CNN+SVM performance metrics per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.97	0.96	0.96	0.99
	Hgb	0.96	0.96	0.96	0.99
	Iron	0.97	0.96	0.96	0.99
	Folate	0.99	0.99	0.99	1.00
	B12	0.98	0.99	0.99	1.00
SMOTE	No-Anemia	0.98	0.97	0.97	1.00
	Hgb	0.96	0.95	0.95	1.00
	Iron	0.97	0.96	0.96	0.99
	Folate	1.00	1.00	1.00	1.00
	B12	1.00	1.00	1.00	1.00
ROS	No-Anemia	1.00	1.00	1.00	1.00
	Hgb	1.00	1.00	1.00	1.00
	Iron	1.00	1.00	1.00	1.00
	Folate	1.00	1.00	1.00	1.00
	B12	1.00	1.00	1.00	1.00
Original	No-Anemia	0.97	0.96	0.96	0.99
	Hgb	0.76	0.73	0.74	0.96
	Iron	0.88	0.84	0.86	0.97
	Folate	0.72	0.70	0.71	0.96
	B12	0.68	0.66	0.67	0.97

Table 12. XGBoost performance metrics per class.

Oversampling	Class	Precision	Recall (Sensitivity)	F1-Score	Specificity
ADASYN	No-Anemia	0.99	0.98	0.98	0.99
	Hgb	0.99	1	0.99	0.99
	Iron	1	0.99	0.99	0.99
	Folate	1	1	1	0.99
	B12	1	1	1	0.99
SMOTE	No-Anemia	0.99	0.98	0.99	0.99
	Hgb	0.99	1	0.99	0.99
	Iron	1	0.98	0.99	0.99
	Folate	1	1	1	0.99
	B12	1	1	1	0.99
ROS	No-Anemia	1	1	1	1
	Hgb	1	1	1	1
	Iron	1	1	1	1
	Folate	1	1	1	1
	B12	1	1	1	1
Original	No-Anemia	1	1	1	1
	Hgb	0.99	1	0.99	0.99
	Iron	1	1	1	1
	Folate	0.97	1	0.98	0.99
	B12	0.99	0.95	0.97	0.99

Table 13. Average rank for all used methods.

	ROS	ADASYN	SMOTE	Original
KNN	2.10	2.35	2.43	3.13
SVM	1.53	3.55	2.48	2.45
RF	1.38	1.98	1.7	2.95
DT	1.78	3.05	2.85	2.4
CNN	1.15	1.93	2.25	3.68
CNN+RF	1.15	1.8	2.45	4.6
CNN+SVM	1.15	1.95	2.5	4.4
Onto-SVM	1.10	2	3.15	3.85
Onto-CNN+SVM	1.0	1.8	2.50	3.70
XGBoost	1.0	1.0	1.0	1.0

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
