Open Access Article

Peer-Review Record

Anemia Classification System Using Machine Learning

Informatics 2025, 12(1), 19; https://doi.org/10.3390/informatics12010019

by Jorge Gómez Gómez¹,*, Camilo Parra Urueta¹, Daniel Salas Álvarez¹, Velssy Hernández Riaño¹ and Gustavo Ramirez-Gonzalez²

Reviewer 1:

Sunil Karamchandani

Reviewer 2:

Stella Christopoulou

Reviewer 3:

José Rafael Escorcia-Gutierrez

Submission received: 12 November 2024 / Revised: 2 February 2025 / Accepted: 7 February 2025 / Published: 11 February 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Figure 1 needs to be clear.

2. The dataset needs to be talked about before delving into the algorithms. How does each value contribute to anemia? (extension of Table 2)

3. There is an uneven balance of data for the classes. How does this affect the classification?

4. Is there some coefficient that takes care of it?

5. Can we compare the results of the ML algorithms with any classifier after feature extraction?

6. More interestingly, if solutions are provided by IF THEN ...ELSE programming, why does it need the help of sophisticated ML algorithms?

Author Response

After reviewing each of your comments in detail, we have responded to them. We appreciate the feedback that was provided.

Figure 1 needs to be clear.

R:/

Enhanced Figure

The dataset needs to be talked about before delving into the algorithms. How does each value contribute to anemia? (extension of Table 2)

The dataset consists of hematological parameters such as hemoglobin (HGB) and mean cell volume (MCV), among others. Each value contributes differently to the diagnosis and classification of anemia, as follows:

HGB: Determines the presence of anemia based on threshold levels (e.g., <13.6 g/dL for men).
MCV: Differentiates anemia types (microcytic, normocytic, macrocytic) based on red blood cell size.
MCH and MCHC: Indicate the content and concentration of hemoglobin within cells, aiding in subclassification.
RDW: Reflects variation in red blood cell size, helping to identify mixed anemia types.
PLT, WBC, RBC: Provide context on overall blood health and rule out other disorders.

Once these metrics are understood, algorithms can set thresholds for classification.

There is an uneven balance of data for the classes. How does this affect the classification?

The dataset is imbalanced across classes. The dominant class is Class 0 (people who do not have anemia), followed by Class 2 (normocytic anemia), Class 1 (microcytic anemia), and Class 3 (macrocytic anemia).

As a result, the model can be affected by:

Prediction bias: Models can favor dominant classes.

Reduced performance metrics: Precision, recall, and F1 scores can be poor for minority classes.

Overfitting to common classes: Models can have difficulty generalizing minority class patterns.
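The effect of prediction bias on minority-class metrics can be illustrated with a minimal sketch. The sample below is hypothetical (90 records of Class 0 and 10 of Class 3, with a model that always predicts the dominant class); it is not drawn from the study's dataset, and `per_class_metrics` is an illustrative helper, not the authors' code.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

# Hypothetical imbalanced sample: 90 records of Class 0, 10 of Class 3.
y_true = [0] * 90 + [3] * 10
# A biased model that always predicts the dominant class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred)
print(accuracy)    # overall accuracy looks high
print(metrics[3])  # precision, recall, and F1 for the minority class
```

In this constructed case the overall accuracy is high while every metric for the minority class is zero, which is why per-class precision, recall, and F1 are worth reporting alongside accuracy.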

The best-performing model for our problem was the random forest algorithm. To address the imbalance, we used a multiclass logarithmic loss function, which applies when the number of classes is greater than or equal to 3 (n_classes >= 3). For this method, the Gradient Boosting Classifier function was used.

According to the Gradient Boosting Classifier function, the predicted probability p assigned to the true class determines the value of the loss [23]. If p is high (i.e., close to 1), the loss is low and the model is rewarded for making a correct prediction. As p falls further below 1, the loss increases, indicating a worse prediction.
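For reference, the standard definition of multiclass log loss (the mean negative logarithm of the probability assigned to the true class) can be sketched in a few lines. This is a minimal illustration under that textbook definition, not the authors' implementation; the function name, the clipping constant `eps`, and the probability vectors are assumptions chosen for the example.

```python
import math

def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when a probability is exactly 0 or 1.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical predictions over four classes (0: no anemia, 1-3: anemia types).
# The true class is 2 in every case.
confident_correct = multiclass_log_loss([2], [[0.01, 0.01, 0.97, 0.01]])
uncertain = multiclass_log_loss([2], [[0.25, 0.25, 0.25, 0.25]])
confident_wrong = multiclass_log_loss([2], [[0.97, 0.01, 0.01, 0.01]])

print(confident_correct, uncertain, confident_wrong)
```

The loss grows as the probability given to the true class shrinks, so a confident correct prediction is penalized least and a confident wrong prediction most.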

The result of the model score is 1.0, which means that the model is highly reliable for multiclass predictions. That is, the diagnosis of the presence or absence of anemia and the type of anemia can be predicted with a high level of confidence.

4. Is there some coefficient that takes care of it?

R:/ Yes, the multiclass log loss function was used, as explained in the previous response.

Can we compare the results of the ML algorithms with any classifier after feature extraction?

Yes, feature extraction or engineering can refine datasets to improve the classifier performance. This study compares algorithms such as Linear Discriminant Analysis (LDA), decision trees, and random forests without performing explicit feature extraction. A comparison of post-feature extraction results with simpler classifiers did not allow for the validation of feature importance, robustness, and model interpretability. Classification models such as SVM, Logistic Regression, Gaussian NB, and KNeighbors were also tried in this exercise, but they performed poorly and were excluded from the study because of their very low scores. The Random Forest model was compared with the Linear Discriminant and Decision Tree models, and the results showed that Random Forest achieved higher accuracy.

More interestingly, if solutions are provided by IF THEN ...ELSE programming, why does it need the help of sophisticated ML algorithms?

Rule-based systems such as IF-THEN-ELSE are limited because they require exhaustive rules that may miss subtle patterns in data. They also lack adaptability to unseen and complex scenarios, and they do not leverage interdependencies between features (e.g., MCV and RDW relationships). In contrast, ML algorithms such as random forests automatically discover patterns, handle noise, and generalize better, making them indispensable for robust anemia classification in varied datasets. However, the IF-THEN-ELSE rules must still be defined in order to train the model.
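A minimal sketch of what such IF-THEN-ELSE labeling rules might look like is given below. The hemoglobin threshold (13.6 g/dL, the example given for men) and the microcytic cutoff (MCV < 80) come from this record, as do the class numbers 0 to 3. The macrocytic cutoff of 100 fL is an assumption based on common clinical convention and is not stated in the record, and the function name and sample values are hypothetical.

```python
def label_anemia(hgb, mcv, hgb_threshold=13.6, micro_cutoff=80.0, macro_cutoff=100.0):
    """Return 0 (no anemia), 1 (microcytic), 2 (normocytic), or 3 (macrocytic)."""
    if hgb >= hgb_threshold:
        return 0  # hemoglobin at or above the threshold: no anemia
    if mcv < micro_cutoff:
        return 1  # small red blood cells
    if mcv > macro_cutoff:  # assumed cutoff, not stated in the record
        return 3  # large red blood cells
    return 2      # anemia with normal-sized red blood cells

# Hypothetical (HGB in g/dL, MCV in fL) pairs, one per class.
samples = [(14.2, 88.0), (11.0, 72.5), (10.4, 90.0), (9.8, 108.0)]
labels = [label_anemia(hgb, mcv) for hgb, mcv in samples]
print(labels)
```

Labels produced this way can serve as the training targets for a supervised classifier, which is one reading of the statement that the rules are needed for model training.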

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors

Overall, your paper presents significant knowledge and considerable effort in the field. I have provided a few comments that I believe will help you improve your manuscript in terms of clarity and completeness and give greater impact to your study.

I hope you find these comments helpful in improving your manuscript.

I look forward to seeing the revised version and wish you the best in your research endeavors.

Best regards

*******

Summary: The study is entitled "Anemia Classification System Using Machine Learning", by Jorge Gómez Gómez, Camilo Parra Urueta, Daniel Salas Álvarez, Velssy Hernández Riaño and Gustavo Ramirez-Gonzalez.

The study developed a system using blood count data and supervised learning algorithms to predict the type of anemia. Anemia, characterized by low red blood cell levels, impairs oxygenation and often leads to symptoms like fatigue and shortness of breath. The system classifies anemia into three types: microcytic, normocytic, and macrocytic, based on metrics like mean corpuscular volume. The Random Forest model achieved an accuracy of 99.82%, surpassing previous studies.

Comments – Suggestions

The data definition should include the relevant link to the dataset you have chosen. Alternatively, you could include the dataset you used as a supplementary file.

Also, the variables (features) of the Kaggle dataset should be described in detail in a supplementary file.

The inclusion criteria for the dataset you are using should also be clearly stated.

You will make your study more comprehensive if you clearly describe the framework and workflow of your proposed model and accompany it with appropriate figures.

Comments on the Quality of English Language

The quality of English is satisfactory.

Author Response

After reviewing each of your comments in detail, we have responded to them. We appreciate the feedback that was provided.

Comments – Suggestions

The data definition should include the relevant link to the dataset you have chosen. Alternatively, you could include the dataset you used as a supplementary file.

R:/

This is the link to the dataset obtained from Kaggle: https://www.kaggle.com/datasets/biswaranjanrao/anemia-dataset?resource=download&select=anemia.csv

This is the link to the file used to train the model: https://github.com/jeliecergomez/Machine_Learning/blob/main/Anemia_Confirmate_Type.csv

Also, the variables (features) of the Kaggle dataset should be described in detail in a supplementary file.

R:/

This file describes the characteristics of the anemia dataset:

https://github.com/jeliecergomez/Machine_Learning/blob/main/About%20Dataset_Anemia.txt

The inclusion criteria for the dataset you are using should also be clearly stated.

R:/

HGB: Determines the presence of anemia based on threshold levels (e.g., <13.6 g/dL for men).
MCV: Differentiates anemia types (microcytic, normocytic, macrocytic) based on red blood cell size.
MCH and MCHC: Indicate the content and concentration of hemoglobin within cells, aiding in subclassification.

Once these metrics are understood, algorithms can set thresholds for classification.

You will make your study more comprehensive if you clearly describe the framework and workflow of your proposed model and accompany it with appropriate figures.

R:/

The framework and workflow of the proposed model are presented below. Figure 2 shows the main elements of the model's workflow.

Figure 2. Framework and workflow of the proposed model

An extended description of the framework is included in the following steps:

Dataset Collection and Preprocessing:

Source: Kaggle dataset with 1,421 instances.
Data Cleaning: Removal of incomplete or inconsistent records.
Normalization: Ensuring values are scaled to eliminate bias due to variable magnitudes.

Feature Selection and Engineering:

Parameters like HGB, MCV, MCHC, and RDW are selected based on their diagnostic significance.
Thresholds are defined for anemia classification (e.g., MCV for microcytic: <80).

Model Training:

Algorithms: Random Forest, Decision Trees, and Linear Discriminant Analysis.
Dataset split into training (80%) and testing (20%) sets.

Evaluation:

Metrics: Accuracy, precision, recall, F1 score, and confusion matrix.

Implementation:

Integration into a diagnostic system for clinical use.
Designed for regions with limited access to laboratory tests.
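The preprocessing and splitting steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record does not say which normalization was used, so z-score scaling is assumed here; the helper names, the fixed seed, and the sample rows are all hypothetical.

```python
import random
import statistics

FIELDS = ["HGB", "MCV", "MCHC", "RDW"]

def clean(records, fields):
    """Drop records that have a missing (None) value in any required field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

def standardize(records, fields):
    """Rescale each field to zero mean and unit variance (assumed z-score scaling)."""
    scaled = [dict(r) for r in records]
    for f in fields:
        values = [r[f] for r in records]
        mean = statistics.mean(values)
        std = statistics.pstdev(values) or 1.0  # guard against constant columns
        for row in scaled:
            row[f] = (row[f] - mean) / std
    return scaled

def split(records, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed, then return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical rows (HGB, MCV, MCHC, RDW); values are illustrative only.
rows = [
    (14.1, 88.0, 33.5, 13.0), (11.2, 72.4, 31.0, 16.2), (10.4, 90.1, 33.0, 14.1),
    (9.8, 108.3, 34.2, 15.5), (15.0, 91.2, 34.0, 12.8), (12.9, 79.0, 32.1, 14.9),
    (13.8, 85.5, 33.8, 13.4), (8.9, 68.7, 30.2, 17.8), (14.6, 93.0, 33.9, 12.5),
    (11.9, 101.4, 33.1, 15.0), (12.0, None, 32.0, 14.0),  # last row is incomplete
]
raw = [dict(zip(FIELDS, row)) for row in rows]

cleaned = clean(raw, FIELDS)                 # removes the incomplete record
scaled = standardize(cleaned, FIELDS)        # normalization step
train_set, test_set = split(scaled, 0.2)     # 80% training, 20% testing
print(len(cleaned), len(train_set), len(test_set))
```

The sketch follows the order of the listed steps, scaling before the split. In practice the scaling statistics are often computed on the training portion only, so that no information from the test records influences training.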

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The research paper focuses on developing a machine-learning system designed to classify various types of anemia using blood count data. It addresses the challenges of diagnosing anemia, particularly in low-resource areas, by leveraging supervised learning algorithms such as Random Forest, Decision Trees, and Linear Discriminant Analysis. Among these, the Random Forest model achieved the highest accuracy at 99.82%, surpassing previous methodologies in the field.

Recommendations to strengthen the research
1. The paper lacks a clear methodological proposal that outlines the development process. Including a block diagram or flow chart to illustrate how the proposal was conceived and implemented would provide clarity and reinforce the validity of the results.
2. The related works section should emphasize studies directly aligned with the methodological approach under development. The bibliometric analysis currently included is overly extensive. Synthesizing this information or removing it, if deemed unnecessary, would streamline the section.
3. The background section may not be essential. Readers interested in the topic are likely already equipped to understand the context without it. Its removal would focus attention on the core contributions of the paper.
4. Figure 3 adds little value to the work. Replacing it with pseudocode that demonstrates the logic and development of the proposed algorithm would better illustrate how the problem was addressed.
5. The presentation of Figure 4 requires improvement for better clarity and visual appeal.
6. A more detailed analysis of the results obtained should be provided, aligning it with the proposed methodology. This addition would enhance the reader's understanding of the implications and robustness of the findings.

Additional comments
1. The choice of the Kaggle dataset over other potential datasets should be justified, particularly regarding its representativeness. While Colombian Ministry of Health data on this topic is not freely available, a discussion on the dataset’s relevance and limitations would strengthen the study's credibility.
2. Exploring other techniques to validate the model's performance is recommended to avoid potential overestimations. This would ensure the robustness and reliability of the results.
3. The discussion section could be expanded to address the ethical and practical challenges of implementing AI in healthcare, particularly in underserved regions.
4. Potential biases in the dataset and their impact on clinical decision-making should be discussed. Addressing these limitations would provide a more balanced perspective and enhance the study’s applicability in real-world scenarios.

Final decision: Accepted with corrections.

Comments on the Quality of English Language

Revise the English grammar and cohesion between paragraphs.

Author Response

After reviewing each of your comments in detail, we have responded to them. We appreciate the feedback that was provided.

R:/

4.1. Framework and workflow

The framework and workflow of the proposed model are presented below. Figure 2 shows the main elements of the model's workflow.

Figure 2. Framework and workflow of the proposed model

An extended description of the framework is included in the following steps:

Dataset Collection and Preprocessing:

Source: Kaggle dataset with 1,421 instances.
Data Cleaning: Removal of incomplete or inconsistent records.
Normalization: Ensuring values are scaled to eliminate bias due to variable magnitudes.

Feature Selection and Engineering:

Parameters like HGB, MCV, MCHC, and RDW are selected based on their diagnostic significance.
Thresholds are defined for anemia classification (e.g., MCV for microcytic: <80).

Model Training:

Algorithms: Random Forest, Decision Trees, and Linear Discriminant Analysis.
Dataset split into training (80%) and testing (20%) sets.

Evaluation:

Metrics: Accuracy, precision, recall, F1 score, and confusion matrix.

Implementation:

Integration into a diagnostic system for clinical use.
Designed for regions with limited access to laboratory tests.

The related works section should emphasize studies directly aligned with the methodological approach under development. The bibliometric analysis currently included is overly extensive. Synthesizing this information or removing it, if deemed unnecessary, would streamline the section.

R:/

We acknowledge that the analysis is somewhat extensive; however, we wanted to identify the most recent work similar to ours in order to compare the performance of our model with other related works.

The background section may not be essential. Readers interested in the topic are likely already equipped to understand the context without it. Its removal would focus attention on the core contributions of the paper.

R:/

Section removed

Figure 3 adds little value to the work. Replacing it with pseudocode that demonstrates the logic and development of the proposed algorithm would better illustrate how the problem was addressed.

R:/

Figure 3. Confusion matrix.

Figure removed

The presentation of Figure 4 requires improvement for better clarity and visual appeal.

R:/

Figure 4. Performance of supervised learning algorithms

Figure changed

A more detailed analysis of the results obtained should be provided, aligning it with the proposed methodology. This addition would enhance the reader's understanding of the implications and robustness of the findings.

R:/

Given the class imbalance in the dataset, the model can be affected by:

Prediction bias: Models can favor dominant classes.

Reduced performance metrics: Precision, recall, and F1 scores can be poor for minority classes.

Overfitting to common classes: Models can have difficulty generalizing minority class patterns.

On the other hand, comparing metrics with related works showed that the Random Forest algorithm, given its high precision, is far superior, including when compared within this same study with other ML algorithms such as Linear Discriminant and Decision Trees. This builds confidence among medical personnel in the diagnosis of anemia types and will allow rapid diagnoses through the developed system, especially in rural areas where medical care is very poor.

R:/

The choice of the Kaggle dataset was justified by its accessibility and relevance for machine learning research. However, the discussion should address potential biases owing to the lack of representation of the dataset for certain demographic groups or clinical settings, especially in underserved areas. A comparison with datasets such as those from local health ministries (e.g., the Colombian Ministry of Health) highlights the limitations and demonstrates the trade-offs involved in this selection. However, despite the existence of clinical laboratories, hospitals, and even the Ministry of Health, there is no access to this type of data, either because of privacy policies or a lack of culture in managing and storing this type of clinical data.

Exploring other techniques to validate the model's performance is recommended to avoid potential overestimations. This would ensure the robustness and reliability of the results.

R:/

Indeed, as can be seen in the comparative analysis, the Decision Tree, Linear Discriminant, and Random Forest models were used. SVM, Logistic Regression, Gaussian NB, and KNeighbors were also explored for this same work, but their performance was very low; therefore, only the best performers were presented.
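One widely used technique for the kind of validation the reviewer mentions is stratified k-fold cross-validation, in which every record is used for testing exactly once and each fold keeps roughly the same class proportions. The sketch below is illustrative only: the record does not state that the authors used it, and the function name, fold count, seed, and class sizes are assumptions.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's records round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test

# Hypothetical imbalanced labels: classes 0-3 with 50, 20, 20, and 10 records.
labels = [0] * 50 + [1] * 20 + [2] * 20 + [3] * 10
splits = list(stratified_k_fold(labels, k=5))
for train, test in splits:
    minority_in_test = sum(labels[i] == 3 for i in test)
    print(len(train), len(test), minority_in_test)
```

Averaging a metric over the folds gives an estimate that depends less on one particular 80/20 split, which matters most for the smallest classes.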

The discussion section could be expanded to address the ethical and practical challenges of implementing AI in healthcare, particularly in underserved regions.

R:/

One ethical consideration relates to biases in algorithmic predictions and their implications for resource-limited health care systems. Practical challenges include implementing such systems in rural areas with limited digital infrastructure, training healthcare professionals, and addressing patient privacy concerns when handling sensitive healthcare data.

Potential biases in the dataset and their impact on clinical decision-making should be discussed. Addressing these limitations would provide a more balanced perspective and enhance the study’s applicability in real-world scenarios.

R:/

We address biases in the dataset (e.g., unbalanced class representation) in the Results section; however, related issues such as regional limitations would still need to be addressed in conjunction with health authorities, which is currently unclear. These factors can affect the accuracy and impartiality of clinical decisions. Highlighting the impact of these biases on diagnosis, especially for minority or underrepresented patient groups, would strengthen the credibility of the research; however, it is a team effort that demands synergy and financial resources to implement.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

All comments and considerations for improvement were correctly addressed by the authors.

Author Response

Dear Reviewer:

Thank you for your thorough review and constructive comments. We have carefully revised the manuscript to address your concerns. A detailed account of the changes made is provided below:

Comment 1:

The authors now compare the results to previous studies of the same dataset. However, it still states in the abstract and the discussion that it is outperforming previous studies, but this is not true, since a previous study had better results. Hence this statement should be removed in both places.

Answer:

Abstract: Removed the phrase "outperforming previous studies" and replaced it with:

The random forest model achieved an accuracy of 99.82%, demonstrating strong performance in subclassifying anemia types compared to prior work focusing solely on binary classification.

Discussion: Deleted the sentence "Our system achieved an accuracy of 99.82% in prediction, outperforming the methods described above" and revised the comparison to clarify:

While Sabatini (2022) achieved 100% accuracy in binary classification (presence/absence of anemia) using the same dataset, our study extended this by subclassifying anemia into microcytic, normocytic, and macrocytic types with 99.82% accuracy, demonstrating the feasibility of multiclass classification.

Comment 2: Note that this study does consider subclassification, but this is just algorithmically determined, not clinically, and hence not really an improvement (the Random Forest will just learn these thresholds that are put in). One should be also clear about this in the discussion.

Answer:

Discussion: Added the following clarification:

Our subclassification was based on predefined clinical thresholds (e.g., MCV <80 for microcytic anemia) derived from the medical literature. The Random Forest algorithm learns these thresholds algorithmically, which speeds up classification, but does not represent a clinical breakthrough. In future studies, these thresholds will be validated using dynamic clinical criteria.
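The point that a tree-based model recovers a predefined threshold can be illustrated with a one-split decision tree (a stump). This is a simplified sketch, not the authors' Random Forest: it scores candidate cut points by misclassification count rather than the impurity measures tree libraries typically use, and the function name and MCV values are hypothetical. Only the MCV < 80 rule comes from the record.

```python
def best_threshold(values, labels):
    """Return (t, errors) for the cut point t that minimizes the errors of
    the rule 'value < t means positive', trying midpoints between values."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_t, best_errors = None, len(values) + 1
    for t in candidates:
        errors = sum((v < t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# Hypothetical MCV values (fL), labeled with the predefined rule MCV < 80.
mcv = [62.0, 70.5, 75.0, 79.0, 81.0, 84.5, 90.0, 95.0]
is_microcytic = [v < 80 for v in mcv]

threshold, errors = best_threshold(mcv, is_microcytic)
print(threshold, errors)
```

Because the labels were generated by the rule itself, the stump finds a cut point between the two values that straddle 80 and makes no errors, which mirrors the reviewer's observation that the model learns the thresholds that were put in.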

Comment 3:

Table 1 titles should be in English, and the current study’s ML algorithms must be specified.

Answer:

Table 1:

Translated column headers to English:

"Paper" → "Study"
"Años" → "Year"
"Ours" → "Current Study"

Updated the "AI/ML Techniques Used" for the current study to:
"Random Forest, Decision Trees, Linear Discriminant Analysis"

We appreciate your guidance in revising our manuscript. The revisions have ensured accuracy, transparency, and alignment with the standards of the journal. Please let us know if any further adjustments are required.

Author Response File: Author Response.docx
