Article

Early Childhood Anemia in Ghana: Prevalence and Predictors Using Machine Learning Techniques

1
Department of Mathematics & Statistics, International Islamic University Islamabad, Islamabad 44000, Pakistan
2
Department of Health Policy and Management, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
3
Department of Statistics, Lahore College for Women University, Lahore 54000, Pakistan
*
Author to whom correspondence should be addressed.
Children 2025, 12(7), 924; https://doi.org/10.3390/children12070924 (registering DOI)
Submission received: 18 March 2025 / Revised: 31 May 2025 / Accepted: 5 July 2025 / Published: 12 July 2025

Abstract

Background/Objectives: Early childhood anemia is a severe public health concern and the most common blood disorder worldwide, especially in emerging countries. This study examines the sources of childhood anemia in Ghana through various societal, parental, and child characteristics. Methods: This research used data from the 2022 Ghana Demographic and Health Survey (GDHS-2022), which comprised 9353 children. Using STATA 13 and R 4.4.2 software, we analyzed maternal, social, and child factors using a model-building procedure, logistic regression analysis, and machine learning (ML) algorithms. The analyses comprised machine learning methods including decision trees, K-nearest neighbor (KNN), logistic regression, and random forest (RF). We used discrimination and calibration parameters to evaluate the performance of each machine learning algorithm. Results: Key predictors of childhood anemia are the father’s education, socioeconomic status, iron intake during pregnancy, the mother’s education, and the baby’s postnatal checkup within two months. With accuracy (94.74%), sensitivity (82.5%), specificity (50.78%), and AUC (86.62%), the random forest model was proven to be the most effective machine learning predictive model. The logistic regression model appeared second with accuracy (67.35%), sensitivity (76.16%), specificity (56.05%), and AUC (72.47%). Conclusions: Machine learning can accurately predict childhood anemia based on child and paternal characteristics. Focused interventions to enhance maternal health, parental education, and family economic status could reduce the prevalence of early childhood anemia and improve long-term pediatric health in Ghana. Early intervention and identifying high-risk youngsters may be made easier with the application of machine learning techniques, which will eventually lead to a healthier generation in the future.

1. Introduction

Anemia is a significant global public health issue, particularly in emerging countries [1,2]. It primarily affects children under five and pregnant women due to micronutrient deficiencies of vitamin B12 and folate, as well as an inadequate intake of iron-rich foods [3,4]. About 39.8% of children aged 5 years or below and 36.5% of pregnant women are anemic worldwide [3,5]. In Africa, the prevalence of anemia is 67.6% among children aged 5 years or below [6]. According to WHO estimates, prevalence among children under five might reach 68% [7]. Ghana, a country in West Africa, has an anemia rate of 54.5% among children in early childhood, with notable differences between rural and urban areas (59.1% vs. 41.1%, respectively) [8]. According to WHO guidelines, anemia is considered a public health problem when the prevalence reaches 5% or more. The public health significance is mild between 5% and 19.9%, moderate between 20% and 39.9%, and severe at 40% or above [8]. The most prevalent type of anemia is iron deficiency anemia (IDA) [9]. IDA is estimated to account for almost half of all anemia cases and to cause one million deaths per year worldwide [9,10,11]. Nevertheless, accurately documented and biochemically evaluated estimates of IDA prevalence based on representative population-based samples remain scarce in the African region.
Anemia occurs when there is a shortage of red blood cells, a reduction in their size, or a decrease in the hemoglobin concentration below normal. This condition can make it more difficult for the blood to carry oxygen efficiently throughout the body [12]. The groups most affected are pregnant women and children under 5 years of age, who need more iron to support growth [13]. The primary causes of anemia in children under five years old are insufficient iron [14,15], vitamin B12 (cyanocobalamin), and vitamin B9 (folate), which increase the risk of death [16,17]. A low consumption of healthy food, the presence of worms, the mother’s history of anemia, intestinal problems, and a lack of awareness are additional risk factors [18]. Anemia and malnutrition have serious long-term consequences for children’s growth, development, and overall health [19,20].
In many African nations, anemia is mainly caused by parasite infection, which increases the breakdown of red blood cells [21,22]. According to recent research, over 49% of Ghanaian children between the ages of 6 and 59 months suffer from anemia, with mild anemia accounting for 27.6% and moderate to severe anemia accounting for 21.4%. This is consistent with results from the Ghana Demographic and Health Survey of 2022, which also showed a 49% prevalence in this age range [23]. While child anemia remains an uncontrolled health concern in undeveloped and underprivileged nations, it has a significant public health impact and high prevalence in low-resourced countries [24,25].
In addition to dietary deficiencies, childhood anemia arises from the interaction of several factors at different influencing levels, including maternal characteristics, child traits, and individual biological and social factors [26,27]. Children’s anemia rates are higher in areas where infectious illnesses like malaria and hookworm infections are common because they seriously affect iron intake and metabolism [26,27,28]. Chronic anemia results from genetic diseases impacting hemoglobin production and function [29]. Social determinants also affect anemia. For instance, anemia is more common among children from lower socioeconomic families, those with low levels of mother’s education, household income, and availability of medical treatment [27,30,31]. Other risk factors for childhood anemia include maternal age, nutritional status, and underlying medical problems (such as maternal anemia) [26,28,32]. Premature or underweight babies are more likely to suffer from anemia because their physiological systems are not fully mature, which can impact how well they absorb and use nutrients [30].
Applications of machine learning in the medical domain are growing. Two areas that benefit from the use of ML are illness diagnosis and outcome prediction [33,34]. While conventional statistical approaches try to identify relationships between variables, machine learning algorithms concentrate on generating accurate and practical predictions [35]. Unlike traditional statistical approaches, machine learning approaches are data-driven and free from strict assumptions [36]. In the healthcare industry, ML techniques can outperform conventional techniques in health prediction, although the choice between ML techniques and logistic regression depends on the specific problem, data, and goals. ML models often achieve higher accuracy than logistic regression, especially with large datasets.
There is a lack of comprehensive and updated research evidence concerning child-related, parental, and societal predictors of early childhood anemia in Ghana. Therefore, it is imperative to assess disparities among children in Ghana to identify the determinants of, and barriers to improving, child anemia rates. This study contributes to the literature on this topic by examining the predictors of child anemia in Ghanaian children using conventional logistic regression and machine learning algorithms. We pursue three research questions: (1) Which social determinants of health, operationalized through household sociodemographic characteristics, are associated with childhood anemia? (2) Which maternal characteristics are associated with childhood anemia? (3) Which child characteristics predict childhood anemia? The study’s findings provide research evidence for evidence-based public health practice and policy.

2. Materials and Methods

2.1. Dataset and Study Population

The Ghana Demographic and Health Survey (GDHS), a nationally representative household survey, provided the data for this study. The survey, which is conducted every five years, collects data on various demographic topics, including women’s and children’s health and nutritional status. The funding for the GDHS was provided by the Government of Ghana, the United States Agency for International Development (USAID), the U.S. President’s Malaria Initiative (PMI), the United Nations Population Fund (UNFPA), UNICEF, the World Bank, the Global Fund, the Korea International Cooperation Agency (KOICA), the United Kingdom Foreign, Commonwealth & Development Office (UK-FCDO), and the World Health Organization (WHO). ICF provided technical assistance through the DHS Program, a USAID-funded initiative that supports the implementation of demographic and health surveys in countries worldwide. The GDHS used a two-stage stratified cluster sampling approach. In the first stage, clusters were chosen as the primary sampling units. Households, the secondary sampling units, were selected in the second stage. From the households in the sample, children aged 6 to 60 months had their hemoglobin levels tested. Interviews were conducted with 7044 men aged 15–59 in half of the chosen households and 15,014 women aged 15–49 in 17,933 households, representing a nationally representative sample. The sample design for the 2022 GDHS provides estimates for the 16 regions of the nation, as well as for urban and rural areas. To address missing data, we employed multiple imputation. The analysis is predicated on established cases. For descriptive results, a weighted sample of 9353 children aged 6 to 60 months was analyzed in this research.
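The two-stage stratified cluster design described above can be illustrated with a small sketch. It is not the GDHS sampling procedure: the strata, cluster identifiers, household identifiers, and sample sizes are all hypothetical, and clusters are drawn with equal probability for simplicity, whereas a real survey would typically select clusters with probability proportional to size.

```python
import random

def two_stage_sample(strata, clusters_per_stratum, households_per_cluster, seed=0):
    """strata maps stratum -> {cluster_id: [household ids]} (hypothetical frame)."""
    rng = random.Random(seed)
    sample = []
    for stratum, clusters in strata.items():
        # Stage 1: choose clusters (primary sampling units) within the stratum.
        chosen = rng.sample(sorted(clusters), clusters_per_stratum)
        for cluster_id in chosen:
            # Stage 2: choose households (secondary sampling units) in the cluster.
            households = rng.sample(clusters[cluster_id], households_per_cluster)
            sample.extend((stratum, cluster_id, h) for h in households)
    return sample

# Toy sampling frame: 2 strata x 4 clusters x 10 households (all hypothetical).
frame = {
    s: {f"{s}-c{c}": [f"{s}-c{c}-h{h}" for h in range(10)] for c in range(4)}
    for s in ("urban", "rural")
}
selected = two_stage_sample(frame, clusters_per_stratum=2, households_per_cluster=3)
print(len(selected))  # 2 strata x 2 clusters x 3 households = 12
```

Stratifying first guarantees that both urban and rural areas contribute households, which is what allows separate estimates for each.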

2.2. Study Variables and Measurements

Childhood anemia status is the outcome variable. As part of the GDHS survey, blood samples were collected from children aged 6 to 60 months for anemia testing. Consent was obtained from their parents or other responsible adults. A microcuvette was used to collect blood samples. A portable battery-operated HemoCue analyzer (HemoCue, Ängelholm, Sweden) was used for hemoglobin analysis, and the findings were provided on the spot. Anemia was quantified as a binary outcome variable for this research. As a result, children with anemia were assigned a value of 1, and those without it were assigned a value of 0. If a child’s altitude-adjusted hemoglobin level was less than 11.0 g/dL, they were considered anemic [37].
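The binary coding of the outcome can be written as a one-line rule. The sketch below uses the 11.0 g/dL cut-off stated above; the function name and the example readings are hypothetical and serve only as an illustration.

```python
ANEMIA_CUTOFF_G_DL = 11.0  # cut-off for altitude-adjusted hemoglobin, as stated in the text

def anemia_status(hb_adjusted_g_dl: float) -> int:
    """Return 1 if the child is classified as anemic, 0 otherwise."""
    return 1 if hb_adjusted_g_dl < ANEMIA_CUTOFF_G_DL else 0

# Hypothetical altitude-adjusted hemoglobin readings (g/dL), for illustration only.
readings = [9.8, 11.0, 12.4, 10.9]
labels = [anemia_status(hb) for hb in readings]
print(labels)  # [1, 0, 0, 1]
```

A reading of exactly 11.0 g/dL is coded 0, because the rule uses a strict "less than" comparison.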
The twenty-seven predictor variables (features) used in this study were selected from prior research conducted in Ghana and other locations based on their association with childhood anemia. These are region (Western, Central, Greater Accra, Volta, Eastern, Ashanti, Western North, Ahafo, Bono, Bono East, Oti, Northern, Savannah, North East, Upper East, or Upper West), the place of residence (urban or rural), household members (<4, 4–6, 7–9, or >9), the source of drinking water (unimproved or improved), the sex of the household head (female or male), the father's education (no education, primary, secondary, or higher), socioeconomic status (poor, middle, or rich), the mother's education (no education, primary, secondary, or higher), maternal age (15–19, 20–24, 25–29, 30–34, 35–39, 40–44, or 45–49), maternal smoking (no or yes), whether the child was ever breastfed (no or yes), the initiation of breastfeeding (immediately, within first hour, or within first day), the mother's occupation (not working or working), the intake of iron during pregnancy (no or yes), the consumption of drugs for intestinal parasites during pregnancy (no or yes), birth order number (first born, 2–4, or ≥5), birth type (single or multiple birth), the sex of the child (male or female), the size of the child at birth (small, average, or large), formula milk consumption (no or yes), the child's age in months (0–6, 7–12, 13–24, 25–36, 37–48, or 49–60), stunting (no, moderate, or severe), wasting (no, moderate, or severe), underweight (no, moderate, or severe), the intake of fruits and vegetables (no or yes), baby postnatal checkup within 2 months (no or yes), and whether the child was given zinc (no or yes).

2.3. Statistical Analysis

SPSS version 20 and STATA version 13 were used for data cleaning and descriptive analyses, and R 4.4.1 was used to apply machine learning techniques to assess childhood anemia. Four machine learning methods, namely decision trees (DTs), random forest (RF), K-nearest neighbor (KNN), and logistic regression (LR), were employed to identify the most significant predictors of early childhood anemia in Ghana. The performance of these algorithms was then evaluated. The algorithms were chosen for three reasons. First, our dataset is moderate in size, and the chosen techniques can perform well on datasets of this size. Second, the algorithms strike a balance between model complexity and interpretability, which is vital for understanding the relationships between the predictors and the outcome variable. Third, they are computationally efficient and can be run on standard hardware, which supports reproducibility.
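To give a feel for one of the four classifiers, the following is a minimal K-nearest neighbor sketch. It is written in Python for illustration only and is not the R code used in the study. The Hamming distance, the value of k, the feature coding, and the toy rows are all assumptions.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two coded feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training rows."""
    ranked = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical coded rows: (residence, wealth, mother's education); label 1 = anemic.
train_x = [(1, 0, 0), (1, 0, 1), (0, 2, 3), (0, 1, 2), (1, 1, 0)]
train_y = [1, 1, 0, 0, 1]

print(knn_predict(train_x, train_y, (1, 0, 0)))  # 1
print(knn_predict(train_x, train_y, (0, 2, 2)))  # 0
```

Hamming distance is used here because the predictors are categorical; with scaled numeric features, Euclidean distance would be the more usual choice.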
The Machine Learning (ML) Specific Checklist was followed in planning, executing, and evaluating this study. We clearly defined the problem and determined the goal by specifying the questions we are trying to answer. At the first stage of data preparation, the dataset was checked for errors, inconsistencies, and missing values. The data were split into training and testing sets and then preprocessed for feature scaling and coding. At the model selection stage, we identified the problem type as classification and selected suitable models, including random forest, logistic regression, decision trees, and K-nearest neighbor. Model evaluation was based on appropriate metrics, such as the ROC curve, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score, and the models were interpreted using SHAP and partial dependence plots.
Discrimination and calibration techniques were used to assess the models’ prediction ability [38]. The degree of agreement between each model’s predicted probabilities of anemia and the observed anemia frequencies was measured using calibration plots. The models’ discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Each model’s accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were also obtained, and the most significant predictors of childhood anemia were determined from the most accurate algorithms.
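The classification measures named above all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and are not taken from the study's tables.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts (anemic = positive class)."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy, "sensitivity": sensitivity,
        "specificity": specificity, "ppv": ppv, "npv": npv, "f1": f1,
    }

# Hypothetical counts, for illustration only.
m = classification_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two, which is a useful sanity check when reading a results table.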

3. Results

Table 1 summarizes descriptive and univariate statistics on risk factors associated with childhood anemia. The table is divided into three categories: societal characteristics, parental characteristics, and child characteristics. The odds ratios (OR) and 95% confidence intervals (CI) for several traits associated with childhood anemia are also presented in this table. Around 93% of households had more than nine members. Approximately 59% of households resided in rural areas. A substantial proportion of households (41%) obtained their drinking water from unimproved sources. About three-fourths (73%) of households were headed by males. Approximately 65% of fathers had some formal education. About 56% of the children were of poor socioeconomic status. Approximately 16% of mothers had completed only primary education. Approximately 24% of the mothers belonged to the 20–29 age group. The prevalence of smoking among women was less than 1%. About half (54%) of the women took intestinal parasite medication during pregnancy, and most (91%) took iron during pregnancy. More than half (55.15%) of children were breastfed, and 64% were nursed within the first hour of birth. Most mothers (83%) were employed.
The explanatory factors that are substantially connected with anemia status in the country of Ghana include region, the place of residence, the sex of the household head, the father’s education, socioeconomic level, the mother’s education, maternal age, whether the child was ever breastfed, the mother's occupation, the intake of iron during pregnancy, taking medicines to treat intestinal parasites when pregnant, birth order number, the sex of the child, the child’s age in months, the intake of fruits and vegetables, stunting, and infant postnatal check within 2 months; these had a p value less than 0.05. Children in the Northern region are more likely to be anemic (OR = 1.996; 95% CI, 1.646 to 2.345). Children residing in rural areas had a higher risk of being anemic (OR = 1.516; 95% CI, 1.335 to 1.722) compared to children from urban families. The risks of childhood anemia are substantially reduced in households headed by women (OR = 0.831; 95% CI, 0.722 to 0.956). The children of fathers with higher formal education have lower chances of developing anemia (OR = 0.485; 95% CI, 0.390 to 0.603) compared to the children whose fathers are uneducated.
Children from a higher socioeconomic status had a lower chance of being anemic (OR = 0.449; 95% CI, 0.386 to 0.522) compared to those from a lower socioeconomic status. A child with a highly educated mother was less likely to be anemic (OR = 0.400; 95% CI, 0.309 to 0.519) than a child with an uneducated mother. Compared to children whose mothers are between the ages of 15 and 19, children whose mothers are between the ages of 35 and 39 are less likely to become anemic (OR = 0.541; 95% CI, 0.362 to 0.809). Children of working mothers were less likely to develop anemia (OR = 0.756; 95% CI, 0.632 to 0.904) compared to those of non-working mothers. Children whose mothers took iron during pregnancy have a lower chance of developing anemia (OR = 0.575; 95% CI, 0.410 to 0.807) compared to children whose mothers did not. The prevalence of anemia is lower in children whose mothers took drugs for intestinal parasites (OR = 0.680; 95% CI, 0.568 to 0.814). A child with a birth order of five or higher has a greater chance of developing anemia (OR = 1.476; 95% CI, 0.999 to 1.355) compared to the first child. Female children have a lower chance of developing anemia (OR = 0.854; 95% CI, 0.753 to 0.968). Children aged 49 to 60 months are less likely to be anemic (OR = 0.342; 95% CI, 0.217 to 0.539) than those aged 0 to 6 months. A child with moderate stunting is less likely to develop anemia than a child suffering from severe stunting (OR = 1.423; 95% CI, 0.301 to 1.594). A child who consumes fruits and vegetables is less likely to become anemic (OR = 0.268; 95% CI, 0.056 to 1.523) than a child who does not. A child who has a postnatal checkup within two months is less likely to develop anemia (OR = 0.573; 95% CI, 0.359 to 0.915).
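For readers less familiar with how an unadjusted odds ratio and its confidence interval arise, the sketch below computes both from a 2×2 table using the common log-scale (Woolf) approximation. This is an illustrative assumption about the method, not a statement of how the estimates in Table 1 were produced, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald-type 95% CI from a 2x2 table.

    a: exposed and anemic      b: exposed and not anemic
    c: unexposed and anemic    d: unexposed and not anemic
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_hat, lower, upper = odds_ratio_ci(a=60, b=40, c=45, d=55)
print(f"OR = {or_hat:.3f}; 95% CI, {lower:.3f} to {upper:.3f}")
```

An interval built this way is symmetric around the estimate on the log scale, so the point estimate always lies inside it, and an interval that excludes 1 corresponds to p < 0.05.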
Table 2 shows the multivariate analysis of anemia status for children aged 5 years or below in Ghana. The father’s education, socioeconomic status, mother’s education, baby’s postnatal checkup within 2 months, and iron intake during pregnancy proved to be significant characteristics in Ghana (p < 0.05). The results show that the father’s education is the most important variable associated with childhood anemia in Ghana. Compared to children of fathers with no formal education, the odds of anemia are lower for children whose fathers had secondary education (AOR = 0.652; 95% CI: 0.288 to 2.126) or a higher level of education (AOR = 0.156; 95% CI: 0.182 to 5.457). Middle-class households were less likely to have anemic children (AOR = 0.010; 95% CI: 0.0001 to 0.568) than low-income households. Additionally, the children of highly educated mothers (AOR = 0.002; 95% CI: 0.0009 to 0.529) are less likely to be anemic than those of mothers with only an elementary or secondary education. One of the most critical factors that might affect children’s anemia status is the timing of breastfeeding initiation. Children fed immediately after birth are less likely to develop anemia (AOR = 3.445; 95% CI: 1.540 to 7.313) than children fed within the first hour. Additionally, iron consumption during pregnancy is significantly associated with childhood anemia. Compared to children whose mothers did not take iron during pregnancy, children whose mothers did have a lower risk of developing anemia (AOR = 0.017; 95% CI: 0.004 to 6.122). Childhood anemia is also associated with the baby’s postnatal examination within two months. Children who receive a postnatal checkup within two months are less likely to be anemic (AOR = 0.732; 95% CI: 3.452 to 8.076) than children who do not.
Figure 1 illustrates the predictive quality of the logistic regression model. The area under the receiver operating characteristic (ROC) curve is 64.60% for the logistic regression model. Moreover, the ROC curve lies above the diagonal, toward the upper left corner, which indicates that the model discriminates better than chance.
Table 3 displays the performance of the machine learning algorithms built on training data. After training, the four machine learning algorithms were used to predict childhood anemia, as shown in Table 4. We used a confusion matrix to evaluate the prediction performance of the algorithms. According to the confusion matrix, the random forest model correctly predicted that 1387 children would have anemia and 280 would not. It incorrectly predicted that 772 children would not have anemia and that 366 would. With an AUC of 98% (95% CI: 99.55–99.93), a sensitivity of 98% (95% CI: 99.35–99.93), a specificity of 67% (95% CI: 99.29–99.95), a classification accuracy of 95% (95% CI: 99.48–99.89), and an F1 score of 98.20%, the random forest model outperformed the other models in predicting childhood anemia. The second-best algorithm for predicting anemia in children was logistic regression. The K-nearest neighbor algorithm provided the lowest accuracy among all the algorithms used. Given its capacity to manage complex feature interactions and limit overfitting, random forest appears to be a good fit for this problem.
Additionally, the receiver operating characteristic (ROC) curves for each algorithm are shown in Figure 2. The area under the curve (AUC) is a measure of the overall performance of the classifier. The random forest model has the largest AUC value (98%) among the four machine learning models used in this investigation, which suggests that it is the most effective at discriminating between children with and without anemia.
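The AUC has a direct probabilistic reading: it is the probability that a randomly chosen anemic child receives a higher predicted score than a randomly chosen non-anemic child. The sketch below computes it by comparing every such pair. The scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and true labels (1 = anemic).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 4))  # 7 of 9 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.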
To illustrate the accuracy of the four models, we created calibration plots for each of the four ML models shown in Figure 3. The calibration figure shows the degree of agreement between the observed anemia frequency (Y-axis) and the average predicted probability of anemia predicted by the models (X-axis). The graphical representation in Figure 3a demonstrates that the calibration by the random forest algorithm is good, as the mean predicted probability of childhood anemia is comparable to the actual childhood anemia rate across the entire dataset distribution. The logistic regression algorithm appeared to be the second-best fit. The measured frequency differs significantly from the predictions made by the other models. A p value of 0.98 for the Hosmer–Lemeshow test and 0.72 for the Pearson chi-square goodness-of-fit test indicated that the random forest model accurately describes the data.
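The points on a calibration plot can be obtained by grouping children by predicted probability and comparing the mean prediction with the observed anemia frequency in each group. The sketch below uses equal-width bins. The number of bins and the data are assumptions for illustration.

```python
def calibration_table(probs, outcomes, n_bins=4):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table

# Hypothetical predicted probabilities and observed outcomes (1 = anemic).
probs = [0.1, 0.15, 0.3, 0.35, 0.55, 0.6, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
table = calibration_table(probs, outcomes)
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={count}")
```

For a well-calibrated model, the pairs lie close to the 45-degree line, meaning predicted and observed values agree within each bin.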
Figure 4a shows the most important features of the random forest model. The features are ranked by their importance scores, with the most important features at the top. The importance score measures the contribution of each feature to the model’s prediction accuracy. The importance scores are displayed on a scale from zero to one. Region has the highest importance score, indicating that it makes the largest contribution to the model’s prediction accuracy. The child’s age in months has the second-highest importance score, suggesting that it plays a substantial role in model prediction. Maternal age ranks third. The size of the child at birth is slightly less important, but it still makes a notable contribution to the model. Maternal smoking has the lowest importance score, suggesting it has the least impact on the model outcomes.
Figure 4b displays the variable importance scores for the logistic regression algorithm. The most important variables include postnatal checkup, birth type, maternal smoking, the child's age, whether the child was given zinc, stunting, and the child’s sex. In contrast, the sex of the household head, birth size, place of residence, and region are the least important variables.
Figure 5 illustrates the SHAP (Shapley Additive exPlanations) technique, which provides a global interpretation of each variable’s contribution to the model’s predictive performance. Higher absolute SHAP values indicate a greater influence on the model’s output. The y-axis presents variables related to societal, parental, and child characteristics, while the x-axis displays the corresponding SHAP values, highlighting the impact of each feature on the model’s predictions. Feature values are color-coded: yellow represents lower values, and purple represents higher ones. The most influential feature is child age, with a SHAP value of 0.096 indicating a strong positive effect on the model’s forecast. Intake of zinc also shows a high impact on the model output. The region also shows significant effects, suggesting that geographic factors play a role. Parental education, both maternal and paternal, shows a moderate impact (0.040 and 0.033), underscoring the role of educational attainment in improving child health outcomes. Economic status and maternal age demonstrate slightly lower influence (0.032 and 0.035), suggesting that both maternal factors and household economic conditions contribute to predictive accuracy. Additional significant demographic variables include birth order, breastfeeding history, and stunting, each of which may affect child health outcomes. In contrast, variables such as postnatal checkups and maternal smoking exhibit minimal influence, implying limited predictive utility. Overall, features with SHAP values close to zero have negligible effects, whereas those with values farther from zero exert a more substantial positive or negative impact on the model’s predictions.
Figure 6a–h present the partial dependence plots (PDPs) of the top features identified by importance score in the random forest model. Each subplot illustrates the relationship between a specific feature (x-axis) and the predicted outcome (y-axis, labeled as “yhat”). Figure 6a shows a general downward trend in the values of yhat. In Figure 6b, a clear positive linear trend is observed, indicating that as child age increases, the predicted value also rises. This suggests that older children are associated with a higher predicted risk of childhood anemia. Figure 6c displays a more complex pattern: the predicted value (yhat) begins at approximately −0.20 when maternal age is 2, declines initially, and then gradually increases, surpassing −0.18 around maternal age 6. This trend reveals an inverse association at lower maternal ages, followed by a positive relationship at higher ages. These nuanced trends could help in understanding maternal health and infant development. Figure 6d demonstrates a two-phase relationship between birth size and predicted childhood anemia. In the first segment (up to a birth size of 1.5), a downward trend is evident, indicating that smaller birth sizes are linked to lower predicted values, which may reflect a higher risk of anemia. Beyond this point, at 2.0, the trend reverses sharply upward, indicating that larger birth sizes are associated with more favorable health outcomes for anemia, which could imply that various biological, health, and environmental factors are at play. In Figure 6e, a direct, positive association is evident between paternal education and predicted outcomes. Higher levels of fathers’ education correspond to increased predicted values, suggesting a beneficial effect on childhood anemia.
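The idea behind SHAP values can be shown on a toy model small enough to enumerate every feature coalition. The sketch below is not the SHAP software used for Figure 5: it computes exact Shapley values by brute force, replaces "absent" features with values from a small background sample, and uses a hypothetical three-feature linear model.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x, averaging over background rows
    to fill in the features that are absent from a coalition."""
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                contribution += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(contribution)
    return phi

# Hypothetical model and data, for illustration only.
def toy_model(v):
    return 0.1 + 0.3 * v[0] - 0.2 * v[1] + 0.0 * v[2]

background = [[0, 0, 0], [1, 1, 1]]
phi = shapley_values(toy_model, [1, 0, 1], background)
print([round(p, 3) for p in phi])
```

The values sum to the difference between the prediction for the row and the average prediction over the background sample, and a feature the model ignores receives a value of zero. Averaging the absolute values over many rows gives the kind of global ranking shown in a SHAP summary plot.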
A similar pattern is seen in Figure 6f, where improved maternal education levels also show a positive impact on predicted outcomes, further emphasizing the importance of parental education in influencing childhood anemia. Figure 6g reveals a negative relationship between birth order and the predicted outcome. As birth order increases, the predicted value gradually declines, indicating that children born later in birth order may experience comparatively lower health outcomes related to anemia. Finally, Figure 6h shows that, although predicted values remain negative across the range, a positive linear trend is observed as breastfeeding initiation increases. This suggests an inverse relationship between breastfeeding and anemia status—proper breastfeeding initiation corresponds with a lower likelihood of anemia.
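A partial dependence curve of the kind described above is obtained by fixing one feature at each value of a grid for every row in the data, predicting, and averaging. The sketch below shows that computation. The model, its coefficients, and the rows are hypothetical and unrelated to the fitted random forest.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is set to each grid value for all rows."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            predictions.append(predict(modified))
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

# Hypothetical model: feature 0 lowers the prediction, feature 1 raises it.
def toy_model(v):
    return 0.6 - 0.1 * v[0] + 0.05 * v[1]

rows = [[0, 1], [1, 3], [2, 2]]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0, 1, 2])
for value, yhat in curve:
    print(value, round(yhat, 3))
```

The slope of the resulting curve shows the direction of the average association between the feature and the prediction, with the other features held at their observed values.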

4. Discussion

Our study is the first of its kind, conducted on a nationwide dataset of 9353 children in Ghana, addressing a specific gap in the application of machine learning techniques for assessing anemia status and its key predictors throughout the country. We believe our research contributes to the existing body of knowledge by conducting a multivariate modeling study that incorporates various machine learning techniques, which has not previously been performed in Ghana. In low-resourced countries such as Ghana, where most of the children are malnourished and anemic, accurate assessment of the anemia status of children through machine learning algorithms can play a vital role in reducing the disease burden. The current study uses a quantitative cross-sectional design based on data from the GDHS to examine the prevalence of anemia and the factors associated with it among children aged five years and under in Ghana. We fitted a binary logistic regression model alongside machine learning techniques, including random forest, KNN, and decision trees. Findings show that more than half (54.20%) of children aged five years and under were anemic.
Among the factors evaluated, paternal education, socioeconomic status, iron intake during pregnancy, and the baby’s postnatal checkup within two months of birth emerged as the significant predictors of a child’s anemia status. Overall, paternal education played a strong protective role in a child’s anemic status, implying that supporting paternal education and improving societal behavior can contribute to improvements in the anemic status of young children. There should be a supportive environment for parents to prioritize their children’s health and adopt healthy feeding practices. In Ghana, there is a need for focused programs to educate parents, raise their awareness about anemia prevention and treatment, and encourage the adoption of healthy behaviors. The government should improve access to healthcare services, particularly in rural or underserved areas.
Our results indicate that children of educated mothers are less likely to be anemic than children of mothers with no formal education. Educated mothers tend to be more aware of health-related information and are better prepared to protect their children against the risk of malnutrition and anemia [39]. Our findings align with numerous previous studies showing that educated mothers are significantly less likely to have anemic children [25,33,34,35]. However, a significant number of children of mothers with good nutritional knowledge suffered from anemia, demonstrating that although education is essential, other variables also come into play [40]. Maternal education influences nutritional behaviors and food choices. More highly educated mothers often provide their children with better nutrition, including nutritious food and iron-rich supplements, which are essential for preventing iron deficiency anemia [41,42,43]. Our findings also show a significant association between the father’s education and early childhood anemia. A child whose father has received primary and secondary education has a lower chance of developing anemia. These findings suggest that higher paternal education may help reduce the risk of iron deficiency in children. Our study found that children from middle-income and affluent households were less likely to be anemic. This is consistent with other studies showing that families with higher incomes are in a better position to offer nutritious foods and access to medical care, both of which are essential for avoiding anemia [44]. More affluent households may be able to afford a more diverse diet that includes foods high in iron [45,46]. Conversely, low-resource families may rely on less bioavailable iron sources, such as those found in plant-based diets, raising the risk of anemia.
The risk of anemia in children is lower when mothers consume iron during pregnancy. This finding is supported by a previous study [47], which reports that such children are less likely to suffer from anemia. The results highlight that regular supplementation might reduce the chances of low birth weight and maternal anemia, both of which are connected to a higher incidence of childhood anemia. Sunuwar et al. showed that one of the most critical factors in predicting anemia in children aged 6 to 59 months was the mother’s compliance with iron supplementation [26]. Children whose mothers took iron supplements had a considerably lower risk of being anemic than children whose mothers did not. This implies that, to ensure children have sufficient iron throughout their early development, mothers must regularly consume adequate amounts of iron. Previous research has shown that infants born to mothers who maintained adequate hemoglobin levels during pregnancy have better health outcomes, including lower rates of anemia. The evidence supports the notion that maternal nutrition, particularly iron intake, plays a critical role in children’s long-term health [46].
The baby’s postnatal examination, conducted within two months after birth, is another significant predictor of anemia status. Our study revealed that early childhood anemia could be reduced if newborns received postnatal care. The American Academy of Pediatrics emphasizes the importance of screening children, particularly those at risk, for anemia. Frequent postnatal examinations enable early identification of anemia and timely treatment, both of which are essential for enhancing health outcomes [47].
The random forest algorithm achieved the highest prediction accuracy among the machine learning algorithms evaluated, 94.74% (95% CI: 90.58–95.84%). In this model, the child’s age, maternal age, father’s education, birth size, mother’s education, and birth order had the highest importance scores. The second-best machine learning technique for predicting childhood anemia was logistic regression, with an accuracy of 67.35% (95% CI: 66.20–68.49%). According to Zemariam et al., the best machine learning method for predicting anemia in young Ethiopian girls is the random forest algorithm [48]. Other studies have demonstrated stronger predictive ability of the random forest algorithm than other classifiers, with accuracy ranging from 82% to 98.4% [49]. The random forest model has been shown to detect significant predictors, including the mother’s health, household circumstances, and child morbidity [50]. According to other studies conducted in Bangladesh [32] and in Afghanistan [51], the random forest method performs best when compared to other machine learning algorithms. However, an earlier study reported that the naïve Bayes model outperformed KNN and support vector machine models, with an accuracy of 99% [52]. The child’s age is a significant predictor of anemia, particularly if the child is between 6 and 23 months old. Studies indicate that this age group is more vulnerable to iron deficiency and anemia, making age an essential consideration in predictive models [53].
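The accuracy intervals quoted above are confidence intervals for a proportion. As a minimal sketch of one common way to compute such an interval, the Python example below uses the Wilson score interval. The source does not state which interval method or test-set size was used, so the function name `wilson_interval` and the counts in the example are hypothetical and are not intended to reproduce the reported values.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion such as classification accuracy.

    `correct` and `total` are counts of correct and of all predictions;
    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical example: 670 correct predictions out of 1000 held-out cases.
low, high = wilson_interval(670, 1000)
print(f"accuracy = {670 / 1000:.2%}, 95% CI: {low:.2%} to {high:.2%}")
```

The interval narrows as the number of evaluated cases grows, which is one reason the size of the test set matters when comparing reported accuracies across models.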
Several factors predicted childhood anemia. However, the logistic regression and random forest models differ in the importance they assign to these factors. The random forest analysis reveals that the following variables are the most significant predictors of anemia in Ghanaian children: region, the child’s age in months, maternal age, the father’s education, birth size, the mother’s education, birth order number, and socioeconomic status. In Ghanaian studies on childhood anemia, region plays a significant role. The Upper East and Upper West regions have consistently had the highest prevalence of anemia among children under five, with rates as high as 88.9% and 88.1%, respectively, in previous surveys. New data continue to identify these regions as high-risk areas [54,55]. Child age in months also has a high importance score, which is consistent with various studies. According to research examining data from Ethiopia, the prevalence of anemia was greater in children aged 6–23 months (72.0%) than in older children (50.1%). Low birth weight is a known risk factor for childhood anemia. This study emphasized the significance of birth size as a predictor by showing that children born with low birth weight had a greater prevalence of anemia. Maternal age also plays a crucial role: children born to mothers under the age of twenty or over the age of forty have a higher incidence of anemia [53]. Research from India showed that children between the ages of 6 and 24 months were much more likely to be anemic than those between the ages of 36 and 59 months. Children from households with lower levels of education are more likely to suffer from anemia because they have less access to nutrient-dense diets [56]. Although their relative importance varies, both logistic regression and random forest consistently identify the child’s age in months and birth order number as significant factors.
Nevertheless, the two algorithms disagreed on some predictors: region received considerable importance in the RF method, whereas the postnatal checkup received great importance in logistic regression. Furthermore, maternal smoking was the least significant predictor of childhood anemia in random forest but the third most significant predictor in logistic regression.
In logistic regression, the postnatal checkup shows a high importance score, indicating that postnatal checkups are essential for newborn children. According to WHO guidelines, receiving thorough care at this time is crucial to avoiding problems [57]. Because of their faster growth and greater need for minerals like iron, younger children, especially those under 24 months, are more likely to develop anemia [58,59]. Zinc also has a high importance score and is essential for general health and immunological function. Zinc supplementation has been linked to improved hemoglobin levels in children, illustrating the significance of this nutrient in dietary treatments meant to prevent anemia [60]. Consumption of intestinal health drugs during pregnancy may also influence childhood anemia. According to research, maternal health interventions, including the use of such drugs, are essential for avoiding anemia in children [8].

Strengths and Limitations of the Study

A key strength of this study lies in its use of the most recent wave of nationally representative data from the Ghana Demographic and Health Survey (GDHS-2022), which provides over 27 predictors covering a broad range of societal, parental, and child-level variables relevant to anemia among children up to five years of age. Previous anemia research conducted in Ghana used earlier DHS waves (2014–2016) and traditional logistic regression with a limited set of covariates [7,8,54]. In comparison, our analysis of GDHS-2022 (n = 9353) includes a comprehensive set of predictors at the household, parental, and child levels, evaluates four machine learning methods, and employs SHAP to demonstrate that region remains a significant factor even though multivariable logistic regression diminishes its importance. By comparing the predictive performance of these models, the study identifies substantial determinants of childhood anemia within the population. This innovative integration of contemporary data and interpretable machine learning differentiates our research.
The performance of each model was assessed using a range of evaluation metrics. Interestingly, while univariate logistic regression identified region as a significant factor associated with anemia, this variable lost significance in the multivariate model. However, the feature importance scores and SHAP (Shapley Additive exPlanations) plots of the best-performing machine learning model, the random forest, reaffirmed the importance of region. Both plots indicated considerable regional disparities in anemia prevalence. This finding underscores the need for region-specific intervention strategies.
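SHAP attributes a model’s prediction to individual features using Shapley values from cooperative game theory. As a rough illustration of the underlying idea, the Python sketch below computes exact Shapley values for a toy three-feature scoring function by enumerating all feature coalitions. The function `toy_score`, its coefficients, and the feature names are hypothetical and are not the random forest fitted in this study; practical SHAP implementations rely on approximations or model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition are held at their baseline value.
    The cost grows as 2**n_features, so this suits toy examples only.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical risk score with an interaction between two features.
def toy_score(x):
    return (0.2 * x["high_risk_region"]
            + 0.1 * x["age_under_24m"]
            + 0.1 * x["high_risk_region"] * x["age_under_24m"]
            - 0.05 * x["mother_educated"])


instance = {"high_risk_region": 1, "age_under_24m": 1, "mother_educated": 1}
baseline = {"high_risk_region": 0, "age_under_24m": 0, "mother_educated": 0}
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

In this toy case the interaction term is shared equally between the two interacting features, and the attributions sum to the difference between the score at the instance and at the baseline. This additivity is the property that SHAP summary plots build on.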
Moreover, our results suggested that parental characteristics, particularly educational attainment and maternal health behaviors (including iron supplementation during pregnancy), play a more prominent role in predicting anemia than child-related factors. Among child-specific variables, only postnatal checkup within two months emerged as a significant predictor, emphasizing its value for early detection and intervention. These insights contribute to a deeper understanding of the determinants of childhood anemia in Ghana and illustrate the practical value of machine learning approaches for informing targeted public health strategies in similar contexts.
This study has several limitations, which may have influenced the generalizability of the results. First, the use of secondary data limited our choice of variables. For example, various clinical indicators of anemia, including pallor of the palms, conjunctiva, and tongue, were absent from the data. Their inclusion could have improved the subpar performance of the machine learning algorithms employed in this study. Future research could usefully apply machine learning algorithms to a richer set of variables, including such clinically significant features, to forecast pediatric anemia. Moreover, information about fever, diarrhea, and cough during the two weeks preceding the study was obtained through interviews with mothers. Such self-reported data, which were not independently verified, could have been prone to several biases, including social desirability bias and recall bias. Another limitation is that no random seed was set, which affects reproducibility. Without a fixed seed, results can vary between runs, making it difficult to reproduce the findings and to compare the models exactly. We will address this issue in future work to ensure reproducibility. Despite these limitations, the study provides valuable evidence relevant to policy and practice. It utilizes nationally representative data collected using instruments that have been tested for validity and reliability, supporting its generalizability and applicability to public health and healthcare interventions. Moreover, this research is innovative in that we employed several machine learning techniques to identify predictors of childhood anemia that would otherwise have remained unreported if only simple statistical analysis had been applied.
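The reproducibility issue described above comes down to fixing the seed of the pseudo-random number generator before any random step, such as a train/test split. In R, `set.seed()` plays this role. The Python sketch below only illustrates the principle: the function name `split_indices`, the 70/30 split ratio, and the seed value are assumptions for illustration and do not describe the procedure used in this study.

```python
import random


def split_indices(n_rows: int, test_fraction: float = 0.3, seed: int = 2022):
    """Return (train, test) row indices using a dedicated, seeded generator."""
    rng = random.Random(seed)      # local generator, independent of global state
    indices = list(range(n_rows))
    rng.shuffle(indices)           # the only random step in the split
    n_test = int(round(n_rows * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Two runs with the same seed give the same partition of the 9353 records.
train_a, test_a = split_indices(9353)
train_b, test_b = split_indices(9353)
print(test_a == test_b)
print(len(train_a), len(test_a))
```

Every model compared in an analysis can then be trained and evaluated on exactly the same partition, which makes the reported metrics repeatable and directly comparable.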

5. Conclusions

The high prevalence of childhood anemia in Ghana and its adverse effects on cognitive development, physical growth, and general well-being make it a significant public health problem. According to current data from the 2022 Ghana Demographic and Health Survey, approximately 49% of Ghanaian children aged 6 to 60 months, about half of all children in this age range, have anemia. The prevalence of moderate anemia in children was approximately 27.6%, and moderate to severe anemia affected about 21.4% of the population. According to this study, the father’s education, socioeconomic level, the mother’s education, iron consumption during pregnancy, and the baby’s postnatal checkup have a substantial impact on childhood anemia. The study proposes a comprehensive approach to reducing childhood anemia that addresses social and economic factors and includes iron supplementation and postnatal checkups. It also promotes educating pregnant women and their families about health and well-being. The results demonstrated that, in comparison to the other machine learning algorithms applied in this study, the random forest algorithm provides stronger prediction accuracy, with logistic regression ranking second. Additionally, the random forest and logistic regression methods identified different sets of the most significant predictors of childhood anemia, with distinct levels of importance. Therefore, the study provides evidence on how machine learning can be used alongside conventional statistical analysis to better identify the determinants of childhood anemia.
Healthcare professionals may use the findings in their day-to-day clinical work at remote primary healthcare units that lack adequate diagnostic resources for identifying anemia. By addressing key child, parental, and societal factors influencing anemia, public health programs can develop targeted interventions in different geographical regions to improve the anemia-related health status of children under five. A greater focus should be placed on strategies that involve nutritional awareness and counseling for parents, easy access to healthcare, the provision of iron supplements to mothers during pregnancy, and deworming programs to reduce parasite infections, all of which can contribute to a reduction in anemia rates.

Author Contributions

Writing—original draft, M.S. and M.S.B.; writing—review and editing, G.S., A.K. and S.T.O.; visualization, M.S., G.S., M.S.B., A.K. and S.T.O.; validation, M.S., G.S., M.S.B., A.K. and S.T.O.; software, M.S. and M.S.B.; supervision, G.S., A.K. and S.T.O.; project administration, M.S.; methodology, M.S. and M.S.B.; investigation, M.S., G.S., M.S.B., A.K. and S.T.O.; formal analysis, M.S., M.S.B., A.K. and S.T.O.; conceptualization, M.S., G.S., M.S.B., A.K. and S.T.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Committee of Islamic International University, Islamabad, Pakistan (IIU ORIC Bioethics, protocol #110, 31 January 2025).

Informed Consent Statement

Patient consent was waived for this study as it involved the analysis of secondary data that are publicly available. This study was conducted after receiving authorization to access and utilize RDHS datasets provided by the DHS program upon formal request through an online system at https://dhsprogram.com/data/ (accessed on 4 July 2025).

Data Availability Statement

The data set used in the study was taken from the Demographic and Health Survey (DHS) website, and the files are available at the following url: https://dhsprogram.com/data/dataset/Ghana_Standard-DHS_2022.cfm?flag=0 (accessed on 4 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine Learning
DT: Decision Tree
KNN: K-nearest Neighbor
RF: Random Forest
AUC: Area Under the Curve
ROC: Receiver Operating Characteristic

References

  1. WHO. Anemia Policy Brief 2014. Available online: https://www.who.int/publications/i/item/WHO-NMH-NHD-14.4 (accessed on 9 February 2025).
  2. Stevens, G.A.; Finucane, M.M.; De-Regil, L.M.; Paciorek, C.J.; Flaxman, S.R.; Branca, F.; Peña-Rosas, J.P.; Bhutta, Z.A.; Ezzati, M. Global, regional, and national trends in haemoglobin concentration and prevalence of total and severe anaemia in children and pregnant and non-pregnant women for 1995–2011: A systematic analysis of population-representative data. Lancet Glob. Health 2013, 1, e16–e25.
  3. World Bank. Prevalence of Anemia Among Children (% of Children Ages 6–59 Months). 2021. Available online: https://data.worldbank.org/indicator/SH.ANM.CHLD.ZS (accessed on 4 July 2025).
  4. World Bank. Prevalence of Anemia Among Women of Reproductive Age (% of Women Ages 15–49). 2021. Available online: https://data.worldbank.org/indicator/SH.ANM.ALLW.ZS (accessed on 9 February 2025).
  5. World Bank. Prevalence of Anemia Among Pregnant Women (%). 2021. Available online: https://data.worldbank.org/indicator/SH.PRG.ANEM (accessed on 9 February 2025).
  6. Tesema, G.A.; Worku, M.G.; Tessema, Z.T.; Teshale, A.B.; Alem, A.Z.; Yeshaw, Y.; Alamneh, T.S.; Liyew, A.M. Prevalence and determinants of severity levels of anemia among children aged 6–59 months in sub-Saharan Africa: A multilevel ordinal logistic regression analysis. PLoS ONE 2021, 16, e0249978.
  7. Aheto, J.M.K.; Alhassan, Y.; Puplampu, A.E.; Boglo, J.K.; Sedzro, K.M. Anemia prevalence and its predictors among children under-five years in Ghana. A multilevel analysis of the cross-sectional 2019 Ghana Malaria Indicator Survey. Health Sci. Rep. 2023, 6, e1643.
  8. Asmare, A.A.; Tegegne, A.S.; Belay, D.B.; Agmas, Y.A. Coexisting predictors for undernutrition indices among under-five children in West Africa: Application of a multilevel multivariate ordinal logistic regression model. BMC Nutr. 2025, 11, 112.
  9. American Society of Hematology. The Role of Red Blood Cells in Anemia. Available online: https://www.hematology.org/education/patients/anemia (accessed on 9 February 2025).
  10. Horton, S.; Ross, J. The economics of iron deficiency. Food Policy 2003, 28, 51–75.
  11. Shekar, M.; Kakietek, J.; Eberwein, J.D.; Walters, D. An Investment Framework for Nutrition; World Bank: Washington, DC, USA, 2016.
  12. Hasan, M.; Tahosin, M.S.; Farjana, A.; Sheakh, M.A.; Hasan, M.M. A Harmful Disorder: Predictive and Comparative Analysis for fetal Anemia Disease by Using Different Machine Learning Approaches. In Proceedings of the 2023 11th International Symposium on Digital Forensics and Security (ISDFS), Chattanooga, TN, USA, 11–12 May 2023; pp. 1–6.
  13. El-Shafie, A.M.; Kasemy, Z.A.; Omar, Z.A.; Alkalash, S.H.; Salama, A.A.; Mahrous, K.S.; Hewedy, S.M.; Kotb, N.M.; Abd El-Hady, H.S.; Eladawy, E.S.; et al. Prevalence of short stature and malnutrition among Egyptian primary school children and their coexistence with Anemia. Ital. J. Pediatr. 2020, 46, 91.
  14. Roberts, D.J.; Matthews, G.; Snow, R.W.; Zewotir, T.; Sartorius, B. Investigating the spatial variation and risk factors of childhood anaemia in four sub-Saharan African countries. BMC Public Health 2020, 20, 126.
  15. Kassebaum, N.J. The Global Burden of Anemia. Hematol. Oncol. Clin. N. Am. 2016, 30, 247–308.
  16. Gardner, W.M.; Razo, C.; McHugh, T.A.; Hagins, H.; Vilchis-Tella, V.M.; Hennessy, C.; Taylor, H.J.; Perumal, N.; Fuller, K.; Cercy, K.M.; et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: Findings from the Global Burden of Disease Study 2021. Lancet Haematol. 2023, 10, e713–e734.
  17. Sarna, A.; Porwal, A.; Ramesh, S.; Agrawal, P.K.; Acharya, R.; Johnston, R.; Khan, N.; Sachdev, H.P.S.; Nair, K.M.; Ramakrishnan, L.; et al. Characterisation of the types of anaemia prevalent among children and adolescents aged 1–19 years in India: A population-based study. Lancet Child. Adolesc. Health 2020, 4, 515–525.
  18. Mou, J.; Zhou, H.; Feng, Z.; Huang, S.; Wang, Z.; Zhang, C.; Wang, Y. A Case-Control Study of the Factors Associated with Anemia in Chinese Children Aged 3–7 years Old. Anemia 2023, 2023, 1–7.
  19. Institute for Health Metrics and Evaluation. The Lancet: New Study Reveals Global Anemia Cases Remain Persistently High Among Women and Children. Anemia Rates Decline for Men. 31 July 2023. Available online: https://www.healthdata.org/news-events/newsroom/news-releases/lancet-new-study-reveals-global-anemia-cases-remain-persistently (accessed on 4 July 2025).
  20. Gaston, R.T.; Habyarimana, F.; Ramroop, S. Joint modelling of anaemia and stunting in children less than five years of age in Lesotho: A cross-sectional case study. BMC Public Health 2022, 22, 285.
  21. Aliyo, A.; Jibril, A. Anemia and Associated Factors Among Under Five Year Old Children Who Attended Bule Hora General Hospital in West Guji zone, Southern Ethiopia. J. Blood Med. 2022, 13, 395–406.
  22. Athman, L.P.; Jonathan, A.; Musa, F.; Kipasika, H.J.; Mahawi, I.; Urio, F.; Ally, M.; Mutagonda, R.; Chirande, L.; Makani, J.; et al. Clinical depression prevalence and associated factors among adolescents with sickle cell anemia in Dar es Salaam, Tanzania: A cross-sectional study. BMC Pediatr. 2025, 25, 10.
  23. Asgedom, Y.S.; Habte, A.; Woldegeorgis, B.Z.; Koyira, M.M.; Kedida, B.D.; Fente, B.M.; Gebrekidan, A.Y.; Kassie, G.A. The prevalence of anemia and the factors associated with its severity among children aged 6–59 months in Ghana: A multi-level ordinal logistic regression. PLoS ONE 2024, 19, e0315232.
  24. Kim, Y.; Choi, Y.; Kim, C.; Seo, E.; Kang, Y. Risk factors of anemia among children aged 6-59 months in Madagascar. Afr. J. Food Agric. Nutr. Dev. 2024, 24, 24611–24655.
  25. Tesfaye, S.H.; Seboka, B.T.; Sisay, D. Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016. PLoS ONE 2024, 19, e0300172.
  26. Sunuwar, D.R.; Singh, D.R.; Pradhan, P.M.S.; Shrestha, V.; Rai, P.; Shah, S.K.; Adhikari, B. Factors associated with anemia among children in South and Southeast Asia: A multilevel analysis. BMC Public Health 2023, 23, 343.
  27. Gebreegziabher, T.; Sidibe, S. Prevalence and contributing factors of anaemia among children aged 6–24 months and 25–59 months in Mali. J. Nutr. Sci. 2023, 12, e112.
  28. Ampofo, G.D.; Osarfo, J.; Okyere, D.D.; Kouevidjin, E.; Aberese-Ako, M.; Tagbor, H. Malaria and anaemia prevalence and associated factors among pregnant women initiating antenatal care in two regions in Ghana: An analytical cross-sectional study. BMC Pregnancy Childbirth 2025, 25, 617.
  29. Zimlich, R. Understanding Anemia in Kids. Healthline, 27 July 2022.
  30. Liu, Y.; Ren, W.; Wang, S.; Xiang, M.; Zhang, S.; Zhang, F. Global burden of anemia and cause among children under five years 1990–2019: Findings from the global burden of disease study 2019. Front. Nutr. 2024, 11, 1474664.
  31. Souza, J.P.; Day, L.T.; Rezende-Gomes, A.C.; Zhang, J.; Mori, R.; Baguiya, A.; Jayaratne, K.; Osoti, A.; Vogel, J.P.; Campbell, O.; et al. A global analysis of the determinants of maternal health and transitions in maternal mortality. Lancet Glob. Health 2024, 12, e306–e316.
  32. Martinez-Torres, V.; Torres, N.; Davis, J.A.; Corrales-Medina, F.F. Anemia and Associated Risk Factors in Pediatric Patients. Pediatr. Health Med. Ther. 2023, 14, 267–280.
  33. Ahsan, M.M.; Luna, S.A.; Siddique, Z. Machine-Learning-Based Disease Diagnosis: A Comprehensive Review. Healthcare 2022, 10, 541.
  34. Sidey-Gibbons, J.A.M.; Sidey-Gibbons, C.J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 2019, 19, 64.
  35. Rajula, H.S.R.; Verlato, G.; Manchia, M.; Antonucci, N.; Fanos, V. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Medicina 2020, 56, 455.
  36. Ley, C.; Martin, R.K.; Pareek, A.; Groll, A.; Seil, R.; Tischer, T. Machine learning and conventional statistics: Making sense of the differences. Knee Surg. Sports Traumatol. Arthrosc. 2022, 30, 753–757.
  37. Ghana Statistical Service (GSS) and ICF. Ghana Demographic and Health Survey 2022: Key Indicators Report; GSS and ICF: Accra, Ghana; Rockville, MD, USA, 2023.
  38. Grobbee, D.E.; Hoes, A.W. Clinical Epidemiology: Principles, Methods, and Applications for Clinical Research, 2nd ed.; Jones & Bartlett Learning: Burlington, MA, USA, 2025.
  39. Siddiqa, M.; Shah, G.H.; Mayo-Gamble, T.L.; Zubair, A. Determinants of Child Stunting, Wasting, and Underweight: Evidence from 2017 to 2018 Pakistan Demographic and Health Survey. J. Nutr. Metab. 2023, 2023, 1–12.
  40. Ismail, A.; Fatima, F. Maternal Nutritional Knowledge and its Association with Iron Deficiency Anemia in Children. Nurture 2017, 11, 16–20.
  41. Alhaija, R.A.; Hasab, A.A.H.; El-Nimr, N.A.; Tayel, D.I. Impact of educational intervention on mothers of infants with iron-deficiency anemia. Health Educ. Res. 2024, 39, 254–261.
  42. Al-Suhiemat, A.A.; Shudifat, R.M.; Obeidat, H. Maternal Level of Education and Nutritional Practices Regarding Iron Deficiency Anemia Among Preschoolers in Jordan. J. Pediatr. Nurs. 2020, 55, e313–e319.
  43. Bonsu, E.O.; Addo, I.Y.; Boadi, C.; Boadu, E.F.; Okeke, S.R. Determinants of iron-rich food deficiency among children under 5 years in sub-Saharan Africa: A comprehensive analysis of Demographic and Health Surveys. BMJ Open 2024, 14, e079856.
  44. Bamboro, S.A.; Boba, H.I.; Geberetsadik, M.K.; Gebru, Z.; Gutema, B.T. Prevalence of anemia and its associated factors among under-five children living in Arba Minch Health and Demographic Surveillance System Sites (HDSS), Southern Ethiopia. PLOS Glob. Public Health 2024, 4, e0003830.
  45. Rima, F.S.; Kundu, S.; Tarannum, S.; Jannatul, T.; Bin Sharif, A. Spatial variations and determinants of vitamin A and iron rich food consumption among Bangladeshi children aged 6–23 months. Sci. Rep. 2025, 15, 17881.
  46. Nicholson, W.K.; Silverstein, M.; Wong, J.B.; Chelmow, D.; Coker, T.R.; Davis, E.M.; Jaén, C.R.; Krousel-Wood, M.; Lee, S.; Li, L.; et al. Screening and Supplementation for Iron Deficiency and Iron Deficiency Anemia During Pregnancy. JAMA 2024, 332, 906–913.
  47. Baker, R.D.; Greer, F.R. Diagnosis and Prevention of Iron Deficiency and Iron-Deficiency Anemia in Infants and Young Children (0–3 Years of Age). Pediatrics 2010, 126, 1040–1050.
  48. Zemariam, A.B.; Yimer, A.; Abebe, G.K.; Wondie, W.T.; Abate, B.B.; Alamaw, A.W.; Yilak, G.; Melaku, T.M.; Ngusie, H.S. Employing supervised machine learning algorithms for classification and prediction of anemia among youth girls in Ethiopia. Sci. Rep. 2024, 14, 9080.
  49. Dhakal, P. Prediction of Anemia using Machine Learning Algorithms. Int. J. Comput. Sci. Inf. Technol. 2023, 15, 15–30.
  50. Khan, J.R.; Chowdhury, S.; Islam, H.; Raheem, E. Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh. J. Data Sci. 2021, 17, 195–218.
  51. Zahirzada, A.; Zaheer, N.; Shahpoor, M.A. Machine Learning Algorithms to Predict Anemia in Children Under the Age of Five Years in Afghanistan: A Case of Kunduz Province. J. Surv. Fish. Sci. 2023, 10, 752–762.
  52. Appiahene, P.; Asare, J.W.; Donkoh, E.T.; Dimauro, G.; Maglietta, R. Detection of iron deficiency anemia by medical images: A comparative study of machine learning algorithms. BioData Min. 2023, 16, 2.
  53. El Bilbeisi, A.H. Prevalence of nutritional anemia and its risk factors in children under five in the Gaza Strip. Front. Nutr. 2025, 12.
  54. Ewusie, J.E.; Ahiadeke, C.; Beyene, J.; Hamid, J.S. Prevalence of anemia among under-5 children in the Ghanaian population: Estimates from the Ghana demographic and health survey. BMC Public Health 2014, 14, 626.
  55. Alhassan, A.R.; Yakubu, M. Determinants of under-five anaemia in the high prevalence regions of Ghana. F1000Res 2022, 11, 724.
  56. Islam, M.A.; Afroja, S.; Khan, M.S.; Alauddin, S.; Nahar, M.T.; Talukder, A. Prevalence and Triggering Factors of Childhood Anemia: An Application of Ordinal Logistic Regression Model. Int. J. Clin. Pract. 2022, 2022, 1–12.
  57. New Series Highlights the Importance of a Positive Postnatal Experience for All Women and Newborns; WHO: Geneva, Switzerland, 2024.
  58. Gotine, A.R.E.M.; Xavier, S.P.; Vasco, M.D.; Alfane, N.W.A.; Victor, A. Prevalence and predictors of anemia in children under 5 years of age in sub-Saharan Africa: A systematic review and meta-analysis. medRxiv 2024.
  59. Gemechu, K.; Asmerom, H.; Sileshi, B.; Belete, R.; Ayele, F.; Nigussie, K.; Bete, T.; Negash, A.; Sertsu, A.; Mekonnen, S.; et al. Anemia and associated factors among under-five children attending public Hospitals in Harari Regional State, eastern Ethiopia: A cross-sectional study. Medicine 2024, 103, e38217.
  60. Sorsa, A.; Habtamu, A.; Kaso, M. Prevalence and Predictors of Anemia Among Children Aged 6–23 Months in Dodota District, Southeast Ethiopia: A Community-Based Cross-Sectional Study. Pediatric Health Med. Ther. 2021, 12, 177–187.
Figure 1. Receiver operating characteristic (ROC) curve of the logistic regression model.
Figure 2. Receiver operating characteristic (ROC) curves for each of the four ML models.
Figure 3. Calibration plots for each of the four ML models. (a) Calibration plot for random forest; (b) calibration plot for logistic regression; (c) calibration plot for KNN; and (d) calibration plot for decision tree.
Figure 4. (a) Important features from the random forest algorithm. (b) Important features from the logistic regression algorithm.
Figure 5. SHAP summary plot on the impact of independent variables on the random forest model’s predictive ability.
Figure 6. Partial dependence plots of the random forest model’s top features: (a) region; (b) child age in months; (c) maternal age; (d) size of child at birth; (e) father’s education; (f) mother’s education; (g) birth order number; and (h) initiation of breastfeeding.
Table 1. Summary statistics of risk factors and univariate analysis of anemia status for children aged 5 years or below, GDHS-2022 (n = 9353).

| Sr. No. | Attribute | Category | Freq. | % | OR (C-I) | p value |
|---|---|---|---|---|---|---|
| | Societal Characteristics | | | | | |
| 1 | Region | Western * | 454 | 4.85 | - | |
| | | Central | 510 | 5.45 | 0.002 (−0.001–0.383) | 0.991 |
| | | Greater Accra | 455 | 4.86 | 0.369 (0.170–0.530) | 0.070 |
| | | Volta | 383 | 4.09 | 0.252 (−0.156–0.661) | 0.227 |
| | | Eastern | 436 | 4.66 | 0.309 (−0.212–0.692) | 0.132 |
| | | Ashanti | 592 | 6.33 | 0.241 (−0.612–0.329) | 0.202 |
| | | Western North | 434 | 4.64 | 0.002 (0.001–0.415) | 0.991 |
| | | Ahafo | 497 | 5.31 | 0.398 (−0.788–0.728) | 0.045 |
| | | Bono | 427 | 4.57 | 0.338 (0.149–0.471) | 0.106 |
| | | Bono East | 659 | 7.05 | 0.212 (−0.154–0.579) | 0.256 |
| | | Oti | 632 | 6.76 | 0.522 (0.146–0.898) | 0.006 |
| | | Northern | 970 | 10.37 | 1.996 (1.646–2.345) | <0.001 |
| | | Savannah | 797 | 8.52 | 0.639 (0.284–0.995) | <0.001 |
| | | North East | 868 | 9.28 | 0.835 (0.4804–1.190) | <0.001 |
| | | Upper East | 638 | 6.82 | 0.770 (0.393–1.1476) | <0.001 |
| | | Upper West | 601 | 6.43 | 0.803 (0.432–1.174) | <0.001 |
| 2 | Household members | <4 * | 298 | 3.19 | - | |
| | | 4–6 | 117 | 1.25 | 1.149 (0.604–2.187) | 0.671 |
| | | 7–9 | 233 | 2.49 | 1.238 (0.731–2.097) | 0.425 |
| | | >9 | 8705 | 93.07 | 1.041 (0.729–1.487) | 0.823 |
| 3 | Place of residence | Urban * | 3857 | 41.24 | - | |
| | | Rural | 5496 | 58.76 | 1.516 (1.335–1.722) | <0.001 |
| 4 | Source of drinking water | Unimproved * | 3862 | 41.29 | - | |
| | | Improved | 5491 | 58.71 | 1.191 (1.049–1.352) | 0.176 |
| 5 | Sex of household head | Male * | 6880 | 73.56 | - | |
| | | Female | 2473 | 26.44 | 0.831 (0.722–0.956) | 0.010 |
| 6 | Socioeconomic status | Poor * | 5308 | 56.75 | - | |
| | | Middle | 1681 | 17.97 | 0.746 (0.629–0.885) | 0.001 |
| | | Rich | 2364 | 25.28 | 0.449 (0.386–0.522) | <0.001 |
| | Parental Characteristics | | | | | |
| 7 | Mother’s education | No education * | 2917 | 31.19 | - | |
| | | Primary | 1496 | 15.99 | 0.918 (0.752–1.120) | 0.402 |
| | | Secondary | 4237 | 45.3 | 0.529 (0.457–0.614) | <0.001 |
| | | Higher | 703 | 7.52 | 0.400 (0.309–0.519) | <0.001 |
| 8 | Father’s education | No education * | 2715 | 33.51 | - | |
| | | Primary | 868 | 10.71 | 0.862 (0.676–1.100) | 0.233 |
| | | Secondary | 3419 | 42.2 | 0.543 (0.464–0.636) | <0.001 |
| | | Higher | 1099 | 13.57 | 0.485 (0.390–0.603) | <0.001 |
| 9 | Maternal age | 15–19 * | 353 | 3.77 | - | |
| | | 20–24 | 1671 | 17.87 | 0.810 (0.540–1.217) | 0.312 |
| | | 25–29 | 2276 | 24.33 | 0.657 (0.442–0.976) | 0.038 |
| | | 30–34 | 2266 | 24.23 | 0.643 (0.433–0.953) | 0.028 |
| | | 35–39 | 1713 | 18.31 | 0.541 (0.362–0.809) | 0.003 |
| | | 40–44 | 819 | 8.76 | 0.548 (0.358–0.839) | 0.006 |
| | | 45–49 | 255 | 2.73 | 0.780 (0.460–1.321) | 0.357 |
| 10 | Maternal smoking | No * | 9287 | 99.29 | - | |
| | | Yes | 66 | 0.71 | 1.293 (0.626–2.671) | 0.487 |
| 11 | Breastfeed ever | No * | 4195 | 44.85 | - | |
| | | Yes | 5158 | 55.15 | 1.754 (1.546–1.991) | <0.001 |
| 12 | Initiation of breastfeeding | Immediately * | 3667 | 63.84 | - | |
| | | Within the first hour | 1805 | 31.42 | 0.926 (0.778–1.104) | 0.395 |
| | | Within 1 day | 272 | 4.74 | 0.988 (0.675–1.445) | 0.951 |
| 13 | Mother’s occupation | Not working * | 1537 | 16.43 | - | |
| | | Working | 7816 | 83.57 | 0.756 (0.632–0.904) | 0.002 |
| 14 | Intake of iron during pregnancy | No * | 469 | 8.99 | - | |
| | | Yes | 4749 | 91.01 | 0.575 (0.410–0.807) | 0.001 |
| 15 | Consumption of drugs for intestinal parasites during pregnancy | No * | 2382 | 45.65 | - | |
| | | Yes | 2836 | 54.35 | 0.680 (0.568–0.814) | <0.001 |
| | Child Characteristics | | | | | |
| 16 | Birth order number | 1st born * | 2386 | 25.51 | - | |
| | | 2–4 | 4782 | 51.13 | 1.164 (0.999–1.355) | 0.051 |
| | | >5 | 2185 | 23.36 | 1.476 (0.999–1.355) | <0.001 |
| 17 | Birth type | Single birth * | 8907 | 95.23 | - | |
| | | Multiple births | 446 | 4.77 | 0.873 (0.637–1.196) | 0.398 |
| 18 | Sex of child | Male * | 4804 | 51.36 | - | |
| | | Female | 4549 | 48.64 | 0.854 (0.753–0.968) | 0.014 |
| 19 | Size of child at birth | Small * | 792 | 13.73 | - | |
| | | Average | 2329 | 40.38 | 0.957 (0.737–1.243) | 0.747 |
| | | Large | 2647 | 45.89 | 0.912 (0.705–1.179) | 0.484 |
| 20 | Formula milk consumption | No * | 5123 | 90.5 | - | |
| | | Yes | 538 | 9.5 | 0.771 (0.589–1.009) | 0.058 |
| 21 | Child’s age in months | 0–6 * | 609 | 13.61 | - | |
| | | 7–12 | 490 | 10.95 | 1.035 (0.649–1.650) | 0.884 |
| | | 13–24 | 1017 | 22.73 | 1.032 (0.660–1.612) | 0.889 |
| | | 25–36 | 845 | 18.89 | 0.571 (0.365–0.893) | 0.014 |
| | | 37–48 | 834 | 18.64 | 0.473 (0.302–0.740) | 0.001 |
| | | 49–60 | 679 | 15.18 | 0.342 (0.217–0.539) | <0.001 |
| 22 | Stunting | No * | 199 | 4.45 | - | |
| | | Moderate | 618 | 13.83 | 0.700 (0.481–1.017) | 0.062 |
| | | Severe | 3653 | 81.72 | 1.423 (0.301–1.594) | <0.001 |
| 23 | Underweight | No * | 107 | 2.39 | - | |
| | | Moderate | 477 | 10.67 | 0.958 (0.595–1.544) | 0.863 |
| | | Severe | 3886 | 86.94 | 0.540 (0.348–0.839) | 0.636 |
| 24 | Wasting | No * | 48 | 1.07 | - | |
| | | Moderate | 213 | 4.77 | 0.873 (0.439–1.736) | 0.699 |
| | | Severe | 4209 | 94.16 | 0.777 (0.413–1.460) | 0.434 |
| 25 | Intake of fruits and vegetables | No * | 3087 | 62.29 | - | |
| | | Yes | 1869 | 37.71 | 0.268 (0.055–1.523) | 0.011 |
| 26 | Baby postnatal checkup within 2 months | No * | 222 | 5.04 | - | |
| | | Yes | 4183 | 94.96 | 0.572 (0.358–0.915) | 0.020 |
| 27 | Given zinc | No * | 747 | 61.89 | - | |
| | | Yes | 460 | 38.11 | 1.219 (0.855–1.739) | 0.273 |

Note: * indicates the reference category, OR indicates odds ratio, C-I indicates confidence interval, freq. indicates the frequency, and % indicates the percentage. Significant p values are shown in bold in the published table.
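The unadjusted odds ratios in Table 1 compare each category with its reference category. As a minimal sketch of that calculation for a single 2×2 comparison, the Python snippet below assumes the common Woolf (log-scale) 95% confidence interval. Table 1 does not report anemia counts per category, so the counts `a`, `b`, `c`, `d` and the function name `odds_ratio_ci` are hypothetical, and the interval method may differ from the one STATA or R applied in the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: anemic children in the comparison category
    b: non-anemic children in the comparison category
    c: anemic children in the reference category
    d: non-anemic children in the reference category
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration; not taken from GDHS-2022.
or_, or_lower, or_upper = odds_ratio_ci(a=30, b=70, c=50, d=50)
print(f"OR = {or_:.3f} (95% CI {or_lower:.3f} to {or_upper:.3f})")
```

Because the interval is built on the log scale and then exponentiated, both limits of an odds ratio computed this way are positive, and an interval that excludes 1 corresponds to p < 0.05.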
Table 2. Binary logistic regression analysis of anemia status for children aged 5 years or below, GDHS-2022 (n = 9353).

| Sr. No. | Attribute | Category | AOR (C-I) | p value |
|---|---|---|---|---|
| | Societal Characteristics | | | |
| 1 | Region | Western * | - | |
| | | Central | 0.363 (−2.044–2.772) | 0.767 |
| | | Greater Accra | −2.485 (−5.557–0.587) | 0.113 |
| | | Volta | 1.690 (−1.203–4.584) | 0.252 |
| | | Eastern | −2.033 (−4.890–0.823) | 0.163 |
| | | Ashanti | −1.042 (−3.341–1.256) | 0.374 |
| | | Western North | −2.099 (−4.950–0.751) | 0.149 |
| | | Ahafo | −0.717 (−3.111–1.676) | 0.557 |
| | | Bono | −1.729 (−4.769–1.309) | 0.265 |
| | | Bono East | −0.610 (−3.009–1.787) | 0.618 |
| | | Oti | −0.392 (−2.806–2.021) | 0.750 |
| | | Northern | 0.364 (−1.873–2.601) | 0.750 |
| | | Savannah | 1.446 (−1.176–4.069) | 0.280 |
| | | North East | 0.678 (−1.680–3.037) | 0.573 |
| | | Upper East | 1.996 (−1.180–5.173) | 0.218 |
| | | Upper West | 2.016 (−1.217–5.250) | 0.222 |
| 2 | Household members | <4 * | - | |
| | | 4–6 | 0.143 (0.103–3.420) | 0.432 |
| | | 7–9 | 0.654 (0.521–2.543) | 0.596 |
| | | >9 | 0.342 (0.832–5.321) | 0.104 |
| 3 | Place of residence | Urban * | - | |
| | | Rural | 1.791 (0.247–2.951) | 0.564 |
| 4 | Source of drinking water | Unimproved * | - | |
| | | Improved | 2.312 (0.272–9.613) | 0.442 |
| 5 | Sex of household head | Male * | - | |
| | | Female | 0.457 (0.078–2.672) | 0.385 |
| 6 | Socioeconomic status | Poor * | - | |
| | | Middle | 0.010 (0.0001–0.568) | 0.025 |
| | | Rich | 0.041 (0.001–1.077) | 0.046 |
| | Parental Characteristics | | | |
| 7 | Mother’s education | No education * | - | |
| | | Primary | 0.040 (0.0009–1.608) | 0.088 |
| | | Secondary | 0.068 (0.003–1.173) | 0.064 |
| | | Higher | 0.002 (0.0009–0.529) | 0.029 |
| 8 | Father’s education | No education * | - | |
| | | Primary | 4.945 (0.850–7.058) | 0.062 |
| | | Secondary | 0.652 (0.288–2.126) | 0.009 |
| | | Higher | 0.156 (3.582–5.457) | 0.012 |
| 9 | Maternal age | 15–19 * | - | |
| | | 20–24 | 9.766 (0.396–9.946) | 0.163 |
| | | 25–29 | 6.270 (0.255–4.070) | 0.261 |
| | | 30–34 | 2.134 (0.083–5.762) | 0.647 |
| | | 35–39 | 0.042 (0.0004–4.554) | 0.186 |
| | | 40–44 | 0.793 (0.009–6.967) | 0.919 |
| | | 45–49 | 0.486 (0.765–3.210) | 0.659 |
| 10 | Maternal smoking | No * | - | |
| | | Yes | 0.672 (0.383–1.895) | 0.085 |
| 11 | Breastfeed ever | No * | - | |
| | | Yes | 3.586 (0.228–6.299) | 0.363 |
| 12 | Initiation of breastfeeding | Immediately * | - | |
| | | Within the first hour | 3.445 (1.540–7.313) | 0.119 |
| | | Within 1 day | 4.071 (0.104–5.180) | 0.065 |
| 13 | Mother’s occupation | Not working * | - | |
| | | Working | 0.673 (0.056–8.018) | 0.755 |
| 14 | Intake of iron during pregnancy | No * | - | |
| | | Yes | 0.017 (0.004–6.122) | 0.033 |
| 15 | Consumption of drugs for intestinal parasites during pregnancy | No * | - | |
| | | Yes | 0.761 (0.122–4.715) | 0.769 |
| | Child Characteristics | | | |
| 16 | Birth order number | 1st born * | - | |
| | | 2–4 | 0.132 (0.009–1.943) | 0.140 |
| | | >5 | 0.434 (0.014–3.224) | 0.632 |
| 17 | Birth type | Single birth * | - | |
| | | Multiple births | 7.641 (0.009–8.456) | 0.552 |
| 18 | Sex of child | Male * | - | |
| | | Female | 0.877 (0.170–4.520) | 0.876 |
| 19 | Size of child at birth | Small * | - | |
| | | Average | 2.760 (0.397–4.549) | 0.150 |
| | | Large | 6.603 (0.330–7.788) | 0.217 |
| 20 | Formula milk consumption | No * | - | |
| | | Yes | 2.845 (0.073–3.518) | 0.575 |
| 21 | Child’s age in months | 0–6 * | - | |
| | | 7–12 | 0.096 (0.0011–8.606) | 0.308 |
| | | 13–24 | 0.053 (0.0003–8.935) | 0.262 |
| | | 25–36 | 1.024 (0.780–1.317) | 0.929 |
| | | 37–48 | 0.704 (0.472–1.0464) | 0.087 |
| | | 49–60 | 0.710 (0.435–1.034) | 0.074 |
| 22 | Stunting | No * | - | |
| | | Moderate | 3.063 (1.106–4.611) | 0.146 |
| | | Severe | 0.747 (0.851–3.825) | 0.758 |
| 23 | Underweight | No * | - | |
| | | Moderate | 1.224 (0.048–3.627) | 0.902 |
| | | Severe | 0.969 (0.995–2.714) | 0.840 |
| 24 | Wasting | No * | - | |
| | | Moderate | 0.067 (0.002–1.920) | 0.114 |
| | | Severe | 1.692 (0.383–2.012) | 0.555 |
| 25 | Intake of fruits and vegetables | No * | - | |
| | | Yes | 4.755 (0.639–5.364) | 0.128 |
| 26 | Baby postnatal checkup within 2 months | No * | - | |
| | | Yes | 0.732 (3.452–8.076) | 0.010 |
| 27 | Given zinc | No * | - | |
| | | Yes | 0.505 (0.081–3.130) | 0.521 |

Note: * indicates the reference category, C-I indicates confidence interval, and AOR indicates adjusted odds ratio. Significant p values are shown in bold in the published table.
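An adjusted odds ratio is the exponential of a logistic regression coefficient, so it is always positive and its confidence interval contains the point estimate. The sketch below shows that conversion with a Wald interval. The coefficient `beta`, standard error `se`, and the function name `adjusted_odds_ratio` are hypothetical values chosen for illustration, not estimates from the GDHS-2022 model.

```python
import math


def adjusted_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds coefficient into an odds ratio with a Wald interval."""
    aor = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return aor, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
aor, aor_lower, aor_upper = adjusted_odds_ratio(beta=-0.5, se=0.1)
print(f"AOR = {aor:.3f} (95% CI {aor_lower:.3f} to {aor_upper:.3f})")
```

Read against this conversion, a negative entry in the AOR column would be consistent with a value still on the log-odds scale, though the table itself does not say which scale was used.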
Table 3. Performance indicators of all four machine learning algorithms, evaluated on training data for the prediction of childhood anemia.

| Evaluation parameter | Random Forest | Decision Tree | Logistic Regression | K-Nearest Neighbor |
|---|---|---|---|---|
| Confusion matrix, actual no anemia (predicted no anemia / predicted anemia) | 507 / 1829 | 1716 / 1169 | 1607 / 1260 | 1216 / 863 |
| Confusion matrix, actual anemia (predicted no anemia / predicted anemia) | 1888 / 2330 | 1143 / 2520 | 877 / 2803 | 1651 / 2817 |
| Accuracy, % (95% CI) | 94.74 (90.58–95.84) | 64.69 (63.51–65.84) | 67.35 (66.20–68.49) | 61.60 (60.41–62.78) |
| Sensitivity, % (95% CI) | 82.50 (80.47–86.85) | 68.79 (67.27–70.29) | 76.16 (74.76–77.54) | 63.04 (61.61–64.47) |
| Specificity, % (95% CI) | 50.78 (48.54–58.92) | 59.48 (57.66–61.28) | 56.05 (54.21–57.88) | 58.48 (56.34–60.62) |
| Positive predictive value, % (95% CI) | 75.23 (73.65–76.24) | 68.31 (66.78–69.81) | 68.98 (67.54–70.41) | 76.54 (75.15–77.91) |
| Negative predictive value, % (95% CI) | 56.81 (51.31–58.81) | 60.02 (58.2–61.82) | 64.69 (62.78–66.58) | 42.41 (40.6–44.25) |
| AUC, % (95% CI) | 86.62 (80.6–88.86) | 64.16 (63.61–65.34) | 72.47 (71.26–73.7) | 59.48 (58.35–60.62) |
| F1 score | 96.94 | 57.36 | 67.82 | 51.03 |
| Performance time | 1.5913 s | 1.1774 s | 1.1448 s | 1.81 s |
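The indicators in Table 3 can all be derived from the four cells of a confusion matrix. The following sketch uses the logistic regression counts from Table 3 and assumes that rows are observed classes, columns are predicted classes, and anemia is the positive class. Under that reading the accuracy, sensitivity, specificity, and predictive values come out within rounding of the reported logistic regression figures. The function name `classification_metrics` is illustrative, and the F1 score here is the one for the anemia class, which may be defined differently from the value in the table.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive common performance indicators from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of anemic children correctly flagged
    specificity = tn / (tn + fp)  # share of non-anemic children correctly cleared
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Logistic regression counts from Table 3 (assumed layout: rows observed,
# columns predicted, anemia as the positive class).
metrics = classification_metrics(tn=1607, fp=1260, fn=877, tp=2803)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```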
Table 4. Performance indicators of all four machine learning algorithms, evaluated on test data for the prediction of childhood anemia.

| Evaluation parameter | Random Forest | Decision Tree | Logistic Regression | K-Nearest Neighbor |
|---|---|---|---|---|
| Confusion matrix, actual no anemia (predicted no anemia / predicted anemia) | 280 / 366 | 1716 / 1169 | 1607 / 1260 | 1216 / 863 |
| Confusion matrix, actual anemia (predicted no anemia / predicted anemia) | 772 / 1387 | 1143 / 2520 | 877 / 2803 | 1651 / 2817 |
| Accuracy, % (95% CI) | 95.75 (99.48–99.89) | 64.69 (63.51–65.84) | 67.35 (66.20–68.49) | 61.60 (60.41–62.78) |
| Sensitivity, % (95% CI) | 98.74 (99.35–99.93) | 68.79 (67.27–70.29) | 76.16 (74.76–77.54) | 63.04 (61.61–64.47) |
| Specificity, % (95% CI) | 67.89 (99.29–99.95) | 59.48 (57.66–61.28) | 56.05 (54.21–57.88) | 58.48 (56.34–60.62) |
| Positive predictive value, % (95% CI) | 88.81 (99.45–99.96) | 68.31 (66.78–69.81) | 68.98 (67.54–70.41) | 76.54 (75.15–77.91) |
| Negative predictive value, % (95% CI) | 80.67 (99.17–99.91) | 60.02 (58.2–61.82) | 64.69 (62.78–66.58) | 42.41 (40.6–44.25) |
| AUC, % (95% CI) | 98.34 (99.55–99.93) | 64.16 (63.21–65.34) | 72.47 (71.26–73.7) | 59.48 (58.35–60.62) |
| F1 score | 98.20 | 59.56 | 68.27 | 54.48 |
| Performance time | 0.06 s | 0.75 s | 0.08 s | 0.2 s |
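The AUC values reported above summarize each ROC curve as a single number: the probability that a randomly chosen anemic child receives a higher predicted score than a randomly chosen non-anemic child. A minimal sketch of that rank-based definition follows. The labels and scores are toy values, and `auc_from_scores` is a hypothetical helper, not the routine used in the study's R analysis.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = anemia, 0 = no anemia; scores are predicted probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.2f}")  # three of four pairs are ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based version would be preferable on a sample the size of GDHS-2022.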
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Siddiqa, M.; Shah, G.; Butt, M.S.; Kamal, A.; Opoku, S.T. Early Childhood Anemia in Ghana: Prevalence and Predictors Using Machine Learning Techniques. Children 2025, 12, 924. https://doi.org/10.3390/children12070924