Article

Analyzing Unimproved Drinking Water Sources and Their Determinants Using Supervised Machine Learning: Evidence from the Somaliland Demographic Health Survey 2020

by Hibak M. Ismail 1, Abdisalam Hassan Muse 1, Mukhtar Abdi Hassan 1,2, Yahye Hassan Muse 1,2 and Saralees Nadarajah 3,*
1 Faculty of Science and Humanities, School of Postgraduate Studies and Research (SPGSR), Amoud University, Borama 25263, Somalia
2 School of Graduate Studies, University of Hargeisa, Hargeisa 25263, Somalia
3 Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Water 2024, 16(20), 2986; https://doi.org/10.3390/w16202986
Submission received: 22 September 2024 / Revised: 16 October 2024 / Accepted: 17 October 2024 / Published: 19 October 2024

Abstract

Access to clean and safe drinking water is a fundamental human right. Despite global efforts, including the UN’s “Water for Life” program, a significant portion of the population in developing countries, including Somaliland, continues to rely on unimproved water sources. These unimproved sources contribute to poor health outcomes, particularly for children. This study aimed to investigate the factors associated with the use of unimproved drinking water sources in Somaliland by employing supervised machine learning models to predict patterns and determinants based on data from the 2020 Somaliland Demographic and Health Survey (SHDS). Secondary data from SHDS 2020 were used, encompassing 8384 households across Somaliland. A multilevel logistic regression model was applied to analyze the individual- and community-level factors influencing the use of unimproved water sources. In addition, machine learning models, including logistic regression, decision tree, random forest, support vector machine (SVM), and K-nearest neighbor (KNN), were compared in terms of accuracy, sensitivity, specificity, and other metrics using cross-validation techniques. This study uses supervised machine learning models to analyze unimproved drinking water sources in Somaliland, providing data-driven insights into the complex determinants of water access. This enhances predictive accuracy and informs targeted interventions, offering a robust framework for addressing water-related public health issues in Somaliland. The analysis identified key determinants of unimproved water source usage, including socioeconomic status, education, region, and household characteristics. The random forest model performed the best with an accuracy of 93.57% and an area under the curve (AUC) score of 98%. Decision tree and KNN also exhibited strong performance, while SVM had the lowest predictive accuracy. 
This study highlights the role of socioeconomic and community factors in determining access to clean drinking water in Somaliland. Factors such as age, education, gender, household wealth, media access, urban or rural residence, poverty level, and literacy level significantly influenced access. Local policies and resource availability also contribute to variations in access. These findings suggest that targeted interventions aimed at improving education, infrastructure, and community water management practices can significantly reduce reliance on unimproved water sources and improve overall public health.

1. Introduction

Access to clean and safe water, along with adequate sanitary facilities, is a fundamental human right and is one of the most basic human needs. However, the United Nations reports that 2.4 billion people lack access to proper sanitation facilities, and over 1.1 billion lack access to safe drinking water [1]. In 2010, the UN officially recognized the right to access safe drinking water as a fundamental human right, further underscoring its importance. One of the primary goals of the UN’s “Water for Life” program, introduced in 2005, was to reduce the number of people who lacked access to clean water by half.
Improved drinking water sources are defined as those protected from contamination, particularly from fecal matter. These sources include household connections, public standpipes, boreholes, protected dug wells, protected springs, and the water provided by tanker trucks. According to the Joint Monitoring Program [2], drinking water services are considered “improved” if the collection time for water is no more than 30 min for a round trip, including queuing. Approximately 10% of the global disease burden can be prevented by improving access to clean drinking water. In developing countries, inadequate access to improved drinking water contributes to 30% of deaths among children under five years old [3].
Despite Ethiopia’s abundant water sources, factors such as uneven distribution, pollution, rapid population growth, urbanization, and climate change hinder its ability to provide accessible and improved drinking water services [4]. Similar challenges exist in Somaliland, where poverty, a lack of infrastructure, limited government resources, inadequate sanitation facilities, population growth, and sociocultural practices exacerbate this issue.
Thus, this study aimed to thoroughly examine the issue of unimproved drinking water sources and the factors related to them beyond their availability. By exploring both the individual and community levels, we aim to produce evidence-based recommendations for interventions and policies that support equitable and sustainable access to clean drinking water.
Understanding the factors associated with the use of unimproved water sources is crucial for developing targeted interventions and policies for improving access to safe drinking water. Previous studies in Somaliland have indicated that individual and contextual factors influence the utilization of unimproved water sources. Individual-level factors included socioeconomic status, education level, awareness of health risks, and hygiene practices. In contrast, contextual factors may encompass community-level infrastructure, water management practices, and geographical locations [5]. However, there is still a need for a comprehensive and localized investigation that focuses specifically on Somaliland to address the unique challenges and factors contributing to the use of unimproved water sources. By conducting spatial and multilevel analyses, this study aimed to determine the complex interplay of individual and contextual factors associated with the use of unimproved water sources in Somaliland [5].
This study aimed to determine the factors contributing to the use of unimproved water sources and to support efforts to reduce reliance on them in Somaliland, so that all individuals have access to safe and clean drinking water. By addressing the associated factors through evidence-based interventions, the study seeks to improve the overall health and well-being of the population and to advance the Sustainable Development Goals.
The significance of this study extends beyond academia, touching the lives of citizens and affecting various stakeholders. Its benefits include empowering communities, guiding policy decisions, and fostering sustainable solutions. For instance, government agencies and policymakers stand to benefit from this study, as it provides valuable insights for evidence-based policies and strategies to enhance access to safe drinking water. NGOs and development partners also have a stake in the outcomes of the study, as the findings can inform decision-making processes and guide resource allocation to address water-related issues in Somaliland.
This study aimed to address the lack of research on the factors influencing unimproved drinking water sources in Somaliland. It uses a comprehensive approach to integrate individual- and community-level factors utilizing spatial and multilevel analyses. The aim was to uncover complex interactions among these determinants, providing culturally relevant and evidence-based recommendations for interventions and policies aimed at improving access to safe drinking water in Somaliland.

2. Review of the Related Literature

The utilization of unimproved water sources is influenced by various factors, including the household head, the education levels of both husbands and wives, and the employment status of women. The household head has decision-making power and is aware of the importance of using safe water sources. If the household head is knowledgeable about the risks associated with unimproved water sources, they are more likely to prioritize accessing safe water for their families. Gender dynamics also play a role in determining the utilization of unimproved water sources. Male household heads often have more authority and control over resources, which can influence whether a household opts for improved or unimproved water sources [4].
At the community level, the utilization of unimproved water sources is influenced by several factors. First, the region where the community is located plays a significant role. Some regions may have limited access to improved water sources, thus forcing communities to rely on unimproved sources. This could be because of factors such as geographical challenges, the scarcity of water resources, or inadequate infrastructure. The place of residence is another important factor. Rural areas tend to rely more on unimproved water sources than urban areas. This is often due to the distance from improved water sources and the lack of developed infrastructure, making it difficult for rural communities to access safe drinking water [6].
The issue of unimproved drinking water sources is a critical public health concern, particularly in low- and middle-income countries. The literature highlights various determinants that influence access to improved drinking water, including socioeconomic status, education, and geographical factors. For instance, Ref. [7] found that households in Ethiopia with lower socioeconomic status were more likely to rely on unimproved water sources. This finding is supported by [8], who emphasized the spatial clustering of unimproved water sources in rural areas, suggesting that socioeconomic disparities significantly affect access to safe drinking water. Moreover, education plays a significant role in determining water source quality. According to Ref. [9], this correlation suggests that increased awareness of the health risks associated with unimproved water sources can lead to better water management practices.
Similarly, Ref. [10] reported that maternal undernutrition was associated with drinking water from unimproved sources, indicating that education and awareness of water quality can have broader implications for health outcomes. The health implications of using unimproved water sources have been well documented. For example, studies have shown that children from households using unimproved water sources are at a higher risk of morbidity, particularly from diarrheal diseases [11,12,13].
Furthermore, Ref. [14] conducted a systematic review that confirmed widespread fecal contamination of drinking water in low-income settings, underscoring the public health risks associated with unimproved sources. In addition to health outcomes, the environmental context and water quality also play essential roles. Ref. [1] pointed out that unimproved water sources are often contaminated by both anthropogenic and natural factors, leading to diseases such as cholera and typhoid. This was echoed by [15], who estimated the risk of typhoid fever associated with unimproved water sources, reinforcing the need for interventions that address both water quality and access.
According to [16], households in rural Somaliland have limited access to safe drinking water due to the scarcity of water infrastructure and the high costs of transporting water. This reliance on unimproved water sources poses serious health risks, particularly for children and other vulnerable groups.
Machine learning techniques have emerged as valuable tools for analyzing the determinants of unimproved drinking water sources. By leveraging demographic health survey data, researchers can identify patterns and predict areas at risk of using unimproved sources. For instance, ref. [17] utilized geospatial analysis to assess water supply access, demonstrating the potential of machine learning in public health research. This approach can facilitate targeted interventions in regions identified as hotspots for unimproved water sources, thereby improving health outcomes.
In conclusion, the literature indicates that unimproved drinking water sources are influenced by a complex interplay between socioeconomic, educational, and environmental factors. The health risks associated with these sources are significant, particularly for vulnerable populations such as children. Machine learning offers promising avenues for further research and intervention strategies aimed at improving access to safe drinking water.

3. Methodology

3.1. Data Source

This investigation used secondary data from the Somaliland Demographic and Health Survey (SHDS). The SHDS data were collected between 20 March 2020 and May 2020. Access to the dataset was granted after a request was submitted and a permission letter was received. The SHDS dataset offers extensive health and demographic data on the Somaliland population. It indicates that at least one-third of Somaliland’s population depends on unimproved water sources, which are not evenly distributed across the country. This unequal distribution makes it extremely difficult to guarantee that every resident has access to clean and safe drinking water.

3.2. Study Variables

3.2.1. Outcome Variable

The outcome variable in this study was the use of unimproved drinking water sources. The drinking water source was classified as unimproved if a household received water from an unprotected dug well, an unprotected spring, surface water, etc.

3.2.2. Predictor Variables

The independent variables were divided into two levels: individual and community variables. The individual-level factors included the age, educational status, and gender of the household head, household wealth, and access to television and radio. The community-level factors included the residence, poverty level, literacy level, and region. After checking their distribution, community poverty and literacy levels were classified as high or low, and the median value was used as the cut-off point for classification because the data were not normally distributed.
Community poverty level was classified as high if the proportion of households from the two lowest wealth quintiles in a given community was greater than the median value and low if the proportion was below the median value. Community-level literacy was the proportion of household heads in the community with at least a primary level of education, which was classified as high (the proportion is greater than the national median value) or low (the proportion is below the national median value). The study regions were grouped into two categories based on their geopolitical features, larger central [Marodijeh, Togdheer, Saahil, Sanaag, Awdal, and Sool] [Hargeisa, Berbera, Borama, Buro, and Erigavo], which is consistent with a prior Somaliland study.
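The median-based classification of community-level variables can be illustrated with a minimal sketch. The cluster names and proportions below are hypothetical, and assigning a value exactly equal to the median to the "low" group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def classify_communities(proportions):
    """Label each community 'high' or 'low' relative to the median proportion.

    Assumption: a value exactly equal to the median is labelled 'low'.
    """
    cutoff = median(proportions.values())
    return {name: ("high" if p > cutoff else "low") for name, p in proportions.items()}

# Hypothetical share of households in the two lowest wealth quintiles per cluster
poverty_share = {"cluster_a": 0.10, "cluster_b": 0.35, "cluster_c": 0.60, "cluster_d": 0.80}
poverty_level = classify_communities(poverty_share)
print(poverty_level)
```

The same function could be applied to the proportion of household heads with at least primary education to derive the community literacy level.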

3.3. Data Preprocessing

To ensure that it was ready for the study, secondary data from the 2020 Somaliland Health Demographic Survey (SHDS) underwent extensive preprocessing. Several steps were involved in this phase, such as addressing the missing values, cleansing the data, and changing the variables as needed. These procedures are crucial for creating reliable and accurate datasets for future studies.

3.4. Data Cleaning

Data cleansing is essential for guaranteeing the quality and integrity of the dataset. A thorough data-cleaning procedure was used in this study to identify and fix mistakes, inconsistencies, and outliers in the data. During this procedure, duplicate entries were eliminated, formatting problems were fixed, and data input errors were addressed. These procedures produced a high-quality dataset that functioned as the basis for all other analyses.

3.5. Missing Value

Imputation: Handling missing data properly is essential for maintaining the integrity of the dataset and guaranteeing correct analysis. Iterative missing-value imputation was used to obtain complete data for each variable [18]. By using a strict methodology, the possibility of bias and inaccuracies resulting from missing data was reduced, thereby improving the accuracy of the findings [19]. The use of appropriate imputation techniques yielded a strong and complete dataset that was appropriate for thorough examination and interpretation.
The choice of the iterative imputation method for handling missing values in this study was driven by its ability to produce more accurate and reliable estimates than simpler techniques, such as mean or median imputation. Iterative imputation, specifically methods such as Multivariate Imputation by Chained Equations (MICE), leverages the relationships among variables in the dataset to estimate missing values iteratively. This method accounts for the uncertainty associated with missing data by treating each variable as a function of others, thereby preserving the inherent multivariate structure of the data. To validate the quality of the imputation, we employed cross-validation techniques, in which the imputed dataset was compared against a held-out subset of the original data with known values. Additionally, we assessed the distribution of imputed values against observed values to ensure consistency and minimize biases introduced by the imputation process. Using these validation strategies, we aimed to enhance the robustness of the dataset, ensuring that subsequent analyses would yield reliable and accurate insights.

3.6. Sampling and Populations

This study used secondary data from the SHDS 2020, which employs a sample of 8384 households. The sample size was strategically determined based on statistical principles to ensure the representativeness of the overall population of Somaliland. Although we cannot directly control the sample size of the secondary data, its design considerations are crucial. The substantial sample size of the SHDS 2020 (8384) suggests a high degree of confidence in the generalizability of its findings to the broader population. This strengthened the reliability of the data for our analysis, allowing us to draw meaningful conclusions regarding the population of interest.
SHDS 2020 employed a two-stage stratified cluster sampling design to select sample households. In the first stage, 20 enumeration areas (EAs) were selected from the 2018 national census with probability proportional to size (PPS). These EAs were divided into urban and rural strata to ensure representation from both urban and rural settings. The sample for the SHDS was designed to provide estimates of key indicators for the country as a whole, for each of the six geographical regions, which are the country’s first-level administrative divisions, and separately for urban, rural, and nomadic areas. In the second stage, 30 households were systematically selected from each EA, resulting in a total of 600 households surveyed.

3.7. Data Management

Before statistical analysis, the data were cleaned and weighted using sampling weights to account for the probability sampling design and nonresponse and to restore representativeness. We conducted descriptive statistics, correlation, and multilevel logistic regression analyses. The multilevel model assumed that each community had its own intercept while the coefficients were fixed, capturing within- and between-community variation through a random effect at the cluster level. Variables with p-values less than 0.2 in the bivariate analysis were entered into the multivariable model, and the Adjusted Odds Ratio (AOR) with a 95% CI and a p-value threshold of 0.05 was used to declare a significant association with the outcome. The goodness of fit was checked using deviance. The Variance Inflation Factor (VIF) was used to test for multicollinearity among the selected explanatory variables.
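For orientation, a random-intercept logistic model of the kind described here is commonly written as follows. The notation is an assumption introduced for illustration, not taken from the study: $\pi_{ij}$ is the probability that household $i$ in community $j$ uses an unimproved source, $x_{kij}$ are the predictors, $\beta_k$ are the fixed coefficients, and $u_j$ is the community-level random effect.

$$
\operatorname{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1-\pi_{ij}} = \beta_0 + \sum_{k}\beta_k x_{kij} + u_j, \qquad u_j \sim N(0, \sigma_u^2)
$$

Under this formulation, the adjusted odds ratio for predictor $k$ is $\exp(\beta_k)$, and $\sigma_u^2$ measures the variation between communities.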
A multicollinearity test was conducted to ensure the accuracy and interpretation of the machine learning models. The Variance Inflation Factor (VIF) was calculated for each predictor variable to indicate the degree of multicollinearity. Variables with a VIF score > 10 were flagged for further investigation. To address multicollinearity, the correlation matrix was examined, and strategies, such as removing highly correlated variables or combining variables, were applied. The VIF scores were then recalculated to ensure that all the remaining predictors had VIF values below the acceptable threshold. This test ensured that independent variables did not unduly influence each other, leading to more robust and reliable model performance.
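As a reminder of what the VIF measures, the standard definition for predictor $j$ is shown below, where $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all the other predictors.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

A threshold of $\mathrm{VIF}_j > 10$ therefore corresponds to $R_j^2 > 0.9$, meaning that more than 90% of the variance of predictor $j$ is explained by the remaining predictors.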

3.8. Proposed ML Models

Machine learning algorithms, such as decision trees, random forests, support vector machines (SVMs), and neural networks, are suitable for binary classification tasks when dealing with binary data, which have only two possible values (usually 0 and 1). The complexity of the problem, the size of the dataset, and the required level of performance all influence the choice of algorithm. The model was trained on binary data after the algorithm was chosen to enable it to discover underlying relationships and patterns. Machine learning models are effective tools for handling binary data and resolving challenging issues in various fields and businesses.
Machine learning models such as logistic regression, decision tree, random forest, support vector machine (SVM), and K-nearest neighbors (KNN) were selected for classification tasks. These models were chosen for their simplicity, interpretability, and ability to provide robust insights from the dataset. Logistic regression was chosen for its simplicity and interpretability in binary classification, decision tree for its intuitive structure, random forest for its accuracy and reduction of overfitting, SVM for its effectiveness in high-dimensional spaces, and KNN for its non-parametric nature. These models provide a comprehensive analysis, balancing accuracy, interpretability, and computational efficiency compared to models such as neural networks, which may require larger datasets and more computational resources. Neural networks, Gradient Boosting Machines (GBMs), and XGBoost were not selected because of their complexity, computational demands, and dataset nature. Instead, simpler models, such as logistic regression, decision tree, random forest, SVM, and KNN, were chosen because of their interpretability, ease of implementation, and proven effectiveness in similar classification tasks.

3.8.1. Logistic Regression

Logistic regression is a statistical method used for binary classification that models the probability of a binary outcome based on one or more predictor variables. It uses a logistic function to constrain the output between zero and one, allowing for the interpretation of the output as a probability [20].
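A minimal sketch of the logistic function applied to a linear predictor is given below. The intercept, coefficients, and feature values are hypothetical and serve only to show that the output is bounded between zero and one.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Map a linear predictor to a probability with the logistic function."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical model with two binary predictors (e.g., rural residence, low wealth)
p_baseline = predict_probability(0.0, [1.2, 0.8], [0, 0])
p_exposed = predict_probability(0.0, [1.2, 0.8], [1, 1])
print(p_baseline, p_exposed)
```

With all predictors at zero and a zero intercept, the predicted probability is 0.5; positive coefficients push the probability towards one.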

3.8.2. Decision Tree

Decision tree is a flowchart-like structure used for classification and regression tasks. It splits the data into subsets based on the value of the input features, creating branches that lead to decisions and leaf nodes that represent the final output or class. Splitting is performed based on criteria such as Gini impurity or information gain to create the most homogeneous subsets possible. Decision trees are intuitive and easy to interpret, making them a popular choice in various applications [21,22].
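The Gini impurity criterion mentioned above can be sketched as follows. The label lists are hypothetical; the example only shows that a split producing homogeneous subsets receives a lower weighted impurity than one that leaves the classes mixed.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Impurity of a candidate split, weighted by the size of each branch."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = [1, 1, 1, 0, 0, 0]                          # hypothetical labels at a node
pure_split = split_impurity([1, 1, 1], [0, 0, 0])    # perfectly separates the classes
mixed_split = split_impurity([1, 0, 1], [0, 1, 0])   # leaves both branches mixed
print(gini(parent), pure_split, mixed_split)
```

A tree-building algorithm would evaluate many candidate splits in this way and keep the one with the lowest weighted impurity.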

3.8.3. Random Forest

Random forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of their predictions for classification tasks, or the mean prediction for regression tasks. It operates by creating a multitude of decision trees, each trained on a random subset of the data, which helps improve the model’s accuracy and control overfitting. The final prediction is made by aggregating the predictions of all individual trees, thus enhancing the robustness and accuracy [23].
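The two mechanisms described above, training each tree on a random resample and aggregating by majority vote, can be sketched in a few lines. This is not a full random forest: tree training is omitted, and the rows and per-tree predictions are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done for each tree in bagging."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most frequent class among the individual tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(10))                 # hypothetical row identifiers
sample = bootstrap_sample(rows, rng)   # some rows repeat, others are left out

tree_predictions = [1, 0, 1, 1, 0]     # hypothetical votes from five trees
forest_prediction = majority_vote(tree_predictions)
print(sample, forest_prediction)
```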

3.8.4. Support Vector Machine (SVM)

An SVM is a supervised learning algorithm used for classification and regression tasks. It works by determining the hyperplane that best separates the data points of the different classes in a high-dimensional space. The optimal hyperplane is determined by maximizing the margin between the closest points of the classes, known as support vectors. SVM is particularly effective in high-dimensional spaces and robust against overfitting, especially in cases where the number of dimensions exceeds the number of samples [24]. It can also be adapted to handle nonlinear classification through the use of kernel functions [25].
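The margin-maximization idea can be stated compactly for the linearly separable (hard-margin) case. This is the textbook formulation rather than the specific configuration used in the study, with labels $y_i \in \{-1, +1\}$, feature vectors $x_i$, weight vector $w$, and bias $b$.

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^{\top} x_i + b) \ge 1 \ \text{ for all } i
$$

The margin between the two classes equals $2/\lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin; the points for which the constraint holds with equality are the support vectors.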

3.8.5. K-Nearest Neighbor (KNN)

K-Nearest Neighbor (KNN) is a simple, instance-based learning algorithm used for classification and regression. It classifies a data point based on how its neighbors are classified, typically using a majority vote among the K-nearest neighbors in the feature space. A distance metric (e.g., Euclidean distance) was used to determine the closeness of instances. The KNN is non-parametric and can adapt to various data distributions; however, its performance can degrade with high-dimensional data owing to the curse of dimensionality. It is particularly useful for small datasets and easy to implement [26].
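A minimal KNN classifier using Euclidean distance and majority voting might look like the sketch below. The training points, labels, and the choice of k = 3 are hypothetical and are not the settings used in the study.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training data with two well-separated classes
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
print(near_origin, near_far_cluster)
```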
These five supervised machine learning models provide a diverse set of techniques for predicting unimproved sources of drinking water based on the 2020 Somaliland Health Demographic Survey (SHDS). Each model has its own strengths, assumptions, and formulas, allowing for a comprehensive analysis of their predictive capabilities.

3.9. Model Comparison and Evaluation

In this section, we compare the performance of the five machine learning algorithms introduced earlier, based on the key evaluation metrics commonly used in assessing classification models. These metrics included accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC). By comparing these metrics, we aim to identify the most effective model for predicting the use of unimproved drinking water sources.
TPs (True Positives) represent the number of correctly predicted positive instances.
TNs (True Negatives) represent the number of correctly predicted negative instances.
FPs (False Positives) represent the number of incorrectly predicted positive instances.
FNs (False Negatives) represent the number of incorrectly predicted negative instances.
Accuracy measures the proportion of correctly predicted instances compared with the total number of instances. This was calculated using the following formula:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Sensitivity, also known as recall or the true positive rate, measures the proportion of correctly predicted positive instances (e.g., households correctly identified as using unimproved drinking water sources, if that class is treated as positive) out of all actual positive instances. This was calculated using the following formula:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity measures the proportion of correctly predicted negative instances (e.g., households correctly identified as using improved drinking water sources, if that class is treated as negative) out of all actual negative instances. This was calculated using the following formula:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
F1 Score: The F1 score is a combined measure of precision and recall. It provides a balanced assessment of the performance of the model by considering both false positives and false negatives. This was calculated using the following formula:
$$\mathrm{F1\ Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Precision: Precision is a measure of the model’s ability to correctly identify positive instances out of all predicted positive instances. This was calculated using the following formula:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
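The five formulas above can be computed directly from the four confusion-matrix counts, as in the following sketch. The counts are hypothetical and are not taken from the study’s results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, precision, and F1 from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical confusion-matrix counts for 100 test households
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```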

3.10. Area under the Curve (AUC)

A common metric for evaluating the adequacy of a supervised machine learning model is the area under the curve (AUC), which offers a comprehensive evaluation of the model’s performance over a range of classification thresholds [27]. It is frequently used to compare the overall performance of models and to assess their predictive power. In the context of our research on an unimproved source of drinking water in SHDS2020, a higher AUC score denotes a model’s superior capacity to rank instances accurately, assigning positive instances a higher probability than negative ones. We can learn a great deal about the efficacy and performance of supervised machine learning models in predicting the source of drinking water based on the SHDS2020 by using these model adequacy measures.
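The ranking interpretation of the AUC can be made concrete with a short sketch: the AUC equals the fraction of positive-negative pairs in which the positive instance receives the higher score, with ties counted as one half. The labels and scores below are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and predicted probabilities for four households
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.5, 0.1]
auc = auc_from_scores(y_true, y_score)
print(auc)
```

In this example, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. A perfect ranking yields 1.0, and random scoring yields about 0.5.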
These measures were used to assess the performance of each model on the test dataset. For predicting the use of unimproved drinking water sources, the model with the highest scores across these metrics was considered the most accurate. Model evaluation is an essential step in assessing the performance of machine learning models. The models were tested using a validation dataset that was not used for training.
To guarantee repeatability and clarity, the assessment procedure consisted of the following steps:
Train/Test Split: the dataset was split into 80% training data and 20% testing data to keep the assessment procedure uniform and transparent.
Cross-validation: K-fold cross-validation (k = 10) was used to improve the robustness and generalizability of the model.
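The two steps above can be sketched with index-based splitting. The sketch assumes that the 10 folds are formed within the 80% training portion, which the text does not state explicitly; the seed and the number of rows are hypothetical.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and testing sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation

train_idx, test_idx = train_test_split_indices(100)   # hypothetical 100 rows
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

Each row of the training portion appears in exactly one validation fold, so every observation contributes once to the cross-validated estimate while the 20% test set stays untouched until the final evaluation.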

4. Findings and Results

4.1. Descriptive Statistics

As shown in Table 1, the study examined a number of factors, including the source of drinking water, gender, highest education level, frequency of television viewing, employment status during the previous 12 months, age in 5-year intervals, district, region, type of residence, literacy rates, and total wealth index. The findings show a large educational gap in the population, with the majority not attending formal schooling. Of the respondents, 86.24% said that they never watched television, which may be related to limited viewing habits or cultural norms specific to a region. The majority of respondents (0.71%) had not held a job during the previous 12 months. The population’s demographics varied, with the largest age group (35–39) comprising 20.56% of participants and 17.43% between 40 and 44 years.
The majority of the respondents (71.21%) had a husband/partner who had attended school, whereas a smaller percentage (11%) did not. A significant number (8.26%) were unclear regarding their educational history. In the last 12 months, 42.98% of the respondents had a job while 56.60% did not. A small percentage (0.42%) were unsure about their husband/partner’s employment status. In terms of employment, just less than half of the respondents had a husband/partner who had worked in the past year, while over half did not.
The results showed a strong relationship between the type of drinking water source, age, wealth, location, gender, work status, and school attendance. Higher education levels were found in improved sources than in unimproved ones. The husband’s or partner’s region, income, age, and work status are all affected by the type of drinking water source. The number of children enrolled in a school also affects drinking water quality. The research highlights the necessity.

4.2. Magnitude of Source of Unimproved Water in Somaliland Health Demographic Survey

The analysis shows a concerning disparity in drinking water sources, where 53.88% of households use improved sources of drinking water, and 46.12% rely on unimproved sources, as depicted in Figure 1, indicating that a significant proportion of the population lacks access to safe drinking water.
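As a quick check, the two percentages above can be reproduced from the household counts listed in Table 1. The snippet below is only an arithmetic illustration; the variable names are hypothetical.

```python
# Household counts by drinking water source, taken from Table 1.
counts = {"Unimproved": 3848, "Improved": 4495}

total = sum(counts.values())
# Share of each source as a percentage of all households, to two decimals.
shares = {source: round(100 * n / total, 2) for source, n in counts.items()}

print(total, shares)
```

The counts sum to 8343 households, which matches the N reported in Table 3.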

4.3. Supervised Machine Learning Models

A thorough examination of the machine learning models used to predict the type of drinking water source (improved or unimproved) is presented in Table 2. Logistic regression, random forests, decision trees, support vector machines (SVMs), and K-nearest neighbors (KNNs) were assessed. The measures used to evaluate the performance of each model were accuracy, balanced accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), prevalence, detection rate, detection prevalence, and AUC.
Table 2 shows the five ML models, namely logistic regression (LR), random forests (RFs), decision trees (DTs), support vector machines (SVMs), and K-nearest neighbors (KNNs), which were applied to build a binary predictive model of the source of water. All predictive models were trained on 80% of the data and tested on the remaining 20%. The performance of the predictive models was evaluated and compared using the confusion matrix, accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC). In terms of overall accuracy, the RF model performed best (0.9357), followed by the DT model (0.8493) and the KNN model (0.7942). The LR and SVM models were the least accurate, at 0.6950 and 0.6843, respectively. In terms of the other metrics, the RF model was the most balanced in identifying the improved and unimproved classes, with the best sensitivity (0.9267) and a high specificity (0.9437). The KNN model had the highest PPV (0.9966) and the highest specificity (0.9951), that is, the lowest false-positive rate, but also the lowest NPV (0.6152). The RF model had the highest NPV (0.9345), followed by the DT model (0.8396).
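The metrics reported in Table 2 all follow from the four cells of a confusion matrix. The sketch below recomputes them for the logistic regression column, treating "Improved" as the positive class (an inference that is consistent with the reported values); the function and key names are hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the summary metrics of Table 2 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Logistic regression cells from Table 2 (rows: predicted, columns: actual).
lr = confusion_metrics(tp=822, fp=352, fn=411, tn=917)
for name, value in lr.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy is the mean of sensitivity and specificity, which gives 0.6946 for logistic regression, in agreement with Table 2.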

4.4. Feature Selection

The performance of the five machine learning models, evaluated using the area under the curve (AUC), demonstrated their ability to differentiate between classes. As reported in Table 2, the random forest (RF) model showed the highest AUC (0.980), indicating superior classification performance, followed by the support vector machine (SVM) model (0.896). In contrast, the K-nearest neighbor (KNN) model performed the worst, with the lowest AUC (0.728).
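For readers less familiar with the AUC, it can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it that way on a small made-up example; the labels and scores are illustrative only and are not taken from the survey data.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 = positive class, 0 = negative class.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value of 0.980 indicates very strong discrimination.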
In terms of feature importance, different models highlight various significant factors that influence the decision-making processes. For the logistic regression model, regional and district-level factors, such as the Sanaag, Sool, Lasanod, Burao, Buhoodle, Oodwayne, Berbera, Togdheer, Gabiley, and Waqooyi Galbeed regions, are critical. Wealth and age were significant predictors, whereas urban literacy, television access, and employment were less influential. By focusing on wealth and residence, while considering less significant factors such as education level and employment, the model’s performance can be optimized.
Similarly, the decision tree model’s feature importance analysis emphasizes the district and region as the most important factors, with wealth and age also being significant. Conversely, characteristics such as education, television, and literacy ranked lower. The focus on wealth and residence could further enhance the model’s accuracy, along with fine-tuning based on less significant features such as education and employment.
For the support vector machine (SVM) model, television-related variables emerged as the most influential factors. Wealth and residence were also important, whereas employment, education, and literacy were less important. Adjusting the model to prioritize wealth and residence while considering less significant factors can improve the overall performance.
The KNN algorithm’s feature importance analysis ranked district, area, and age as the most significant variables, followed by wealth, television, and residence. Less importance was placed on literacy, age, and educational attainment. By focusing on key predictors, such as district and age, the prediction accuracy of the KNN model can potentially be improved through further fine-tuning.
In summary, while different models prioritize different features, the common key predictors across most models include region, district, wealth, and age, suggesting that these variables are crucial for optimizing the classification performance in this study.

4.5. Model Evaluation and Selection

Many considerations must be made when choosing the best model for assessing and enhancing the quality of drinking water sources. This study evaluated the model’s goodness of fit using Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria assess how well a model explains the data while penalizing its complexity, that is, the number of estimated parameters. As shown in Table 3, the fitted model had an AIC of 9724.130 and a BIC of 10,033.410, both lower than those of the null model, suggesting a better fit to the data. Both criteria are computed from the model’s log-likelihood and its degrees of freedom; the BIC additionally depends on the number of observations (N = 8343).
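The AIC and BIC in Table 3 can be reproduced from the reported log-likelihood, degrees of freedom, and N using the standard definitions AIC = 2k − 2·ll and BIC = k·ln(N) − 2·ll. The sketch below does this; the function and variable names are hypothetical, and the likelihood-ratio statistic at the end is an added illustration that is not reported in the paper.

```python
import math

def aic(log_lik, k):
    """Akaike's Information Criterion for a model with k parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n_obs):
    """Bayesian Information Criterion; the penalty grows with ln(n_obs)."""
    return k * math.log(n_obs) - 2 * log_lik

# Values taken from Table 3.
LL_NULL = -5757.814
LL_MODEL = -4818.065
DF_MODEL = 44
N_OBS = 8343

model_aic = aic(LL_MODEL, DF_MODEL)
model_bic = bic(LL_MODEL, DF_MODEL, N_OBS)
# Illustration only: twice the log-likelihood gain over the null model.
lr_statistic = 2 * (LL_MODEL - LL_NULL)

print(round(model_aic, 3), round(model_bic, 3), round(lr_statistic, 3))
```

Because the BIC penalty per parameter is ln(8343), roughly 9.03, rather than 2, the BIC is larger than the AIC for the same model.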
The statistical findings of this study, which highlight disparities in access to drinking water sources, have significant implications for public health and policies. Communities that rely on unimproved water sources are at a greater risk of waterborne diseases, including diarrhea, cholera, and typhoid, which disproportionately affect children and vulnerable populations. These health outcomes underscore the urgent need for targeted public health interventions such as improving water quality and promoting hygiene practices. Policymakers should prioritize investments in infrastructure to ensure equitable access to safe water, particularly in underserved rural areas. Furthermore, the disparities revealed by the data indicate broader socioeconomic inequities, suggesting that interventions must address poverty and education. For example, increasing the awareness of the risks associated with unimproved water quality through public health campaigns could complement infrastructural improvements. By translating these findings into actionable policies, governments and stakeholders can better align with global development goals, such as the United Nations’ sustainable development goal 6, which aims to ensure the availability and sustainable management of water and sanitation.

5. Discussion

This study on the use of unimproved drinking water sources in Somaliland identifies the factors contributing to their use. It emphasizes the importance of clean water as a human right and highlights the UN’s programs to improve access. Factors such as socioeconomic status, education, and community-level infrastructure contribute to the use of unimproved water sources.
This study used the Somaliland Demographic and Health Survey (SDHS) 2020 dataset to analyze factors associated with unimproved water sources. Factors such as age, education level, residence, and school attendance were associated with unimproved water sources. The total number of children born is also linked to unimproved water sources.
Wealth index, number of household members, and employment status were significantly associated. The study concluded that policies should be developed to improve access to safe drinking water, enhance overall health, and achieve development goals.
Socioeconomic status is a key factor determining access to better drinking water. According to previous studies [28,29], people who come from lower socioeconomic backgrounds are more likely to depend on unimproved water sources because they have less access to infrastructure.
The use of unimproved water sources is substantially related to a variety of factors, including the wealth of the family, the number of individuals in the household, and job status. This is consistent with the results for other areas, such as those where socioeconomic inequalities lead to uneven access to water and sanitation services [30,31].
These inequalities not only impact the health outcomes of individuals but also contribute to the perpetuation of cycles of poverty and inequality within communities. The ramifications of these disparities are significant. The level of education available is a significant factor that affects the quality of water sources.
According to the SDHS 2020 data, higher levels of education are associated with a lower likelihood of using unimproved water sources. Studies have suggested that educated people are more aware of the health hazards associated with poor water quality and are better able to advocate for improved water services [32].
In addition, the availability of community-level infrastructure such as schools and health facilities may have an impact on water accessibility [28].
In rural regions, where access to better facilities is restricted, inadequate infrastructure often leads to dependence on unimproved sources.
This is particularly true in locations in which electricity is not readily available. Therefore, it is necessary to improve the educational opportunities and community infrastructure in Somaliland to encourage greater access to water and general health.
This study also underlines the significance of demographic parameters such as age and school attendance in connection with unimproved water sources. Younger individuals and those enrolled in school are more likely to be aware of the importance of access to clean water and sanitation. This knowledge has the potential to result in behavioral changes that encourage the use of better water sources [33].
Furthermore, there is a correlation between the total number of children born at home and the use of unimproved water sources, which suggests that larger families may have more difficulty gaining access to clean drinking water. This study highlights the need for comprehensive strategies to address the multidimensional nature of water access by considering the demographic changes and family arrangements regarding water availability.
In conclusion, research on unimproved drinking water sources in Somaliland provides significant insights into the social, educational, and infrastructural factors that determine access to water. The results highlight the need for policies that not only enhance access to clean drinking water but also address the underlying socioeconomic variables that lead to water insecurity. Therefore, these policies should be implemented immediately. Stakeholders can work toward attaining sustainable development goals and ensuring that clean water is available to everyone by concentrating on improving education, socioeconomic circumstances, and community infrastructure.
Addressing this multifaceted problem requires comprehensive policies aimed at improving access to clean water while tackling underlying inequalities. To that end, the government, in partnership with international organizations and the private sector, should prioritize:
  • Subsidized water infrastructure development for low-income households.
  • Education campaigns and school-based WASH programs to promote safe water practices.
  • Community-level investments in water supply systems, especially in rural areas.
  • Public–private partnerships for sustainable water infrastructure projects.
  • Family planning and water cooperatives to manage household water needs more efficiently.
By implementing these strategies, Somaliland can improve access to safe drinking water and progress toward achieving its development goals, ensuring that clean water becomes accessible to all its citizens.

6. Conclusions

The Somaliland Demographic and Health Survey 2020 revealed an association between the use of unimproved water sources and poor health outcomes. These findings suggest the need for targeted education campaigns to raise awareness of the risks of unimproved water sources and the benefits of improved ones.
The findings of this study have significant implications beyond the immediate access to clean drinking water. From a public health perspective, improving access to safe water is critical for reducing the incidence of waterborne diseases such as diarrhea and cholera, which are common in areas reliant on unimproved water sources. By ensuring that communities have access to clean water, health outcomes can be significantly improved, reducing the burden on healthcare systems and enhancing overall quality of life. Healthier populations are better able to engage in productive activities, which leads to improved educational outcomes for children and greater adult workforce participation.
Economically, access to clean water is a cornerstone of sustainable development. When communities, particularly in rural areas, are equipped with reliable water sources, they can dedicate more time and resources to economic activities rather than spending excessive hours collecting water from distant or unsafe sources. This shift enables greater participation in the labor force, boosts agricultural productivity, and stimulates local economies. Furthermore, the investment in water infrastructure creates job opportunities at both local and national levels in construction, maintenance, and water management.
On a broader scale, the findings directly align with the United Nations Sustainable Development Goals, particularly SDG 6, which seeks to “ensure availability and sustainable management of water and sanitation for all.” Addressing the inequities in water access in Somaliland contributes not only to achieving SDG 6 but also intersects with other SDGs, including SDG 3 (Good Health and Well-being), SDG 4 (Quality Education), and SDG 10 (Reduced Inequalities). Ensuring equitable access to clean water supports educational achievement by reducing absenteeism caused by water-related illnesses, fosters gender equality by alleviating the water collection burden often placed on women and girls, and promotes economic equity by improving livelihoods in underserved areas.
By addressing the challenges associated with unimproved water sources in Somaliland, this study contributes to a larger framework for public health improvements, economic development, and progress toward sustainable development.
Policymakers, water authorities, and community organizations must leverage these findings to design and implement comprehensive water access initiatives that promote health, economic resilience, and sustainable growth for all communities.

7. Future Research

Building on the findings of this study, future research should explore the long-term health impacts of reliance on unimproved drinking water sources. While this study identified socioeconomic and geographical disparities in water access, further investigation is needed to quantify direct and indirect health consequences, such as chronic illnesses and the economic burden on families and healthcare systems. Longitudinal studies that track health outcomes in populations with limited access to clean water over time could provide deeper insights into the relationship between water access and public health, particularly among vulnerable groups such as children and the elderly.
Future research should also examine the effectiveness of policy interventions aimed at improving water access and quality. While this study highlights the need for infrastructural investment and public health campaigns, it remains unclear which specific strategies are most effective in different contexts. Comparative studies that assess the outcomes of various intervention approaches, such as community-based water management, government-led infrastructure projects, and public–private partnerships, would provide valuable evidence for policymakers. Such studies could also explore the role of local governance and community engagement in ensuring the sustainability of water-related projects.
Finally, future research should investigate the role of climate change in exacerbating the disparities in water access. As climate change continues to affect water availability, particularly in arid regions such as Somaliland, it is crucial to understand how changing weather patterns, droughts, and water scarcity might worsen existing inequities. Studies that model future water availability under different climate scenarios and assess the resilience of water infrastructure could help inform adaptive policies that prepare communities for future challenges. Understanding these dynamics is critical for developing long-term solutions to ensure equitable access to safe drinking water under a changing climate.

Author Contributions

Conceptualization, H.M.I. and M.A.H.; methodology, H.M.I. and M.A.H.; software, M.A.H.; validation, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; formal analysis, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; investigation, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; resources, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; data curation, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; writing—original draft preparation, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; writing—review and editing, H.M.I., Y.H.M., S.N. and A.H.M.; visualization, H.M.I., M.A.H., Y.H.M., S.N. and A.H.M.; supervision, M.A.H. and S.N.; project administration, H.M.I. and M.A.H.; funding acquisition, H.M.I. and M.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study utilized publicly available secondary data, which can be accessed at no cost through the website https://microdata.nbs.gov.so/index.php/catalog/50 (accessed on 7 July 2024).

Data Availability Statement

All raw data sources are available upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Addisie, M.B. Evaluating Drinking Water Quality Using Water Quality Parameters and Esthetic Attributes. Air Soil Water Res. 2022, 15, 117862212210750.
  2. World Health Organization; UNICEF. Progress on Sanitation and Drinking-Water; World Health Organization: Geneva, Switzerland, 2013.
  3. Azanaw, J.; Abera, E.; Malede, A.; Endalew, M. A multilevel analysis of improved drinking water sources and sanitation facilities in Ethiopia: Using 2019 Ethiopia mini demographic and health survey. Front. Public Health 2023, 11, 1063052.
  4. Andualem, Z.; Dagne, H.; Azene, Z.N.; Taddese, A.A.; Dagnew, B.; Fisseha, R.; Muluneh, A.G.; Yeshaw, Y. Households access to improved drinking water sources and toilet facilities in Ethiopia: A multilevel analysis based on 2016 Ethiopian Demographic and Health Survey. BMJ Open 2021, 11, e042071.
  5. The Somaliland Health and Demographic Survey 2020. Available online: www.somalilandmohd.com (accessed on 22 September 2024).
  6. Yusuf, A.M. Assessment of the Bacteriological, the Physicochemical Chemical Qualities of Drinking Water in Hargeisa, Somaliland. Master’s Thesis, University of Nairobi, Nairobi, Kenya, 2022.
  7. Damtew, Y.T.; Geremew, A. Households with Unimproved Water Sources in Ethiopia: Spatial Variation and Point-of-Use Treatment Based on 2016 Demographic and Health Survey. Environ. Health Prev. Med. 2020, 25, 81.
  8. Kassie, A.W.; Mengistu, S.W. Spatiotemporal Analysis of the Proportion of Unimproved Drinking Water Sources in Rural Ethiopia: Evidence from Ethiopian Socioeconomic Surveys (2011 to 2019). J. Environ. Public Health 2022, 2022, 2968756.
  9. Aragaw, F.M. Unimproved Source of Drinking Water and Its Associated Factors: A Spatial and Multilevel Analysis of Ethiopian Demographic and Health Survey. BMC Public Health 2023, 23, 1455.
  10. Morakinyo, O.M.; Adebowale, A.S.; Obembe, T.; Oloruntoba, E.O. Association Between Household Environmental Conditions and Nutritional Status of Women of Childbearing Age in Nigeria. PLoS ONE 2020, 15, e0243356.
  11. Afrifa-Anane, G.F.; Kyei-Arthur, F.; Agyekum, M.W.; Afrifa, E. Factors Associated with Comorbidity of Diarrhoea and Acute Respiratory Infections Among Children Under Five Years in Ghana. PLoS ONE 2022, 17, e0271685.
  12. Amadu, I.; Seidu, A.-A.; Agyemang, K.K.; Arthur-Holmes, F.; Duku, E.; Salifu, I.; Bolarinwa, O.A.; Hagan, J.E., Jr.; Ahinkorah, B.O. Joint Effect of Water and Sanitation Practices on Childhood Diarrhoea in Sub-Saharan Africa. PLoS ONE 2023, 18, e0283826.
  13. Manalew, W.S.; Tennekoon, V. Dirty Hands on Troubled Waters: Sanitation, Access to Water and Child Health in Ethiopia. Rev. Dev. Econ. 2019, 23, 1800–1817.
  14. Bain, R.; Cronk, R.; Wright, J.; Yang, H.; Slaymaker, T.; Bartram, J. Fecal contamination of drinking-water in low- and middle-income countries: A systematic review and meta-analysis. PLoS Med. 2014, 11, e1001644.
  15. Mogasale, V.; Ramani, E.; Park, J.Y.; Wierzba, T.F. Estimating Typhoid Fever Risk Associated with Lack of Access to Safe Water: A Systematic Literature Review. J. Environ. Public Health 2018, 2018, 9589208.
  16. Muhumed, O.A. College of Urban Deveolpment and Engineering Department of Environment and Climate Change Management: Assessing Water Supply Challenges and Sanitation, in the Case of Hargeisa City; in Somaliland. Master’s Thesis, Ethiopian Civil Service University, Addis Abeba, Ethiopia, 2020.
  17. Yu, W.; Wardrop, N.A.; Bain, R.; Alegana, V.A.; Graham, L.J.; Wright, J.A. Mapping Access to Domestic Water Supplies from Incomplete Data in Developing Countries: An Illustrative Assessment for Kenya. PLoS ONE 2019, 14, e0216923.
  18. Van Buuren, S. Flexible Imputation of Missing Data; CRC Press: Boca Raton, FL, USA, 2018.
  19. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2019; Volume 793.
  20. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
  21. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 2007.
  22. Sud, K.; Erdogmus, P.; Kadry, S. Introduction to Data Science and Machine Learning; BoD–Books on Demand: Norderstedt, Germany, 2020.
  23. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  24. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
  25. Wieczorek, J.; Lei, J. Model selection properties of forward selection and sequential cross-validation for high-dimensional regression. Can. J. Stat. 2022, 50, 454–470.
  26. Iacucci, M.; McQuaid, K.; Gui, X.S.; Iwao, Y.; Lethebe, B.C.; Lowerison, M.; Matsumoto, T.; Shivaji, U.N.; Smith, S.C.L.; Subramanian, V.; et al. A multimodal (FACILE) classification for optical diagnosis of inflammatory bowel disease associated neoplasia. Endoscopy 2019, 51, 133–141.
  27. Rainio, O.; Teuho, J.; Klén, R. Evaluation metrics and statistical tests for machine learning. Sci. Rep. 2024, 14, 6086.
  28. Angoua, E.L.E.; Dongo, K.; Templeton, M.R.; Zinsstag, J.; Bonfoh, B. Barriers to Access Improved Water and Sanitation in Poor Peri-Urban Settlements of Abidjan, Côte D’Ivoire. PLoS ONE 2018, 13, e0202928.
  29. Oskam, M.J.; Pavlova, M.; Hongoro, C.; Groot, W. Socio-Economic Inequalities in Access to Drinking Water Among Inhabitants of Informal Settlements in South Africa. Int. J. Environ. Res. Public Health 2021, 18, 10528.
  30. Deshpande, A.; Miller-Petrie, M.K.; Lindstedt, P.A.; Baumann, M.M.; Johnson, K.B.; Blacker, B.F.; Abbastabar, H.; Abd-Allah, F.; Abdelalim, A.; Abdollahpour, I.; et al. Mapping Geographical Inequalities in Access to Drinking Water and Sanitation Facilities in Low-Income and Middle-Income Countries, 2000–2017. Lancet Glob. Health 2020, 8, e1162–e1185.
  31. Pullan, R.L.; Freeman, M.C.; Gething, P.W.; Brooker, S. Geographical Inequalities in Use of Improved Drinking Water Supply and Sanitation Across Sub-Saharan Africa: Mapping and Spatial Analysis of Cross-Sectional Survey Data. PLoS Med. 2014, 11, e1001626.
  32. Hutton, G.; Chase, C. The Knowledge Base for Achieving the Sustainable Development Goal Targets on Water Supply, Sanitation and Hygiene. Int. J. Environ. Res. Public Health 2016, 13, 536.
  33. Rupani, M.P.; Trivedi, A.V.; Singh, M.P.; Tundia, M.N.; Patel, K.N.; Parikh, K.D.; Parmar, V.B. Socio-Demographic, Epidemiological and Environmental Determinants of Acute Gastroenteritis in Western India. J. Nepal Med. Assoc. 2016, 54, 8–16.
Figure 1. Sources of drinking water results.
Table 1. Univariate analysis of individual- and community-level Somaliland, SDHS 2020.
| Variable | Category | Frequency (n) | Percentage (%) |
| --- | --- | --- | --- |
| Source of drinking water | Unimproved | 3848 | 46.12 |
| | Improved | 4495 | 53.88 |
| Gender | Male | 5141 | 61.62 |
| | Female | 3202 | 38.38 |
| Highest education level | No education | 7350 | 88.10 |
| | Primary education | 870 | 10.43 |
| | Secondary education | 123 | 1.47 |
| Frequency of watching television | At least once a week | 953 | 11.42 |
| | Less than once a week | 195 | 2.34 |
| | Not at all | 7195 | 86.24 |
| Respondent worked in last 12 months | Yes | 59 | 0.71 |
| | No | 8284 | 99.29 |
| Age in 5-year groups | 15–19 | 104 | 1.25 |
| | 20–24 | 601 | 7.20 |
| | 25–29 | 1520 | 18.22 |
| | 30–34 | 1715 | 20.56 |
| | 35–39 | 2126 | 25.48 |
| | 40–44 | 1454 | 17.43 |
| | 45–49 | 823 | 9.86 |
| Region | Awdal | 1175 | 14.08 |
| | Woqooyi-galbeed | 1961 | 23.50 |
| | Togdheer | 1599 | 19.17 |
| | Sool | 1840 | 22.05 |
| | Sanaag | 1768 | 21.19 |
| Type of place of residence | Rural | 4275 | 51.24 |
| | Urban | 4068 | 48.76 |
| District | Borama | 1175 | 14.08 |
| | Waqooyi galbed | 1961 | 23.50 |
| | Buro | 1599 | 19.17 |
| | Lasanod | 1840 | 22.05 |
| | Erigavo | 1768 | 21.19 |
| Literacy | Cannot read at all | 7006 | 83.9 |
| | Able to read only part of the sentence | 755 | 9.0 |
| | Able to read whole sentence | 522 | 6.25 |
| | No card with available language | 41 | 0.49 |
| | Blind/Visually impaired | 19 | 0.23 |
| Wealth index combined | Lowest | 3455 | 41.41 |
| | Second | 1248 | 14.96 |
| | Middle | 828 | 9.92 |
| | Fourth | 1085 | 13.00 |
| | Fifth | 1727 | 20.70 |
| Husband/partner ever attended school | 0 | 5941 | 71.21 |
| | Yes | 918 | 11.00 |
| | No | 689 | 8.26 |
| | 3 | 456 | 5.47 |
| | Don’t Know | 339 | 4.06 |
| Husband/partner worked in the last 12 months | Yes | 3586 | 42.98 |
| | No | 4722 | 56.60 |
| | Don’t Know | 35 | 0.42 |
Table 2. Machine learning models.
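| Predicted | LR: Improved | LR: Unimproved | RF: Improved | RF: Unimproved | DT: Improved | DT: Unimproved | SVM: Improved | SVM: Unimproved | KNN: Improved | KNN: Unimproved |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Improved | 822 | 352 | 2531 | 143 | 1010 | 164 | 706 | 468 | 1170 | 4 |
| Unimproved | 411 | 917 | 134 | 3033 | 213 | 1115 | 322 | 1006 | 511 | 817 |

| Metric | Logistic Regression | Random Forest | Decision Tree | SVM | KNN |
| --- | --- | --- | --- | --- | --- |
| Accuracy | 0.6950 | 0.9357 | 0.8493 | 0.6843 | 0.7942 |
| Sensitivity | 0.6667 | 0.9267 | 0.8258 | 0.6868 | 0.6960 |
| Specificity | 0.7226 | 0.9437 | 0.8718 | 0.6825 | 0.9951 |
| Pos pred value | 0.7002 | 0.9370 | 0.8603 | 0.6014 | 0.9966 |
| Neg pred value | 0.6905 | 0.9345 | 0.8396 | 0.7575 | 0.6152 |
| Prevalence | 0.4928 | 0.4744 | 0.4888 | 0.4109 | 0.6719 |
| Detection rate | 0.3285 | 0.4396 | 0.4037 | 0.2822 | 0.4676 |
| Detection prevalence | 0.4692 | 0.4692 | 0.4692 | 0.4692 | 0.4692 |
| Balanced accuracy | 0.6946 | 0.4692 | 0.4692 | 0.4692 | 0.4692 |
| AUC | 0.759 | 0.980 | 0.758 | 0.896 | 0.728 |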
Table 3. Akaike’s Information Criterion and Bayesian Information Criterion.
| N | ll (Null) | ll (Model) | Df | AIC | BIC |
| --- | --- | --- | --- | --- | --- |
| 8343 | −5757.814 | −4818.065 | 44 | 9724.130 | 10,033.410 |
Note: BIC uses N = number of observations.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ismail, H.M.; Muse, A.H.; Hassan, M.A.; Muse, Y.H.; Nadarajah, S. Analyzing Unimproved Drinking Water Sources and Their Determinants Using Supervised Machine Learning: Evidence from the Somaliland Demographic Health Survey 2020. Water 2024, 16, 2986. https://doi.org/10.3390/w16202986
