Analysis of Residents’ Understanding of Encroachment Risk to Water Infrastructure in Makause Informal Settlement in the City of Ekurhuleni

Ndawo, Mpondomise Nkosinathi; Dzansi, Dennis; Tangwe, Stephen Loh

doi:10.3390/urbansci9080294

by Mpondomise Nkosinathi Ndawo ¹,*, Dennis Dzansi ² and Stephen Loh Tangwe ³

¹ Management College of Southern Africa, MANCOSA, Research Directorate, 26 Samora Machel Street, Durban 4001, South Africa

² CUT Entrepreneurship Development Unit, Faculty of Management Sciences, Central University of Technology, Bloemfontein 9301, South Africa

³ Resources and Operations Division, Central University of Technology, Bloemfontein 9301, South Africa

* Author to whom correspondence should be addressed.

Urban Sci. 2025, 9(8), 294; https://doi.org/10.3390/urbansci9080294

Submission received: 7 June 2025 / Revised: 3 July 2025 / Accepted: 16 July 2025 / Published: 29 July 2025

Abstract

This study investigates encroachment risk in the Makause informal settlement by analysing resident survey data to identify key contributing factors and build predictive models. Encroachment threatens water infrastructure through damage, contamination, and service disruptions, highlighting the need for informed, community-based planning. Data were collected from 105 residents, with responses (“Yes,” “No,” “Unsure”) analysed using descriptive statistics and a one-way ANOVA to identify significant differences across categories. The ReliefF algorithm was used to rank the importance of features predicting encroachment risk. These inputs were then used to train, validate, and test an Artificial Neural Network (ANN) model. The ANN demonstrated high predictive accuracy, achieving correlation coefficients above 0.95 and low mean squared errors. The ANOVA identified statistically significant mean differences for selected variables, while ReliefF helped determine the most influential predictors. A high agreement level (p > 0.900) between predicted and actual responses confirmed the model’s validity. This research introduces an innovative, data-driven framework that integrates machine learning and statistical analysis to support municipalities and utility providers in engaging informal communities to protect infrastructure. While this study is limited to Makause and may be affected by self-reporting bias, it demonstrates the potential of Artificial Neural Networks and ReliefF to enhance risk analysis and infrastructure management in informal settlements.

Keywords:

encroachment; informal settlements; water infrastructure; ReliefF; artificial neural network; infrastructure risk

1. Introduction

The effective management of the water supply infrastructure remains fundamental to sustainable urban development, particularly in the context of rapid urbanisation and the proliferation of informal settlements in South African cities [1]. Ref. [2] emphasises that addressing the spatial complexity of African urban growth requires a recognition of the significant human impacts of the Technocene era on infrastructure systems. Recent studies, such as those by [3,4], highlight the increasing importance of community-based hazard models in informal settlements and underscore the need for participatory approaches that build local awareness and resilience. This study contributes to this growing discourse by demonstrating how Artificial Neural Network (ANN) and ReliefF-based machine learning pipelines can complement and enhance traditional spatial tools, providing communities with an adaptive and cost-effective solution for predicting encroachment risks and designing more effective awareness campaigns.

The management of the bulk water infrastructure is a critical aspect of urban resilience, particularly as cities expand and informal settlements increasingly encroach upon essential services. Encroachment presents significant threats to water supply systems, elevating the risks of infrastructural damage, contamination, and service disruptions. Given the rapid expansion of informal settlements, traditional monitoring and protection methods have proven inadequate. Recent advancements in machine learning (ML) and Artificial Neural Networks offer innovative solutions for the proactive management of bulk water systems. These technologies excel at analysing complex data from informal settlements, detecting patterns overlooked by conventional methods, and supporting real-time decision-making and infrastructure planning.

Ref. [4] observed that between 2020 and 2022, the KwaZulu-Natal province and eThekwini Municipality in South Africa experienced a series of disasters, including catastrophic flooding, widespread looting, and the impact of the COVID-19 pandemic. These crises had profound and multi-dimensional consequences for low-income communities, particularly informal settlements such as Abahlali Basemjondolo. The study further noted that municipal officials, provincial authorities, social workers, academics, and practitioners collaborated with a community engagement research centre to provide psychosocial support to affected residents. These findings highlight the increased risks and vulnerabilities faced by informal settlement communities, largely due to the lack of formal spatial planning and the establishment of dwellings in flood-prone areas, which threaten both property and human life.

Historically, the literature on water infrastructure management focused predominantly on technical aspects such as system design and maintenance. Ref. [5], for instance, underscored the importance of engineering solutions for water distribution networks, while [6] advanced hydraulic modelling techniques to optimise the water flow and energy consumption in bulk water systems. These foundational studies highlighted the necessity of reliable and efficient systems to meet the demands of growing urban populations.

However, classical approaches often overlooked the socio-political and human dimensions of water management, particularly the challenges posed by informal settlements. While [7] advocated for integrating water management with broader social and environmental policies, specific frameworks for addressing infrastructure encroachment remained underdeveloped.

More recent perspectives stress the significance of urban resilience, participatory planning, and socio-technical systems, especially in cities like Ekurhuleni where informal settlements place additional pressure on water supply networks. Ref. [8] presented a comprehensive framework for conceptualising urban infrastructure resilience, emphasising the integration of physical assets, institutional capacity, and community engagement. Additionally, Ref. [9] demonstrated how participatory and inclusive neighbourhood planning can mitigate the negative impacts of informal settlements by aligning the infrastructure provision with local needs and behaviours. Together, these studies highlight the importance of incorporating socio-technical considerations and participatory governance into the planning and protection of critical water supply infrastructure in rapidly urbanising contexts.

Earlier methods for water infrastructure management were constrained by technological limitations. Inspections were manual and maintenance followed fixed schedules rather than data-driven insights, often leading to reactive rather than preventative measures. This approach struggled to address challenges such as leaks, contamination, and damage, particularly in areas impacted by informal settlement growth, underscoring the need for more sophisticated monitoring tools and methodologies.

Traditionally, statistical models such as the survival data analysis and Poisson regression were employed to predict water infrastructure failures. Ref. [10] improved upon earlier models by introducing a survival analysis to account for right-censored data, but these approaches still fell short in capturing the complex, non-linear interactions associated with the encroachment by informal settlements. Geographic Information Systems (GISs) have also been widely used, focusing on variables like land use, population growth, and the infrastructure age [11]. Although a GIS improves infrastructure management by pinpointing problem areas, it typically does not account for human-induced pressures such as encroachment.

Hydraulic models, as discussed by [12], were designed to calibrate pipe characteristics to withstand soil pressure but did not consider human interventions like illegal structures built on servitudes. Similarly, Decision Support Systems (DSSs), examined by [13], focused primarily on technical factors such as the drainage density, slope, and land use, neglecting the impact of informal construction on flood risks. Ref. [14] noted that while spatial approaches have evolved with technological advancements, many studies remain limited due to the lack of detailed, granular data, particularly at the local level.

Poisson regression models have been employed to model failure rates based on factors such as the pipe material and diameter, but they often exclude relevant explanatory variables like the soil type and weather conditions [15]. In contrast, modern AI and machine learning techniques offer the capability to simulate and analyse complex, non-linear relationships between multiple variables, providing a more holistic understanding of risk factors. Artificial Neural Networks, in particular, excel at handling large datasets and non-linear dependencies, making them highly effective for water infrastructure management.

Recent advancements demonstrate the increasing adoption of machine learning and Artificial Neural Networks for water management tasks such as demand forecasting, leakage detection, and system optimisation. Ref. [16] developed a machine learning model that accurately predicts the urban water demand using historical, weather, and socio-economic data. Ref. [17] applied machine learning to detect leaks in distribution networks using sensor data, achieving a high accuracy and enabling faster repairs. However, these studies often focus on technical aspects and overlook human and community factors, which are crucial for successful implementation [14].

Artificial Neural Networks have also proven effective in predicting pipe failures and optimising distribution systems. Ref. [18] demonstrated that an Artificial Neural Network could accurately estimate pipeline failure probabilities based on historical condition data. Ref. [19] used an Artificial Neural Network to optimise pump schedules and valve settings, reducing energy consumption while maintaining adequate pressure, which is a crucial consideration for resource-constrained municipalities such as the City of Ekurhuleni.

Despite these advances, a notable gap persists regarding the integration of community and human factors into machine learning and Artificial Neural Network frameworks. Many studies fail to address challenges unique to informal settlements, such as poor infrastructure documentation and limited community engagement [11,12,13,14]. This gap underscores the novelty of the present study, which seeks to develop a human-centred Artificial Neural Network and machine learning framework to predict and mitigate encroachment risks in informal settlements.

Addressing this gap requires research that integrates social, economic, and cultural variables into predictive models and actively involves communities in data collection and monitoring. Participatory machine learning offers a promising solution, whereby residents contribute local knowledge and data, improving the model accuracy and fostering community trust. For example, residents of Makause Ward 91, Germiston, could be trained to monitor water infrastructure encroachment and report issues, supplying real-time data to continuously refine predictive models. This collaborative approach empowers both municipalities and communities to protect vital infrastructure more effectively.

2. The Aim of the Study

To model and analyse residents’ understanding of encroachment risks to water infrastructure in the Makause informal settlement using an Artificial Neural Network (ANN), to identify key influencing factors and support data-driven risk mitigation strategies.

3. Objectives

The primary objectives of this study include the following:

To design and train an Artificial Neural Network (ANN) model that predicts residents’ level of understanding of encroachment risks to water infrastructure.
To evaluate the predictive accuracy and performance of the Artificial Neural Network model in classifying residents’ understanding of encroachment risks.
To interpret the relative importance of input variables in shaping resident awareness and risk perception using Artificial Neural Network techniques.
To provide data-driven insights to policymakers and urban planners for targeted awareness campaigns and infrastructure protection in informal settlements.

4. Research Questions

To what extent can an Artificial Neural Network (ANN) accurately model and predict residents’ understanding of encroachment risks?
How can insights from the Artificial Neural Network analysis inform risk communication and urban planning strategies for informal settlements?

These questions aim to explore the feasibility of a comprehensive, practical model that can be implemented by local authorities and stakeholders involved in urban planning and infrastructure management.

5. Research Methodology

A mixed methods research design, which integrates both philosophical assumptions and practical research methods, was applied in this study to comprehensively investigate the multifaceted problem of water infrastructure encroachment and degradation caused by informal settlements [20]. By integrating quantitative and qualitative methods, this study captures not only statistical trends but also the contextual insights needed to develop a robust and practical management framework. The approach draws on the philosophical paradigms of post-positivism, which emphasises objective measurement and hypothesis testing, and interpretivism, which focuses on understanding human experiences and social phenomena in context [21]. By merging quantitative and qualitative data collection and analysis strategies, mixed methods research offers a more comprehensive understanding of complex issues that may not be adequately captured by a single method alone [22].

The quantitative component enabled the collection of structured data from a broad group of stakeholders, including residents of informal settlements from the Ekurhuleni municipality and Rand Water. Furthermore, this allowed the researcher to measure the extent of the infrastructure degradation and identify patterns and relationships that can serve as a basis for strategic interventions. Conversely, the qualitative component enabled in-depth interviews and observations that revealed the lived experiences, socio-economic drivers, and governance challenges associated with the encroachment of water servitudes. This dual approach was important to capture both the statistical and contextual dimensions of the problem.

In addition, the triangulation of data, one of the strengths of mixed methods, increases the validity of results by comparing evidence from multiple sources [23]. This is crucial in this study, where stakeholder perspectives, spatial data, and policy documents are interwoven to formulate a robust and actionable management framework. Ref. [24] asserts that triangulation not only strengthens conclusions but also promotes richer interpretations that can inform both theory and practice. Ultimately, the use of mixed methods in this study ensured a balanced approach to understanding both the quantifiable impacts of informal settlement growth and the qualitative factors that influence the community behaviour, administrative capacity, and infrastructure vulnerability. This comprehensive approach supports the development of actionable and context-sensitive strategies to protect critical water infrastructure from future encroachment.

6. Equations and Models Used

The mathematical formulae used for the sample size calculation, correlation analysis, ReliefF feature ranking, neural network training, and statistical significance testing are presented in Appendix A. The main text focuses on the logical flow of the methods and their application to the encroachment risk assessment.

(i) Sample determination:

The sample size was calculated using standard formulae for unknown populations based on the methods of [24,25]. To ensure statistical reliability, a confidence level of 95% and an acceptable margin of error were applied (see Appendix A for formulae).
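The formulae themselves are given in Appendix A and are not reproduced in this section. As a hedged illustration only, one widely used expression for an unknown (large) population is Cochran's formula, n = z²p(1 − p)/e². The Python sketch below assumes that form; the function name `cochran_sample_size`, the proportion p = 0.5, and the margins of error are illustrative assumptions and are not values reported by the study.

```python
import math


def cochran_sample_size(z: float, p: float, e: float) -> int:
    """Minimum sample size for an unknown (large) population, rounded up.

    z: z-score for the chosen confidence level (1.96 for 95%).
    p: assumed proportion (0.5 is the most conservative choice).
    e: acceptable margin of error, as a fraction.
    """
    if not 0.0 < p < 1.0 or e <= 0.0:
        raise ValueError("p must lie in (0, 1) and e must be positive")
    return math.ceil(z * z * p * (1.0 - p) / (e * e))


# Illustrative call only: the margin of error here is an assumption.
n_example = cochran_sample_size(z=1.96, p=0.5, e=0.10)
```

Tightening the assumed margin of error from 10% to 5% roughly quadruples the required sample, which is why the margin chosen matters as much as the confidence level.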

(ii) Correlation and Model Accuracy:

Correlation coefficients, determination coefficients (R²), and the root mean square error (RMSE) were calculated to assess relationships between variables and the accuracy of predictive models. The detailed equations used in these calculations are provided in Appendix A.
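For readers without access to Appendix A, the three accuracy measures can be sketched with their textbook definitions. The Python functions below are an illustration under that assumption, not the authors' MATLAB code; the `actual` and `predicted` lists are made-up values used only to exercise the functions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, not study data.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
scores = (pearson_r(actual, predicted), r_squared(actual, predicted),
          rmse(actual, predicted))
```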

(iii) ReliefF Algorithm:

The ReliefF algorithm was used to identify the most important predictors of encroachment risk. In simple terms, ReliefF updates the importance of each factor based on how well it distinguishes between different response categories. The core logic is as follows:

- If two neighbouring observations belong to the same group, a factor's importance score decreases in proportion to how much that factor differs between them.
- If they belong to different groups, the importance score increases in proportion to that difference.
Detailed update and distance equations are presented in Appendix A.
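The update rule can be made concrete with a small sketch. The Python function below is a simplified ReliefF written for illustration; it is not the MATLAB implementation used in the study, and the equations in Appendix A remain authoritative. The function name `relieff_weights`, the Manhattan distance on range-normalised features, and the toy data are all assumptions.

```python
def relieff_weights(X, y, k=1):
    """Simplified ReliefF weights for numeric features.

    X: list of feature rows; y: list of class labels; k: neighbours per class.
    """
    n, d = len(X), len(X[0])
    # Feature ranges normalise each per-feature difference into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    priors = {c: y.count(c) / n for c in set(y)}
    weights = [0.0] * d
    for i in range(n):
        others = [m for m in range(n) if m != i]
        # Nearest hits: closest observations from the same class.
        hits = sorted((m for m in others if y[m] == y[i]),
                      key=lambda m: distance(X[i], X[m]))[:k]
        if hits:
            for j in range(d):
                # Same class: a differing feature value lowers the weight.
                penalty = sum(diff(j, X[i], X[m]) for m in hits)
                weights[j] -= penalty / (len(hits) * n)
        for c, prior in priors.items():
            if c == y[i]:
                continue
            # Nearest misses: closest observations from another class.
            misses = sorted((m for m in others if y[m] == c),
                            key=lambda m: distance(X[i], X[m]))[:k]
            scale = prior / (1.0 - priors[y[i]])
            for j in range(d):
                # Different class: a differing feature value raises the weight.
                reward = sum(diff(j, X[i], X[m]) for m in misses)
                weights[j] += scale * reward / (len(misses) * n)
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.4]]
y = ["No", "No", "No", "Yes", "Yes", "Yes"]
weights = relieff_weights(X, y, k=1)
```

On this toy data the separating feature receives a positive weight and the noise feature a lower one, which is the behaviour the two rules above are meant to produce.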

(iv) Neural Network Transfer Function:

A logistic sigmoid transfer function was used in the neural network model to predict the risk of encroachment. The Levenberg–Marquardt algorithm was used for training. The complete mathematical expressions can be found in Appendix A.
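As a brief illustration, the logistic sigmoid maps any weighted sum to a value between 0 and 1 using f(z) = 1/(1 + e^(−z)). The Python sketch below shows that function applied across one fully connected layer; the helper name `hidden_layer` and the example weights are hypothetical, and Levenberg–Marquardt training is deliberately not shown.

```python
import math


def logistic_sigmoid(z: float) -> float:
    """Logistic sigmoid f(z) = 1 / (1 + exp(-z))."""
    # Branching on the sign avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hidden_layer(inputs, weights, biases):
    """Outputs of one fully connected layer with sigmoid activation."""
    return [
        logistic_sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical two-neuron layer fed with three inputs.
activations = hidden_layer(
    inputs=[1.0, 0.0, 2.0],
    weights=[[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]],
    biases=[0.0, 0.1],
)
```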

(v) Statistical Significance:

To determine significance, the p-values were calculated based on the t-statistics derived from the correlation coefficients. Details on the calculation of the t-values and the degrees of freedom can be found in Appendix A.
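A hedged sketch of this step, assuming the standard test for a Pearson correlation, is t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The Python code below uses only the standard library and obtains the two-sided p-value by numerically integrating the Student's t density; the function names are illustrative, and the result loses precision for extremely small p-values.

```python
import math


def t_from_r(r: float, n: int):
    """Return (t statistic, degrees of freedom) for a Pearson correlation r."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value from Student's t via Simpson's rule (steps even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central_area = total * h / 3.0  # area under the density on [0, |t|]
    return max(0.0, 1.0 - 2.0 * central_area)


# Hypothetical example: r = 0.5 observed on n = 12 pairs.
t_stat, dof = t_from_r(0.5, 12)
p_value = two_sided_p(t_stat, dof)
```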

7. Workplace Site Exposure Sample Determination

A subsample of 15 of the 105 residents was used for the experimental exposure (pilot) measurement within the risk analysis framework for water infrastructure encroachment by informal settlements in Makause, Ward 91, Primrose Township, Germiston. This measurement also served to verify the data obtained from the questionnaires and to improve the questionnaire where shortcomings, questions not fully adapted to the study area, and minor spelling errors were identified during the experimental exposure with both samples. The parameters considered in this study were aligned with a comprehensive framework for risk analysis and management. These included the natural environment, vulnerabilities, causes and causalities, organisational planning, infrastructure protection, community welfare and support, resource security, awareness and education, communication strategies, and resettlement planning. This holistic approach enabled the identification and assessment of multiple risk factors contributing to encroachment and supported the development of targeted mitigation and management strategies related to informal settlements. Both random and purposive sampling were used in this study.

7.1. Data Collection Methods

The data collection technique combined qualitative and quantitative methods [26]. Qualitative methods, consisting of field observations, interviews, and informal conversations, were used to collect primary data, while questionnaires and interviews were developed and used to obtain quantitative data. An interview guide was drawn up to ensure that the required data were collected from all 105 residents who took part in the interview process. Refs. [27,28] referred to questionnaires as “a survey instrument with questions that the subjects are meant to answer”. A list of questions was developed and compiled into a questionnaire in accordance with the objectives of this study and the input parameters. The input parameters and questionnaire were coded and divided into four sections based on the study. The questionnaire used a three-point Likert scale with 1 = “Yes”, 2 = “No”, and 3 = “Unsure”. The answers were added together to obtain a score for the measures to be evaluated. The questionnaires were distributed to 105 local residents to gather information about living conditions, demographics, health, and infrastructural needs.

7.2. Ethics and Community Positioning

This study followed strict ethical protocols to protect the rights and privacy of participants, especially given the sensitivity of working with vulnerable communities in the informal settlement areas of the City of Ekurhuleni. Before commencing this study, the authors obtained ethical approval from the ethics committee of the Management College of Southern Africa (MANCOSA), in accordance with the institutional and national guidelines of the South African Department of Higher Education and Training. Informed consent was also obtained using a consent form that was presented to all participants. The form explained the purpose and scientific nature of this study, emphasised the voluntary nature of participation, and guaranteed the anonymity and confidentiality of personal data. These measures were implemented in line with the provisions of the South African Protection of Personal Information (POPI) Act. In addition, community involvement was an integral part of the research design, ensuring that local perspectives were respected and that participation was conducted in a culturally sensitive and ethically responsible manner.

7.3. Sampling Procedure and Possible Biases

The sample of 105 households was randomly drawn based on the willingness and availability of residents to participate during the data collection period in Ward 91. Whilst this method provided convenient access to respondents, it may also introduce a selection bias, as households with some level of familiarity or interest in infrastructure issues may have been more willing to participate. Therefore, the results should be interpreted with caution and may not represent the entire population of informal settlements.

7.4. Pilot Testing and the Validation of the Scale

Before the main survey, a pretest of the questionnaire and interview schedule was conducted to ensure the clarity and validity of the three-point Likert scale used to measure residents’ perceptions. The pretest involved 15 randomly selected residents of the Makause informal settlement. This pretest was used to assess the content validity (confirming that all relevant topics were covered), internal validity (ensuring that the questions were in line with the research objectives), and external validity (assessing the potential for generalisation). Thanks to the feedback from the pilot test, the questionnaire was refined to remove ambiguities and increase the reliability of the data collection instruments for the main study.

8. Validity and Reliability

The term validity pertains to how accurately and truthfully the results represent the facts, as well as to the authenticity of the methods employed. In contrast, reliability refers to the consistency and repeatability of the analytical techniques utilised [29]. The researcher ensured the validity and reliability of the research instruments by carrying out a pretest among a selected number of participants to determine the accuracy of the research questions finally used to collect the data for this study.

9. Data Analysis

According to [30], data analysis is the process of giving meaning or sense to the opinions and perceptions of research participants about certain situations, corresponding patterns, themes, categories, and regular similarities. The material gathered by the investigator, whether from field notes, site documents, or interview recordings, did not in itself constitute data but was regarded as such once subjected to formal processes of analysis [31]. In this study, all the qualitative data collected were analysed using thematic analysis, while the quantitative data were analysed using descriptive statistics. Ref. [32] asserted that thematic content analysis entails data familiarisation, coding, and the subsequent grouping of the data into categories that inform the study themes as the findings.

10. Findings and Discussion

The statistical analysis approach described in this article is taken directly from the dissertation of [33] for the Doctor of Business Administration (DBA) submitted to MANCOSA for examination.

10.1. The Analysis of the Questionnaires on the Encroachment Risk for Residents

A representative population sample of one hundred and five (105) residents was analysed in this study according to the input targets from the designed questionnaires on encroachment risk and was explored using descriptive statistics (mean, median, standard deviation, and skewness), statistical tests (one-way ANOVA and ReliefF tests), and a mathematical model (Artificial Neural Network). The critical input parameters from the questionnaires were the three response categories: Yes, No, and Unsure. The analytical results and descriptive inferences from the dataset were displayed in a 2D plot with reference to the specific questions in the questionnaires. The mean, median, standard deviation, and skewness were determined from the number of Yes, No, and Unsure responses to each of the twenty questions in the survey questionnaires. The one-way analysis of variance (one-way ANOVA) was used to test for a significant mean difference among the three groups of critical input parameters. The ReliefF test was used to rank the input targets by the importance of their contribution to the desired output (encroachment risk). Lastly, the Artificial Neural Network model was trained on both the inputs and the desired output dataset.

10.2. Encroachment Risk

The analysis is based on a 2D graph with matrix plots representing the number of responses in each critical input category (Yes, No, Unsure) for each question in the survey. The questions on encroachment risk were arranged such that 1–4 covered assessment, 5–9 preparedness, 10–13 mitigation, and 14–20 response. Figure 1 shows the 2D graph, with the inputs on the x-axis and the outputs on the y-axis. Yes responses are plotted with black star markers, No responses with red diamond markers, and Unsure responses with blue right-triangle markers. The Yes, No, and Unsure distributions had means of 53.75, 31.45, and 14.70; medians of 52.0, 31.0, and 7.0; standard deviations of 20.41, 16.55, and 15.64; and skewness values of 0.03307, 0.1604, and 1.3278, respectively. The “Unsure” distribution yielded the lowest mean, median, and standard deviation but the highest positive skewness, while the “Yes” distribution produced the highest mean, median, and standard deviation alongside the lowest skewness. Notably, the skewness was positive for all three inputs, and only marginally so for Yes and No. The three critical groups are thus fully quantified by descriptive statistics [34].
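The four descriptive measures can be reproduced for any list of per-question counts. The Python sketch below is an illustration only: the counts are hypothetical rather than the study data, the standard deviation uses the sample (n − 1) form, and the skewness uses the moment-based population form, either of which may differ from the conventions applied by the authors.

```python
import statistics


def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe(values):
    """Mean, median, sample standard deviation and skewness of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": skewness(values),
    }


# Hypothetical "Unsure" counts for six questions (not the study data).
unsure_counts = [2, 3, 5, 7, 8, 40]
summary = describe(unsure_counts)
```

A single large count pulls the mean well above the median and produces positive skewness, the same pattern reported for the Unsure group (mean 14.70, median 7.0).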

10.3. The One-Way ANOVA Test for the Inputs on Encroachment Risk

According to Kent State University (2024), a one-way ANOVA is a parametric test. The one-way ANOVA test was used to verify whether a significant mean difference existed among the three groups of inputs (Yes, No, and Unsure) [35]. Table 1 shows the ANOVA table for the distribution used to identify any significant mean difference [36].

Table 1 shows that the sum of squares (SS) and the degrees of freedom (df) between the three groups (columns) were 15,240.4 and 2, respectively. The SS and df within the groups (error) were 17,765.3 and 57, respectively, giving totals of 33,005.7 and 59. The mean square (MS) is the ratio of the SS to the df: 7620.2 between the groups and 311.67 within the groups. The F-statistic is the ratio of the MS between the groups to the MS within the groups, and it was 24.45. The probability of the F-statistic (Prob > F) is the p-value, recorded as 2.15 × 10⁻⁸. A p-value below 0.05 indicates a significant mean difference among the groups, whereas a p-value above 0.05 indicates none [37]. The p-value of 2.15 × 10⁻⁸, far below the 0.05 threshold, therefore indicates a significant mean difference among the groups. Furthermore, Figure 2 shows the ANOVA plots for the three groups as inputs; the black horizontal lines on each plot represent the lower and upper interquartile values of the groups (Yes, No, and Unsure), and the red horizontal lines represent the medians, which were 52, 31, and 7 for Yes, No, and Unsure, respectively. Each of the three groups demonstrated a normal distribution with negligible outliers.
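The arithmetic behind Table 1 can be checked directly from the reported sums of squares and degrees of freedom. The short Python sketch below does only that; the function name `anova_from_sums` is illustrative, and the p-value is not recomputed because it requires the F distribution.

```python
def anova_from_sums(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) from sums of squares and df."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Values as reported in the text for Table 1.
ms_b, ms_w, f_stat = anova_from_sums(15240.4, 2, 17765.3, 57)

# Totals should match the sum of the between and within components.
ss_total = 15240.4 + 17765.3
df_total = 2 + 57
```

With these inputs the mean squares and the F-statistic agree with the reported 7620.2, 311.67, and 24.45 to two decimal places, and the totals agree with 33,005.7 and 59.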

10.4. Multiple Comparisons Among Groups

Figure 3 shows the simulation plots derived from the three groups by applying the multiple comparison test. The multiple comparison test was used to determine whether a significant mean difference exists between any two of the three groups under investigation [38]. Each horizontal line represents the upper and lower interquartile range of a group, while the marker in the middle of the line represents the group mean. The two vertical dashed lines mark the lower and upper limits of the interquartile range for the selected group. Applying the multiple comparison method generated a set of group pairs (corresponding horizontal lines): if the lines of two groups do not overlap, there is a significant mean difference between them, whereas if they overlap, there is none. For the Yes group, there was a significant mean difference between Yes and No and between Yes and Unsure. For the No group, there was a significant mean difference between No and Yes and between No and Unsure. For the Unsure group, there was a significant mean difference between Unsure and Yes and between Unsure and No. None of the groups overlapped in any pairwise comparison; hence, there was a significant mean difference between every pair of the three groups.

10.5. ReliefF Test Used in the Ranking of the Inputs

The ReliefF test is a statistical technique that can be employed to rank the predictors by their weight of importance to the desired outputs (targets) while applying a parametric regression method [39]. The ReliefF algorithm was run in MATLAB R2020b using the command [Rank, Weight] = relieff(x, y, k), which returns the ranking of the inputs and their respective weights of importance to the desired output. The quantities x and y represent the inputs and the output, respectively, while k is the number of nearest neighbours (see Appendix A.4) and by default is set to the integer 10. The weights lie between −1 and 1. A negative weight indicates that the input parameter is a secondary factor, whereas a positive weight indicates a primary factor [40]. The more positive the weight, the stronger the evidence that the parameter contributes significantly to the desired output.

Figure 4 shows a bar chart of the critical input parameters (Yes, No, and Unsure) and their weights of importance to the desired output (corresponding survey questions). Ranking the inputs in descending order showed that “Unsure” contributed the most, while “Yes” contributed the least. The respective weights of importance were 0.0337, 0.0057, and 0.0024 for Unsure, No, and Yes.

Furthermore, because all three weights were positive, all the predictors (inputs) were primary factors.

10.6. Training Neural Network Using Input and Output Datasets

The processed data for the inputs (Yes, No, Unsure, and Set constant) and the single output (survey questions), comprising 20 samples, were imported into the input and target ports of the Select Data wizard in the MATLAB R2020b neural network toolbox. In the Validation and Test Data wizard, the samples were divided randomly such that 50% were used for training, 25% for validation, and the remaining 25% for testing. The number of hidden neurons was set to 10 in the Network Architecture wizard. The Levenberg–Marquardt algorithm was chosen to perform the network training.
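The random 50/25/25 division described above can be sketched as follows. This is not the MATLAB routine used in the study; the function name, the seed, and the use of Python's standard library are assumptions made for illustration.

```python
import random

def split_samples(n_samples, train_frac=0.50, val_frac=0.25, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle (seed is hypothetical)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # whatever remains is the test set
    return train, val, test

train_idx, val_idx, test_idx = split_samples(20)
print(len(train_idx), len(val_idx), len(test_idx))  # 10 5 5
```

With 20 samples, the split yields the 10, 5, and 5 samples listed in Table 2. With so few samples per subset, the resulting error estimates can vary noticeably from one random split to another.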

Table 2 shows the mean square error (MSE) and the correlation coefficient (R) for the training.

The trained neural network yielded a very good prediction: as Table 2 shows, the modelled output closely mimicked the actual output for the training, validation, and testing samples, with MSE and R values within acceptable ranges. The MATLAB R2020b calculation for the neural network reported 8 epochs, a performance of 1.082, and a validation check at epoch 6. Figure 5 shows the performance plot, i.e., the MSE against the epoch for the training, validation, and testing datasets. It shows that the best validation performance was 1.082 at epoch 6.

Figure 6 shows the regression plots, representing the graph of the modelled outputs and the actual outputs (targets) for the trained, the validated, and the tested dataset of the survey questionnaires. It shows that the R for the training was 0.999, and the R for the validation was 0.990, while the R for the testing was 0.965.

Both the best fit and correlation lines fitted the data plots of the training, the validation, and the testing data, respectively. Therefore, the trained neural network generated a very good prediction.

10.7. The ANOVA to Confirm the Accuracy of the Artificial Neural Network Model

ANOVA refers to a collection of statistical models and related procedures used to analyse the differences in means between groups, including the variation between the groups [41]. Ref. [42] indicated that the accuracy of models improves as the amount of training data increases. Table 3 shows the ANOVA table for the actual and the modelled outputs based on the total sample dataset using the Artificial Neural Network algorithm.

In Table 3, the sum of squares (SS) for the columns and the error was 0.09 and 1366.53, respectively, and the total was 1366.62. The degrees of freedom (df) for the columns and the error were 1 and 38, respectively, and the total was 39. The mean square (MS) is the ratio of the SS to the df for the columns or the error; it was 0.0876 for the columns and 35.9613 for the error. The F-statistic is the ratio of the MS of the columns to the MS of the error, and it rounded to 0.0. The probability of the F-statistic is the p-value, and it was 0.9609. The p-value was greater than the threshold value (0.05), and, therefore, no significant difference existed between the actual and the modelled outputs [40].
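As a brief worked check using only the values in Table 3, the ratio below shows why the F-statistic appears as 0.0 when reported to one decimal place. The subscripts are labels chosen for this illustration.

```latex
MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} = \frac{1366.53}{38} \approx 35.9613,
\qquad
F = \frac{MS_{\text{columns}}}{MS_{\text{error}}} = \frac{0.0876}{35.9613} \approx 0.0024
```

An F-statistic this close to zero means that the variation between the actual and the modelled outputs is negligible relative to the variation within them.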

In addition, the ANOVA plots for the actual and the modelled outputs are shown in Figure 7. Both the actual and the modelled outputs are normally distributed and without outliers. The red horizontal lines on the ANOVA plots represent the medians, which were 10.50 and 9.90 for the actual and the modelled outputs, respectively, while the p-value was 0.9609. Because this p-value is larger than the threshold value (0.05), no significant mean difference existed between the actual and the modelled outputs.

11. Policy and Practical Relevance

The Artificial Neural Network results presented in this study are promising and demonstrate substantial potential for practical application in municipal governance, urban planning, and community-based risk management. Importantly, the model is not limited to the Makause informal settlement but can be deployed in other informal settlements to predict patterns of encroachment, identify infrastructure stress points, and inform proactive interventions.

For a broader application, municipalities and planners could do as follows:

Utilise machine learning for the predictive modelling of informal settlement growth and the identification of water infrastructure vulnerabilities, enabling the data-driven prioritisation of infrastructure reinforcement and maintenance.

Integrate Internet of Things (IoT) sensors for the real-time monitoring of critical water pipelines and supply nodes, creating a dynamic feedback loop that enhances the predictive power of Artificial Neural Network models and supports timely decision-making.

From a policy perspective, the integration of Artificial Neural Network (ANN) models into municipal planning frameworks is supported by recent research. For example, Ref. [43] demonstrates that Artificial Neural Networks provide robust predictive capabilities for infrastructure management, facilitating early interventions and optimised resource allocation. Similarly, Ref. [3] highlights that data-driven tools such as Artificial Neural Networks enhance the municipal capacity to address complex urban challenges, including informal settlement pressures.

Moreover, aligning infrastructure protection with inclusive urban policies is critical. Ref. [44] advocates for zoning regulations and proactive land use planning to address informal settlement encroachment while ensuring access to legal and affordable housing. This approach is reinforced by [45], which underscores the need to balance urban expansion with the safeguarding of essential services.

In practical terms, the Artificial Neural Network model functions as a predictive tool for municipal authorities to anticipate and prevent encroachment. For instance, by forecasting high-risk areas where informal settlements are likely to emerge within a three- to six-month timeframe, municipalities can implement targeted risk communication, strengthen bylaw enforcement, and plan infrastructural reinforcements accordingly. This predictive capability enhances resilience planning and supports evidence-based policy decisions that reconcile housing demands with critical infrastructure protection.

12. Discussion and Conclusions

In conclusion, the Artificial Neural Network (ANN) models, developed using the Levenberg–Marquardt back propagation algorithm, demonstrated strong predictive power with R values exceeding 0.95 for the training, validation, and testing datasets. The ReliefF test revealed that all three predictors (Yes, No, and Unsure) were primary factors. The one-way ANOVA test confirmed a significant mean difference among the three groups (p = 2.15 × 10⁻⁸) and no significant mean difference between the actual and the modelled outputs (p = 0.9609). The high level of accuracy of the Artificial Neural Network models as tools for predicting encroachment risks provides adequate support for their use in the development of a strategic framework for addressing these risks. Ultimately, this study suggests that while residents may have a more pronounced perception of encroachment risks, factors such as awareness and access to resources are pivotal in preparing for and mitigating these risks. The predictive models offer a solid foundation for future interventions and policy development aimed at protecting water infrastructure from the encroachment of informal settlements in Ekurhuleni.

Summary:

Study Objectives:

Research Objective 1:

To design and train an Artificial Neural Network (ANN) model capable of predicting residents’ understanding of encroachment risks to water infrastructure. This study successfully demonstrated the feasibility and relevance of using Artificial Neural Networks to develop predictive models that accurately identify high-risk areas, thereby supporting more informed risk management and community engagement strategies.

Research Objective 2:

To evaluate the predictive accuracy and overall performance of the Artificial Neural Network model in classifying residents’ levels of understanding regarding encroachment risks. The results confirmed that Artificial Neural Networks can effectively assess and classify resident knowledge levels, highlighting the potential of technology-assisted solutions to support education and training initiatives within informal settlements. This technological approach enhances municipalities’ ability to implement proactive interventions and respond swiftly to emerging vulnerabilities.

Research Objective 3:

To interpret the relative significance of input variables influencing residents’ awareness and perceptions of risk, using advanced Artificial Neural Network techniques. The findings underscore the value of community awareness programmes and educational interventions, echoing insights from [46], who emphasise that raising public awareness fosters a sense of community responsibility and strengthens collective efforts to safeguard critical water resources.

Research Objective 4:

To generate actionable, data-driven insights for policymakers and urban planners aimed at developing targeted awareness campaigns and protective measures for water infrastructure in informal settlements. The successful integration of Artificial Neural Network models into municipal planning frameworks, as supported by [21], demonstrates robust predictive capabilities that facilitate early interventions, optimise resource allocation, and strengthen infrastructure resilience.

Study limitations.

This study focused on Ward 91 of the City of Ekurhuleni to investigate issues related to water infrastructure and informal settlement encroachment. The participant group was predominantly female, which may limit the generalisability of gender-related insights. Furthermore, the unique conditions of Ward 91 mean that the findings, while providing valuable insights into the risk of encroachment in Makause, may not be generalisable to other informal settlements with differing socio-economic contexts and service levels. Therefore, future studies should aim to collect larger and more diverse datasets in multiple settlements to validate and refine these results. In addition, the systematic optimisation of hyperparameters and measures to control overfitting, such as cross-validation and early stopping, are recommended for future research to improve the robustness of the model and the generalisability of its predictions.

13. Recommendations

Future studies are crucial in the application of machine learning and artificial intelligence in the following:

13.1. Predicting Encroachment

The application of machine learning (ML) models, such as convolutional neural networks (CNNs), to satellite imagery or drone footage to identify land use changes over time and predict where future encroachments are likely to occur based on the proximity to infrastructure, historical encroachment patterns, and socio-economic indicators. Such imagery-based models would complement the survey-based data and improve the spatial detection of encroachment risks.

13.2. Mitigating Infrastructure Vandalism

The utilisation of predictive analytics and pattern recognition to detect early warning signs of vandalism based on previous incidents, the use of natural language processing (NLP) to scan social media or local chatter for potential unrest, and the generation of heatmaps of high-risk zones from past vandalism data and crime statistics. Replications at several informal settlement sites to test the generalisability of the Artificial Neural Network model in different socio-economic and infrastructural contexts.

13.3. Education and Training of Local Communities

Employment of machine learning and artificial intelligence in the following:

Gamified apps and simulations that educate residents on how encroachment and vandalism impact services.

Data-sharing, whereby residents provide credible data through incentive programmes, possibly with rewards or municipal recognition. Ref. [46] emphasises that engaging communities through targeted education campaigns significantly reduces informal settlement encroachments on critical infrastructure. Encouraging collaboration and engagement among stakeholders, for example through community-based adaptation, collective action, and shared governance, resonates with [21], which underscores that stakeholder engagement fosters sustainable solutions to infrastructure management challenges.

Participatory machine learning approaches that directly involve local communities in data collection, model refinement, training, education, and validation to improve accuracy and increase confidence in prediction systems.

Practical implications:

For communities, predictive insights derived from the model can help identify areas where residents’ understanding of the risk of encroachment is low or uncertain (the “Unsure” category). In this way, targeted awareness-raising and education campaigns can be carried out, and limited resources can be deployed where they can most effectively contribute to the protection of water infrastructure.

Conclusion:

This study demonstrates that low-cost, community-driven machine learning pipelines can play a transformative role in strengthening infrastructure resilience in resource-constrained urban environments. By combining predictive modelling and local engagement, municipalities such as the City of Ekurhuleni can proactively manage encroachment risks, ensure sustainable service delivery, and empower communities to become active stewards of vital water resources.

Author Contributions

Conceptualisation: M.N.N. and D.D.; Methodology: M.N.N.; Software: S.L.T.; Validation: M.N.N., D.D. and S.L.T.; Formal analysis: S.L.T.; Investigation: M.N.N.; Resources: M.N.N. and D.D.; Data curation: M.N.N.; Writing—preparation of original draft: M.N.N.; Writing—review and editing: M.N.N., D.D. and S.L.T.; Visualisation: M.N.N.; Supervision: D.D.; Project administration: M.N.N.; Acquisition of funding: D.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to the Protection of Personal Information Act, 2013 (POPIA) of South Africa.

Acknowledgments

The following institutions are acknowledged for their moral and material support, which enabled this study: City of Ekurhuleni, Rand Water, Makause Informal Settlement Residents, Management College of Southern Africa, and the Central University of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Detailed Mathematical Formulas

This appendix presents the full mathematical expressions used in this study’s sampling, statistical analyses, feature selection algorithm, and neural network model.

Appendix A.1. Sample Size Determination

The sample size was determined according to the formula of [24] and [25], which is used for unknown population sizes and is represented by Equation (A1).

n = \frac{Z^{2} P (1 - P)}{d^{2}}

(A1)

where n = the sample size, Z = the Z statistic for the level of confidence, P = the expected prevalence or proportion (expressed as a proportion of one; if 50%, P = 0.5), and d = the precision (expressed as a proportion of one; if 6%, d = 0.06). For the conventional confidence level of 95%, Z = 1.96.

Equation (A1) is the simplified equation that is derived from Equation (A2).

\text{Sample size} = \frac{\frac{Z^{2} \times P (1 - P)}{e^{2}}}{1 + \left(\frac{Z^{2} \times P (1 - P)}{e^{2} N}\right)}

(A2)

where N = the population size, Z = the Z-score, e = the margin of error (the precision d in Equation (A1)), and P = the expected proportion, as in Equation (A1).
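A minimal sketch of Equations (A1) and (A2) is given below. The inputs are the illustrative values quoted in the definitions (95% confidence, P = 0.5, 6% precision), and the population size of 1000 is purely hypothetical; the result is not a restatement of the sample size used in this study. The function names are assumptions made for this example.

```python
import math

def sample_size_unknown_population(z, p, d):
    """Equation (A1): n = Z^2 * P * (1 - P) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2

def sample_size_finite_population(z, p, e, population):
    """Equation (A2): Equation (A1) divided by a finite-population correction."""
    n0 = z ** 2 * p * (1 - p) / e ** 2
    return n0 / (1 + n0 / population)

# Illustrative inputs only (not the study's design values).
n_unknown = sample_size_unknown_population(z=1.96, p=0.5, d=0.06)
n_finite = sample_size_finite_population(z=1.96, p=0.5, e=0.06, population=1000)

# Sample sizes are rounded up to the next whole respondent.
print(math.ceil(n_unknown), math.ceil(n_finite))
```

As the population size N grows, the correction term in the denominator of Equation (A2) tends to 1, which is why Equation (A1) is the simplified form for unknown (very large) populations.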

Appendix A.2. Correlation and Determination Coefficients

The correlation coefficient is determined by the derived formula given in Equation (A3).

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum {(x_{i} - \bar{x})}^{2} \sum {(y_{i} - \bar{y})}^{2}}}

(A3)

where $r$ = the correlation coefficient, $x_{i}$ = values of the x-variable in a sample, $\bar{x}$ = the mean of the values of the x-variable, $y_{i}$ = values of the y-variable in a sample, and $\bar{y}$ = the mean of the values of the y-variable.

The determination coefficient is determined by the derived formula shown in Equation (A4).

r^{2} = {(\frac{n \sum x y - \sum x \sum y}{(\sqrt{n \sum x^{2} - {(\sum x)}^{2}}) (\sqrt{n \sum y^{2} - {(\sum y)}^{2}})})}^{2}

(A4)

where $r^{2}$ = the determination coefficient, $x$ = values of the x-variable, $y$ = values of the y-variable, and $n$ = the number of observations.
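Equations (A3) and (A4) can be sketched directly from their definitions. The code below is a plain-Python illustration, not the MATLAB routine used in the study, and the target and output values are hypothetical numbers chosen only to exercise the functions.

```python
import math

def pearson_r(x, y):
    """Equation (A3): correlation from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = math.sqrt(
        sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
    )
    return numerator / denominator

def r_squared(x, y):
    """Equation (A4): determination coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    r = (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    )
    return r ** 2

# Hypothetical targets and modelled outputs, for illustration only.
targets = [2.0, 4.0, 6.0, 8.0, 10.0]
outputs = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(targets, outputs), r_squared(targets, outputs))
```

The two forms are algebraically equivalent, so squaring the result of Equation (A3) should reproduce Equation (A4) up to floating-point error.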

Appendix A.3. Model Accuracy: Root Mean Square Error (RMSE)

The root mean square error is obtained using Equation (A5).

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}{n}}

(A5)

where $\mathrm{RMSE}$ = the root mean square error, ${\hat{y}}_{i}$ = predicted values, $y_{i}$ = observed values, and $n$ = the number of observations.
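Equation (A5) translates into a short function. The sketch below is illustrative: the function name is an assumption, and the only study value used is the testing MSE of 3.09 from Table 2, whose square root gives the corresponding RMSE.

```python
import math

def rmse(predicted, observed):
    """Equation (A5): square root of the mean squared prediction error."""
    n = len(observed)
    squared_errors = ((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(sum(squared_errors) / n)

# The RMSE is the square root of the MSE, so the testing MSE in Table 2
# corresponds to an RMSE of about 1.76 in the units of the output.
testing_rmse = math.sqrt(3.09)
print(round(testing_rmse, 2))
```

Expressing the error as an RMSE puts it on the same scale as the output variable, which makes it easier to compare with the medians reported for the actual and modelled outputs.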

Appendix A.4. ReliefF Algorithm Equations

Based on ReliefF resources in MATLAB R2020b:

➢ If $x_{r}$ and $x_{q}$ are in the same class, then Equation (A6) is applied:

W_{j}^{i} = W_{j}^{i - 1} - \frac{\Delta_{j} (x_{r}, x_{q})}{m} \cdot d_{r q}

(A6)

➢ If $x_{r}$ and $x_{q}$ are in different classes, then Equation (A7) is employed:

W_{j}^{i} = W_{j}^{i - 1} + \frac{P_{y_{q}}}{1 - P_{y_{r}}} \cdot \frac{\Delta_{j} (x_{r}, x_{q})}{m} \cdot d_{r q}

(A7)

where $W_{j}^{i}$ is the weight of the predictor $F_{j}$ at the $i$th iteration step; $P_{y_{r}}$ is the prior probability of the class to which $x_{r}$ belongs, and $P_{y_{q}}$ is the prior probability of the class to which $x_{q}$ belongs; $m$ is the number of iterations specified by “updates”; and $\Delta_{j} (x_{r}, x_{q})$ is the difference in the value of the predictor $F_{j}$ between observations $x_{r}$ and $x_{q}$. Let $x_{r j}$ denote the value of the $j$th predictor for observation $x_{r}$, and let $x_{q j}$ denote the value of the $j$th predictor for observation $x_{q}$.

The ReliefF algorithm can be evaluated based on the set of equations:

➢ For the discrete $F_{j}$, the equation is expressed as in Equation (A8):

\Delta_{j} (x_{r}, x_{q}) = \begin{cases} 0, & x_{r j} = x_{q j} \\ 1, & x_{r j} \neq x_{q j} \end{cases}

(A8)

➢ For the continuous $F_{j}$, the equation is expressed as in Equation (A9):

\Delta_{j} (x_{r}, x_{q}) = \frac{|x_{r j} - x_{q j}|}{\max (F_{j}) - \min (F_{j})}

(A9)

Given that $d_{r q}$ is a distance function, it can be expressed as shown in Equation (A10)

d_{r q} = \frac{{\tilde{d}}_{r q}}{\sum_{l = 1}^{k} {\tilde{d}}_{r l}}

(A10)

The distance is subject to the scaling as shown in Equation (A11)

{\tilde{d}}_{r q} = e^{- {(\mathrm{rank} (r, q) / \mathrm{sigma})}^{2}}

(A11)

where $\mathrm{rank} (r, q)$ is the position of the $q$th observation among the nearest neighbours of the $r$th observation, sorted by distance, and $k$ is the number of nearest neighbours, specified by k. It is possible to change the scaling by specifying “sigma”.
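The pieces defined in Equations (A6), (A7), and (A9) to (A11) can be combined into a single weight update, sketched below in plain Python. This is an illustration of the formulas as written in this appendix, not the MATLAB implementation; the function names, the assumption that ranks start at 1, and all numerical inputs are hypothetical.

```python
import math

def scaled_distances(k, sigma):
    """Equations (A10)-(A11): rank-based distance weights for k nearest neighbours.

    Ranks are assumed to run from 1 (closest) to k (farthest).
    """
    raw = [math.exp(-((rank / sigma) ** 2)) for rank in range(1, k + 1)]
    total = sum(raw)
    return [value / total for value in raw]   # normalised so the weights sum to 1

def delta_continuous(x_rj, x_qj, f_min, f_max):
    """Equation (A9): absolute difference scaled by the predictor's range."""
    return abs(x_rj - x_qj) / (f_max - f_min)

def update_weight(weight, delta, d_rq, m, same_class, p_yr=None, p_yq=None):
    """Equations (A6)-(A7): one update of the weight of predictor F_j."""
    step = delta / m * d_rq
    if same_class:
        return weight - step                   # Equation (A6): penalise differences
    return weight + p_yq / (1 - p_yr) * step   # Equation (A7): reward differences

# Hypothetical single update for a continuous predictor.
d = scaled_distances(k=10, sigma=5.0)
delta = delta_continuous(x_rj=52.0, x_qj=31.0, f_min=0.0, f_max=60.0)
w = update_weight(0.0, delta, d[0], m=20, same_class=False, p_yr=0.5, p_yq=0.3)
print(w)
```

A predictor whose values differ mainly between observations of different classes accumulates a positive weight, which is the sense in which positive weights identify primary factors.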

Appendix A.5. Statistical Significance (t-Test for Correlation Coefficient)

The p-value is obtained from the t-statistic given by Equation (A12)

t = r \sqrt{\frac{(n - 2)}{(1 - r^{2})}}

(A12)

which has n − 2 degrees of freedom.

The formula can be applied with the sampling distribution of r both when the null hypothesis is r = 0 and when testing for nonzero values of r.
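Equation (A12) can be evaluated with a short function. The sketch below uses the testing-set values from Table 2 (R = 0.965 with 5 samples) purely as a worked illustration; the function name is an assumption, and converting the statistic to a p-value would additionally require the t distribution with n − 2 degrees of freedom, which is not implemented here.

```python
import math

def correlation_t_statistic(r, n):
    """Equation (A12): t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Testing subset from Table 2: R = 0.965 from 5 samples, i.e. 3 degrees of freedom.
t_testing = correlation_t_statistic(r=0.965, n=5)
print(round(t_testing, 2))
```

Because the statistic grows with both r and n, a correlation estimated from only a handful of samples yields a smaller t value than the same correlation estimated from a larger dataset.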

Appendix A.6. Neural Network Transfer Function

The transfer function adopted for the neurons is the logistic sigmoid function $f (Z_{i})$ given in Equations (A13) and (A14); the network itself was trained with the Levenberg–Marquardt back propagation algorithm.

f (Z_{i}) = \frac{1}{1 + e^{- Z_{i}}}

(A13)

Z_{i} = \sum_{j = 1}^{n} W_{i j} X_{j} + \beta_{i}

(A14)

where $Z_{i}$ is the weighted sum of the inputs, $X_{j}$ is the incoming signal from the $j$th neuron of the input layer, $W_{i j}$ is the weight on the connection from neuron $j$ to neuron $i$ at the hidden layer, $\beta_{i}$ is the bias of the neuron, and $n$ is the total number of input parameters.
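Equations (A13) and (A14) describe the forward computation of one hidden layer, which can be sketched as follows. The weights, biases, and inputs below are hypothetical values chosen for illustration; they are not the trained parameters of the network in this study, and the function names are assumptions.

```python
import math

def logistic(z):
    """Equation (A13): logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(inputs, weights, biases):
    """Equation (A14) followed by (A13) for every hidden neuron i."""
    activations = []
    for weight_row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(weight_row, inputs)) + bias  # weighted sum Z_i
        activations.append(logistic(z))
    return activations

# Hypothetical example: 3 inputs feeding 2 hidden neurons.
x = [0.52, 0.31, 0.07]
W = [[0.4, -0.2, 0.1],
     [-0.3, 0.5, 0.8]]
beta = [0.0, 0.1]
print(hidden_layer(x, W, beta))
```

Each activation lies strictly between 0 and 1, which is the property that makes the logistic sigmoid a convenient bounded transfer function for the hidden neurons.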

References

1. United Nations Human Settlements Programme (UN-Habitat). World Cities Report 2022: Envisaging the Future of Cities; UN-Habitat: Nairobi, Kenya, 2022; Available online: https://unhabitat.org/wcr (accessed on 12 September 2022).
2. Mackay, B.R.; Shaker, R.R. A megacities review: Comparing indicator-based evaluations of sustainable development and urban resilience. Sustainability 2024, 16, 8076.
3. Thakur, R.; Onwubu, S.C. Household waste management behaviour amongst residents in an informal settlement in Durban, South Africa. J. Environ. Manag. 2024, 349, 119521.
4. Mzinyane, B.; Ajodhia, S.; Gumbi, S.N.; Khumalo, G.; Nduli, S.B.; Mfishi, Z.; Funeka, N. An autoethnography of disaster response work with low-income communities in KZN: Implications for Afrocentric social work. J. Soc. Dev. Afr. 2024, 39, 42–67.
5. Burstall, M. Bulk Water Pipelines, 1st ed.; Thomas Telford: Oxford, UK, 1997.
6. American Water Works Association. Advanced Hydraulic Modeling and Master Planning: A Water Distribution System Approach, 2nd ed.; American Water Works Association: Denver, CO, USA, 2012.
7. Gleick, P.H. Water use. Environ. Resour. 2003, 28, 275–314.
8. Shaker, R.R.; Rybarczyk, G.; Brown, C.; Papp, V.; Alkins, S. (Re)emphasizing urban infrastructure resilience via scoping review and content analysis. Urban Sci. 2019, 3, 44.
9. Shaker, R.R.; Aversa, J.; Papp, V.; Serre, B.M.; Mackay, B.R. Showcasing relationships between neighborhood design and wellbeing: Toronto indicators. Sustainability 2020, 12, 997.
10. Andreou, S.A. Maintenance decisions for deteriorating water pipelines. J. Pipelines 1987, 7, 21–31.
11. Pandey, M.; Senapati, S.; Bhunia, G.S. Chapter 27—GIS-based modelling for water resource monitoring and management: A critical review. Dev. Environ. Sci. 2024, 16, 621–636.
12. Shui, C.C. Enhancing the EPANET hydraulic model through genetic algorithm optimization of pipe roughness coefficients. Water Resour. Manag. 2024, 38, 323–341.
13. Ozegin, K.O.; Ilugbo, S.O.; Alile, O.M.; Iluore, K. Integrating in-situ data and spatial decision support systems (SDSS) to identify groundwater potential sites in the Esan plateau, Nigeria. Groundw. Sustain. Dev. 2024, 26, 101276.
14. Kumari, N.; Dhiman, R.; Krishnankutty, M.; Kalbar, P. Localising vulnerability assessment to urban floods: A comparative analysis of top-down and bottom-up geospatial approaches in Patna City, India. Int. J. Disaster Risk Reduct. 2024, 100, 10423.
15. Mavin, T. Predicting the Failure Performance of Individual Water Mains; Urban Water Research Association of Australia: Melbourne, Australia, 1996.
16. Fu, G.; Jin, Y.; Sun, S.; Yuan, Z.; Butler, D. The role of deep learning in urban water management: A critical review. Water Res. 2022, 223, 119090.
17. Joseph, K.; Shetty, J.; Sharma, A.K.; van Staden, R.; Wasantha, P.L.P.; Small, S.; Bennett, N. Leak and burst detection in water distribution network using logic- and machine-learning-based approaches. Water 2024, 16, 1935.
18. Dawood, T.; Elwakil, E.; Novoa, H.M.; Gárate Delgado, J.F. Water pipe failure prediction and risk models: State-of-the-art review. Can. J. Civ. Eng. 2019, 47, 1117–1127.
19. Shi, L.; Zhang, J.; Yu, X.; Fu, D.; Zhao, W. Artificial neural network-based water distribution scheme in real-time in long-distance water supply systems. Water Infrastruct. Ecosyst. Soc. 2024, 73, 1611–1620.
20. Dawadi, S. Impact of the Secondary Education Examination (English) on Students and Parents in Nepal; The Open University: Kent, UK, 2019.
21. Fox, A.; Ziervogel, G.; Scheba, S. Strengthening community-based adaptation for urban transformation: Managing flood risk in informal settlements in Cape Town. Int. J. Justice Sustain. 2023, 28, 837–851.
22. Creswell, J.W.; Plano Clark, V.L. Designing and Conducting Mixed Methods Research, 3rd ed.; Sage: Thousand Oaks, CA, USA, 2018.
23. Denzin, N.K. The Research Act: A Theoretical Introduction to Sociological Methods, 2nd ed.; McGraw-Hill: New York, NY, USA, 1978.
24. Daniel, W.W. Biostatistics: A Foundation for Analysis in the Health Sciences, 7th ed.; Wiley: New York, NY, USA, 1999.
25. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley & Sons: New York, NY, USA, 1977.
26. Esen, E.; Inalhan, G.; Atrek, B. A hybrid research method integrating qualitative and quantitative techniques for airport terminal evaluation. J. Air Transp. Manag. 2017, 61, 1–9.
27. Ponto, J. Understanding and evaluating survey research. J. Adv. Pract. Oncol. 2015, 6, 168.
28. Taherdoost, H. Designing a questionnaire for a research paper: A comprehensive guide to design and develop an effective questionnaire. Asian J. Manag. Sci. 2022, 11, 8–16.
29. Noble, H.; Smith, J. Issues of validity and reliability in qualitative research. Evid.-Based Nurs. 2015, 18, 34–35.
30. Vosloo, J.J. A Sport Management Programme for Educator Training in Accordance with the Diverse Needs of South African Schools. Ph.D. Thesis, North-West University, Potchefstroom, South Africa, 2014.
31. De Vos, A.S.; Strydom, H.; Fouché, C.B.; Delport, C.S.L. Research at Grass Roots: For the Social Sciences and Human Services Professions, 4th ed.; Van Schaik Publishers: Pretoria, South Africa, 2011.
32. Braun, V.; Clarke, V. Successful Qualitative Research: A Practical Guide for Beginners; SAGE Publications: Thousand Oaks, CA, USA, 2013.
33. Ndawo, M. Developing a Management Framework for Protecting Critical Water Infrastructure Against Informal Settlement Encroachment in the City of Ekurhuleni. Unpublished Ph.D. Thesis, MANCOSA, Durban, South Africa, 2025.
34. Cooksey, R.W.; Cooksey, R.W. Descriptive statistics for summarising data. In Illustrating Statistical Procedures: Finding Meaning in Quantitative Data; Springer: Singapore, 2020; pp. 61–139.
35. Kim, T.K. Understanding one-way ANOVA using conceptual figures. Korean J. Anesthesiol. 2017, 70, 22.
36. Chen, B.; Sun, X.; Wei, Y.; Liu, J. Application of one-way ANOVA in educational research: A practical guide. J. Educ. Stat. Meas. 2022, 5, 45–58.
37. Lakens, D. The practical alternative to the p value is the correctly used p value. Perspect. Psychol. Sci. 2021, 16, 639–648.
38. Lee, S.W. Methods for testing statistical differences between groups in medical research: Statistical standard and guideline of Life Cycle Committee. Life Cycle 2022, 2, e1.
39. Petković, M.; Kocev, D.; Džeroski, S. Feature ranking for multi-target regression. Mach. Learn. 2020, 109, 1179–1204.
40. Di Leo, G.; Sardanelli, F. Statistical significance: P value, 0.05 threshold, and applications to radiomics—Reasons for a conservative approach. Eur. Radiol. Exp. 2020, 4, 18.
41. Purnama, I. Increasing understanding of one-way ANOVA material for accounting students: A case study of deposit interest. Reflect. Educ. Pedagog. Insights 2023, 1, 69–73.
42. Sri, P.; Kumar, A.; Reddy, V. Impact of training data volume on the accuracy of machine learning models: A comparative study. J. Artif. Intell. Res. Dev. 2023, 12, 55–68.
43. Koliousis, A.-s.B. Artificial intelligence and policy making; can small municipalities enable digital transformation? Int. J. Prod. Econ. 2024, 274, 109324.
44. Abdel Aziz, K.M.; Daoud, A.O.; Singh, A.K.; Alhusban, M. Integrating digital mapping technologies in urban development: Advancing sustainable and resilient infrastructure for SDG 9 achievement—A systematic review. Alex. Eng. J. 2025, 116, 512–524.
45. United Nations Human Settlements Programme (UN-Habitat). Urbanization and Development: Emerging Futures (World Cities Report 2016); UN-Habitat: Nairobi, Kenya, 2016; Available online: https://unhabitat.org/world-cities-report (accessed on 26 March 2024).
46. Chumo, I.; Kabaria, C.; Oduor, C.; Amondi, C.; Njeri, A.; Mberu, B. Community advisory committee as a facilitator of health and wellbeing: A qualitative study in informal settlement in Nairobi, Kenya. Front. Public Health 2023, 10, 47133.

Figure 1. Two-dimensional graphs for the inputs and the output distribution.

Figure 2. ANOVA plots for the three groups (input). The red “‡” stands for an outlier, i.e., at least one data point is significantly above the upper whisker.

Figure 3. The multiple comparison test between the groups (inputs). Each of the horizontal lines represents the upper and lower interquartile range, while the markers in the middle of the horizontal lines represent the mean for each group. The two dashed vertical lines represent the limits of the lower and upper interquartile range for any specified group.

Figure 4. Inputs by the weight of contribution using the ReliefF test.

Figure 5. The performance of the training, the validation, and the testing output.

Figure 6. Regression plots for the training, the validation, and the testing using targeted modelling.

Figure 7. ANOVA plots for the actual and the modelled outputs.

Table 1. The ANOVA table for the three groups as inputs.

			ANOVA Table
Source	SS	df	MS	F	Prob > F
Columns	15,240.4	2	7620.2	24.45	2.15 × 10⁻⁸
Error	17,765.3	57	311.67
Total	33,005.7	59

Table 2. MSE and R of dataset for output.

	Samples	MSE	R
Training	10	5.59 × 10⁻⁸	0.999
Validation	5	1.082	0.990
Testing	5	3.09	0.965

R = Correlation coefficient, MSE = Mean square error.
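
To make the two metrics in Table 2 concrete, the sketch below computes a mean square error and a correlation coefficient for a pair of actual and predicted series. It assumes R is the Pearson correlation between the actual and the modelled outputs. The data values and function names are hypothetical and do not come from the study.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical values for illustration only, NOT data from the study.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 51.0]

mse = mean_squared_error(actual, predicted)
r = correlation_coefficient(actual, predicted)
print(f"MSE = {mse:.3f}, R = {r:.3f}")
```

A model that reproduces the targets exactly gives MSE = 0 and R = 1, which is the limit the training row of Table 2 approaches.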

Table 3. The ANOVA table for the actual and the modelled outputs.

Source	Sum of Squares (SS)	Degrees of Freedom (df)	Mean Square (MS)	F-Statistic	Prob > F
Columns	0.09	1	0.0876	0.0	0.9609
Error	1366.53	38	35.9613
Total	1366.62	39
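
The F-statistic in Table 3 is shown as 0.0 because of rounding. The minimal sketch below recovers it from the reported mean squares and approximates the corresponding probability. It relies on the fact that an F variable with 1 numerator degree of freedom is the square of a Student's t variable, and it integrates the t density numerically with Simpson's rule. The function names and the number of integration steps are assumptions made for illustration, not part of the study's workflow.

```python
import math


def t_pdf(x, dof):
    """Student's t probability density with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2)
    )
    return coef * (1.0 + x * x / dof) ** (-(dof + 1) / 2)


def f_upper_tail_df1_equals_1(f_stat, df_error, steps=1000):
    """P(F > f_stat) for F(1, df_error), using F = t^2.

    Assumption: valid only for 1 numerator degree of freedom;
    `steps` must be even for Simpson's rule.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    central_mass = 2.0 * (h / 3.0) * total  # P(-t < T < t) by symmetry
    return 1.0 - central_mass


# Mean squares as reported in Table 3.
f_stat = 0.0876 / 35.9613
p_value = f_upper_tail_df1_equals_1(f_stat, 38)
print(f"F = {f_stat:.4f}, Prob > F = {p_value:.4f}")
```

Under these assumptions the ratio is approximately 0.0024, which rounds to 0.0 at one decimal place, and the approximate probability is consistent with the tabulated 0.9609.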

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

