1. Introduction
Transit deserts are geographical areas where public transport services are inadequate to meet the mobility demands of residents, resulting in significant accessibility constraints and socio-economic disparities [1,2]. These areas typically emerge in low-income, suburban, or peri-urban regions, where transit infrastructure is either non-existent, infrequent, or poorly integrated with urban centers [3]. The deficiency of public transport in such locations leads to limited mobility options, prolonged travel times, and increased reliance on private vehicles, disproportionately affecting vulnerable populations, including low-income households, older adults, and individuals with disabilities [4,5].
The implications of transit deserts extend beyond transport accessibility. Restricted access to reliable and affordable public transport considerably limits opportunities for employment, education, and healthcare, thereby perpetuating cycles of poverty and social exclusion [
6]. Individuals residing in transit deserts often face longer commuting times and elevated transportation costs, making it challenging to maintain stable employment. Likewise, students in these areas may encounter difficulties in reaching educational institutions, while residents experience diminished access to healthcare services, which may lead to adverse health outcomes due to postponed or neglected medical treatment.
Furthermore, transit deserts exacerbate spatial inequalities by reinforcing patterns of urban segregation and economic disparity. In many urban contexts, well-connected transit corridors tend to attract investment, increased property value, and economic growth, whereas transit-deficient areas frequently suffer from disinvestment, constrained job opportunities, and reduced quality of life. Consequently, residents of transit deserts often find themselves caught in a cycle of marginalization, wherein poor transit accessibility impedes socio-economic mobility, thereby exacerbating income and opportunity inequalities.
As urban populations continue to grow and mobility demands become ever more complex, there is an urgent need for innovative, data-driven solutions. In response, recent advancements in Intelligent Transport Systems (ITS) have integrated machine learning (ML) techniques, which have proven effective in analysing transportation patterns and optimising transit planning. ITS (Figure 1) comprise advanced applications that enhance various modes of transport and traffic management. These systems enable the safer, more coordinated, and more intelligent use of transportation networks [3,4].
Building on this foundation, various ML techniques, such as decision trees, logistic regression, and random forests, have been applied to transit accessibility studies, each offering distinct advantages. For instance, decision trees provide an interpretable framework for understanding hierarchical relationships among variables influencing transit demand, thus enabling planners to pinpoint critical determinants of transit accessibility. Logistic regression, known for its robust statistical inference capabilities, has been widely used to assess the likelihood of transit desert formation by examining socio-economic and infrastructural predictors [
7,
8]. In parallel, random forests combine multiple decision trees to enhance predictive accuracy and handle complex ITS datasets, effectively reducing overfitting risks [
9,
10]. Together, these methodologies have been instrumental in improving transit forecasting, optimizing route planning, and refining service allocation.
1.1. Decision Trees in ITS
To demonstrate the practical application of ML techniques in ITS, several studies have explored decision tree (DT) algorithms. For example, the C4.5 decision tree algorithm was used to develop an intelligent traffic management system [
11], demonstrating its effectiveness in classifying transit data and enhancing traffic flow efficiency. Moreover, the relevance of DTs was highlighted in classifying transit deserts by emphasizing the influence of socioeconomic factors, land-use diversity, and transit accessibility on underserved regions [
12]. In addition, an earlier work showed that DT models could predict areas with inadequate transportation infrastructure [
13], thereby supporting data-driven urban development. Furthermore, studies have illustrated how DT-based classification models can estimate passenger status and optimize vehicle routing [
14,
15]. Together, these findings highlight the versatility of DTs in addressing transit accessibility gaps. Although DTs are widely employed in ITS for tasks such as spotting underserved transit areas, optimizing traffic flow, and predicting passenger demand, their interpretability is sometimes offset by a tendency to overfit, potentially limiting their generalizability in complex urban environments.
1.2. Integrating Logistic Regression in ITS
Another key ML approach, logistic regression (LR), plays a critical role in ITS, particularly in binary classification challenges such as congestion prediction, transit demand estimation, and travel mode choice modeling. LR has proven effective in evaluating transit accessibility by identifying regions with insufficient public transportation. For example, LR was deployed to predict transit mode choices [
9], thereby highlighting areas with inadequate public transport services. Similarly, LR was used to model the probability of individuals selecting public transit based on socioeconomic and travel factors, which helped detect transit deserts [
10]. Further extending this approach, comprehensive transit service gap analysis was conducted using LR to examine the supply–demand relationship in public transport systems [
16]. Additional applications, such as train delay predictions [
17] and exploring the impact of real-time transit information on commuter behavior [
18], further emphasize the versatility of LR. By incorporating socioeconomic and infrastructural factors, LR not only evaluates transit gaps but also predicts regions at risk of becoming transit deserts, thereby informing policy interventions aimed at mitigating transit inequities.
1.3. Employing Random Forests for Enhanced Predictive Accuracy
In addition to DT and LR, random forests (RF) have emerged as a powerful ML tool within ITS due to their high predictive accuracy. RF models are particularly efficient at processing large-scale and complex datasets, making them ideal for transit accessibility analysis. Empirical studies have confirmed that RF models are highly effective in classifying transit deserts based on demographic and spatial data [
12]. Other applications include passenger demand predictions [
19], real-time congestion forecasting [
20], and the identification of commuter trends using smartcard data [
21]. Additionally, a study has shown that RF models can classify regions with high ITS potential [
22], hence guiding more informed transit planning strategies. The ability of RF to integrate diverse datasets and capture intricate patterns in transit accessibility makes it a robust tool for developing adaptive, real-time ITS solutions. However, the current research also indicates limitations, such as a predominant focus on individual ML models rather than a comparative, multi-model approach that could yield more comprehensive insights.
Given these challenges, Lucknow, the capital of Uttar Pradesh, serves as an ideal case study for applying ML techniques to identify and address transit deserts within an ITS framework. As one of the fastest-growing metropolitan areas of India, Lucknow faces rapid urban expansion and increasing population density, which place significant demands on its public transportation system. Despite ongoing urban development initiatives, such as metro expansion and proposed bus rapid transit (BRT) projects, the existing infrastructure struggles to meet the mobility needs of its diverse socio-economic groups, leading to apparent spatial inequities. These challenges are further compounded by the uneven distribution of transit services, inefficient routing, and infrequent transit trips, all of which contribute to the emergence of transit deserts.
The selection of Lucknow is justified by several factors that align with the objectives of ITS-driven urban mobility improvements. First, the urban heterogeneity of the city, with its mix of high-density commercial hubs, low-income residential areas, and rapidly urbanizing peri-urban zones, offers a unique opportunity to examine how transit accessibility varies across different spatial and economic landscapes. Second, ongoing infrastructure initiatives provide an ideal context for assessing the effectiveness of ML-driven ITS applications in transit planning. Finally, evaluating transit deserts in this setting can contribute to evidence-based policy formulation. Using a spatial grid of 34,969 areal units, each measuring 100 × 100 m, this study applied big data within an ITS framework to capture transit accessibility patterns at a very fine level of granularity.
Despite the advances described above, significant research gaps persist in the domain of transit desert identification. Most existing studies rely on single-method approaches, limiting the comprehensiveness and generalizability of their findings. Additionally, there is a notable lack of research using large-scale 100 × 100 m spatial grid datasets to examine transit accessibility at a granular spatial level in the Global South. A recent study employed transit gap analysis integrated with public transport accessibility levels (PTAL) to assess transit deserts [
5]. Their approach quantifies transit supply and demand disparities using Z-score transformations and a quartile-based classification. Another study from the USA relied on descriptive spatial analysis methods, such as spatial lag regression models and BiLISA, to identify patterns of transit inequity, and it introduced the Modal Access Gap or MAG [
23]. Although these methods offer a structured decision-making framework, they do not use predictive analytics to classify spatial transit accessibility. To our knowledge, a study from Seoul [
11] used ML models to detect transit deserts; however, their focus was on developing an AI-powered dashboard and they overlooked practical policy recommendations.
Addressing these gaps requires a holistic approach that integrates multiple ML techniques within an ITS framework.
Accordingly, this study undertakes a comparative evaluation of three machine learning (ML) models, namely decision trees (DTs), logistic regression (LR), and random forest (RF), to systematically identify transit deserts in Lucknow. In contrast to previous research, which predominantly employs single-model approaches, this study adopts a multi-model framework to assess the relative strengths and limitations of each method in classifying transit accessibility. The comparative analysis demonstrates that RF exhibits the highest accuracy, while DT offers superior interpretability, and LR is constrained by its inability to effectively capture non-linear transit patterns. Furthermore, this research integrates 100 × 100 m spatial grid analytics within an ITS framework, facilitating a more granular examination of urban mobility challenges. By establishing a link between ML-driven transit desert detection and policy implications, the study provides an empirical foundation for evidence-based interventions aimed at promoting greater equity in public transportation. The insights derived are relevant to Lucknow and also offer methodologies that are transferable to other rapidly expanding urban centers seeking to enhance transport efficiency through ITS-driven analytics. The research seeks to address the following key questions:
How do ML models, specifically, DT, LR, and RF, perform in detecting transit deserts in Lucknow, and what are the key determinants of transit accessibility?
How can ITS-driven ML analysis inform policy interventions aimed at improving public transportation equity and mobility in Lucknow?
By answering these questions, the study aims to provide an empirical foundation for enhancing transit accessibility in Lucknow through the integration of ML techniques. Finally, this study contributes to the broader discourse on data-driven urban mobility planning, offering insights that are applicable not only to Indian cities but also to other rapidly growing urban centers seeking to apply ITS for improved transport efficiency and accessibility.
The subsequent sections of this article are organized as follows:
Section 2 offers a comprehensive overview of the data sources and machine learning models employed for the detection of transit deserts.
Section 3 presents the results, providing a comparative analysis of model performance.
Section 4 discusses the key findings and their implications, while Section 5 concludes by addressing the study’s limitations and suggesting directions for future research.
2. Data and Machine Learning Models for Transit Desert Detection
2.1. Data
This study utilized multiple data sources. Population density was calculated from Census of India data, adjusted using Lucknow Nagar Nigam estimates for 2024. Housing rents were used as a proxy for income; these data were gathered from real-estate aggregators, including Housing.com and Magicbricks. Trip frequency was compiled from Lucknow City Transport Limited (http://lctsl.co.in/en/page/routes, accessed on 19 March 2025), and accessibility to services was computed using Google Maps data. A big-data database was then created for 34,969 areal units, each of 10,000 square meters.
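As a rough illustration of how point observations might be assigned to the 100 × 100 m areal units, the sketch below maps projected coordinates (in meters) to a row and column index and computes the total area covered by the grid. The function name `cell_index`, the grid origin, and the use of a projected coordinate system are assumptions made for illustration; the text does not describe how the grid was actually constructed.

```python
import math

CELL_SIZE_M = 100      # side length of one areal unit (from the text)
N_UNITS = 34_969       # number of areal units (from the text)


def cell_index(x_m, y_m, origin_x_m=0.0, origin_y_m=0.0, cell_size_m=CELL_SIZE_M):
    """Return the (row, col) of the grid cell containing a projected point.

    origin_x_m / origin_y_m are a hypothetical lower-left corner of the
    study area; they are not taken from the source.
    """
    col = math.floor((x_m - origin_x_m) / cell_size_m)
    row = math.floor((y_m - origin_y_m) / cell_size_m)
    return row, col


# Total area covered by the grid, in square kilometers.
total_area_km2 = N_UNITS * CELL_SIZE_M ** 2 / 1_000_000
```

Under these assumptions the grid covers about 349.69 square kilometers, which follows directly from 34,969 units of 10,000 square meters each.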
2.2. Methods
The DT, LR, and RF models each employ a distinct methodology to classify areas as transit deserts, using four key features: economic status, where low values indicate poor economic conditions; trip frequency, where low values suggest less public transport usage; population density, where high values indicate a greater number of people affected; and accessibility to services, where low values denote poor access to essential services.
The DT model employs hierarchical if–else rules to split the data at key decision points. It learns threshold values for each feature and recursively partitions the dataset. For example, a rule might be: If economic status < threshold AND trip frequency < threshold AND accessibility < threshold AND population density > threshold, then transit desert = 1. The advantages of this model include its ability to handle non-linearity and its ease of interpretation. However, it can overfit the data if too many splits are made.
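To make the idea of learning a threshold concrete, the following minimal sketch searches for the single split on one feature that minimizes weighted Gini impurity. Gini impurity is assumed here as the split criterion because it is a common choice; the text does not state which criterion was used, and the toy data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the best single split."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Toy data (made up): low trip-frequency scores are labelled as deserts.
trips = [-1.2, -0.9, -0.5, 0.1, 0.4, 0.8]
desert = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(trips, desert)
```

A full tree would repeat this search over every feature and then recurse into each resulting partition, which is where the risk of overfitting arises if the depth is left unconstrained.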
The LR model uses a logistic function of a weighted sum of the features to calculate the probability of an area being a transit desert. Although LR assumes a linear decision boundary in feature space, it is better described as a probabilistic classifier than as a strictly statistical model [24]. It works well for simple problems, but because of the linearity assumption it struggles with complex decision boundaries.
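For reference, the standard form of the logistic model for the four features can be written as below. The symbols (the coefficients β and the feature variables x) are notation introduced here for illustration and do not come from the source.

```latex
P(y = 1 \mid \mathbf{x}) =
  \frac{1}{1 + \exp\!\left(-\left(\beta_0
    + \beta_1 x_{\text{density}}
    + \beta_2 x_{\text{econ}}
    + \beta_3 x_{\text{trips}}
    + \beta_4 x_{\text{access}}\right)\right)}
```

An area is classified as a transit desert when this probability exceeds a cutoff (commonly 0.5), which corresponds to the linear predictor inside the exponential being positive. That is why the resulting decision boundary is a hyperplane in feature space.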
The RF model combines multiple decision trees through ensemble learning to improve accuracy. It builds each tree on a random selection of features and then aggregates the results of all trees. If most trees classify an area as a transit desert, the final prediction is also a transit desert. This model is typically more accurate than a single DT, is less prone to overfitting, and handles complex relationships well, although it is harder to interpret than a single DT.
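The aggregation step described above can be sketched as a simple majority vote. This is an illustrative sketch only: the function name and the example votes are hypothetical, and some implementations (scikit-learn's random forest classifier among them) average per-tree class probabilities rather than counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by most trees (ties go to the lowest label)."""
    counts = Counter(tree_predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


votes = [1, 0, 1, 1, 0]   # hypothetical outputs of five trees for one grid cell
prediction = majority_vote(votes)
```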
To systematically evaluate the suitability of various ML models for the detection of transit deserts, we compare DT, LR, and RF across multiple criteria.
Though a single DT is inherently straightforward and readily interpretable, its computational requirements escalate substantially as the tree becomes deeper, resulting in an increased number of splits and greater complexity, thus potentially slowing processing times. RF, an ensemble of multiple decision trees, optimizes computational efficiency through parallel processing and feature randomness, preventing deep recursive splitting, which often increases DT complexity. This approach not only improves predictive accuracy by reducing the likelihood of overfitting but also optimizes the use of computational resources. Therefore, RF is particularly well suited to the analysis of the large-scale, complex datasets typically encountered in fine-scale urban mobility research.
The comparative analysis suggests that RF outperforms the other models in managing non-linearity, scalability, and robustness against overfitting, making it highly suitable for the detection of complex urban transit deserts. On theoretical grounds, RF is therefore the preferred algorithm. Although DTs provide high interpretability, they are vulnerable to overfitting. Conversely, despite its computational efficiency, LR encounters difficulties with complex spatial relationships. This comparison emphasizes the benefits of ensemble learning methods for transit accessibility studies.
The analysis was conducted using Python 3.8 on a system with an Intel Core i7 processor (2.6 GHz) and 16 GB RAM. A stratified 80:20 train–test split was applied so that the proportions of transit desert and non-transit desert locations were preserved in both partitions. Five-fold cross-validation (k = 5) was then used to validate model performance. This approach assesses model stability across different data subsets and helps to detect overfitting, thereby improving reliability in transit desert classification.
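The splitting and validation procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random seed, and the toy labels are hypothetical, and the text does not say whether cross-validation was run on the training partition or on the full dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_positions, validation_positions) for k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


labels = [1] * 10 + [0] * 90          # toy labels with 10% positives (made up)
train_idx, test_idx = stratified_split(labels)
folds = list(k_fold_indices(len(train_idx)))
```

Each sample appears in exactly one validation fold, so the five accuracy scores together cover the whole set, and their spread gives a rough indication of model stability.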
3. Results
3.1. Decision Tree (DT) Model
Results of the DT model are based on median thresholds (Table 1) and mean scores of features distinguishing transit deserts from non-transit deserts (Table 2).
The thresholds employed in decision making are as follows: population density with a median value of −0.09 (high if greater than −0.09), economic status with a median value of −0.10 (low if less than −0.10), transit frequency with a median value of −0.11 (low if less than −0.11), and service accessibility with a median value of −0.18 (low if less than −0.18). A location was classified as a transit desert only if it exhibited high population density (pop_density > −0.09), low economic status (economic_status < −0.10), low transit frequency (trips < −0.11), and low service accessibility (access_services < −0.18).
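The four-condition rule can be expressed directly in code. The sketch below uses the thresholds and feature names given in the text; the function name, the dictionary layout, and the choice of the Table 2 class means as example inputs are assumptions made for illustration.

```python
# Median thresholds on the feature scores, as reported in the text.
THRESHOLDS = {
    "pop_density": -0.09,
    "economic_status": -0.10,
    "trips": -0.11,
    "access_services": -0.18,
}


def is_transit_desert(cell):
    """Apply the four-condition rule to one grid cell (a dict of scores)."""
    return (
        cell["pop_density"] > THRESHOLDS["pop_density"]
        and cell["economic_status"] < THRESHOLDS["economic_status"]
        and cell["trips"] < THRESHOLDS["trips"]
        and cell["access_services"] < THRESHOLDS["access_services"]
    )


# Example inputs: the class means reported for Table 2.
desert_mean = {"pop_density": 0.5762, "economic_status": -0.9355,
               "trips": -0.8101, "access_services": -0.7087}
non_desert_mean = {"pop_density": -0.0585, "economic_status": 0.0950,
                   "trips": 0.0823, "access_services": 0.0720}
```

Because all four conditions are joined with a logical AND, a cell that fails any single condition is not flagged, which makes the rule conservative.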
In comparing transit deserts to non-transit deserts, the following observations were made (
Table 2). Population density averaged −0.0585 (low) for non-transit deserts and 0.5762 (high) for transit deserts, indicating a higher density in transit deserts. Economic status averaged 0.0950 (better) for non-transit deserts and −0.9355 (worse) for transit deserts, signifying a lower economic status in transit deserts. Transit frequency averaged 0.0823 (more) for non-transit deserts and −0.8101 (less) for transit deserts, reflecting a lower frequency in transit deserts. Service accessibility averaged 0.0720 (more) for non-transit deserts and −0.7087 (less) for transit deserts, indicating lower accessibility in transit deserts.
Taken together, these differences show that transit deserts are situated in highly populated areas, with a density score of 0.5762 compared to −0.0585 for non-deserts. Economic status is markedly lower in transit deserts, with the score dropping from 0.0950 in non-deserts to −0.9355 in deserts, highlighting the critical need for transit access in these poorer areas. Transit frequency is substantially lower in transit deserts, with the score decreasing from 0.0823 to −0.8101, indicating severe underservice by transit. Accessibility to services is also considerably lower in transit deserts, with the score dropping from 0.0720 to −0.7087, demonstrating the struggle to access essential services in these areas.
Figure 2 shows the feature importance of the DT model.
Figure 3 illustrates the DT classification. Regarding the partitioning of the dataset, the training data comprise 27,975 samples, while the test data consist of 6994 samples, corresponding to an 80% training and 20% testing ratio. Random splitting was performed using the ‘train_test_split’ method, ensuring an even distribution of the two classes across the partitions.
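The reported partition sizes can be checked with a short calculation. The sketch assumes that the test partition is rounded up to the next whole sample, which is how scikit-learn's `train_test_split` is understood to handle a fractional test size.

```python
import math

n_total = 34_969                      # areal units (from the text)
n_test = math.ceil(0.20 * n_total)    # assumption: test size is rounded up
n_train = n_total - n_test
```

Under this assumption the calculation gives 6994 test samples and 27,975 training samples, matching the figures reported above.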
The results of the K-fold cross-validation (five folds) show the mean cross-validation accuracy of 98.78%, with a standard deviation of 0.099%. This very low standard deviation indicates stable performance across different folds, confirming that the model generalizes well across various subsets of the data.
In brief, the classification is robustly supported by the data, affirming that transit deserts indeed exist in high-density, low-income, low-transit, and low-accessibility areas. Population density emerges as a dominant factor, reinforcing the model’s decision making.
3.2. Logistic Regression (LR) Model
The logistic regression model applied to the transit desert detection task yielded a test accuracy of 92.81%, indicating a reasonably strong overall performance. However, its ability to identify transit deserts was notably weaker than the alternative models, as reflected in a recall score of 45%. This suggests that the model struggled to correctly classify a substantial proportion of transit desert locations, leading to an underestimation of areas with poor transit access. The F1-score for transit deserts was calculated at 54%, reinforcing the observation that the model’s predictive power in distinguishing transit deserts from non-transit deserts was limited.
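Because the F1-score is the harmonic mean of precision and recall, the precision implied by the reported recall and F1-score can be recovered algebraically. The sketch below is a worked check rather than a reported result: the helper names are hypothetical, and since the published values are rounded, the outcome is only indicative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)


lr_recall, lr_f1 = 0.45, 0.54     # reported for the transit desert class
lr_precision = implied_precision(lr_f1, lr_recall)   # roughly 0.675
```

An implied precision of roughly 0.675 would mean that about two-thirds of the locations flagged by LR are genuine transit deserts, even though the model finds fewer than half of all such locations.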
The cross-validation procedure, conducted using a five-fold approach, produced a mean accuracy of 93.18% with a standard deviation of 0.27%. While the stability of the model across different data subsets was relatively consistent, the lower recall score indicates that logistic regression failed to capture the full extent of transit deserts. This suggests that the linear nature of the model may not be well-suited to dealing with the complexity of the relationships between population density, economic status, transit frequency, and service accessibility.
An analysis of the feature classification within the logistic regression model provides insights into its predictive structure (
Figure 4). The model assigns coefficients to each variable, reflecting their respective influence on the likelihood of a location being classified as a transit desert. Population density exhibited the highest positive coefficient (+1.869), indicating that locations with higher population density are significantly more likely to be classified as transit deserts. Conversely, economic status (−1.498), transit frequency (−1.702), and accessibility to services (−1.139) all had negative coefficients, signifying that lower values for these features increase the probability of a location being identified as a transit desert. The particularly strong negative coefficient associated with transit frequency suggests that reductions in transport availability have a substantial impact on transit desert classification, highlighting the importance of transit provision in mitigating accessibility deficits.
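One way to read these coefficients is to convert them to odds ratios, since the exponential of a logistic regression coefficient gives the multiplicative change in the odds for a one-unit increase in that feature, holding the others fixed. The sketch uses the coefficients reported above; the dictionary keys are illustrative feature names, and the model intercept is not needed for this calculation.

```python
import math

# Coefficients reported in the text for the logistic regression model.
coefficients = {
    "pop_density": 1.869,
    "economic_status": -1.498,
    "trips": -1.702,
    "access_services": -1.139,
}

# exp(coefficient) = multiplicative change in the odds per one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} odds ratio = {ratio:.2f}")
```

An odds ratio above 1 (population density, at roughly 6.5) raises the odds of a transit desert classification, whereas ratios below 1 (the other three features) lower them as the feature value increases.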
3.3. Random Forest (RF) Model
In the RF model, the training dataset comprises 27,975 samples, while the test dataset consists of 6994 samples. The train–test split ratio was 80% for training and 20% for testing. The dataset was randomly partitioned, ensuring a balanced distribution of transit desert and non-transit desert locations.
The random forest model demonstrated exceptional performance in classifying transit deserts, achieving a test accuracy of 99.37% and a cross-validation accuracy of 99.77%, with a low standard deviation of 0.23%, indicating high stability across different data subsets. The model exhibited strong predictive capabilities, particularly in identifying transit deserts, with a recall of 93%, ensuring that the majority of underserved areas were correctly classified. Additionally, the F1-score for transit deserts was recorded at 97%, reflecting a well-balanced model that effectively minimized both false positives and false negatives.
A feature importance analysis revealed that population density was the most influential variable in the classification process, contributing 43.88% to the model’s decision making (
Figure 5). This finding underscores the significance of population density in determining transit desert locations, reinforcing the premise that densely populated areas face higher risks of inadequate transit access. Economic status (21.40%), transit frequency (19.28%), and accessibility to services (15.44%) also played crucial roles, indicating that, while transit deserts are primarily defined by population density, socioeconomic and infrastructural factors remain integral to their classification.
The random forest classifier outperformed both the decision tree and logistic regression models, offering the highest overall accuracy, superior recall, and greater generalization capacity. Its ability to capture non-linear interactions among variables further enhanced its predictive effectiveness, making it the most reliable model for detecting and analyzing transit deserts. The random forest model’s results suggest that interventions aimed at addressing transit deserts should prioritize high-density areas while considering economic vulnerability and transit service availability to ensure equitable access to transportation.
Moreover, the results further reveal that the RF classifier demonstrated exceptional performance in detecting transit deserts, achieving an accuracy of 100%. For non-transit desert areas (Class 0), both precision and recall were 1.00, and, for transit desert areas (Class 1), both precision and recall were also 1.00, resulting in an F1-score of 1.00, indicating perfect classification. The random forest model outperformed both logistic regression and decision tree models, achieving perfect classification for both classes. This suggests that random forest is the best choice for transit desert detection, likely due to its ability to handle complex relationships and non-linear data structures.
Figure 6 presents the transit deserts (red points) identified in Lucknow by the DT and RF analyses. The transit deserts are scattered but tend to form clusters within the city. These areas meet the criteria of low economic status, low trip frequency, low accessibility to services, and high population density. Although most of the city is well served, specific areas require targeted interventions to improve transport accessibility.
3.4. Model Performance
The comparative evaluation of the DT, LR, and RF models highlights significant differences in their predictive capabilities for transit desert classification (
Table 3). The RF model emerged as the most effective, achieving the highest test accuracy of 99.37% and a cross-validation accuracy of 99.77%, demonstrating both superior predictive power and remarkable stability across different data subsets. Its recall score of 93% ensured that the majority of transit deserts were accurately identified, while an F1-score of 97% indicated a strong balance between precision and recall, minimizing both false positives and false negatives.
The DT model also performed well, with a test accuracy of 98.71% and a recall score of 99%, suggesting that it was particularly adept at detecting transit deserts, though it exhibited marginally lower overall accuracy compared to the RF classifier. In contrast, LR was the weakest performer, yielding a test accuracy of 92.81% and a recall of only 45%, indicating that it failed to correctly classify a substantial portion of transit deserts. Its F1-score of 54% further reinforced the observation that it was not well suited to this task, likely due to its linear decision boundary, which struggled to capture the complex interactions between population density, economic status, transit frequency, and service accessibility. The analysis of feature importance within the models further illustrated the differences in their predictive approaches, with both the decision tree and random forest classifiers identifying population density as the most critical factor, while logistic regression exhibited a less structured feature influence, resulting in weaker classification performance. Overall, the RF model proved to be the most robust and reliable, excelling in accuracy, recall, and generalization, making it the preferred choice for transit desert detection in this study.
The neighborhood-level analysis reveals several key insights. The first cluster, located around Kanpur Road, exhibits the lowest economic status, the lowest accessibility, and a high population density, indicating severe transit desert conditions that require urgent intervention. The second cluster, around Faizabad Road, has a very low economic status, the lowest trip frequency, the highest population density, and poor accessibility, suggesting it is likely an underserved area. The third cluster, in the southern part of the city, shows a better but still below-average economic status, lower accessibility, and moderate transit desert conditions, indicating that it would benefit from transport expansion. The fourth cluster, around Hardoi Road, faces moderate economic issues, lower trip frequency, high population density, and accessibility issues, necessitating improved service access. Finally, the fifth cluster, around Balaganj, has poor economic status, poor accessibility, and a high population density, marking it as a major transit desert area requiring improvements.
4. Discussion
The primary objective of this study was to systematically identify transit deserts in Lucknow through the application and comparative analysis of three machine learning (ML) models (decision tree (DT), logistic regression (LR), and random forest (RF)), within an Intelligent Transport System (ITS) framework. Transit deserts constitute areas that lack sufficient public transit services relative to the needs of residents, exacerbating social inequalities by limiting access to employment, education, healthcare, and other vital amenities [
1,
6]. The application of DT, LR, and RF models in this study provided distinct advantages and demonstrated varied levels of effectiveness in classifying transit deserts. Consistent with previous studies, the RF model emerged as particularly robust, achieving superior classification accuracy and reliability, reflecting its ability to handle complex and non-linear spatial patterns associated with transit accessibility. These findings support recent research indicating RF’s superior ability to manage large-scale datasets and intricately interrelated variables in urban mobility contexts [
7,
8,
9].
Although DT exhibited commendable interpretability, allowing planners to intuitively understand and visualize determinants of transit accessibility, it showed some susceptibility to overfitting—a limitation recognized in earlier transit studies [
10]. Despite offering useful hierarchical insights, DT’s performance may decline when generalizing to more heterogeneous urban environments. Conversely, LR demonstrated comparatively low accuracy, primarily constrained by its linear decision boundary assumptions that inadequately represent the complex spatial interactions between demographic, infrastructural, and transit-related features. This limitation aligns with prior findings highlighting the challenges of logistic regression in handling complex transport classification problems due to inherent linearity constraints [
9,
10]. Nonetheless, logistic regression remains valuable for preliminary analyses due to its computational simplicity and straightforward interpretability.
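The linearity constraint can be made concrete with a toy case. The sketch below uses four invented points, not study data, whose label depends purely on the interaction of two binary indicators (an XOR pattern). It fits logistic regression by plain gradient descent and compares it with a hand-written pair of nested threshold splits, which is the kind of rule a depth-two tree can express. All names and values are assumptions made for illustration.

```python
import math

# Toy data, not from the study: the label is a pure interaction (XOR) of two
# binary indicators, so no single straight line separates the classes.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (w1, w2, b)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), t in zip(X, y):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - t
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / len(X)
        w2 -= lr * g2 / len(X)
        b -= lr * gb / len(X)
    return w1, w2, b

def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

w1, w2, b = fit_logistic(X, y)

def lr_predict(x):
    return int(sigmoid(w1 * x[0] + w2 * x[1] + b) > 0.5)

def tree_predict(x):
    # Two nested threshold splits: first on x[0], then on x[1].
    return int((x[0] > 0.5) != (x[1] > 0.5))

lr_acc = accuracy(lr_predict, X, y)
tree_acc = accuracy(tree_predict, X, y)
print(f"logistic regression: {lr_acc:.2f}, two-level split rule: {tree_acc:.2f}")
```

No linear boundary can classify more than three of these four points correctly, whereas the nested splits recover the pattern exactly. Real accessibility data are far less extreme, but interaction effects of this kind are what a linear decision boundary struggles to represent.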
Comparing these results to recent studies, particularly one from South Korea [
11] that employed RF among other ML models to identify transit deserts in Seoul, reveals critical insights and potential limitations within existing approaches. Although the South Korean study effectively applied supervised ML techniques and advanced interpretability methods such as SHAP (SHapley Additive exPlanations) and DiCE (Diverse Counterfactual Explanations), the study was limited by its primary emphasis on developing an interactive AI-powered dashboard, possibly at the cost of comprehensive and actionable policy insights. Moreover, owing to data constraints, the applicability of their findings beyond the specific urban context of Seoul is uncertain, which may limit generalizability to cities experiencing differing patterns of urbanization and socioeconomic disparities.
This research addresses several of the weaknesses inherent in the above approach. First, by employing a comparative multi-model analytical framework (DT, LR, RF), our study explicitly assesses relative methodological strengths and weaknesses, offering richer insights into each model’s suitability for urban mobility analysis. Second, whereas the aforementioned study focused largely on gender-based disaggregation, our analysis encompasses a broader set of socio-economic indicators including population density, economic status, trip frequency, and accessibility to services. This integrative approach allows for a deeper and more nuanced understanding of factors underpinning transit desert conditions. Furthermore, this study is contextualized within an Indian metropolitan setting, providing insights that are particularly relevant to urban development contexts in the Global South, an area still underrepresented in transit desert literature.
Additionally, a recent study from the USA introduced the concept of the modal access gap (MAG), utilizing descriptive spatial regression methods including spatial lag regression and bivariate local indicators of spatial association (BiLISA) to identify transit deserts based primarily on automobile versus transit accessibility differentials [
23]. Despite effectively elucidating transit inequities in American metropolitan contexts, their method remains primarily descriptive and static, lacking the predictive capabilities essential for proactive and dynamic urban planning. In contrast, our ML-based methodology goes beyond descriptive analysis by providing a predictive classification of transit desert locations. By employing ML models that inherently handle non-linear relationships and learn adaptively from complex datasets, our approach represents a significant methodological advancement over spatial regression models, potentially addressing dynamic urban mobility demands more effectively.
Similarly, a study conducted in the Indian context applied public transport accessibility levels (PTAL) and transit gap analysis to categorize transit deserts using quartile-based thresholds, thereby providing a valuable decision-making framework [
5]. While their method systematically captures transit supply–demand disparities, it lacks predictive precision and adaptability. Our ML-driven approach, in contrast, exploits complex data interactions to dynamically classify and predict transit deserts with higher accuracy, thus bridging a notable methodological gap by transitioning from descriptive analytics to a predictive and adaptive urban transport planning model.
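For contrast, a quartile-based rule of the general kind described above can be sketched as follows. This is a generic illustration, not the cited study's procedure: the zone identifiers, the gap scores (a hypothetical demand index minus a supply index), and the category names are all invented for the example.

```python
from statistics import quantiles

# Hypothetical zone-level gap scores (demand index minus supply index).
# Larger values mean demand outstrips supply. Values are illustrative only.
gap = {"Z1": -0.8, "Z2": -0.1, "Z3": 0.2, "Z4": 0.5,
       "Z5": 0.9, "Z6": 1.4, "Z7": 0.0, "Z8": 2.1}

def quartile_labels(scores):
    """Label each zone by the quartile into which its gap score falls."""
    q1, q2, q3 = quantiles(list(scores.values()), n=4)
    labels = {}
    for zone, s in scores.items():
        if s > q3:
            labels[zone] = "severe gap"
        elif s > q2:
            labels[zone] = "moderate gap"
        elif s > q1:
            labels[zone] = "mild gap"
        else:
            labels[zone] = "adequately served"
    return labels

labels = quartile_labels(gap)
for zone in sorted(labels):
    print(zone, labels[zone])
```

Because the cut points are computed from the sample itself, roughly a quarter of the zones land in the top category regardless of how large the gaps are in absolute terms, and a zone outside the sample cannot be scored without recomputing the thresholds.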
The findings from this study underscore the importance of granular data analysis. The 100 × 100 m spatial grid adopted here represents an unprecedented level of data resolution in transit desert studies, particularly within an Indian context. Previous research frequently relied on aggregated spatial units, potentially obscuring fine-scale accessibility variations. The enhanced granularity of our analytical approach facilitates targeted location-specific interventions, increasing the potential effectiveness and equity of transit planning outcomes.
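Assigning observations to a 100 × 100 m grid is straightforward once coordinates are in a projected system measured in metres. The sketch below assumes such coordinates and uses invented stop locations; the function name cell_index and the grid origin are hypothetical, not taken from the study.

```python
import math
from collections import Counter

CELL = 100.0  # cell edge length in metres, matching the grid size in the text

def cell_index(x, y, x0=0.0, y0=0.0, size=CELL):
    """Map projected coordinates (metres) to a (column, row) grid cell."""
    return (math.floor((x - x0) / size), math.floor((y - y0) / size))

# Hypothetical stop coordinates in a projected CRS (metres); illustrative only.
stops = [(12.0, 40.0), (95.0, 99.9), (100.0, 0.0), (250.0, 130.0), (260.0, 199.0)]

# Count how many stops fall in each cell; cells absent from the counter have none.
stops_per_cell = Counter(cell_index(x, y) for x, y in stops)
for cell, n in sorted(stops_per_cell.items()):
    print(cell, n)
```

Per-cell counts like these can then be joined with demographic indicators on the same grid, which is what allows variation within a single ward or neighborhood to remain visible.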
Our results clearly indicate spatial clusters of transit deserts predominantly situated within high-density areas characterized by low economic status, low transit frequency, and poor accessibility to essential services. This finding aligns with extant research identifying population density and socioeconomic disadvantage as crucial determinants of transit deserts [
7,
11,
13]. The concentration of transit deserts in such disadvantaged areas makes improving accessibility an urgent policy priority, one that could help alleviate the socio-economic marginalization associated with poor public transport provision.
Moreover, the comparative assessment of multiple ML models performed in this study advances the methodological discourse on transit desert detection by highlighting the relative strengths and weaknesses of each approach. While RF emerged as the most accurate and robust model, the interpretability offered by DT remains valuable for stakeholders requiring transparent decision-making rationales. Therefore, urban planners may consider combining RF’s predictive strength with DT’s interpretability in an integrated ITS framework, capitalizing on both predictive accuracy and intuitive policy communication.
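One possible reading of such a hybrid is a global surrogate: a shallow decision tree trained to imitate the forest's predictions, so that planners can inspect a compact set of rules while the forest does the actual classification. The sketch below is an assumption about how this could be done, not a method described in the study. It assumes scikit-learn, synthetic data, and hypothetical feature names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature names echoing the indicators discussed in the text.
FEATURES = ["pop_density", "economic_status", "trip_frequency", "accessibility"]

# Synthetic stand-in data (NOT the Lucknow grid cells).
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The forest is the predictive model.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# The surrogate learns the forest's outputs, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, rf.predict(X_train))

# Fidelity: share of held-out cells on which the surrogate agrees with the forest.
fidelity = (surrogate.predict(X_test) == rf.predict(X_test)).mean()
rules = export_text(surrogate, feature_names=FEATURES)

print(f"fidelity to RF on held-out cells: {fidelity:.2f}")
print(rules)
```

The fidelity score indicates how far the printed rules can be trusted as a summary of the forest; a low value would mean the depth limit is too restrictive for the rules to be a faithful explanation.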
Finally, this study contributes to the literature on urban mobility by situating ML-based predictive analytics within practical policy-making frameworks. Rather than merely identifying spatial disparities, it offers actionable insights for targeted intervention strategies. Future research could extend this work through longitudinal studies that assess the temporal dynamics of transit deserts, explore the integration of real-time data sources, or further disaggregate accessibility impacts across various vulnerable groups. Additionally, adopting similar analytical methodologies in diverse urban contexts worldwide could test the generalizability and robustness of ML-based approaches, further contributing to global discourses on equitable urban mobility.
5. Conclusions
This study systematically assessed transit deserts within Lucknow by employing three machine learning (ML) methodologies, namely decision tree (DT), logistic regression (LR), and random forest (RF), integrated within an Intelligent Transport System (ITS) framework. Findings from the comparative analysis revealed the random forest model as superior due to its high predictive accuracy and robustness in managing complex, non-linear spatial data, reinforcing the findings of a recent study [
12]. Although decision trees provided valuable interpretability through the explicit representation of decision pathways, their susceptibility to overfitting highlighted potential limitations regarding generalization to broader urban settings. Logistic regression exhibited comparatively low effectiveness, constrained primarily by its linear decision boundaries that inadequately represented complex urban mobility dynamics. Nonetheless, each model contributed unique insights, confirming that population density, economic status, transit frequency, and accessibility to essential services critically underpin transit desert conditions.
A significant strength of this research was its use of granular spatial data (100 × 100 m grids), enabling the precise, fine-scale identification of transit-deficient areas at the neighborhood level. This spatial resolution significantly advances transit desert literature, particularly within the context of rapidly urbanizing regions in the Global South, providing policymakers with the targeted information necessary for informed interventions.
Despite these contributions, the study is subject to certain limitations. The performance and reliability of ML models fundamentally depend upon the quality and granularity of available data, which can vary significantly, especially in developing urban contexts. Additionally, the random forest model, despite exhibiting superior predictive performance, lacks intuitive interpretability, potentially complicating its direct application in urban policy-making processes.
Future research should address the interpretability constraints of complex ML models. Additionally, further investigations could benefit from longitudinal analyses, incorporating temporal variations to enhance our understanding of transit desert dynamics over time. Lastly, future studies could integrate additional socio-demographic dimensions such as gender, age, and disability status, and replicate the methodology across diverse urban contexts, thereby advancing generalizability and informing more inclusive, equitable, and sustainable urban mobility policies.