Article
Peer-Review Record

Modeling and Evaluating the Impact of Mobile Usage on Pedestrian Behavior at Signalized Intersections: A Machine Learning Perspective

Future Transp. 2025, 5(1), 11; https://doi.org/10.3390/futuretransp5010011
by Faizanul Haque 1, Farhan Ahmad Kidwai 2, Ishwor Thapa 1, Sufyan Ghani 1,* and Lincoln M. Mtapure 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 6 December 2024 / Revised: 14 January 2025 / Accepted: 24 January 2025 / Published: 1 February 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript addresses an important and timely topic—the impact of mobile phone usage on pedestrian safety at signalized intersections—using advanced machine learning techniques. While the study is novel and has potential, there are significant conceptual, methodological, and presentation-related issues that must be addressed to improve the overall quality of the paper.

The introduction provides an adequate overview of the problem but lacks a clear articulation of the study's theoretical foundation. The authors should integrate relevant pedestrian behavior and safety frameworks to contextualize the research. The authors can incorporate some of the recent studies in pedestrian and cyclist domains, such as “Determinants of helmet use intention among E-bikers in China: an application of the theory of planned behavior, the health belief model, and the locus of control” and “The more peers are present, the more adventurous? How peer presence influences adolescent pedestrian safety”.

Details regarding the selection criteria for the 11 intersections and the observational procedures need to be expanded. For example, how were demographic variables like age estimated accurately?

While the application of CNN, LSTM, and RNN is innovative, the choice of these models needs more justification. Why were these specific architectures chosen over simpler alternatives like logistic regression or decision trees for this analysis?

 

The reported accuracy of 94.93% for the CNN model is impressive but requires further validation. For instance, did the authors evaluate robustness using cross-validation or additional datasets?

Author Response

REVISION TABLE

Paper Title: Modeling and Evaluating the Impact of Mobile Usage on Pedestrian Behavior at Signalized Intersections: A Machine Learning Perspective

Revised Submission Date:   14 January 2025

 

The authors would like to express their gratitude to the anonymous reviewers for their valuable and helpful comments.

 

REVIEWER COMMENTS

AUTHORS’ RESPONSES

Reviewer No.

COMMENT No.

COMMENTS

RESPONSES

MODIFICATIONS

1

1.

The manuscript addresses an important and timely topic—the impact of mobile phone usage on pedestrian safety at signalized intersections—using advanced machine learning techniques. While the study is novel and has potential, there are significant conceptual, methodological, and presentation-related issues that must be addressed to improve the overall quality of the paper.

 

Thank you so much for the kind review and valuable comments. We have revised this paper following your valuable comments and suggestions.

 

2.

The introduction provides an adequate overview of the problem but lacks a clear articulation of the study's theoretical foundation. The authors should integrate relevant pedestrian behavior and safety frameworks to contextualize the research. The authors can incorporate some of the recent studies in pedestrian and cyclist domains, such as “Determinants of helmet use intention among E-bikers in China: an application of the theory of planned behavior, the health belief model, and the locus of control” and “The more peers are present, the more adventurous? How peer presence influences adolescent pedestrian safety”.

 

Thank you so much for the valuable comment. The manuscript has been updated to cover the theoretical frameworks with relevant references.

 

“To provide a strong theoretical foundation for the study, an understanding of established behavioral frameworks is required to contextualize pedestrian safety and distraction behaviors. The Theory of Planned Behavior (TPB) has been widely utilized in traffic safety research to understand the factors influencing individuals' decision-making processes. It highlights that attitudes, subjective norms, and perceived behavioral control are critical determinants of behavior. For pedestrians, this theory can explain how personal attitudes toward mobile phone usage, societal norms regarding its acceptability, and perceived control over safe crossing influence compliance and distraction behaviors. TPB has been successfully applied in studies that demonstrated its utility in predicting safety-related behaviors in traffic contexts [11].

The Health Belief Model (HBM) provides a framework for understanding health-related behaviors by emphasizing the role of perceived susceptibility, severity, benefits, and barriers. In the context of pedestrian distraction, this model can elucidate how individuals perceive the risks associated with distracted walking and the potential benefits of adhering to safe crossing practices. For instance, interventions like awareness campaigns or environmental cues can serve as triggers for behavioral change by enhancing perceptions of risk and the benefits of safe behavior. Social dynamics significantly affect pedestrian behaviors, as peer presence and group size influence risk-taking tendencies. Studies have highlighted the impact of social contexts on adolescent pedestrian safety, demonstrating that group settings can either mitigate or exacerbate risky behaviors [12], [13]. This perspective is particularly relevant in urban settings, where pedestrian interactions are frequent and varied.”

 

Section 1, Paras 5 and 6

3.

Details regarding the selection criteria for the 11 intersections and the observational procedures need to be expanded. For example, how were demographic variables like age estimated accurately?

 

Thank you so much for the valuable comment and suggestion. The following details are now mentioned.

Details regarding the selection criteria for the intersections are given in Section 2.1, and further intersection details are shown in Table 1. The data extraction process is now explained in detail in Section 2.2.

“Gender is categorized as “Male” and “Female.” Since the exact age of a pedestrian cannot be determined from video footage, age is estimated by grouping individuals into three categories: “Young,” “Middle-aged,” and “Old.” The age group was estimated based upon factors such as physical characteristics, including facial features, overall appearance, hair color, walking style, and clothing type[29]. Group size is classified based on the number of pedestrians crossing together: “Single,” “Pair,” or “More than two.” Technological distraction is identified as “Yes” if a person is clearly observed using a mobile phone, talking on the phone, or wearing headphones; otherwise, it is labeled as “No.” The arrival time is recorded when a pedestrian reaches the sidewalk or median, and the departure time is noted when they step onto the carriageway. After crossing the road, the end time is recorded when the pedestrian reaches the opposite sidewalk or median. The waiting time is calculated as the difference between the departure time and the arrival time, while the crossing time is determined as the difference between the end time and the departure time. Crossing speed is defined as the width of the carriageway divided by the crossing time. Pedestrian crosswalk compliance behavior is defined as “Yes” if the individual crosses the road within the crosswalk or within 0.5 meters on either side of it. This 0.5-meter margin is included to account for pedestrians in large groups, some of whom may be slightly outside the marked crosswalk. Crossing at any other location is considered non-compliant, labeled as “No.” If a pedestrian crosses the road during the vehicle green phase or red phase for pedestrians, it is noted as signal violation.”
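The timing definitions quoted above can be written out as a short Python sketch. This is only an illustration of the stated formulas, not the authors' extraction procedure (the response says the data were coded manually); the record layout, the field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass

# Tolerance on either side of the marked crosswalk, as stated in the quoted text.
CROSSWALK_MARGIN_M = 0.5


@dataclass
class CrossingRecord:
    """One manually coded crossing; all field names are hypothetical."""
    arrival_s: float                # pedestrian reaches the sidewalk or median
    departure_s: float              # pedestrian steps onto the carriageway
    end_s: float                    # pedestrian reaches the opposite sidewalk or median
    carriageway_width_m: float      # width of the carriageway being crossed
    offset_from_crosswalk_m: float  # 0.0 when the pedestrian stays inside the crosswalk


def waiting_time_s(rec: CrossingRecord) -> float:
    # Waiting time = departure time - arrival time.
    return rec.departure_s - rec.arrival_s


def crossing_time_s(rec: CrossingRecord) -> float:
    # Crossing time = end time - departure time.
    return rec.end_s - rec.departure_s


def crossing_speed_mps(rec: CrossingRecord) -> float:
    # Crossing speed = carriageway width / crossing time.
    duration = crossing_time_s(rec)
    if duration <= 0:
        raise ValueError("end time must be later than departure time")
    return rec.carriageway_width_m / duration


def crosswalk_compliant(rec: CrossingRecord) -> str:
    # "Yes" if the pedestrian stays within the crosswalk or the 0.5 m margin.
    return "Yes" if rec.offset_from_crosswalk_m <= CROSSWALK_MARGIN_M else "No"


# Made-up example values, for illustration only.
example = CrossingRecord(arrival_s=10.0, departure_s=22.5, end_s=34.5,
                         carriageway_width_m=15.0, offset_from_crosswalk_m=0.3)
```

With these made-up values the pedestrian waits 12.5 s, takes 12.0 s to cross a 15 m carriageway, and is coded as crosswalk-compliant because the 0.3 m offset lies inside the margin.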

 

 

Section 2.2, Para 2

4.

While the application of CNN, LSTM, and RNN is innovative, the choice of these models needs more justification. Why were these specific architectures chosen over simpler alternatives like logistic regression or decision trees for this analysis?

 

Thank you for the valuable comment and suggestion. The authors appreciate the reviewer’s insightful comment regarding the justification for using CNN, LSTM, and RNN architectures in this study.

 

The choice of these models was motivated by the nature of the problem and the data structure, specifically the sequential and high-dimensional data in the dataset, which includes temporal and behavioral sequences such as crossing speeds, waiting times, and compliance behaviors. These are inherently sequential and require capturing patterns over time, which simpler models such as logistic regression or decision trees may not handle adequately. RNN and LSTM are specifically designed for sequential data, enabling them to model time-dependent behaviors effectively. CNN was chosen for its ability to extract hierarchical and spatial features from input data. Although CNNs are traditionally used in image analysis, their utility in extracting complex patterns from structured tabular data has been established in recent studies. In this study, convolutional layers facilitated the detection of nuanced relationships between pedestrian behaviors and mobile usage. Preliminary experiments with simpler models, such as logistic regression and decision trees, revealed significantly lower accuracy and predictive performance. These models struggled to capture the intricate nonlinear relationships within the data. For instance, logistic regression assumes a linear relationship between the independent and dependent variables, which is unsuitable for this multifaceted problem. Decision trees, while interpretable, tend to overfit small datasets and fail to generalize well for complex, high-dimensional data. As highlighted in the manuscript, CNN achieved the highest accuracy (94.93%), followed by LSTM and RNN, all outperforming simpler models in terms of both accuracy and F1 score. These deep learning models also provide better generalizability and robustness, which are critical for real-world applications.
This study aimed not only to predict distracted pedestrian behavior but also to analyze and understand the intricate associations between demographic, behavioral, and compliance factors. Advanced models such as CNN, LSTM, and RNN are capable of achieving both high predictive accuracy and deeper insights into data.

We have included a detailed justification for these model choices in the revised manuscript (Section 2.3), as suggested.

 

 

 

 Section 2.3

5.

The reported accuracy of 94.93% for the CNN model is impressive but requires further validation. For instance, did the authors evaluate robustness using cross-validation or additional datasets?

 

Thank you for the suggestion.

The authors appreciate the reviewer’s acknowledgment of the model’s performance and agree on the importance of validating its robustness. A 5-fold cross-validation has been performed as part of the study to evaluate the stability and reliability of the CNN and other models. The results of the cross-validation confirmed the consistency of the model's performance, with only minimal variations in accuracy across the folds. The details of the cross-validation procedure and its results have been included in the revised manuscript (Section 4).
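For readers unfamiliar with the procedure, a 5-fold cross-validation loop can be sketched as below. This is a generic illustration under stated assumptions, not the authors' pipeline: the response does not say whether the folds were shuffled or stratified, and the majority-class "model" is a hypothetical placeholder standing in for the CNN, LSTM, or RNN.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return one held-out accuracy per fold."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        # Fit on the other k-1 folds, then score on the held-out fold.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Hypothetical placeholder model: always predicts the majority training class.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, feature_row):
    return model


# Made-up data: 100 samples, 80 of class 0 and 20 of class 1.
toy_features = list(range(100))
toy_labels = [0] * 80 + [1] * 20
fold_accuracies = cross_validate(toy_features, toy_labels, fit_majority, predict_majority)
mean_accuracy = mean(fold_accuracies)
```

The spread of `fold_accuracies` around `mean_accuracy` is the quantity the response summarizes as "minimal variations in accuracy across the folds".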

 

Additionally, while this study did not use external datasets due to data availability constraints, we acknowledge the importance of validating the model on additional datasets for assessing its generalizability. We propose this as a key direction for future research.

 

Section 4

 

Reviewer 2 Report

Comments and Suggestions for Authors

This paper assesses the impact of mobile usage on pedestrian road crossing behaviour at signalized intersections. Personally, I find it an interesting topic. Some suggestions are as follows:

1. The title is 'a machine learning based approach'. In fact, this paper compares three existing deep learning networks, i.e., CNN, LSTM and RNN, without proposing any new approach. It is more like a comparison of these three models on impact assessment of MU. Is this understanding correct?

2. In 2.3, too much space has been given to introduce the concept of CNN, LSTM and RNN. It is unnecessary, and these concepts are well-known in the field of machine learning. In addition, the equations are not well presented.

3. Where is the originality of this paper? Or in other words, what are the core contributions of this paper from the aspects of models, results or applications? After reading the paper, I think the topic as well as the findings are interesting. However, it seems that the authors simply use and compare three machine learning algorithms in this field, and do not propose any new models.

4. It would be better for the authors to firstly analyze the characteristics of the captured data, and then customize a model to significantly improve the performance on the data.

5. Almost all the figures are not sharp enough. Please revise them.

6. The data used in this paper is very valuable. If possible, the authors could consider open-sourcing them so as to advance the development of the research community.

Overall, I cannot recommend it for publication in its current form; at least one round of major revision is needed.

Author Response

REVISION TABLE

Paper Title: Modeling and Evaluating the Impact of Mobile Usage on Pedestrian Behavior at Signalized Intersections: A Machine Learning Perspective

Revised Submission Date:   14 January 2025

 

The authors would like to express their gratitude to the anonymous reviewers for their valuable and helpful comments.

 

 

REVIEWER COMMENTS

AUTHORS’ RESPONSES

 

Reviewer No.

COMMENT No.

COMMENTS

RESPONSES

MODIFICATIONS

2

1.

This paper assesses the impact of mobile usage on pedestrian road crossing behaviour at signalized intersections. Personally, I find it an interesting topic. Some suggestions are as follows:

 

The authors sincerely thank the revered reviewer for the valuable time spent on reviewing and appreciating the work. The comments and suggestions made by the esteemed reviewer to improve the present article are highly appreciated by the authors.

 

 

2.

The title is 'a machine learning based approach'. In fact, this paper compares three existing deep learning networks, i.e., CNN, LSTM and RNN, without proposing any new approach. It is more like a comparison of these three models on impact assessment of MU. Is this understanding correct?

 

We appreciate your observation regarding the focus of the study. Your understanding is partially correct. While the study indeed employs three well-established deep learning models—CNN, LSTM, and RNN—it is important to highlight that the primary objective of the research is to evaluate the applicability and effectiveness of these machine learning techniques in modeling and predicting pedestrian mobile usage behavior at signalized intersections.

 

The title, "A Machine Learning-Based Approach," is intended to reflect the study's focus on leveraging advanced machine learning algorithms to address the problem of distracted pedestrian behavior. Although we do not propose a novel machine learning architecture, the study makes unique contributions by:

 

-Applying these models in the specific context of pedestrian safety and mobile usage, a domain with limited existing research.

-Using extensive, real-world observational data collected from signalized intersections in New Delhi, India, which adds value to the understanding of pedestrian behavior in urban environments.

-Performing sensitivity analyses to identify the most significant predictors of mobile usage behavior, which provides actionable insights for traffic safety interventions.

 

 

3.

In 2.3, too much space has been given to introduce the concept of CNN, LSTM and RNN. It is unnecessary, and these concepts are well-known in the field of machine learning. In addition, the equations are not well presented.

 

The authors appreciate the reviewer's constructive feedback. We agree that concepts like CNN, LSTM, and RNN are well-known in the field of machine learning. However, our intent in Section 2.3 was to provide a succinct yet comprehensive explanation of these methods to ensure accessibility for a broader audience, including practitioners in transportation safety and urban planning who may not be deeply familiar with advanced machine learning techniques.

In the revised manuscript, the section can be streamlined to reduce redundancy while maintaining essential details relevant to the study's context. It has also been ensured that all equations are clearly presented, appropriately labeled, and accompanied by concise explanations. These adjustments will align the manuscript more closely with the expectations of an expert readership while retaining sufficient clarity for interdisciplinary applicability.

 

Section 2.3

 

4.

Where is the originality of this paper? Or in other words, what are the core contributions of this paper from the aspects of models, results or applications? After reading the paper, I think the topic as well as the findings are interesting. However, it seems that the authors simply use and compare three machine learning algorithms in this field, and do not propose any new models.

We thank the reviewer for their thoughtful feedback and for acknowledging the relevance and interest of the topic and findings. We would like to clarify the originality and core contributions of the paper, which lie in the following aspects:

 

-This study focuses on a critical and underexplored area of pedestrian safety by investigating the impact of mobile usage (MU) on pedestrian behavior at signalized intersections. While machine learning models like CNN, LSTM, and RNN are not novel, their application in modeling and predicting technologically distracted pedestrian behavior using real-world data is a unique contribution. This is one of the few studies to leverage advanced deep learning techniques for pedestrian safety research, particularly in the context of Indian urban environments.

-The study employs a comprehensive dataset collected from 11 signalized intersections in New Delhi, India, capturing over 5,600 pedestrian observations. The data reflect real-world behaviors in diverse urban settings, adding significant value to the field by addressing the challenges of pedestrian safety in high-density, rapidly urbanizing regions.

-The study provides a rigorous evaluation of three advanced machine learning models (CNN, LSTM, RNN) for predicting distracted pedestrian behavior. While the models themselves are pre-existing, this comparison offers critical insights into their relative performance, computational efficiency, and applicability in the pedestrian safety domain. The findings demonstrate the utility of deep learning techniques in predicting distracted behaviors with high accuracy (CNN: 94.93%), paving the way for data-driven safety interventions.

-Sensitivity analysis of the input parameters reveals the most significant predictors of mobile usage, such as crossing speed and signal compliance, along with demographic factors like age and gender. These insights are valuable for developing targeted safety measures and interventions, offering a practical contribution to urban traffic management and pedestrian safety planning.

-Based on the findings, the paper proposes actionable recommendations, including infrastructure improvements, awareness campaigns, and technology-based solutions, to mitigate the risks associated with distracted pedestrian behaviors. These contributions bridge the gap between research and practical implementation.

 

While the study does not propose a new machine learning model, its originality lies in the innovative application of advanced models to a novel domain, the insights derived from a large and diverse dataset, and the practical implications of the findings. We believe these aspects make a meaningful contribution to the existing body of knowledge in pedestrian safety and distracted behavior research.




 

 

5.

It would be better for the authors to firstly analyze the characteristics of the captured data, and then customize a model to significantly improve the performance on the data.

We thank the reviewer for this valuable suggestion. The characteristics of the captured data were analyzed in detail during the study to ensure a comprehensive understanding of the underlying patterns and factors influencing pedestrian mobile usage (MU) behavior. This analysis included the following steps, which are already discussed in the manuscript:

 

-The dataset characteristics, including pedestrian demographics (age, gender, group size), behavioral variables (crossing speed, waiting time, compliance behaviors), and environmental factors, were thoroughly analyzed. These insights were presented in Section 3 and summarized in Figures 4–7 to highlight key trends, such as the higher likelihood of distraction among younger pedestrians and females.

-A sensitivity analysis was conducted to identify the most significant predictors of MU behavior. This analysis, as presented in Figure 9, revealed that crossing speed, signal compliance, and demographic variables were highly influential, offering a deeper understanding of the dataset's structure.
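The response does not state which sensitivity-analysis procedure was used. One common way to rank predictors is permutation importance, sketched below purely as an assumed illustration: one feature column is shuffled across rows and the resulting drop in accuracy is recorded. The classifier, feature names, and data here are hypothetical and are not taken from the manuscript.

```python
import random


def permutation_importance(predict, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature column across all rows."""
    def accuracy(data):
        return sum(predict(row) == label for row, label in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    shuffled_values = [row[feature] for row in rows]
    random.Random(seed).shuffle(shuffled_values)
    # Rebuild the rows with only the chosen feature permuted.
    permuted_rows = [{**row, feature: value} for row, value in zip(rows, shuffled_values)]
    return baseline - accuracy(permuted_rows)


# Hypothetical stand-in classifier: flags mobile usage when crossing speed is low.
def toy_predict(row):
    return 1 if row["crossing_speed_mps"] < 1.0 else 0


# Made-up rows, for illustration only.
toy_rows = [
    {"crossing_speed_mps": 0.7, "group_size": 1},
    {"crossing_speed_mps": 0.8, "group_size": 2},
    {"crossing_speed_mps": 0.9, "group_size": 1},
    {"crossing_speed_mps": 0.6, "group_size": 3},
    {"crossing_speed_mps": 1.2, "group_size": 1},
    {"crossing_speed_mps": 1.4, "group_size": 2},
    {"crossing_speed_mps": 1.3, "group_size": 1},
    {"crossing_speed_mps": 1.5, "group_size": 3},
]
toy_labels = [toy_predict(row) for row in toy_rows]

speed_importance = permutation_importance(toy_predict, toy_rows, toy_labels, "crossing_speed_mps")
group_importance = permutation_importance(toy_predict, toy_rows, toy_labels, "group_size")
```

Because the stand-in classifier ignores `group_size`, shuffling that column leaves every prediction unchanged and its importance is exactly zero, whereas shuffling `crossing_speed_mps` can only lower the accuracy.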

 

While we acknowledge the potential value of developing a customized model, our study aimed to evaluate and compare the effectiveness of existing deep learning models (CNN, LSTM, and RNN) in predicting MU behavior. This comparison provides valuable insights into the applicability of these models in the domain of pedestrian safety.

 

 

Line 703-707

 

6.

Almost all the figures are not sharp enough. Please revise them.

 

Please see whether Figures 2 and 5 can be improved further; the remaining figures are acceptable.

 

 

 

7.

The data used in this paper is very valuable. If possible, the authors could consider open-sourcing them so as to advance the development of the research community.

 

Thank you for appreciating the data and for the suggestion to open-source it. As the research is still ongoing, the data cannot be made open source at this stage, in order to preserve the originality of future work. However, the authors are willing to make the relevant data available on request.

 

 

                     

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This study investigates the impact of mobile device usage (MU) on pedestrian behavior and safety at signalized urban intersections. It is an interesting and significant topic. My main concerns are outlined below:

1. The abstract is generally well-organized. However, including an introduction to the inputs and outputs used in the modeling process would enhance readers' understanding of the study's overall scope.

2. The study examines the relationship between pedestrian crossing attributes and MU. While the title suggests an evaluation of MU's impact, the research primarily focuses on accuracy rather than assessment. Adjusting the title or improving the content is necessary.

3. The description of the dataset is insufficient. Was video data used? If so, with a time resolution of 3 frames per second and an average crossing time of 30 seconds, wouldn't the sample size be 3×30×5000? Additionally, does the study classify someone as using a phone if they touched it even once during the entire time sequence? Basic details about the dataset need to be supplemented.

4. In safety-related research, identifying causal relationships is more important than merely achieving high accuracy, as it supports preventive measures. While machine learning methods offer high accuracy, their black-box nature prevents an assessment of how each feature influences the output. Recently, XAI models have been developed to address this limitation, and state-of-the-art models should be included in the literature review section. Relevant references include: 1) Toward Safer Highways: Application of XGBoost and SHAP for Real-Time Accident Detection and Feature Analysis; 2) Traffic Speed Prediction of Urban Road Networks Based on High-Importance Links Using XGB and SHAP.

5. A key focus of this study should be quantitatively evaluating the impact of each feature on the output. The study presents feature importance using sensitivity analysis. However, the methods section lacks details on sensitivity analysis. Refocusing the study on feature importance and incorporating various XAI models is recommended.

6. The inputs to the model are limited to individual attributes. However, additional variables such as crosswalk width, length, road traffic volume, and lane width should be included.

 

7. The study utilizes basic models such as CNN, LSTM, and RNN. LSTM and RNN are typically used for problems with temporal characteristics, while CNN is more relevant for image-related data. Strictly speaking, this study focuses on determining whether or not MU occurs. It is unclear whether the temporal characteristics of the data need to be considered. The three models used in this study are not the most appropriate for binary classification problems like this one. Ensemble models are more widely used in similar contexts. It is necessary to provide justification for the chosen models.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

REVISION TABLE

Paper Title: Modeling and Evaluating the Impact of Mobile Usage on Pedestrian Behavior at Signalized Intersections: A Machine Learning Perspective

Revised Submission Date:   14 January 2025

 

The authors would like to express their gratitude to the anonymous reviewers for their valuable and helpful comments.

 

 

REVIEWER COMMENTS

AUTHORS’ RESPONSES

 

Reviewer No.

COMMENT No.

COMMENTS

RESPONSES

MODIFICATIONS

3

1.

This study investigates the impact of mobile device usage (MU) on pedestrian behavior and safety at signalized urban intersections. It is an interesting and significant topic. My main concerns are outlined below:

 

Thank you so much for appreciating our work and for your valuable comments and feedback. The manuscript has been updated to incorporate the relevant changes.

 

 

2.

The abstract is generally well-organized. However, including an introduction to the inputs and outputs used in the modeling process would enhance readers' understanding of the study's overall scope.

 

We thank the reviewer for their constructive comment. To enhance the abstract's clarity and provide readers with a better understanding of the inputs and outputs used in the modeling process, we have revised the abstract and included the following content:

“Key inputs to the modeling process include pedestrian demographics (age, gender, group size) and behavioral variables (crossing speed, waiting time, compliance behaviors). The outputs of the models focus on predicting mobile usage behavior and its association with compliance behaviors such as crosswalk and signal adherence.”

 

3.

The study examines the relationship between pedestrian crossing attributes and MU. While the title suggests an evaluation of MU's impact, the research primarily focuses on accuracy rather than assessment. Adjusting the title or improving the content is necessary.

We appreciate the reviewer’s observation regarding the alignment between the study’s title and content.  To address the comment, the new title of the paper is:

"Modeling and Evaluating the Impact of Mobile Usage on Pedestrian Behavior at Signalized Intersections: A Machine Learning Perspective."

 

4.

The description of the dataset is insufficient. Was video data used? If so, with a time resolution of 3 frames per second and an average crossing time of 30 seconds, wouldn't the sample size be 3×30×5000? Additionally, does the study classify someone as using a phone if they touched it even once during the entire time sequence? Basic details about the dataset need to be supplemented.

 

We thank the reviewer for highlighting the need for a more detailed description of the dataset. Below, we provide clarifications and additional details:

Video Data and Sampling Process:

Yes, video data was used to capture pedestrian behavior at signalized intersections. The video recordings were reviewed in ultra-slow motion (approximately 2–3 frames per second) to ensure the accurate extraction of pedestrian behaviors and characteristics. However, the sample size refers to individual pedestrian observations, not the total number of frames. Each pedestrian crossing was manually coded as a single data point, with demographic and behavioral attributes (e.g., gender, age group, crossing speed, waiting time, mobile usage) extracted for each observation.

Classification of Mobile Usage:

A pedestrian was classified as "using a phone" (MU = 1) if they were observed engaging in any mobile-related activity during the crossing period. This includes talking, texting, browsing, or interacting with the device at any point while crossing the intersection. Merely touching the phone without sustained usage (e.g., taking it out of a pocket momentarily) was not classified as mobile usage. This classification ensures that only significant interactions contributing to distraction are accounted for.
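The labeling rule and the distinction between frames and observations can be illustrated with a short sketch. It assumes a hypothetical coding sheet in which the human coder notes activity keywords per pedestrian; the keywords, pedestrian IDs, and rows are invented for illustration and do not come from the dataset.

```python
# Activities treated as sustained phone use, following the response text.
# The keyword strings themselves are hypothetical coding labels.
SUSTAINED_PHONE_USE = {"talking", "texting", "browsing"}


def mobile_usage_label(noted_activities):
    """MU = 1 if any sustained phone interaction was noted during the crossing, else 0."""
    return int(any(activity in SUSTAINED_PHONE_USE for activity in noted_activities))


# Made-up coding sheet: one entry per pedestrian crossing.
coded_crossings = {
    "ped_001": ["texting"],
    "ped_002": [],
    "ped_003": ["momentary_touch"],             # phone taken out briefly: not MU
    "ped_004": ["momentary_touch", "talking"],  # sustained use at some point: MU
}
mu_labels = {ped_id: mobile_usage_label(acts) for ped_id, acts in coded_crossings.items()}

# Each crossing yields a single row, however many frames were reviewed.
n_observations = len(mu_labels)

# The reviewer's frame-level estimate, for comparison with the 5,642 coded rows.
reviewer_frame_estimate = 3 * 30 * 5000
```

Under this rule a crossing contributes one observation, which is why the dataset holds 5,642 rows rather than the roughly 450,000 frames implied by the reviewer's estimate.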

To address the reviewer's concern, we propose including the following text in the manuscript to supplement the dataset description:

" Each crossing instance was manually analyzed, with demographic (e.g., age, gender, group size) and behavioral attributes (e.g., crossing speed, waiting time, compliance behaviors) recorded. Mobile usage (MU) was classified based on observed interactions, such as texting, calling, or browsing, during the crossing period. The final dataset comprised 5,642 individual pedestrian observations, representing diverse behaviors and contexts. The extracted data were coded and entered into preset Excel formats."

 

5.

In safety-related research, identifying causal relationships is more important than merely achieving high accuracy, as it supports preventive measures. While machine learning methods offer high accuracy, their black-box nature prevents an assessment of how each feature influences the output. Recently, XAI models have been developed to address this limitation, and state-of-the-art models should be included in the literature review section. Relevant references include: 1) Toward Safer Highways: Application of XGBoost and SHAP for Real-Time Accident Detection and Feature Analysis; 2) Traffic Speed Prediction of Urban Road Networks Based on High-Importance Links Using XGB and SHAP.

 

We appreciate the reviewer’s valuable suggestion to incorporate a discussion on explainable artificial intelligence (XAI) models, which provide transparency and interpretability to machine learning predictions. We agree that understanding the influence of individual features on the output is crucial for supporting preventive safety measures. In response, the following discussions about XAI are now included in Literature Review, including the references provided by the esteemed reviewer:

“Machine learning (ML) methods have been extensively employed in pedestrian safety research to model and predict risky behaviors, evaluate compliance with traffic rules, and understand the factors influencing pedestrian safety. These methods offer high predictive accuracy and the ability to analyze complex, high-dimensional datasets. Several studies have demonstrated the effectiveness of ML techniques in pedestrian safety applications [29], [30], [31]. In one study, Support Vector Machine (SVM) and Random Forest (RF) were used to predict pedestrians’ red-light crossing behavior [32]. Deep learning approaches have further enhanced the predictive performance of pedestrian safety studies. For example, Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN) models have been used to classify pedestrian behavior and detect violations at signalized intersections [33].

Despite their effectiveness, these ML methods often operate as "black-box" models, providing little insight into how input variables contribute to the predictions. In safety-related research, this lack of transparency limits the ability to identify causal relationships, which are critical for designing targeted interventions and preventive measures. Explainable artificial intelligence (XAI) addresses this limitation by providing interpretability and transparency in ML models. XAI techniques, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), help quantify the contribution of individual features to model predictions, enabling researchers to understand the underlying factors driving the results. Recent studies have demonstrated the utility and effectiveness of XAI in transportation safety research [34], [35]. Thus, integrating XAI into pedestrian safety research can bridge the gap between predictive accuracy and interpretability, offering deeper insights into how demographic, behavioral, and situational factors influence distracted behaviors.”
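
To make the idea of per-feature contributions concrete, the following sketch computes exact Shapley values, the quantity that SHAP approximates, for a single prediction. It does not use the SHAP library and is not the authors' analysis: the scoring function `toy_risk_score`, the feature names, and the all-zero baseline are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for very small feature sets.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(x):
    # Hypothetical features: [is_young, is_alone, is_weekend], with an interaction term.
    return 0.5 * x[0] + 0.3 * x[1] + 0.4 * x[0] * x[1]


instance = [1, 1, 0]
baseline = [0, 0, 0]
phi = shapley_values(toy_risk_score, instance, baseline)
print([round(p, 3) for p in phi])  # approximately [0.7, 0.5, 0.0]
```

The contributions sum to the difference between the score for the instance and the score for the baseline, and the interaction term is split evenly between the two features involved.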

 

 

6.

A key focus of this study should be quantitatively evaluating the impact of each feature on the output. The study presents feature importance using sensitivity analysis. However, the methods section lacks details on sensitivity analysis. Refocusing the study on feature importance and incorporating various XAI models is recommended.

Thank you for the comment.

Moreover, a comparative analysis in future research of ensemble models and deep learning architectures, combined with XAI models, would help identify the most suitable models based on accuracy, computational efficiency, and generalizability for similar contexts.
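
The kind of quantitative feature-importance evaluation requested in this comment can be illustrated with permutation importance, a common model-agnostic technique. The manuscript's own sensitivity analysis procedure is not described in this record, so the sketch below is a stand-in under stated assumptions: the toy data, the feature names, and the `toy_model` classifier are all hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: columns are [is_young, in_group]; the label depends only on is_young.
rows = [[i % 2, (i // 2) % 2] for i in range(20)]
labels = [r[0] for r in rows]


def toy_model(row):
    # Stand-in for a trained classifier; it looks only at the first feature.
    return row[0]


importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model relies on lowers accuracy. Unlike Shapley-based attributions, this gives one global score per feature rather than a per-prediction breakdown.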

 

7.

The inputs to the model are limited to individual attributes. However, additional variables such as crosswalk width, length, road traffic volume, and lane width should be included.

 

We appreciate the reviewer’s insightful suggestion to incorporate additional variables such as crosswalk width, length, road traffic volume, and lane width. These factors can provide a more comprehensive understanding of the contextual and environmental influences on pedestrian behavior. However, the current study focused primarily on individual attributes (e.g., demographic and behavioral factors) because of data availability and the feasibility of data collection within the study's scope. These factors are therefore left for future work, and the following text has been added to the future scope:

“Further, the study focused only on individual pedestrian attributes. Future research should consider integrating environmental factors, such as crosswalk width, road traffic volume, and lane width, to provide a holistic understanding of pedestrian behavior. Including these variables could improve model performance and offer richer insights into the interplay between individual and environmental factors influencing mobile usage behavior.”

8.

The study utilizes basic models such as CNN, LSTM, and RNN. LSTM and RNN are typically used for problems with temporal characteristics, while CNN is more relevant for image-related data. Strictly speaking, this study focuses on determining whether or not MU occurs. It is unclear whether the temporal characteristics of the data need to be considered. The three models used in this study are not the most appropriate for binary classification problems like this one. Ensemble models are more widely used in similar contexts. It is necessary to provide justification for the chosen models.

We appreciate the reviewer’s thoughtful feedback and acknowledge the need to clarify the rationale behind our choice of CNN, LSTM, and RNN for this study. While the primary objective of the study is to classify mobile usage (MU) as a binary outcome, the predictor variables include sequential behavioral attributes, such as waiting time and crossing speed. These variables have an inherent temporal or ordered nature in pedestrian behavior that aligns with the strengths of LSTM and RNN. These models were chosen to explore potential temporal dependencies in the dataset, which simpler models or ensemble methods might overlook. The choice was motivated by the nature of the problem and the data structure: the dataset is high-dimensional and includes temporal and behavioral sequences, such as crossing speeds, waiting times, and compliance behaviors. These are inherently sequential and require capturing patterns over time, which simpler models such as logistic regression or decision trees may not handle adequately. RNN and LSTM are specifically designed for sequential data, enabling them to model time-dependent behaviors effectively.

CNN was chosen for its ability to extract hierarchical and spatial features from input data. Although CNNs are traditionally used in image analysis, their utility in extracting complex patterns from structured tabular data has been established in recent studies. In this study, convolutional layers facilitated the detection of nuanced relationships between pedestrian behaviors and mobile usage.

Preliminary experiments with simpler models, such as logistic regression and decision trees, revealed significantly lower accuracy and predictive performance. These models struggled to capture the intricate nonlinear relationships within the data. For instance, logistic regression assumes a linear relationship between the independent variables and the log-odds of the outcome, which is unsuitable for this multifaceted problem. Decision trees, while interpretable, tend to overfit small datasets and fail to generalize well for complex, high-dimensional data (Ghani et al. 2021).

As highlighted in the manuscript, CNN achieved the highest accuracy (94.93%), followed by LSTM and RNN, all outperforming simpler models in terms of both accuracy and F1 score. These deep learning models also provide better generalizability and robustness, which are critical for real-world applications. This study aimed not only to predict distracted pedestrian behavior but also to analyze and understand the intricate associations between demographic, behavioral, and compliance factors. Advanced models such as CNN, LSTM, and RNN are capable of achieving both high predictive accuracy and deeper insights into data.
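
The point about logistic regression's linear decision boundary can be illustrated on a toy problem. The sketch below does not reproduce the study's preliminary experiments and uses no study data. It searches a small grid of weights for the best rule of the form "predict 1 if w·x + b > 0", which is the form of decision rule a fitted logistic regression produces, on XOR-style data with and without an explicit interaction feature. The data, the grid, and the function name are assumptions for illustration.

```python
from itertools import product

# XOR-style toy data: the label is 1 only when exactly one of the two binary inputs is 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]


def best_linear_accuracy(features, labels, grid):
    """Best accuracy of any rule 'predict 1 if w.x + b > 0' with w and b drawn from grid."""
    n_features = len(features[0])
    best = 0.0
    for params in product(grid, repeat=n_features + 1):
        *w, b = params
        preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
        acc = sum(p == t for p, t in zip(preds, labels)) / len(labels)
        best = max(best, acc)
    return best


grid = [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

raw = best_linear_accuracy(X, y, grid)
with_interaction = best_linear_accuracy([(a, b, a * b) for a, b in X], y, grid)
print(raw, with_interaction)  # 0.75 1.0
```

No linear rule on the two raw inputs classifies all four points, but adding the product term makes the problem linearly separable. This shows that a linear model can still capture an interaction when it is supplied as an engineered feature, whereas nonlinear models can learn it from the raw inputs.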

We agree that the choice of models should be well justified. To strengthen the manuscript, we have elaborated on the rationale behind the model selection in the Methods section (Section 2.3). We also acknowledge the importance of comparing the performance of these deep learning models with widely used ensemble models such as Random Forest, Gradient Boosting, and XGBoost. While this comparison was not conducted in the current study due to scope constraints, we recognize its value and propose it as a critical direction for future research. Additionally, we have incorporated the limitations of this approach and the potential applicability of ensemble models for future work in the revised Conclusion section.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have answered all my questions.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have basically addressed my concerns, and I can recommend the paper for publication.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed all of my concerns. I recommend that this study be published.
