Review Reports - Applied Techniques for Twitter Data Retrieval in an Urban Area: Insight for Trip Production Modeling

Round 1

Reviewer 1 Report

Article Title: Applied Techniques for Twitter Data Retrieval in an Urban Area: Insight for Trip Production Modeling

Review comments

Abstract:

The author mentions streaming data from May 2020 to April 2021, which was the peak period of the COVID-19 pandemic. Did the author check thoroughly, before streaming, that these data are genuine and authenticated?

The author mentions that retrieving the intended data produced 1,090,623 documents, of which 54,103 are geotagged data from 2,495 users. Are these data sufficient for a scientific research study? Are there any standards or literature in support of this? Please cite them.
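
For context, the proportions implied by the figures quoted above can be computed directly. The following is a minimal Python sketch that uses only the three counts reported in the abstract; the variable names are illustrative and are not taken from the manuscript.

```python
# Counts quoted from the abstract under review.
total_documents = 1_090_623
geotagged_documents = 54_103
geotagged_users = 2_495

# Share of retrieved documents that carry a geotag.
geotagged_share = geotagged_documents / total_documents

# Average number of geotagged documents per contributing user.
docs_per_user = geotagged_documents / geotagged_users

print(f"Geotagged share: {geotagged_share:.2%}")               # 4.96%
print(f"Geotagged documents per user: {docs_per_user:.1f}")    # 21.7
```

Roughly 5% of the retrieved documents are geotagged, at an average of about 22 geotagged documents per user. Whether that is adequate for the intended modeling is the question the comment raises.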

1. Introduction:

1. The author refers to more than 20 sources in this section. Please list the limitations of the existing works and identify the gaps (in the form of a table). Relate these gaps to the present work and define the objectives.

2. Data acquisition includes the process of gathering, filtering, and cleaning the unstructured data. Please explain the steps used to do these.
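
As an illustration of the kind of explanation requested in comment 2, the filtering and cleaning steps could be described along the following lines. This is a minimal Python sketch under stated assumptions, not the authors' program: the record fields (`user`, `lat`, `lon`, `text`), the bounding box values, and the sample records are all hypothetical.

```python
import re

# Hypothetical bounding box (min_lat, min_lon, max_lat, max_lon).
# These values are illustrative and are not taken from the manuscript.
STUDY_AREA = (-6.20, 106.10, -6.05, 106.25)


def in_study_area(record, box=STUDY_AREA):
    """Return True if the record has coordinates inside the bounding box."""
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return False
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def clean_text(text):
    """Strip URLs and @mentions, then collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def prepare(records):
    """Keep geotagged records in the study area, clean text, drop duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if not in_study_area(record):
            continue
        text = clean_text(record["text"])
        key = (record["user"], text)
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append({**record, "text": text})
    return cleaned


# Hypothetical gathered records, standing in for retrieved tweets.
raw = [
    {"user": "a", "lat": -6.12, "lon": 106.15, "text": "Heading to work  https://t.co/x"},
    {"user": "a", "lat": -6.12, "lon": 106.15, "text": "Heading to work https://t.co/y"},
    {"user": "b", "lat": None, "lon": None, "text": "No location here"},
    {"user": "c", "lat": -7.50, "lon": 110.00, "text": "@friend far away"},
]
result = prepare(raw)
print(len(result), result[0]["text"])
```

In this sample, only the first record survives: the second becomes a duplicate once its URL is stripped, the third has no coordinates, and the fourth lies outside the assumed bounding box.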

2. Materials and Methods:

3. Potential Twitter data content was first determined, such as username, coordinates of the tweet location within the study area and its surroundings, time of tweet, and text of tweet. Are these in the public domain or in the user's personal domain?

4. Data streaming retrieval was done using the standard API v2. The author received the consumer keys, consumer secret, access key, access secret, and bearer tokens. If, under the applicable laws and regulations, ethical clearance is not required for Twitter streaming data retrieval and Twitter archive data retrieval, then please cite the relevant international legal documents in the paper.

5. An algorithm was developed for the data retrieval program. Figures 3 and 4 need more detail. Please revise them.

6. Suitable content should be shown and explained for the Python code. A few screenshots may be added.

7. Twitter streaming and archive data were collected for the study from 2,000+ usernames. Are these data standard? Please justify with literature.

3, 4, 5. Results, Discussion, Conclusion

8. Results obtained for both streaming and archive data need comparison with similar existing works, to justify novelty.

9. Discuss the limitations of the present work in detail, along with possible future scope in terms of modified algorithms, new techniques, faster data mining methods, etc.

Author Response

Thank you for your review of my draft article. This is my response to your review.

On improvement in the Introduction in order to enrich the background and include all relevant references.

This is the improvement:

I have added a sentence: “Previous Twitter data retrieving applications have been discontinued by the providers, and Twitter company gives opportunity to researcher to utilize Twitter data as long comply with the company policy.”

On improvement in research design appropriateness.

This is the improvement:

I have added a sentence: “This study is to make sure that Twitter data retrieving techniques can capture Twitter users’ data in research area, and ensure that the collected data meet spatio-temporal data and text data analysis for trip production modeling.”

On improvement in methods description.

This is the improvement:

I have added a sentence: “Because of its advantages and open-source nature of Python-based programs, many scholars choose to utilize them for data mining practices, topic modeling, and data analysis (Abdul-Rahman, et al., 2021). A Spyder python program has been constructed with consideration of aimed type data, geographical area, time interval, and potential number of retrieved data.”

On improvement in clear presentation of the result.

The result is presented in Section 3 of the article. These are the improvements:

I have added a table titled “Number of Retrieved Archive Data” (as Table 3).
I have added a heading (C. Data Compilation) as a place to show the results of using both techniques.
I have added a graph that shows the availability of Twitter data for trip production modeling in Serang city as a result of applying the techniques.

On improvement in Conclusion supported by result.

This is the improvement:

I have added a sentence to the first paragraph of the Conclusion section: “Twitter data, from streaming and archive, can be retrieved using a Spyder Python-based program.”
I have added a sentence to the third paragraph of the Conclusion section: “Retrieving streaming data using multi-reference locations representing residential area within a particular zone gives more captured data than using one reference location. Intermittenly running the program for an interval time, based on data occurrence, such as 20-30 minutes, is recommended.”

The revised draft article is in the file applsci-2450002_reviewer1.docx. Thank you.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper presents methods of retrieving Twitter streaming and archive data, using Application Programming Interfaces. The aim of this article is to develop trip production modeling in zonal urban areas, using geolocation and text mining. This study used data from Twitter users in the city of Serang in Indonesia.

The article points out very interesting challenges in the field of machine learning techniques. The methodology applied is innovative and addresses the field of knowledge mining and data streaming collection methods by retrieving data from a Location Based Social Network (Twitter).

The level of English language is appropriate and the text content is overall comprehensive.

The manuscript is well structured, but some points need to be elaborated further.

It is important for readers to be aware of the research questions right at the beginning of the article. The authors are advised to include, in the last part of the introduction, a specific paragraph that states their research questions clearly in the form RQ1, RQ2, etc. Finally, those research questions should be answered directly in the conclusion section.

The introduction section includes the literature review section. The authors are advised to clearly distinguish the literature review section from the introduction section, by creating a new section.

The related research builds a solid base for this article, with 34 articles cited. Nevertheless, the most recent references were published in 2021. Even though the research topic is quite innovative, the authors are advised to include more recent references (from 2022 and 2023).

The authors are advised to study the following paper dealing with social network data knowledge mining:

Kanetaki, Z.; Stergiou, C.; Bekas, G.; Jacques, S.; Troussas, C.; Sgouropoulou, C.; Ouahabi, A. Acquiring, Analyzing and Interpreting Knowledge Data for Sustainable Engineering Education: An Experimental Study Using YouTube. Electronics 2022, 11, 2210. https://doi.org/10.3390/electronics11142210

Be careful with the way you cite your references in the main text: when you place [31], there is no need to also write “Yang and Eickhoff (2018)” before it. Doing so combines two different citation methods.

Please use the MDPI article template; the line numbers on the right side of the document are missing. Line numbers would help reviewers target their comments on your manuscript. You may find the template at the link below:

https://www.overleaf.com/latex/templates/mdpi-article-template/fcpwsspfzsph

At the end of your manuscript, please include the Informed Consent Statement described below.

Informed Consent Statement: Any research article describing a study involving humans should contain this statement. Please add “Informed consent was obtained from all subjects involved in the study.” OR “Patient consent was waived due to REASON (please provide a detailed justification).” OR “Not applicable.” for studies not involving humans. You might also choose to exclude this statement if the study did not involve humans.

Written informed consent for publication must be obtained from participating patients who can be identified (including by the patients themselves). Please state “Written informed consent has been obtained from the patient(s) to publish this paper” if applicable.

An Abbreviations table would be useful after the Conflicts of Interest section for API, LBSN, IDE, URL, RPs, and HBW, since they appear frequently in the main text.

Future work and the limitations of this research are missing or not clearly stated at the moment. The authors are advised to include those two points in their conclusion section.

Overall, the work addresses an interesting and innovative topic, but specific issues need to be revised, in order to point out its significance.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have responded to the reviewer's comments.

Please remember to correct orthographic mistakes before publication.

Orthographic mistakes have been detected in the following sentences; please correct them before publication:

Because of its advantages snd ope-source nature of Python-based programs, many cholars choose to utilize them for data mining practices, topic modeling, and data analysis [32]. A Spyder Python program has been constructed with consideration of aimed type data, geographical area, time intervaal, and potential number of retrieved data.. In the context of home-based trip production modeling, spatio-temporal and the social-economy label of trip makers from a particular zone is is essential for developing accurate and robust transportation demand models

Author Response

Dear Editor,

I hope this email finds you well. I am writing to submit the revised version of our draft article for the second round of review by Reviewer 2. We have carefully addressed the feedback provided by Reviewer 2 and have made the corrections in the draft as suggested.

In particular, we have included a table of revisions (below this letter) that presents the affected text before and after revision.

We would like to express our sincere gratitude to reviewer 2 for their thoughtful and constructive feedback, which has undoubtedly helped to improve the quality of our research article. We are confident that the revised manuscript addresses all the concerns raised and provides a more comprehensive and insightful analysis of our findings.

Thank you for your time and consideration. We look forward to hearing from you and hope that you find the revised article and accompanying table satisfactory for publication in the journal Applied Sciences.

Kind regards, Rempu S. Rayat

Table of Revisions to the Draft Article (Round 2, from the 2nd Reviewer)

Before	After
Because of its advantages snd ope-source nature of Python-based programs,	Because of its advantages 2nd open-source nature of Python-based programs,
many cholars choose to utilize them for data mining practices	many scholars choose to utilize them for data mining practices
time intervaal, and potential number of retrieved data.	time interval, and potential number of retrieved data.
spatio-temporal and the social-economy label of trip makers from a particular zone is is essential for developing accurate and robust transportation demand models	spatio-temporal and the social-economy label of trip makers from a particular zone is essential for developing accurate and robust transportation demand models.

Author Response File: Author Response.docx