Open Access · Editor’s Choice · Article

Peer-Review Record

We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model

Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077

by Lihardo Faisal Simanjuntak ¹,², Rahmad Mahendra ¹,* and Evi Yulianti ¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 5 April 2022 / Revised: 23 May 2022 / Accepted: 27 May 2022 / Published: 7 July 2022

(This article belongs to the Topic Machine and Deep Learning)

Round 1

Reviewer 1 Report

In this paper, the authors propose a BERT-based model to predict the locations of Twitter users. They build their own pipeline for multi-class classification, including data collection and preparation, feature selection and model design, model training, and evaluation. The results show that fine-tuning IndoBERT on aggregated user tweets achieves the best accuracy and F1 score. Overall, the paper is reasonably well structured and easy to understand.

Regarding the results in the paper, I have some major concerns.

- The model trained on tweets alone appears to outperform all the other models. This finding seems straightforward and is neither informative nor promising to me.
- Some results (e.g., those for display name and description) are unattractive because of their low performance.
- When all three types of features are considered, majority voting is not a good choice. Why not try feature concatenation or feature crossing to learn directly from the features? It would be interesting to see whether model performance improves when more features and their interactions are added.
- Regarding baseline methods such as NB, SVM, and LR, it is hard to say which one is better when only the default settings in sklearn are used. Some research has shown that their performance should be similar after parameter tuning. Therefore, the conclusion about the baseline models is not convincing.
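
The contrast drawn in the third concern, between majority voting over per-feature classifiers and feeding concatenated features to a single classifier, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `majority_vote` and `concatenate_features`, the tie-breaking rule, the region labels, and the toy vectors are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def concatenate_features(*vectors):
    """Join per-field feature vectors into one input vector for a single classifier."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined


# Hypothetical per-field predictions for one user (labels are placeholders).
per_field_predictions = {
    "tweets": "region_A",
    "display_name": "region_B",
    "description": "region_A",
}
print(majority_vote(list(per_field_predictions.values())))

# Hypothetical per-field feature vectors for the same user.
tweet_vec, name_vec, desc_vec = [0.2, 0.7], [0.1], [0.4, 0.9]
print(concatenate_features(tweet_vec, name_vec, desc_vec))
```

With voting, each field's classifier decides in isolation and only the labels are combined. With concatenation, a single classifier sees all fields at once and can in principle learn interactions between them.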

In addition, there are some other comments; please see below.

Regarding Section 3,
- Fig 2 seems useless and can be removed.
- It would be better to include a map or table describing the 9 regions in Indonesia, since readers are not as familiar with them as the authors are.
Regarding Section 4,
- Please describe how to select up to 100 tweets in detail. Is it random selection or location-based selection?
- Please plot the distribution of the number of tweets per user. A distribution figure would be more informative than an average value.
- Please rephrase the feature extraction paragraph, since it is not clear to me. If possible, could you list the different feature types and the methods used for word representation? Further, why do you use TF-IDF and LSTM for display terms and descriptions instead of IndoBERT?
- It is essential to include an overview of the model architecture that clearly shows elements such as the input, output, and network.
- Regarding baseline models for comparison, move them to the experiment section.
Regarding Section 5,
- What is the x-axis in Fig 6? Does each bar represent a user? Please clarify it.
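
The Section 4 request for a distribution of tweet counts over users could be met with a simple binned count like the one below. It is a minimal sketch under stated assumptions: the user IDs and counts are invented, the bin width of 10 is arbitrary, and only the cap of 100 tweets per user comes from the comment above.

```python
from collections import Counter

MAX_TWEETS = 100  # cap taken from the review comment ("up to 100 tweets")


def tweet_count_histogram(tweets_per_user, bin_width=10):
    """Count users per bin of tweet counts, e.g. 1-10, 11-20, ..., 91-100."""
    bins = Counter()
    for n in tweets_per_user.values():
        n = min(n, MAX_TWEETS)
        if n <= 0:
            continue  # users with no tweets are skipped
        lower = ((n - 1) // bin_width) * bin_width + 1
        bins[(lower, lower + bin_width - 1)] += 1
    return dict(sorted(bins.items()))


# Hypothetical tweet counts per user, for illustration only.
counts = {"user_a": 3, "user_b": 100, "user_c": 57, "user_d": 12, "user_e": 95}
for (lo, hi), users in tweet_count_histogram(counts).items():
    print(f"{lo:>3}-{hi:<3} {'#' * users} ({users})")
```

A plot of these bins shows whether most users contribute close to the cap or only a handful of tweets, which a single average cannot reveal.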

Overall, I believe this paper requires a major revision before it can be considered for publication.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

I suggest the authors identify limitations when applying this method to research studies or any relevant projects in the discussion section.

The fact that IndoBERTweet outperforms the classical machine learning methods could also imply limitations in using it. In addition to the greater training resources the method requires, please describe further limitations of adopting IndoBERT for predicting the home locations of Twitter users, possibly by adding a limitations section.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

With most of my comments accepted, I find that the quality of the manuscript has improved further. The authors have clarified most of my concerns, and the supplementary material convincingly supports their research. I think the manuscript is ready for publication.
