Search Results (5)

Search Parameters:
Keywords = IndoBERT

17 pages, 1662 KB  
Proceeding Paper
Performance Analysis of IndoBERT for Detection of Online Gambling Promotion in YouTube Comments
by Kamdan Kamdan, Malik Pajar Anugrah, Moh Jeli Almutaali, Restu Ramdani and Ivana Lucia Kharisma
Eng. Proc. 2025, 107(1), 66; https://doi.org/10.3390/engproc2025107066 - 2 Sep 2025
Viewed by 1148
Abstract
The proliferation of online gambling promotions on social media platforms, particularly YouTube, poses a significant challenge in digital security and regulation. This study evaluates the performance of IndoBERT in detecting online gambling-related spam in YouTube comments. The research utilizes the YouTube Data API to collect comments, preprocesses the text through cleaning and tokenization, and fine-tunes IndoBERT for classification. The model’s performance is assessed using accuracy, precision, recall, and F1-score metrics. IndoBERT achieves outstanding results with an accuracy of 98.26%, proving its effectiveness in detecting online gambling promotion. The confusion matrix analysis highlights a low error rate, with minimal false positives and false negatives. IndoBERT is a promising tool for combating online gambling spam, offering high reliability for automated content moderation. Future improvements should focus on handling implicit promotional language, enhancing dataset diversity, and integrating rule-based filtering. This study contributes to NLP advancements in Indonesian text classification, supporting efforts to maintain a safer digital environment.
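The four reported metrics all derive from the confusion matrix the abstract mentions. A minimal sketch of how they are computed from binary counts (the counts below are illustrative, not the paper's actual matrix):

```python
# Classification metrics from a binary confusion matrix, as used to
# evaluate the fine-tuned IndoBERT spam classifier.
# tp/fp/fn/tn values here are illustrative, not the paper's data.

def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1-score."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=480, fp=10, fn=7, tn=503)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Note that with few false positives and false negatives, as the abstract reports, precision and recall both approach 1 and the F1-score follows.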

28 pages, 2499 KB  
Article
Optimizing Aspect-Based Sentiment Analysis Using BERT for Comprehensive Analysis of Indonesian Student Feedback
by Ahmad Jazuli, Widowati and Retno Kusumaningrum
Appl. Sci. 2025, 15(1), 172; https://doi.org/10.3390/app15010172 - 28 Dec 2024
Cited by 5 | Viewed by 5519
Abstract
Evaluating the learning process requires a platform for students to express feedback and suggestions openly through online reviews. Sentiment analysis is often used to analyze review texts but typically captures only overall sentiment without identifying specific aspects. This study develops an aspect-based sentiment analysis (ABSA) model using IndoBERT, a pre-trained model tailored for the Indonesian language. The research uses 10,000 student reviews from Indonesian universities, processed through data labeling, text preprocessing, and splitting, followed by model training and performance evaluation. The model demonstrated superior performance with an aspect extraction accuracy of 0.973, an F1-score of 0.952, a sentiment classification accuracy of 0.979, and an F1-score of 0.974. Experimental results indicate that the proposed ABSA model surpasses previous state-of-the-art models in analyzing sentiment related to specific aspects of educational evaluation. By leveraging IndoBERT, the model effectively handles linguistic complexities and provides detailed insights into student experiences. These findings highlight the potential of the ABSA model in enhancing learning evaluations by offering precise, aspect-focused feedback, contributing to strategies for improving the quality of higher education.
(This article belongs to the Special Issue Application of Artificial Intelligence and Semantic Mining Technology)
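The key difference from plain sentiment analysis is the output shape: one (aspect, sentiment) pair per aspect mentioned, rather than one label per review. A toy sketch of that structure, using an invented keyword lexicon in place of the paper's learned IndoBERT extractor:

```python
# Toy aspect-based sentiment output: one (aspect, sentiment) pair per
# aspect mentioned in a review, instead of a single review-level label.
# The lexicons below are invented for illustration; the paper's model
# learns both aspect extraction and sentiment classification.

ASPECT_TERMS = {"lecturer": "teaching", "materials": "curriculum", "wifi": "facilities"}
POSITIVE = {"helpful", "clear", "great"}
NEGATIVE = {"slow", "outdated", "unclear"}

def toy_absa(review: str) -> list[tuple[str, str]]:
    tokens = review.lower().replace(".", " ").split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in ASPECT_TERMS:
            window = tokens[max(0, i - 2): i + 3]  # small context window
            if any(w in POSITIVE for w in window):
                pairs.append((ASPECT_TERMS[tok], "positive"))
            elif any(w in NEGATIVE for w in window):
                pairs.append((ASPECT_TERMS[tok], "negative"))
            else:
                pairs.append((ASPECT_TERMS[tok], "neutral"))
    return pairs

print(toy_absa("The lecturer was helpful but the wifi is slow."))
# → [('teaching', 'positive'), ('facilities', 'negative')]
```

A single review can thus contribute opposing sentiments to different aspects, which is exactly what a review-level classifier flattens away.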

28 pages, 2857 KB  
Article
IndoGovBERT: A Domain-Specific Language Model for Processing Indonesian Government SDG Documents
by Agus Riyadi, Mate Kovacs, Uwe Serdült and Victor Kryssanov
Big Data Cogn. Comput. 2024, 8(11), 153; https://doi.org/10.3390/bdcc8110153 - 9 Nov 2024
Cited by 3 | Viewed by 3369
Abstract
Achieving the Sustainable Development Goals (SDGs) requires collaboration among various stakeholders, particularly governments and non-state actors (NSAs). This collaboration results in but is also based on a continually growing volume of documents that needs to be analyzed and processed in a systematic way by government officials. Artificial Intelligence and Natural Language Processing (NLP) could, thus, offer valuable support for progressing towards SDG targets, including automating the government budget tagging and classifying NSA requests and initiatives, as well as helping uncover the possibilities for matching these two categories of activities. Many non-English speaking countries, including Indonesia, however, face limited NLP resources, such as, for instance, domain-specific pre-trained language models (PTLMs). This circumstance makes it difficult to automate document processing and improve the efficacy of SDG-related government efforts. The presented study introduces IndoGovBERT, a Bidirectional Encoder Representations from Transformers (BERT)-based PTLM built with domain-specific corpora, leveraging the Indonesian government’s public and internal documents. The model is intended to automate various laborious tasks of SDG document processing by the Indonesian government. Different approaches to PTLM development known from the literature are examined in the context of typical government settings. The most effective, in terms of the resultant model performance, but also most efficient, in terms of the computational resources required, methodology is determined and deployed for the development of the IndoGovBERT model. The developed model is then scrutinized in several text classification and similarity assessment experiments, where it is compared with four Indonesian general-purpose language models, a non-transformer approach of the Multilabel Topic Model (MLTM), as well as with a Multilingual BERT model. Results obtained in all experiments highlight the superior capability of the IndoGovBERT model for Indonesian government SDG document processing. The latter suggests that the proposed PTLM development methodology could be adopted to build high-performance specialized PTLMs for governments around the globe which face SDG document processing and other NLP challenges similar to the ones dealt with in the presented study.
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
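The similarity-assessment experiments the abstract describes compare document representations, typically via cosine similarity between embedding vectors. A minimal sketch of that score (the vectors below are toy placeholders, not real IndoGovBERT embeddings):

```python
import math

# Cosine similarity between two document embeddings, the standard score
# behind embedding-based similarity assessment, e.g. for matching
# government budget items against NSA initiatives.
# The vectors below are toy placeholders, not real model embeddings.

def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

budget_doc = [0.9, 0.1, 0.3]   # placeholder embedding of a budget document
nsa_request = [0.8, 0.2, 0.4]  # placeholder embedding of an NSA request
print(f"similarity = {cosine_similarity(budget_doc, nsa_request):.3f}")
```

Because the score is normalized by vector length, it compares direction only, so documents of very different lengths can still score as similar when their embeddings point the same way.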

19 pages, 601 KB  
Article
Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach
by Khouloud Mnassri, Reza Farahbakhsh and Noel Crespi
Entropy 2024, 26(4), 344; https://doi.org/10.3390/e26040344 - 18 Apr 2024
Cited by 7 | Viewed by 6722
Abstract
Social media platforms have surpassed cultural and linguistic boundaries, thus enabling online communication worldwide. However, the expanded use of various languages has intensified the challenge of online detection of hate speech content. Despite the release of multiple Natural Language Processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, which further requires the use of semisupervised approaches along with Generative Artificial Intelligence (Generative AI) techniques. This paper introduces an innovative approach, a multilingual semisupervised model combining Generative Adversarial Networks (GANs) and Pretrained Language Models (PLMs), more precisely mBERT and XLM-RoBERTa. Our approach proves its effectiveness in the detection of hate speech and offensive language in Indo-European languages (in English, German, and Hindi) when employing only 20% annotated data from the HASOC2019 dataset, thereby presenting significantly high performances in each of multilingual, zero-shot crosslingual, and monolingual training scenarios. Our study provides a robust mBERT-based semisupervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and reached an average F1 score boost of 9.23% and an accuracy increase of 5.75% over the baseline semisupervised mBERT model.
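In the semi-supervised GAN setups of this kind, the discriminator typically classifies inputs into the K real classes plus one extra "fake" class for generated samples; summing the real-class probabilities gives the real-vs-fake decision for unlabeled data. A numeric sketch of that K+1 softmax head (class count and logits are illustrative, not taken from the paper):

```python
import math

# In a semi-supervised GAN for text classification, the discriminator
# commonly outputs K real classes plus one extra "fake" class for
# generated samples. Sketch of that K+1 softmax head; the logit values
# below are illustrative, not model outputs.

K = 3  # e.g. hate speech / offensive / neither (illustrative labels)

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.1, 0.3, -0.5, -1.2]  # K real-class logits + 1 fake-class logit
probs = softmax(logits)
p_real = sum(probs[:K])  # probability the sample is real (any real class)
p_fake = probs[K]        # probability the sample was generated
print(f"p_real={p_real:.3f} p_fake={p_fake:.3f}")
```

Labeled examples supervise the K real classes directly, while unlabeled examples only need the coarser real-vs-fake split, which is how the approach extracts signal from the 80% of data without annotations.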

22 pages, 1108 KB  
Article
We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model
by Lihardo Faisal Simanjuntak, Rahmad Mahendra and Evi Yulianti
Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077 - 7 Jul 2022
Cited by 35 | Viewed by 6133
Abstract
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works attempted to predict location on English-language tweets. In this study, we attempted to predict the location of Indonesian tweets. We utilized machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users’ home locations using the profile display name, user description, and user tweets. By concatenating display name, description, and aggregated tweets, the model achieved the best accuracy of 0.77. The performance of the IndoBERT model outperformed several baseline models.
(This article belongs to the Topic Machine and Deep Learning)
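The best-performing input described above is a concatenation of the three profile signals into a single text sequence before classification. A minimal sketch of that step; the separator token and sample profile data are assumptions for illustration, not the paper's exact preprocessing:

```python
# Concatenating a user's display name, profile description, and
# aggregated tweets into one input string for the location classifier.
# The "[SEP]" separator and the sample data below are assumptions for
# illustration, not the study's exact preprocessing.

SEP = " [SEP] "

def build_input(display_name: str, description: str, tweets: list[str]) -> str:
    aggregated_tweets = " ".join(tweets)
    return SEP.join([display_name, description, aggregated_tweets])

text = build_input(
    "Made W.",
    "Surf instructor in Kuta",
    ["Sunset di pantai hari ini", "Kelas surfing besok pagi"],
)
print(text)
```

Feeding all three fields as one sequence lets the model weigh weak signals jointly, e.g. a location-bearing description compensating for tweets that never name a place.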
