Sentiment Analysis of X Users Regarding Bandung Regency Using Support Vector Machine
Abstract
1. Introduction
2. Material and Methods
2.1. Data and Variable
2.2. Research Methodology
2.3. Data Preprocessing
- Cleaning
- toLowerCase: The first preprocessing stage applies the toLowerCase function, which converts all text in the document to a uniform case, here lowercase. Only alphabetic characters are converted; non-alphabetic characters are treated as delimiters and discarded. For instance, “Segar” becomes “segar” and “Rusak” becomes “rusak”.
- Tokenizing: After case conversion, the text is tokenized, that is, each document is segmented into its constituent words. This step removes numerals, punctuation marks, and other non-alphabetic characters, such as zero (0), one (1), comma (,), period (.), and question mark (?). These characters act only as word separators (delimiters), so removing them does not affect the text being processed.
- Word Normalisation: Tweets written by X users predominantly use informal language, abbreviations, and elongated words. A word-normalisation phase is therefore needed to map these terms to standard Indonesian. For this phase, the author started from the colloquial Indonesian lexicon published on GitHub (https://github.com/) by Nasalsabila, added terms that the lexicon did not standardise, and included normalisation for Sundanese, because the data were sourced from X accounts of Sundanese users.
- Filtering: Filtering selects the significant terms from the tokenisation results, namely those that effectively represent the content of a document, after delimiters and non-influential words have been removed. Words of low significance, such as “dan,” “yang,” “dari,” and “di,” are eliminated. Two techniques are involved: a stoplist, which removes non-descriptive or insignificant terms, and a wordlist, which retains the terms deemed significant. The author adapted the stoplist lexicon of GitHub user aliakbars by removing essential words from it and adding other terms, such as “amp” and “cc”. The stopword list and the slang lexicon were adjusted through an iterative validation procedure. First, high-frequency, non-informative tokens (e.g., “amp” and “cc”) and colloquial expressions were identified through corpus-level frequency analysis and manual examination. The candidates were then assessed for their semantic contribution to sentiment and either eliminated or normalised as necessary. This refinement was repeated until the stopword and slang lists stabilised, that is, until no further non-informative or redundant tokens were detected in subsequent passes.
- Stemming: Stemming transforms each affixed word into its base form, the root word as defined by the dictionary. It is used extensively in information retrieval to improve the quality of the retrieved information. For example, stemming reduces “merusak” to “rusak,” “menyukai” to “suka,” and “kejelekan” to “jelek”.
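The stages listed above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the study's implementation: the `SLANG`, `STOPWORDS`, and `STEMS` tables are hypothetical miniatures of the curated lexicons, the stem lookup merely stands in for a real Indonesian stemmer, and the cleaning stage (removal of mentions, URLs, and hashtags) is omitted.

```python
import re

# Hypothetical miniature lexicons; the study uses much larger, curated lists.
SLANG = {"ga": "tidak", "uda": "sudah"}
STOPWORDS = {"dan", "yang", "dari", "di", "amp", "cc"}
# Toy lookup standing in for a real Indonesian stemmer.
STEMS = {"merusak": "rusak", "menyukai": "suka", "kejelekan": "jelek"}


def preprocess(text):
    """Lowercase, tokenize, normalise, filter, and stem one tweet."""
    lowered = text.lower()
    # Every character outside a-z acts as a delimiter and is discarded.
    tokens = re.findall(r"[a-z]+", lowered)
    # Replace slang tokens with their formal form when one is known.
    normalised = [SLANG.get(tok, tok) for tok in tokens]
    # Drop stopwords.
    filtered = [tok for tok in normalised if tok not in STOPWORDS]
    # Map each remaining token to its base form when one is known.
    return [STEMS.get(tok, tok) for tok in filtered]


result = preprocess("Jalan yang Rusak ga diperbaiki, kejelekan di 2023!")
print(result)
```

Keeping each stage as a separate list transformation mirrors the order described in the text and makes it easy to inspect the intermediate output of any single stage.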
2.4. Labelling
2.5. Text Transformation
- : total documents
- : number of documents containing word
- : th document,
- : th word of keyword,
- : weight of th word in th document
- : number of th word in th document
- : inversed document frequency of th word
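As a rough illustration of how the quantities listed above combine, the sketch below computes a weight as term count times inverse document frequency. The variable names and the textbook form idf = ln(N / df) are assumptions made here; the exact variant used in the study cannot be recovered from the list alone, and scikit-learn's `TfidfVectorizer` applies a smoothed, normalised variant by default. The three toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenised documents, for illustration only.
docs = [
    ["gaji", "asn", "telat"],
    ["jalan", "rusak", "jalan"],
    ["gaji", "naik"],
]

n_docs = len(docs)  # total number of documents
# Document frequency: number of documents that contain each word.
doc_freq = Counter(word for doc in docs for word in set(doc))


def tfidf(doc):
    """Return {word: term count * ln(n_docs / document frequency)}."""
    term_freq = Counter(doc)
    return {
        word: count * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }


weights = [tfidf(doc) for doc in docs]
for index, doc_weights in enumerate(weights, start=1):
    print(index, doc_weights)
```

Words that occur in few documents receive a larger weight than words spread across many documents, which is the behaviour the weighting is meant to capture.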
2.6. Visualisation
2.7. Handling Imbalanced Data
- : new synthetic data
- : data from the minority class along the boundary
- : data from the k-nearest neighbours that have the closest distance to
- : random number in the range 0 to 1
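The quantities above correspond to the standard SMOTE interpolation step, in which a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The sketch below shows only that step, under assumptions: the function and variable names are chosen here, the two points are hypothetical, and the selection of boundary samples and neighbours is left out.

```python
import random


def synthesize(x_i, x_nn, rng):
    """Create one synthetic point on the segment between x_i and x_nn."""
    delta = rng.random()  # random number in the range 0 to 1
    return [a + delta * (b - a) for a, b in zip(x_i, x_nn)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x_i = [1.0, 2.0]   # hypothetical minority sample near the class boundary
x_nn = [3.0, 6.0]  # hypothetical nearest minority neighbour of x_i
x_new = synthesize(x_i, x_nn, rng)
print(x_new)
```

Because the same random factor scales every coordinate, the new point always lies on the straight line joining the two originals.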
2.8. Support Vector Machine Modelling
- Linear kernel
- Polynomial kernel
- RBF kernel
- Sigmoid kernel
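For reference, the four kernels can be written as plain functions. The sketch follows the parameterisation documented for scikit-learn's `SVC` (`gamma`, `coef0`, `degree`), which is an assumption about the form intended here; the default argument values and the two sample vectors are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=2):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


x, y = [1.0, 2.0], [3.0, 1.0]
print(linear_kernel(x, y), polynomial_kernel(x, y))
print(rbf_kernel(x, y), sigmoid_kernel(x, y))
```

In this form, `gamma` scales the inner product or squared distance, which is why it interacts so strongly with the cost parameter during tuning.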
2.9. Model Evaluation
2.10. Ethical Considerations
3. Results
3.1. Results of Data Preprocessing
3.2. Results of Labelling
3.3. Results of Visualisation
3.4. Results of Text Transformation
3.5. Results of SVM Modelling
- Polynomial Kernel
- RBF Kernel
- Sigmoid Kernel
3.6. Comparative Analysis of Class-Wise Recall Across Feature Representations
4. Discussions
4.1. Methodological Considerations and Design Trade-Offs
4.2. Analytical Discussion and Relation to Prior Studies
4.3. Contextual Analysis of “Kandang Persib”-Related Sentiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Listing A1. Google Colaboratory Data Crawling Syntax. |
| !pip install pandas |
| !curl -sL https://deb.nodesource.com/setup_18.x | sudo -E bash - |
| !sudo apt-get install -y nodejs |
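| filename = 'akun.csv' |
| # "akun" is a placeholder account name; {mention} and {reply} annotate the two query forms (mentions of and replies to the account) |
| search_keyword = '(@akun{mention}/to:@akun{reply}) until:2023-12-31 since:2023-01-01' |
| limit = 10000 |
| !npx --yes tweet-harvest@latest -o "{filename}" -s "{search_keyword}" -l {limit} --token |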
| import pandas as pd |
| file_path = f"tweets-data/{filename}" |
| df= pd.read_csv(file_path, delimiter=";") |
| display(df) |
| # If an error occurs because there is too much data -> df.to_csv(filename, index=False, sep=";") |
| num_tweets = len(df) |
| print(f"The number of tweets in the dataframe is {num_tweets}.") |
Appendix B
| Listing A2. Google Colaboratory Data Preprocessing Syntax. |
| !pip install nltk |
| !pip install Sastrawi |
| !pip install pandas |
| !pip install numpy |
| !pip install matplotlib |
| import pandas as pd |
| import numpy as np |
| import matplotlib.pyplot as plt |
| from google.colab import files |
| uploaded = files.upload() |
| data=pd.read_csv("data.csv") |
| ##Cleaning |
| import string |
| import re |
| import nltk |
| def remove(fix_text): |
|     fix_text = str(fix_text) |
|     # Expand the abbreviation "lalin" and remove unneeded text |
|     fix_text = re.sub("lalin", "lalu lintas", fix_text) |
|     fix_text = re.sub(r"@\w+", "", fix_text)  # mentions |
|     fix_text = re.sub(r"https?://.+", "", fix_text)  # URLs and any text that follows them |
|     fix_text = re.sub(r"\d+\w*\d*", "", fix_text)  # tokens that start with digits |
|     fix_text = re.sub(r"#\w+", "", fix_text)  # hashtags |
|     fix_text = re.sub(r"[^\x01-\x7F]", "", fix_text)  # non-ASCII characters |
|     fix_text = re.sub("‘’", "", fix_text) |
|     fix_text = re.sub(",", "", fix_text) |
|     fix_text = re.sub(r"[^\w\s]", "", fix_text)  # remaining punctuation |
|     # Remove newlines and leading/trailing spaces |
|     fix_text = re.sub("\n", " ", fix_text) |
|     fix_text = re.sub(r"^\s+", "", fix_text) |
|     fix_text = re.sub(r"\s+$", "", fix_text) |
|     return fix_text |
| data['tweetclean'] = [remove(x) for x in data['fix_text']] |
| data['tweetclean'] = data['tweetclean'].str.lower() |
| tweet = data['tweetclean'] |
| tweet |
| ##Tokenizing |
| import nltk |
| nltk.download('punkt') |
| from nltk.tokenize import word_tokenize |
| def word_tokenize_wrapper(text): |
|     return word_tokenize(text) |
| tweet = tweet.apply(word_tokenize_wrapper) |
| tweet.head() |
| ##Text Normalisation |
| slw = pd.read_csv("Slangwords1.csv", sep=";") |
| print(slw) |
| def replace_slang_word(words): |
|     # Replace a slang token only when the lexicon maps it to exactly one formal word |
|     for index in range(len(words)):  # include the last token |
|         index_slang = slw.slang == words[index] |
|         formal = list(set(slw[index_slang].formal)) |
|         if len(formal) == 1: |
|             words[index] = formal[0] |
|     return words |
| tweet1 = tweet.apply(replace_slang_word) |
| tweet1.head() |
| ##Filtering |
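| # Remove stopwords |
| stopwords = pd.read_csv("Stopwords.txt") |
| print(stopwords) |
| # Collect the words in a set: testing membership on a DataFrame only checks its column labels |
| stopword_set = set(stopwords.iloc[:, 0].astype(str)) |
| # Function that removes stopwords |
| def stopwords_removal(words): |
|     return [word for word in words if word not in stopword_set] |
| tweet2 = tweet1.apply(stopwords_removal) |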
| tweet2.head() |
| ##Stemming |
| from Sastrawi.Stemmer.StemmerFactory import StemmerFactory |
| # create stemmer |
| factory = StemmerFactory() |
| stemmer = factory.create_stemmer() |
| def stemmer_func(word): |
|     return stemmer.stem(word) |
| # Stem each distinct word once and cache the result |
| word_dict = {} |
| for document in tweet: |
|     for word in document: |
|         if word not in word_dict: |
|             word_dict[word] = ' ' |
| for word in word_dict: |
|     word_dict[word] = stemmer_func(word) |
| def get_stemmer_word(document): |
|     return [word_dict[word] for word in document] |
| tweet3 = tweet2.apply(get_stemmer_word) |
| tweet3.head() |
Appendix C
| Listing A3. RStudio version 2023.12.1 Sentiment Scoring and Labelling Syntax. |
| # Tweet sentiment |
| data <- read.csv("D:/Skripsi/Data/data fix/dataclean3.csv", header=TRUE, sep=",") |
| colnames(data) |
| kalimat = data$tweetclean |
| pos <- read.csv("D:/Skripsi/Data/data fix/Positive.csv", header=FALSE, sep=",") |
| nrow(pos) |
| neg <- read.csv("D:/Skripsi/Data/data fix/Negative.csv", header=FALSE, sep=",") |
| nrow(neg) |
| kata.positif=pos$V1 |
| head(kata.positif) |
| kata.negatif=neg$V1 |
| library(plyr) |
| library(stringr) |
| score.sentiment = function(kalimat2, kata.positif, kata.negatif, .progress='none') { |
|   require(plyr) |
|   require(stringr) |
|   scores = laply(kalimat2, function(kalimat, kata.positif, kata.negatif) { |
|     # Strip punctuation, control characters, and digits, then lowercase |
|     kalimat = gsub('[[:punct:]]', '', kalimat) |
|     kalimat = gsub('[[:cntrl:]]', '', kalimat) |
|     kalimat = gsub('\\d+', '', kalimat) |
|     kalimat = tolower(kalimat) |
|     # Split into words and flag the ones found in each lexicon |
|     list.kata = str_split(kalimat, '\\s+') |
|     kata2 = unlist(list.kata) |
|     positif.matches = match(kata2, kata.positif) |
|     negatif.matches = match(kata2, kata.negatif) |
|     positif.matches = !is.na(positif.matches) |
|     negatif.matches = !is.na(negatif.matches) |
|     # Score = number of positive words minus number of negative words |
|     score = sum(positif.matches) - (sum(negatif.matches)) |
|     return(score) |
|   }, kata.positif, kata.negatif, .progress=.progress) |
|   scores.df = data.frame(score=scores, text=kalimat2) |
|   return(scores.df) |
| } |
| hasil = score.sentiment(kalimat, kata.positif, kata.negatif) |
| View(hasil) |
| #CONVERT SCORE TO SENTIMENT |
| hasil$klasifikasi <- ifelse(hasil$score < 0, "Negatif", |
|                             ifelse(hasil$score == 0, "Netral", "Positif")) |
| # Sentiment score |
| hasil$score2 <- ifelse(hasil$score < 0, -1, ifelse(hasil$score == 0, 0, 1)) |
| #EXCHANGE ROW SEQUENCE |
| data["sentimen"]<-hasil$score |
| data["score2"]<-hasil$score2 |
| data["klasifikasi"]<-hasil$klasifikasi |
| head(data,3) |
| write.csv(data, file = "D:/Skripsi/Data/data fix/asencleandatalast3.csv") |
Appendix D
| Listing A4. RStudio Visualisation Syntax. |
| # Bigram |
| data0 = read.csv("D:/Skripsi/Data/data fix/asencleandatalast.csv", header=TRUE, sep=",") |
| library(dplyr) |
| library(tidytext) |
| nrow(data0) |
| tweet_bigrams0 <- data0 %>% |
| unnest_tokens(bigram, tweetclean, token="ngrams", n = 2) %>% |
| filter(!is.na(bigram)) |
| head(tweet_bigrams0,3) |
| bigram0=data.frame(klasifikasi=tweet_bigrams0$klasifikasi, |
| bigram=tweet_bigrams0$bigram) |
| head(bigram0,5) |
| nrow(bigram0) |
| write.csv(bigram0, file = "D:/Skripsi/Data/data fix/bigramlast.csv") |
| bigramfpng <- tweet_bigrams0 %>% |
| count(bigram,klasifikasi, sort = TRUE) |
| head(bigramfpng,5) |
| nrow(bigramfpng) |
| write.csv(bigramfpng, file = "D:/Skripsi/Data/data fix/bigramfpnglast.csv") |
| # Sentiment word cloud |
| library(RColorBrewer) |
| library(wordcloud2) |
| library(ggplot2) |
| dpositif <- filter(bigramfpng, klasifikasi=="Positif", n < 900) |
| dnegatif <- filter(bigramfpng, klasifikasi=="Negatif", n < 170) |
| positif <- data.frame(dpositif$bigram,dpositif$n) |
| negatif <- data.frame(dnegatif$bigram,dnegatif$n) |
| head(positif) |
| head(negatif) |
| # Word cloud |
| wordcloud2(positif, backgroundColor="white", color='blue', size=0.2) |
| wordcloud2(negatif, backgroundColor="white", color='red', size=0.2) |
Appendix E
| Listing A5. Google Colaboratory Pre-Classification Syntax. |
| !pip install numpy |
| !pip install pandas |
| !pip install matplotlib |
| !pip install seaborn |
| !pip install nltk |
| import numpy as np |
| import pandas as pd |
| import re |
| import nltk |
| import seaborn as sns |
| import matplotlib.pyplot as plt |
| tweets = pd.read_csv("asencleandata2.csv") |
| # Check the dataset attributes |
| display(tweets.columns) |
| # Check the number of rows and columns |
| display(tweets.shape) |
| # Check the number of positive and negative tweets |
| plt.figure(figsize=(12,5)) |
| sns.countplot(x='klasifikasi', data=tweets) |
| plt.title('Distribution of tweet sentiment classes', fontsize=16) |
| plt.ylabel('Class Counts', fontsize=16) |
| plt.xlabel('Class Label', fontsize=16) |
| plt.xticks(rotation='vertical'); |
| from sklearn.preprocessing import LabelEncoder |
| X = tweets.iloc[:, 14].values |
| le = LabelEncoder() |
| le.fit(["Positif", "Negatif"]) |
| print(list(le.classes_)) |
| y = le.transform(tweets.iloc[:, 17].values) |
| # Create an empty list |
| processed_tweets = [] |
| for tweet in range(0, len(X)): |
|     # Remove all special characters |
|     processed_tweet = re.sub(r'\W', ' ', str(X[tweet])) |
|     # Remove all single characters |
|     processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet) |
|     # Remove a single character at the start of the text |
|     processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet) |
|     # Replace multiple spaces with a single space |
|     processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I) |
|     # Remove a prefixed 'b' |
|     processed_tweet = re.sub(r'^b\s+', '', processed_tweet) |
|     # Convert to lowercase |
|     processed_tweet = processed_tweet.lower() |
|     # Append to the empty list created above |
|     processed_tweets.append(processed_tweet) |
Appendix F
| Listing A6. Google Colaboratory TF-IDF, Data Splitting, BorderlineSMOTE, SVM Classification Modelling and Model Evaluation Syntax. |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| vectorizer = TfidfVectorizer(ngram_range=(2,2)) |
| features_transformed = vectorizer.fit_transform(processed_tweets).toarray() |
| from sklearn.model_selection import train_test_split |
| X_train, X_test, y_train, y_test = train_test_split( |
|     features_transformed, y, test_size=0.2, random_state=0) |
| print(len(X_train)) |
| print(len(X_test)) |
| print(len(y_train)) |
| print(len(y_test)) |
| from imblearn.over_sampling import BorderlineSMOTE |
| from collections import Counter |
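| # Class counts in the training set before oversampling |
| counter = Counter(y_train) |
| print('before', counter) |
| smt = BorderlineSMOTE(kind='borderline-1') |
| X_train_sm, y_train_sm = smt.fit_resample(X_train, y_train) |
| # Class counts in the training set after oversampling |
| counter1 = Counter(y_train_sm) |
| print('after', counter1) |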
| from sklearn.svm import SVC |
| model = SVC(kernel='rbf', C=10, gamma=10) |
| model.fit(X_train_sm, y_train_sm) |
| y_pred = model.predict(X_test) |
| from sklearn.metrics import confusion_matrix |
| print(confusion_matrix(y_test, y_pred)) |
| from sklearn.metrics import classification_report |
| print(classification_report(y_test, y_pred)) |
Appendix G
| Listing A7. Google Colaboratory Testing Predictions and Visualisation of Classification Models Syntax. |
| tweet = ("rw ku juga mantan napi kasus pembunuhan ya gitu deh " |
|          "memperkaya diri jalan jelek ya dibiarin dan ga ada " |
|          "pengambilan sampah juga uda mo th aku tinggal " |
|          "disini miris") |
| # vectorizing |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| tweet_vector = vectorizer.transform([tweet]).toarray() |
| print(tweet_vector.shape) |
| pred_text = model.predict(tweet_vector) |
| pred_text = le.inverse_transform(pred_text) |
| print(pred_text) |
| tweet = "Kenakalan Remaja Marak, Dadang Kurniawan Sarankan Ini https://t.co/VPTWTIbJKe @dprdjawabarat @PemkabBandung @dprdkabbandung @kotaSOREANG @Gerindra @Partai_Gerindra @GerindraJabar @infojabar @kenakalanremaja @sad_annjjinng" |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| tweet_vector = vectorizer.transform([tweet]).toarray() |
| print(tweet_vector.shape) |
| pred_text = model.predict(tweet_vector) |
| pred_text = le.inverse_transform(pred_text) |
| print(pred_text) |
| tweet = "Laporan soal jalan rusak yang bikin rumah retak-retak cuman ditindaklanjuti Dinas PUPR @ProkopimKabBdg dengan menambal alakadarnya. Padahal lokasinya cuman 3KM dari kantor mereka, Bupati pun sering lewat. Tapi.. ah sudahlah, semoga dibalas Allah dengan jalan hidupnya yang rusak!" |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| tweet_vector = vectorizer.transform([tweet]).toarray() |
| print(tweet_vector.shape) |
| pred_text = model.predict(tweet_vector) |
| pred_text = le.inverse_transform(pred_text) |
| print(pred_text) |
| tweet = "Kecewa! Peserta Event Motor Trail Merasa Dibohongi Panitia #rancaupas #motortrail #EVENT #ciwidey #kabupatenbandung #VideoViral #kompastvbandung" |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| tweet_vector = vectorizer.transform([tweet]).toarray() |
| print(tweet_vector.shape) |
| pred_text = model.predict(tweet_vector) |
| pred_text = le.inverse_transform(pred_text) |
| print(pred_text) |
| tweet = "@persikab Stadion elit , bayar gaji syulit" |
| from sklearn.feature_extraction.text import TfidfVectorizer |
| tweet_vector = vectorizer.transform([tweet]).toarray() |
| print(tweet_vector.shape) |
| pred_text = model.predict(tweet_vector) |
| pred_text = le.inverse_transform(pred_text) |
| print(pred_text) |
| import pandas as pd |
| import numpy as np |
| from sklearn.svm import SVC |
| from sklearn.decomposition import PCA |
| from matplotlib import pyplot as plt |
| %matplotlib inline |
| from matplotlib.colors import ListedColormap |
| # Reduce dimensions to 2D using PCA |
| pca = PCA(n_components=2) |
| X_train_sm_pca = pca.fit_transform(X_train_sm) |
| # Project the original training and test data for visualisation |
| X_train_pca = pca.transform(X_train) |
| X_test_pca = pca.transform(X_test) |
| # Train SVM with a sigmoid kernel |
| model = SVC(kernel='sigmoid', gamma=1, C=10) |
| model.fit(X_train_sm_pca, y_train_sm) |
| # Build the plotting grid from the 2-D projection, not from the raw text array X |
| x_min, x_max = X_train_sm_pca[:, 0].min() - 1, X_train_sm_pca[:, 0].max() + 1 |
| y_min, y_max = X_train_sm_pca[:, 1].min() - 1, X_train_sm_pca[:, 1].max() + 1 |
| xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), |
|                      np.arange(y_min, y_max, 0.01)) |
| Z = model.predict(np.c_[xx.ravel(), yy.ravel()]) |
| Z = Z.reshape(xx.shape) |
| plt.contourf(xx, yy, Z, alpha=0.75, cmap=ListedColormap(['darkmagenta', 'olive'])) |
| # Plotting |
| plt.scatter(X_train_sm_pca[:, 0], X_train_sm_pca[:, 1], c=y_train_sm, |
|             cmap=ListedColormap(['darkmagenta', 'olive'])) |
| plt.xlabel('Principal component 1') |
| plt.ylabel('Principal component 2') |
| plt.xlim(xx.min(), xx.max()) |
| plt.ylim(yy.min(), yy.max()) |
| plt.show() |




| Source Name | Created at | Full Text |
|---|---|---|
| #kabbandung (hashtag) | Mon Jul 17 08:33:23 +0000 2023 | Wargi Bandung Bedas… Bupati juga berpesan kepada para PPPK, untuk menjalankan semua tugas secara optimal, profesional, bersatu dalam satu komando, adaptif dengan dinamisasi teknologi informasi. #kabbandung #bandungbedas #pppk #skbupati https://t.co/3JiQPK2IOa (tweet posted on 17 July 2023; accessed during data collection: January 2024) |
| bandungpemkab (mention) | Mon May 08 17:14:45 +0000 2023 | Serius nanya, emang tiap hujan banjir wae wajar kitu? @bandungpemkab @humasjabar |
| dinkes_kab_bdg2 (reply) | Sat Jan 28 09:38:29 +0000 2023 | @DINKES_KAB_BDG boleh info jadwal vaksin COVID ke 1, utk anak 12 thn di kab.bandung |
| Actuals \ Predictions | Positive | Negative |
|---|---|---|
| Positive | | |
| Negative | | |
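Class-wise recall and accuracy follow directly from the four cells of a confusion matrix laid out as above. The sketch below is a minimal illustration under assumptions: the function name and the five example labels are hypothetical, and it presumes that both classes occur at least once among the actual labels.

```python
def class_recalls(y_true, y_pred, positive="Positif", negative="Negatif"):
    """Return (positive recall, negative recall, accuracy) from label lists."""
    pairs = list(zip(y_true, y_pred))
    # Count the four confusion-matrix cells.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p == negative for t, p in pairs)
    tn = sum(t == negative and p == negative for t, p in pairs)
    fp = sum(t == negative and p == positive for t, p in pairs)
    positive_recall = tp / (tp + fn)
    negative_recall = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return positive_recall, negative_recall, accuracy


# Hypothetical labels, for illustration only.
y_true = ["Positif", "Positif", "Positif", "Negatif", "Negatif"]
y_pred = ["Positif", "Positif", "Negatif", "Negatif", "Positif"]
pos_recall, neg_recall, accuracy = class_recalls(y_true, y_pred)
print(pos_recall, neg_recall, accuracy)
```

Reporting the two recalls separately, rather than accuracy alone, shows whether a model performs well on the majority class while missing most of the minority class.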
| Before Preprocessing | After Preprocessing |
|---|---|
| Wargi Bandung Bedas… Bupati juga berpesan kepada para PPPK, untuk menjalankan semua tugas secara optimal, profesional, bersatu dalam satu komando, adaptif dengan dinamisasi teknologi informasi. #kabbandung #bandungbedas #pppk #skbupati https://t.co/3JiQPK2IOa, accessed on 21 December 2025 | [‘warga’, ‘bandung’, ‘bedas’, ‘bupati’, ‘pesan’, ‘pppk’, ‘jalan’, ‘tugas’, ‘optimal’, ‘profesional’, ‘satu’, ‘satu’, ‘komando’, ‘adaptif’, ‘dinamisasi’, ‘teknologi’, ‘informasi’] |
| Text After Preprocessing | Positive | Negative | Score | Label |
|---|---|---|---|---|
| [‘warga’, ‘bandung’, ‘bedas’, ‘bupati’, ‘pesan’, ‘pppk’, ‘jalan’, ‘tugas’, ‘optimal’, ‘profesional’, ‘satu’, ‘satu’, ‘komando’, ‘adaptif’, ‘dinamisasi’, ‘teknologi’, ‘informasi’] | 2 | 0 | 2 | Positive |
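The scoring in this row can be reproduced with a short Python sketch of lexicon-based labelling: count the tokens found in the positive and negative word lists, subtract, and map the sign of the score to a label. The two word sets are hypothetical excerpts chosen only so that the example matches the counts in the table; which lexicon entries actually matched in the study is not stated.

```python
# Hypothetical lexicon excerpts; the study's actual word lists are not reproduced here.
POSITIVE_WORDS = {"optimal", "profesional", "bagus"}
NEGATIVE_WORDS = {"rusak", "banjir", "jelek"}


def score_tokens(tokens):
    """Return (positive count, negative count, score, label) for one tweet."""
    positive = sum(tok in POSITIVE_WORDS for tok in tokens)
    negative = sum(tok in NEGATIVE_WORDS for tok in tokens)
    score = positive - negative
    if score < 0:
        label = "Negative"
    elif score == 0:
        label = "Neutral"
    else:
        label = "Positive"
    return positive, negative, score, label


tokens = [
    "warga", "bandung", "bedas", "bupati", "pesan", "pppk", "jalan",
    "tugas", "optimal", "profesional", "satu", "satu", "komando",
    "adaptif", "dinamisasi", "teknologi", "informasi",
]
row = score_tokens(tokens)
print(row)
```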
| TF-IDF Bigram | gaji asn | asn telat | bedas bupati | besok libur | libur semester |
|---|---|---|---|---|---|
| 1 | 9.0895 | 9.0895 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 |
| 165 | 0 | 0 | 2.7995 | 0 | 0 |
| 4108 | 0 | 0 | 0 | 0 | 0 |
| 4109 | 0 | 0 | 0 | 9.0895 | 9.0895 |
| C | Gamma (γ) | Degree | Positive Recall | Negative Recall | C | Gamma (γ) | Degree | Positive Recall | Negative Recall |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.1 | 2 | 1 | 0.0762 | 0.1 | 0.1 | 3 | 1 | 0.0762 |
| 0.1 | 1 | 2 | 1 | 0.0905 | 0.1 | 1 | 3 | 1 | 0.081 |
| 0.1 | 10 | 2 | 0.1095 | 0.9952 | 0.1 | 10 | 3 | 0.0572 | 1 |
| 1 | 0.1 | 2 | 1 | 0.0762 | 1 | 0.1 | 3 | 1 | 0.0762 |
| 1 | 1 | 2 | 0.9984 | 0.2238 | 1 | 1 | 3 | 1 | 0.1571 |
| 1 | 10 | 2 | 0.2238 | 0.9952 | 1 | 10 | 3 | 0.0572 | 1 |
| 10 | 0.1 | 2 | 1 | 0.0952 | 10 | 0.1 | 3 | 1 | 0.0762 |
| 10 | 1 | 2 | 0.1095 | 0.9952 | 10 | 1 | 3 | 0.0572 | 1 |
| 10 | 10 | 2 | 0.1095 | 0.9952 | 10 | 10 | 3 | 0.0572 | 1 |
| C | Gamma (γ) | Positive Recall | Negative Recall |
|---|---|---|---|
| 0.1 | 0.1 | 0.9951 | 0.1095 |
| 0.1 | 1 | 0.9951 | 0.1048 |
| 0.1 | 10 | 0.9951 | 0.0952 |
| 1 | 0.1 | 0.9951 | 0.1333 |
| 1 | 1 | 0.9951 | 0.1762 |
| 1 | 10 | 0.9951 | 0.1381 |
| 10 | 0.1 | 0.9935 | 0.2667 |
| 10 | 1 | 0.9951 | 0.2143 |
| 10 | 10 | 0.9951 | 0.1381 |
| C | Gamma (γ) | Positive Recall | Negative Recall |
|---|---|---|---|
| 0.1 | 0.1 | 1 | 0.0952 |
| 0.1 | 1 | 0.6013 | 0.7429 |
| 0.1 | 10 | 0.6029 | 0.7952 |
| 1 | 0.1 | 0.598 | 0.7429 |
| 1 | 1 | 0.7663 | 0.7143 |
| 1 | 10 | 0.7843 | 0.6952 |
| 10 | 0.1 | 0.7549 | 0.7286 |
| 10 | 1 | 0.5909 | 0.8301 |
| 10 | 10 | 0.7533 | 0.7048 |
| Feature | Balancing | Negative Recall | Positive Recall | Accuracy | Average Precision | Average F1-Score |
|---|---|---|---|---|---|---|
| Unigram | No | 0.8010 | 0.9675 | 0.9258 | 0.9138 | 0.8988 |
| Bigram | No | 0.3010 | 0.9935 | 0.8200 | 0.8745 | 0.7439 |
| Unigram | Yes | 0.8107 | 0.9627 | 0.9246 | 0.9086 | 0.8975 |
| Bigram | Yes | 0.8301 | 0.5909 | 0.6509 | 0.6583 | 0.6834 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ginanjar, I.; Shabir, A.M.; Pravitasari, A.A.; Pangastuti, S.S.; Darmawan, G.; Sukono. Sentiment Analysis of X Users Regarding Bandung Regency Using Support Vector Machine. Appl. Sci. 2026, 16, 560. https://doi.org/10.3390/app16010560

