Sentiment Analysis of Indonesian TikTok Review Using LSTM and IndoBERTweet Algorithm

Jerry Cahyo Setiawan (ORCID: http://orcid.org/0000-0002-9635-545X)
Kemas M. Lhaksmana
Bunyamin Bunyamin


DOI: https://doi.org/10.29100/jipi.v8i3.3911

Abstract


TikTok is currently the most popular app in the world and therefore receives many reviews on the Google Play Store and other app marketplaces. These reviews are valuable user opinions that can be analyzed for many purposes. Insights can be extracted from them manually, which is time-consuming and costly, or automatically with machine learning methods. This paper takes the latter approach using LSTM and IndoBERTweet, a BERT derivative pretrained on Indonesian vocabulary from Twitter posts. The research aims to determine which of the two methods is more appropriate for building a model that automatically classifies TikTok reviews into negative, neutral, and positive sentiments. The results show that IndoBERTweet outperforms LSTM, with an accuracy of 80% compared with 78% for LSTM.
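The accuracy figures above are the share of reviews whose predicted sentiment matches the reference label. The following is a minimal sketch of that calculation for a three-class task. The function names and the toy labels are assumptions made for illustration; they are not the paper's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted sentiment matches the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Toy labels invented for illustration only (not taken from the paper).
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
```

The pair counts show where a model confuses classes, for example neutral reviews predicted as positive, which a single accuracy number hides.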

Keywords


Sentiment Analysis; LSTM; IndoBERTweet; NLP; TikTok

