Implementation of BiLSTM and IndoBERT for Sentiment Analysis of TikTok Reviews



DOI: https://doi.org/10.29100/jipi.v10i1.5815

Abstract


The rapid growth of TikTok's user base has brought a corresponding rise in the number of reviews submitted for the application. These reviews can be analyzed to identify the prevailing sentiment of the community toward the application. Sentiment analysis based on machine learning is well suited to this problem because of its practicality and efficiency. The objective of this research is to develop a sentiment analysis model with a high degree of accuracy. The approach combines the BiLSTM algorithm with IndoBERT, a pre-trained language model. BiLSTM reads a sentence in both directions, so it can capture the relationships between each word and the words before and after it. IndoBERT is relevant to this research because it was pre-trained on Indonesian-language text from various sources on the Internet. To support this research, experimental scenarios were designed in which methods were added step by step as an optimization scheme until the best-performing model was identified. The experiments show that sentiment analysis using the BiLSTM+IndoBERT method achieved the highest accuracy, reaching 81% in the classification report test and an average accuracy of 92.03% in 10-fold cross-validation.
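To illustrate how an average accuracy over 10 folds is obtained, a minimal sketch follows. It is not the authors' pipeline: the function names are hypothetical, a majority-class classifier stands in for the BiLSTM+IndoBERT model, and the folds are contiguous slices with no shuffling or stratification, both of which a real experiment would normally add.

```python
from statistics import mean


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(samples, labels, train_and_predict, k=10):
    """Return the per-fold accuracies and their mean."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies, mean(accuracies)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in model: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)
```

Any classifier can be passed as `train_and_predict`, provided it takes training samples, training labels, and test samples and returns one predicted label per test sample. The reported cross-validation figure corresponds to the second value returned by `cross_validate`.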

Keywords


TikTok; Sentiment Analysis; BiLSTM; IndoBERT; Deep Learning

