Detection of Indonesian Hate Speech in the Comments Column of Indonesian Artists' Instagram Using the RoBERTa Method

Adhe Akram Azhari
Yuliant Sibaroni
Sri Suryani Prasetiyowati


DOI: https://doi.org/10.29100/jipi.v8i3.3898

Abstract


This study detects hate speech in comments on Instagram posts using the RoBERTa method. RoBERTa was chosen because it classifies English text with higher accuracy than other models, which suggests it may also perform well on Indonesian, the language studied in this research. Two test scenarios were evaluated: full preprocessing and non-full preprocessing. Full preprocessing consists of cleansing, case folding, normalization, tokenization, and stemming, while non-full preprocessing includes every one of these stages except stemming. The experimental results show that non-full preprocessing achieves a higher average accuracy than full preprocessing, reaching 85.09%. This indicates that RoBERTa classifies comments better when stemming is omitted from preprocessing.
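Because the two scenarios differ only in whether stemming is applied, the difference can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' implementation: the slang dictionary `SLANG`, the particle list `SUFFIXES`, the regex cleansing rules, and the naive `stem` function are hypothetical placeholders. A real Indonesian pipeline would typically use a dedicated stemmer and a much larger normalization lexicon.

```python
import re
from statistics import mean

# Hypothetical slang-normalization dictionary (illustrative entries only).
SLANG = {"gak": "tidak", "yg": "yang", "bgt": "banget"}

# Hypothetical particle/possessive suffixes for a naive toy stemmer.
SUFFIXES = ("nya", "lah", "kah")


def cleanse(text: str) -> str:
    """Remove URLs, @mentions, hashtags, digits, and punctuation."""
    text = re.sub(r"https?://\S+|@\w+|#\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep Latin letters and spaces
    return re.sub(r"\s+", " ", text).strip()


def case_fold(text: str) -> str:
    return text.lower()


def normalize(text: str) -> str:
    """Replace known slang words with their standard forms."""
    return " ".join(SLANG.get(word, word) for word in text.split())


def tokenize(text: str) -> list[str]:
    return text.split()  # whitespace tokens, not RoBERTa's subword tokens


def stem(token: str) -> str:
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, full: bool) -> list[str]:
    """Full preprocessing adds stemming; non-full stops after tokenization."""
    tokens = tokenize(normalize(case_fold(cleanse(text))))
    if full:
        tokens = [stem(t) for t in tokens]
    return tokens


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    comment = "Rumahnya jelek bgt @user https://x.co #artis 123!"
    print(preprocess(comment, full=True))   # stemmed tokens
    print(preprocess(comment, full=False))  # unstemmed tokens
    # Averaging accuracy over several hypothetical test runs:
    print(mean([accuracy([1, 0, 1, 1], [1, 0, 0, 1]), 1.0]))
```

For the sample comment, full preprocessing reduces "rumahnya" to "rumah", while non-full preprocessing keeps the original surface form. The latter is the one the abstract reports as giving RoBERTa higher average accuracy.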

Keywords


Instagram, RoBERTa, Full Preprocessing, Non-Full Preprocessing, Hate Speech
