BILSTM-BASED IDENTIFICATION OF DATA MANIPULATION LANGUAGE OPERATION TYPES IN INDONESIAN-LANGUAGE SENTENCES

Agung Prasetya
Yayak Kartika Sari
Joko Iskandar
Mohamad Khoirul Ansor


DOI: https://doi.org/10.29100/jipi.v9i4.8695

Abstract


Text-to-SQL enables the use of natural language to retrieve information from a database. With this approach, non-technical users do not need to understand SQL syntax to query a database. This study proposes an approach based on Bidirectional Long Short-Term Memory (BiLSTM) to classify the type of Data Manipulation Language (DML) operation, such as SELECT, INSERT, UPDATE, and DELETE, in Indonesian-language sentences. The approach represents each sentence as a sequence of word vectors using word embedding, which is then processed by a BiLSTM architecture to capture sequential context in both directions. The dataset contains 1600 sentences from three main domains: education, e-commerce, and public services. Each sentence is annotated with the DML operation it contains. Evaluation results show that the BiLSTM model achieves an accuracy of 93% and an F1-score of 92%. Per-label analysis reveals that the model is highly effective at recognizing SELECT and INSERT operations but has slight difficulty distinguishing UPDATE from DELETE. This study shows that the BiLSTM approach can classify DML types effectively and efficiently in the context of the Indonesian language.
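
As an illustration of the pipeline described in the abstract (word embedding, a forward and a backward LSTM pass, then a classifier over the four DML labels), the following is a minimal sketch in plain Python. It is not the authors' implementation: the class names, the vocabulary, the example sentence, the layer sizes, and the random untrained weights are all assumptions made for illustration, so the predicted label carries no meaning until the weights are trained.

```python
import math
import random

LABELS = ["SELECT", "INSERT", "UPDATE", "DELETE"]
GATES = "ifoc"  # input, forget, output, candidate


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMDirection:
    """A single LSTM direction with random (untrained) placeholder weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in GATES}

    def run(self, sequence):
        """Consume the sequence in the given order and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            z = x + h  # list concatenation of input and previous hidden state
            pre = {k: matvec(self.w[k], z) for k in GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


class BiLSTMClassifier:
    """Embedding -> forward and backward LSTM -> softmax over DML labels."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: idx for idx, word in enumerate(vocab)}
        # The extra last row is shared by all out-of-vocabulary words.
        self.embedding = rand_matrix(len(vocab) + 1, embed_dim, rng)
        self.fwd = LSTMDirection(embed_dim, hidden_dim, rng)
        self.bwd = LSTMDirection(embed_dim, hidden_dim, rng)
        self.out = rand_matrix(len(LABELS), 2 * hidden_dim, rng)

    def predict_proba(self, sentence):
        unk = len(self.vocab)
        tokens = sentence.lower().split()
        vectors = [self.embedding[self.vocab.get(tok, unk)] for tok in tokens]
        h_fwd = self.fwd.run(vectors)        # left-to-right context
        h_bwd = self.bwd.run(vectors[::-1])  # right-to-left context
        logits = matvec(self.out, h_fwd + h_bwd)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return {label: e / total for label, e in zip(LABELS, exps)}


# Hypothetical vocabulary and sentence ("show all student data").
model = BiLSTMClassifier(
    vocab=["tampilkan", "semua", "data", "mahasiswa", "hapus", "tambahkan", "ubah"]
)
probs = model.predict_proba("tampilkan semua data mahasiswa")
print(max(probs, key=probs.get), probs)
```

In practice a deep learning framework would supply the embedding and bidirectional LSTM layers and the training loop; the sketch only makes the data flow explicit, in particular that the final states of the two directions are concatenated before classification.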

Keywords


Bidirectional Long Short-Term Memory; BiLSTM; Data Manipulation Language; Text-to-SQL; Word Embedding
