INTEGRATION OF WORD EMBEDDINGS AND INVERSE BOOK FREQUENCY IN TERM WEIGHTING TO IMPROVE DOCUMENT RETRIEVAL

Dwi Ari Suryaningrum - [ https://orcid.org/0000-0001-5880-2119 ]
Rahmad Syaifudin
Haniel Rangga Pramudtya Putra


DOI: https://doi.org/10.29100/jipi.v9i4.7557

Abstract


Retrieval of relevant documents can be improved with query expansion based on word embeddings. This study proposes a weighting scheme for query expansion that accounts for each term's correlation with the query and its frequency in the documents, using Word Embeddings (WE) and Inverse Book Frequency (IBF). Each expansion term is weighted by multiplying its WE similarity score by its TF-IDF-IBF weight, with the aim of making document retrieval more accurate. Experimental results show that the method achieves an f-score of 0.743, with the best performance when fewer expansion terms are selected. The method also outperforms traditional approaches such as TF-IDF or BM25 at reducing irrelevant terms, which improves retrieval effectiveness on broader datasets. However, the approach is still limited by its computational complexity and by its dependence on the quality of the training dataset. The study recommends further exploration of transformer-based models such as BERT or RoBERTa to improve document retrieval effectiveness. Integrating this method into information retrieval systems is expected to make document search more accurate, efficient, and relevant to user needs across application domains.
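The weighting described in the abstract (WE similarity multiplied by a TF-IDF-IBF weight) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions: IDF is taken as log(N / df) + 1 over documents, IBF as log(B / bf) + 1 over books, TF as the term's total count in the collection, and WE similarity as the cosine between a candidate term vector and a query vector. The function names, the toy corpus, and the two-dimensional vectors are hypothetical and only serve the illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tf_idf_ibf(term, docs, books):
    """TF * IDF * IBF for a term (assumed formulas, see the lead-in).

    docs  : list of token lists, one per document
    books : list of book identifiers, books[i] is the book of docs[i]
    """
    tf = sum(doc.count(term) for doc in docs)       # collection frequency
    df = sum(1 for doc in docs if term in doc)      # documents containing the term
    if df == 0:
        return 0.0
    bf = len({b for doc, b in zip(docs, books) if term in doc})  # books containing the term
    idf = math.log(len(docs) / df) + 1.0
    ibf = math.log(len(set(books)) / bf) + 1.0
    return tf * idf * ibf


def rank_expansion_terms(query_vec, candidates, docs, books, top_k=2):
    """Score each candidate as similarity * TF-IDF-IBF and keep the top_k."""
    scored = {
        term: cosine(query_vec, vec) * tf_idf_ibf(term, docs, books)
        for term, vec in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy data: four documents spread over two books.
docs = [
    ["prayer", "water", "clean"],
    ["water", "river"],
    ["trade", "market", "water"],
    ["trade", "price"],
]
books = ["A", "A", "B", "B"]

# Hypothetical 2-d embeddings; a real system would load trained vectors.
candidates = {"water": [1.0, 0.0], "trade": [0.0, 1.0], "clean": [0.8, 0.6]}
query_vec = [1.0, 0.0]

ranked = rank_expansion_terms(query_vec, candidates, docs, books, top_k=2)
for term, weight in ranked:
    print(f"{term}: {weight:.3f}")
```

In this toy setup a term that is orthogonal to the query vector receives a weight of zero regardless of how frequent it is, which matches the abstract's point that the combined weight helps filter out irrelevant expansion terms. Keeping `top_k` small reflects the reported finding that fewer expansion terms performed best.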

Keywords


IBF-IDF; Term Weighting; TF; Word Embeddings; Query Expansion
