Implementation of Low-Rank Adaptation of Large Language Models (LoRA) for Large Language Model Efficiency
Abstract
Transformer models such as Llama 2 are highly capable across a wide range of natural language tasks, but their substantial compute and memory requirements make them difficult to deploy. The main challenges lie in the large storage footprint and the amount of computational power the models demand. To address these problems, this work implements LoRA (Low-Rank Adaptation). LoRA, applied here to Llama 2, takes an adaptive approach to compressing the Transformer model using low-rank adapters. Applying LoRA to the model reduces the number of floating-point operations, which speeds up training and inference and significantly lowers power consumption and memory usage. The main goal of applying LoRA to Llama 2 is to optimize the model's efficiency, focusing on reducing floating-point operations and improving GPU memory utilization.
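As a rough illustration of the low-rank adapter idea described in the abstract (not the exact implementation used in this work), the sketch below adds a trainable rank-r update B·A to a frozen linear layer, so only the two small adapter matrices are updated during fine-tuning. The layer width (4096) and the hyperparameters r and alpha are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer augmented with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                          # pretrained weight stays frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection (r x d_in)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero-init (d_out x r)
        self.scaling = alpha / r

    def forward(self, x):
        # Base path uses the frozen pretrained weight; the low-rank path adds the learned adaptation.
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

# Toy usage: only the adapter matrices contribute trainable parameters.
layer = LoRALinear(4096, 4096, r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```

With these assumed settings the adapter adds roughly 65K trainable parameters against about 16.8M frozen ones for the layer, which is the source of the memory savings the abstract refers to.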