Paper ID: 2412.00125

Efficient Learning Content Retrieval with Knowledge Injection

Batuhan Sariturk, Rabia Bayraktar, Merve Elmas Erdem

With the rise of online education platforms, there is a growing abundance of educational content across various domain. It can be difficult to navigate the numerous available resources to find the most suitable training, especially in domains that include many interconnected areas, such as ICT. In this study, we propose a domain-specific chatbot application that requires limited resources, utilizing versions of the Phi language model to help learners with educational content. In the proposed method, Phi-2 and Phi-3 models were fine-tuned using QLoRA. The data required for fine-tuning was obtained from the Huawei Talent Platform, where courses are available at different levels of expertise in the field of computer science. RAG system was used to support the model, which was fine-tuned by 500 Q&A pairs. Additionally, a total of 420 Q&A pairs of content were extracted from different formats such as JSON, PPT, and DOC to create a vector database to be used in the RAG system. By using the fine-tuned model and RAG approach together, chatbots with different competencies were obtained. The questions and answers asked to the generated chatbots were saved separately and evaluated using ROUGE, BERTScore, METEOR, and BLEU metrics. The precision value of the Phi-2 model supported by RAG was 0.84 and the F1 score was 0.82. In addition to a total of 13 different evaluation metrics in 4 different categories, the answers of each model were compared with the created content and the most appropriate method was selected for real-life applications.

Submitted: Nov 28, 2024