Paper ID: 2412.17075
Iterative NLP Query Refinement for Enhancing Domain-Specific Information Retrieval: A Case Study in Career Services
Elham Peimani (1), Gurpreet Singh (1), Nisarg Mahyavanshi (1), Aman Arora (1), Awais Shaikh (1) ((1) Humber College, Toronto, Canada)
Retrieving semantically relevant documents in niche domains poses significant challenges for traditional TF-IDF-based systems, often resulting in low similarity scores and suboptimal retrieval performance. This paper addresses these challenges by introducing an iterative and semi-automated query refinement methodology tailored to Humber College's career services webpages. Initially, generic queries related to interview preparation yield low top-document similarities (approximately 0.2--0.3). To enhance retrieval effectiveness, we implement a two-fold approach: first, domain-aware query refinement by incorporating specialized terms such as resources-online-learning, student-online-services, and career-advising; second, the integration of structured educational descriptors like "online resume and interview improvement tools." Additionally, we automate the extraction of domain-specific keywords from top-ranked documents to suggest relevant terms for query expansion. Through experiments conducted on five baseline queries, our semi-automated iterative refinement process elevates the average top similarity score from approximately 0.18 to 0.42, marking a substantial improvement in retrieval performance. The implementation details, including reproducible code and experimental setups, are made available in our GitHub repositories \url{this https URL} and \url{this https URL}. We also discuss the limitations of our approach and propose future directions, including the integration of advanced neural retrieval models.
Submitted: Dec 22, 2024