Domain Specific
Domain-specific adaptation of large language models (LLMs) focuses on enhancing their performance and reliability within specialized fields by overcoming limitations stemming from data scarcity and domain-specific terminology. Current research emphasizes developing effective methods for data curation, including synthetic data generation and techniques like knowledge distillation to transfer knowledge from domain-specific to general-purpose models, alongside novel architectures like graph-oriented databases for improved performance and maintenance. This work is crucial for broadening the applicability of LLMs to diverse sectors, improving efficiency in areas like finance, healthcare, and scientific research, and addressing concerns about bias and hallucination in sensitive domains.
Papers
A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain
Sujoy Roychowdhury, Sumit Soman, H. G. Ranjani, Vansh Chhabra, Neeraj Gunda, Subhadip Bandyopadhyay, Sai Krishna Bala
AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints
Yu-Zhe Shi, Haofei Hou, Zhangqian Bi, Fanxu Meng, Xiang Wei, Lecheng Ruan, Qining Wang
R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
Shangqing Tu, Yuanchun Wang, Jifan Yu, Yuyang Xie, Yaran Shi, Xiaozhi Wang, Jing Zhang, Lei Hou, Juanzi Li
DocCGen: Document-based Controlled Code Generation
Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya
Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models
Scott Barnett, Zac Brannelly, Stefanus Kurniawan, Sheng Wong
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu
Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages
Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia
Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle
Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation
Tingjia Shen, Hao Wang, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen