Domain Specific

Domain-specific adaptation of large language models (LLMs) aims to improve their performance and reliability in specialized fields, where scarce training data and specialized terminology limit general-purpose models. Current research emphasizes effective data curation, including synthetic data generation, and techniques such as knowledge distillation to transfer knowledge from domain-specific to general-purpose models, alongside novel architectures such as graph-oriented databases for improved performance and maintenance. This work broadens the applicability of LLMs to diverse sectors, improves efficiency in areas such as finance, healthcare, and scientific research, and addresses concerns about bias and hallucination in sensitive domains.

Papers

June 25, 2024

FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model
Feijie Wu, Zitao Li, Yaliang Li, Bolin Ding, Jing Gao
Large Language Model Full Model Domain Specific LLM Fine Tuning LLM Compression

June 21, 2024

Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research
Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao
Large Language Model Multimodal Large Language Model Domain Specific Spatial Transcriptomics Golden Collection Closed Source Model Spectrometry Based Proteomics

June 20, 2024

DIRAS: Efficient LLM Annotation of Document Relevance in Retrieval Augmented Generation
Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold
Retrieval Augmented Generation Domain Specific Information Retrieval Document Relevance LLM Annotation

June 18, 2024

June 17, 2024

June 16, 2024

Evaluating the Performance of Large Language Models via Debates
Behrad Moniri, Hamed Hassani, Edgar Dobriban
System Performance Domain Specific State of the Art Large Debate Evaluation Benchmark Framework

June 14, 2024

Domain-Specific Shorthand for Generation Based on Context-Free Grammar
Andriy Kanyuka, Elias Mahfoud
Generative AI Domain Specific Context Free Grammar Structured Generation Multiple Generation

June 11, 2024

When is an Embedding Model More Promising than Another?
Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida
Natural Language Processing Full Model Domain Specific

June 9, 2024

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou
Retrieval Augmented Generation Domain Specific Internet Service Domain Chinese Benchmark Multi Document

June 6, 2024

On The Persona-based Summarization of Domain-Specific Documents
Ankan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Pawan Goyal, Niloy Ganguly, Prasenjit Dey, Ravi Kokku
Domain Specific Domain Knowledge Generated Summary Personalized Summarization

June 5, 2024

May 30, 2024

Reasoning about concepts with LLMs: Inconsistencies abound
Rosario Uceda-Sosa, Karthikeyan Natesan Ramamurthy, Maria Chang, Moninder Singh
Large Language Model Knowledge Graph Domain Specific Top Level Ontology Concept Identification Hard to Easy Inconsistency Abstract Concept

May 24, 2024

HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
Shraddha Barke, Emmanuel Anaya Gonzalez, Saketh Ram Kasibatla, Taylor Berg-Kirkpatrick, Nadia Polikarpova
Large Language Model Domain Specific Program Synthesis Sound Synthesizer LLM Based Machine Translation

May 23, 2024

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data
Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang
Language Model Synthetic Data Domain Specific Knowledge Transfer Federated Prompt Cooperation Domain Knowledge Transfer

Domain Specific

Papers

A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain

AutoDSL: Automated domain-specific language design for structural representation of procedures with constraints

R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models

DocCGen: Document-based Controlled Code Generation

Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages

Does your data spark joy? Performance gains from domain upsampling at the end of training

Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation
