Instruction Datasets
Instruction datasets are collections of task instructions paired with desired outputs, used to fine-tune large language models (LLMs) so that they follow diverse user instructions more reliably. Current research emphasizes building larger, higher-quality datasets, often through automated generation, and optimizing their composition for specific tasks or model architectures. Data selection methods under exploration include curriculum learning and submodular optimization. This work supports LLM applications across many domains, from voice assistants and multimodal models to specialized fields such as biomedicine and cybersecurity.
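To make the data-selection idea concrete, here is a minimal sketch of one possible approach: greedy maximization of a facility-location objective, a standard submodular function, over a toy pool of instruction/output pairs. It is not taken from any specific paper on this topic. The `InstructionExample` record, the `jaccard` token-overlap similarity, and the `greedy_facility_location` helper are all hypothetical names introduced for illustration. A real pipeline would more likely use embedding-based similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionExample:
    """One record of an instruction dataset (hypothetical schema)."""
    instruction: str
    output: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def greedy_facility_location(examples, k):
    """Greedily pick up to k indices maximizing sum_j max_{i in S} sim(i, j)."""
    n = len(examples)
    sim = [
        [jaccard(examples[i].instruction, examples[j].instruction) for j in range(n)]
        for i in range(n)
    ]
    selected = []
    best_cover = [0.0] * n  # best similarity of each item to the selected set
    for _ in range(min(k, n)):
        best_gain, best_idx = -1.0, None
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain: how much candidate c improves coverage of the pool.
            gain = sum(max(sim[c][j] - best_cover[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best_idx = gain, c
        selected.append(best_idx)
        best_cover = [max(best_cover[j], sim[best_idx][j]) for j in range(n)]
    return selected


# Toy pool with two near-duplicate clusters (illustrative data only).
pool = [
    InstructionExample("Translate this sentence to French", "..."),
    InstructionExample("Translate this sentence to German", "..."),
    InstructionExample("Summarize the following article", "..."),
    InstructionExample("Summarize the following paragraph", "..."),
]
selected = greedy_facility_location(pool, k=2)
print([pool[i].instruction for i in selected])
```

With a budget of two, the greedy step favors one representative per cluster and skips near-duplicates, which is the intuition behind using submodular objectives to compose a compact but diverse fine-tuning set.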