Instruction Datasets
Instruction datasets are collections of task instructions paired with desired outputs, used to fine-tune large language models (LLMs) so that they follow a wider range of user instructions. Current research focuses on building larger, higher-quality datasets, often through automated generation, and on tuning their composition for specific tasks or model architectures, for example by applying curriculum learning or submodular optimization to data selection. This work supports progress in LLMs across many domains, from voice assistants and multimodal models to specialized fields such as biomedicine and cybersecurity.
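To make the idea of submodular data selection concrete, here is a minimal sketch of greedy selection under a facility-location objective, one common submodular function. It is an illustration only: the record fields ("instruction", "output"), the word-overlap similarity, and the function names are assumptions made for this example and are not taken from any particular paper or library.

```python
from typing import Dict, List

# Hypothetical record layout: one instruction paired with its desired output.
Record = Dict[str, str]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a real embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def greedy_facility_location(records: List[Record], k: int) -> List[int]:
    """Greedily pick up to k record indices that best 'cover' the whole dataset.

    Facility-location objective: f(S) = sum_i max_{j in S} sim(i, j).
    Each step adds the candidate with the largest marginal gain in f.
    """
    n = len(records)
    sim = [
        [jaccard(records[i]["instruction"], records[j]["instruction"]) for j in range(n)]
        for i in range(n)
    ]
    selected: List[int] = []
    best = [0.0] * n  # best[i] = similarity of record i to its closest selected record
    for _ in range(min(k, n)):
        best_gain, best_idx = -1.0, -1
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain: how much candidate c improves coverage of every record.
            gain = sum(max(sim[i][c] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_idx = gain, c
        selected.append(best_idx)
        best = [max(best[i], sim[i][best_idx]) for i in range(n)]
    return selected


# Toy dataset (made up for illustration): two translation and two summarization tasks.
records = [
    {"instruction": "Translate this sentence into French", "output": "..."},
    {"instruction": "Translate this sentence into German", "output": "..."},
    {"instruction": "Summarize the following article", "output": "..."},
    {"instruction": "Summarize the following paragraph", "output": "..."},
]
selected = greedy_facility_location(records, k=2)
print(selected)
```

With a budget of two, the greedy step favors one record from each task family over two near-duplicates, which is the diversity effect that makes this kind of objective attractive for dataset composition. A real pipeline would likely replace the word-overlap similarity with embedding-based similarity and use a faster (lazy) greedy variant.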