Synthetic Task
Synthetic tasks are artificially constructed datasets and problems designed to evaluate and improve machine learning models, particularly large language models (LLMs) and multimodal models. Current research focuses on building more comprehensive and realistic synthetic benchmarks for capabilities such as long-context understanding, reasoning, and compositional generalization, often using program synthesis and controlled data generation to probe specific model weaknesses. These efforts aim to provide more reliable and efficient evaluation methods, leading to better model development and a deeper understanding of model strengths and limitations across diverse applications. Synthetic tasks also support research into mitigating issues such as hallucination and copy bias in LLMs.
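To make "controlled data generation" concrete, the sketch below shows one common pattern: a needle-in-a-haystack generator for probing long-context retrieval, where a known key-value fact is planted at a chosen depth in distractor text so accuracy can be scored exactly. This is an illustrative example only, assuming a simple word-level notion of context length; the function name and parameters are hypothetical and are not taken from any of the papers listed below.

```python
# Minimal sketch of controlled synthetic data generation for long-context probing.
# Hypothetical helper; not the method of any specific paper listed on this page.
import random
import string


def make_needle_task(context_tokens: int = 2000,
                     needle_depth: float = 0.5,
                     seed: int = 0) -> dict:
    """Build one synthetic example with a known gold answer.

    context_tokens: rough length of the distractor context, in words.
    needle_depth:   relative position (0.0 = start, 1.0 = end) at which the
                    key fact is inserted, so retrieval accuracy can be
                    measured as a function of depth.
    """
    rng = random.Random(seed)

    # Distractor filler: meaningless but well-formed word sequence.
    filler_words = ["the", "system", "records", "routine", "data",
                    "during", "normal", "operation", "without", "issue"]
    filler = [rng.choice(filler_words) for _ in range(context_tokens)]

    # The "needle": a randomly generated key-value fact we control exactly.
    key = "".join(rng.choices(string.ascii_uppercase, k=6))
    value = str(rng.randint(100000, 999999))
    needle = f"The secret code for project {key} is {value}."

    # Insert the needle at the requested depth.
    pos = int(needle_depth * len(filler))
    words = filler[:pos] + needle.split() + filler[pos:]

    return {
        "context": " ".join(words),
        "question": f"What is the secret code for project {key}?",
        "answer": value,  # exact gold label enables automatic scoring
        "needle_depth": needle_depth,
    }


if __name__ == "__main__":
    example = make_needle_task(context_tokens=5000, needle_depth=0.75, seed=42)
    print(example["question"])
    print("gold answer:", example["answer"])
```

Because the answer is generated rather than annotated, evaluation reduces to an exact-match check, and sweeping needle_depth or context_tokens isolates where retrieval degrades.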
Papers
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen
Plots Unlock Time-Series Understanding in Multimodal Models
Mayank Daswani, Mathias M.J. Bellaiche, Marc Wilson, Desislav Ivanov, Mikhail Papkov, Eva Schnider, Jing Tang, Kay Lamerigts, Gabriela Botea, Michael A. Sanchez, Yojan Patel, Shruthi Prabhakara, Shravya Shetty, Umesh Telang
Improving the State of the Art for Training Human-AI Teams: Technical Report #3 -- Analysis of Testbed Alternatives
Lillian Asiala, James E. McCarthy, Lixiao Huang
Improving the State of the Art for Training Human-AI Teams: Technical Report #2 -- Results of Researcher Knowledge Elicitation Survey
James E. McCarthy, Lillian Asiala, LeeAnn Maryeski, Dawn Sillars
Improving the State of the Art for Training Human-AI Teams: Technical Report #1 -- Results of Subject-Matter Expert Knowledge Elicitation Survey
James E. McCarthy, Lillian Asiala, LeeAnn Maryeski, Nyla Warren