New Datasets
Recent research shows a surge in the creation of new datasets across diverse fields, driven by the need for more comprehensive and representative data to train and evaluate machine learning models. Current efforts address limitations of existing datasets, such as bias, limited diversity (e.g., underrepresentation of non-English languages), and insufficient coverage of real-world scenarios, particularly in areas like automated driving and large language model safety. These new datasets, often accompanied by novel evaluation frameworks and benchmark algorithms, are crucial for advancing the reliability and fairness of AI systems and for fostering more robust, generalizable models across applications.
Papers
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka
Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets
Hang Chen, Xinyu Yang, Keqing Du