Open Sampling
Open sampling in machine learning focuses on efficiently selecting data subsets for training and inference, with the aim of improving model performance while reducing computational cost. Current research explores diverse sampling strategies, including those based on gradient information, low-discrepancy sequences, and normalizing flows, often integrated with model architectures such as neural networks, diffusion models, and generative adversarial networks. These advances are crucial for handling large datasets and for improving the accuracy and efficiency of applications ranging from image synthesis and video summarization to drug discovery and autonomous driving. Developing sampling methods that are both efficient and effective remains a key challenge across many subfields of machine learning, as illustrated by the sketch below.
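To make one of these strategies concrete, the following sketch illustrates gradient-informed subset selection: training examples are drawn with probability proportional to their per-example gradient norms, so that harder examples are more likely to enter the subset. This is a minimal illustration under assumptions, not the method of any paper listed here; the gradient norms, the `select_subset` helper, and the toy data are all hypothetical.

```python
import numpy as np

def select_subset(grad_norms: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Sample k example indices with probability proportional to each
    example's gradient norm (a simple importance-sampling heuristic)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = grad_norms / grad_norms.sum()          # normalize to a distribution
    return rng.choice(len(grad_norms), size=k, replace=False, p=probs)

# Toy usage: 1,000 examples with synthetic gradient norms.
grad_norms = np.abs(np.random.default_rng(0).normal(size=1000))
subset = select_subset(grad_norms, k=64)
print(subset[:10])
```

In practice the per-example gradient norms would come from the model being trained, and the selected indices would define the next training batch or a reduced dataset.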
Papers
Language Models (Mostly) Know What They Know
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
Keep your Distance: Determining Sampling and Distance Thresholds in Machine Learning Monitoring
Al-Harith Farhad, Ioannis Sorokos, Andreas Schmidt, Mohammed Naveed Akram, Koorosh Aslansefat, Daniel Schneider