Strong Generalization
Strong generalization, the ability of machine learning models to perform well on unseen data, is a central objective in current research. Active areas of investigation include improving the robustness of self-supervised learning, understanding the optimization dynamics of transformers, CNNs, and RNNs, and enhancing generalization through data augmentation, regularization techniques (e.g., logical regularization, consistency regularization), and training strategies such as few-shot learning and meta-learning. These advances are crucial for building reliable, adaptable AI systems across applications ranging from image classification and natural language processing to healthcare and robotics.
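As a concrete illustration of one of the regularization techniques mentioned above, the sketch below shows a consistency-regularization training step in PyTorch: the classifier is penalized when its predictions on a stochastically augmented view of an input diverge from its predictions on the original. The model, the augment function, and the hyperparameters (temperature, lam) are hypothetical placeholders for illustration and are not taken from any of the papers listed below.

```python
# Minimal sketch of consistency regularization (assumption: any PyTorch
# classifier `model` and any stochastic augmentation `augment`, e.g. random
# crops/flips, can be plugged in).
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment, temperature=1.0):
    """KL divergence between predictions on the clean input and an augmented view."""
    with torch.no_grad():
        # Treat the clean-input prediction as a fixed target ("teacher" view).
        p_clean = F.softmax(model(x) / temperature, dim=-1)
    # Prediction on the augmented view ("student" view), kept in the graph.
    log_p_aug = F.log_softmax(model(augment(x)) / temperature, dim=-1)
    return F.kl_div(log_p_aug, p_clean, reduction="batchmean")

def training_step(model, x, y, augment, optimizer, lam=1.0):
    """Supervised cross-entropy plus a weighted consistency penalty."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * consistency_loss(model, x, augment)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The weight lam trades off fitting the labels against stability under augmentation; setting it to zero recovers plain supervised training.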
Papers
TOOLVERIFIER: Generalization to New Tools via Self-Verification
Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu
Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
Steven Wilkins-Reeves, Xu Chen, Qi Ma, Christine Agarwal, Aude Hofleitner
Zero-shot generalization across architectures for visual classification
Evan Gerritz, Luciano Dyballa, Steven W. Zucker
Revisiting Data Augmentation in Deep Reinforcement Learning
Jianshu Hu, Yunpeng Jiang, Paul Weng
Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures
Paul Viallard, Rémi Emonet, Amaury Habrard, Emilie Morvant, Valentina Zantedeschi
The effect of Leaky ReLUs on the training and generalization of overparameterized networks
Yinglong Guo, Shaohan Li, Gilad Lerman
Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy
Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model
Salman Rahman, Lavender Yao Jiang, Saadia Gabriel, Yindalon Aphinyanaphongs, Eric Karl Oermann, Rumi Chunara