Strong Generalization
Strong generalization, the ability of machine learning models to perform well on unseen data, is a central objective of current research. Active directions include improving the robustness of self-supervised learning; understanding the optimization dynamics of transformers and other architectures such as CNNs and RNNs; and enhancing generalization through data augmentation, regularization techniques (e.g., logical regularization, consistency regularization), and training strategies such as few-shot learning and meta-learning. These advances are crucial for building reliable, adaptable AI systems across applications ranging from image classification and natural language processing to healthcare and robotics. A minimal sketch of one such technique, consistency regularization, follows.
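Consistency regularization, one of the techniques named above, is easy to make concrete. The sketch below is illustrative only and is not drawn from any of the papers listed here: it assumes a hypothetical classifier `model` and augmentation function `augment`, and penalizes the KL divergence between the model's predictions on an input and on an augmented view of the same input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_loss(model, x, augment, weight=1.0):
    """Penalize disagreement between predictions on x and augment(x)."""
    # Predictions on the clean input act as a fixed target (no gradient),
    # so only the augmented branch is pushed toward agreement.
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=-1)
    log_p_aug = F.log_softmax(model(augment(x)), dim=-1)
    # KL(p_clean || p_aug), averaged over the batch.
    return weight * F.kl_div(log_p_aug, p_clean, reduction="batchmean")

# Toy usage: a linear classifier with Gaussian-noise augmentation
# standing in for a real model and a real augmentation pipeline.
model = nn.Linear(16, 4)
x = torch.randn(8, 16)
augment = lambda t: t + 0.1 * torch.randn_like(t)

loss = consistency_loss(model, x, augment)
loss.backward()  # gradients flow through the augmented branch only
print(float(loss))
```

In practice this term is added, with a tunable weight, to the usual supervised loss, and it is often computed on unlabeled data, which is what makes it useful in semi- and self-supervised settings.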
Papers
Flexible task abstractions emerge in linear networks with fast and bounded units
Kai Sandbrink, Jan P. Bauer, Alexandra M. Proca, Andrew M. Saxe, Christopher Summerfield, Ali Hummos
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
Wen-Chin Huang, Erica Cooper, Tomoki Toda
Proxy-informed Bayesian transfer learning with unknown sources
Sabina J. Sloman, Julien Martinelli, Samuel Kaski
Theoretically Guaranteed Distribution Adaptable Learning
Chao Xu, Xijia Tang, Guoqing Liu, Yuhua Qian, Chenping Hou
Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization
Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang
Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression: A Distribution-Free Analysis
Yingzhen Yang, Ping Li
On the Comparison between Multi-modal and Single-modal Contrastive Learning
Wei Huang, Andi Han, Yongqiang Chen, Yuan Cao, Zhiqiang Xu, Taiji Suzuki
Generalization and Risk Bounds for Recurrent Neural Networks
Xuewei Cheng, Ke Huang, Shujie Ma
Classifier Chain Networks for Multi-Label Classification
Daniel J. W. Touw, Michel van de Velden
Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading
Sharon Chokuwa, Muhammad Haris Khan
How Far is Video Generation from World Model: A Physical Law Perspective
Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng
Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis
Neel Dey, Benjamin Billot, Hallee E. Wong, Clinton J. Wang, Mengwei Ren, P. Ellen Grant, Adrian V. Dalca, Polina Golland
Training on test proteins improves fitness, structure, and function prediction
Anton Bushuiev, Roman Bushuiev, Nikola Zadorozhny, Raman Samusevich, Hannes Stärk, Jiri Sedlar, Tomáš Pluskal, Josef Sivic
Shortcut Learning in In-Context Learning: A Survey
Rui Song, Yingji Li, Fausto Giunchiglia, Hao Xu
How Analysis Can Teach Us the Optimal Way to Design Neural Operators
Vu-Anh Le, Mehmet Dik