Deep Model
Deep models span a broad range of neural network architectures that learn complex patterns from data for tasks such as image classification, time series forecasting, and system identification. Current research emphasizes improving efficiency (e.g., through constant-time learning algorithms and layer caching), enhancing explainability (e.g., via gradient-free methods), and mitigating issues such as bias and memorization. These advances matter because they improve the reliability, trustworthiness, and applicability of deep models across scientific fields and real-world applications, including healthcare, finance, and autonomous systems.
Papers
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang
Investigating the Nature of 3D Generalization in Deep Neural Networks
Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel