Linear Neural Network
Linear neural networks, despite their apparent simplicity, are a crucial object of study in deep learning: they serve as a tractable model for investigating fundamental optimization and generalization properties. Current research focuses on the dynamics of gradient descent in these networks, particularly in overparameterized settings and across architectures including fully-connected and convolutional models, and analyzes phenomena such as double descent and the effects of regularization techniques like batch normalization and L2 regularization. These studies offer insight into the behavior of more complex nonlinear networks and deepen the theoretical understanding of deep learning's success, informing both algorithm design and the interpretation of model behavior.
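To make the object of study concrete, the following is a minimal sketch (not taken from the papers below) of the setting the summary describes: a two-layer linear network y = W2 W1 x trained by full-batch gradient descent on a random linear teacher. The dimensions, initialization scale, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: gradient descent on a two-layer linear network fitting a
# random linear teacher. All hyperparameters below are illustrative choices.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 10, 20, 5, 200   # hidden layer is overparameterized

X = rng.normal(size=(d_in, n))
W_star = rng.normal(size=(d_out, d_in))     # ground-truth linear map
Y = W_star @ X

W1 = rng.normal(size=(d_hidden, d_in)) * 0.1
W2 = rng.normal(size=(d_out, d_hidden)) * 0.1
lr, steps = 0.1, 3000

for step in range(steps):
    resid = W2 @ W1 @ X - Y                 # prediction residual
    # Gradients of the squared loss (1/2n) * ||W2 W1 X - Y||_F^2
    grad_W2 = resid @ (W1 @ X).T / n
    grad_W1 = W2.T @ resid @ X.T / n
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    if step % 500 == 0:
        print(f"step {step:4d}  loss {0.5 * np.sum(resid**2) / n:.6f}")

# The end-to-end map W2 @ W1 approaches W_star even though neither factor is
# individually identifiable, illustrating the dynamics studied in this area.
print("end-to-end error:", np.linalg.norm(W2 @ W1 - W_star))
```

Because the loss is nonconvex in (W1, W2) but the end-to-end map is linear, this toy setup is exactly the kind of model used to analyze gradient-descent dynamics under overparameterization.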
Papers
Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Zhenghao Xu, Yuqing Wang, Tuo Zhao, Rachel Ward, Molei Tao
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
Hongru Yang, Bhavya Kailkhura, Zhangyang Wang, Yingbin Liang