Deeper Model
Deeper models, characterized by an increased number of stacked layers in architectures such as Transformers and Graph Transformers, aim to improve performance by capturing more complex relationships in data. Current research focuses on mitigating the challenges that come with depth, such as rank collapse, oversmoothing, and increased computational cost, through techniques like attention masking, exponential decay, and parameter-efficient designs including multi-path structures and knowledge distillation. These efforts are significant because they address the trade-off between model depth, performance, and resource consumption, affecting both the efficiency of training large models and their deployment in resource-constrained environments.
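As a concrete illustration of one of the parameter-efficiency techniques mentioned above, the sketch below shows a standard soft-target knowledge-distillation loss in PyTorch, where a shallower student is trained to match a deeper teacher's softened predictions. The function name, temperature, and loss weighting are illustrative assumptions, not taken from any specific paper listed below.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target knowledge distillation (illustrative sketch).

    Mixes the usual cross-entropy on hard labels with a KL term that pulls
    the shallower student toward the deeper teacher's softened outputs.
    """
    # Hard-label loss on the student's raw logits.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label loss: KL divergence between temperature-softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kd

# Example: logits from a frozen deep teacher guiding a shallow student on one batch.
batch, num_classes = 32, 10
teacher_logits = torch.randn(batch, num_classes)
student_logits = torch.randn(batch, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In practice the alpha weight and temperature are tuned per task; lower temperatures sharpen the teacher's distribution, while higher ones expose more of its soft information about incorrect classes.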
Papers
Eighteen papers on this topic, published between May 21, 2022 and May 29, 2024.