Multi-Head
Multi-head architectures, featuring multiple parallel processing pathways within a single neural network, are a burgeoning area of research aiming to improve efficiency, accuracy, and robustness in various machine learning tasks. Current research focuses on optimizing multi-head attention mechanisms in transformers, developing efficient multi-head models for diverse applications like speech recognition, image processing, and time series analysis, and exploring their use in continual learning and multi-task learning scenarios. These advancements hold significant potential for improving the performance and scalability of AI systems across numerous fields, from healthcare and environmental monitoring to natural language processing and computer vision.
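To make the core idea concrete, below is a minimal sketch of standard multi-head attention, the mechanism the transformer papers in this collection build on: the model dimension is split across several parallel heads, each attends over the sequence independently, and the head outputs are concatenated and re-projected. This is an illustrative PyTorch example with assumed dimensions (d_model=512, 8 heads), not code from any of the listed papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Standard multi-head self-attention: the model dimension is split
    across parallel heads that attend over the sequence independently."""
    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape
        # Project to queries, keys, values and split into heads.
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        q = q.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention, computed per head in parallel.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = attn @ v
        # Merge the heads back into the model dimension and re-project.
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(out)

x = torch.randn(2, 16, 512)           # (batch, sequence, d_model)
print(MultiHeadAttention()(x).shape)  # torch.Size([2, 16, 512])
```

Variants surveyed here, such as DeepSeek's multi-head latent attention, modify how the per-head keys and values are produced and cached, but keep this parallel-head structure.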
Papers
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui
Fudan University●East China Normal University●Hikvision Inc●Shanghai AI Lab

Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
Yein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong, Jaewoo Kang
Korea University●Upstage AI●AIGEN Sciences
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie +16

Multi-Head Encoding for Extreme Label Classification
Daojun Liang, Haixia Zhang, Dongfeng Yuan, Minggao Zhang