Cascaded Transformer
Cascaded transformers chain multiple transformer stages so that each stage progressively refines the output of the previous one, an approach applied to a range of computer vision and signal processing tasks. Current research applies this architecture to problems including human video generation, action detection, facial landmark detection, and keyword spotting, often adding specialized attention mechanisms and auxiliary tasks to improve accuracy and efficiency. These results show the versatility of cascaded transformers, with reported state-of-the-art performance in applications ranging from virtual reality to efficient deployment on edge devices.
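The progressive refinement idea described above can be illustrated with a minimal sketch. It is not the method of any paper listed here: the function names (`self_attention`, `stage`, `cascade`) are hypothetical, the attention uses identity projections instead of learned weights, and every stage is identical, whereas real cascaded models typically train stage-specific parameters and add normalization and feed-forward layers.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so no learned weights are involved.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return attended


def stage(tokens):
    """One illustrative stage: self-attention plus a residual connection."""
    attended = self_attention(tokens)
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


def cascade(tokens, num_stages=3):
    """Feed each stage's output into the next, keeping every intermediate result."""
    outputs = []
    current = tokens
    for _ in range(num_stages):
        current = stage(current)
        outputs.append(current)
    return outputs


# Three 2-dimensional tokens passed through a three-stage cascade.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined = cascade(tokens, num_stages=3)
```

Keeping the intermediate outputs in `refined` reflects a common design choice in cascaded models, where earlier stages can be supervised with auxiliary losses or used for early exit on constrained hardware. Whether a given paper does this is not stated in the summary above.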
Papers
Cascade Transformers for End-to-End Person Search
Rui Yu, Dawei Du, Rodney LaLonde, Daniel Davila, Christopher Funk, Anthony Hoogs, Brian Clipp
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Yangji He, Weihan Liang, Dongyang Zhao, Hong-Yu Zhou, Weifeng Ge, Yizhou Yu, Wenqiang Zhang