Binary Vision Transformer

Binary Vision Transformers (Bi-ViTs) aim to cut the computational cost and memory footprint of Vision Transformers (ViTs) by constraining their weights and activations to 1-bit values, enabling deployment on resource-constrained devices. Current research focuses on recovering the accuracy lost to binarization through binarization methods tailored to the characteristics of ViTs (e.g., addressing attention mechanisms and gradient issues), architectural modifications inspired by Convolutional Neural Networks, and knowledge distillation from higher-precision models. If Bi-ViTs reach competitive accuracy, ViTs would become practical for a wider range of computer vision tasks on edge hardware.
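To make the idea of binarizing weights and activations concrete, here is a minimal sketch in plain Python. It assumes sign-based binarization with a single mean-absolute-value scaling factor, a clipped straight-through estimator for the backward pass, and an XNOR-plus-popcount dot product. These are common choices in the binary network literature and are not taken from any specific Bi-ViT paper; the function names `binarize`, `ste_grad`, and `binary_dot` are hypothetical.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[List[int], float]:
    """Map real-valued weights to {-1, +1} plus one scaling factor.

    Assumption: alpha = mean(|w|), so that alpha * sign(w) approximates w.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha


def ste_grad(weights: List[float], upstream: List[float], clip: float = 1.0) -> List[float]:
    """Clipped straight-through estimator (an assumed, commonly used surrogate).

    sign() has zero gradient almost everywhere, so the upstream gradient is
    passed through unchanged where |w| <= clip and zeroed elsewhere.
    """
    return [g if abs(w) <= clip else 0.0 for w, g in zip(weights, upstream)]


def binary_dot(a_signs: List[int], b_signs: List[int]) -> int:
    """Dot product of two {-1, +1} vectors using XNOR and popcount."""
    n = len(a_signs)

    def pack(signs: List[int]) -> int:
        # Encode +1 as bit 1 and -1 as bit 0.
        return sum(1 << i for i, v in enumerate(signs) if v > 0)

    mask = (1 << n) - 1
    # XNOR marks the positions where both vectors agree.
    matches = bin(~(pack(a_signs) ^ pack(b_signs)) & mask).count("1")
    # Each match contributes +1, each mismatch contributes -1.
    return 2 * matches - n


if __name__ == "__main__":
    w = [0.5, -1.5, 0.25, -0.25]
    signs, alpha = binarize(w)
    print(signs, alpha)
    print(ste_grad(w, [1.0, 1.0, 1.0, 1.0]))
    print(binary_dot(signs, [1, 1, -1, -1]))
```

The bit-packed dot product is where the compute savings would come from: a multiply-accumulate over floating-point values becomes a bitwise operation and a bit count. The surrogate gradient is one simple way to train through the non-differentiable sign function.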

Papers