Paper ID: 2312.07128
MS-Twins: Multi-Scale Deep Self-Attention Networks for Medical Image Segmentation
Jing Xu
Although transformers dominate natural language processing, they have been applied to medical imaging only in recent years. Owing to their ability to model long-range dependencies, transformers are expected to help convolutional neural networks overcome their inherent spatial inductive bias. However, recently proposed transformer-based segmentation methods use the transformer only as an auxiliary module that encodes global context into a convolutional representation, and how to optimally integrate self-attention with convolution has not been investigated in depth. To address this, this paper proposes MS-Twins (Multi-Scale Twins), a powerful segmentation model built on the coupling of self-attention and convolution. By combining features at different scales and cascading them, MS-Twins captures both semantic and fine-grained information. Compared with existing network structures, MS-Twins improves on previous transformer-based methods on two commonly used datasets, Synapse and ACDC. In particular, MS-Twins outperforms SwinUNet by 8% on Synapse. Even compared with nnUNet, the best fully convolutional medical image segmentation network, MS-Twins still holds a slight advantage on Synapse and ACDC.
Submitted: Dec 12, 2023
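
The abstract does not detail the block design, so the following is a minimal, hypothetical PyTorch sketch of the general idea it describes: fusing a convolutional branch (fine-grained, local features) with a self-attention branch (global context) inside each stage of a multi-scale encoder. The module names (ConvAttnBlock, MSTwinsLikeEncoder) and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

# Illustrative sketch only: names and hyperparameters are assumptions, not
# the MS-Twins architecture itself.
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """One stage: a convolution branch fused with multi-head self-attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depthwise: local detail
            nn.Conv2d(dim, dim, 1),                         # pointwise: channel mixing
        )
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv(x)                          # convolutional (local) branch
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        glob, _ = self.attn(tokens, tokens, tokens)   # self-attention (global) branch
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + local + glob                       # residual fusion of both branches

class MSTwinsLikeEncoder(nn.Module):
    """Multi-scale encoder: downsample between stages, keep every scale's output."""
    def __init__(self, in_ch: int = 1, dims=(32, 64, 128)):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, dims[0], 3, padding=1)
        self.stages = nn.ModuleList(ConvAttnBlock(d) for d in dims)
        self.downs = nn.ModuleList(
            nn.Conv2d(dims[i], dims[i + 1], 2, stride=2) for i in range(len(dims) - 1)
        )

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        feats = []                                    # multi-scale features for a decoder
        for i, stage in enumerate(self.stages):
            x = stage(x)
            feats.append(x)
            if i < len(self.downs):
                x = self.downs[i](x)                  # halve resolution between scales
        return feats

if __name__ == "__main__":
    enc = MSTwinsLikeEncoder(in_ch=1)
    outs = enc(torch.randn(1, 1, 64, 64))             # e.g. a 64x64 grayscale slice
    print([tuple(f.shape) for f in outs])             # features at 64, 32, 16 resolution

A segmentation decoder would then cascade these per-scale features back up to full resolution, which is one plausible reading of the "combining different scales and cascading features" claim in the abstract.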