Paper ID: 2410.02523

Med-TTT: Vision Test-Time Training model for Medical Image Segmentation

Jiashu Xu

Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning. Although models based on convolutional neural networks (CNNs) and Transformers have achieved remarkable success in medical image segmentation tasks, they still face challenges such as high computational complexity and the loss of local features when capturing long-range dependencies. To address these limitations, we propose Med-TTT, a visual backbone network integrated with Test-Time Training (TTT) layers, which incorporates dynamic adjustment capabilities. Med-TTT introduces the Vision-TTT layer, which enables effective modeling of long-range dependencies with linear computational complexity and adaptive parameter adjustment during inference. Furthermore, we designed a multi-resolution fusion mechanism to combine image features at different scales, facilitating the identification of subtle lesion characteristics in complex backgrounds. At the same time, we adopt a frequency domain feature enhancement strategy based on high pass filtering, which can better capture texture and fine-grained details in images. Experimental results demonstrate that Med-TTT significantly outperforms existing methods on multiple medical image datasets, exhibiting strong segmentation capabilities, particularly in complex image backgrounds. The model achieves leading performance in terms of accuracy, sensitivity, and Dice coefficient, providing an efficient and robust solution for the field of medical image this http URL code is available at this https URL .

Submitted: Oct 3, 2024