Paper ID: 2306.14590

CST-YOLO: A Novel Method for Blood Cell Detection Based on Improved YOLOv7 and CNN-Swin Transformer

Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël Phan

Blood cell detection is a typical small-scale object detection problem in computer vision. In this paper, we propose a CST-YOLO model for blood cell detection based on YOLOv7 architecture and enhance it with the CNN-Swin Transformer (CST), which is a new attempt at CNN-Transformer fusion. We also introduce three other useful modules: Weighted Efficient Layer Aggregation Networks (W-ELAN), Multiscale Channel Split (MCS), and Concatenate Convolutional Layers (CatConv) in our CST-YOLO to improve small-scale object detection precision. Experimental results show that the proposed CST-YOLO achieves 92.7%, 95.6%, and 91.1% mAP@0.5, respectively, on three blood cell datasets, outperforming state-of-the-art object detectors, e.g., RT-DETR, YOLOv5, and YOLOv7. Our code is available at this https URL.

Submitted: Jun 26, 2023