Video Text Spotting

Video text spotting (VTS) aims to detect, recognize, and track text in video sequences. Current research focuses on improving the accuracy and efficiency of VTS systems, often using transformer-based architectures together with techniques such as contrastive learning and global association to handle temporal dependencies and complex text appearance. Other efforts target more robust and scalable methods, including ones suited to resource-constrained platforms such as unmanned aerial vehicles, and larger, higher-quality datasets with more precise annotations. These advances matter for applications such as video indexing, content analysis, and autonomous systems.
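
As a rough illustration of the tracking part of the task, the sketch below links per-frame text detections into tracks by greedy IoU matching between consecutive frames. It is a generic tracking-by-detection baseline, not the method of any paper listed here; the names `Detection`, `iou`, and `track_text`, and the 0.5 threshold, are assumptions made for the example, and the detections are taken as already produced by some text detector and recognizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned


@dataclass
class Detection:
    box: Box
    text: str  # transcription from a hypothetical recognizer


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_text(frames: List[List[Detection]], iou_thresh: float = 0.5) -> List[List[int]]:
    """Assign a track ID to every detection in every frame.

    Each detection is matched to at most one detection from the previous
    frame, highest IoU first; unmatched detections start new tracks.
    """
    next_id = 0
    prev: List[Tuple[int, Box]] = []  # (track ID, box) from the previous frame
    all_ids: List[List[int]] = []
    for dets in frames:
        # Score every (previous, current) pair and visit them best-first.
        pairs = sorted(
            ((iou(p_box, d.box), pi, di)
             for pi, (_, p_box) in enumerate(prev)
             for di, d in enumerate(dets)),
            reverse=True,
        )
        ids = [-1] * len(dets)
        used_prev = set()
        for score, pi, di in pairs:
            if score < iou_thresh:
                break  # remaining pairs overlap too little to be the same text
            if pi in used_prev or ids[di] != -1:
                continue
            ids[di] = prev[pi][0]
            used_prev.add(pi)
        # Any detection left unmatched begins a new track.
        for di in range(len(dets)):
            if ids[di] == -1:
                ids[di] = next_id
                next_id += 1
        prev = [(ids[di], d.box) for di, d in enumerate(dets)]
        all_ids.append(ids)
    return all_ids
```

Matching only against the previous frame keeps the sketch short, but it loses a track as soon as the text is missed in a single frame; a fuller system would typically also compare appearance or transcription and keep a longer history.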

Papers

May 29, 2024

LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model
Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan
Text Spotting Glyph Shape Glyph Image Video Text Spotting

January 13, 2024

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao
Video Text Long Span Text Spotting Video Text Spotting

January 8, 2024

GloTSFormer: Global Video Text Spotting Transformer
Han Wang, Yanjie Wang, Yang Li, Can Huang
Wasserstein Distance Video Dataset Transformer Tracker Video Text Spotting

May 2, 2023

Scalable Mask Annotation for Video Text Spotting
Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, Dacheng Tao
Video Text Ground Truth Annotation Scene Text Image Mask Annotation Video Text Spotting

July 18, 2022

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning
Weijia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo
Text Detection Video Text Contrastive Representation Learning Video Text Spotting

June 5, 2022

E^2VTS: Energy-Efficient Video Text Spotting from Unmanned Aerial Vehicles
Zhenyu Hu, Zhenyu Wu, Pengcheng Pi, Yunhe Xue, Jiayi Shen, Jianchao Tan, Xiangru Lian, Zhangyang Wang, Ji Liu
Unmanned Aerial Vehicle Text to Video Video Text Video Text Spotting

May 4, 2022

SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic
Critical Synthesis Text to Video Audio Spectrogram Transformer Video to Speech Synthesis Lip to Speech Video Text Spotting

November 9, 2021

Video Text Tracking With a Spatio-Temporal Complementary Model
Yuzhe Gao, Xing Li, Jiajian Zhang, Yu Zhou, Dian Jin, Jing Wang, Shenggao Zhu, Xiang Bai
Web Tracking Video Text Tracking by Detection Video Text Spotting