Recognition Rate
Recognition rate, the proportion of objects or patterns that a system identifies correctly, is a central metric across diverse fields, from biometric security to image analysis. Current research aims to raise recognition rates with deep learning architectures such as Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and recurrent models. These models often incorporate transfer learning, multi-modal fusion, and generative models to improve performance, particularly in challenging scenarios such as low-resolution images or noisy data. These advances matter for applications including automated surveillance, medical diagnosis, and human-computer interaction, where they enable more reliable and efficient systems.
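As a concrete illustration of the definition above, the sketch below computes recognition rate as correct predictions divided by total samples. It also includes a top-k variant, which counts a sample as recognized if the true label appears among the k highest-ranked candidates. The function names and the top-k variant are illustrative assumptions, not a standard drawn from the papers listed here; individual papers may define their metrics differently (for example, word error rate in speech recognition).

```python
from typing import Hashable, Sequence


def recognition_rate(predictions: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of samples whose single predicted label equals the ground truth."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute a rate over zero samples")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def top_k_recognition_rate(
    ranked: Sequence[Sequence[Hashable]], labels: Sequence[Hashable], k: int
) -> float:
    """Fraction of samples whose true label is among the first k ranked candidates.

    Hypothetical helper: `ranked[i]` is assumed to list candidates for sample i,
    ordered from most to least confident.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked) != len(labels):
        raise ValueError("ranked and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute a rate over zero samples")
    correct = sum(y in list(r)[:k] for r, y in zip(ranked, labels))
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["cat", "dog", "cat", "bird"]
    truth = ["cat", "dog", "dog", "bird"]
    print(recognition_rate(preds, truth))  # 0.75 (3 of 4 correct)

    ranked = [["cat", "dog"], ["bird", "dog"], ["dog", "cat"]]
    truth3 = ["cat", "dog", "cat"]
    print(top_k_recognition_rate(ranked, truth3, k=1))  # only the first sample is a top-1 hit
    print(top_k_recognition_rate(ranked, truth3, k=2))  # 1.0
```

Top-k rates are always at least as high as the top-1 rate, so reports should state which k was used.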
Papers
Zero-resource Speech Translation and Recognition with LLMs
Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff
HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images
Yuchen Yang, Haoran Yan, Yanhao Chen, Qingqiang Wu, Qingqi Hong
Speech Retrieval-Augmented Generation without Automatic Speech Recognition
Do June Min, Karel Mundnich, Andy Lapastora, Erfan Soltanmohammadi, Srikanth Ronanki, Kyu Han
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa, Seungjun Lee, Hyeseung Cho, Bumsoo Kim, Woohyung Lim
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
Niko Moritz, Ruiming Xie, Yashesh Gaur, Ke Li, Simone Merello, Zeeshan Ahmed, Frank Seide, Christian Fuegen
Synchronized and Fine-Grained Head for Skeleton-Based Ambiguous Action Recognition
Hao Huang, Yujie Lin, Siyu Chen, Haiyang Liu