End-to-End Sign Language Translation
End-to-end sign language translation (SLT) aims to directly convert sign language videos into text without intermediate steps like gloss transcription, addressing the limitations and costs associated with gloss-based approaches. Current research focuses on improving model performance through techniques like contrastive learning to enhance feature discrimination and self-supervised pretraining on anonymized data to mitigate data scarcity and privacy concerns. Transformer-based architectures are prevalent, with ongoing efforts to leverage cross-modality data augmentation and unified models that jointly learn from multiple related tasks to bridge the modality gap between video and text, ultimately improving translation accuracy and expanding the applicability of SLT systems.