Vocabulary Word
Vocabulary word research centers on handling out-of-vocabulary (OOV) words, that is, words absent from a model's training data, a recurring challenge across natural language processing tasks. Current efforts aim to improve OOV handling in machine translation, speech recognition, and text generation through techniques such as data augmentation (creating synthetic training data that contains OOV words), sub-word tokenization (breaking words into smaller units), and contrastive learning (improving model robustness to unseen words). These advances support more robust and generalizable language models, with applications ranging from machine translation for low-resource languages to more accurate speech recognition systems.
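As a rough illustration of the sub-word tokenization idea mentioned above, the sketch below splits an unseen word into known pieces using greedy longest-match-first lookup. It is a minimal example under stated assumptions, not the method of any listed paper: the function name `subword_tokenize`, the toy vocabulary, the `##` continuation prefix (a WordPiece-style convention), and the `[UNK]` fallback are all illustrative choices.

```python
def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedily split `word` into the longest pieces found in `vocab`.

    Pieces that do not start the word carry a '##' prefix
    (a WordPiece-style convention assumed here for illustration).
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No known piece covers this position: fall back to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from a corpus.
toy_vocab = {"un", "break", "##break", "##able", "##s"}

# "unbreakable" is not in the vocabulary as a whole word,
# but it can still be represented by known pieces.
print(subword_tokenize("unbreakable", toy_vocab))
```

With this toy vocabulary, a word that never appeared whole can still be encoded from smaller known units, while a word with no matching pieces at all falls back to the single unknown token.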
Papers
"When Words Fail, Emojis Prevail": Generating Sarcastic Utterances with Emoji Using Valence Reversal and Semantic Incongruity
Faria Binte Kader, Nafisa Hossain Nujat, Tasmia Binte Sogir, Mohsinul Kabir, Hasan Mahmud, Kamrul Hasan
Actively Discovering New Slots for Task-oriented Conversation
Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao