Large Vocabulary
Research on large vocabularies in natural language processing focuses on optimizing the size and composition of vocabularies to improve the performance of large language models (LLMs). Current efforts explore methods for handling diverse vocabularies efficiently, including vocabulary alignment techniques for ensembling models with mismatched tokenizers and dynamic embedding pruning to reduce memory footprint. These advances aim to improve the accuracy and efficiency of LLMs on tasks such as machine translation, semantic segmentation, and ad-hoc video search, ultimately yielding more robust and adaptable NLP systems. The impact extends to broader applications by enabling better handling of specialized domains and continually evolving ontologies.
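To make the idea of dynamic embedding pruning concrete, here is a minimal, generic sketch (not the method of any paper listed below): given the set of token ids actually observed in a target domain, it builds a smaller embedding table containing only those rows, shrinking memory roughly in proportion to the fraction of the vocabulary retained. The function name `prune_embedding` and the remapping scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

def prune_embedding(embedding: nn.Embedding, keep_ids: torch.Tensor):
    """Return a smaller embedding table holding only the kept token ids,
    plus a remapping tensor from original ids to rows of the pruned table.
    (Illustrative sketch; names and API shape are assumptions.)"""
    keep_ids = torch.unique(keep_ids)  # sorted, de-duplicated ids to retain
    pruned = nn.Embedding(len(keep_ids), embedding.embedding_dim)
    with torch.no_grad():
        pruned.weight.copy_(embedding.weight[keep_ids])  # copy kept rows
    # old_to_new[i] is the pruned-table row for original id i, or -1 if dropped
    old_to_new = torch.full((embedding.num_embeddings,), -1, dtype=torch.long)
    old_to_new[keep_ids] = torch.arange(len(keep_ids))
    return pruned, old_to_new

# Usage: keep only the token ids observed in the current domain's corpus.
full = nn.Embedding(50_000, 768)                      # full vocabulary table
observed_ids = torch.tensor([0, 1, 5, 42, 999, 31337])
pruned, remap = prune_embedding(full, observed_ids)
print(pruned.weight.shape)   # torch.Size([6, 768]) -- 6 rows instead of 50k
print(remap[42].item())      # row index of original token 42 in pruned table
```

Real systems would additionally remap input ids through `remap` before lookup and keep a path for re-adding tokens as the vocabulary expands; this sketch shows only the core table-shrinking step.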
Papers
Bridging the Gap between Different Vocabularies for LLM Ensemble
Yangyifan Xu, Jinliang Lu, Jiajun Zhang
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr