Multilingual Data
Multilingual data research focuses on developing natural language processing (NLP) models that handle multiple languages effectively, aiming to overcome the limitations of English-centric models and narrow performance disparities across languages. Current research emphasizes improved model architectures, such as language-specific modules, and techniques like low-rank adaptation and self-distillation, to enhance multilingual capabilities and mitigate biases stemming from imbalanced data. This work broadens NLP's accessibility and impact, enabling applications in fields such as finance, e-commerce, and healthcare, where multilingual data is prevalent and accurate analysis is essential.
Papers
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao
Cross-Lingual News Event Correlation for Stock Market Trend Prediction
Sahar Arshad, Nikhar Azhar, Sana Sajid, Seemab Latif, Rabia Latif