Language Model
Language models are computational systems designed to understand and generate human language, with the primary aim of improving tasks such as translation, question answering, and text summarization. Current research focuses on enhancing efficiency (e.g., through novel learning rate schedules and optimized architectures), improving alignment with human preferences (via preference optimization and reward modeling), and addressing biases and limitations (including techniques for mitigating toxicity and improving robustness). These advances shape natural language processing research and enable the development of more powerful and reliable AI applications.
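As a concrete illustration of one alignment technique named above, the sketch below shows a DPO-style preference-optimization loss for a single preference pair. It is a minimal, generic example: the function name, the summed log-probability inputs, and the beta value are illustrative assumptions and are not taken from any of the papers listed below.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Illustrative DPO-style preference loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # Implicit reward margin, scaled by beta.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: small when the policy prefers the
    # chosen response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical example values: the policy favors the chosen response slightly
# more than the reference does, so the loss falls below log(2) ~ 0.693.
print(dpo_loss(-12.0, -15.0, -13.0, -14.5))
```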
Papers
MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
Savya Khosla, Kushal Kafle, Simon Jenni, Handong Zhao, John Collomosse, Jing Shi
LlamaRestTest: Effective REST API Testing with Small Language Models
Myeongsoo Kim, Saurabh Sinha, Alessandro Orso
LoRS: Efficient Low-Rank Adaptation for Sparse Large Language Model
Yuxuan Hu, Jing Zhang, Xiaodong Chen, Zhe Zhao, Cuiping Li, Hong Chen
Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation
Jiaxin Guo, Yuanchang Luo, Daimeng Wei, Ling Zhang, Zongyao Li, Hengchao Shang, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Zhanglin Wu, Hao Yang
Large Language Models For Text Classification: Case Study And Comprehensive Review
Arina Kostina, Marios D. Dikaiakos, Dimosthenis Stefanidis, George Pallis
Religious Bias Landscape in Language and Text-to-Image Models: Analysis, Detection, and Debiasing Strategies
Ajwad Abrar, Nafisa Tabassum Oeshy, Mohsinul Kabir, Sophia Ananiadou
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
Anurag Kumar, Rohit Paturi, Amber Afshan, Sundararajan Srinivasan
OptiChat: Bridging Optimization Models and Practitioners with Large Language Models
Hao Chen, Gonzalo Esteban Constante-Flores, Krishna Sri Ipsit Mantri, Sai Madhukiran Kompalli, Akshdeep Singh Ahluwalia, Can Li
PokerBench: Training Large Language Models to become Professional Poker Players
Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli
Exploring Robustness of Multilingual LLMs on Real-World Noisy Data
Amirhossein Aliakbarzadeh, Lucie Flek, Akbar Karimi
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi
Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing
Pulkit Arora, Akbar Karimi, Lucie Flek
Towards Best Practices for Open Datasets for LLM Training
Stefan Baack, Stella Biderman, Kasia Odrozek, Aviya Skowron, Ayah Bdeir, Jillian Bommarito, Jennifer Ding, Maximilian Gahntz, Paul Keller, Pierre-Carl Langlais, Greg Lindahl, Sebastian Majstorovic, Nik Marda, Guilherme Penedo, Maarten Van Segbroeck, Jennifer Wang, Leandro von Werra, Mitchell Baker, Julie Belião, Kasia Chmielinski, Marzieh Fadaee, Lisa Gutermuth, Hynek Kydlíček, Greg Leppert, EM Lewis-Jong, Solana Larsen, Shayne Longpre, Angela Oduor Lungati, Cullen Miller, Victor Miller, Max Ryabinin, Kathleen Siminyu, Andrew Strait, Mark Surman, Anna Tumadóttir, Maurice Weber, Rebecca Weiss, Lee White, Thomas Wolf
Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
Yifang Xu, Yunzhuo Sun, Benxiang Zhai, Ming Li, Wenxin Liang, Yang Li, Sidan Du
GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Chen Tang, Bo Lv, Zifan Zheng, Bohao Yang, Kun Zhao, Ning Liao, Xiaoxing Wang, Feiyu Xiong, Zhiyu Li, Nayu Liu, Jingchi Jiang
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Yaowen Ye, Cassidy Laidlaw, Jacob Steinhardt
Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques
Shobhit Ratan, Farley Knight, Ghada Jerfel, Sze Chung Ho
Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models
Dhruv Dhamani, Mary Lou Maher