Language Model
Language models are computational systems designed to understand and generate human language, supporting tasks such as translation, question answering, and text summarization. Current research focuses on improving efficiency (e.g., through novel learning rate schedules and optimized architectures), aligning models with human preferences (via preference optimization and reward modeling), and addressing biases and limitations (including techniques for mitigating toxicity and improving robustness). These advances shape natural language processing research and enable the development of more capable and reliable AI applications.
Papers
A Text Classification Model Combining Adversarial Training with Pre-trained Language Model and neural networks: A Case Study on Telecom Fraud Incident Texts
Liu Zhuoxian, Shi Tuo, Hu Xiaofeng
Reverse Prompt Engineering
Hanqing Li, Diego Klabjan
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
Hong Meng Yam, Nathan J Paek
CriticAL: Critic Automation with Language Models
Michael Y. Li, Vivek Vajipey, Noah D. Goodman, Emily B. Fox
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su
Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models
Sultan Alrashed, Dmitrii Khizbullin, David R. Pugh
LLM Vocabulary Compression for Low-Compute Environments
Sreeram Vennam, Anish Joishy, Ponnurangam Kumaraguru
ZhoBLiMP: a Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese
Yikang Liu, Yeting Shen, Hongao Zhu, Lilong Xu, Zhiheng Qian, Siyuan Song, Kejia Zhang, Jialong Tang, Pei Zhang, Baosong Yang, Rui Wang, Hai Hu
Concept Bottleneck Language Models For protein design
Aya Abdelsalam Ismail, Tuomas Oikarinen, Amy Wang, Julius Adebayo, Samuel Stanton, Taylor Joren, Joseph Kleinhenz, Allen Goodman, Héctor Corrada Bravo, Kyunghyun Cho, Nathan C. Frey
Zyda-2: a 5 Trillion Token High-Quality Dataset
Yury Tokpanov, Paolo Glorioso, Quentin Anthony, Beren Millidge
Learning Mixtures of Experts with EM
Quentin Fruytier, Aryan Mokhtari, Sujay Sanghavi
The Empirical Impact of Data Sanitization on Language Models
Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, Sudipta Bhattacharya
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
Yen-Ting Lin, Chao-Han Huck Yang, Zhehuai Chen, Piotr Zelasko, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang
Using Language Models to Disambiguate Lexical Choices in Translation
Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr
Autonomous Industrial Control using an Agentic Framework with Large Language Models
Javal Vyas, Mehmet Mercangöz
Towards Multi-Modal Mastery: A 4.5B Parameter Truly Multi-Modal Small Language Model
Ben Koska, Mojmír Horváth
Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation
Long Truong To, Hung Tuan Le, Dat Van-Thanh Nguyen, Manh Trong Nguyen, Tri Thien Nguyen, Tin Van Huynh, Kiet Van Nguyen
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning
Mohammad Ghiasvand Mohammadkhani