Language Models
Language models are computational systems that learn to understand and generate human language, with the primary goal of improving tasks such as translation, question answering, and text summarization. Current research focuses on enhancing efficiency (e.g., through novel learning-rate schedules and optimized architectures), improving alignment with human preferences (via preference optimization and reward modeling), and addressing biases and limitations (including techniques for mitigating toxicity and enhancing robustness). These advances shape natural language processing research and enable the development of more capable and reliable AI applications.
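Preference optimization, one of the alignment techniques mentioned above, is commonly implemented as a contrastive loss over pairs of preferred and rejected responses. As a minimal sketch, assuming the widely used Direct Preference Optimization (DPO) objective rather than the specific method of any listed paper, it can be written in PyTorch as follows; the function name, argument names, and the default beta value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (illustrative sketch, not a paper-specific method).

    Each argument is a batch of per-sequence log-probabilities (token
    log-probs summed over each response) for the chosen / rejected
    responses under the policy being trained and a frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In this formulation, beta controls how far the trained policy may drift from the reference model: smaller values keep the policy close to the reference, while larger values weight the preference signal more heavily.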
Papers
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, Jungwook Choi
Adaptive Decoding via Latent Preference Optimization
Shehzaad Dhuliawala, Ilia Kulikov, Ping Yu, Asli Celikyilmaz, Jason Weston, Sainbayar Sukhbaatar, Jack Lanchantin
On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse
Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas
Accelerating Knowledge Graph and Ontology Engineering with Large Language Models
Cogan Shimizu, Pascal Hitzler
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, Xiaohui Zeng
BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
Akari Haga, Akiyo Fukatsu, Miyu Oba, Arianna Bisazza, Yohei Oseki
Enhancing Financial Domain Adaptation of Language Models via Model Augmentation
Kota Tanabe, Masanori Hirano, Kazuki Matoya, Kentaro Imajo, Hiroki Sakaji, Itsuki Noda
Language Models for Music Medicine Generation
Emmanouil Nikolakakis, Joann Ching, Emmanouil Karystinaios, Gabrielle Sipin, Gerhard Widmer, Razvan Marinescu
Can sparse autoencoders be used to decompose and interpret steering vectors?
Harry Mayne, Yushi Yang, Adam Mahdi
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
Dingdong Wang, Mingyu Cui, Dongchao Yang, Xueyuan Chen, Helen Meng
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models
Felix Stahlberg, Jared Lichtarge, Shankar Kumar
Neural Topic Modeling with Large Language Models in the Loop
Xiaohao Yang, He Zhao, Weijie Xu, Yuanyuan Qi, Jueqing Lu, Dinh Phung, Lan Du
Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
Farouq Sammour, Jia Xu, Xi Wang, Mo Hu, Zhenyu Zhang
Language Models as Causal Effect Generators
Lucius E.J. Bynum, Kyunghyun Cho
Derivational Morphology Reveals Analogical Generalization in Large Language Models
Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert
Towards Low-bit Communication for Tensor Parallel LLM Inference
Harry Dong, Tyler Johnson, Minsik Cho, Emad Soroush
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders
Xiaofeng Zhu, Jaya Krishna Mandivarapu