Large Language Model
Large language models (LLMs) are AI systems trained to process and generate human-like text across a wide range of natural language processing tasks. Current research focuses on enhancing LLM safety, efficiency (through techniques like quantization and optimized decoding), and fairness, as well as improving their ability to perform complex reasoning and follow diverse instructions. These advances matter because they address known limitations of current LLMs and pave the way for broader applications in fields such as healthcare, legal tech, and autonomous systems.
Papers
FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists
Tenghao Huang, Donghee Lee, John Sweeney, Jiatong Shi, Emily Steliotes, Matthew Lange, Jonathan May, Muhao Chen
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
Eduardo Pignatelli, Johan Ferret, Tim Rocktäschel, Edward Grefenstette, Davide Paglieri, Samuel Coward, Laura Toni
Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Parameter Optimization
Thomas Savage, Stephen Ma, Abdessalem Boukil, Vishwesh Patel, Ekanath Rangan, Ivan Rodriguez, Jonathan H Chen
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Jing Qin
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research
Yi Yang, Hanyu Duan, Jiaxin Liu, Kar Yan Tam
Exploring Large Language Models for Product Attribute Value Identification
Kassem Sabeh, Mouna Kacimi, Johann Gamper, Robert Litschko, Barbara Plank
Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, Haiyue Song
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Hikaru Asano, Ryo Yonetani, Taiki Sekii, Hiroki Ouchi
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, Rohan Anil, Ethan Dyer, Siamak Shakeri, Roopali Vij, Harsh Mehta, Vinay Ramasesh, Quoc Le, Ed Chi, Yifeng Lu, Orhan Firat, Angeliki Lazaridou, Jean-Baptiste Lespiau, Nithya Attaluri, Kate Olszewska
Enhancing TinyBERT for Financial Sentiment Analysis Using GPT-Augmented FinBERT Distillation
Graison Jos Thomas
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning
Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas
Enhancing SLM via ChatGPT and Dataset Augmentation
Tom Pieper, Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger
LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks
Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger
Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment
Tianyu Peng, Jiajun Zhang
Scaling FP8 training to trillion-token LLMs
Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry
LLMR: Knowledge Distillation with a Large Language Model-Induced Reward
Dongheng Li, Yongchang Hao, Lili Mou
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs
Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, Guiming Xie
Familiarity-aware Evidence Compression for Retrieval Augmented Generation
Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang