LLM Response
Large language model (LLM) response generation is a rapidly evolving field focused on improving the accuracy, reliability, and safety of LLM outputs, particularly in high-stakes domains such as healthcare and education. Current research focuses on mitigating hallucinations and other factual errors through techniques such as retrieval-augmented generation (RAG) and active inference prompting, and on developing evaluation methods that go beyond traditional question-answering benchmarks. Such advances are a prerequisite for responsible LLM deployment, where these systems broaden access to information, automate routine tasks, and support decision-making.
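The RAG pattern mentioned above is simple to sketch: retrieve documents relevant to the query, then condition generation on them so the answer is grounded in retrieved evidence rather than the model's parametric memory alone. The sketch below is an illustration, not any of the listed papers' methods; the toy corpus, the keyword-overlap retriever, and the placeholder `call_llm` function are assumptions standing in for a real retriever and model endpoint.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Assumptions (not from the source): a toy in-memory corpus, a
# keyword-overlap retriever, and a hypothetical `call_llm` stub.
from collections import Counter

CORPUS = [
    "Retrieval-augmented generation grounds LLM answers in retrieved documents.",
    "Hallucinations are fluent but factually unsupported model outputs.",
    "Fact-checking benchmarks evaluate claims extracted from LLM responses.",
]

def tokenize(text: str) -> list[str]:
    return [t.strip(".,").lower() for t in text.split()]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by token overlap with the query (stand-in for a real retriever)."""
    q = Counter(tokenize(query))
    scored = sorted(corpus, key=lambda d: -sum((q & Counter(tokenize(d))).values()))
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM client."""
    return f"[model answer conditioned on]\n{prompt}"

def rag_answer(question: str) -> str:
    # Condition the model on retrieved context and instruct it to stay grounded.
    context = "\n".join(retrieve(question, CORPUS))
    prompt = (
        "Answer using only the context below; say 'unknown' if it is insufficient.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(rag_answer("What is retrieval-augmented generation?"))
```

In a real system the overlap scorer would be replaced by a BM25 or dense retriever; the instruction to answer only from the supplied context, with an explicit "unknown" escape, is the part of the prompt that targets hallucination.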
Papers
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
Yikun Wang, Rui Zheng, Haoming Li, Qi Zhang, Tao Gui, Fei Liu
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov