Truthful Space
"Truthful Space" in AI research focuses on developing large language models (LLMs) that reliably produce accurate and honest responses, avoiding both unintentional errors ("hallucinations") and deliberate deception. Current research emphasizes evaluating and improving LLM truthfulness through various methods, including analyzing internal model representations, developing new evaluation benchmarks (like TruthfulQA), and designing techniques to filter misleading information or steer models towards truthful generation. This work is crucial for building trust in LLMs and ensuring their safe and responsible deployment in diverse applications, ranging from question answering to decision support systems.