Internal Representation
Internal representations in artificial neural networks, particularly large language models (LLMs) and vision-language models (VLMs), are the focus of intense research aimed at understanding how these models process information and generate outputs. Current work investigates the structure of these representations, how they encode knowledge (both parametric and non-parametric), and how they relate to model behaviors such as hallucination and reasoning. This research uses techniques such as probing classifiers, activation analysis, and tensor decomposition to analyze internal states and to improve model performance and reliability. Understanding internal representations is crucial for enhancing model interpretability, mitigating biases and errors, and ultimately building more robust and trustworthy AI systems.
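To make the probing-classifier approach mentioned above concrete, here is a minimal sketch that trains a linear probe on a model's hidden states to predict a binary property of the input (e.g., whether a statement is factually correct). It assumes the Hugging Face `transformers` library and scikit-learn; the model name (`gpt2`), the probed layer index, and the toy labeled statements are illustrative placeholders, not choices taken from the papers listed below.

```python
# Minimal probing-classifier sketch: fit a linear classifier on hidden-state
# activations to test whether a property is linearly decodable from them.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # placeholder; any model exposing hidden states works
LAYER = 6            # which layer's activations to probe (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy labeled data: statements paired with a binary property (1 = true, 0 = false).
statements = [
    ("The capital of France is Paris.", 1),
    ("The capital of France is Berlin.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]

def hidden_state(text: str) -> torch.Tensor:
    """Return the chosen layer's activation at the last token position."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: tuple of (num_layers + 1) tensors,
    # each of shape (batch, seq_len, hidden_dim)
    return outputs.hidden_states[LAYER][0, -1]

X = torch.stack([hidden_state(text) for text, _ in statements]).numpy()
y = [label for _, label in statements]

# Fit the linear probe; with real data you would evaluate on a held-out split
# and compare against control tasks to rule out trivial or memorized solutions.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```

High probe accuracy on held-out data suggests the property is encoded (linearly) in that layer's activations, which is the kind of evidence used to argue that models internally represent signals such as truthfulness or impending hallucination.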
Papers
FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs
Deema Alnuhait, Neeraja Kirtane, Muhammad Khalifa, Hao Peng
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov