GPT-2

GPT-2 is a large language model capable of generating fluent, human-like text, and it has become a central subject for mechanistic interpretability research aimed at understanding its internal workings and biases. Current work probes GPT-2's behavior on diverse tasks, including multiple-choice question answering, acronym prediction, and even recipe generation, often using techniques such as sparse autoencoders to analyze its internal representations and to mitigate biases such as positional anchoring (a tendency to favor answer options based on their position rather than their content). This line of work is significant for advancing our understanding of how large language models function and for developing methods that improve their reliability and fairness across applications.
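To make the sparse-autoencoder approach concrete, the sketch below trains a small overcomplete autoencoder with an L1 sparsity penalty on GPT-2 hidden states. It is a minimal illustration, not a method from any specific paper summarized here: the choice of layer 6, the dictionary size, the L1 coefficient, and the helper names (`collect_activations`, `train_sae`) are all illustrative assumptions, and it assumes PyTorch and Hugging Face `transformers` are installed.

```python
# Minimal sketch: sparse autoencoder (SAE) over GPT-2 hidden states.
# Assumptions (not from the source text): layer 6 residual-stream activations,
# dictionary size 4096, L1 coefficient 1e-3 -- all placeholder values.
import torch
import torch.nn as nn
from transformers import GPT2Tokenizer, GPT2Model


class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty on its hidden code."""

    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        code = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(code)           # reconstruction of the input activation
        return recon, code


def collect_activations(texts, layer: int = 6) -> torch.Tensor:
    """Run GPT-2 on some texts and return one layer's hidden states, flattened over tokens."""
    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2")
    model.eval()
    acts = []
    with torch.no_grad():
        for text in texts:
            inputs = tok(text, return_tensors="pt")
            out = model(**inputs, output_hidden_states=True)
            acts.append(out.hidden_states[layer].squeeze(0))  # (seq_len, d_model)
    return torch.cat(acts, dim=0)


def train_sae(acts: torch.Tensor, d_dict: int = 4096, l1_coeff: float = 1e-3,
              steps: int = 1000, lr: float = 1e-3) -> SparseAutoencoder:
    """Fit the SAE with reconstruction loss plus an L1 penalty encouraging sparse codes."""
    sae = SparseAutoencoder(acts.shape[-1], d_dict)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, acts.shape[0], (256,))  # random minibatch of token activations
        batch = acts[idx]
        recon, code = sae(batch)
        loss = ((recon - batch) ** 2).mean() + l1_coeff * code.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae


if __name__ == "__main__":
    activations = collect_activations(["The quick brown fox jumps over the lazy dog."])
    sae = train_sae(activations, steps=100)
    _, code = sae(activations)
    print("fraction of active features per token:", (code > 0).float().mean().item())
```

Once trained, individual dictionary features (columns of the decoder) can be inspected by finding the tokens that activate them most strongly, which is the usual way such features are interpreted.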

Papers