Language Model Behavior

Research on language model behavior aims to understand and control the outputs and capabilities of large language models (LLMs), with a focus on reliability, bias, and alignment with human values. Current work explores techniques such as persona assignment, steering vectors, and knowledge-circuit analysis to probe internal model mechanisms and improve control over LLM behavior, often by intervening directly on transformer internals. This line of research is central to mitigating risks associated with LLMs, such as misinformation and harmful outputs, and to building more robust and beneficial AI systems across applications.
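As a concrete illustration of one of these techniques, the sketch below shows a minimal form of activation steering with a "steering vector": a direction in the residual stream is computed from two contrasting prompts and then added to a transformer block's hidden states at generation time via a forward hook. This is a hedged example under assumed settings (GPT-2 via Hugging Face transformers, layer index, scale, and prompts are all illustrative choices), not the method of any specific paper.

```python
# Minimal activation-steering sketch (assumptions: GPT-2, layer 6, scale 4.0).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6    # hypothetical choice of transformer block to steer
SCALE = 4.0  # hypothetical steering strength

def residual_at_layer(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for the prompt's tokens."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER] has shape (batch, seq, hidden); average over tokens.
    return out.hidden_states[LAYER][0].mean(dim=0)

# Contrast two prompts to obtain a direction (here, a sentiment-like axis).
steer_vec = residual_at_layer("I love this.") - residual_at_layer("I hate this.")

def add_steering(module, inputs, output):
    """Forward hook: shift the block's hidden states along the steering vector."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * steer_vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    ids = tok("The movie was", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later calls run unsteered
```

In practice the layer, scale, and contrast prompts would be chosen empirically; the point here is only to show how a single residual-stream direction can be injected to nudge model behavior without any fine-tuning.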

Papers