Steering Vector
Steering vectors are directions in a model's activation space that are added to the hidden states of a large language model (LLM) at inference time to shift its behavior without retraining. A steering vector has the same dimensionality as the activations it modifies. Current research focuses on extracting effective steering vectors from model activations and on optimizing how they are applied to specific tasks such as safety alignment, improved reasoning, and bias mitigation, often using techniques such as activation steering and mean-centring. Because the method only adds a vector during the forward pass, it is computationally cheap compared with retraining. It is used to address issues such as exaggerated safety responses, harmful outputs, and unreliable generalization, with implications for both enhancing model capabilities and mitigating risks in real-world applications.
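The extraction and application steps described above can be illustrated with a minimal sketch. It assumes a difference-of-means extraction in the spirit of mean-centring: the vector is the mean activation over examples of the target behavior minus the mean activation over a baseline set, and it is added to a hidden state with a scaling coefficient. The function names, the coefficient `alpha`, and the toy three-dimensional activations are hypothetical and chosen for illustration only; in practice the activations would be read from one layer of a real model.

```python
from statistics import fmean


def mean_activation(acts):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*acts)]


def extract_steering_vector(target_acts, baseline_acts):
    """Difference of means: mean(target) - mean(baseline).

    Subtracting the baseline mean removes the component shared by all
    activations, leaving the direction specific to the target behavior.
    """
    mu_target = mean_activation(target_acts)
    mu_baseline = mean_activation(baseline_acts)
    return [t - b for t, b in zip(mu_target, mu_baseline)]


def apply_steering(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (hypothetical values standing in for one model layer).
target_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 1.0]]

vector = extract_steering_vector(target_acts, baseline_acts)
steered = apply_steering([0.5, 0.5, 0.5], vector, alpha=2.0)

print(vector)   # [2.0, 0.0, -1.0]
print(steered)  # [4.5, 0.5, -1.5]
```

The coefficient `alpha` controls steering strength: larger values push the hidden state further along the extracted direction, and a negative value steers away from the target behavior.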