Hidden Knowledge

Hidden knowledge research studies the latent information and capabilities embedded in complex systems, particularly machine learning models, with the aim of understanding and extracting that information and mitigating the risks it poses. Current work focuses on detecting hidden biases and vulnerabilities in large language models (LLMs) and other neural networks, using techniques such as steganalysis, quiver representation theory, and contrastive learning to analyze hidden activations and emergent behaviors. This work matters for model safety, interpretability, fairness, and security in applications ranging from medical diagnosis to autonomous systems.

Papers