Hidden Knowledge
Hidden knowledge research studies the latent information and capabilities embedded in complex systems, particularly machine learning models, with the aim of understanding that knowledge, extracting it, and mitigating the risks it poses. Current work focuses on detecting hidden biases and vulnerabilities in large language models (LLMs) and other neural networks, using techniques such as steganalysis, quiver representation theory, and contrastive learning to analyze hidden activations and emergent behaviors. This work matters for model safety, interpretability, fairness, and security in applications ranging from medical diagnosis to autonomous systems.
Papers
November 9, 2024
November 3, 2024
November 1, 2024
October 11, 2024
October 2, 2024
September 20, 2024
September 19, 2024
September 13, 2024
August 31, 2024
August 14, 2024
August 12, 2024
July 24, 2024
June 27, 2024
May 29, 2024
May 24, 2024
May 21, 2024
May 6, 2024
April 7, 2024
April 1, 2024