Hidden Knowledge
Hidden knowledge research studies the latent information and capabilities embedded in complex systems, particularly machine learning models, with the aim of understanding and extracting that knowledge and mitigating the risks it poses. Current work focuses on detecting hidden biases and vulnerabilities in models such as large language models (LLMs) and other neural networks, using techniques such as steganalysis, quiver representation theory, and contrastive learning to analyze hidden activations and emergent behaviors. This work matters for model safety, interpretability, fairness, and security in applications ranging from medical diagnosis to autonomous systems.
Papers
August 14, 2024
August 12, 2024
July 24, 2024
June 27, 2024
May 29, 2024
May 24, 2024
May 21, 2024
May 6, 2024
April 7, 2024
April 1, 2024
March 14, 2024
February 8, 2024
December 6, 2023
November 23, 2023
November 22, 2023
November 21, 2023
October 10, 2023
September 27, 2023
August 13, 2023
July 27, 2023