Encoded Concept
Research on encoded concepts seeks to understand how human-interpretable concepts are represented within the latent spaces of complex machine learning models, primarily to improve interpretability and trustworthiness. Current work develops methods to identify and manipulate these encoded concepts, using techniques such as probabilistic encoding with energy-based models, binarized regularization in generative models, and subspace analysis in transformer networks. This line of research is important for building more reliable and explainable AI systems: it deepens understanding of model behavior and can inform better model design and more effective human-computer interaction.
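One common way to test whether a concept is linearly encoded in a model's latent space is a linear probe: a classifier trained on hidden activations to predict the concept, whose weight vector then gives a direction along which the concept can be manipulated. The sketch below is a minimal, self-contained illustration on synthetic "activations" (the dimensionality, concept direction, and signal strength are all hypothetical, not drawn from any particular paper); it trains a logistic-regression probe with gradient descent and then "erases" the concept by projecting activations off the probe direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 64-d "activations" with a binary concept
# linearly encoded along one hidden direction.
d, n = 64, 2000
concept_dir = rng.normal(size=d)
concept_dir /= np.linalg.norm(concept_dir)

labels = rng.integers(0, 2, size=n)            # concept present / absent
noise = rng.normal(size=(n, d))
acts = noise + np.outer(2.0 * labels - 1.0, concept_dir) * 1.5

# Linear probe: logistic regression trained by plain gradient descent.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))  # predicted probabilities
    w -= lr * (acts.T @ (p - labels)) / n
    b -= lr * np.mean(p - labels)

acc = np.mean(((acts @ w + b) > 0) == labels)
print(f"probe accuracy: {acc:.3f}")

# Concept "erasure": remove the component along the probe direction and
# check the concept is no longer decodable by this probe (chance level).
u = w / np.linalg.norm(w)
erased = acts - np.outer(erased_proj := acts @ u, u)
acc_erased = np.mean(((erased @ w + b) > 0) == labels)
print(f"accuracy after erasure: {acc_erased:.3f}")
```

After projection the probe's logit collapses to the bias term alone, so accuracy falls to roughly chance; this is the simplest instance of the identify-then-manipulate pattern the methods above elaborate on.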