Local Intrinsic
Local intrinsic dimension (LID) quantifies the effective dimensionality of the data in the neighborhood of a given point, revealing how complex the underlying data manifold is locally. Current research focuses on efficient LID estimators, in particular ones that leverage deep generative models such as diffusion models, and on techniques from information geometry and extreme value theory that improve accuracy and scalability, even for high-dimensional data such as images. These advances are being applied in diverse fields, including outlier detection, assessing the outputs of large language models, and improving self-supervised learning algorithms by addressing problems such as dimensional collapse. Accurate LID estimation therefore deepens our understanding of data structure and can improve the performance of a range of machine learning tasks.
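As a concrete illustration of the extreme-value-theory line of work mentioned above, the sketch below implements the classic maximum-likelihood (Hill-type) LID estimator computed from k-nearest-neighbour distances: LID(x) = -((1/k) * sum_{i=1..k} log(r_i / r_k))^(-1), where r_1 <= ... <= r_k are the distances from x to its k nearest neighbours. It is a minimal, brute-force sketch in plain Python, not the method of any paper listed on this page; the helper names `lid_mle` and `mean_lid`, the neighbourhood size k, and the synthetic line and plane data are illustrative assumptions.

```python
import math
import random


def lid_mle(query, data, k=20):
    """Hypothetical helper: MLE (Hill-type) LID estimate at `query`.

    Implements LID(x) = -((1/k) * sum_{i=1..k} log(r_i / r_k))^(-1),
    where r_1 <= ... <= r_k are the k smallest nonzero Euclidean
    distances from `query` to the points in `data`.
    """
    # Brute-force distances; zero distances are dropped so that the query
    # is not counted as its own neighbour if it also appears in `data`.
    dists = sorted(d for d in (math.dist(query, p) for p in data) if d > 0.0)
    if len(dists) < k:
        raise ValueError("need at least k points at nonzero distance")
    nearest = dists[:k]
    r_k = nearest[-1]
    # Every term is <= 0 because r_i <= r_k; the mean is < 0 unless all
    # k neighbours are equidistant, in which case the estimate is undefined.
    mean_log_ratio = sum(math.log(r / r_k) for r in nearest) / k
    if mean_log_ratio == 0.0:
        raise ValueError("all k neighbours are equidistant; LID undefined")
    return -1.0 / mean_log_ratio


def mean_lid(queries, data, k=20):
    """Average per-point estimates, which are individually noisy."""
    return sum(lid_mle(q, data, k) for q in queries) / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy data: a 1-D segment and a 2-D square, both embedded in R^5,
    # so the ambient dimension (5) differs from the intrinsic one.
    line = [(rng.random(), 0.0, 0.0, 0.0, 0.0) for _ in range(2000)]
    plane = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(2000)]
    # Query points on a fixed interior grid, away from the boundary.
    grid = [0.3 + 0.1 * i for i in range(5)]
    line_q = [(x, 0.0, 0.0, 0.0, 0.0) for x in grid]
    plane_q = [(x, y, 0.0, 0.0, 0.0) for x in grid for y in grid]
    print("line  LID ~", round(mean_lid(line_q, line, k=50), 2))
    print("plane LID ~", round(mean_lid(plane_q, plane, k=50), 2))
```

Even though all points live in five-dimensional space, the averaged estimates should land near the intrinsic dimensions 1 and 2. Single-point estimates fluctuate noticeably for small k, and the O(n) brute-force neighbour search per query does not scale; a practical implementation would typically use a spatial index or approximate nearest-neighbour search instead.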
Papers
CoLafier: Collaborative Noisy Label Purifier With Local Intrinsic Dimensionality Guidance
Dongyu Zhang, Ruofan Hu, Elke Rundensteiner
Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis
Alastair Anderberg, James Bailey, Ricardo J. G. B. Campello, Michael E. Houle, Henrique O. Marques, Miloš Radovanović, Arthur Zimek