Paper ID: 2412.03584

Resampled Mutual Information for Clustering and Community Detection

Cheaheon Lim

We introduce resampled mutual information (ResMI), a novel measure of clustering similarity that combines insights from information theoretic and pair counting approaches to clustering and community detection. Similar to chance-corrected measures, ResMI satisfies the constant baseline property, but it has the advantages of not requiring adjustment terms and being fully interpretable in the language of information theory. Experiments on synthetic datasets demonstrate that ResMI is robust to common biases exhibited by existing measures, particularly in settings with high cluster counts and asymmetric cluster distributions. Additionally, we show that ResMI identifies meaningful community structures in two real contact tracing networks.

Submitted: Nov 21, 2024