Paper ID: 2402.10213

Clustering Inductive Biases with Unrolled Networks

Jonathan Huml, Abiy Tasissa, Demba Ba

The classical sparse coding (SC) model represents visual stimuli as a linear combination of a handful of learned basis functions that are Gabor-like when trained on natural image data. However, the Gabor-like filters learned by classical sparse coding far overpredict the well-tuned simple cell receptive field profiles observed empirically. While neurons fire sparsely, neuronal populations are also organized in physical space by their sensitivity to certain features. In V1, this organization is a smooth progression of orientations along the cortical sheet. A number of subsequent models have either discarded the sparse dictionary learning framework entirely or proposed updates that have yet to take advantage of the surge in unrolled, neural dictionary learning architectures. A key theme missing from these updates is a stronger notion of \emph{structured sparsity}. We propose an autoencoder architecture (WLSC) whose latent representations are implicitly and locally organized for spectral clustering through a Laplacian quadratic form of a bipartite graph. The architecture generates a diverse set of artificial receptive fields that match primate data in V1 as faithfully as recent contrastive frameworks, such as Local Low Dimensionality (LLD) \citep{lld}, that discard sparse dictionary learning. By unifying sparse and smooth coding in models of the early visual cortex through our autoencoder, we also show that our regularization can be interpreted as early-stage specialization of receptive fields to certain classes of stimuli; that is, we induce a weak clustering bias for later stages of cortex where functional and spatial segregation (i.e., topography) are known to occur. The results show an imperative for \emph{spatial regularization} of both the receptive fields and firing rates to begin to describe feature disentanglement in V1 and beyond.
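
The abstract does not state the regularizer explicitly, so the following is only a minimal sketch of what a Laplacian quadratic form of a bipartite graph looks like in general, not the WLSC objective itself. It assumes a hypothetical nonnegative biadjacency matrix `B` linking two node sets (for instance, inputs and dictionary atoms) and hypothetical scalar signals `u` and `v` on those nodes. The names `bipartite_laplacian`, `laplacian_quadratic_form`, and `edge_sum` are illustrative. The sketch checks the standard identity that the quadratic form equals the weighted sum of squared differences across edges, which is why it acts as a smoothness penalty.

```python
import numpy as np

def bipartite_laplacian(B):
    """Graph Laplacian L = D - W of a bipartite graph with biadjacency B (m x n)."""
    m, n = B.shape
    # Full symmetric adjacency: edges exist only between the two node sets.
    W = np.block([[np.zeros((m, m)), B],
                  [B.T, np.zeros((n, n))]])
    D = np.diag(W.sum(axis=1))  # weighted degree matrix
    return D - W

def laplacian_quadratic_form(B, u, v):
    """Evaluate z^T L z with z = [u; v] stacked over both node sets."""
    z = np.concatenate([u, v])
    return float(z @ bipartite_laplacian(B) @ z)

def edge_sum(B, u, v):
    """Same quantity written edge by edge: sum_ij B_ij * (u_i - v_j)^2."""
    diff = u[:, None] - v[None, :]
    return float(np.sum(B * diff ** 2))

# Hypothetical toy data: 4 nodes on one side, 3 on the other.
rng = np.random.default_rng(0)
B = rng.random((4, 3))       # nonnegative edge weights (assumption)
u = rng.standard_normal(4)
v = rng.standard_normal(3)

quad = laplacian_quadratic_form(B, u, v)
direct = edge_sum(B, u, v)
```

Because the edge weights are nonnegative, the form is nonnegative and shrinks as strongly connected nodes take similar values, which is the sense in which such a term encourages locally organized, cluster-friendly representations.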

Submitted: Nov 30, 2023