Paper ID: 2503.07853 • Published Mar 10, 2025
Learning and Evaluating Hierarchical Feature Representations
Depanshu Sani, Saket Anand
Indraprastha Institute of Information Technology
Hierarchy-aware representations ensure that semantically closer classes
are mapped closer together in the feature space, thereby reducing the severity of
mistakes while enabling consistent coarse-level class predictions. To this
end, we propose a novel framework, Hierarchical Composition of Orthogonal
Subspaces (Hier-COS), which learns to map deep feature embeddings into a vector
space that is, by design, consistent with the structure of a given taxonomy
tree. Our approach augments neural network backbones with a simple
transformation module that maps learned discriminative features to subspaces
defined using a fixed orthogonal frame. This construction naturally reduces
the severity of mistakes and promotes hierarchical consistency. Furthermore, we
highlight the fundamental limitations of existing hierarchical evaluation
metrics popularly used by the vision community and introduce a preference-based
metric, Hierarchically Ordered Preference Score (HOPS), to overcome these
limitations. We benchmark our method on multiple large and challenging datasets
with deep label hierarchies (ranging from 3 to 12 levels) and compare it with
several baselines and state-of-the-art (SOTA) methods. Through extensive
experiments, we demonstrate that Hier-COS achieves state-of-the-art hierarchical
performance across all the datasets while also surpassing the top-1 accuracy of
competing methods in all but one case. We
also demonstrate the performance of a Vision Transformer (ViT) backbone and
show that learning a transformation module alone can map the learned features
from a pre-trained ViT to Hier-COS and yield substantial performance benefits.
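
The idea of tying class subspaces to a taxonomy through a fixed orthogonal frame can be illustrated with a toy sketch. The following is not the authors' implementation: it assumes, purely for illustration, that every taxonomy node is assigned one vector of a standard orthonormal basis and that a class subspace is the span of the vectors on the path from that class to its top-level ancestor. The taxonomy, the names `PARENT`, `FRAME`, `subspace_basis`, `distance_to_subspace`, and `predict`, and the nearest-subspace prediction rule are all hypothetical.

```python
import numpy as np

# Hypothetical toy taxonomy (child -> parent); top-level nodes have parent None.
PARENT = {
    "animal": None,
    "vehicle": None,
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

NODES = sorted(PARENT)
IDX = {name: i for i, name in enumerate(NODES)}

# Assumption: the fixed orthogonal frame is the standard basis, one vector per node.
FRAME = np.eye(len(NODES))

_INTERNAL = set(PARENT.values())
LEAVES = [name for name in NODES if name not in _INTERNAL]


def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def subspace_basis(node):
    """Rows are the frame vectors of the node and its ancestors (assumed construction)."""
    return FRAME[[IDX[n] for n in path_to_root(node)]]


def distance_to_subspace(x, node):
    """Norm of the residual after orthogonally projecting x onto the node's subspace."""
    basis = subspace_basis(node)           # orthonormal rows
    projection = basis.T @ (basis @ x)
    return float(np.linalg.norm(x - projection))


def predict(x):
    """Pick the leaf class whose subspace is closest to x (illustrative rule only)."""
    return min(LEAVES, key=lambda leaf: distance_to_subspace(x, leaf))


# A feature that lies in the assumed "dog" subspace.
x = np.zeros(len(NODES))
x[IDX["dog"]] = 1.0
x[IDX["animal"]] = 0.5

for leaf in LEAVES:
    print(leaf, round(distance_to_subspace(x, leaf), 3))
print("prediction:", predict(x))
```

In this toy setup the feature is closest to the "dog" subspace, farther from "cat" (which shares the "animal" direction), and farthest from "car" (which shares nothing), so the ranking of alternatives follows the taxonomy. That is the kind of ordering a hierarchy-consistent feature space is meant to produce; the actual Hier-COS construction and training objective are defined in the paper itself.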