Paper ID: 2408.05535

Latent class analysis for multi-layer categorical data

Huan Qing

Traditional categorical data, often collected in psychological tests and educational assessments, are typically single-layer and gathered only once. This paper considers a more general case: multi-layer categorical data with polytomous responses. To model such data, we present a novel statistical model, the multi-layer latent class model (multi-layer LCM), which assumes that all layers share common subjects and items. To estimate subjects' latent classes and the other model parameters, we develop three efficient spectral methods based on the sum of response matrices, the sum of Gram matrices, and the debiased sum of Gram matrices, respectively. Within the framework of the multi-layer LCM, we establish the estimation consistency of these methods under mild conditions on data sparsity. Our theoretical findings reveal two key insights: (1) increasing the number of layers can improve the performance of the proposed methods, highlighting the advantage of considering multiple layers in latent class analysis; (2) the algorithm based on the debiased sum of Gram matrices usually performs best. Additionally, we propose an approach that combines the averaged modularity metric with our methods to determine the number of latent classes. Extensive experiments support our theoretical results and demonstrate the effectiveness of our methods in learning latent classes and estimating the number of latent classes in multi-layer categorical data with polytomous responses.
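As a rough illustration of the first two spectral ideas named above, the sketch below clusters subjects from the sum of response matrices and from the sum of Gram matrices. It is not the paper's algorithm: the function names, the k-means step applied to the leading singular vectors or eigenvectors, and the Binomial toy data are all assumptions made for illustration. The debiased variant is left out because the abstract does not state its bias-correction term.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd k-means with farthest-point initialisation (illustrative helper)."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_by_response_sum(layers, k):
    """Assumed variant: k-means on the top-k left singular vectors of sum_l R_l."""
    total = np.sum(layers, axis=0)              # N x J summed response matrix
    u, _, _ = np.linalg.svd(total, full_matrices=False)
    return kmeans(u[:, :k], k)

def cluster_by_gram_sum(layers, k):
    """Assumed variant: k-means on the top-k eigenvectors of sum_l R_l R_l^T."""
    gram = sum(r @ r.T for r in layers)         # N x N summed Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(np.abs(vals))[::-1][:k]    # k largest eigenvalues in magnitude
    return kmeans(vecs[:, top], k)

# Toy data (an assumption): 2 classes, 60 subjects, 20 items, 5 layers,
# responses in {0, 1, 2} drawn as Binomial(2, theta / 2).
rng = np.random.default_rng(1)
n_subjects, n_items, n_layers, n_classes = 60, 20, 5, 2
truth = np.repeat(np.arange(n_classes), n_subjects // n_classes)
theta = np.full((n_classes, n_items), 0.4)
theta[0, : n_items // 2] = 1.6                  # class 0 scores high on the first half
theta[1, n_items // 2 :] = 1.6                  # class 1 scores high on the second half
layers = [rng.binomial(2, theta[truth] / 2.0) for _ in range(n_layers)]

labels_sum = cluster_by_response_sum(layers, n_classes)
labels_gram = cluster_by_gram_sum(layers, n_classes)

def agreement(a, b):
    """Two-class agreement with the truth, up to label swapping."""
    match = np.mean(a == b)
    return max(match, 1.0 - match)

print(agreement(labels_sum, truth), agreement(labels_gram, truth))
```

Both functions pool information across layers before the spectral step, which is one way to see why adding layers can help: the summed matrices carry a stronger class signal than any single layer.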

Submitted: Aug 10, 2024