Paper ID: 2503.16945 • Published Mar 21, 2025
PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition
Ibtissam Saadi, Abdenour Hadid, Douglas W. Cunningham, Abdelmalik Taleb-Ahmed, Yassin El Hillali
Univ. BTU Cottbus-Senftenberg•Sorbonne University Abu Dhabi•Univ. Polytechnique Hauts-de-France
Vision-Language Models (VLMs) like CLIP offer promising solutions for Dynamic
Facial Expression Recognition (DFER) but face challenges such as inefficient
full fine-tuning, high complexity, and poor alignment between textual and
visual representations. Additionally, existing methods struggle with
ineffective temporal modeling. To address these issues, we propose PE-CLIP, a
parameter-efficient fine-tuning (PEFT) framework that adapts CLIP for DFER,
significantly reducing trainable parameters while maintaining high
accuracy. PE-CLIP introduces two specialized adapters: a Temporal Dynamic
Adapter (TDA) and a Shared Adapter (ShA). The TDA is a GRU-based module with
dynamic scaling that captures sequential dependencies while emphasizing
informative temporal features and suppressing irrelevant variations. The ShA is
a lightweight adapter that refines representations within both textual and
visual encoders, ensuring consistency and efficiency. Additionally, we
integrate Multi-modal Prompt Learning (MaPLe), introducing learnable prompts
for visual and action unit-based textual inputs, enhancing semantic alignment
between modalities and enabling efficient CLIP adaptation for dynamic tasks. We
evaluate PE-CLIP on two benchmark datasets, DFEW and FERV39K, achieving
competitive performance compared to state-of-the-art methods while requiring
fewer trainable parameters. By balancing efficiency and accuracy, PE-CLIP sets
a new benchmark in resource-efficient DFER. The source code of the proposed
PE-CLIP will be publicly available at this https URL .
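The abstract describes the Temporal Dynamic Adapter as a GRU-based module with dynamic scaling inserted into a frozen CLIP backbone. The following is a minimal, hypothetical PyTorch sketch of such an adapter; the class name, bottleneck width, and the sigmoid-gated scaling are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TemporalDynamicAdapter(nn.Module):
    """Hypothetical sketch of a GRU-based temporal adapter (assumed design).

    Down-projects per-frame CLIP features, models sequential dependencies
    with a GRU, applies a learned per-frame scaling gate to emphasize
    informative frames, then up-projects and adds a residual connection.
    """

    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # parameter-efficient bottleneck
        self.gru = nn.GRU(bottleneck, bottleneck, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(bottleneck, 1), nn.Sigmoid())
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, dim) frame-level features from a frozen encoder
        h = self.down(x)
        h, _ = self.gru(h)            # capture sequential dependencies
        h = h * self.gate(h)          # dynamic scaling: suppress irrelevant frames
        return x + self.up(h)         # residual keeps the frozen features intact
```

In an adapter-based PEFT setup like the one described, only modules of this kind (plus the prompts) would be trained while the CLIP encoders stay frozen, which is how the trainable-parameter count is kept low.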