Paper ID: 2504.02329 • Published Apr 3, 2025
Towards Assessing Deep Learning Test Input Generators
Seif Mzoughi, Ahmed Hajyahmed, Mohamed Elshafei, Foutse Khomh and Diego Elias Costa
Polytechnique Montreal • Concordia University
Deep Learning (DL) systems are increasingly deployed in safety-critical
applications, yet they remain vulnerable to robustness issues that can lead to
significant failures. While numerous Test Input Generators (TIGs) have been
developed to evaluate DL robustness, a systematic evaluation of their
effectiveness across different dimensions is still lacking. This paper presents
a comprehensive assessment of four state-of-the-art TIGs--DeepHunter,
DeepFault, AdvGAN, and SinVAD--across multiple critical aspects:
fault-revealing capability, naturalness, diversity, and efficiency. Our
empirical study leverages three pre-trained models (LeNet-5, VGG16, and
EfficientNetB3) on datasets of varying complexity (MNIST, CIFAR-10, and
ImageNet-1K) to evaluate TIG performance. Our findings reveal important
trade-offs among robustness-revealing capability, variation in test case
generation, and computational efficiency across TIGs. The results also show
that TIG performance varies significantly with dataset complexity: tools
that perform well on simpler datasets may struggle with more complex ones,
whereas others maintain steadier performance or scale better. This
paper offers practical guidance for selecting appropriate TIGs aligned with
specific objectives and dataset characteristics. Nonetheless, more work is
needed to address TIG limitations and advance TIGs for real-world,
safety-critical systems.