Transient Fault

Transient faults, unpredictable and temporary disruptions in hardware or software, pose a significant threat to the reliability of increasingly prevalent deep learning systems, particularly in safety-critical applications. Current research focuses on quantifying the impact of these faults on model accuracy (e.g., using metrics like resiliency accuracy) and developing fault-tolerant designs and training methods for various architectures, including deep feedforward networks and deep autoencoders, often employing deep learning techniques to mitigate fault effects. This work is crucial for ensuring the safe and dependable operation of AI systems across diverse domains, from autonomous vehicles to industrial control systems.

Papers