Transient Fault
Transient faults, unpredictable and temporary disruptions in hardware or software, pose a significant threat to the reliability of increasingly prevalent deep learning systems, particularly in safety-critical applications. Current research focuses on quantifying the impact of these faults on model accuracy (e.g., using metrics like resiliency accuracy) and developing fault-tolerant designs and training methods for various architectures, including deep feedforward networks and deep autoencoders, often employing deep learning techniques to mitigate fault effects. This work is crucial for ensuring the safe and dependable operation of AI systems across diverse domains, from autonomous vehicles to industrial control systems.
Papers
December 5, 2022
November 1, 2022
October 16, 2022
May 28, 2022
May 24, 2022
March 14, 2022