Differential Testing

Differential testing is a software testing technique that runs multiple implementations of the same algorithm on identical inputs and treats any disagreement in their outputs as a potential bug. It is increasingly used to ensure the reliability of machine learning (ML) systems, particularly deep learning libraries and federated learning frameworks. Current research focuses on leveraging large language models (LLMs) to automatically generate diverse test inputs and test oracles, improving the efficiency and effectiveness of the approach across applications ranging from medical rule engines to automated speech recognition. The methodology is crucial for identifying subtle bugs and vulnerabilities in complex ML systems, enhancing their robustness and trustworthiness in critical applications.
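
To make the core idea concrete, the sketch below is a minimal, hypothetical example (not drawn from any of the listed papers): it feeds the same randomly generated inputs to two interchangeable softmax implementations and flags every input on which their outputs diverge. In an ML-library setting, the two callables would typically be the same operator taken from different library versions, backends, or frameworks; the function and variable names here (`softmax_reference`, `softmax_naive`, `differential_test`) are illustrative assumptions.

```python
import numpy as np


def softmax_reference(x):
    """Numerically stable softmax (max-shifted), used as one implementation."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def softmax_naive(x):
    """Alternative softmax without the max shift, used as the other implementation."""
    exps = np.exp(x)
    return exps / np.sum(exps)


def differential_test(fn_a, fn_b, n_trials=1000, atol=1e-6, seed=0):
    """Run both implementations on identical random inputs and collect
    every input on which their outputs disagree (the differential oracle)."""
    rng = np.random.default_rng(seed)
    failures = []
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_trials):
            # Wide value range deliberately stresses numerical behavior.
            x = rng.normal(loc=0.0, scale=300.0, size=8)
            out_a, out_b = fn_a(x), fn_b(x)
            # Divergence (including NaN/inf in only one output) is a candidate bug.
            if not np.allclose(out_a, out_b, atol=atol, equal_nan=True):
                failures.append(x)
    return failures


if __name__ == "__main__":
    diffs = differential_test(softmax_reference, softmax_naive)
    print(f"{len(diffs)} of 1000 random inputs produced divergent outputs")
```

The value of the approach is that no hand-written expected outputs are needed: the implementations serve as oracles for each other, which is why recent work pairs it with LLM-generated inputs to cover more of the behavior space automatically.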

Papers