Trigger Inversion

Trigger inversion is a technique for detecting and removing backdoors in machine learning models, particularly deep neural networks, that have been maliciously modified to misbehave when an input contains a specific "trigger". It works by reconstructing a candidate trigger from the model itself, typically by optimizing for a small input perturbation that pushes clean inputs toward an attacker's target output. Current research focuses on making trigger inversion algorithms more efficient and effective, often using gradient-based optimization and, for multimodal models, joint optimization across modalities (e.g., image and text). Identifying and neutralizing these backdoors is crucial for the security and reliability of AI systems in applications ranging from code analysis to visual question answering, where a compromised model could have significant real-world consequences.
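To make the gradient-based idea concrete, here is a minimal sketch of one common formulation: optimize a mask and a pattern so that clean inputs blended with the pattern are classified as a chosen target class, while an L1 penalty keeps the mask small. Everything in it is an illustrative assumption rather than a method from any particular paper: the toy linear classifier, the hypothetical names `invert_trigger`, `WEIGHTS`, and `CLEAN`, the hand-derived gradients, and the hyperparameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Class probabilities of a toy linear classifier (no bias term)."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

def apply_trigger(x, mask, pattern):
    """Blend the pattern into x wherever the mask is close to 1."""
    return [(1 - m) * v + m * p for v, m, p in zip(x, mask, pattern)]

def invert_trigger(weights, clean_inputs, target, l1_weight=0.05, lr=0.5, steps=500):
    """Gradient descent on cross-entropy to `target` plus an L1 mask penalty."""
    dim = len(clean_inputs[0])
    n = len(clean_inputs)
    mask_logits = [-2.0] * dim     # start from a nearly empty mask
    pattern_logits = [0.0] * dim   # start from a mid-range pattern
    for _ in range(steps):
        mask = [sigmoid(a) for a in mask_logits]
        pattern = [sigmoid(b) for b in pattern_logits]
        grad_mask = [l1_weight] * dim   # gradient of the L1 penalty on the mask
        grad_pattern = [0.0] * dim
        for x in clean_inputs:
            probs = predict(weights, apply_trigger(x, mask, pattern))
            # Cross-entropy gradient w.r.t. the logits: probs - one_hot(target)
            dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
            for i in range(dim):
                # Gradient w.r.t. the i-th feature of the triggered input
                dx = sum(weights[k][i] * dlogits[k] for k in range(len(weights)))
                grad_mask[i] += dx * (pattern[i] - x[i]) / n
                grad_pattern[i] += dx * mask[i] / n
        for i in range(dim):
            # Chain rule through the sigmoid that keeps values in (0, 1)
            mask_logits[i] -= lr * grad_mask[i] * mask[i] * (1 - mask[i])
            pattern_logits[i] -= lr * grad_pattern[i] * pattern[i] * (1 - pattern[i])
    return [sigmoid(a) for a in mask_logits], [sigmoid(b) for b in pattern_logits]

# Hypothetical backdoored model: feature 5 strongly pushes inputs to class 1.
WEIGHTS = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
           [-1.0, 0.0, 0.0, 0.0, 0.0, 8.0]]
# Clean inputs never activate feature 5 and are all classified as class 0.
CLEAN = [[0.8, 0.3, 0.7, 0.1, 0.5, 0.0],
         [1.0, 0.9, 0.2, 0.6, 0.4, 0.0],
         [1.2, 0.5, 0.5, 0.8, 0.1, 0.0]]

mask, pattern = invert_trigger(WEIGHTS, CLEAN, target=1)
print("mask:   ", [round(m, 2) for m in mask])
print("pattern:", [round(p, 2) for p in pattern])
```

In this toy setup the recovered mask should concentrate on the planted feature. Detection schemes built on this idea typically repeat the inversion for every candidate target class and treat an unusually small mask for one class as a sign of a backdoor; a real network would rely on automatic differentiation rather than the hand-written gradients used here.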

Papers