Attack Paradigm
Research on attack paradigms in machine learning probes model vulnerabilities, primarily how adversarial inputs can elicit unexpected or harmful outputs. Current work emphasizes both sophisticated attacks, such as those leveraging bijection learning or exploiting internal model flaws to generate targeted responses, and robust defenses, including versatile methods that adapt to diverse attack strategies and reinforcement-learning-based detection. This research is crucial for the security and reliability of machine learning systems across applications, from language models to image recognition, because it identifies and mitigates vulnerabilities before deployment.
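To make the core idea concrete, below is a minimal sketch of one classic adversarial attack, the Fast Gradient Sign Method (FGSM), applied to a toy linear classifier. The weights, input, and epsilon here are hypothetical values chosen purely for illustration; real attacks target trained neural networks and compute gradients via autodiff.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear classifier (hypothetical weights, for illustration only).
w = np.array([1.0, -1.0])

def predict(x):
    return int(sigmoid(w @ x) >= 0.5)

def fgsm(x, y_true, eps):
    """Fast Gradient Sign Method: nudge x in the direction that
    increases the loss for the true label y_true."""
    p = sigmoid(w @ x)
    grad = (p - y_true) * w      # d(cross-entropy)/dx for this linear model
    return x + eps * np.sign(grad)

x = np.array([0.3, 0.1])         # clean input, classified as 1
x_adv = fgsm(x, y_true=1, eps=0.2)
print(predict(x), predict(x_adv))  # the small perturbation flips the label
```

The attack needs no knowledge beyond the loss gradient at the input, which is why defenses must anticipate perturbations that look negligible to a human but move the input across a decision boundary.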