PKU SafeRLHF
Research around the PKU-SafeRLHF dataset focuses on building and evaluating datasets that improve the safety and reliability of AI systems, particularly large language models (LLMs). Current efforts center on large-scale, human-annotated preference datasets for training and benchmarking safety-focused reinforcement learning from human feedback (RLHF) algorithms, alongside related dataset work on assessing the quality of AI-generated images and detecting anomalies in visual data such as supermarket goods. These datasets and their associated benchmarks are central to AI-safety research, supporting the development of more robust and trustworthy AI systems across a range of applications.
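As an illustration of how such a safety-preference dataset is typically consumed, here is a minimal sketch that loads PKU-SafeRLHF from the Hugging Face Hub and extracts a (safer, less-safe) response pair of the kind used to train a safety-focused reward model. The repository id PKU-Alignment/PKU-SafeRLHF and the field names (prompt, response_0, response_1, safer_response_id, better_response_id) reflect one published release and may differ across dataset versions; this is a sketch under those assumptions, not a definitive recipe.

```python
# Minimal sketch: turn PKU-SafeRLHF safety annotations into preference pairs.
# Assumes the dataset is hosted as "PKU-Alignment/PKU-SafeRLHF" with the field
# names shown below; verify them against the release you actually use.
from datasets import load_dataset

dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")

example = dataset[0]
prompt = example["prompt"]                       # the user query
responses = [example["response_0"], example["response_1"]]
safer_id = example["safer_response_id"]          # annotator's safety preference (0 or 1)
better_id = example["better_response_id"]        # annotator's helpfulness preference

# Build a (chosen, rejected) pair for a safety-focused reward model:
chosen, rejected = responses[safer_id], responses[1 - safer_id]
print(f"Prompt: {prompt}")
print(f"Safer response: {chosen[:80]}...")
print(f"Less safe response: {rejected[:80]}...")
```

Keeping the safety preference (safer_response_id) separate from the helpfulness preference (better_response_id) is what allows training distinct reward and cost models, a common setup in safe-RLHF pipelines.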
Papers
June 20, 2024
March 24, 2024
November 27, 2023
July 11, 2023