Direct Assessment
Direct assessment encompasses a broad range of techniques for evaluating diverse systems and phenomena, from the psychological traits of language models to the precision of 3D models and the performance of autonomous vehicles. Current research focuses on developing robust and reliable assessment methods, often employing machine learning models like VQ-VAEs, various neural networks (including vision transformers and graph neural networks), and large language models (LLMs) for automated analysis and evaluation. These advancements are crucial for improving the trustworthiness and reliability of AI systems, enhancing diagnostic capabilities in healthcare, and optimizing performance in various engineering and scientific domains.
Papers
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack
Perry Gibson, José Cano, Elliot J. Crowley, Amos Storkey, Michael O'Boyle