Evaluation Funnel

An evaluation funnel combines multiple evaluation methods in a systematic way so that complex systems, such as large language models (LLMs) or recommender systems, can be assessed efficiently. Current research focuses on user-friendly interfaces for designing these funnels, on making automated evaluation metrics more interpretable and efficient, and on the challenges of evaluating nuanced properties such as attitudes and opinions in LLMs. More rigorous and comprehensive performance assessment of this kind supports better model design and more reliable applications.
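As a rough illustration of how several evaluation methods can be combined, the sketch below chains them as stages ordered from cheap to expensive, so that only candidates passing one stage reach the next. This staging is an assumption about one common way to arrange a funnel, not a method taken from the papers listed here; the `Stage` class, the `run_funnel` function, the stage names, and all scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One evaluation method in the funnel (hypothetical structure)."""
    name: str
    score: Callable[[str], float]  # maps a candidate id to a score
    threshold: float               # minimum score needed to advance


def run_funnel(candidates: List[str], stages: List[Stage]) -> Dict[str, List[str]]:
    """Apply stages in order; only candidates that pass a stage reach the next.

    Returns the survivors after each stage, keyed by stage name.
    """
    survivors = list(candidates)
    history: Dict[str, List[str]] = {}
    for stage in stages:
        survivors = [c for c in survivors if stage.score(c) >= stage.threshold]
        history[stage.name] = survivors
    return history


# Made-up toy scores standing in for real evaluation methods.
offline_metric = {"model_a": 0.62, "model_b": 0.81, "model_c": 0.77}
human_rating = {"model_a": 3.1, "model_b": 4.4, "model_c": 3.6}

stages = [
    Stage("cheap_offline_metric", lambda c: offline_metric[c], threshold=0.70),
    Stage("costly_human_review", lambda c: human_rating[c], threshold=4.0),
]

result = run_funnel(["model_a", "model_b", "model_c"], stages)
print(result)
# {'cheap_offline_metric': ['model_b', 'model_c'], 'costly_human_review': ['model_b']}
```

In this toy run the expensive stage scores only two of the three candidates, which is where the efficiency of a staged arrangement would come from.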

Papers

October 28, 2024

Semantic Search Evaluation
Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh
Search Query, Topic Analysis, Semantic Matching, Semantic Search, Relevance Ranking, Evaluation Funnel

October 22, 2024

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David Krueger
Medical LLM, Model Behavior, Representation Engineering, Evaluation Funnel

September 20, 2024

ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang, Ian Arawjo
Large Language Model, Side Chain, Agent System, Open Ended, AI Assistance, LLM Based Pipeline, Evaluation Funnel

June 16, 2024

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter
Technical Challenge, Policy Value, Human Opinion, Psychological Trait, Human AI Alignment, Evaluation Funnel

April 3, 2024

Navigating the Evaluation Funnel to Optimize Iteration Speed for Recommender Systems
Claire Schultzberg, Brammert Ottens
Recommender System, Efficient Evaluation, Evaluation Funnel

May 25, 2023

A Semi-Automated Corner Case Detection and Evaluation Pipeline
Isabelle Tulleners, Tobias Moers, Thomas Schulik, Martin Sedlacek
Top Level Ontology, Perception System, Detection Network, Perception Datasets, Corner Case, Evaluation Funnel

March 7, 2023

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation
Yixin Liu, Alexander R. Fabbri, Yilun Zhao, Pengfei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev
Interpretable Way, Automatic Metric, Summarization Evaluation, Neural Metric, Evaluation Funnel

February 21, 2023

Assessment of Reinforcement Learning for Macro Placement
Chung-Kuan Cheng, Andrew B. Kahng, Sayak Kundu, Yucheng Wang, Zhiang Wang
Reinforcement Learning, Direct Assessment, Deep Reinforcement Learning Approach, Google AI, Evaluation Funnel