Paper ID: 2410.19974 • Published Oct 25, 2024
Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements
Silvia Terragni, Hoang Cuong, Joachim Daiber, Pallavi Gudipati, Pablo N. Mendes
Large Language Models (LLMs) have demonstrated potential as effective search
relevance evaluators. However, there is little comprehensive guidance on
which models perform consistently well across varied contexts or within
specific use cases. In this paper, we assess several LLMs and Multimodal
Language Models (MLLMs) in terms of their alignment with human judgments across
multiple multimodal search scenarios. Our analysis investigates the trade-offs
between cost and accuracy, highlighting that model performance varies
significantly depending on the context. Interestingly, in smaller models, the
inclusion of a visual component may hinder performance rather than enhance it.
These findings highlight the complexities involved in selecting the most
appropriate model for practical applications.