Paper ID: 2410.21549

Semantic Search Evaluation

Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh

We propose a novel method for evaluating the performance of a content search system that measures the semantic match between a query and the results returned by the search system. We introduce a metric called "on-topic rate" to measure the percentage of results that are relevant to the query. To achieve this, we design a pipeline that defines a golden query set, retrieves the top K results for each query, and sends calls to GPT 3.5 with formulated prompts. Our semantic evaluation pipeline helps identify common failure patterns and goals against the metric for relevance improvements.

Submitted: Oct 28, 2024