Cosine Similarity
Cosine similarity, the cosine of the angle between two vectors, is widely used to quantify the similarity between data points in high-dimensional spaces, particularly in natural language processing and machine learning. Recent research addresses limitations of cosine similarity, such as its sensitivity to dimensionality and its underestimation of similarity for high-frequency words, leading to alternative metrics and refined algorithms such as the Dimension Insensitive Euclidean Metric (DIEM) and methods that incorporate L2 norm discounting. These advances improve the accuracy and interpretability of similarity comparisons, affecting diverse applications from semantic segmentation in autonomous driving to robust representation learning and bias detection in large language models.
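As a quick illustration (a minimal NumPy sketch, not code from any of the papers below), the snippet computes cosine similarity and shows its standard relation to Euclidean distance for length-normalized vectors: for unit vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity, which is why length normalization is so often discussed alongside cosine-based scores.

```python
# Illustrative sketch only; assumes NumPy, not tied to the listed papers.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.standard_normal(300)  # e.g. a 300-dimensional embedding
b = rng.standard_normal(300)

cos = cosine_similarity(a, b)

# Length-normalize, then compare squared Euclidean distance with 2 - 2*cos.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
sq_dist = float(np.sum((a_hat - b_hat) ** 2))

print(f"cosine similarity    : {cos:.4f}")
print(f"2 - 2*cos            : {2 - 2 * cos:.4f}")
print(f"||a_hat - b_hat||^2  : {sq_dist:.4f}")  # matches 2 - 2*cos
```

The last two printed values coincide, which makes explicit that ranking pairs by cosine similarity is equivalent to ranking them by Euclidean distance once the embeddings are L2-normalized.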
Papers
Comparing in context: Improving cosine similarity measures with a metric tensor
Isa M. Apallius de Vos, Ghislaine L. van den Boogerd, Mara D. Fennema, Adriana D. Correia
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings
Niko Brümmer, Albert Swart, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Themos Stafylakis, Lukáš Burget
The SAME score: Improved cosine based bias score for word embeddings
Sarah Schröder, Alexander Schulz, Barbara Hammer