Semantic Leakage

Semantic leakage is the unintended transfer of irrelevant or spurious information into a model's outputs or internal representations, so that the result no longer faithfully reflects the intended semantics. Current research focuses on mitigating this issue in contexts such as cross-lingual embeddings, language models, and text-to-image generation, using techniques such as orthogonality constraints, bounded attention mechanisms, and data augmentation to disentangle semantic from non-semantic information. Addressing semantic leakage is important for improving the reliability and robustness of AI systems across applications ranging from natural language processing to knowledge graph construction and computer vision.
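
To make the idea of disentanglement via an orthogonality constraint concrete, the sketch below shows one generic way such a penalty could be implemented: an encoder projects an input embedding into a "semantic" subspace and a "non-semantic" (residual) subspace, and a loss term penalizes correlation between the two. This is a minimal, illustrative example, not the method of any particular paper; the class, function, and hyperparameter names (`DisentangledEncoder`, `orthogonality_penalty`, the 0.1 weight) are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class DisentangledEncoder(nn.Module):
    """Toy encoder that splits an input embedding into a 'semantic' part and a
    'non-semantic' (e.g., language- or style-specific) residual part."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.semantic_proj = nn.Linear(dim_in, dim_out, bias=False)
        self.residual_proj = nn.Linear(dim_in, dim_out, bias=False)

    def forward(self, x: torch.Tensor):
        return self.semantic_proj(x), self.residual_proj(x)


def orthogonality_penalty(sem: torch.Tensor, res: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between the two subspaces via the squared Frobenius norm
    of the batch cross-correlation between semantic and residual features."""
    sem = sem - sem.mean(dim=0, keepdim=True)
    res = res - res.mean(dim=0, keepdim=True)
    cross = sem.T @ res / sem.shape[0]  # (dim_out, dim_out) cross-correlation
    return cross.pow(2).sum()


# Usage: add the penalty to whatever task objective the model is trained with.
encoder = DisentangledEncoder(dim_in=768, dim_out=256)
x = torch.randn(32, 768)               # a batch of input embeddings
sem, res = encoder(x)
task_loss = torch.tensor(0.0)          # placeholder for the real task objective
loss = task_loss + 0.1 * orthogonality_penalty(sem, res)
loss.backward()
```

Driving the cross-correlation toward zero encourages the two projections to carry complementary information, so that non-semantic factors are less likely to leak into the semantic representation used downstream.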

Papers