Character Embeddings

Character embeddings represent textual units (characters, words, or even radicals in ideographic languages) as numerical vectors that capture semantic and contextual information for downstream tasks. Current research focuses on improving these embeddings for diverse applications, including number representation, character re-identification in comics, and the analysis of historical linguistic change. This work typically builds on deep learning architectures such as convolutional neural networks and transformers, sometimes augmented with additional linguistic information or mathematical priors. The resulting richer, more robust representations of textual data improve a range of NLP tasks, such as quotation attribution, grapheme-to-phoneme conversion, and hate speech detection.
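To make the core idea concrete, here is a minimal sketch of a character embedding lookup in plain Python: each character maps to an index into a table of vectors, and a word can be summarized by pooling its character vectors. All function names are illustrative, and the table is randomly initialized here, whereas in the systems surveyed above the weights are learned end to end by a neural network.

```python
import random

def build_char_vocab(texts):
    """Map each distinct character to an integer index (0 is reserved for unknown)."""
    chars = sorted({c for t in texts for c in t})
    return {c: i + 1 for i, c in enumerate(chars)}

def init_embeddings(vocab_size, dim, seed=0):
    """Random embedding table; in a real model these rows are learned parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size + 1)]

def embed(text, vocab, table):
    """Look up one vector per character; index 0 covers characters unseen in training."""
    return [table[vocab.get(c, 0)] for c in text]

def mean_pool(vectors):
    """Mean-pool character vectors into one fixed-size representation of the string."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vocab = build_char_vocab(["hello", "world"])
table = init_embeddings(len(vocab), dim=8)
word_vector = mean_pool(embed("hello", vocab, table))  # one 8-dim vector per word
```

The mean-pooling step stands in for what CNN or transformer layers do in the papers below: composing per-character vectors into a representation of a larger unit.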

Papers