Paper ID: 2402.14379

Novi jezički modeli za srpski jezik (New Language Models for the Serbian Language)

Mihailo Škorić

The paper briefly presents the development history of transformer-based language models for the Serbian language. It also presents several new models for text generation and vectorization, trained on the resources of the Society for Language Resources and Technologies. Ten selected vectorization models for Serbian, including two new ones, are compared on four natural language processing tasks. The paper analyzes which models perform best on each selected task, how their size and the size of their training sets affect performance on those tasks, and what the optimal setting is for training the best language models for the Serbian language.

Submitted: Feb 22, 2024

Topics

Language Model
Text Generation
Transformer Based Language Model
Natural Language Processing Task
Vectorization Method
