Paper ID: 2310.15852

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

Lina Conti, Guillaume Wisniewski

Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less-researched question of how neural models discover the linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a probabilistic context-free grammar (PCFG) based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
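To make the idea of "controlling the gender distribution with a PCFG" concrete, here is a minimal sketch. It is a hypothetical toy grammar, not the authors' French PCFG. Gender is encoded in the nonterminal names (`NP_m`, `NP_f`, and so on), so determiner, adjective and noun always agree. A single rule probability, the hypothetical parameter `p_fem`, sets how often a sentence's subject is feminine. The function names `make_grammar`, `expand`, `generate_corpus` and `feminine_share` are illustrative only.

```python
import random

# Hypothetical toy grammar, NOT the authors' PCFG: each nonterminal maps to
# a list of (right-hand side, probability) pairs. Gender is encoded in the
# nonterminal names (_m / _f), so determiner, adjective and noun always agree.
def make_grammar(p_fem):
    """Build a tiny French-like PCFG; p_fem (hypothetical) is the probability
    that a sentence's subject noun phrase is feminine."""
    return {
        "S": [(["NP_m", "VP"], 1.0 - p_fem), (["NP_f", "VP"], p_fem)],
        "NP_m": [(["DET_m", "ADJ_m", "N_m"], 1.0)],
        "NP_f": [(["DET_f", "ADJ_f", "N_f"], 1.0)],
        "DET_m": [(["le"], 1.0)],
        "DET_f": [(["la"], 1.0)],
        "ADJ_m": [(["petit"], 0.5), (["grand"], 0.5)],
        "ADJ_f": [(["petite"], 0.5), (["grande"], 0.5)],
        "N_m": [(["chat"], 0.5), (["livre"], 0.5)],
        "N_f": [(["table"], 0.5), (["maison"], 0.5)],
        "VP": [(["dort"], 0.5), (["tombe"], 0.5)],
    }


def expand(symbol, grammar, rng):
    """Recursively sample a derivation; symbols without rules are terminals."""
    if symbol not in grammar:
        return [symbol]
    options = grammar[symbol]
    # Pick one right-hand side according to the rule probabilities.
    rhs = rng.choices([r for r, _ in options], weights=[w for _, w in options])[0]
    tokens = []
    for child in rhs:
        tokens.extend(expand(child, grammar, rng))
    return tokens


def generate_corpus(n, p_fem, seed=0):
    """Sample n sentences from S; the seed makes the corpus reproducible."""
    rng = random.Random(seed)
    grammar = make_grammar(p_fem)
    return [" ".join(expand("S", grammar, rng)) for _ in range(n)]


def feminine_share(corpus):
    """Fraction of sentences whose subject is feminine (starts with 'la')."""
    return sum(s.startswith("la ") for s in corpus) / len(corpus)


if __name__ == "__main__":
    corpus = generate_corpus(1000, p_fem=0.2)
    print(corpus[:3])
    print(feminine_share(corpus))  # close to 0.2 by construction
```

Lowering `p_fem` from 0.5 toward 0 yields training corpora with increasingly skewed gender distributions while every sentence stays grammatical, which is the kind of controlled setting the abstract describes. A realistic grammar would need a far richer lexicon and more agreement phenomena than this sketch.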

Submitted: Oct 24, 2023

Topics

Gender Bias
Neural Model
Neural Language Model
Transformer Language Model
Linguistic Property