Paper ID: 2407.16886

GPT-4's One-Dimensional Mapping of Morality: How the Accuracy of Country-Estimates Depends on Moral Domain

Pontus Strimling, Joel Krueger, Simon Karlsson

Prior research demonstrates that OpenAI's GPT models can predict variations in moral opinions between countries, but that accuracy tends to be substantially higher for high-income countries than for low-income ones. This study aims to replicate previous findings and advance the research by examining how accuracy varies with different types of moral questions. Using responses from the World Values Survey and the European Values Study, covering 18 moral issues across 63 countries, we calculated country-level mean scores for each moral issue and compared them with GPT-4's predictions. Confirming previous findings, our results show that GPT-4 has greater predictive success in high-income than in low-income countries. However, our factor analysis reveals that GPT-4 bases its predictions primarily on a single dimension, presumably reflecting countries' degree of conservatism/liberalism. In contrast, the real-world moral landscape appears to be two-dimensional, differentiating between personal-sexual and violent-dishonest issues. When moral issues are categorized by moral domain, GPT-4's predictions are found to be remarkably accurate in the personal-sexual domain, across both high-income (r = .77) and low-income (r = .58) countries. Yet the predictive accuracy drops significantly in the violent-dishonest domain for both high-income (r = .30) and low-income (r = -.16) countries, indicating that GPT-4's one-dimensional world-view does not fully capture the complexity of the moral landscape. In sum, this study underscores that assessing GPT-4's moral understanding requires considering not only country-specific characteristics but also the characteristics of the moral issues at hand.
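
The comparison described above (country-level means per issue, then a correlation with model predictions within each moral domain) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy countries, the two issue labels, the domain assignment, and all numbers are made up for illustration, and pooling country-issue pairs within a domain before correlating is an assumption, since the abstract does not specify how the reported r values were aggregated.

```python
from collections import defaultdict
from math import sqrt


def country_means(responses):
    """Average individual scores into one mean per (country, issue) pair."""
    totals = defaultdict(lambda: [0.0, 0])
    for country, issue, score in responses:
        acc = totals[(country, issue)]
        acc[0] += score
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def accuracy_by_domain(means, predictions, domain_of):
    """Correlate survey means with predictions separately for each domain.

    Assumption: country-issue pairs are pooled within a domain.
    """
    pairs = defaultdict(lambda: ([], []))
    for key, mean in means.items():
        if key in predictions:
            _, issue = key
            xs, ys = pairs[domain_of[issue]]
            xs.append(mean)
            ys.append(predictions[key])
    return {domain: pearson_r(xs, ys) for domain, (xs, ys) in pairs.items()}


# Hypothetical toy data: (country, issue, individual justifiability score).
responses = [
    ("A", "divorce", 8), ("A", "divorce", 6),
    ("B", "divorce", 4), ("B", "divorce", 2),
    ("C", "divorce", 5), ("C", "divorce", 5),
    ("A", "tax_cheating", 1), ("A", "tax_cheating", 3),
    ("B", "tax_cheating", 2), ("B", "tax_cheating", 4),
    ("C", "tax_cheating", 1), ("C", "tax_cheating", 1),
]
# Hypothetical model predictions of the country-level means.
predictions = {
    ("A", "divorce"): 8.0, ("B", "divorce"): 4.0, ("C", "divorce"): 6.0,
    ("A", "tax_cheating"): 2.0, ("B", "tax_cheating"): 1.0,
    ("C", "tax_cheating"): 3.0,
}
# Hypothetical assignment of issues to the two domains named in the abstract.
domain_of = {"divorce": "personal-sexual", "tax_cheating": "violent-dishonest"}

means = country_means(responses)
result = accuracy_by_domain(means, predictions, domain_of)
for domain, r in result.items():
    print(f"{domain}: r = {r:.2f}")
```

In the toy data the predictions track the survey means in one domain and run against them in the other, which mirrors in exaggerated form the kind of domain-dependent accuracy the abstract reports. A real analysis would additionally split countries by income group and would need the actual survey data.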

Submitted: Jun 5, 2024