Grammaticality Judgment

Grammaticality judgment research investigates how well large language models (LLMs), such as GPT-3, assess the grammatical correctness of sentences compared with human judges. Current work evaluates LLMs across a range of grammatical constructions, including infrequent ones and those involving subtle semantic nuances, while also examining the effects of model size and task design. Findings reveal inconsistencies between LLM and human judgments: despite high accuracy on simpler tasks, the models show limitations in their grasp of grammar and meaning, underscoring the need for more sophisticated evaluation methods that measure genuine linguistic competence. A common evaluation setup is sketched below.
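One widely used way to elicit a grammaticality "judgment" from an LLM is minimal-pair comparison: the model counts as correct if it assigns a higher log-probability to the grammatical sentence than to a nearly identical ungrammatical one. The sketch below illustrates this under stated assumptions; the model name ("gpt2"), the sentence pair, and the helper function are illustrative choices, not taken from the surveyed papers.

```python
# Minimal sketch: minimal-pair grammaticality scoring with a causal LM.
# Assumes the Hugging Face `transformers` and `torch` packages are installed;
# "gpt2" is an assumed stand-in for the larger LLMs studied in this literature.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean negative
        # log-likelihood per predicted token.
        loss = model(ids, labels=ids).loss
    # Convert mean NLL back to a total log-probability over the
    # (seq_len - 1) predicted positions.
    return -loss.item() * (ids.size(1) - 1)

# Illustrative minimal pair (subject-verb agreement with an intervening noun).
grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."

prefers_grammatical = sentence_logprob(grammatical) > sentence_logprob(ungrammatical)
print("Model prefers the grammatical sentence:", prefers_grammatical)
```

Accuracy over a set of such pairs gives a simple proxy for grammatical competence, though, as the findings above note, high scores on easy pairs do not guarantee human-like judgments on rarer or semantically subtle constructions.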

Papers