Paper ID: 2208.04548
Inconsistencies in the Definition and Annotation of Student Engagement in Virtual Learning Datasets: A Critical Review
Shehroz S. Khan, Ali Abedi, Tracey Colella
Background: Student engagement (SE) in virtual learning plays a major role in meeting learning objectives and reducing program dropout risk. Developing Artificial Intelligence (AI) models for automatic SE measurement requires annotated datasets. However, existing SE datasets suffer from inconsistent definitions and annotation protocols, most of which are unaligned with the definition of SE in educational psychology. This inconsistency can mislead the development of generalizable AI models and makes it difficult to compare the performance of models developed on different datasets. The objective of this critical review was to survey existing SE datasets and highlight inconsistencies in their engagement definitions and annotation protocols.
Methods: Several academic databases were searched for publications introducing new SE datasets. Datasets containing students' single- or multi-modal data in online or offline computer-based virtual learning sessions were included. The definition and annotation of SE in the existing datasets were analyzed along seven dimensions of engagement annotation that we defined: sources, data modalities, timing, temporal resolution, level of abstraction, combination, and quantification.
Results: Thirty SE measurement datasets met the inclusion criteria. The reviewed SE datasets used highly diverse and inconsistent definitions and annotation protocols. Unexpectedly, very few of the reviewed datasets used existing psychometrically validated scales in their definition of SE.
Discussion: The inconsistent definition and annotation of SE are problematic for research on developing comparable AI models for automatic SE measurement. We also introduce existing SE definitions and annotation protocols from settings other than virtual learning that have the potential to be adapted to virtual learning.
Submitted: Aug 9, 2022