SelfCode 2.0: An Annotated Corpus of Student and Expert Line-by-Line Explanations of Code Examples for Automated Assessment
DOI: https://doi.org/10.32473/flairs.38.1.138727
Abstract
Assessing student responses is a critical task in adaptive educational systems. In particular, automatically evaluating students' self-explanations helps reveal their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. Annotated datasets are essential for developing Artificial Intelligence (AI) and Machine Learning models for automated assessment of learners' self-explanations. In response to this need, we developed the SelfCode 2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated by experts with semantic similarity, correctness, and completeness scores. Alongside the dataset, we report performance results for several baseline models based on TF-IDF and Sentence-BERT vector representations. This work aims to make automated assessment tools in programming education more effective and to contribute to better understanding and support of student learning of programming.
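To illustrate what a TF-IDF baseline of this kind might look like, the following is a minimal sketch that scores each student explanation against its paired expert explanation by cosine similarity of TF-IDF vectors. It is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `score_pairs`), the smoothed IDF formula, and the example sentences are assumptions made for illustration, and the example pairs are invented rather than taken from the corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document.

    IDF is fitted on the given documents with add-one smoothing,
    an assumed choice and not necessarily the one used in the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: term_freq[t] * idf[t] for t in term_freq})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pairs(pairs):
    """Score each (student, expert) pair by TF-IDF cosine similarity."""
    # Fit IDF on every explanation so that rare terms weigh more.
    docs = [text for pair in pairs for text in pair]
    vectors = tfidf_vectors(docs)
    return [cosine(vectors[2 * i], vectors[2 * i + 1]) for i in range(len(pairs))]


if __name__ == "__main__":
    # Invented example pairs, not drawn from the SelfCode 2.0 corpus.
    example_pairs = [
        ("creates a variable to hold the sum",
         "declares an integer variable sum to hold the running total"),
        ("prints the result", "loops over every element of the array"),
    ]
    for (student, expert), score in zip(example_pairs, score_pairs(example_pairs)):
        print(f"{score:.3f}  student={student!r}  expert={expert!r}")
```

A score like this captures only lexical overlap. A sentence-embedding model such as Sentence-BERT would replace `tfidf_vectors` with dense vectors while keeping the same cosine comparison.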
License
Copyright (c) 2025 Jeevan Chapagain, Arun Balajiee Lekshmi Narayanan, Kamil Akhuseyinoglu, Peter Brusilovsky, Vasile Rus

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.