SelfCode 2.0: An Annotated Corpus of Student and Expert Line-by-Line Explanations of Code Examples for Automated Assessment

Authors

DOI:

https://doi.org/10.32473/flairs.38.1.138727

Abstract

Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students' self-explanations contributes to understanding their knowledge state which is needed for personalized instruction, the crux of adaptive educational systems. To facilitate the development of Artificial Intelligence (AI) and Machine Learning models for automated assessment of learners' self-explanations, annotated datasets are essential. In response to this need, we developed the SelfCode2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated with semantic similarity, correctness, and completeness scores provided by experts. Alongside the dataset, we also provide performance results obtained with several baseline models based on TF-IDF and Sentence-BERT vectorial representations. This work aims to enhance the effectiveness of automated assessment tools in programming education and contribute to a better understanding and supporting student learning of programming.

Downloads

Published

14-05-2025

How to Cite

Chapagain, J., Arun Balajiee Lekshmi Narayanan, Kamil Akhuseyinoglu, Peter Brusilovsky, & Vasile Rus. (2025). SelfCode 2.0: An Annotated Corpus of Student and Expert Line-by-Line Explanations of Code Examples for Automated Assessment. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.138727