Addressing a Bias in Evaluating of Student Self-Explanations of Worked Programming Examples

DOI:

https://doi.org/10.32473/flairs.39.1.141405

Keywords:

Large Language Models, Biases, Programming Explanations, Code Comprehension, Automated Evaluation

Abstract

Worked examples are step-by-step solutions to problems in a specific domain, offered to help students acquire domain-specific problem-solving skills. The power of worked examples could be magnified by combining them with self-explanations, which ask students to explain each problem-solving step rather than passively study it. The main challenge of this approach is assessing the correctness of students' explanations. In current approaches, student explanations are judged by their semantic similarity to an explanation provided by an instructor or domain expert. However, recent studies of example explanations in the programming domain have shown that many students express themselves very differently from domain experts. In this situation, a traditional semantic-similarity approach might introduce bias against students whose explanations are correct but differ considerably from expert explanations. In this paper, we use a recently published dataset to compare several explanation-assessment approaches based on semantic similarity with alternative approaches based on direct Large Language Model (LLM) prompting. Our results show that using LLMs enables worked-example systems that follow an active learning approach to reduce bias in evaluating example explanations.
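To make the bias described in the abstract concrete, the following is a minimal sketch of similarity-based grading, not the method evaluated in the paper. It assumes bag-of-words cosine similarity as a simple stand-in for the embedding-based similarity such systems usually rely on, and the function names, example sentences, and the 0.5 acceptance threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors.

    Assumption: a toy stand-in for embedding-based semantic similarity.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def judge_by_similarity(student: str, expert: str, threshold: float = 0.5) -> bool:
    """Accept a student explanation only if it is close enough to the expert one.

    The 0.5 threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(student, expert) >= threshold


# Hypothetical example data: one expert reference and two correct student answers.
expert = "The loop iterates over the array and adds each element to the running sum."
student_close = "The loop iterates over the array and adds each element to the sum."
student_paraphrase = "Goes through every number in the list and keeps a total of them."

for name, answer in [("close", student_close), ("paraphrase", student_paraphrase)]:
    score = cosine_similarity(answer, expert)
    print(f"{name}: similarity={score:.2f}, accepted={judge_by_similarity(answer, expert)}")
```

Both student answers are correct, yet only the near-verbatim one passes: the paraphrase shares almost no words with the reference and is rejected. Embedding models are less sensitive to exact wording than this toy scorer, but the same failure mode is what motivates comparing similarity-based grading with direct LLM prompting.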

Author Biographies

Arun Balajiee Lekshmi Narayanan, University of Pittsburgh

I am a PhD student in the Intelligent Systems Program at the University of Pittsburgh, where my research focuses on developing personalized educational technologies for undergraduate computer science students. My research interests lie in the application of artificial intelligence to education, with a focus on data analysis, data mining, machine learning, and natural language processing.

Xiang Lorraine Li, University of Pittsburgh

She is an assistant professor in the Department of Computer Science at the University of Pittsburgh. 

Her research interests are at the intersection of natural language processing and machine learning. In particular, she works to:

  1. Understand model behavior through evaluation benchmark design and by exploring the meaning of model parameters in complex or long-tail situations.
  2. Understand and evaluate models’ ability to perform complex reasoning using atomic knowledge articles.
  3. Apply current LLM techniques in high-impact domains, such as law and education, to study model behavior and limitations.

Her overall research goal is to construct socially responsible, equitable, and robust models that cater to diverse users, populations, cultures, and scenarios.

Peter Brusilovsky, University of Pittsburgh

Published

06-05-2026

How to Cite

Lekshmi Narayanan, A. B., Li, X. L., & Brusilovsky, P. (2026). Addressing a Bias in Evaluating of Student Self-Explanations of Worked Programming Examples. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141405

Section

Special Track: Explainable, Fair, and Trustworthy AI