Sudoku Sage

Evaluating Correctness of LLM-Generated Moves as a Constraint Satisfaction Task

Authors

  • Aidan Gillespie, Tennessee Technological University
  • Vladimir Serov, Tennessee Technological University
  • Drew Phelps, Tennessee Technological University
  • Amr Hilal, Tennessee Technological University

DOI:

https://doi.org/10.32473/flairs.39.1.141841

Abstract

Large Language Models (LLMs) frequently fail on constraint satisfaction problems where correctness is binary and violations are immediately detectable. Sudoku is one such problem and a useful test case for evaluating these failures, since every proposed move must obey strict row, column, and subgrid constraints. In this work, we evaluate the correctness of LLM-generated Sudoku moves across puzzles of varying difficulty, where difficulty is defined by the number of missing cells and their distribution across the grid. The model is prompted to propose a single candidate move given only a textual representation of the current board, with no solver-derived information, verification, or feedback provided at inference time. Model performance is measured as the fraction of proposed moves that match a verified solution. Our results show that move correctness depends strongly on puzzle sparsity. Accuracy remains high for low-sparsity puzzles, where constraints are explicit and many moves are forced, but degrades sharply as sparsity increases and the space of plausible candidate moves expands. These findings characterize a clear limitation of ungrounded LLM prompting, in which the model sees only the current board state, and highlight the challenges posed by under-determined decision settings.
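
The abstract distinguishes two properties of a proposed move: whether it respects the row, column, and subgrid constraints, and whether it matches the verified solution. The Python sketch below illustrates both, together with a simple sparsity count, under assumptions not taken from the paper: the board is serialized as nine lines of nine characters with '.' for an empty cell, a move is a (row, col, digit) triple with 0-indexed coordinates, and per-box empty counts serve as one hypothetical way to summarize how missing cells are distributed. The function names (parse_board, is_legal, move_accuracy, and so on) are illustrative and do not come from the authors' code.

```python
from typing import List, Tuple

Board = List[List[int]]      # 9x9 grid; 0 marks an empty cell
Move = Tuple[int, int, int]  # hypothetical (row, col, digit), rows/cols 0-indexed


def parse_board(text: str) -> Board:
    """Parse an assumed text format: nine lines of nine chars, '.' or '0' = empty."""
    rows = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(rows) != 9 or any(len(row) != 9 for row in rows):
        raise ValueError("expected 9 rows of 9 characters")
    return [[0 if ch in ".0" else int(ch) for ch in row] for row in rows]


def missing_cells(board: Board) -> int:
    """Number of empty cells (the sparsity count)."""
    return sum(value == 0 for row in board for value in row)


def missing_per_box(board: Board) -> List[int]:
    """Empty-cell count per 3x3 box, row-major; a hypothetical distribution proxy."""
    return [
        sum(board[i][j] == 0 for i in range(br, br + 3) for j in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]


def is_legal(board: Board, move: Move) -> bool:
    """True if the move fills an empty cell without breaking row, column, or box rules."""
    r, c, v = move
    # Reject out-of-range coordinates/digits before indexing, and filled cells.
    if not (0 <= r < 9 and 0 <= c < 9 and 1 <= v <= 9) or board[r][c] != 0:
        return False
    if v in board[r] or any(board[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(board[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))


def move_accuracy(trials: List[Tuple[Board, Board, Move]]) -> float:
    """Fraction of (puzzle, solution, move) trials where the move fills an empty
    cell with the digit from the verified solution."""
    if not trials:
        return 0.0
    hits = 0
    for board, solution, (r, c, v) in trials:
        if 0 <= r < 9 and 0 <= c < 9 and board[r][c] == 0 and solution[r][c] == v:
            hits += 1
    return hits / len(trials)


# Illustrative puzzle/solution pair (a commonly reproduced example),
# not drawn from the paper's evaluation set.
PUZZLE_TEXT = """
53..7....
6..195...
.98....6.
8...6...3
4..8.3..1
7...2...6
.6....28.
...419..5
....8..79
"""

SOLUTION_TEXT = """
534678912
672195348
198342567
859761423
426853791
713924856
961537284
287419635
345286179
"""

if __name__ == "__main__":
    puzzle, solution = parse_board(PUZZLE_TEXT), parse_board(SOLUTION_TEXT)
    print(missing_cells(puzzle))                                      # 51
    print([v for v in range(1, 10) if is_legal(puzzle, (0, 2, v))])   # [1, 2, 4]
    print(solution[0][2])                                             # 4
```

On the example puzzle, the digits 1, 2, and 4 are all legal at row 0, column 2, yet only 4 matches the solution, so a constraint-respecting move can still count as incorrect under a match-the-solution metric.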

Published

06-05-2026

How to Cite

Gillespie, A., Serov, V., Phelps, D., & Hilal, A. (2026). Sudoku Sage: Evaluating Correctness of LLM-Generated Moves as a Constraint Satisfaction Task. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141841

Section

Special Track: AI in Games, Serious Games, and Multimedia