Winning Isn't Reasoning
Evaluating Iterative Reasoning Updating in Language Models
DOI: https://doi.org/10.32473/flairs.39.1.141832
Abstract
Large language models (LLMs) are increasingly deployed in interactive systems such as recommendation, negotiation, and decision support, where agents reason iteratively: they infer latent preferences and adapt their behavior in response to feedback over time. We ask whether LLMs perform this iterative reasoning as effectively as classical decision-theoretic strategies that are explicitly designed to reduce uncertainty or regret. We study this question in a controlled setting using Wordle as a diagnostic testbed. Wordle yields structured, deterministic feedback that progressively constrains the hypothesis space and allows information to be acquired through queries, closely mirroring core processes in preference elicitation and opponent modeling. Across 100 games per configuration (5,400+ total runs), we evaluate LLM-based agents and compare them to a Value of Information (VOI) policy, a minimax-regret policy (CSS), and two random baselines. Beyond win rate, we measure convergence efficiency, distance to solution, and sensitivity to feedback across rounds. Our results show that while LLMs can achieve competitive win rates, they are less reliable at systematic uncertainty reduction and typically converge more slowly than classical methods. These findings help clarify when decision-theoretic policies, and when LLMs, are better suited to iterative interaction, and they offer insights for the design of interactive AI systems.
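As an illustration of how Wordle feedback constrains the hypothesis space, the following minimal Python sketch scores a guess, filters the candidate set, and ranks guesses by the expected information (entropy of the feedback distribution) they would yield. It is not the authors' implementation: the function names, the `g`/`y`/`b` feedback encoding, the uniform prior over candidates, and the use of entropy as a stand-in for a value-of-information criterion are all assumptions made for this example.

```python
import math
from collections import Counter


def feedback(guess: str, target: str) -> str:
    """Return Wordle-style feedback: 'g' (green), 'y' (yellow), 'b' (gray)."""
    result = ["b"] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count unmatched target letters.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "g"
        else:
            remaining[t] += 1
    # Second pass: mark misplaced letters, consuming unmatched target letters.
    for i, g in enumerate(guess):
        if result[i] == "b" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, observed):
    """Keep only the words that would have produced the observed feedback."""
    return [w for w in candidates if feedback(guess, w) == observed]


def expected_information(guess, candidates):
    """Entropy (in bits) of the feedback pattern, assuming a uniform prior."""
    counts = Counter(feedback(guess, w) for w in candidates)
    n = len(candidates)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_guess(candidates, pool=None):
    """Pick the guess in the pool with the highest expected information."""
    pool = candidates if pool is None else pool
    return max(pool, key=lambda g: expected_information(g, candidates))
```

With the hypothetical candidate list `["crane", "crate", "slate", "brine"]`, guessing `crate` splits the four words into four distinct feedback patterns, so a single round of feedback identifies the target. A policy that ignores this structure may still win within the guess limit while shrinking the candidate set more slowly.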
License
Copyright (c) 2026 Kevin Scroggins, Kweku Yamoah, Emmanuel Dorley

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.