The Robot Maze Test
An Evaluation of Situated Learning for Humans and Machine Agents
DOI:
https://doi.org/10.32473/flairs.39.1.141850
Keywords:
LLM, human ai agreement, AI education, Human-AI Collaboration, Human-Machine Collaboration, Automated Evaluation, model trust, Cognitive ability tests
Abstract
With the burgeoning popularity of Large Language Models (LLMs) and their introduction to the workplace in multiple fields, an important question remains unexplored: what cognitive skills and attributes make an individual well-suited to interact with such black-box systems? To answer this, we developed a simulated robot planning task that tests an individual’s ability to infer, through interaction and experimentation, how a novel environment influences a robot’s behavior. Our platform revealed that users with greater system knowledge at the end of the task typically relied on slower, exploratory interactions and hypothesis testing. We then extended this platform to include a code-generation LLM that serves as a collaborative learning agent, updating a model of robot interactions through a combination of exploration and natural language guidance. We believe this framework and the collected data provide an opportunity to study human-LLM situated model building, error correction performance, and alignment of learning behaviors in new environments.
License
Copyright (c) 2026 Ian Perera, Brady DeCouto, Christopher J. Bates, Matthew Johnson, Charles B. Patterson, Alec Treacy, Zoeanne McCurdy

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.