Do Programmers and AI See the Same Problem?
Quantifying Cognitive Misalignment in Code Generation
DOI: https://doi.org/10.32473/flairs.39.1.141770
Keywords: Large Language Models, Code Generation, Bloom's Taxonomy
Abstract
The integration of AI assistants into software development raises fundamental questions about how task complexity is evaluated and how closely those evaluations align with human perception. Current evaluations focus primarily on functional correctness and overlook this question of cognitive alignment. We introduce and empirically examine cognitive misalignment: the discrepancy between human and AI perceptions of a task's cognitive demands. Using Bloom's Taxonomy, we prompt five LLMs to classify 2,520 tasks from three code generation benchmarks, and we establish human reference annotations for 150 tasks via expert consensus. Results show systematic misalignment: humans predominantly classify tasks as "Apply" or "Analyze", whereas several LLMs over-assign tasks to the "Create" level. This gap varies by model and task type and may contribute to the interaction friction and productivity paradoxes observed in practice. Our findings motivate the development of cognitively aware benchmarks and evaluation methods that better reflect human judgments of task complexity.
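The abstract does not state how misalignment is scored. One way to make the idea concrete is a minimal sketch, assuming each task has one human reference label and one model label drawn from the six levels of the revised Bloom's Taxonomy. The function names, the signed ordinal-gap measure, and the use of Cohen's kappa are illustrative assumptions, not the authors' method, and the four labels in the usage example are toy inputs, not data from the study.

```python
from collections import Counter

# Revised Bloom's Taxonomy levels, ordered from lowest to highest cognitive demand.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]
RANK = {level: i for i, level in enumerate(BLOOM_LEVELS)}


def cohens_kappa(human, model):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(human)
    observed = sum(h == m for h, m in zip(human, model)) / n
    h_counts, m_counts = Counter(human), Counter(model)
    # Agreement expected if both raters labeled independently at their own base rates.
    expected = sum(h_counts[c] * m_counts[c] for c in BLOOM_LEVELS) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used the same single label
    return (observed - expected) / (1 - expected)


def misalignment_report(human, model):
    """Summarize how a model's Bloom labels diverge from human reference labels.

    Hypothetical helper for illustration only.
    """
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Signed ordinal gap: positive means the model rates the task as more demanding.
    gaps = [RANK[m] - RANK[h] for h, m in zip(human, model)]
    return {
        "agreement": sum(h == m for h, m in zip(human, model)) / n,
        "kappa": cohens_kappa(human, model),
        "mean_gap": sum(gaps) / n,
        "create_rate_human": human.count("Create") / n,
        "create_rate_model": model.count("Create") / n,
    }


if __name__ == "__main__":
    # Toy labels for four tasks (not from the study).
    human = ["Apply", "Analyze", "Apply", "Analyze"]
    model = ["Create", "Analyze", "Apply", "Create"]
    report = misalignment_report(human, model)
    print(report["mean_gap"])  # 1.25: the model sits above the human labels on average
```

Exact agreement alone would hide the direction of disagreement, so the sketch also reports a signed gap on the ordered levels and the share of "Create" labels, which are the quantities that would reveal a model systematically rating tasks as more demanding than humans do.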
License
Copyright (c) 2026 Yi Zhang, Julia Rayz

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.