Structural Differences Between Human and AI-Generated SQL Queries
DOI: https://doi.org/10.32473/flairs.39.1.141617
Abstract
SQL is a declarative language in which the same analytical result can often be expressed through multiple structurally distinct queries, with implications for readability and maintainability. As large language models (LLMs) are increasingly used to draft and refactor SQL, it becomes important to examine not only correctness but also structural form, and to understand how model and prompt choice shape that form. This paper presents an empirical study of 669 Snowflake SQL solutions for shared analytical tasks, including 189 high-scoring human submissions and 480 functionally correct AI-generated queries produced by three systems (ChatGPT 5.2, Cortex AI, and Claude 4.5 Sonnet) under four prompt variants designed to elicit different structural preferences. We measure verbosity (lines of code, token count, Halstead vocabulary), structural complexity (Halstead difficulty, cognitive complexity, weighted composite complexity), and SQL-specific constructs (CTEs, subqueries, aggregates, window functions, nesting depth). Because the AI subset is conditioned on accepted correct generations whereas the human subset is filtered using a score threshold, the analysis is interpreted as a comparison of retained solutions rather than a claim about first-pass success. Using nonparametric tests, we find that AI-generated SQL is significantly higher on several structure-related measures, including difficulty, nesting depth, subquery use, aggregate use, CTE use, and weighted complexity. Model and prompt analyses show strong effects on verbosity, with decomposition-oriented prompting consistently associated with increased CTE usage and model-specific differences in prompt sensitivity. The results support multi-dimensional evaluation of AI-assisted SQL beyond correctness alone.
License
Copyright (c) 2026 Janka Pecuchova, Ľubomír Benko

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.