Have (A)I Seen this Before?
Exploring LLM Metacognition Using Self-Assessed Rankings and Scoring
DOI:
https://doi.org/10.32473/flairs.39.1.141862
Keywords:
Large Language Model, Metacognition, Self-Assessment, Evaluation, Learning, Biologically Inspired Computing
Abstract
Large Language Models (LLMs) commonly report high confidence, even in domains where their underlying knowledge or training data is limited. This mismatch can reduce model reliability, particularly in educational applications where users may not recognize errors. To detect these knowledge gaps, LLM knowledge must be assessed after training. In this work, we compare two ways of prompting an LLM to self-assess its knowledge of content: rank-ordering and direct confidence scoring (e.g., 1-5). For human metacognition, rankings or A/B comparisons are more reliable, so we hypothesize that LLMs’ rankings may also be more effective than scores. We compare LLM-generated Overall Rankings and confidence scores for 15 topics against two external estimates of LLM knowledge: expert human ratings and search result counts from Bing, Google, and Wikipedia. We also consider Anchored Rankings, in which each document to be rated by the LLM is compared to a set of documents with known expert scores. Across different document representations and different LLMs, Spearman correlations with expert ratings are generally positive and relatively high, with Anchored Rankings showing the highest correlation (ρ ranging from 0.74 to 1.0), followed by Overall Rankings and then confidence scores. In contrast, search-based signals show weaker and more variable alignment, suggesting that web popularity is a noisy signal for estimating LLM familiarity with content. Overall, these findings suggest that relative self-assessment through rankings provides an interpretable signal of LLM self-knowledge. This signal can be used to select specialized prompts or workflows for topics where an LLM has less knowledge.
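The comparison described in the abstract rests on Spearman rank correlation between an LLM's self-assessment and an external estimate such as expert ratings. The following is a minimal sketch of that computation in plain Python, not the authors' code. The function names and the five-topic data are hypothetical and invented purely for illustration. It assumes higher expert ratings mean more knowledge and that rank position 1 means "most familiar", so rank positions are negated before correlating.

```python
from math import sqrt


def average_ranks(values):
    """Assign ranks 1..n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for five topics (NOT from the paper).
expert_rating = [5, 4, 3, 2, 1]        # higher = expert judges more LLM knowledge
llm_rank_position = [1, 2, 3, 5, 4]    # 1 = LLM ranks itself most familiar
llm_confidence = [5, 5, 4, 4, 2]       # direct 1-5 self-score, with ties

# Negate rank positions so that larger values mean "more familiar".
rho_ranking = spearman(expert_rating, [-r for r in llm_rank_position])
rho_scores = spearman(expert_rating, llm_confidence)
print(f"ranking vs expert: {rho_ranking:.2f}")
print(f"scores vs expert:  {rho_scores:.2f}")
```

Averaging tied ranks matters for the 1-5 confidence scores, since a coarse scale produces many ties across topics, whereas a strict ranking produces none.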
License
Copyright (c) 2026 Anura Deshpande, Celine Cerezci, Benjamin Nye, Mark G. Core, Suvaditya Mukherjee, Joshua Shay Kricheli

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.