Spoken or Written?
Multi-Level Topic Modeling and Explainable AI Visualization for Stylometry
DOI:
https://doi.org/10.32473/flairs.39.1.141867
Keywords:
Large Language Models, Explainable AI, Visualization, Stylometry, Linguistic Features, Writing Style, Topic Modeling
Abstract
Although distinguishing spoken from written language may seem straightforward, it remains a complex problem in stylometry and natural language processing. In this work, we present an open-source explainable AI (XAI) visualization framework for analyzing stylistic differences between spoken and written registers. Using a dataset of 41,306 sentences from transcribed speeches and written books by United States presidents, we use syntactic features and propose a multi-level topic modeling approach that captures semantic patterns at varying granularities. Our experiments demonstrate that linguistic features, together with features derived from multi-level topic modeling, Attention Enrichment, and Integrated Gradients, substantially improve both classification performance and interpretability. Additionally, we compare fine-tuned transformer models against prompt-based classification, showing that task-specific fine-tuning significantly outperforms zero-shot and few-shot prompting strategies. To support qualitative analysis, we develop an interactive dual-panel visualization framework that integrates UMAP-projected sentence embeddings with BERTopic clustering and token-level attribution highlighting. All artifacts, including the dataset, code, and visualizations, are publicly available.
License
Copyright (c) 2026 Mina Rajaei Moghadam, Mosab Rezaei, David Koop, Maoyuan Sun, Reva Freedman

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.