Spoken or Written? Multi-Level Topic Modeling and Explainable AI Visualization for Stylometry

Authors

M. Rajaei Moghadam, M. Rezaei, D. Koop, M. Sun, R. Freedman

DOI:

https://doi.org/10.32473/flairs.39.1.141867

Keywords:

Large Language Models, Explainable AI, Visualization, Stylometry, Linguistic Features, Writing Style, Topic Modeling

Abstract

Although distinguishing spoken from written language may seem straightforward, it remains a complex problem in stylometry and natural language processing. In this work, we present an open-source explainable AI (XAI) visualization framework for analyzing stylistic differences between spoken and written registers. Using a dataset of 41,306 sentences from transcribed speeches and written books by United States presidents, we use syntactic features and propose a multi-level topic modeling approach that captures semantic patterns at varying levels of granularity. Our experiments demonstrate that linguistic features, together with features derived from multi-level topic modeling, Attention Enrichment, and Integrated Gradients, substantially improve classification performance and interpretability. We also compare fine-tuned transformer models against prompt-based classification and show that task-specific fine-tuning significantly outperforms zero-shot and few-shot prompting strategies. To support qualitative analysis, we develop an interactive dual-panel visualization framework that integrates UMAP-projected sentence embeddings with BERTopic clustering and token-level attribution highlighting. All artifacts, including the dataset, code, and visualizations, are publicly available.

Published

06-05-2026

How to Cite

Rajaei Moghadam, M., Rezaei, M., Koop, D., Sun, M., & Freedman, R. (2026). Spoken or Written? Multi-Level Topic Modeling and Explainable AI Visualization for Stylometry. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141867

Issue

Vol. 39 No. 1 (2026)

Section

Special Track: Applied Natural Language Processing