The LLM-Augmented Clustering for Customer Support Ticket Triage
A Comparative Study on the ABCD Dataset
DOI:
https://doi.org/10.32473/flairs.39.1.141558
Keywords:
customer support, ticket triage, clustering, large language models, HDBSCAN, UMAP, text embeddings, semantic normalization
Abstract
Automatically clustering customer support tickets into coherent issue groups is critical for efficient triage, root-cause analysis, and resource allocation. However, support ticket text is short, noisy, and exhibits high lexical variance for semantically identical issues, making traditional clustering methods unreliable. This paper presents a comparative study of four clustering approaches on the Action-Based Conversations Dataset (ABCD): online clustering, K-Means with TF-IDF, UMAP with HDBSCAN on dense embeddings, and a novel LLM-augmented pipeline that uses a large language model to extract normalized issue statements before embedding and clustering. Results show that LLM-based semantic normalization before clustering is the single largest contributor to cluster quality, improving silhouette scores and human-rated coherence over all baselines. The hybrid keyword-plus-LLM filtering stage also reduces API costs while maintaining high recall.
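The abstract does not spell out how the hybrid keyword-plus-LLM filtering stage works. As a rough illustration only, the sketch below assumes one plausible reading: a cheap keyword check decides which tickets are forwarded to the LLM for normalization, so that API calls are spent only on tickets likely to describe an issue. The keyword list, the function names `keyword_hit` and `normalize_tickets`, and the `llm_normalize` callable are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical keyword list; the paper's actual filter terms are not given.
ISSUE_KEYWORDS: Set[str] = {"refund", "password", "shipping", "order", "subscription"}


def keyword_hit(text: str, keywords: Set[str] = ISSUE_KEYWORDS) -> bool:
    """Return True if any whitespace-separated token matches a keyword."""
    tokens = {tok.strip(".,!?;:").lower() for tok in text.split()}
    return not tokens.isdisjoint(keywords)


def normalize_tickets(
    tickets: Iterable[str],
    llm_normalize: Callable[[str], str],
    keywords: Set[str] = ISSUE_KEYWORDS,
) -> Tuple[List[Optional[str]], int]:
    """Send only keyword-matching tickets to the LLM.

    Returns the list of normalized issue statements (None for tickets
    that were filtered out) and the number of LLM calls made.
    """
    results: List[Optional[str]] = []
    calls = 0
    for ticket in tickets:
        if keyword_hit(ticket, keywords):
            # Assumed stand-in for the API call that extracts a
            # normalized issue statement from the raw ticket text.
            results.append(llm_normalize(ticket))
            calls += 1
        else:
            results.append(None)
    return results, calls
```

In this sketch the normalized statements would then be embedded and clustered (for example with UMAP and HDBSCAN, as in the study), while the call counter gives a simple handle on API cost. A stricter keyword list lowers cost but risks lower recall, which is the trade-off the abstract refers to.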
License
Copyright (c) 2026 FNU PRITI SAGAR

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.