The LLM-Augmented Clustering for Customer Support Ticket Triage

A Comparative Study on the ABCD Dataset

Authors

  • Priti Sagar, Drexel University

DOI:

https://doi.org/10.32473/flairs.39.1.141558

Keywords:

customer support, ticket triage, clustering, large language models, HDBSCAN, UMAP, text embeddings, semantic normalization

Abstract

Automatically clustering customer support tickets into coherent issue groups is critical for efficient triage, root-cause analysis, and resource allocation. However, support ticket text is short, noisy, and exhibits high lexical variance for semantically identical issues, making traditional clustering methods unreliable. This paper presents a comparative study of four clustering approaches on the Action-Based Conversations Dataset (ABCD): online clustering, K-Means with TF-IDF, UMAP with HDBSCAN on dense embeddings, and a novel LLM-augmented pipeline that uses a large language model to extract normalized issue statements before embedding and clustering. Results show that LLM-based semantic normalization before clustering is the single largest contributor to cluster quality, improving silhouette scores and human-rated coherence over all baselines. The hybrid keyword-plus-LLM filtering stage also reduces API costs while maintaining high recall.
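The abstract does not spell out how the hybrid keyword-plus-LLM filtering stage works. One plausible reading, sketched below purely as an illustration, is that a cheap keyword check decides which tickets are worth sending to the LLM for normalization, so that API calls are spent only on likely issue reports. The keyword list, the function names, and the `fake_llm` stand-in are all hypothetical and are not taken from the paper; a real pipeline would call an actual model and then embed and cluster the returned statements.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical keyword list; the paper's actual filter terms are not given here.
ISSUE_KEYWORDS = {"refund", "order", "shipping", "password", "charge"}


def keyword_prefilter(ticket: str, keywords=ISSUE_KEYWORDS) -> bool:
    """Return True if the ticket contains at least one issue keyword."""
    tokens = {tok.strip(".,!?").lower() for tok in ticket.split()}
    return not tokens.isdisjoint(keywords)


def normalize_tickets(
    tickets: Iterable[str],
    llm_normalize: Callable[[str], str],
) -> Tuple[List[Optional[str]], int]:
    """Send only keyword-matched tickets to the LLM; count the calls made."""
    statements: List[Optional[str]] = []
    llm_calls = 0
    for ticket in tickets:
        if keyword_prefilter(ticket):
            statements.append(llm_normalize(ticket))
            llm_calls += 1
        else:
            # Skipped tickets cost no API call in this sketch.
            statements.append(None)
    return statements, llm_calls


def fake_llm(ticket: str) -> str:
    """Stand-in for a real LLM call (assumption, for illustration only)."""
    text = ticket.lower()
    if "refund" in text:
        return "customer requests refund"
    if "password" in text:
        return "customer cannot log in"
    return "other issue"


tickets = [
    "hi i want my money back, refund pls!!",
    "Thanks, have a nice day",
    "Forgot my password again",
]
statements, llm_calls = normalize_tickets(tickets, fake_llm)
```

In this toy run only two of the three tickets reach the stand-in model. The trade-off is that a keyword list that is too narrow would lower recall, which is the quantity the abstract reports as remaining high.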

Published

06-05-2026

How to Cite

PRITI SAGAR, F. (2026). The LLM-Augmented Clustering for Customer Support Ticket Triage: A Comparative Study on the ABCD Dataset. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141558