Latent Beta-Liouville Probabilistic Modeling for Bursty Topic Discovery in Textual Data
DOI: https://doi.org/10.32473/flairs.37.1.135043

Keywords: Topic Modeling, Word Burstiness, Beta-Liouville Distribution, Dirichlet Compound Multinomial Distribution, Natural Language Processing

Abstract
Topic modeling has become a fundamental technique for uncovering latent thematic structures within large collections of textual data. However, conventional models often struggle to capture the burstiness of topics. This characteristic, whereby the occurrence of a word increases its likelihood of subsequent appearances in a document, is fundamental in natural language processing. To address this gap, we introduce a novel topic modeling framework that integrates Beta-Liouville and Dirichlet Compound Multinomial distributions. Our approach, named Beta-Liouville Dirichlet Compound Multinomial Latent Dirichlet Allocation (BLDCMLDA), is designed specifically to model word burstiness and to support a wide range of adaptable topic proportion patterns. In experiments on diverse benchmark text datasets, the BLDCMLDA model demonstrated superior performance over conventional models. Our promising results in terms of perplexity and coherence scores demonstrate the effectiveness of BLDCMLDA in capturing the nuances of word usage dynamics in natural language.
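The burstiness property described above can be illustrated with a Polya urn, the generative mechanism underlying the Dirichlet Compound Multinomial distribution: each drawn word is returned to the urn with an extra copy, so a word that appears once becomes more likely to appear again. The following is a minimal sketch of that mechanism only, not the BLDCMLDA model itself; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def polya_urn_draws(alpha, n_draws, rng):
    """Sample n_draws words from a Polya urn with prior pseudo-counts alpha.

    This is the sampling scheme behind the Dirichlet Compound Multinomial:
    reinforcement of drawn words produces bursty word counts.
    """
    counts = np.array(alpha, dtype=float)
    draws = []
    for _ in range(n_draws):
        probs = counts / counts.sum()
        w = rng.choice(len(counts), p=probs)
        draws.append(w)
        counts[w] += 1.0  # the drawn word becomes more likely next time
    return draws

rng = np.random.default_rng(0)
# Symmetric prior over a 5-word toy vocabulary; a small alpha yields
# highly bursty documents dominated by a few words.
doc = polya_urn_draws([0.1] * 5, 50, rng)
```

With such a small prior, most of the 50 draws collapse onto one or two word types, in contrast to an i.i.d. multinomial with uniform probabilities, which would spread draws roughly evenly. This is the count pattern the DCM component of the proposed model is designed to capture.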
License

Copyright 2024 Shadan Ghadimi, Hafsa Ennajari, Nizar Bouguila

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.