Simultaneous count data feature selection and clustering using Multinomial Nested Dirichlet Mixture
DOI: https://doi.org/10.32473/flairs.37.1.135262

Keywords: mixture model clustering, Expectation-Maximization (EM), feature saliences, feature selection, Multinomial Nested Dirichlet Mixture (MNDM), Minimum Message Length (MML)

Abstract
The curse of dimensionality makes clustering count data a challenging task. This paper addresses the problem by adopting feature saliency as a feature selection mechanism within the Multinomial Nested Dirichlet Mixture (MNDM). The MNDM generalizes the Dirichlet Compound Multinomial (DCM), which suffers from several limitations. Model learning is carried out with the expectation-maximization (EM) algorithm. The Minimum Message Length (MML) criterion is used to determine the optimal number of mixture components simultaneously with the selected features. At the price of longer convergence times, the results show improved performance across several metrics, as the model selects the salient features while down-weighting the non-salient, anomalous ones.
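The abstract outlines EM-based learning of a mixture in which each feature carries a saliency weight. As a rough illustration of that feature-saliency idea only (not the paper's MNDM, and omitting the MML-based pruning of components), the sketch below runs a simplified feature-saliency EM on count data, modelling each feature per component with a Poisson stand-in. The function name feature_saliency_em, the Poisson modelling choice, and all parameter values are assumptions made for this example.

```python
import numpy as np
from scipy.stats import poisson

def feature_saliency_em(X, K, n_iter=100, seed=0, eps=1e-10):
    """Feature-saliency EM sketch on count data (illustrative only).

    Each feature d is modelled, per component k, as Poisson(lam[k, d]) when
    salient (probability rho[d]) and as a shared background Poisson(bg[d])
    when not. This is a simplified stand-in for the paper's MNDM; the
    MML-based selection of K is omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing weights
    lam = X.mean(0) * rng.uniform(0.5, 1.5, (K, D))   # component rates
    bg = X.mean(0) + eps                              # background rates
    rho = np.full(D, 0.5)                             # feature saliencies

    for _ in range(n_iter):
        # E-step: per-feature likelihoods under salient / background models
        p_sal = poisson.pmf(X[None, :, :], lam[:, None, :]) + eps  # (K, n, D)
        p_bg = poisson.pmf(X, bg) + eps                            # (n, D)
        mix_d = rho * p_sal + (1 - rho) * p_bg                     # (K, n, D)

        # component responsibilities (log-domain for stability)
        log_r = np.log(pi)[:, None] + np.log(mix_d).sum(-1)        # (K, n)
        log_r -= log_r.max(0)
        r = np.exp(log_r)
        r /= r.sum(0)

        # posterior probability that feature d came from the salient model
        u = r[:, :, None] * (rho * p_sal) / mix_d                  # (K, n, D)

        # M-step: mixing weights, component rates, background rates, saliencies
        pi = r.mean(1)
        lam = (u * X).sum(1) / (u.sum(1) + eps)
        v = r[:, :, None] - u                                      # background mass
        bg = (v * X).sum((0, 1)) / (v.sum((0, 1)) + eps)
        rho = u.sum((0, 1)) / n

    return pi, lam, bg, rho, r

if __name__ == "__main__":
    # Synthetic demo: feature 1 has the same rate in both clusters,
    # so its estimated saliency should end up low.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.poisson([5, 1, 3], (50, 3)),
                   rng.poisson([1, 1, 8], (50, 3))])
    pi, lam, bg, rho, r = feature_saliency_em(X, K=2)
    print("mixing weights:", pi)
    print("feature saliencies:", rho)
```

In the full approach described by the abstract, the saliency updates would be interleaved with MML-driven selection of the number of components rather than run for a fixed K as above.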
License

Copyright 2024 Fares Alkhawaja, Manar Amayri, Nizar Bouguila

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.