On the Varieties of Fractal Geometry of Word Embeddings

Authors

  • Anurag Naren Kallakunta UNC Charlotte and Lander University, Computer Sciences Department
  • Wlodek Zadrozny UNC Charlotte

DOI:

https://doi.org/10.32473/flairs.38.1.138951

Keywords:

word embeddings, fractals, correlation dimension, data manifold

Abstract

Prior research has shown that word embeddings are unstable: the
neighborhoods of word vectors differ depending on the corpora and training methods used.
In this article we use the correlation dimension algorithm, as well as a clustering dimension algorithm, to compute the fractal
dimensions of word embeddings such as GloVe vectors, FastText, and CoNLL.
We note differences between the fractal dimensions we report and those reported in prior work using other techniques, which shows that the measured geometry of word embeddings depends on the algorithm used to compute it.
In addition, this article answers two questions about the dimension of the local manifold of word embeddings around polysemous
words: the dimension is relatively small (4 or less), and it does not differ from that of neighborhoods of non-polysemous words. We also observe, in a few examples, that fractal dimensions are higher when we restrict ourselves to the most frequent words, and we hypothesize that this could be a more general pattern.
This article also reviews recent publications in the area, including applications of fractals to an analysis of deep neural networks.
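
For readers unfamiliar with the correlation dimension mentioned above, the following is a minimal sketch of the standard Grassberger-Procaccia style estimate, not the authors' implementation. It assumes Euclidean distance and a hand-picked range of radii; the function name correlation_dimension and its parameters are hypothetical and chosen only for illustration. The estimate is the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Estimate the correlation dimension of a point cloud.

    Illustrative sketch only: fits a straight line to log C(r) vs log r,
    where C(r) is the fraction of distinct point pairs at distance < r.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(points ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    # Clip tiny negative values caused by rounding, keep each pair once.
    dists = np.sqrt(np.maximum(d2, 0.0))[np.triu_indices(n, k=1)]

    # Correlation sum C(r) for every radius.
    c = np.array([np.mean(dists < r) for r in radii])

    # Ignore radii with no pairs, since log(0) is undefined.
    mask = c > 0
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
    return slope
```

In practice the result depends on the chosen radius range and on the sample size, so an estimate obtained this way should be read as approximate. For word embeddings, points would be the matrix of word vectors (one row per word), possibly restricted to a neighborhood of a given word.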

Published

14-05-2025

How to Cite

Kallakunta, A. N., & Zadrozny, W. (2025). On the Varieties of Fractal Geometry of Word Embeddings. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.138951