On the Varieties of Fractal Geometry of Word Embeddings
DOI:
https://doi.org/10.32473/flairs.38.1.138951
Keywords:
word embeddings, fractals, correlation dimension, data manifold
Abstract
Prior research has shown that word embeddings are unstable: the neighborhoods of word vectors differ depending on the corpus and the training method.
In this article we use the correlation dimension algorithm, as well as a clustering dimension algorithm, to compute the fractal dimensions of word embeddings such as GloVe, FastText, and CoNLL.
We note the differences between the fractal dimensions we report and those obtained in prior work using other techniques, which shows that the measured geometry of word embeddings depends on the algorithm used to compute it.
In addition, this article answers two questions about the dimension of the local manifold of word embeddings around polysemous words: the dimension is relatively small (4 or less), and it does not differ from that of neighborhoods of non-polysemous words. We also observe in a few examples that fractal dimensions are higher when we restrict ourselves to the most frequent words, and we hypothesize that this could be a more general pattern.
This article also reviews recent publications in the area, including applications of fractals to the analysis of deep neural networks.
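For readers unfamiliar with the correlation dimension mentioned in the abstract, the following is a minimal sketch of the standard Grassberger-Procaccia estimator: count the fraction C(r) of point pairs closer than r, then fit the slope of log C(r) against log r. It is not the authors' implementation. The function name `correlation_dimension`, the choice of radii, and the synthetic data (points on a line segment embedded in 10 dimensions, standing in for word vectors) are all assumptions made for illustration.

```python
import numpy as np


def correlation_dimension(points, radii):
    """Estimate the correlation dimension of a point cloud.

    Hypothetical helper for illustration: computes the correlation sum
    C(r) for each radius and returns the least-squares slope of
    log C(r) versus log r.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # All pairwise Euclidean distances (O(n^2) memory, fine for small n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Keep each unordered pair once, excluding self-distances.
    pair_dists = dists[np.triu_indices(len(points), k=1)]

    # Correlation sum: fraction of pairs closer than each radius.
    c = np.array([(pair_dists < r).mean() for r in radii])

    # Radii with no pairs would give log(0); drop them before fitting.
    keep = c > 0
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope


# Synthetic stand-in for embeddings: a 1-D line segment in 10-D space.
rng = np.random.default_rng(0)
t = rng.random(500)
direction = np.ones(10) / np.sqrt(10)
line = t[:, None] * direction[None, :]

radii = np.logspace(-2, -1, 8)
estimate = correlation_dimension(line, radii)
print(f"estimated correlation dimension: {estimate:.2f}")
```

For the line segment the estimate should land close to 1 even though the ambient space has 10 dimensions, which is the sense in which a fractal dimension can be much smaller than the embedding dimension. With real word vectors the result is sensitive to the range of radii and to which words are sampled, so any single number from a sketch like this should be read with caution.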
License
Copyright (c) 2025 Anurag Naren Kallakunta, Wlodek Zadrozny

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.