Bounded Degradation in Latent Representation Under Bias Subspace Removal
DOI: https://doi.org/10.32473/flairs.38.1.139018

Abstract
This work investigates the concentration of demographic signals in high-dimensional embeddings, focusing on a "bias subspace" that encodes sensitive attributes such as gender. Experiments on textual job biographies reveal that a single vector, derived by subtracting subgroup means, can correlate with gender above 0.95, indicating that a few coordinates often capture the dominant group distinction. A further analysis using covariance differences isolates additional, though weaker, bias directions. To explain why neutralizing the principal bias dimension barely impairs classification performance, this paper introduces a Bounded Degradation Theorem. The result shows that unless a downstream classifier aligns heavily with the removed axis, any resulting logit shifts remain bounded, and accuracy is preserved. Empirical observations confirm that group-level outcomes shift while overall accuracy remains nearly unchanged. Together, the theoretical and experimental results highlight both the geometric underpinnings of bias in language-model embeddings and practical strategies for mitigating its undesired effects while leaving most classification power intact.
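To make the geometry concrete, below is a minimal sketch, not the paper's code: it derives a mean-difference bias direction from labeled embeddings, neutralizes it by projection, and checks the kind of logit-shift bound the abstract describes for a linear classifier. The synthetic data, the NumPy implementation, and the classifier weights w are illustrative assumptions; for a linear classifier with weights w, projecting out the unit direction v changes each logit by (v.x)(w.v), whose magnitude is at most |w.v| * ||x||.

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for biography embeddings with a binary sensitive attribute.
X = rng.normal(size=(1000, 128))     # embeddings, shape (n, d)
g = rng.integers(0, 2, size=1000)    # group labels in {0, 1}
X[g == 1] += 0.5                     # inject a synthetic group offset

# Mean-difference bias direction, normalized to unit length.
v = X[g == 1].mean(axis=0) - X[g == 0].mean(axis=0)
v /= np.linalg.norm(v)

# Neutralization: remove each embedding's component along v.
X_debiased = X - np.outer(X @ v, v)

# Logit shift for a hypothetical linear classifier w: (x . v)(w . v),
# bounded in magnitude by |w . v| * ||x||.
w = rng.normal(size=128)
shift = X @ w - X_debiased @ w
bound = np.abs(w @ v) * np.linalg.norm(X, axis=1)
assert np.all(np.abs(shift) <= bound + 1e-9)
print("max |logit shift|:", float(np.abs(shift).max()))

When the classifier barely aligns with v (small |w.v|), every logit shift, and hence any accuracy change, stays small, which matches the abstract's bounded-degradation claim.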
License
Copyright (c) 2025 Md Nur Amin, Phil Nemeth, Alexander Jesser

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.