Bounded Degradation in Latent Representation Under Bias Subspace Removal

Authors

  • Md Nur Amin, Heilbronn University of Applied Sciences
  • Phil Nemeth, Heilbronn University of Applied Sciences
  • Alexander Jesser, Heilbronn University of Applied Sciences

DOI:

https://doi.org/10.32473/flairs.38.1.139018

Abstract

This work investigates the concentration of demographic signals in high-dimensional embeddings, focusing on a “bias subspace” that encodes sensitive attributes such as gender. Experiments on textual job biographies reveal that a single vector—derived by subtracting subgroup means—can correlate with gender above 0.95, indicating that only a few coordinates often capture dominant group distinctions. A further analysis using covariance differences isolates additional, though weaker, bias directions. To explain why neutralizing the principal bias dimension barely impairs classification performance, this paper introduces a Bounded Degradation Theorem. The result shows that unless a downstream classifier aligns heavily with the removed axis, any resulting logit shifts remain bounded, thus preserving accuracy. Empirical observations confirm that group-level outcomes shift, yet overall accuracy remains nearly unchanged. Theoretical and experimental insights highlight both the geometric underpinnings of bias in language-model embeddings and practical strategies for mitigating undesired effects, while leaving most classification power intact.
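The difference-of-means bias direction and its neutralization, as summarized in the abstract, can be sketched as follows. This is a minimal illustration on synthetic data (the paper itself uses embeddings of job biographies); the variable names and the synthetic group shift are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for two demographic groups, with an artificial
# mean shift standing in for the bias signal found in real embeddings.
d = 16
group_a = rng.normal(0.0, 1.0, size=(200, d)) + 0.8
group_b = rng.normal(0.0, 1.0, size=(200, d)) - 0.8

# Bias direction obtained by subtracting subgroup means, then normalizing.
v = group_a.mean(axis=0) - group_b.mean(axis=0)
v /= np.linalg.norm(v)

# Neutralize the principal bias dimension: subtract each embedding's
# component along v, projecting onto the orthogonal complement.
X = np.vstack([group_a, group_b])
X_debiased = X - np.outer(X @ v, v)

# After removal, embeddings carry no component along v (up to rounding).
print(np.abs(X_debiased @ v).max())
```

The Bounded Degradation Theorem then concerns how a downstream classifier behaves on `X_debiased`: if the classifier's weight vector has little alignment with `v`, its logits change only slightly under this projection.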

Published

14-05-2025

How to Cite

Amin, M. N., Nemeth, P., & Jesser, A. (2025). Bounded Degradation in Latent Representation Under Bias Subspace Removal. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.139018

Section

Special Track: Explainable, Fair, and Trustworthy AI