Inter-Annotator Agreement and Its Reflection in LLMs and Responsible AI.

Authors

  • Amir Toliyat, Vaughn College of Aeronautics and Technology
  • Elena Filatova, CUNY
  • Ronak Etemadpour

DOI:

https://doi.org/10.32473/flairs.38.1.139049

Abstract

Recent research on Responsible AI, particularly work addressing algorithmic bias, has gained significant attention. Natural Language Processing (NLP) algorithms, which rely on human-generated and human-labeled data, often inherit these biases. In this paper, we analyze inter-annotator agreement in the task of labeling hate speech data and examine how annotators’ backgrounds influence their labeling decisions. Specifically, we investigate differences in hate speech annotations that arise when annotators identify with the targeted groups. Our findings reveal substantial differences in agreement between a general pool of annotators and annotators who personally relate to the targets of the hate speech they label. Additionally, we evaluate the OpenAI GPT-4o model on the same dataset. Our results highlight the need to consider annotators’ backgrounds when assessing the performance of Large Language Models (LLMs) in hate speech detection.
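As a rough illustration (not taken from the paper), inter-annotator agreement on binary hate-speech labels is commonly quantified with a chance-corrected statistic such as Cohen's kappa. The minimal sketch below assumes two hypothetical annotators and made-up labels, and uses scikit-learn's cohen_kappa_score to compute agreement beyond chance.

    # Minimal sketch: Cohen's kappa between two hypothetical annotators.
    # Labels are placeholders, not data from the paper (1 = hate speech, 0 = not).
    from sklearn.metrics import cohen_kappa_score

    general_pool_annotator = [1, 0, 1, 1, 0, 0, 1, 0]
    targeted_group_annotator = [1, 1, 1, 0, 0, 1, 1, 0]

    kappa = cohen_kappa_score(general_pool_annotator, targeted_group_annotator)
    print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level

In practice, the same comparison could be run between a general annotator pool and annotators who identify with the targeted group, or between human labels and GPT-4o outputs, to surface the kinds of agreement gaps the abstract describes.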

Published

14-05-2025

How to Cite

Toliyat, A., Filatova, E., & Etemadpour, R. (2025). Inter-Annotator Agreement and Its Reflection in LLMs and Responsible AI. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.139049