Inter-Annotator Agreement and Its Reflection in LLMs and Responsible AI.
DOI: https://doi.org/10.32473/flairs.38.1.139049

Abstract
Recent research on Responsible AI, particularly work addressing algorithmic biases, has gained significant attention. Natural Language Processing (NLP) algorithms, which rely on human-generated and human-labeled data, often reflect these biases. In this paper, we analyze inter-annotator agreement in the task of labeling hate speech data and examine how annotators’ backgrounds influence their labeling decisions. Specifically, we investigate differences in hate speech annotations that arise when annotators identify with the targeted groups. Our findings reveal substantial differences in agreement between a general pool of annotators and those who personally relate to the targets of the hate speech they label. Additionally, we evaluate the OpenAI GPT-4o model on the same dataset. Our results highlight the need to consider annotators’ backgrounds when assessing the performance of Large Language Models (LLMs) in hate speech detection.
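The abstract does not specify which agreement statistic the paper uses. As a minimal sketch of the kind of pairwise agreement computation involved, the snippet below applies Cohen’s kappa (via scikit-learn) to three hypothetical label vectors; the variable names, the binary label scheme, and the idea of comparing a general annotator pool, in-group annotators, and GPT-4o labels pairwise are all illustrative assumptions, not the paper’s actual method.

```python
# Minimal sketch (not from the paper): pairwise inter-annotator agreement
# via Cohen's kappa. All label vectors are hypothetical placeholders, and
# the paper does not state which agreement statistic or label scheme it uses.
from sklearn.metrics import cohen_kappa_score

# Assumed binary labels: 1 = hate speech, 0 = not hate speech
general_pool_labels = [1, 0, 1, 1, 0, 0, 1, 0]  # annotators from a general pool
in_group_labels     = [1, 1, 1, 1, 0, 1, 1, 0]  # annotators who identify with the targeted group
gpt4o_labels        = [1, 0, 1, 0, 0, 0, 1, 0]  # labels produced by the GPT-4o model

print("general vs in-group:", cohen_kappa_score(general_pool_labels, in_group_labels))
print("general vs GPT-4o:  ", cohen_kappa_score(general_pool_labels, gpt4o_labels))
print("in-group vs GPT-4o: ", cohen_kappa_score(in_group_labels, gpt4o_labels))
```

A kappa near 1 indicates strong agreement beyond chance, while values near 0 indicate agreement no better than chance; comparing the three pairwise scores is one simple way to surface the kind of annotator-background effects the paper reports.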
License
Copyright (c) 2025 Amir Toliyat, Elena Filatova, Ronak Etemadpour

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.