The Impact of Data Augmentation on the Hate Speech Detection in Portuguese Language
DOI: https://doi.org/10.32473/flairs.37.1.135307
Abstract
Online communities allow users to establish a web presence, manage their identities, and stay connected with others. The internet enables global outreach with a single click on the World Wide Web. However, today's online social media platforms are marred by various issues, and hate speech is the most prominent of them. Hate speech is hostile and malicious language, driven by prejudice, that targets individuals or groups based on their innate, natural, or perceived characteristics. Detecting such speech is crucial for maintaining a safe online environment. This study examines the impact of dataset regularization techniques on the performance of BERTimbau-based models applied to four Portuguese hate speech datasets: Fortuna et al. (2019), OFFCOMBR-2, ToLD-BR, and Hate-BR. Four Data Augmentation techniques are evaluated: Oversampling, Undersampling, Text Augmentation, and Synonym Replacement. Our experiments revealed that, except on the Fortuna et al. (2019) dataset, the Data Augmentation techniques did not significantly improve hate speech detection performance.
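The abstract names the techniques without describing how they were implemented. As a minimal sketch, assuming the data is a plain list of texts with integer class labels and using the naive random variants of each method, three of them can be illustrated in standard-library Python. This is not the authors' pipeline: the function names, the `seed` parameter, and the toy synonym table `SYNONYMS` are hypothetical, and Text Augmentation is not shown because the abstract does not say which method was used.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels

def random_undersample(texts, labels, seed=0):
    """Keep a random subset of each class so that every class has as many
    examples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_texts, out_labels = [], []
    for cls in counts:
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for t in rng.sample(pool, target):
            out_texts.append(t)
            out_labels.append(cls)
    return out_texts, out_labels

# Hypothetical toy synonym table (assumption); a real setup would draw on
# a Portuguese thesaurus or lexical resource instead.
SYNONYMS = {"ruim": ["péssimo", "horrível"], "pessoa": ["indivíduo", "sujeito"]}

def synonym_replace(text, n=1, seed=0, synonyms=SYNONYMS):
    """Replace up to n whitespace-separated words that appear (lowercased)
    in the synonym table with a randomly chosen synonym."""
    rng = random.Random(seed)
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)

if __name__ == "__main__":
    texts = ["texto neutro 1", "texto neutro 2", "texto neutro 3", "que pessoa ruim"]
    labels = [0, 0, 0, 1]
    _, y_over = random_oversample(texts, labels)
    _, y_under = random_undersample(texts, labels)
    print(Counter(y_over))   # Counter({0: 3, 1: 3})
    print(Counter(y_under))  # Counter({0: 1, 1: 1})
    print(synonym_replace("que pessoa ruim", n=2))
```

In a typical setup these transformations are applied only to the training split, so that duplicated or perturbed copies of an example cannot leak into the evaluation data.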
License
Copyright (c) 2024 Félix Silva, Artur Cerri, Ulisses Brisolara Corrêa, Larissa A. de Freitas
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.