Twitter User Account Classification to Gain Insights into Communication Dynamics and Public Awareness During Tampa Bay's Red Tide Events
DOI: https://doi.org/10.32473/flairs.37.1.135551
Keywords: Text Classification, environmental issues, tiered labeling, social media
Abstract
This study presents an innovative approach to analyzing localized environmental challenges. It focuses on the impacts of toxic algal blooms of the dinoflagellate Karenia brevis on Florida's Gulf Coast, commonly known as "red tide". Although social media strongly shapes public discourse, its potential for raising environmental awareness remains largely untapped. Our research uses Twitter data to examine communication trends and public understanding of red tide in the Tampa Bay area from 2018 to 2022. Over that period, we collected 63K tweets from 30K accounts that mentioned red tide-related terms. A tiered labeling process yielded over 15K labeled accounts. In the first tier, we applied predefined dictionaries for each account group to assign preliminary class labels, which streamlined the subsequent labeling tiers; one of those tiers was aided by a preliminary machine learning classifier. Among the text classification algorithms and feature preprocessing approaches we tested, a Support Vector Machine (SVM) using Bidirectional Encoder Representations from Transformers (BERT) features gave the best cross-validation performance, both in accuracy (90%) and in balance across classes (unweighted, i.e. macro-averaged, F1 score of 0.67). Lastly, we applied the Term Frequency-Inverse Document Frequency (TF-IDF) method in a novel way to identify the terms that most distinguish each user category from the rest.
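The abstract mentions three computations that a short sketch can illustrate. The Python code that follows is not the authors' implementation. The group names and dictionary terms are hypothetical placeholders, and the BERT-plus-SVM classifier is omitted because it requires a pretrained model. The sketch covers three steps:

1. A dictionary-based first labeling tier.
2. How accuracy and unweighted (macro) F1 can diverge when classes are imbalanced.
3. One plausible reading of the TF-IDF step. It assumes each user category's tweets are pooled into a single document, so terms shared by every category receive zero weight.

```python
import math
import re
from collections import Counter

# Hypothetical tier-1 dictionaries: the study's actual account groups and
# keyword lists are not given in the abstract.
GROUP_DICTIONARIES = {
    "media": {"news", "reporter", "journalist", "tv"},
    "science": {"scientist", "researcher", "lab", "noaa"},
    "government": {"county", "city", "official", "gov"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preliminary_label(profile_text, dictionaries=GROUP_DICTIONARIES):
    """Tier 1: pick the group whose dictionary matches the most tokens, or None."""
    tokens = set(tokenize(profile_text))
    hits = {group: len(tokens & terms) for group, terms in dictionaries.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def accuracy_and_macro_f1(y_true, y_pred):
    """Return accuracy and the unweighted (macro) mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); every label here occurs at least once,
        # so the denominator is never zero.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return accuracy, sum(f1_scores) / len(f1_scores)


def class_tfidf(texts_by_class, top_k=3):
    """Pool each class's texts into one document and rank terms by TF-IDF."""
    term_counts = {
        label: Counter(tok for text in texts for tok in tokenize(text))
        for label, texts in texts_by_class.items()
    }
    n_classes = len(term_counts)
    # Document frequency = number of classes whose pooled text contains the term.
    doc_freq = Counter(tok for counts in term_counts.values() for tok in counts)
    top_terms = {}
    for label, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            tok: (n / total) * math.log(n_classes / doc_freq[tok])
            for tok, n in counts.items()
        }
        # Highest score first; ties broken alphabetically for reproducibility.
        top_terms[label] = sorted(scores, key=lambda tok: (-scores[tok], tok))[:top_k]
    return top_terms


if __name__ == "__main__":
    print(preliminary_label("Local TV news reporter covering Tampa Bay"))  # media
    print(accuracy_and_macro_f1(["public"] * 9 + ["science"], ["public"] * 10))
    print(class_tfidf({
        "science": ["red tide karenia brevis counts", "karenia bloom sampling"],
        "media": ["red tide beach closures", "red tide fish kill news"],
    }, top_k=2))
```

In the toy metric example, predicting the majority class for every account gives 90% accuracy. The macro F1, however, is only about 0.47, because the minority class has an F1 of zero. These numbers are illustrative only and are unrelated to the study's data. The example shows why an unweighted F1 well below the accuracy signals weaker performance on smaller account groups.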
License
Copyright (c) 2024 Andrey Skripnikov, Tania Roy, Fehmi Neffati, Melvin Adkins, Marcus Beck
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.