Twitter User Account Classification to Gain Insights into Communication Dynamics and Public Awareness During Tampa Bay's Red Tide Events
DOI:
https://doi.org/10.32473/flairs.37.1.135551
Keywords:
Text Classification, environmental issues, tiered labeling, social media
Abstract
This study presents an approach to analyzing environmental challenges, focusing on the localized impacts of toxic algal blooms, specifically blooms of the dinoflagellate Karenia brevis on Florida's Gulf Coast, commonly known as "red tide". Despite the extensive influence of social media on public discourse, its potential for raising environmental awareness remains largely untapped. Our research uses Twitter data to examine communication trends and public understanding of red tide issues in the Tampa Bay area from 2018 to 2022. For that study period, we collected 63K tweets from 30K accounts that mentioned terms related to red tide. Our methodology involves a tiered labeling process that yields over 15K labeled accounts. In the initial tier, we employ predefined dictionaries for account groups to establish preliminary class designations, streamlining the subsequent labeling tiers, one of which is aided by preliminary machine learning classification. Among the several text classification algorithms and feature preprocessing approaches we evaluated, a Support Vector Machine paired with Bidirectional Encoder Representations from Transformers (BERT) yielded the best cross-validation performance in both accuracy (90%) and versatility (unweighted F1 score of 0.67). Lastly, we leveraged the Term Frequency-Inverse Document Frequency (TF-IDF) method to study the terms that most distinguish each user category from the rest.
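The first labeling tier and the closing TF-IDF step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the group names, the keyword dictionaries, the tokenizer, and the functions `preliminary_label` and `distinguishing_terms` are all hypothetical, and the TF-IDF variant (one pooled document per group, unsmoothed logarithmic IDF) is an assumption, since the abstract does not specify these details.

```python
import math
import re
from collections import Counter

# Hypothetical keyword dictionaries; the study's actual dictionaries are not given.
GROUP_KEYWORDS = {
    "media": {"news", "reporter", "journalist"},
    "government": {"official", "county", "agency"},
    "academia": {"professor", "researcher", "university"},
}


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preliminary_label(description):
    """First-tier guess: the group whose dictionary matches the most tokens, or None."""
    tokens = tokenize(description)
    hits = {group: sum(tok in words for tok in tokens)
            for group, words in GROUP_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def distinguishing_terms(texts_by_group, top_n=3):
    """Rank each group's terms by TF-IDF, pooling a group's texts into one document."""
    counts = {group: Counter(tok for text in texts for tok in tokenize(text))
              for group, texts in texts_by_group.items()}
    n_groups = len(counts)
    # Document frequency: the number of groups in which a term appears at all.
    doc_freq = Counter(term for group_counts in counts.values() for term in group_counts)
    ranked = {}
    for group, group_counts in counts.items():
        total = sum(group_counts.values())
        scores = {term: (n / total) * math.log(n_groups / doc_freq[term])
                  for term, n in group_counts.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked[group] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return ranked
```

Under this variant, a term used by every group (such as "red" or "tide" in a corpus collected on those keywords) receives a score of zero, so only vocabulary that separates one account category from the others rises to the top of that category's list.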
License
Copyright (c) 2024 Andrey Skripnikov, Tania Roy, Fehmi Neffati, Melvin Adkins, Marcus Beck
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.