PropitterX: a Twitter-based propaganda corpus extended with multiple contextual features

Jan 1, 2025
Marco Casavantes, Manuel Montes-Y-Gómez, Delia-Irazú Hernández-Farías, Luis C. González, Alberto Barrón-Cedeño
Abstract
Propagandistic content can appear anywhere online, e.g., on social media, in web forums, and in news articles. Nonetheless, the vast majority of efforts to build computational models that automatically detect propaganda have centered on news outlets, since historically this is where people went to get informed. This has gradually changed: today the Internet, and some social media platforms in particular, have become the main spreaders of news and events worldwide. In this study, we explore the detection of propaganda in tweets. The originality of our contribution lies in the creation of PropitterX, a Twitter-based dataset that we extend with contextual information for each instance. This allows the study not only of the content of a tweet but also of the role of different aspects, such as the political bias behind the post, its publication time, its region of origin, and even the predominant emotion it evokes. We present this corpus alongside four data sub-collections to show how different questions about propaganda detection can be posed to take advantage of this resource and further advance this task.
Type
Publication
Language Resources and Evaluation