PropitterX: a Twitter-based propaganda corpus extended with multiple contextual features

Jan 1, 2025

Marco Casavantes

Manuel Montes-Y-Gómez

Delia-Irazú Hernández-Farías

Luis C. González

Alberto Barrón-Cedeño

Abstract

Propagandistic online content can be found everywhere: on social media, in web forums, and in news articles. Nonetheless, the vast majority of efforts to build computational models that automatically detect propaganda have centered on news outlets, given that this is historically where people went to get informed. This has gradually changed. Today the Internet, and some social media platforms in particular, have become the main spreaders of news and events worldwide. In this study, we explore the detection of propaganda in tweets. The originality of our contribution lies in the creation of PropitterX, a Twitter-based dataset that we extend with contextual information for each instance. This allows for the study not only of the content of a tweet but also of the roles of different aspects, such as the political bias behind the post, its publication time, its region of origin, and even the predominant emotion it evokes. We present this corpus alongside four data sub-collections to show how different questions about propaganda detection could be posed to take advantage of this resource and further advance this task.

Type

Journal article

Publication

Language Resources and Evaluation

Last updated on Jan 1, 2025

Authors

Marco Casavantes

PhD student
