07/06/2020

Generally Curious: Thematically Distinct Datasets of General Threads on 4chan/pol/

Emilija Jokubauskaite, Stijn Peeters

Keywords: 4chan, collection, discussions, distinct, images, political, political extremism, politically incorrect, spaces, structure, topic

Abstract: Over the second half of the 2010s, the /pol/ (‘politically incorrect’) forum on the 4chan image board has emerged as a space in which various extreme political ideologies are discussed and cultivated, occasionally informing off-site acts of political extremism. While previous research has often studied this space as a unified whole, it is worthwhile to demarcate the distinct publics within 4chan’s /pol/ board more precisely, rather than treating it as an ‘amorphous blob’. This paper focuses specifically on ‘generals’: recurring threads with a specific thematic focus, identified by a particular vernacular phrase or tag. Identifying these threads makes it possible to partition the board’s archive into multiple distinct datasets, each comprising discussions about a particular topic such as Donald Trump, the war in Syria, or British politics. We provide a dataset containing 58,841 opening posts and 13,697,738 replies to them, divided over 329 thematically distinct collections of general threads. In this paper we outline our data collection and query protocol, the structure of the data and its rationale, and a number of suggested research uses for this new data.
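The abstract describes identifying generals by a recurring tag and then partitioning the archive around them, but it does not spell out the query protocol. As a rough illustration of that partitioning idea only, the Python sketch below groups opening posts by a slash-delimited tag and attaches replies to the same dataset by thread ID. It assumes each post is a dict with hypothetical `thread_id`, `subject` and `comment` fields and that a general's tag looks like `/abc/`; neither the field names nor the tag pattern come from the paper, and this is not the authors' actual method.

```python
import re
from collections import defaultdict

# Assumption (hypothetical): a general marks itself with a slash-delimited
# tag such as "/abc/" in its subject or opening comment.
TAG_PATTERN = re.compile(r"/([a-z0-9]{2,10})/", re.IGNORECASE)


def general_tag(op):
    """Return the lowercased tag of an opening post, or None if it has none.

    The subject is searched before the comment, so a tag in the subject wins.
    """
    text = f"{op.get('subject', '')} {op.get('comment', '')}"
    match = TAG_PATTERN.search(text)
    return match.group(1).lower() if match else None


def partition_threads(opening_posts, replies):
    """Split opening posts and their replies into one dataset per tag.

    opening_posts: iterable of dicts with hypothetical keys
        'thread_id', 'subject' and 'comment'.
    replies: iterable of dicts with at least a 'thread_id' key.
    Returns {tag: {"ops": [...], "replies": [...]}}.
    """
    thread_to_tag = {}
    datasets = defaultdict(lambda: {"ops": [], "replies": []})

    for op in opening_posts:
        tag = general_tag(op)
        if tag is None:
            continue  # not recognised as a general; excluded from all datasets
        thread_to_tag[op["thread_id"]] = tag
        datasets[tag]["ops"].append(op)

    for reply in replies:
        # Replies inherit the tag of the thread they belong to, if any.
        tag = thread_to_tag.get(reply["thread_id"])
        if tag is not None:
            datasets[tag]["replies"].append(reply)

    return dict(datasets)
```

A real protocol would also need to filter false positives, such as an opening post that merely mentions a board name like /pol/, and to handle generals identified by a vernacular phrase rather than a tag.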

The talk and the accompanying paper are published at the ICWSM 2020 virtual conference.