19/10/2020

A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias

Michael Färber, Victoria Burkard, Adam Jatowt, Sora Lim

Keywords: crowdsourcing, text mining, media bias, news articles

Abstract: The automatic detection of bias in news articles can have a high impact on society because undiscovered news bias may influence the political opinions, social views, and emotions of readers. While various analyses and approaches to news bias detection have been proposed, large datasets with rich, fine-grained bias annotations are still missing. In this paper, we first consolidate the aspects of news bias discussed in related work into a new annotation schema for labeling news bias. This schema covers the overall bias as well as the bias dimensions (1) hidden assumptions, (2) subjectivity, and (3) representation tendencies. Second, we propose a crowdsourcing-based methodology for obtaining a large dataset for news bias analysis and identification. We then use our methodology to create a dataset consisting of more than 2,000 sentences annotated with 43,000 bias and bias-dimension labels. Third, we perform an in-depth analysis of the collected data. We show that the annotation task is difficult both for overall bias and for specific bias dimensions. While crowdworkers’ labels of representation tendencies correlate with experts’ bias labels for articles, crowdworkers’ labels of subjectivity and hidden assumptions do not correlate with experts’ bias labels and, thus, seem to be less relevant when creating datasets with crowdworkers. The experts’ article labels match the crowdworkers’ inferred article labels better than the crowdworkers’ sentence labels. The crowdworkers’ countries of origin seem to affect their judgements. In our study, non-Western crowdworkers tend to annotate more bias, either directly or in the form of bias dimensions (e.g., subjectivity), than Western crowdworkers do.
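The abstract does not state how sentence-level crowd labels are turned into article-level labels or compared with expert labels. The following is a minimal, hypothetical Python sketch, not the authors' pipeline, assuming binary per-sentence judgements, majority voting per sentence (ties count as "not present"), an article score defined as the share of sentences voted present, and Pearson correlation against a numeric expert rating per article. All identifiers (`Annotation`, `sentence_labels`, `article_scores`, `pearson`, the dimension spellings) are illustrative assumptions.

```python
from dataclasses import dataclass
from math import sqrt

# Label types from the schema: overall bias plus the three bias dimensions.
# These identifier spellings are hypothetical.
DIMENSIONS = ("bias", "hidden_assumptions", "subjectivity", "representation_tendencies")


@dataclass(frozen=True)
class Annotation:
    """One crowdworker's binary judgements for one sentence (assumed format)."""
    article_id: str
    sentence_id: str
    worker_id: str
    labels: dict  # dimension name -> True if the worker marked it as present

    def __post_init__(self):
        # Reject label keys that are not part of the assumed schema.
        unknown = set(self.labels) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")


def sentence_labels(annotations, dimension):
    """Majority vote per (article, sentence); a tie counts as not present."""
    votes = {}
    for a in annotations:
        if dimension in a.labels:
            key = (a.article_id, a.sentence_id)
            votes.setdefault(key, []).append(a.labels[dimension])
    return {key: sum(v) > len(v) / 2 for key, v in votes.items()}


def article_scores(annotations, dimension):
    """Infer an article-level score: the share of its sentences voted present."""
    per_article = {}
    for (article_id, _), present in sentence_labels(annotations, dimension).items():
        per_article.setdefault(article_id, []).append(present)
    return {art: sum(v) / len(v) for art, v in per_article.items()}


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `article_scores(annotations, "representation_tendencies")` yields one score per article; pairing those scores with expert ratings in sorted article order and passing both lists to `pearson` gives one way to check whether a dimension tracks the experts' judgements. Majority voting is only one simple aggregation choice; weighted or annotator-reliability-aware schemes are common alternatives.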

Video of the talk: https://dl.acm.org/doi/10.1145/3340531.3412876#sec-supp
The talk and the corresponding paper were published at the CIKM 2020 virtual conference.