Mega-COV: A billion-scale dataset of 100+ languages for COVID-19

19/04/2021

Muhammad Abdul-Mageed, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Dinesh Pabbi, Kunal Verma, Rannie Lin

Abstract: We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covering 268 countries), longitudinal (going as far back as 2007), multilingual (spanning 100+ languages), and has a significant number of location-tagged tweets (~169M). We release the tweet IDs of the dataset. We also develop two models: one for identifying whether or not a tweet is related to the pandemic (best F1 = 97%) and another for detecting misinformation about COVID-19 (best F1 = 92%). A human annotation study on a subset of Mega-COV demonstrates the utility of our models. Our data and models can be useful for studying a wide range of phenomena related to the pandemic. Mega-COV and our models are publicly available.

The talk and the accompanying paper were published at the EACL 2021 virtual conference.

Similar Papers

07/06/2021

The Healthy States of America: Creating a Health Taxonomy with Social Media

Sanja Šćepanović, Luca Maria Aiello, Ke Zhou, Sagar Joglekar, Daniele Quercia

Keywords: Qualitative and quantitative studies of social media, Credibility of online content, Measuring predictability of real world phenomena based on social media, e.g., spanning politics, finance, and health

08/12/2020

COVID-19 Twitter Monitor: Aggregating and Visualizing COVID-19 Related Trends in Social Media

Joseph Cornelius, Tilia Ellendorff, Lenz Furrer, Fabio Rinaldi

19/10/2020

ReCOVery: A multimodal repository for COVID-19 news credibility research

Xinyi Zhou, Apurva Mulay, Emilio Ferrara, Reza Zafarani

Keywords: coronavirus, covid-19, fake news, infodemic, information credibility, multimodal, repository, pandemic, social media

08/12/2020

Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking

Rutvik Vijjali, Prathyush Potluri, Siddharth Kumar, Sundeep Teki

07/06/2020

Variation across Scales: Measurement Fidelity under Twitter Data Sampling

Siqi Wu, Marian-Andrei Rizoiu, Lexing Xie

Keywords: attention, bias, cascades, changes, collection, graphs, influences, measures, networks, rates, structure, terms, tweets, twitter

19/10/2020

MiNet: Mixed interest network for cross-domain click-through rate prediction

Wentao Ouyang, Xiuwu Zhang, Lei Zhao, Jinmei Luo, Yu Zhang, Heng Zou, Zhaojie Liu, Yanlong Du

Keywords: cross-domain click-through rate prediction, click-through rate prediction, computational advertising, online advertising, deep learning

07/06/2021

Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms

Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov

Keywords: Credibility of online content, Text categorization, topic recognition, demographic/gender/age identification

19/10/2020

CovidExplorer: A multi-faceted AI-based search and visualization engine for COVID-19 information

Heer Ambavi, Kavita Vaishnaw, Udit Vyas, Abhisht Tiwari, Mayank Singh

Keywords: visualization, search, social media, coronaviruses, covid-19

07/06/2021

HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

Firoj Alam, Umair Qazi, Muhammad Imran, Ferda Ofli

Keywords: Qualitative and quantitative studies of social media, Text categorization, topic recognition, demographic/gender/age identification, Measuring predictability of real world phenomena based on social media, e.g., spanning politics, finance, and health

07/06/2020

Analysing the Extent of Misinformation in Cancer Related Tweets

Rakesh Bal, Sayan Sinha, Swastika Dutta, Risabh Joshi, Sayan Ghosh, Ritam Dutt

Keywords: cancer, claims, deep learning, detection, learning, linguistic, misinformation, spread, texts, tweets, twitter

08/12/2020

Ensemble BERT for Classifying Medication-mentioning Tweets

Huong Dang, Kahyun Lee, Sam Henry, Özlem Uzuner

07/06/2021

COVID-19 Coverage By Cable and Broadcast Networks

Ceren Budak, Ashley Muddiman, Yujin Kim, Caroline C. Murray, Natalie J. Stroud

Keywords: Analysis of the relationship between social media and mainstream media, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior

04/07/2020

Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup

Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea

Keywords: Cross-Lingual Classification, Distinguishing messages, disaster management, multi-label tweets

07/06/2021

Misinformation Adoption or Rejection in the Era of COVID-19

Maxwell Weinzierl, Suellen Hopfer, Sanda M. Harabagiu

Keywords: Qualitative and quantitative studies of social media, Credibility of online content, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior, Organizational and group be

08/12/2020

Multilingual Emoticon Prediction of Tweets about COVID-19

Stefanos Stoikos, Mike Izbicki

07/06/2020

A Quantitative Approach to Understanding Online Antisemitism

Savvas Zannettou, Joel Finkelstein, Barry Bradlyn, Jeremy Blackburn

Keywords: 4chan, cases, changes, communities, elections, embeddings, events, groups, images, influences, large_scale, large_scale quantitative, measures, memes, political, politically incorrect, presidential election, rates, reddit, rest, spread, tools, traditional, trends, twitter, twitter reddit

02/02/2021

Twitter Event Summarization by Exploiting Semantic Terms and Graph Network

Quanzhi Li, Qiong Zhang

04/07/2020

Prta: A System to Support the Analysis of Propaganda Techniques in the News

Giovanni Da San Martino, Shaden Shaar, Yifan Zhang, Seunghak Yu, Alberto Barrón-Cedeño, Preslav Nakov

Keywords: online disinformation, fact-checking detection, disinformation detection, media thinking

29/06/2020

Need for tweet: How open source developers talk about their GitHub work on twitter

Hongbo Fang, Daniel Klug, Hemank Lamba, James Herbsleb, Bogdan Vasilescu

08/12/2020

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Md Saroar Jahan

19/10/2020

A large test collection for entity aspect linking

Jordan Ramsdell, Laura Dietz

Keywords: dataset, reference method, entity aspect linking

16/11/2020

VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

Mingzhe Li, Xiuying Chen, Shen Gao, Zhangming Chan, Dongyan Zhao, Rui Yan

Keywords: video-based summarization, human evaluations, vmsmo, dual-interaction-based summarizer

07/06/2021

Automatic Discovery of Political Meme Genres with Diverse Appearances

William Theisen, Joel Brogan, Pamela Bilo Thomas, Daniel Moreira, Pascal Phoa, Tim Weninger, Walter Scheirer

Keywords: Studies of digital humanities (culture, history, arts) using social media, Qualitative and quantitative studies of social media, Trend identification and tracking, time series forecasting, Organizational and group behavior mediated by social media, interpe

02/02/2021

Segmentation of Tweets with URLs and its Applications to Sentiment Analysis

Abdullah Aljebreen, Weiyi Meng, Eduard Dragut

29/06/2020

20-MAD: 20 years of issues and commits of mozilla and apache development

Maëlick Claes, Mika V. Mäntylä

07/06/2021

VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter

Anton Abilov, Yiqing Hua, Hana Matatov, Ofra Amir, Mor Naaman

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery

23/07/2020

Extracting medical entities from social media

Sanja Scepanovic, Enrique Martin-Lopez, Daniele Quercia, Khan Baykaner

Keywords: Applied computing, Life and medical sciences, Health informatics, Computing methodologies, Artificial intelligence, Natural language processing

04/07/2020

Fine-grained Interest Matching for Neural News Recommendation

Heyuan Wang, Fangzhao Wu, Zheng Liu, Xing Xie

Keywords: Fine-grained Matching, Neural Recommendation, Personalized recommendation, news recommendation

08/12/2020

Incorporating Count-Based Features into Pre-Trained Models for Improved Stance Detection

Anushka Prakash, Harish Tayyar Madabushi

16/11/2020

Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings

Yue Wang, Jing Li, Michael Lyu, Irwin King

Keywords: keyphrase prediction, text modeling, classification, generation

07/06/2020

An Experimental Study of Structural Diversity in Social Networks

Jessica Su, Krishna Kamath, Aneesh Sharma, Johan Ugander, Sharad Goel

Keywords: cases, causal, changes, common, engagement, groups, large_scale, networks, rates, relationships, retention rates, twitter

07/06/2021

Classifying Reasonability in Retellings of Personal Events Shared on Social Media: A Preliminary Case Study with /r/AmITheAsshole

Ethan Haworth, Ted Grover, Justin Langston, Ankush Patel, Joseph West, Alex C. Williams

Keywords: Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior, Trend identification and tracking, time series forecasting, Measuring predictability of real world phenomena bas

06/12/2021

VigDet: Knowledge Informed Neural Temporal Point Process for Coordination Detection on Social Media

Yizhou Zhang, Karishma Sharma, Yan Liu

Keywords: generative model

19/10/2020

Relevance ranking for real-time tweet search

Yan Xia, Yu Sun, Tian Wang, Juan Caicedo Carvajal, Jinliang Fan, Bhargav Mangipudi, Lisa Huang, Yatharth Saraf

Keywords: tweet search, social network, large-scale ml system

06/07/2020

On the limits of cross-domain generalization in automated X-ray prediction

Joseph Paul Cohen, Mohammad Hashir, Rupert Brooks, Hadrien Bertrand

07/06/2020

Mining Archive.org’s Twitter Stream Grab for Pharmacovigilance Research Gold

Ramya Tekumalla, Javad Rafiei Asl, Juan M. Banda

Keywords: building, learning, trends, tweets, twitter

07/06/2021

ABOME: A Multi-platform Data Repository of Artificially Boosted Online Media Entities

Hridoy Sankar Dutta, Udit Arora, Tanmoy Chakraborty

Keywords: New social media applications, interfaces, interaction techniques

14/09/2020

FireAnt: Claim-based Medical Misinformation Detection and Monitoring

Branislav Pecher, Ivan Srba, Robert Moro, Matus Tomlein, Maria Bielikova

Keywords: medical misinformation, claim-based detection, fireant

04/07/2020

Code and Named Entity Recognition in StackOverflow

Jeniya Tabassum, Mounica Maddela, Wei Xu, Alan Ritter

Keywords: Named Recognition, computer domain, StackOverflow, NLP techniques

19/08/2021

A Survey on Universal Adversarial Attack

Chaoning Zhang, Philipp Benz, Chenguo Lin, Adil Karjauv, Jing Wu, In So Kweon

Keywords: Machine learning, General, General
