VisualMRC: Machine Reading Comprehension on Document Images

Abstract: Recent studies on machine reading comprehension have focused on text-level understanding but have not yet reached the level of human understanding of the visual layout and content of real-world documents. In this study, we introduce a new visual machine reading comprehension dataset, named VisualMRC, wherein given a question and a document image, a machine reads and comprehends texts in the image to answer the question in natural language. Compared with existing visual question answering datasets that contain texts in images, VisualMRC focuses more on developing natural language understanding and generation abilities. It contains 30,000+ pairs of a question and an abstractive answer for 10,000+ document images sourced from multiple domains of webpages. We also introduce a new model that extends existing sequence-to-sequence models, pre-trained with large-scale text corpora, to take into account the visual layout and content of documents. Experiments with VisualMRC show that this model outperformed the base sequence-to-sequence models and a state-of-the-art VQA model. However, its performance is still below that of humans on most automatic evaluation metrics. The dataset will facilitate research aimed at connecting vision and language understanding.

08/12/2020

26/04/2020

Ryota Tanaka, Kyosuke Nishida, Sen Yoshida

Similar Papers

Context Dependent Semantic Parsing: A Survey

Zhuang Li, Lizhen Qu, Gholamreza Haffari

Retrospective Reader for Machine Reading Comprehension

Zhuosheng Zhang, Junjie Yang, Hai Zhao

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

fine-grained recognition, weakly-supervised recognition, fine-grained retrieval, unsupervised recognition, image-to-text retrieval, text-to-image retrieval, image classification

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

reading comprehension, logical reasoning, natural language processing

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel Weld

Document-level Learning, Representation learning, natural systems, classification

Incremental Neural Lexical Coherence Modeling

Sungho Jeon, Michael Strube

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?

Cansu Sen, Thomas Hartvigsen, Biao Yin, Xiangnan Kong, Elke Rundensteiner

Text Classification, quantitative mechanisms, text task, large-scale study

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Investigating reading behavior in fine-grained relevance judgment

Zhijing Wu, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

eye-tracking, relevance judgment, passage-level cumulative gain

Benchmarking machine reading comprehension: A psychological perspective

Saku Sugawara, Pontus Stenetorp, Akiko Aizawa

Multimodal Transformer for Multimodal Machine Translation

Shaowei Yao, Xiaojun Wan

Multimodal MMT, Multimodal, MMT, representation images

Shaping Visual Representations with Language for Few-Shot Classification

Jesse Mu, Percy Liang, Noah Goodman

Few-Shot Classification, human learning, supervision, machine models

Attending to inter-sentential features in neural text classification

Billy Chiu, Sunil Kumar Sahu, Neha Sengupta, Derek Thomas, Mohammady Mahdy

graph network, hybrid neural network, attention mechanism

Inquisitive Question Generation for High Level Text Comprehension

Wei-Jen Ko, Te-yuan Chen, Yiyan Huang, Greg Durrett, Junyi Jessy Li

inquisitive questions, automatic systems, text comprehension, data-driven approaches

When Computational Representation Meets Neuroscience: A Survey on Brain Encoding and Decoding

Lu Cao, Dandan Huang, Yue Zhang

Humans and AI, General, General

Query by Strings and Return Ranking Word Regions with Only One Look

Peng Zhao, Wenyuan Xue, Qingyong Li, Siqi Cai

You Don't Have Time to Read This: An Exploration of Document Reading Time Prediction

Orion Weller, Jordan Hildebrandt, Ilya Reznik, Christopher Challis, E. Shannon Tass, Quinn Snell, Kevin Seppi

Exploration Prediction, Predicting time, human processing, machine methods

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Applications - Language, Speech and Dialog
