ICPR2020 Paper Browser

PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks

Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao

Auto-TLDR; PICK: A Graph Learning Framework for Key Information Extraction from Documents

Computer vision with state-of-the-art deep learning models has recently achieved huge success in Optical Character Recognition (OCR), including text detection and recognition tasks. However, Key Information Extraction (KIE) from documents, the downstream task of OCR with a large number of real-world use scenarios, remains a challenge because documents have not only textual features extracted by OCR systems but also semantic visual features that are not fully exploited and play a critical role in KIE. Too little work has been devoted to making full and efficient use of both the textual and visual features of documents. In this paper, we introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE by combining graph learning with graph convolution operations, yielding a richer semantic representation that contains the textual and visual features and the global layout without ambiguity. Extensive experiments on real-world datasets show that our method outperforms baseline methods by significant margins.

Similar papers

Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents

Manuel Carbonell, Pau Riba, Mauricio Villegas, Alicia Fornés, Josep Llados

Auto-TLDR; Graph Neural Network for Entity Recognition and Relation Extraction in Semi-Structured Documents

The use of administrative documents to communicate and keep a record of business information requires methods able to automatically extract and understand the content of such documents in a robust and efficient way. In addition, the semi-structured nature of these reports is especially suited to graph-based representations, which are flexible enough to adapt to the deformations across different document templates. Moreover, Graph Neural Networks provide the proper methodology to learn relations among the data elements in these documents. In this work we study the use of Graph Neural Network architectures to tackle the problem of entity recognition and relation extraction in semi-structured documents. Our approach achieves state-of-the-art results on the three tasks involved in the process. Moreover, experiments with two datasets of different natures demonstrate the good generalization ability of our approach.

GCNs-Based Context-Aware Short Text Similarity Model

Xiaoqi Sun

Auto-TLDR; Context-Aware Graph Convolutional Network for Text Similarity

End-To-End Hierarchical Relation Extraction for Generic Form Understanding

Reinforcement Learning with Dual Attention Guided Graph Convolution for Relation Extraction

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network

Label Incorporated Graph Neural Networks for Text Classification

Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

A Multi-Head Self-Relation Network for Scene Text Recognition

Multi-Modal Contextual Graph Neural Network for Text Visual Question Answering

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding

Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images

Multimodal Side-Tuning for Document Classification

Global Context-Based Network with Transformer for Image2latex

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

CKG: Dynamic Representation Based on Context and Knowledge Graph

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Edge-Aware Graph Attention Network for Ratio of Edge-User Estimation in Mobile Networks

ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

Multi-Task Learning Based Traditional Mongolian Words Recognition

KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding

Automatic Student Network Search for Knowledge Distillation

Learning Neural Textual Representations for Citation Recommendation

Sketch-SNet: Deeper Subdivision of Temporal Cues for Sketch Recognition

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Boundary-Aware Graph Convolution for Semantic Segmentation

Equation Attention Relationship Network (EARN): A Geometric Deep Metric Framework for Learning Similar Math Expression Embedding

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Region and Relations Based Multi Attention Network for Graph Classification

Transformer Reasoning Network for Image-Text Matching and Retrieval

LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese Text Line Recognition

MEAN: A Multi-Element Attention Based Network for Scene Text Recognition

On the Global Self-attention Mechanism for Graph Convolutional Networks

Text Synopsis Generation for Egocentric Videos

Context Visual Information-Based Deliberation Network for Video Captioning

Attentive Visual Semantic Specialized Network for Video Captioning

Improving Word Recognition Using Multiple Hypotheses and Deep Embeddings

Efficient Sentence Embedding Via Semantic Subspace Analysis

Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning

Adversarial Training for Aspect-Based Sentiment Analysis with BERT

Object Detection Using Dual Graph Network

What Nodes Vote To? Graph Classification without Readout Phase

Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification

Enriching Video Captions with Contextual Text

Privacy Attributes-Aware Message Passing Neural Network for Visual Privacy Attributes Classification

A Novel Attention-Based Aggregation Function to Combine Vision and Language

2D License Plate Recognition based on Automatic Perspective Rectification

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification