ICPR2020 Paper Browser


Ancient Document Layout Analysis: Autoencoders Meet Sparse Coding

Homa Davoudi, Marco Fiorucci, Arianna Traviglia

Auto-TLDR; Unsupervised Representation Learning for Document Layout Analysis


Layout analysis of historical handwritten documents is a key pre-processing step in document image analysis: by segmenting the image into homogeneous regions, it facilitates subsequent procedures such as optical character recognition and automatic transcription. Learning-based approaches have shown promising performance in layout analysis. However, the majority of them require tedious pixel-wise labelled training data to generalise, and the lack of large labelled datasets limits their application. This paper proposes a novel unsupervised representation learning method for document layout analysis that reduces the need for labelled data. A sparse autoencoder is first trained in an unsupervised manner on the image of a historical text document; the representations of image patches computed by the sparse encoder are then used by a feed-forward neural network to classify pixels into the region categories of the document. A new training method for the sparse encoder, inspired by the ISTA algorithm, is also introduced. Experimental results on the DIVA-HisDB dataset demonstrate that the proposed method outperforms previous approaches based on unsupervised representation learning while achieving performance comparable to state-of-the-art fully supervised methods.
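The abstract names ISTA (the iterative shrinkage-thresholding algorithm) as the inspiration for the training method but gives no further detail. For orientation only, the sketch below shows textbook ISTA for sparse coding: it approximately minimises 0.5*||x - D z||^2 + lam*||z||_1 by alternating a gradient step with soft-thresholding. It is not the authors' training procedure, and the names `ista`, `soft_threshold`, `D`, `lam`, and `step` are illustrative assumptions rather than anything taken from the paper.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M (a list of rows)."""
    return [list(col) for col in zip(*M)]


def ista(D, x, lam, step, n_iter=200):
    """Textbook ISTA: approximately minimise 0.5*||x - D z||^2 + lam*||z||_1 over z.

    D    : dictionary, one row per input dimension, one column per atom
    x    : input vector (for example a flattened image patch)
    lam  : sparsity weight (hypothetical value chosen by the caller)
    step : step size, assumed to be at most 1 / (largest eigenvalue of D^T D)
    """
    Dt = transpose(D)
    z = [0.0] * len(D[0])
    for _ in range(n_iter):
        # Gradient of the quadratic term: D^T (D z - x)
        residual = [r - xi for r, xi in zip(matvec(D, z), x)]
        grad = matvec(Dt, residual)
        # Gradient step followed by shrinkage
        z = soft_threshold([zi - step * g for zi, g in zip(z, grad)], step * lam)
    return z


# With an identity dictionary, ISTA reduces to soft-thresholding the input.
code = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=0.5, step=1.0)
print(code)  # [2.5, 0.0]
```

The small second component is driven exactly to zero, which is the sparsity-inducing behaviour that a sparse encoder is meant to reproduce in a single forward pass.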

Similar papers

Unsupervised deep learning for text line segmentation

Berat Kurar Barakat, Ahmad Droby, Reem Alaasam, Borak Madi, Irina Rabaev, Raed Shammes, Jihad El-Sana

Auto-TLDR; Unsupervised Deep Learning for Handwritten Text Line Segmentation without Annotation


We present an unsupervised deep learning method for text line segmentation that is inspired by the relative variance between text lines and the spaces between them. Handwritten text line segmentation is important for the efficiency of further processing. A common approach is to train a deep learning network to embed the document image into an image of blob lines that trace the text lines. Previous methods learned such an embedding in a supervised manner, requiring the annotation of many document images. This paper presents an unsupervised embedding of document image patches that needs no annotations. The number of foreground pixels over the text lines differs from the number of foreground pixels over the spaces between text lines. Generating similar and different pairs based on this principle inevitably produces outliers. However, as the results show, the outliers do not harm convergence, and the network learns to discriminate the text lines from the spaces between them. Remarkably, on a challenging Arabic handwritten text line segmentation dataset, VML-AHTE, we achieved superior performance over supervised methods. Additionally, the proposed method was evaluated on the ICDAR 2017 and ICFHR 2010 handwritten text line segmentation datasets.

Text Baseline Recognition Using a Recurrent Convolutional Neural Network

Matthias Wödlinger, Robert Sablatnig

Auto-TLDR; Automatic Baseline Detection of Handwritten Text Using Recurrent Convolutional Neural Network

Similar papers

Unsupervised deep learning for text line segmentation

Text Baseline Recognition Using a Recurrent Convolutional Neural Network

Writer Identification Using Deep Neural Networks: Impact of Patch Size and Number of Patches

The HisClima Database: Historical Weather Logs for Automatic Transcription and Information Extraction

Multiple Document Datasets Pre-Training Improves Text Line Detection with Deep Neural Networks

Watch Your Strokes: Improving Handwritten Text Recognition with Deformable Convolutions

Vision-Based Layout Detection from Scientific Literature Using Recurrent Convolutional Neural Networks

Multimodal Side-Tuning for Document Classification

UDBNET: Unsupervised Document Binarization Network Via Adversarial Game

Combining Deep and Ad-Hoc Solutions to Localize Text Lines in Ancient Arabic Document Images

Textual-Content Based Classification of Bundles of Untranscribed Manuscript Images

ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition

Learning to Sort Handwritten Text Lines in Reading Order through Estimated Binary Order Relations

Deep Convolutional Embedding for Digitized Painting Clustering

Image Representation Learning by Transformation Regression

Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks

LODENet: A Holistic Approach to Offline Handwritten Chinese and Japanese Text Line Recognition

A Few-Shot Learning Approach for Historical Ciphered Manuscript Recognition

Multi-Task Learning Based Traditional Mongolian Words Recognition

A Gated and Bifurcated Stacked U-Net Module for Document Image Dewarping

Variational Capsule Encoder

Recursive Recognition of Offline Handwritten Mathematical Expressions

An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

Feature-Aware Unsupervised Learning with Joint Variational Attention and Automatic Clustering

Variational Deep Embedding Clustering by Augmented Mutual Information Maximization

Learning Sparse Deep Neural Networks Using Efficient Structured Projections on Convex Constraints for Green AI

Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Explainable Feature Embedding Using Convolutional Neural Networks for Pathological Image Analysis

PICK: Processing Key Information Extraction from Documents Using Improved Graph Learning-Convolutional Networks

Online Trajectory Recovery from Offline Handwritten Japanese Kanji Characters of Multiple Strokes

A Joint Representation Learning and Feature Modeling Approach for One-Class Recognition

Improving Word Recognition Using Multiple Hypotheses and Deep Embeddings

Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents

Automated Whiteboard Lecture Video Summarization by Content Region Detection and Representation

Combining GANs and AutoEncoders for Efficient Anomaly Detection

Documents Counterfeit Detection through a Deep Learning Approach

End-To-End Hierarchical Relation Extraction for Generic Form Understanding

N2D: (Not Too) Deep Clustering Via Clustering the Local Manifold of an Autoencoded Embedding

Approach for Document Detection by Contours and Contrasts

Variational Information Bottleneck Model for Accurate Indoor Position Recognition

Handwritten Digit String Recognition Using Deep Autoencoder Based Segmentation and ResNet Based Recognition Approach

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

Recognizing Bengali Word Images - A Zero-Shot Learning Perspective

Generative Latent Implicit Conditional Optimization When Learning from Small Sample

On-Device Text Image Super Resolution

Deep Superpixel Cut for Unsupervised Image Segmentation

Phase Retrieval Using Conditional Generative Adversarial Networks