Umapada Pal

Papers from this author

Chebyshev-Harmonic-Fourier-Moments and Deep CNNs for Detecting Forged Handwriting

Lokesh Nandanwar, Shivakumara Palaiahnakote, Kundu Sayani, Umapada Pal, Tong Lu, Daniel Lopresti

Auto-TLDR; Chebyshev-Harmonic-Fourier-Moments and Deep Convolutional Neural Networks for forged handwriting detection

Recently developed sophisticated image processing techniques and tools have made it easier to create high-quality forgeries of handwritten documents, including financial and property records. To detect such forgeries, this paper presents a new method that combines Chebyshev-Harmonic-Fourier-Moments (CHFM) and deep Convolutional Neural Networks (D-CNNs). Unlike existing methods, which rely on abrupt changes due to distortion created by the forgery operation, the proposed method relies on the inconsistencies and irregular changes created by forgery operations. Inspired by the special properties of CHFM, such as its ability to reconstruct an image while removing redundant information, the proposed method uses CHFM to obtain reconstructed images for the color components of the Original, Forged Noisy and Blurred classes. Motivated by the strong discriminative power of deep CNNs, the proposed method then applies deep CNNs to the reconstructed images of the respective color components for forged handwriting detection. Experimental results on our dataset and benchmark datasets (namely, the ACPR 2019, ICPR 2018 FCD and IMEI datasets) show that the proposed method outperforms existing methods in terms of classification rate.

Inception Based Deep Learning Architecture for Tuberculosis Screening of Chest X-Rays

Dipayan Das, K.C. Santosh, Umapada Pal

Auto-TLDR; End to End CNN-based Chest X-ray Screening for Tuberculosis positive patients in the severely resource constrained regions of the world

The motivation for this work is the pressing need to screen Tuberculosis (TB) positive patients in the severely resource-constrained regions of the world. Chest X-ray (CXR) is considered a promising indicator for the onset of TB, but the lack of skilled radiologists in such regions worsens the situation. Several computer-aided diagnosis (CAD) systems have therefore been proposed to solve the decision-making problem, ranging from hand-engineered feature extraction methods to deep learning or Convolutional Neural Network (CNN) based methods. Feature extraction, being a time- and resource-intensive process, often delays mass screening. Hence, an end-to-end CNN architecture is proposed in this work. Two benchmark CXR datasets are used, collected from Shenzhen (China) and Montgomery County (USA), on which the proposed methodology achieved a maximum abnormality detection accuracy (ACC) of 91.7% (0.96 AUC) and 87.47% (0.92 AUC), respectively. To the best of our knowledge, the obtained results are marginally superior to the state-of-the-art results that have solely used deep learning methodologies on the aforementioned datasets.
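
The abstract above reports two metrics, ACC and AUC. As a minimal sketch of how these are commonly computed for a binary screening task, the following uses only the Python standard library. The toy labels, scores, 0.5 threshold, and function names are hypothetical and are not taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = abnormal CXR, 0 = normal CXR.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_acc = accuracy(toy_labels, toy_scores)  # 4 of 6 correct at threshold 0.5
toy_auc = roc_auc(toy_labels, toy_scores)   # 8 of 9 positive/negative pairs ordered correctly
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports both.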

Local Gradient Difference Based Mass Features for Classification of 2D-3D Natural Scene Text Images

Lokesh Nandanwar, Shivakumara Palaiahnakote, Raghavendra Ramachandra, Tong Lu, Umapada Pal, Daniel Lopresti, Nor Badrul Anuar

Auto-TLDR; Classification of 2D and 3D Natural Scene Images Using COLD

Methods developed for normal 2D text detection do not work well for text that is rendered using decorative 3D effects. This paper proposes a new method for classifying 2D and 3D natural scene images so that an appropriate method can be chosen or modified according to the complexity of the individual classes. The proposed method explores local gradient differences to obtain candidate pixels, which represent a stroke. To study the spatial distribution of candidate pixels, we propose a measure we call COLD, which is denser for pixels toward the center of strokes and scattered for non-stroke pixels. This observation leads us to introduce mass features for extracting the regular spatial pattern of COLD, which indicates a 2D text image. The extracted features are fed to a Neural Network (NN) for classification. The proposed method is tested on both a new dataset introduced in this work and a standard dataset assembled from different natural scene datasets, and is compared with existing methods to show its effectiveness. The approach improves text detection performance significantly after classification.

Modeling Extent-Of-Texture Information for Ground Terrain Recognition

Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Auto-TLDR; Extent-of-Texture Guided Inter-domain Message Passing for Ground Terrain Recognition

Ground Terrain Recognition is a difficult task because the context information varies significantly over the regions of a ground terrain image. In this paper, we propose a novel approach to ground-terrain recognition that models the Extent-of-Texture information to establish a balance between the order-less texture component and the ordered spatial information locally. First, the proposed method uses a CNN backbone feature extractor network to capture meaningful information from a ground terrain image, and models the extent of texture and shape information locally. Then, the order-less texture information and ordered shape information are encoded in a patch-wise manner, which is used by an intra-domain message passing module to make every patch aware of the others for rich feature learning. Next, the Extent-of-Texture (EoT) Guided Inter-domain Message Passing module combines the extent of texture and shape information with the encoded texture and shape information in a patch-wise fashion, sharing knowledge to balance the order-less texture information with the ordered shape information. Further, a bilinear model generates a pairwise correlation between the order-less texture information and the ordered shape information. Finally, ground-terrain image classification is performed by a fully connected layer. The experimental results indicate superior performance of the proposed model over existing state-of-the-art techniques on publicly available datasets such as DTD, MINC and GTOS-mobile.
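
The last two steps of the pipeline above (a bilinear model followed by a fully connected layer) can be illustrated with a small sketch. It assumes the common formulation of bilinear pooling as a flattened outer product of two feature vectors; the abstract does not specify the exact formulation, and the vector sizes, weights, and function names below are hypothetical.

```python
def bilinear_features(texture, shape):
    """Flattened outer product: one entry per (texture_i, shape_j) pair."""
    return [t * s for t in texture for s in shape]


def linear_scores(features, weights, biases):
    """Fully connected layer: one score per class."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical toy inputs: a 2-d texture vector and a 3-d shape vector.
texture_vec = [1.0, 2.0]
shape_vec = [3.0, 4.0, 5.0]
pairwise = bilinear_features(texture_vec, shape_vec)  # 2 * 3 = 6 pairwise products

# Hypothetical weights for a 2-class fully connected layer over the 6 features.
fc_weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
fc_biases = [0.0, 0.0]
scores = linear_scores(pairwise, fc_weights, fc_biases)
predicted_class = scores.index(max(scores))
```

The outer product gives the classifier access to every texture-shape interaction, at the cost of a feature dimension that grows with the product of the two input sizes.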

Recognizing Bengali Word Images - A Zero-Shot Learning Perspective

Sukalpa Chanda, Daniël Arjen Willem Haitink, Prashant Kumar Prasad, Jochem Baas, Umapada Pal, Lambert Schomaker

Auto-TLDR; Zero-Shot Learning for Word Recognition in Bengali Script

Zero-Shot Learning (ZSL) techniques can classify a completely unseen class, that is, one never encountered during training. This makes them well suited to real-life classification problems in which it is not possible to train a system with annotated data for all possible class types. This work investigates recognition of word images written in Bengali script in a ZSL framework. The proposed approach performs zero-shot word recognition by coupling deep learned features obtained from the VGG16 architecture with 13 basic shapes/stroke primitives commonly observed in Bengali script characters. In keeping with the ZSL framework, those 13 basic shapes are termed “Signature Attributes”. The results are promising; evaluation was carried out in a five-fold cross-validation setup with samples from 250 word classes.
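
One common way to use such attributes in ZSL is to predict an attribute vector for an image and assign the class whose signature is most similar. The sketch below illustrates that idea with cosine similarity. It is an assumption for illustration, not necessarily how the authors combine VGG16 features with the attributes, and the class names, signatures, and predicted vector are hypothetical.

```python
import math

NUM_ATTRIBUTES = 13  # the abstract's 13 shape/stroke primitives


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_signature(predicted_attributes, class_signatures):
    """Return the class whose signature best matches the predicted attributes."""
    return max(
        class_signatures,
        key=lambda name: cosine_similarity(predicted_attributes, class_signatures[name]),
    )


# Hypothetical signatures for two word classes that were unseen during training.
unseen_signatures = {
    "word_A": [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "word_B": [0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
}

# Hypothetical attribute scores, e.g. produced by a model trained on seen classes only.
predicted = [0.1, 0.0, 0.8, 0.7, 0.2, 0.0, 0.9, 0.1, 0.0, 0.0, 0.6, 0.0, 0.7]
predicted_class = classify_by_signature(predicted, unseen_signatures)
```

Because matching happens in attribute space, a new word class can be added by supplying only its signature, with no additional training images.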

UDBNET: Unsupervised Document Binarization Network Via Adversarial Game

Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Auto-TLDR; Three-player Min-max Adversarial Game for Unsupervised Document Binarization

Degraded document image binarization is one of the most challenging tasks in the domain of document image analysis. In this paper, we present a novel approach to document image binarization by introducing a three-player min-max adversarial game. We train the network in an unsupervised setup, assuming that no paired training data is available. In our approach, an Adversarial Texture Augmentation Network (ATANet) first superimposes the texture of a degraded reference image over a clean image. The clean image and its generated degraded version then constitute the pseudo paired data used to train the Unsupervised Document Binarization Network (UDBNet). This approach also enlarges the document binarization datasets, as it generates multiple images that have the same content features but different textural features. These generated noisy images are then fed into the UDBNet to recover the clean version. The joint discriminator, which is the third player of the three-player min-max adversarial game, tries to couple the ATANet and the UDBNet. The game stops when the distributions modelled by the ATANet and the UDBNet align to the same joint distribution over time. Thus, the joint discriminator forces the UDBNet to perform better on real degraded images. The experimental results indicate the superior performance of the proposed model over existing state-of-the-art algorithms on the widely used DIBCO datasets. The source code of the proposed system is publicly available at https://github.com/VIROBO-15/UDBNET.