ICPR2020 Paper Browser

Semantic Bilinear Pooling for Fine-Grained Recognition

Xinjie Li, Chun Yang, Song-Lu Chen, Chao Zhu, Xu-Cheng Yin

Auto-TLDR; Semantic bilinear pooling for fine-grained recognition with hierarchical label tree

Fine-grained recognition tasks, e.g., vehicle identification or bird classification, naturally come with hierarchical labels, in which fine categories are always harder to classify than coarse categories. However, most recent deep-learning-based methods neglect the semantic structure of fine-grained objects and do not take advantage of traditional fine-grained recognition techniques (e.g., coarse-to-fine classification). In this paper, we propose semantic bilinear pooling, a novel framework with a two-branch network (a coarse branch and a fine branch), for fine-grained recognition with a hierarchical label tree. This framework adaptively learns semantic information from the hierarchical levels. Specifically, we design a generalized cross-entropy loss for training the proposed framework; it exploits the semantic priors by considering the relevance between adjacent levels, and it enlarges the distance between samples of different coarse classes. Furthermore, our method uses only the fine branch at test time, so it adds no overhead to inference. Experimental results show that the proposed method achieves state-of-the-art performance on four public datasets.
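
To make the two-branch idea concrete, here is a minimal sketch of training with a coarse and a fine head over a two-level label tree. It is not the paper's generalized cross-entropy loss, whose exact form the abstract does not give: it simply adds a standard cross-entropy term per branch. The label tree FINE_TO_COARSE, the function names, and the coarse_weight parameter are all hypothetical and chosen for illustration only.

```python
import math

# Hypothetical two-level label tree: index = fine class, value = its coarse parent.
FINE_TO_COARSE = [0, 0, 1, 1]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(logits, label):
    """Standard cross-entropy for a single sample."""
    return -log_softmax(logits)[label]


def two_branch_loss(coarse_logits, fine_logits, fine_label, coarse_weight=1.0):
    """Sum of fine-branch and coarse-branch cross-entropy (illustrative stand-in,
    not the authors' generalized loss). The coarse label is read off the tree."""
    coarse_label = FINE_TO_COARSE[fine_label]
    fine_term = cross_entropy(fine_logits, fine_label)
    coarse_term = cross_entropy(coarse_logits, coarse_label)
    return fine_term + coarse_weight * coarse_term


def predict(fine_logits):
    """Test-time prediction uses only the fine branch, as the abstract describes."""
    return max(range(len(fine_logits)), key=lambda i: fine_logits[i])
```

In this sketch the coarse head only contributes to the training loss, so dropping it at test time leaves inference cost equal to that of a single-branch classifier.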

Similar papers

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Qingtao Wang, Ke Zhang, Shaoli Huang, Lianbo Zhang, Jin Fan

Auto-TLDR; Multi-Order Feature Statistical Method for Fine-Grained Visual Categorization

Exploiting Knowledge Embedded Soft Labels for Image Recognition

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Generalized Local Attention Pooling for Deep Metric Learning

Second-Order Attention Guided Convolutional Activations for Visual Recognition

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

Local Attention and Global Representation Collaborating for Fine-Grained Classification

Recurrent Deep Attention Network for Person Re-Identification

Attention Pyramid Module for Scene Recognition

Learning from Web Data: Improving Crowd Counting Via Semi-Supervised Learning

Multi-Label Contrastive Focal Loss for Pedestrian Attribute Recognition

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

Cc-Loss: Channel Correlation Loss for Image Classification

Boundary-Aware Graph Convolution for Semantic Segmentation

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

MFST: Multi-Features Siamese Tracker

Multi-Scale Residual Pyramid Attention Network for Monocular Depth Estimation

Adaptive L2 Regularization in Person Re-Identification

TinyVIRAT: Low-Resolution Video Action Recognition

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

Sketch-SNet: Deeper Subdivision of Temporal Cues for Sketch Recognition

Multi-Direction Convolution for Semantic Segmentation

Context-Aware Residual Module for Image Classification

Fast and Accurate Real-Time Semantic Segmentation with Dilated Asymmetric Convolutions

Norm Loss: An Efficient yet Effective Regularization Method for Deep Neural Networks

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Dynamic Guided Network for Monocular Depth Estimation

Skin Lesion Classification Using Weakly-Supervised Fine-Grained Method

DARN: Deep Attentive Refinement Network for Liver Tumor Segmentation from 3D CT Volume

Coarse to Fine: Progressive and Multi-Task Learning for Salient Object Detection

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation

Learnable Higher-Order Representation for Action Recognition

Building Computationally Efficient and Well-Generalizing Person Re-Identification Models with Metric Learning

Prior Knowledge about Attributes: Learning a More Effective Potential Space for Zero-Shot Recognition

Multi-Attribute Learning with Highly Imbalanced Data

Bidirectional Matrix Feature Pyramid Network for Object Detection

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Attentive Part-Aware Networks for Partial Person Re-Identification

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Attention As Activation

Augmented Bi-Path Network for Few-Shot Learning

Progressive Learning Algorithm for Efficient Person Re-Identification

VGG-Embedded Adaptive Layer-Normalized Crowd Counting Net with Scale-Shuffling Modules

Construction Worker Hardhat-Wearing Detection Based on an Improved BiFPN