ICPR2020 Paper Browser

Paper download is intended for registered attendees only, and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization

Qingtao Wang, Ke Zhang, Shaoli Huang, Lianbo Zhang, Jin Fan

Auto-TLDR; Multi-Order Feature Statistical Method for Fine-Grained Visual Categorization

Fine-grained visual categorization aims to learn a robust image representation that models subtle differences between similar categories. Existing methods in this field tackle the problem by designing complex frameworks, which produce high-level features by performing first-order or second-order pooling. Despite the impressive performance achieved by these strategies, single-order networks carry only linear or non-linear information from the last convolutional layer, neglecting the fact that features of different orders are mutually complementary. In this paper, we propose a Multi-Order Feature Statistical Method (MOFS), which learns fine-grained features characterizing multiple orders. Specifically, MOFS consists of two sub-modules: (i) a first-order module modeling both mid-level and high-level features; (ii) a covariance feature statistical module capturing high-order features. By deploying these two sub-modules on top of existing backbone networks, MOFS simultaneously captures multiple levels of discriminative patterns, including local, global, and correlated patterns. We evaluate the proposed method on three challenging benchmarks, namely CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Compared with state-of-the-art methods, experimental results show superior performance in recognizing fine-grained objects.
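
The distinction the abstract draws between first-order and second-order (covariance) pooling can be illustrated with a minimal NumPy sketch. This is not the authors' MOFS implementation: the function names, the (C, H, W) feature-map layout, and the simple concatenation of the two statistics are assumptions made only for illustration.

```python
import numpy as np


def first_order_pool(features):
    """Global average pooling: (C, H, W) -> (C,) per-channel means."""
    return features.mean(axis=(1, 2))


def covariance_pool(features):
    """Second-order pooling: (C, H, W) -> (C, C) channel covariance matrix."""
    c = features.shape[0]
    x = features.reshape(c, -1)              # C x N, with N = H * W positions
    x = x - x.mean(axis=1, keepdims=True)    # center each channel
    return x @ x.T / x.shape[1]              # normalize by N (biased estimate)


def multi_order_descriptor(features):
    """Hypothetical fusion: concatenate the mean vector with the upper
    triangle of the covariance matrix (the matrix is symmetric)."""
    mean_vec = first_order_pool(features)
    cov = covariance_pool(features)
    upper = np.triu_indices(cov.shape[0])
    return np.concatenate([mean_vec, cov[upper]])


# Stand-in for a backbone's last convolutional feature map (assumed shape).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 7, 7))
desc = multi_order_descriptor(fmap)
print(desc.shape)  # (44,) = 8 means + 8 * 9 / 2 covariance entries
```

The first-order vector summarizes how strongly each channel responds on average, while the covariance matrix records how pairs of channels co-vary across spatial positions, which is the kind of correlated pattern the abstract refers to.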

Similar papers

Semantic Bilinear Pooling for Fine-Grained Recognition

Xinjie Li, Chun Yang, Song-Lu Chen, Chao Zhu, Xu-Cheng Yin

Auto-TLDR; Semantic bilinear pooling for fine-grained recognition with hierarchical label tree

Exploiting Knowledge Embedded Soft Labels for Image Recognition

Second-Order Attention Guided Convolutional Activations for Visual Recognition

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Dual-Attention Guided Dropblock Module for Weakly Supervised Object Localization

Feature Fusion for Online Mutual Knowledge Distillation

Aggregating Object Features Based on Attention Weights for Fine-Grained Image Retrieval

An Improved Bilinear Pooling Method for Image-Based Action Recognition

Local Attention and Global Representation Collaborating for Fine-Grained Classification

Attention Pyramid Module for Scene Recognition

Attention-Based Selection Strategy for Weakly Supervised Object Localization

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

Learnable Higher-Order Representation for Action Recognition

PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation

FastSal: A Computationally Efficient Network for Visual Saliency Prediction

MFI: Multi-Range Feature Interchange for Video Action Recognition

Generalized Local Attention Pooling for Deep Metric Learning

Skin Lesion Classification Using Weakly-Supervised Fine-Grained Method

TAAN: Task-Aware Attention Network for Few-Shot Classification

Learning a Dynamic High-Resolution Network for Multi-Scale Pedestrian Detection

Context-Aware Residual Module for Image Classification

Zoom-CAM: Generating Fine-Grained Pixel Annotations from Image Labels

A Novel Region of Interest Extraction Layer for Instance Segmentation

Enhanced Feature Pyramid Network for Semantic Segmentation

Multi-Attribute Learning with Highly Imbalanced Data

Bidirectional Matrix Feature Pyramid Network for Object Detection

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Attentive Hybrid Feature Based a Two-Step Fusion for Facial Expression Recognition

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Cc-Loss: Channel Correlation Loss for Image Classification

Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion

Multi-Label Contrastive Focal Loss for Pedestrian Attribute Recognition

Real-Time Semantic Segmentation Via Region and Pixel Context Network

SFPN: Semantic Feature Pyramid Network for Object Detection

Dynamic Guided Network for Monocular Depth Estimation

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification

Boundary-Aware Graph Convolution for Semantic Segmentation

Efficient Online Subclass Knowledge Distillation for Image Classification

Channel Planting for Deep Neural Networks Using Knowledge Distillation

MANet: Multimodal Attention Network Based Point-View Fusion for 3D Shape Recognition

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Prior Knowledge about Attributes: Learning a More Effective Potential Space for Zero-Shot Recognition

VGG-Embedded Adaptive Layer-Normalized Crowd Counting Net with Scale-Shuffling Modules

Object Detection Model Based on Scene-Level Region Proposal Self-Attention

Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

Augmented Bi-Path Network for Few-Shot Learning