ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Weight Estimation from an RGB-D Camera in Top-View Configuration

Marco Mameli, Marina Paolanti, Nicola Conci, Filippo Tessaro, Emanuele Frontoni, Primo Zingaretti

Auto-TLDR; Top-View Weight Estimation using Deep Neural Networks

Soft biometrics aims to provide information about the physical and behavioural characteristics of a person. This paper focuses on body weight estimation from observations captured by a top-view RGB-D camera. The ability to estimate a person's weight is useful in many applications, from health-related scenarios to business intelligence and retail analytics. To address this task, a TVWE (Top-View Weight Estimation) framework is proposed. The approach relies on Deep Neural Networks (DNNs) trained on depth data. The top section of each network has been modified to replace classification with prediction of the weight value. The performance of five state-of-the-art DNNs is compared: VGG16, ResNet, Inception, DenseNet and EfficientNet. A convolutional auto-encoder is also included for completeness. Given the limited literature in this domain, the TVWE framework has been evaluated on a new publicly available dataset, the “VRAI Weight estimation Dataset”, which collects, for each subject, labels for weight, gender, and height. The experimental results show that the proposed methods are suitable for this task and offer insights for applying the solution in different domains.
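As a rough illustration of what replacing a classification top with a prediction output can look like, the following pure-Python sketch pools a small feature map and passes it through a single linear unit that returns one real number. It is not the authors' implementation: the names `global_average_pool`, `RegressionHead` and `mean_absolute_error`, the use of average pooling, and the choice of error metric are all assumptions made for the example.

```python
import random


def global_average_pool(feature_map):
    """Average each channel of a C x H x W nested list into a single value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


class RegressionHead:
    """Hypothetical stand-in for a classifier top: one linear unit, no softmax."""

    def __init__(self, in_features, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; a real network would learn these.
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(in_features)]
        self.bias = 0.0

    def __call__(self, features):
        # A single scalar output (e.g. a weight in kg) instead of class scores.
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def mean_absolute_error(predictions, targets):
    """Average absolute difference, one possible way to score a regressor."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    # Toy 2-channel, 2x2 "backbone output"; values are illustrative only.
    features = global_average_pool([[[1.0, 2.0], [3.0, 4.0]],
                                    [[0.0, 0.0], [0.0, 0.0]]])
    head = RegressionHead(in_features=len(features))
    print("predicted value:", head(features))
```

In a deep learning framework the same idea would usually be expressed by dropping the final classification layer of a pretrained backbone and attaching a one-unit dense layer trained with a regression loss.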

Similar papers

RefiNet: 3D Human Pose Refinement with Depth Maps

Andrea D'Eusanio, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara

Auto-TLDR; RefiNet: A Multi-stage Framework for 3D Human Pose Estimation

Human Pose Estimation is a fundamental task for many applications in the Computer Vision community and has been widely investigated in the 2D domain, i.e. on intensity images. As a result, most of the available methods for this task are based on 2D Convolutional Neural Networks and huge manually annotated RGB datasets, achieving impressive results. In this paper, we propose RefiNet, a multi-stage framework that regresses an extremely precise 3D human pose estimate from a given 2D pose and a depth map. The framework consists of three modules, each specialized in a particular refinement and data representation, i.e. depth patches, 3D skeleton and point clouds. Moreover, we collect a new dataset, named Baracca, acquired with RGB, depth and thermal cameras and created specifically for the automotive context. Experimental results confirm the quality of the refinement procedure, which largely improves the human pose estimates of off-the-shelf 2D methods.

Surface Material Dataset for Robotics Applications (SMDRA): A Dataset with Friction Coefficient and RGB-D for Surface Segmentation

Donghun Noh, Hyunwoo Nam, Min Sung Ahn, Hosik Chae, Sangjoon Lee, Kyle Gillespie, Dennis Hong

Auto-TLDR; A Surface Material Dataset for Robotics Applications

IPT: A Dataset for Identity Preserved Tracking in Closed Domains

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Enhancing Deep Semantic Segmentation of RGB-D Data with Entangled Forests

Anomaly Detection, Localization and Classification for Railway Inspection

Video Analytics Gait Trend Measurement for Fall Prevention and Health Monitoring

Vision-Based Multi-Modal Framework for Action Recognition

RISEdb: A Novel Indoor Localization Dataset

Two-Level Attention-Based Fusion Learning for RGB-D Face Recognition

A Systematic Investigation on End-To-End Deep Recognition of Grocery Products in the Wild

Rotational Adjoint Methods for Learning-Free 3D Human Pose Estimation from IMU Data

Automatic Semantic Segmentation of Structural Elements related to the Spinal Cord in the Lumbar Region by Using Convolutional Neural Networks

Benchmarking Cameras for OpenVSLAM Indoors

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Early Wildfire Smoke Detection in Videos

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Fine-Tuning Convolutional Neural Networks: A Comprehensive Guide and Benchmark Analysis for Glaucoma Screening

Inner Eye Canthus Localization for Human Body Temperature Screening

Depth Videos for the Classification of Micro-Expressions

Attribute-Based Quality Assessment for Demographic Estimation in Face Videos

Gender Classification Using Video Sequences of Body Sway Recorded by Overhead Camera

Better Prior Knowledge Improves Human-Pose-Based Extrinsic Camera Calibration

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

A Quantitative Evaluation Framework of Video De-Identification Methods

NetCalib: A Novel Approach for LiDAR-Camera Auto-Calibration Based on Deep Learning

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

PHNet: Parasite-Host Network for Video Crowd Counting

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Location Prediction in Real Homes of Older Adults based on K-Means in Low-Resolution Depth Videos

Silhouette Body Measurement Benchmarks

From Early Biological Models to CNNs: Do They Look Where Humans Look?

A Cross Domain Multi-Modal Dataset for Robust Face Anti-Spoofing

A Comparison of Neural Network Approaches for Melanoma Classification

P2D: A Self-Supervised Method for Depth Estimation from Polarimetry

Wireless Localisation in WiFi Using Novel Deep Architectures

Anticipating Activity from Multimodal Signals

Which are the factors affecting the performance of audio surveillance systems?

Activity Recognition Using First-Person-View Cameras Based on Sparse Optical Flows

EdgeNet: Semantic Scene Completion from a Single RGB-D Image

Deep Gait Relative Attribute Using a Signed Quadratic Contrastive Loss

Deep Convolutional Embedding for Digitized Painting Clustering

Holistic Grid Fusion Based Stop Line Estimation

Confidence Calibration for Deep Renal Biopsy Immunofluorescence Image Classification

LFIR2Pose: Pose Estimation from an Extremely Low-Resolution FIR Image Sequence

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Real Time Fencing Move Classification and Detection at Touch Time During a Fencing Match