ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

Multi-Task Learning for Calorie Prediction on a Novel Large-Scale Recipe Dataset Enriched with Nutritional Information

Robin Ruede, Verena Heusser, Lukas Frank, Monica Haurilet, Alina Roitberg, Rainer Stiefelhagen

Auto-TLDR; Pic2kcal: Learning Food Recipes from Images for Calorie Estimation

Abstract

A rapidly growing amount of content posted online, such as food recipes, opens doors to new exciting applications at the intersection of vision and language. In this work, we aim to estimate the calorie amount of a meal directly from an image by learning from recipes people have published on the Internet, thus skipping time-consuming manual data annotation. Since there are few large-scale publicly available datasets captured in unconstrained environments, we propose the pic2kcal benchmark comprising 308,000 images from over 70,000 recipes including photographs, ingredients and instructions. To obtain nutritional information of the ingredients and automatically determine the ground-truth calorie value, we match the items in the recipes with structured information from a food item database. We evaluate various neural networks for regression of the calorie quantity and extend them with the multi-task paradigm. Our learning procedure combines the calorie estimation with prediction of proteins, carbohydrates, and fat amounts as well as a multi-label ingredient classification. Our experiments demonstrate clear benefits of multi-task learning for calorie estimation, surpassing the single-task calorie regression by 9.9%. To encourage further research on this task, we make the code for generating the dataset and the models publicly available.
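The multi-task objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss functions or task weights, so everything here is an assumption made for illustration: the L1 regression terms, the binary cross-entropy term for ingredients, the equal default weights, and all function and key names (`multitask_loss`, `kcal`, `macros`, `ingredients`) are hypothetical and are not taken from the authors' code.

```python
import math


def l1(pred, target):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy for multi-label ingredient prediction."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(pred, target, w_kcal=1.0, w_macro=1.0, w_ingr=1.0):
    """Weighted sum of three task losses (weights are assumed, not from the paper).

    pred / target are dicts with:
      "kcal"        -> float, calories
      "macros"      -> [protein, carbohydrates, fat]
      "ingredients" -> per-ingredient probabilities (pred) or 0/1 labels (target)
    """
    kcal = l1([pred["kcal"]], [target["kcal"]])
    macro = l1(pred["macros"], target["macros"])
    ingr = bce(pred["ingredients"], target["ingredients"])
    return w_kcal * kcal + w_macro * macro + w_ingr * ingr


# Hypothetical prediction and ground truth for a single image.
prediction = {"kcal": 300.0, "macros": [10.0, 20.0, 30.0], "ingredients": [0.5, 0.5]}
ground_truth = {"kcal": 250.0, "macros": [10.0, 20.0, 36.0], "ingredients": [1, 0]}
loss = multitask_loss(prediction, ground_truth)
```

Setting `w_macro` and `w_ingr` to zero reduces this to a single-task calorie regression loss, the kind of baseline the abstract compares the multi-task setup against.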

Similar papers

Picture-To-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Jiatong Li, Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

Auto-TLDR; PITA: A Deep Learning Architecture for Predicting the Relative Amount of Ingredients from Food Images

RWMF: A Real-World Multimodal Foodlog Database

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Uncertainty-Aware Data Augmentation for Food Recognition

A Systematic Investigation on End-To-End Deep Recognition of Grocery Products in the Wild

Self-Supervised Learning for Astronomical Image Classification

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

A CNN-RNN Framework for Image Annotation from Visual Cues and Social Network Metadata

Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes

Deep Gait Relative Attribute Using a Signed Quadratic Contrastive Loss

Price Suggestion for Online Second-Hand Items

Attentive Visual Semantic Specialized Network for Video Captioning

Bridging the Gap between Natural and Medical Images through Deep Colorization

Text Synopsis Generation for Egocentric Videos

Uncertainty-Sensitive Activity Recognition: A Reliability Benchmark and the CARING Models

Towards Tackling Multi-Label Imbalances in Remote Sensing Imagery

Emerging Relation Network and Task Embedding for Multi-Task Regression Problems

Weight Estimation from an RGB-D Camera in Top-View Configuration

Learning Neural Textual Representations for Citation Recommendation

Exploiting the Logits: Joint Sign Language Recognition and Spell-Correction

DR2S: Deep Regression with Region Selection for Camera Quality Evaluation

FC-DCNN: A Densely Connected Neural Network for Stereo Estimation

Large-Scale Historical Watermark Recognition: Dataset and a New Consistency-Based Approach

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

Predicting Chemical Properties Using Self-Attention Multi-Task Learning Based on SMILES Representation

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Smart Inference for Multidigit Convolutional Neural Network Based Barcode Decoding

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Webly Supervised Image-Text Embedding with Noisy Tag Refinement

Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study

Enriching Video Captions with Contextual Text

Video Face Manipulation Detection through Ensemble of CNNs

Detective: An Attentive Recurrent Model for Sparse Object Detection

Ballroom Dance Recognition from Audio Recordings

An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

End-To-End Hierarchical Relation Extraction for Generic Form Understanding

A Novel Attention-Based Aggregation Function to Combine Vision and Language

Learning from Web Data: Improving Crowd Counting Via Semi-Supervised Learning

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

Confidence Calibration for Deep Renal Biopsy Immunofluorescence Image Classification

Conditional Multi-Task Learning for Plant Disease Identification

How Unique Is a Face: An Investigative Study

Multimodal Side-Tuning for Document Classification

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Improving Word Recognition Using Multiple Hypotheses and Deep Embeddings

RMS-Net: Regression and Masking for Soccer Event Spotting