ICPR2020 Paper Browser

Paper download is intended for registered attendees only and is subject to the IEEE Copyright Policy. Any other use is strictly forbidden.

RWMF: A Real-World Multimodal Foodlog Database

Pengfei Zhou, Cong Bai, Kaining Ying, Jie Xia, Lixin Huang

Auto-TLDR; Real-World Multimodal Foodlog: A Real-World Foodlog Database for Diet Assistant


With growing health concerns about diet, it is worthwhile to develop an intelligent assistant that can help users eat more healthily. Such an assistant can automatically give users personalized dietary advice and generate regular health reports about their eating. To advance research on such diet assistants, we establish a real-world foodlog database using methods such as filtering, clustering, and graph convolutional networks. The database, named Real-World Multimodal Foodlog (RWMF), is built from real-world lifelog and medical data. It contains 7500 multimodal pairs; each pair consists of a food image paired with a line of personal biometric data (such as blood glucose) and a textual description of the food's composition paired with a line of food nutrition data. In this paper, we present the detailed procedure for building the database. We evaluate RWMF using different food classification and cross-modal retrieval approaches, and we test multimodal fusion on RWMF through ablation experiments. The experimental results show that RWMF is challenging and can be used to evaluate food analysis methods based on multimodal data.

Similar papers

Picture-To-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Jiatong Li, Fangda Han, Ricardo Guerrero, Vladimir Pavlovic

Auto-TLDR; PITA: A Deep Learning Architecture for Predicting the Relative Amount of Ingredients from Food Images

Multi-Task Learning for Calorie Prediction on a Novel Large-Scale Recipe Dataset Enriched with Nutritional Information

Partially Supervised Multi-Task Network for Single-View Dietary Assessment

Uncertainty-Aware Data Augmentation for Food Recognition

More Correlations Better Performance: Fully Associative Networks for Multi-Label Image Classification

Transformer Reasoning Network for Image-Text Matching and Retrieval

Webly Supervised Image-Text Embedding with Noisy Tag Refinement

Zero-Shot Text Classification with Semantically Extended Graph Convolutional Network

Price Suggestion for Online Second-Hand Items

A CNN-RNN Framework for Image Annotation from Visual Cues and Social Network Metadata

Cross-Media Hash Retrieval Using Multi-head Attention Network

A Novel Attention-Based Aggregation Function to Combine Vision and Language

Automatic Classification of Human Granulosa Cells in Assisted Reproductive Technology Using Vibrational Spectroscopy Imaging

Integrating Historical States and Co-Attention Mechanism for Visual Dialog

MEG: Multi-Evidence GNN for Multimodal Semantic Forensics

Hierarchical Multimodal Attention for Deep Video Summarization

Assessing the Severity of Health States Based on Social Media Posts

Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization

VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching

Hybrid Decomposition Convolution Neural Network and Vocabulary Forest for Image Retrieval

BAT Optimized CNN Model Identifies Water Stress in Chickpea Plant Shoot Images

Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval

Multi-Modal Identification of State-Sponsored Propaganda on Social Media

Label or Message: A Large-Scale Experimental Survey of Texts and Objects Co-Occurrence

Deep Convolutional Embedding for Digitized Painting Clustering

Automatic Annotation of Corpora for Emotion Recognition through Facial Expressions Analysis

JECL: Joint Embedding and Cluster Learning for Image-Text Pairs

Information Graphic Summarization Using a Collection of Multimodal Deep Neural Networks

RGB-Infrared Person Re-Identification Via Image Modality Conversion

VSB^2-Net: Visual-Semantic Bi-Branch Network for Zero-Shot Hashing

Fast Discrete Cross-Modal Hashing Based on Label Relaxation and Matrix Factorization

To Honor Our Heroes: Analysis of the Obituaries of Australians Killed in Action in WWI and WWII

On Identification and Retrieval of Near-Duplicate Biological Images: A New Dataset and Protocol

Face Anti-Spoofing Using Spatial Pyramid Pooling

Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology

Dual Path Multi-Modal High-Order Features for Textual Content Based Visual Question Answering

Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition

Person Recognition with HGR Maximal Correlation on Multimodal Data

A Systematic Investigation on End-To-End Deep Recognition of Grocery Products in the Wild

Attention-Based Deep Metric Learning for Near-Duplicate Video Retrieval

Prior Knowledge about Attributes: Learning a More Effective Potential Space for Zero-Shot Recognition

Multi-Graph Convolutional Network for Relationship-Driven Stock Movement Prediction

Weight Estimation from an RGB-D Camera in Top-View Configuration

Open Set Domain Recognition Via Attention-Based GCN and Semantic Matching Optimization

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Fusion of Global-Local Features for Image Quality Inspection of Shipping Label

A Systematic Investigation on Deep Architectures for Automatic Skin Lesions Classification

Learning Neural Textual Representations for Citation Recommendation