Fast Region-Adaptive Defogging and Enhancement for Outdoor Images Containing Sky

Zhan Li, Xiaopeng Zheng, Bir Bhanu, Shun Long, Qingfeng Zhang, Zhenghao Huang

Auto-TLDR; Image defogging and enhancement of hazy outdoor scenes using region-adaptive segmentation and region-ratio-based adaptive Gamma correction

Inclement weather, haze, and fog severely degrade the performance of outdoor imaging systems. Because outdoor scenes span a large range of depths, most image dehazing or enhancement methods suffer from color distortions and halo artifacts when applied to real-world hazy outdoor scenes, especially those containing sky. To effectively recover details in both distant and nearby regions while preserving the color fidelity of the sky, we propose a novel image defogging and enhancement approach based on a replaceable plug-in segmentation module and region-adaptive processing. First, regions of grayish sky, pure white objects, and other parts are separated. Several segmentation methods are studied, including the efficient threshold-based method used in this work. Second, a luminance-inverted multi-scale Retinex with color restoration (MSRCR) and region-ratio-based adaptive Gamma correction are applied to the non-grayish and non-white areas. Finally, the enhanced regions are stitched seamlessly using a mean-filtered region mask. The proposed method is efficient in defogging natural outdoor scenes and requires no training data or prior knowledge. Extensive experiments show that the proposed approach not only outperforms several state-of-the-art defogging methods in terms of both visibility and color fidelity but also produces enhanced outputs with fewer artifacts and halos, particularly in sky regions.
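The segment-enhance-stitch idea can be illustrated with a short NumPy sketch. It is not the authors' implementation: the HSV thresholds in `grayish_white_mask` are hypothetical, the MSRCR step is omitted, and `adaptive_gamma` is a stand-in that picks the exponent mapping the region's mean luminance to 0.5 rather than the paper's region-ratio rule. Only the final step, blending with a mean-filtered (box-filtered) region mask, follows the abstract directly.

```python
import numpy as np

def grayish_white_mask(img, v_thresh=0.7, s_thresh=0.15):
    """Hypothetical threshold segmentation: bright, low-saturation pixels
    (grayish sky, white objects) -> 1.0, all other pixels -> 0.0.
    img is a float RGB array in [0, 1] with shape (H, W, 3)."""
    v = img.max(axis=2)                       # HSV value
    s = (v - img.min(axis=2)) / (v + 1e-6)    # HSV saturation
    return ((v > v_thresh) & (s < s_thresh)).astype(np.float64)

def adaptive_gamma(img, region):
    """Stand-in adaptive gamma: choose the exponent that maps the mean
    luminance of the selected region to 0.5 (not the paper's rule)."""
    lum = img.mean(axis=2)[region > 0.5]
    if lum.size == 0:
        return img.copy()
    mean = float(np.clip(lum.mean(), 1e-3, 1.0 - 1e-3))
    gamma = np.log(0.5) / np.log(mean)
    return np.clip(img, 0.0, 1.0) ** gamma

def box_filter(mask, k=15):
    """Mean filter of odd size k with edge padding, via a 2-D cumulative sum."""
    h, w = mask.shape
    p = np.pad(mask, k // 2, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)

def stitch(img, k=15):
    """Enhance non-grayish, non-white pixels; blend with a mean-filtered mask."""
    m = grayish_white_mask(img)               # 1 = keep the original pixel
    enhanced = adaptive_gamma(img, 1.0 - m)   # enhance the remaining region
    alpha = box_filter(m, k)[..., None]       # soft mask in [0, 1]
    return alpha * img + (1.0 - alpha) * enhanced
```

Softening the binary mask with a box filter makes the weight ramp smoothly across region borders, which is what prevents a visible seam where the sky meets the enhanced foreground.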

Similar papers

SIDGAN: Single Image Dehazing without Paired Supervision

Pan Wei, Xin Wang, Lei Wang, Ji Xiang, Zihan Wang

Auto-TLDR; DehazeGAN: An End-to-End Generative Adversarial Network for Image Dehazing

Single image dehazing is challenging when the scene airlight and transmission map are unknown. Most existing dehazing algorithms estimate key parameters from manually designed priors or statistics, which may be invalid in some scenarios. Although deep learning-based dehazing methods provide an effective solution, most of them rely on paired training datasets, which are prohibitively difficult to collect in the real world. In this paper, we propose an effective end-to-end generative adversarial network for image dehazing, named DehazeGAN. The proposed DehazeGAN adopts a U-Net architecture with a novel color-consistency loss derived from the dark channel prior and a perceptual loss, and it can be trained in an unsupervised fashion without paired synthetic datasets. We create a RealHaze dataset for network training, consisting of 4,000 outdoor hazy images and 4,000 haze-free images. Extensive experiments demonstrate that DehazeGAN outperforms existing state-of-the-art methods on both synthetic and real-world datasets in terms of PSNR, SSIM, and subjective visual quality.
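PSNR, one of the metrics cited above, is short enough to state directly. A minimal sketch, assuming both images are scaled to [0, 1] (so `data_range` is 1.0); SSIM is more involved and is not shown.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between two equally shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of about 20 dB; higher is better.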

Near-Infrared Depth-Independent Image Dehazing using Haar Wavelets

Sumit Laha, Ankit Sharma, Shengnan Hu, Hassan Foroosh

Auto-TLDR; A fusion algorithm for haze removal using Haar wavelets

We propose a fusion algorithm for haze removal that combines color information from an RGB image with edge information extracted from its corresponding NIR image using Haar wavelets. The proposed algorithm is based on the key observation that NIR edge features are more prominent in the hazy regions of the image than the RGB edge features in those same regions. To combine the color and edge information, we introduce a haze-weight map that proportionately distributes the color and edge information during the fusion process. Because NIR images are intrinsically almost haze-free, our method, unlike existing works that rely on a scattering model, makes no such assumptions and is essentially depth-independent. This helps minimize artifacts and gives the restored haze-free image a more realistic appearance. Extensive experiments show that the proposed algorithm is both qualitatively and quantitatively better on several key metrics than existing state-of-the-art methods.
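The fusion idea can be sketched with a single-level, orthonormal 2-D Haar transform: the approximation band comes from the RGB luminance, and the three detail bands are blended toward the NIR details according to a haze weight. This is a sketch under stated assumptions, not the authors' method: `haze_weight` is supplied directly as a half-resolution map in [0, 1] (how the paper builds its haze-weight map is not described here), and only one decomposition level is used.

```python
import numpy as np

def haar2d(x):
    """Single-level orthonormal 2-D Haar transform of an even-sized array."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2   # approximation band
    lh = (a + b - c - d) / 2   # detail band 1
    hl = (a - b + c - d) / 2   # detail band 2
    hh = (a - b - c + d) / 2   # detail band 3
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def fuse_luminance(rgb_y, nir, haze_weight):
    """Keep the RGB approximation band; blend each detail band toward the
    NIR detail band where the (hypothetical) haze weight is high."""
    ll, *rgb_d = haar2d(rgb_y)
    _, *nir_d = haar2d(nir)
    w = np.clip(haze_weight, 0.0, 1.0)
    fused = [(1 - w) * r + w * n for r, n in zip(rgb_d, nir_d)]
    return ihaar2d(ll, *fused)
```

With a weight of 0 everywhere the RGB luminance is returned unchanged; with a weight of 1 the edges come entirely from the NIR image while the coarse intensity still comes from the RGB image.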

Automatical Enhancement and Denoising of Extremely Low-Light Images

Yuda Song, Yunfang Zhu, Xin Du

Auto-TLDR; INSNet: Illumination and Noise Separation Network for Low-Light Image Restoring

Methods based on deep convolutional neural networks (DCNNs) have recently achieved remarkable performance on various low-level vision tasks. Restoring images captured at night is one of the trickiest of these tasks because such images have high noise levels and low intensity. We propose a DCNN-based method, the Illumination and Noise Separation Network (INSNet), which performs both denoising and enhancement on extremely low-light images. INSNet fully utilizes global-aware and local-aware features through a modified network structure and image sampling scheme. Compared with well-designed complex neural networks, our method only needs to add a bypass network to an existing network. Nevertheless, it boosts the quality of recovered images dramatically while increasing the computational cost by less than 0.1%. Even without any manual settings, INSNet can stably restore extremely low-light images to the desired high quality.

Towards Artifacts-Free Image Defogging

Gabriele Graffieti, Davide Maltoni

Auto-TLDR; CurL-Defog: Learning Based Defogging with CycleGAN and HArD

In this paper, we present a novel defogging technique, named CurL-Defog, aimed at minimizing the creation of artifacts. Most learning-based defogging approaches rely on paired data (i.e., the same images with and without fog), where fog is artificially added to clear images: this often gives good results on mildly fogged images but does not generalize well to difficult real cases. On the other hand, models trained with real unpaired data (e.g., CycleGAN) can provide visually impressive results but often produce unwanted artifacts. We propose a curriculum learning strategy coupled with an enhanced CycleGAN model to reduce the number of artifacts produced, while maintaining state-of-the-art performance in terms of contrast enhancement and image reconstruction. We also introduce a new metric, called HArD (Hazy Artifact Detector), to numerically quantify the amount of artifacts in defogged images, thus avoiding tedious and subjective manual inspection of the results. The proposed approach compares favorably with state-of-the-art techniques on both real and synthetic datasets.

Visibility Restoration in Infra-Red Images

Olivier Fourt, Jean-Philippe Tarel

Auto-TLDR; Single Image Defogging for Long-Wavelength Infra-Red (LWIR)

Over the last decade, single image defogging has been a subject of interest in image processing. In the visible spectrum, fog and haze decrease the visibility of distant objects. The objective of visibility restoration is thus to remove as much of the effect of fog from the image as possible. Infrared sensors are increasingly used in the automotive and aviation industries, but the effect of fog and haze is not restricted to the visible spectrum; it also applies in the infrared band. After recalling the effects of fog in the common sub-bands of the infrared spectrum, we tested whether the approach used for single image defogging in the visible spectrum might also work for infrared. This led us to propose a new approach to single image defogging for Long-Wavelength Infra-Red (LWIR), or Thermal Infra-Red. Several experiments show that the proposed algorithm offers interesting results not only for fog and haze but also for bad weather conditions in general, during both day and night.

Early Wildfire Smoke Detection in Videos

Taanya Gupta, Hengyue Liu, Bir Bhanu

Auto-TLDR; Semi-supervised Spatio-Temporal Video Object Segmentation for Automatic Detection of Smoke in Videos during Forest Fire

Recent advances in unmanned aerial vehicles and camera technology have proven useful for detecting the smoke that emerges above the trees during a forest fire. Automatic detection of smoke in videos is of great interest to fire departments. To date, in most parts of the world, fires are not detected in their early stage and generally become catastrophic. This paper introduces a novel technique that integrates spatial and temporal features in a deep learning framework using semi-supervised spatio-temporal video object segmentation and dense optical flow. However, detecting smoke in the presence of haze and without labeled data is difficult. Because haze is visible in the sky, a dark channel pre-processing method is used to reduce the amount of haze in video frames, which in turn improves the detection results. Online training is performed on a video at test time, which reduces the need for ground-truth data. Tests on publicly available video datasets show that the proposed algorithms outperform previous work and are robust across different wildfire-threatened locations.
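Dark channel pre-processing of the kind mentioned above is usually based on the dark channel prior of He et al. The following is a minimal sketch of that classic recipe, not the authors' pipeline: it assumes float RGB frames in [0, 1], averages the pixels with the highest dark-channel values to estimate the airlight (He et al. pick the brightest of them), and skips the transmission refinement step. The patch size, `omega`, and `t0` are common defaults, not values from this paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(img, patch=15):
    """Per-pixel minimum over color channels, then over a patch x patch window."""
    m = np.pad(img.min(axis=2), patch // 2, mode="edge")
    return sliding_window_view(m, (patch, patch)).min(axis=(2, 3))

def estimate_airlight(img, dark, top=0.001):
    """Average color of the pixels with the highest dark-channel values."""
    n = max(1, int(dark.size * top))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img, patch=15, omega=0.95, t0=0.1):
    """Dark channel prior dehazing without transmission refinement."""
    A = np.maximum(estimate_airlight(img, dark_channel(img, patch)), 1e-6)
    t = 1.0 - omega * dark_channel(img / A, patch)   # transmission estimate
    t = np.maximum(t, t0)[..., None]                 # avoid division blow-up
    return np.clip((img - A) / t + A, 0.0, 1.0)      # invert I = J t + A (1 - t)
```

The last line inverts the haze imaging model I = J·t + A·(1 − t); `omega` < 1 deliberately leaves a little haze so that distant objects still look distant.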

Video Lightening with Dedicated CNN Architecture

Li-Wen Wang, Wan-Chi Siu, Zhi-Song Liu, Chu-Tak Li, P. K. Daniel Lun

Auto-TLDR; VLN: Video Lightening Network for Driving Assistant Systems in Dark Environment

Darkness brings us uncertainty, worry, and low confidence. This problem applies not only to people walking on a dark evening but also to drivers on roads with very dim lighting or none at all. To address this problem, we propose a new CNN structure named Video Lightening Network (VLN) that treats low-light enhancement as a residual learning task; it is useful as a reference for indirectly lightening the environment, or for vision-based application systems such as driving assistant systems. VLN consists of several Lightening Back-Projection (LBP) and Temporal Aggregation (TA) blocks. Each LBP block enhances the low-light frame by domain transfer learning that iteratively maps the frame between the low- and normal-light domains. A TA block handles the motion among neighboring frames by investigating their spatial and temporal relationships. Several TA blocks work in a multi-scale way, compensating motion at different levels. The proposed architecture enhances consistently across different illumination levels, which significantly increases visual quality even in extremely dark environments. Extensive experimental results show that the proposed approach outperforms other methods under both objective and subjective metrics.

Dynamic Low-Light Image Enhancement for Object Detection Via End-To-End Training

Haifeng Guo, Yirui Wu, Tong Lu

Auto-TLDR; Object Detection using Low-Light Image Enhancement for End-to-End Training

Object detection based on convolutional neural networks is a hot research topic in computer vision. The illumination component of an image has a great impact on object detection, and detection performance declines sharply under low-light conditions. Using a low-light image enhancement technique as a pre-processing step can improve image quality and yield better detection results. However, because low-light environments are complex, existing enhancement methods may have negative effects on some samples, which makes it difficult to improve overall detection performance in low-light conditions. In this paper, our goal is to use image enhancement to improve object detection performance rather than perceptual quality for humans. We propose a novel framework that combines low-light enhancement and object detection for end-to-end training. The framework can dynamically select different enhancement subnetworks for each sample to improve the performance of the detector. Our method consists of two stages: an enhancement stage and a detection stage. The enhancement stage dynamically enhances the low-light images under the supervision of several enhancement methods and outputs corresponding weights. During the detection stage, the weights provide information on object classification to generate high-quality region proposals, which in turn results in accurate detection. Our experiments show promising results, indicating that the proposed method can significantly improve detection performance in low-light environments.
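The per-sample selection among several enhancement outputs can be pictured as a softmax gate. This is only an illustrative sketch: the abstract does not specify how the weights are produced or applied, so `logits` stands for the output of a hypothetical gating head, and both a soft blend and a hard (argmax) selection are shown.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def combine_enhancements(candidates, logits, hard=False):
    """Blend K enhanced versions of one image with per-sample weights.

    candidates: array of shape (K, H, W, C); logits: K scores from a
    hypothetical gating head. hard=True selects one candidate instead."""
    w = softmax(logits)
    if hard:
        return candidates[int(np.argmax(w))], w
    return np.tensordot(w, candidates, axes=1), w   # sum_k w[k] * candidates[k]
```

The soft form keeps the whole pipeline differentiable, which is what end-to-end training with a detector loss requires; the hard form corresponds to picking a single enhancement subnetwork at inference time.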

Edge-Guided CNN for Denoising Images from Portable Ultrasound Devices

Yingnan Ma, Fei Yang, Anup Basu

Auto-TLDR; Edge-Guided Convolutional Neural Network for Portable Ultrasound Images

Ultrasound is a non-invasive tool that is useful for medical diagnosis and treatment. To reduce long wait times and add convenience for patients, portable ultrasound scanning devices are becoming increasingly popular. These devices can be held in one palm and are compatible with modern cell phones. However, the quality of ultrasound images captured with portable scanners is relatively poor compared with that of standard ultrasound scanning systems in hospitals. To improve the quality of ultrasound images obtained from portable devices, we propose a new neural network architecture called the Edge-Guided Convolutional Neural Network (EGCNN), which preserves significant edge information in ultrasound images while removing noise. We also study and compare the effectiveness of classical filtering approaches in removing speckle noise from these images. Experimental results show that after applying the proposed EGCNN, various organs can be recognized better in ultrasound images. This approach is expected to lead to more accurate diagnostics in the future.
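As an example of a classical speckle filter (the abstract does not list which filters were compared, so this is an illustration rather than the paper's baseline), here is a minimal additive-form Lee filter. It smooths where the local variance is close to the noise variance and leaves high-variance regions such as edges largely untouched. When `noise_var` is not given, the median local variance is used as a crude global estimate, which is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean(x, size):
    """Mean over a size x size window with reflect padding (size odd)."""
    p = np.pad(x, size // 2, mode="reflect")
    return sliding_window_view(p, (size, size)).mean(axis=(2, 3))

def lee_filter(img, size=7, noise_var=None):
    """Additive-form Lee filter: output = mean + gain * (img - mean)."""
    img = np.asarray(img, dtype=np.float64)
    mean = local_mean(img, size)
    var = np.maximum(local_mean(img ** 2, size) - mean ** 2, 0.0)
    if noise_var is None:
        noise_var = np.median(var)  # crude global noise estimate (assumption)
    # gain ~ 0 in flat regions (var ~ noise), ~ 1 at strong edges
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return mean + gain * (img - mean)
```

Speckle is often modeled as multiplicative; applying this filter to log-compressed B-mode data, or switching to the multiplicative Lee formulation, are common variants.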

LFIEM: Lightweight Filter-Based Image Enhancement Model

Oktai Tatanov, Aleksei Samarin

Auto-TLDR; Image Retouching Using Semi-supervised Learning for Mobile Devices

Photo retouching features are being integrated into a growing number of mobile applications. Current learning-based approaches enhance images using large convolutional neural network-based models, where the result is taken directly from the network outputs. This can lead to artifacts in the resulting images, produces models that are hard to interpret, and can be computationally expensive. In this paper, we explore a filter-based approach to overcome these problems. When designing our model, we focus on creating a lightweight solution suitable for mobile devices. A significant performance increase was achieved by implementing consistency regularization, as used in semi-supervised learning. The proposed model can run on mobile devices and achieves results competitive with known models.

Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution

Xiaoyu Xiang, Qian Lin, Jan Allebach

Auto-TLDR; A Context-Aware Joint CAR and SR Neural Network for High-Resolution Text Recognition and Face Detection

Due to limited bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of detail and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by jointly addressing compression artifacts reduction (CAR) and super-resolution (SR). First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one stage. Finally, a deep reconstruction network is adopted to predict high-quality, high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also takes 26.2% less runtime. Based on this model, we address two critical challenges in high-level computer vision: optical character recognition of low-resolution text, and detection of extremely tiny faces. We demonstrate that CAJNN can serve as an effective image preprocessing method and improves the accuracy of real-scene text recognition (from 85.30% to 85.75%) and the average precision of tiny face detection (from 0.317 to 0.611).

Improving Low-Resolution Image Classification by Super-Resolution with Enhancing High-Frequency Content

Liguo Zhou, Guang Chen, Mingyue Feng, Alois Knoll

Auto-TLDR; Super-resolution for Low-Resolution Image Classification

With the rapid development of Convolutional Neural Networks, they now perform excellently on visual understanding tasks when the input images are of high or ordinary quality. However, performance always degrades considerably when the input images are of low quality. In this paper, we propose a new super-resolution method to improve classification performance on low-resolution images. In an image, regions where pixel values vary dramatically contain richer high-frequency content than other parts. Based on this fact, we design a weight map and integrate it into a super-resolution CNN training framework. During training, this weight map locates the high-frequency pixels in the ground-truth high-resolution images. The pixel-level loss function then takes effect only at these positions, minimizing the difference between the reconstructed and ground-truth high-resolution images. Compared with other state-of-the-art super-resolution methods, our method recovers more high-frequency content when reconstructing high-resolution images and yields greater improvements in classification accuracy when used to preprocess low-resolution images.
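The weight-map idea can be sketched as follows. The abstract does not say how high-frequency pixels are located, so here the gradient magnitude of the ground-truth image, thresholded at a hypothetical `thresh`, stands in for the paper's detector, and an L1 loss is restricted to the marked positions.

```python
import numpy as np

def high_freq_weight_map(hr, thresh=0.1):
    """Binary map of ground-truth HR pixels whose gradient magnitude exceeds
    a hypothetical threshold (stand-in for the paper's detector)."""
    gy, gx = np.gradient(np.asarray(hr, dtype=np.float64))
    return (np.hypot(gx, gy) > thresh).astype(np.float64)

def masked_l1_loss(sr, hr, weight):
    """Pixel-level L1 loss that only counts positions where weight is nonzero."""
    diff = np.abs(np.asarray(sr, dtype=np.float64) - hr) * weight
    return diff.sum() / max(weight.sum(), 1.0)
```

Errors in flat regions contribute nothing to this loss, so the network's capacity is spent on edges and textures, which are the details a downstream classifier tends to rely on.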

Deep Fusion of RGB and NIR Paired Images Using Convolutional Neural Networks

Lin Mei, Cheolkon Jung

Auto-TLDR; Deep Fusion of RGB and NIR paired images in low light condition using convolutional neural networks

In low-light conditions, captured color (RGB) images are highly degraded by noise, with severe texture loss. In this paper, we propose deep fusion of RGB and NIR paired images captured in low-light conditions using convolutional neural networks (CNNs). The proposed deep fusion network consists of three independent sub-networks: denoising, enhancing, and fusion. A denoising sub-network eliminates noise from the noisy RGB images. After denoising, an enhancing sub-network increases the brightness of the low-light RGB images. Since the NIR image contains fine details, we fuse it with the Y channel of the RGB image through a fusion sub-network. Experimental results demonstrate that the proposed method successfully fuses RGB and NIR images and generates high-quality fusion results containing textures and colors.
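Fusing NIR into the Y channel only makes sense together with a luma/chroma conversion. The sketch below uses the full-range BT.601 (JPEG) YCbCr matrices for float RGB in [0, 1] and replaces the learned fusion sub-network with a fixed blend controlled by `alpha`; that blend is a stand-in assumption, not the paper's network.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JPEG) conversion for float RGB in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    """Inverse of rgb_to_ycbcr."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 0.5, ycc[..., 2] - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)

def fuse_nir_into_y(rgb, nir, alpha=0.5):
    """Blend NIR into the luma channel and keep chroma from the RGB image.
    The fixed alpha blend stands in for the learned fusion sub-network."""
    ycc = rgb_to_ycbcr(np.asarray(rgb, dtype=np.float64))
    ycc[..., 0] = (1.0 - alpha) * ycc[..., 0] + alpha * nir
    return np.clip(ycbcr_to_rgb(ycc), 0.0, 1.0)
```

Working in YCbCr lets the NIR detail affect only brightness and texture, while the colors still come from the (denoised, enhanced) RGB image, since NIR carries no color information.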

A GAN-Based Blind Inpainting Method for Masonry Wall Images

Yahya Ibrahim, Balázs Nagy, Csaba Benedek

Auto-TLDR; An End-to-End Blind Inpainting Algorithm for Masonry Wall Images

In this paper, we introduce a novel end-to-end blind inpainting algorithm for masonry wall images that automatically detects and virtually completes occluded or damaged wall regions. For this purpose, we propose a three-stage deep neural network comprising a U-Net-based sub-network for wall segmentation into brick, mortar, and occluded regions, followed by a two-stage adversarial inpainting model. The first adversarial network predicts the schematic mortar-brick pattern of the occluded areas based on the observed wall structure, which in itself provides valuable structural information for archeological and architectural applications. Finally, the second adversarial network predicts the RGB pixel values, yielding a realistic visual experience for the observer. Although the three stages implement a sequential pipeline, they interact through dependencies among their loss functions, which allows hidden feature dependencies between the different network components to be taken into account. A new dataset has been created for training and testing the network, and an extensive qualitative and quantitative evaluation against the state of the art is given.

Extending Single Beam Lidar to Full Resolution by Fusing with Single Image Depth Estimation

Yawen Lu, Yuxing Wang, Devarth Parikh, Guoyu Lu

Auto-TLDR; Self-supervised LIDAR for Low-Cost Depth Estimation

Depth estimation plays an important role in indoor and outdoor scene understanding, autonomous driving, augmented reality, and many other tasks. Vehicles and robots can use active illumination sensors such as LIDAR to obtain high-precision depth estimates. However, high-resolution LIDARs are usually too expensive, which limits their mass deployment in various applications. Although single-beam LIDAR is low-cost, single-beam depth sensing is usually not sufficient to perceive the surrounding environment in many scenarios. In this paper, we propose a learning-based framework that aims to match or even exceed the performance of costly LIDARs by combining our self-supervised network with a low-cost single-beam LIDAR. After accurate calibration with a visible camera, the single-beam LIDAR can resolve the scale ambiguity of the depth map estimated from the visible camera. The adjusted depth map enjoys the high resolution and sensing accuracy of a high-beam LIDAR while keeping the low cost of a single-beam LIDAR. We can thus achieve a sensing effect similar to that of a high-beam LIDAR at a price 50 to 100 times lower (e.g., an $80,000 Velodyne HDL-64E LIDAR vs. a $1,000 SICK TIM-781 2D LIDAR plus a normal camera). The proposed approach is verified on our collected dataset and a public dataset, showing superior depth-sensing performance.
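Resolving the scale of a monocular depth map with a single scan line of metric LIDAR returns can be sketched with median-ratio alignment, a common choice in self-supervised depth work; the paper's exact procedure may differ. The sketch assumes the LIDAR points are already projected into the image (`valid` marks pixels that received a return). As a side note on the quoted prices, $80,000 / $1,000 is a factor of 80, inside the stated 50 to 100 range.

```python
import numpy as np

def align_scale(pred_depth, lidar_depth, valid):
    """Rescale an up-to-scale monocular depth map with sparse metric LIDAR
    returns, using the median ratio over valid pixels (robust to outliers)."""
    ratios = lidar_depth[valid] / pred_depth[valid]
    return pred_depth * np.median(ratios)
```

The median is preferred over the mean because a few LIDAR returns on thin structures or object boundaries can disagree strongly with the network's prediction.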

Polarimetric Image Augmentation

Marc Blanchon, Fabrice Meriaudeau, Olivier Morel, Ralph Seulin, Desire Sidibe

Auto-TLDR; Polarimetric Augmentation for Deep Learning in Robotics Applications

This paper deals with new augmentation methods for polarimetry, an unconventional imaging modality that is sensitive to the physics of the observed scene. In nature, polarized light is produced by reflection or scattering. Robotics applications in urban environments encounter many obstacles that can be specular and therefore reflect polarized light. These areas are prone to segmentation errors with standard modalities, but the errors could be resolved using the information carried by polarized light. Deep Convolutional Neural Networks (DCNNs) have shown excellent segmentation results but require a significant amount of data to achieve their best performance. The lack of data is usually overcome with augmentation methods. However, unlike RGB images, polarization images are not merely scalar (intensity) images, so standard augmentation techniques cannot be applied straightforwardly. We propose enhancing deep learning models through a regularized augmentation procedure applied to polarimetric data, in order to characterize scenes more effectively under challenging conditions. We observe an average improvement of 18.1% in IoU between non-augmented and regularized training procedures on real-world data.
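Why standard augmentations break polarimetric data can be shown with a horizontal flip. The sketch below computes linear Stokes parameters from four polarizer-angle images (0°, 45°, 90°, 135°), then the degree and angle of linear polarization (DoLP, AoLP). Mirroring the x axis maps an angle φ to −φ, so a physically consistent flip must negate S2 (equivalently, swap the 45° and 135° raw images) instead of only mirroring the pixels. This illustrates the problem; it is not the paper's regularized augmentation procedure.

```python
import numpy as np

def stokes_from_angles(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensity images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dolp_aolp(s0, s1, s2):
    """Degree and angle (radians, in (-pi/2, pi/2]) of linear polarization."""
    dolp = np.hypot(s1, s2) / np.maximum(s0, 1e-12)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp

def hflip_stokes(s0, s1, s2):
    """Physically consistent horizontal flip: mirroring x maps an angle phi
    to -phi, so S2 changes sign while S0 and S1 do not."""
    return s0[:, ::-1], s1[:, ::-1], -s2[:, ::-1]
```

A naive flip of the raw images would leave every AoLP value unchanged, producing polarization angles that no real scene with the flipped geometry could produce.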

Removing Raindrops from a Single Image Using Synthetic Data

Yoshihito Kokubo, Shusaku Asada, Hirotaka Maruyama, Masaru Koide, Kohei Yamamoto, Yoshihisa Suetsugu

Auto-TLDR; Raindrop Removal Using Synthetic Raindrop Data

We simulated the exact features of raindrops on a camera lens and conducted an experiment to evaluate the performance of a network trained to remove raindrops using synthetic raindrop data. Although research has been conducted to precisely evaluate raindrop removal methods, with some networks trained on images with real raindrops and others on images with synthetic raindrops, no studies have directly compared the performance of two networks trained on each respective kind of image. In a previous study in which images with synthetic raindrops were used for training, the network did not work effectively on images with real raindrops because the raindrop shapes were simulated using simple arithmetic expressions. In this study, we focused on generating raindrop shapes that are closer to reality, with the aim of using these synthetic raindrops to develop a technique for removing real-world raindrops. After categorizing raindrops by type, we further separated each raindrop type into its constituent elements, generated each element separately, and finally combined the generated elements. The proposed technique was used to add images with synthetic raindrops to the training data, and when we evaluated the model, we confirmed that its precision exceeded that achieved when only images with actual raindrops were used for training. The evaluation results show that images with synthetic raindrops can be used as training data for real-world images.

DR2S: Deep Regression with Region Selection for Camera Quality Evaluation

Marcelin Tworski, Stéphane Lathuiliere, Salim Belkarfa, Attilio Fiandrotti, Marco Cagnazzo

Auto-TLDR; Texture Quality Estimation Using Deep Learning

In this work, we tackle the problem of estimating a camera's capability to preserve fine texture details under a given lighting condition. Importantly, our texture preservation measurement should agree with human perception. Consequently, we formulate the problem as regression and introduce a deep convolutional network to estimate a texture quality score. At training time, we use ground-truth quality scores provided by expert human annotators to obtain a subjective quality measure. In addition, we propose a region selection method to identify the image regions that are better suited to measuring perceptual quality. Finally, our experimental evaluation shows that our learning-based approach outperforms existing methods and that our region selection algorithm consistently improves quality estimation.

P2D: A Self-Supervised Method for Depth Estimation from Polarimetry

Marc Blanchon, Desire Sidibe, Olivier Morel, Ralph Seulin, Daniel Braun, Fabrice Meriaudeau

Auto-TLDR; Polarimetric Regularization for Monocular Depth Estimation

Monocular depth estimation is a recurring subject in computer vision. Its ability to describe scenes via a depth map while reducing the constraints related to the formulation of perspective geometry tends to favor its use. However, despite the constant improvement of algorithms, most methods exploit only colorimetric information. Consequently, robustness to phenomena to which this modality is not sensitive, such as specularity or transparency, is neglected. In response, we propose using polarimetry as an input to a self-supervised monodepth network, exploiting polarization cues to encourage accurate reconstruction of scenes. Furthermore, we add a polarimetric regularization term to a state-of-the-art method to take specific advantage of the data. Our method is evaluated both qualitatively and quantitatively, demonstrating that this new information, together with an enhanced loss function, improves depth estimation results, especially in specular areas.

Learning Defects in Old Movies from Manually Assisted Restoration

Arthur Renaudeau, Travis Seng, Axel Carlier, Jean-Denis Durou, Fabien Pierre, Francois Lauze, Jean-François Aujol

Auto-TLDR; U-Net: Detecting Defects in Old Movies by Inpainting Techniques

We propose to detect defects in old movies as the first step of a larger framework for restoring old movies by inpainting techniques. The specificity of our work is to learn a film restorer's expertise from a pair of sequences: a movie with defects and the same movie semi-automatically restored with the help of specialized software. To detect these defects with minimal human interaction and further reduce the time spent on restoration, we feed a U-Net with consecutive defective frames as input to detect unexpected variations of pixel intensity over space and time. Since the output of the network is a mask of defect locations, we first have to create a dataset of mask frames from the restored frames produced by the software used by the film restorer, because classical synthetic ground truth is not available. These masks are estimated by computing the absolute difference between restored and defective frames, combined with thresholding and morphological closing. Our network succeeds in automatically detecting real defects more precisely than manual selection with an all-encompassing shape, including some defects that the expert restorer could have missed for lack of time.
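The mask-generation recipe described above (absolute difference, thresholding, morphological closing) can be sketched in a few lines of NumPy. The threshold and the 3×3 square structuring element are hypothetical choices, not values from the paper; for grayscale frames in [0, 1] the defaults below are only a starting point.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(mask, size=3):
    """Binary dilation with a size x size square (pixels outside count as False)."""
    p = np.pad(mask, size // 2, mode="constant", constant_values=False)
    return sliding_window_view(p, (size, size)).any(axis=(2, 3))

def erode(mask, size=3):
    """Binary erosion with a size x size square (pixels outside count as True)."""
    p = np.pad(mask, size // 2, mode="constant", constant_values=True)
    return sliding_window_view(p, (size, size)).all(axis=(2, 3))

def defect_mask(defective, restored, thresh=0.1, size=3):
    """Mask of pixels changed by the restorer: |restored - defective| > thresh,
    followed by a morphological closing (dilation, then erosion)."""
    diff = np.abs(np.asarray(restored, float) - np.asarray(defective, float))
    return erode(dilate(diff > thresh, size), size)
```

The closing step fills small gaps inside a scratch or blotch, where the restored and defective frames happened to agree, so each defect becomes one connected region in the training mask.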

A NoGAN Approach for Image and Video Restoration and Compression Artifact Removal

Mameli Filippo, Marco Bertini, Leonardo Galteri, Alberto Del Bimbo

Auto-TLDR; Deep Neural Network for Image and Video Compression Artifact Removal and Restoration

Lossy image and video compression algorithms introduce several types of visual artifacts that reduce the visual quality of the compressed media, and the higher the compression rate, the stronger these artifacts. In this work, we describe an approach for improving the visual quality of compressed images and videos at presentation time, so as to obtain the benefits of fast data transfer and reduced data storage while enjoying a visual quality that could otherwise be obtained only by reducing the compression rate. To achieve this, we propose using a deep neural network trained with the NoGAN approach, adapting the popular DeOldify architecture used for colorization. We show how the proposed method can be applied to both image and video compression artifact removal and restoration.

Explorable Tone Mapping Operators

Su Chien-Chuan, Yu-Lun Liu, Hung Jin Lin, Ren Wang, Chia-Ping Chen, Yu-Lin Chang, Soo-Chang Pei

Auto-TLDR; Learning-based multimodal tone-mapping from HDR images

Tone mapping plays an essential role in high dynamic range (HDR) imaging. It aims to preserve the visual information of HDR images in a medium with a limited dynamic range. Although many works provide tone-mapped results from HDR images, most of them can perform tone mapping only in a single pre-designed way. However, the perceived quality of tone mapping is subjective and varies from person to person, and the preferred tone-mapping style also differs from application to application. In this paper, we propose a learning-based multimodal tone-mapping method that not only achieves excellent visual quality but also explores style diversity. Based on the BicycleGAN framework [1], the proposed method can provide a variety of expert-level tone-mapped results by manipulating different latent codes. Finally, we show that the proposed method performs favorably against state-of-the-art tone-mapping algorithms both quantitatively and qualitatively.

Enhancing Depth Quality of Stereo Vision Using Deep Learning-Based Prior Information of the Driving Environment

Weifu Li, Vijay John, Seiichi Mita

Auto-TLDR; A Novel Post-processing Mathematical Framework for Stereo Vision

Generating high-density depth values of the driving environment is indispensable for autonomous driving. Stereo vision is one of the practical and effective methods for generating these depth values. However, the accuracy of stereo vision is limited by texture-less regions, such as sky and road areas, and by repeated patterns in the image. To overcome these problems, we propose to enhance the stereo-generated depth by incorporating prior information about the driving environment. This prior information, generated by a deep learning-based U-Net model, is utilized in a novel post-processing mathematical framework to refine the stereo-generated depth. The framework is formulated as an optimization problem that corrects the errors due to texture-less regions and repeated patterns. Owing to its mathematical formulation, the post-processing framework is not a black box: it is explainable and can readily be applied to depth maps generated by any stereo vision algorithm. The proposed framework is qualitatively validated on the acquired dataset and on the KITTI dataset. The results show that the proposed framework improves the accuracy of stereo depth generation.

One Step Clustering Based on A-Contrario Framework for Detection of Alterations in Historical Violins

Alireza Rezaei, Sylvie Le Hégarat-Mascle, Emanuel Aldea, Piercarlo Dondi, Marco Malagodi

Auto-TLDR; A-Contrario Clustering for the Detection of Altered Violins using UVIFL Images

Preventive conservation is an important practice in Cultural Heritage. Constant monitoring of an artwork's state of conservation helps reduce the risk of damage and the number of necessary interventions. In this work, we propose a probabilistic approach for detecting alterations on the surface of historical violins based on an a-contrario framework. Our method is a one-step NFA (number of false alarms) clustering solution that considers grey-level and spatial density information in a single background model. The proposed method is robust to noise and avoids both parameter tuning and any assumption about the amount of worn-out area. We use UV-induced fluorescence (UVIFL) images as input in order to capture details not perceivable with visible light. Tests were conducted on image sequences from the "Violins UVIFL imagery" dataset. Results illustrate the ability of the algorithm to distinguish the worn areas from the surrounding regions. Comparisons with state-of-the-art clustering methods show improved overall precision and recall.
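
The a-contrario decision rule can be illustrated with the generic binomial NFA used in a-contrario detection. This sketch is not the authors' background model (which combines grey-level and spatial density information); it assumes, hypothetically, that under the background model each of `n` points falls inside a candidate region independently with probability `p`, and that `num_tests` candidate regions are examined. All function names are illustrative.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def nfa(num_tests: int, n: int, k: int, p: float) -> float:
    """Number of false alarms: expected count of regions (among num_tests)
    holding at least k of the n points purely by chance."""
    return num_tests * binomial_tail(n, k, p)


def is_meaningful(num_tests: int, n: int, k: int, p: float, eps: float = 1.0) -> bool:
    """A region is eps-meaningful when its NFA falls below eps."""
    return nfa(num_tests, n, k, p) < eps


if __name__ == "__main__":
    # 100 points, a region covering 10% of the domain, 1000 tested regions.
    print(is_meaningful(1000, 100, 40, 0.1))  # dense cluster: True
    print(is_meaningful(1000, 100, 10, 0.1))  # what chance alone gives: False
```

Setting `eps = 1` is the usual a-contrario convention: fewer than one false detection is expected on average when the data follow the background model, which is what lets such methods avoid hand-tuned thresholds.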

Hierarchically Aggregated Residual Transformation for Single Image Super Resolution

Zejiang Hou, Sy Kung

Auto-TLDR; HARTnet: Hierarchically Aggregated Residual Transformation for Multi-Scale Super-resolution

Visual patterns usually appear at different scales/sizes in natural images. Multi-scale feature representation is of great importance for the single-image super-resolution (SISR) task to reconstruct image objects at different scales. However, this characteristic has rarely been considered by CNN-based SISR methods. In this work, we propose a novel building block, the hierarchically aggregated residual transformation (HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which greatly expands the range of effective receptive fields and helps to detect image features at different scales. To understand the proposed HART block theoretically, we recast SISR as an optimal control problem and show that HART effectively approximates the classical 4th-order Runge-Kutta method, which has the merit of a small local truncation error when solving ordinary differential equations numerically. By cascading the proposed HART blocks, we build our high-performing HARTnet. Compared with existing state-of-the-art SR methods (including those on the NTIRE 2019 SR Challenge leaderboard), the proposed HARTnet demonstrates consistent PSNR/SSIM improvements on various benchmark datasets under different degradation models. Moreover, HARTnet can efficiently restore more faithful high-resolution images than the compared SR methods (cf. Figure 1).
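
For reference, one step of the classical 4th-order Runge-Kutta (RK4) method that HART is said to approximate is shown below for y' = f(t, y). This is the textbook scheme, not the HART block itself; a commonly drawn analogy is that a plain residual block x + F(x) resembles a single forward Euler step, whereas RK4 combines four intermediate evaluations. The helper names are illustrative.

```python
import math
from typing import Callable

Rhs = Callable[[float, float], float]


def rk4_step(f: Rhs, t: float, y: float, h: float) -> float:
    """Advance y' = f(t, y) by one step of size h with classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def euler_step(f: Rhs, t: float, y: float, h: float) -> float:
    """Forward Euler: the update a single residual block x + F(x) resembles."""
    return y + h * f(t, y)


def integrate(step, f: Rhs, y0: float, t_end: float, n_steps: int) -> float:
    """Integrate from t = 0 to t_end with n_steps equal steps."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    f = lambda t, y: y  # y' = y with y(0) = 1, so y(1) = e
    print(abs(integrate(rk4_step, f, 1.0, 1.0, 10) - math.e))    # about 2e-6
    print(abs(integrate(euler_step, f, 1.0, 1.0, 10) - math.e))  # about 0.12
```

With the same ten steps, RK4's local truncation error is O(h^5) against O(h^2) for Euler, which is the "small local truncation error" property the abstract refers to.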

Multi-focus Image Fusion for Confocal Microscopy Using U-Net Regression Map

Md Maruf Hossain Shuvo, Yasmin M. Kassim, Filiz Bunyak, Olga V. Glinskii, Leike Xie, Vladislav V Glinsky, Virginia H. Huxley, Kannappan Palaniappan

Auto-TLDR; Independent Single Channel U-Net Fusion for Multi-focus Microscopy Images

Multi-focus image fusion plays an important role in better visualizing the detailed information and anatomical structures in microscopy images. We propose a new approach to fuse all single-focus microscopy images in each Z-stack. As the structures differ between channels, the input images are separated into red and green channels: red for blood vessels and green for lymphatic-like structures. By taking the maximum of the U-Net regression likelihood map along Z, we obtain the focus selection map for each channel. We name this approach Independent Single Channel U-Net (ISCU) fusion. We combine the per-channel fusion results to obtain the final dual-channel composite RGB image. The dataset used is extremely challenging, with complex microscopy images of mouse dura mater attached to bone. We compared our results with a popular and widely used derivative-based fusion method [7] that uses a multiscale Hessian. We found that the multiscale Hessian-based approach produces banding effects, with a nonhomogeneous background lacking detailed anatomical structures. We therefore took advantage of a Convolutional Neural Network (CNN) and used the U-Net regression likelihood map to fuse the images. Perception-based no-reference image quality assessment measures such as PIQUE, NIQE, and BRISQUE confirm the effectiveness of the proposed method.
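
The focus-selection step (keeping, for every pixel, the slice whose likelihood is highest along Z) can be sketched for one channel as follows. The likelihood maps are assumed to be given, standing in for the U-Net regression outputs; images are plain nested lists, and the function name is hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def fuse_z_stack(slices: List[Image], likelihoods: List[Image]) -> Tuple[Image, List[List[int]]]:
    """Fuse one channel of a Z-stack by a per-pixel argmax of the focus likelihood.

    slices[z][i][j]      intensity of slice z at pixel (i, j)
    likelihoods[z][i][j] focus likelihood for the same pixel (e.g. a network output)
    Returns the fused image and the focus selection map (index of the chosen slice).
    """
    depth = len(slices)
    rows, cols = len(slices[0]), len(slices[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    focus_map = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Slice with the highest likelihood at this pixel (first one on ties).
            best_z = max(range(depth), key=lambda z: likelihoods[z][i][j])
            focus_map[i][j] = best_z
            fused[i][j] = slices[best_z][i][j]
    return fused, focus_map


if __name__ == "__main__":
    slices = [[[10.0, 20.0]], [[30.0, 40.0]]]  # two 1x2 slices
    likes = [[[0.9, 0.1]], [[0.2, 0.8]]]
    print(fuse_z_stack(slices, likes))  # ([[10.0, 40.0]], [[0, 1]])
```

Following the abstract, this would be run independently on the red and on the green stack, and the two fused channels then stacked into the composite RGB image.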

Motion U-Net: Multi-Cue Encoder-Decoder Network for Motion Segmentation

Gani Rahmon, Filiz Bunyak, Kannappan Palaniappan

Auto-TLDR; Motion U-Net: A Deep Learning Framework for Robust Moving Object Detection under Challenging Conditions

Detection of moving objects is a critical first step in many computer vision applications. Several algorithms for motion and change detection have been proposed; however, many of these approaches lack the ability to handle challenging real-world scenarios. Recently, deep learning approaches have started to produce impressive solutions to computer vision tasks, particularly detection and segmentation. Many existing deep learning networks proposed for moving object detection rely only on spatial appearance cues. In this paper, we propose a novel multi-cue, multi-stream network, Motion U-Net (MU-Net), which integrates motion, change, and appearance cues in a deep learning framework for robust moving object detection under challenging conditions. The proposed network consists of a two-stream encoder module followed by feature concatenation and a decoder module. Motion and change cues are computed by our tensor-based motion estimation and multi-modal background subtraction modules. The proposed system was tested on the change detection challenge dataset (CDnet-2014) and compared with state-of-the-art methods. On CDnet-2014, our approach reaches an average overall F-measure of 0.9852 and outperforms all current state-of-the-art methods. The network was also tested on the unseen SBI-2015 dataset and produced promising results.
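
The F-measure quoted above is the harmonic mean of precision and recall over foreground pixels. A minimal pixel-level sketch for binary masks follows; it ignores CDnet-2014's special ground-truth labels (such as out-of-ROI and unknown pixels) and its per-category averaging, so it only approximates the official evaluation. The function name is illustrative.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 1 = foreground (moving), 0 = background


def f_measure(pred: Mask, gt: Mask) -> float:
    """Pixel-level F-measure 2PR / (P + R) of a predicted mask against ground truth."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1  # correctly detected foreground
            elif p:
                fp += 1  # background predicted as foreground
            elif g:
                fn += 1  # missed foreground
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f_measure([[1, 1, 0, 0]], [[1, 0, 1, 0]]))  # P = R = 0.5, so 0.5
```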

Face Super-Resolution Network with Incremental Enhancement of Facial Parsing Information

Shuang Liu, Chengyi Xiong, Zhirong Gao

Auto-TLDR; Learning-based Face Super-Resolution with Incremental Boosting Facial Parsing Information

Recently, face super-resolution (SR) methods based on facial priors have obtained significant performance gains on extremely degraded facial images, and facial priors have also been shown to facilitate the inference of face images. Consequently, how to fully fuse facial priors into deep features to improve face SR performance has attracted considerable attention. In this paper, we propose a learning-based face SR approach with incremental boosting of facial parsing information (IFPSR) for high-magnification super-resolution of low-resolution faces. The proposed IFPSR method consists of three main parts: i) a three-stage upsampling network with embedded parsing-map features, in which image recovery and prior estimation are performed simultaneously and progressively to improve the image resolution; ii) a progressive training method and a joint facial attention and heatmap loss to obtain better facial attributes; iii) a channel attention strategy in residual dense blocks to adaptively learn facial features. Extensive experimental results show that, compared with state-of-the-art methods in terms of quantitative and qualitative metrics, our approach achieves an outstanding balance between SR image quality and low network complexity.

Coarse-To-Fine Foreground Segmentation Based on Co-Occurrence Pixel-Block and Spatio-Temporal Attention Model

Xinyu Liu, Dong Liang

Auto-TLDR; Coarse-to-Fine Foreground Segmentation Using a Co-occurrence Pixel-Block Model for Dynamic Scenes

Foreground segmentation in dynamic scenes is an important task in video surveillance. Unsupervised background subtraction methods based on background statistics modeling have difficulty updating their models. On the other hand, supervised foreground segmentation methods based on deep learning rely on large-scale, accurately annotated training data, which limits their cross-scene performance. In this paper, we propose a coarse-to-fine foreground segmentation method. First, a Spatio-Temporal Attention Model (STAM) trained across scenes is used to achieve coarse segmentation, which does not require training on a specific scene. Then the coarse segmentation is used as a reference to help the Co-occurrence Pixel-Block Model (CPB) complete the fine segmentation and, at the same time, to help CPB update its background model. This method is more flexible than deep-learning-based methods that depend on scene-specific training, and it realizes accurate online dynamic updating of the background model. Experimental results on WallFlower and LIMU show that our method outperforms STAM, CPB, and the other compared methods.

A Multi-Focus Image Fusion Method Based on Fractal Dimension and Guided Filtering

Nikoo Dehghani, Ehsanollah Kabir

Auto-TLDR; Fractal Dimension-based Multi-focus Image Fusion with Guided Filtering

Fractal Dimension (FD) is widely used for image segmentation because it quantifies texture information successfully. In this paper, we present an FD-based multi-focus image fusion method that uses FD to identify focused regions as the primary step of the fusion process. The algorithm extracts the local FD features of each multi-focus pair, estimated using the differential box-counting method. A guided filter is employed to further refine the spatial information and to increase the robustness of the FD features to noise. The result is then analyzed to obtain a focus map that identifies the sharp regions in each partially focused image. Afterwards, the detected regions are combined into a single all-in-focus image. The experiments, along with objective assessments, demonstrate the competitive performance of the proposed method compared to several state-of-the-art multi-focus image fusion methods.
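
Differential box-counting (DBC) treats a grey-level patch as a 3D surface, counts the boxes needed to cover it at several scales, and takes the FD as the slope of log(N_r) against log(1/r). The sketch below follows the usual DBC formulation for a square M x M patch; the box sizes, the use of floor to index boxes, and the function name are assumptions, and it computes one global FD for a patch rather than the local, guided-filtered FD maps used in the paper.

```python
import math
from typing import List, Sequence


def dbc_fractal_dimension(patch: List[List[float]], box_sizes: Sequence[int],
                          gray_levels: int = 256) -> float:
    """Estimate the fractal dimension of a square grey-level patch with DBC.

    Needs at least two box sizes, each >= 2 and dividing the patch side.
    """
    m = len(patch)
    if any(len(row) != m for row in patch):
        raise ValueError("patch must be square")
    log_inv_r, log_n = [], []
    for s in box_sizes:
        if s < 2 or m % s:
            raise ValueError("box sizes must be >= 2 and divide the patch size")
        h = s * gray_levels / m  # box height in grey levels: boxes are s x s x h
        n_r = 0
        for bi in range(0, m, s):
            for bj in range(0, m, s):
                block = [patch[i][j] for i in range(bi, bi + s) for j in range(bj, bj + s)]
                top = math.floor(max(block) / h)     # box index holding the maximum
                bottom = math.floor(min(block) / h)  # box index holding the minimum
                n_r += top - bottom + 1
        log_inv_r.append(math.log(m / s))  # r = s / m
        log_n.append(math.log(n_r))
    # Least-squares slope of log(N_r) versus log(1/r).
    mx = sum(log_inv_r) / len(log_inv_r)
    my = sum(log_n) / len(log_n)
    num = sum((x - mx) * (y - my) for x, y in zip(log_inv_r, log_n))
    den = sum((x - mx) ** 2 for x in log_inv_r)
    return num / den


if __name__ == "__main__":
    flat = [[100] * 16 for _ in range(16)]
    checker = [[255 if (i + j) % 2 else 0 for j in range(16)] for i in range(16)]
    print(dbc_fractal_dimension(flat, [2, 4, 8]))     # about 2.0: smooth surface
    print(dbc_fractal_dimension(checker, [2, 4, 8]))  # about 3.0: maximally rough
```

In a fusion setting, a higher local FD would be read as more texture, and hence as the better-focused source at that location.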

Deep Universal Blind Image Denoising

Jae Woong Soh, Nam Ik Cho

Auto-TLDR; Image Denoising with Deep Convolutional Neural Networks

Image denoising is an essential part of many image processing and computer vision tasks because noise corruption during image acquisition is inevitable. Traditionally, many researchers have investigated image priors for denoising from a Bayesian perspective, based on image properties and statistics. Recently, deep convolutional neural networks (CNNs) have shown great success in image denoising by exploiting large-scale synthetic datasets. However, both approaches have pros and cons. While deep CNNs are powerful at removing noise with known statistics, they tend to lack flexibility and practicality for blind and real-world noise. Moreover, they cannot easily employ explicit priors. On the other hand, traditional non-learning methods can involve explicit image priors, but they require considerable computation time and cannot exploit large-scale external datasets. In this paper, we present a CNN-based method that leverages the advantages of both approaches from the Bayesian perspective. Concretely, we divide the blind image denoising problem into sub-problems and solve each inference problem separately. As the CNN is a powerful tool for inference, our method is rooted in CNNs, and we propose a novel network design for efficient inference. With the proposed method, we can successfully remove blind and real-world noise using a universal CNN with a moderate number of parameters.

Semi-Supervised Deep Learning Techniques for Spectrum Reconstruction

Adriano Simonetto, Vincent Parret, Alexander Gatto, Piergiorgio Sartor, Pietro Zanuttigh

Auto-TLDR; Hyperspectral data estimation from RGB data using semi-supervised learning

State-of-the-art approaches for estimating hyperspectral images (HSI) from RGB data are mostly based on deep learning techniques, but due to the lack of training data their performance is limited to the uncommon scenarios where a large hyperspectral database is available. In this work, we present a family of novel deep learning schemes for hyperspectral data estimation that work when the available hyperspectral information is limited. First, we introduce a learning scheme exploiting a physical model based on the backward mapping to the RGB space and total variation regularization, which can be trained with a limited number of HSI images. Then, we propose a novel semi-supervised learning scheme that works even with just a few pixels labeled with hyperspectral information. Finally, we show that the approach can be extended to a transfer learning scenario. The proposed techniques reach impressive performance while requiring only a few HSI images, or just a few labeled pixels, for training.
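
The physics-based scheme constrains a predicted hyperspectral cube by mapping it back to RGB and adding a total variation penalty. A minimal sketch of such a loss follows, assuming a known 3 x B camera spectral response matrix, a mean squared reprojection error, an anisotropic TV term, and an arbitrary weight `tv_weight`; these choices and all function names are illustrative, not the paper's exact formulation.

```python
from typing import List, Sequence

Cube = List[List[List[float]]]      # rows x cols x bands (predicted HSI)
RGBImage = List[List[List[float]]]  # rows x cols x 3 (observed RGB)
Response = Sequence[Sequence[float]]  # 3 curves x bands (assumed known)


def project_to_rgb(spectrum: Sequence[float], response: Response) -> List[float]:
    """Backward mapping: integrate a spectrum against each camera response curve."""
    return [sum(r * s for r, s in zip(curve, spectrum)) for curve in response]


def total_variation(cube: Cube) -> float:
    """Anisotropic TV: absolute differences to the lower and right neighbours, all bands."""
    rows, cols = len(cube), len(cube[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            for b, v in enumerate(cube[i][j]):
                if i + 1 < rows:
                    tv += abs(cube[i + 1][j][b] - v)
                if j + 1 < cols:
                    tv += abs(cube[i][j + 1][b] - v)
    return tv


def backward_rgb_loss(pred: Cube, rgb: RGBImage, response: Response,
                      tv_weight: float = 0.1) -> float:
    """Mean squared RGB reprojection error plus weighted spatial TV."""
    rows, cols = len(pred), len(pred[0])
    err = 0.0
    for i in range(rows):
        for j in range(cols):
            rec = project_to_rgb(pred[i][j], response)
            err += sum((a - b) ** 2 for a, b in zip(rec, rgb[i][j]))
    return err / (rows * cols) + tv_weight * total_variation(pred)
```

Because this loss needs only the RGB image and the response matrix, it can supervise a network even where no ground-truth spectra exist, which is the motivation the abstract gives for training with limited HSI data. In practice it would be written in an autodiff framework so that its gradient reaches the network weights.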

Thermal Image Enhancement Using Generative Adversarial Network for Pedestrian Detection

Mohamed Amine Marnissi, Hajer Fradi, Anis Sahbani, Najoua Essoukri Ben Amara

Auto-TLDR; Improving Visual Quality of Infrared Images for Pedestrian Detection Using Generative Adversarial Network

Infrared imaging has recently played an important role in a wide range of applications, including surveillance, robotics, and night vision. However, infrared cameras often suffer from limitations, mainly low contrast and blurred details. These problems hinder the observation of target objects in infrared images, which can limit the feasibility of different infrared imaging applications. In this paper, we focus mainly on the problem of pedestrian detection in thermal images. In particular, we emphasize the need to enhance the visual quality of the images before performing the detection step. To address this, we propose a novel thermal enhancement architecture based on a Generative Adversarial Network, composed of two modules, contrast enhancement and denoising, followed by a post-processing step for edge restoration to improve the overall quality. The effectiveness of the proposed architecture is assessed by means of visual quality metrics, and better results are obtained compared to both the original thermal images and the results of other existing enhancement methods. These experiments were conducted on a subset of the KAIST dataset. Using the same dataset, the impact of the proposed enhancement architecture on detection was demonstrated, with performance improved by a significant margin using the YOLOv3 detector.

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Auto-TLDR; ISRResCNet: Deep Iterative Super-Resolution Residual Convolutional Network for Single Image Super-resolution

Deep convolutional neural networks (CNNs) have recently achieved great success in the single image super-resolution (SISR) task due to their powerful feature representation capabilities. Most recent deep learning-based SISR methods focus on designing deeper/wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of trainable parameters and a huge volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves results for different scaling factors in comparison with state-of-the-art methods.

Construction Worker Hardhat-Wearing Detection Based on an Improved BiFPN

Chenyang Zhang, Zhiqiang Tian, Jingyi Song, Yaoyue Zheng, Bo Xu

Auto-TLDR; A One-Stage Object Detection Method for Hardhat-Wearing in Construction Site

Construction work is considered one of the occupations with the highest safety risk. Therefore, safety plays an important role on construction sites, and one of the most fundamental safety rules is to wear a hardhat. To strengthen construction site safety, most current methods use multi-stage pipelines for hardhat-wearing detection. These methods have limitations in terms of adaptability and generalizability. In this paper, we propose a one-stage object detection method based on a convolutional neural network. We present a multi-scale strategy that selects the high-resolution feature maps of DarkNet-53 to effectively identify small-scale hardhats. In addition, we propose an improved weighted bi-directional feature pyramid network (BiFPN), which fuses more semantic features from more scales. The proposed method can not only detect hardhat-wearing but also identify the color of the hardhat. Experimental results show that the proposed method achieves a mAP of 87.04%, outperforming several state-of-the-art methods on a public dataset.
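
BiFPN, as originally introduced in EfficientDet, fuses input feature maps with learnable non-negative weights using "fast normalized fusion": O = sum_i w_i * I_i / (eps + sum_j w_j). The abstract does not say how the improved BiFPN changes this, so the sketch shows only the standard fusion rule on flattened, same-shaped feature maps; the function name and the eps value are assumptions.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shaped (flattened) feature maps with non-negative normalized weights."""
    if len(features) != len(raw_weights):
        raise ValueError("one weight per input feature map is required")
    # ReLU keeps each learnable weight non-negative.
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)  # eps avoids division by zero when all weights are 0
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(weights, features)) / norm
        for k in range(length)
    ]


if __name__ == "__main__":
    top_down = [1.0, 2.0]  # hypothetical feature values at one pyramid level
    lateral = [3.0, 4.0]
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))  # close to [2.0, 3.0]
```

Compared with a softmax over the weights, this normalization avoids exponentials while still keeping each weight's contribution in [0, 1], which is why EfficientDet adopted it for speed.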

DSPNet: Deep Learning-Enabled Blind Reduction of Speckle Noise

Yuxu Lu, Meifang Yang, Liu Wen

Auto-TLDR; Deep Blind DeSPeckling Network for Imaging Applications

Blind reduction of speckle noise has long been an unsolved problem in several imaging applications, such as medical ultrasound imaging, synthetic aperture radar (SAR) imaging, and underwater sonar imaging. The unwanted noise can hamper the reliable detection and recognition of objects of interest. From a statistical point of view, speckle noise can be assumed to be multiplicative, which is significantly different from common additive Gaussian noise. The purpose of this study is to blindly reduce speckle noise under non-ideal imaging conditions. The multiplicative relationship between the latent sharp image and the random noise is first converted into an additive one through a logarithmic transformation. To improve imaging performance, we introduce the feature pyramid network (FPN) and atrous spatial pyramid pooling (ASPP), yielding a more powerful deep blind DeSPeckling Network (named DSPNet). In particular, DSPNet is mainly composed of two subnetworks: Log-NENet (a noise estimation network in the logarithmic domain) and Log-DNNet (a denoising network in the logarithmic domain). Log-NENet and Log-DNNet are proposed to estimate the noise level map and to reduce random noise in the logarithmic domain, respectively. A multi-scale mixed loss function is further proposed to improve the generalization robustness of DSPNet. The proposed deep blind despeckling network is capable of reducing random noise while preserving salient image details. Both synthetic and realistic experiments demonstrate the superior performance of DSPNet in terms of quantitative evaluations and visual image quality.
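
The logarithmic transformation works because log(x * n) = log(x) + log(n): a multiplicative speckle model becomes additive, so an additive denoiser can operate in the log domain before mapping back with exp. In the sketch below, a hypothetical moving-average filter on a 1D signal stands in for Log-NENet and Log-DNNet; it illustrates only this homomorphic pipeline, not DSPNet.

```python
import math
from typing import Callable, List, Sequence

Denoiser = Callable[[List[float]], List[float]]


def homomorphic_despeckle(signal: Sequence[float], denoise_additive: Denoiser,
                          eps: float = 1e-6) -> List[float]:
    """Denoise multiplicative noise by filtering in the log domain.

    Model: y = x * n  ->  log y = log x + log n (additive noise).
    eps guards against log(0) for non-positive samples.
    """
    log_signal = [math.log(max(v, eps)) for v in signal]
    clean_log = denoise_additive(log_signal)
    return [math.exp(v) for v in clean_log]


def moving_average(xs: List[float], radius: int = 1) -> List[float]:
    """Stand-in additive denoiser: mean over a window of 2 * radius + 1 samples."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    clean = 5.0
    noisy = [clean * n for n in (0.5, 2.0, 0.5, 2.0)]  # multiplicative noise
    print(homomorphic_despeckle(noisy, lambda xs: moving_average(xs, radius=10)))
    # every value is close to 5.0
```

Averaging in the log domain computes a geometric mean, so noise factors such as 0.5 and 2.0 cancel exactly, whereas a plain arithmetic mean of the noisy samples would give 6.25.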

Selective Kernel and Motion-Emphasized Loss Based Attention-Guided Network for HDR Imaging of Dynamic Scenes

Yipeng Deng, Qin Liu, Takeshi Ikenaga

Auto-TLDR; SK-AHDRNet: A Deep Network with attention module and motion-emphasized loss function to produce ghost-free HDR images

Ghost-like artifacts caused by ill-exposed and moving areas are among the most challenging problems in high dynamic range (HDR) image reconstruction. When the motion range is small, previous methods based on optical flow or patch matching can suppress ghost-like artifacts by first aligning the input images before merging them. However, they are not robust enough and still produce artifacts in challenging scenes with large foreground motions. To this end, we propose a deep network with an attention module and a motion-emphasized loss function to produce ghost-free HDR images. In the attention module, we use channel and spatial attention to guide the network to automatically emphasize important components such as moving and saturated areas. To be robust to images with different resolutions and objects at distinct scales, we adopt the selective kernel network as the basic framework for channel attention. In addition to the attention module, a motion-emphasized loss function based on a mask of moving and ill-exposed areas is designed to help the network reconstruct moving areas. Experiments on the public dataset indicate that the proposed SK-AHDRNet produces ghost-free results in which details in ill-exposed areas are well recovered. On the test set, the proposed method scores 43.17 in PSNR and 61.02 in HDR-VDP-2, outperforming all conventional methods. According to quantitative and qualitative evaluations, the proposed method achieves state-of-the-art performance.

Color, Edge, and Pixel-Wise Explanation of Predictions Based on Interpretable Neural Network Model

Jay Hoon Jung, Youngmin Kwon

Auto-TLDR; Explainable Deep Neural Network with Edge Detecting Filters

We design an interpretable network model by introducing explainable components into a Deep Neural Network (DNN). We replace the first-layer kernels of a Convolutional Neural Network (CNN) and of a ResNet-50 with well-known edge-detecting filters such as Sobel, Prewitt, and others. The relative importance score of each filter is measured with a variant of the Layer-wise Relevance Propagation (LRP) method proposed by Bach et al. Since the effects of edge-detecting filters are well understood, our model provides three different scores to explain individual predictions: scores with respect to (1) colors, (2) edge filters, and (3) pixels of the image. Our method provides more tools for analyzing predictions by highlighting the locations of important edges and colors in the images. Furthermore, our scores can reveal the general features of a category as well as explain individual predictions. At the same time, the model does not degrade performance on the MNIST, Fruit360, and ImageNet datasets.
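
Replacing the first convolution with fixed, hand-designed filters amounts to convolving the input with known kernels. The sketch defines the standard 3 x 3 Sobel and Prewitt kernels and applies them, on a single-channel image, with a plain "valid" cross-correlation (the operation CNN layers actually compute). The filter-bank dictionary and function names are illustrative, and the paper's full filter set is not reproduced here.

```python
from typing import Dict, List

Kernel = List[List[int]]
Image = List[List[float]]

SOBEL_X: Kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y: Kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
PREWITT_X: Kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y: Kernel = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# Hypothetical fixed "first layer": one output map per known filter.
FILTER_BANK: Dict[str, Kernel] = {
    "sobel_x": SOBEL_X, "sobel_y": SOBEL_Y,
    "prewitt_x": PREWITT_X, "prewitt_y": PREWITT_Y,
}


def correlate2d_valid(img: Image, kernel: Kernel) -> Image:
    """Cross-correlate img with kernel, keeping only fully overlapping positions."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
         for j in range(cols)]
        for i in range(rows)
    ]


def fixed_edge_layer(img: Image) -> Dict[str, Image]:
    """Apply every filter in the bank; each response map is attributable to a named filter."""
    return {name: correlate2d_valid(img, k) for name, k in FILTER_BANK.items()}


if __name__ == "__main__":
    step = [[0, 0, 1, 1] for _ in range(4)]  # vertical edge between columns 1 and 2
    for name, resp in fixed_edge_layer(step).items():
        print(name, resp)  # x-filters respond, y-filters give zeros
```

Because every channel after this layer corresponds to a named filter, relevance propagated back to the layer can be summed per channel, which is what makes per-filter importance scores meaningful.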

A Fine-Grained Dataset and Its Efficient Semantic Segmentation for Unstructured Driving Scenarios

Kai Andreas Metzger, Peter Mortimer, Hans J "Joe" Wuensche

Auto-TLDR; TAS500: A Semantic Segmentation Dataset for Autonomous Driving in Unstructured Environments

Research in autonomous driving for unstructured environments suffers from a lack of semantically labeled datasets compared to its urban counterpart. Urban and unstructured outdoor environments are challenging due to varying lighting and weather conditions throughout the day and across seasons. In this paper, we introduce TAS500, a novel semantic segmentation dataset for autonomous driving in unstructured environments. TAS500 offers fine-grained vegetation and terrain classes to effectively learn drivable surfaces and natural obstacles in outdoor scenes. We evaluate the performance of modern semantic segmentation models with an additional focus on their efficiency. Our experiments demonstrate the advantages of fine-grained semantic classes in improving the overall prediction accuracy, especially along class boundaries. The dataset, code, and pretrained model are available online.

Breast Anatomy Enriched Tumor Saliency Estimation

Fei Xu, Yingtao Zhang, Heng-Da Cheng, Jianrui Ding, Boyu Zhang, Chunping Ning, Ying Wang

Auto-TLDR; Tumor Saliency Estimation for Breast Ultrasound using enriched breast anatomy knowledge

Breast cancer research is of great significance, and developing tumor detection methodologies is a critical need. However, detecting breast cancer in breast ultrasound (BUS) images is challenging due to the complicated breast structure and the poor quality of the images. In this paper, we propose a novel tumor saliency estimation (TSE) model guided by enriched breast anatomy knowledge to localize the tumor. First, the breast anatomy layers are generated by a deep neural network. Then we refine the layers by integrating a non-semantic breast anatomy model to solve the problem of incomplete mammary layers. Meanwhile, a new background map generation method, weighted by semantic probability and spatial distance, is proposed to improve performance. Experiments demonstrate that the proposed method with the new background map outperforms four state-of-the-art TSE models, increasing the F-measure by 10% on the public BUS dataset.

A Scalable Deep Neural Network to Detect Low Quality Images without a Reference

Zongyi Liu

Auto-TLDR; A Deep Neural Network-based Algorithm for Non-Reference Image Quality Metrics in Streaming Services

Online streaming services have been growing at a fast pace. To provide the best user experience, low-quality images in videos need to be detected so that they can be repaired or improved before being shown to customers. For example, for movie and TV-show streaming services, it is important to check whether an original (master) video produced by a studio contains low-quality images with artifacts such as up-scaling or interlacing; for live streaming services, it is important to detect whether a streamed video has hits caused by encoding, such as H.264 or MPEG-2. Impairment detection is usually measured with non-reference (NR) metrics because it is often difficult, and sometimes impossible, to obtain the original (master) videos. On the other hand, current research in the image quality area, such as super-resolution, mainly focuses on full-reference (FR) metrics like PSNR or VMAF. In this paper, we present an algorithm that reliably computes five types of spatial NR metrics that are commonly used in Prime Video (PV) movie content inspection and live streaming services. The algorithm consists of two components: a pre-processing step that spatially de-correlates pixel intensity values and a novel deep neural network (DNN) that quantifies the NR metrics at the image-region level. We show that our algorithm achieves better performance than state-of-the-art algorithms in this area.

RONELD: Robust Neural Network Output Enhancement for Active Lane Detection

Zhe Ming Chng, Joseph Mun Hung Lew, Jimmy Addison Lee

Auto-TLDR; Real-Time Robust Neural Network Output Enhancement for Active Lane Detection

Accurate lane detection is critical for navigation in autonomous vehicles, particularly the active lane, which demarcates the single road space that the vehicle is currently traveling on. Recent state-of-the-art lane detection algorithms utilize convolutional neural networks (CNNs) to train deep learning models on popular benchmarks such as TuSimple and CULane. While each of these models works particularly well on training and test inputs obtained from the same dataset, performance drops significantly on unseen datasets from different environments. In this paper, we present a real-time robust neural network output enhancement for active lane detection (RONELD) method to identify, track, and optimize active lanes from deep learning probability map outputs. We first adaptively extract lane points from the probability map outputs, then detect curved and straight lanes, and use weighted least squares linear regression on straight lanes to fix broken lane edges resulting from the fragmentation of edge maps in real images. Lastly, we hypothesize true active lanes by tracking preceding frames. Experimental results demonstrate up to a two-fold increase in accuracy when using RONELD in cross-dataset validation tests.
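
Weighted least squares line fitting, as used to repair broken straight lanes, has a closed-form solution obtained from the two normal equations. The sketch fits x = m * y + c, regressing the column x on the row y because lanes are close to vertical in the image; using probability-map values as the weights, and the function name, are assumptions rather than details given in the abstract.

```python
from typing import Sequence, Tuple


def weighted_line_fit(ys: Sequence[float], xs: Sequence[float],
                      weights: Sequence[float]) -> Tuple[float, float]:
    """Minimize sum_i w_i * (x_i - m * y_i - c)^2 and return (m, c)."""
    sw = sum(weights)
    sy = sum(w * y for w, y in zip(weights, ys))
    sx = sum(w * x for w, x in zip(weights, xs))
    syy = sum(w * y * y for w, y in zip(weights, ys))
    syx = sum(w * y * x for w, y, x in zip(weights, ys, xs))
    denom = sw * syy - sy * sy
    if denom == 0:
        raise ValueError("need at least two distinct rows with positive weight")
    # Solution of the normal equations for slope and intercept.
    m = (sw * syx - sy * sx) / denom
    c = (sx - m * sy) / sw
    return m, c


if __name__ == "__main__":
    ys = [0, 1, 2, 3, 4, 5]
    xs = [1, 3, 5, 7, 9, 100]            # last point is a spurious detection
    conf = [0.9, 0.8, 0.9, 0.7, 0.6, 0.0]  # e.g. lane-probability weights
    print(weighted_line_fit(ys, xs, conf))  # close to (2.0, 1.0)
```

Weighting lets confident lane pixels dominate the fit while low-probability outliers contribute little, so the fitted line can bridge gaps where the lane marking is fragmented.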

An Adaptive Model for Face Distortion Correction

Duong H. Nguyen, Tien D. Bui

Auto-TLDR; Adaptive Polynomial Model for Face Distortion Correction in Selfie Photos

The age of social media calls for devices that can capture and share one's moments with high fidelity. Handheld devices such as smartphones with wide-angle cameras reflect the current trend in mobile photography. Although one can take great delight in the wide field of view of modern cameras, nearby objects or faces may be distorted significantly. Recent works have obtained impressive results in this research area, but there is still a trade-off between image quality and processing time to consider. This work introduces an adaptive polynomial model that automatically selects faces and performs image distortion correction. Since the photos are processed locally, faces are undistorted while the background stays close to its original state. Unlike other content-aware methods, which rely on heavy computing components and high image resolution, our model is suitable for mobile devices to tackle face distortion in selfie photos.

Fast Multi-Level Foreground Estimation

Thomas Germer, Tobias Uelwer, Stefan Conrad, Stefan Harmeling

Auto-TLDR; Foreground estimation given the alpha matte

Alpha matting aims to estimate the translucency of an object in a given image. The resulting alpha matte describes, for each pixel, how much the foreground and background colors contribute to the color of the composite image. While most methods in the literature focus on estimating the alpha matte, the process of estimating the foreground colors given the input image and its alpha matte is often neglected, although foreground estimation is an essential part of many image editing workflows. In this work, we propose a novel method for foreground estimation given the alpha matte. We demonstrate that our fast multi-level approach yields results comparable with the state of the art while outperforming those methods in computational runtime and memory usage.
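
The alpha matte is defined through the compositing equation I = alpha * F + (1 - alpha) * B. Foreground estimation is hard because F and B are both unknown at each pixel; the sketch below only states the equation and recovers F in the trivial special case where B is known, a hypothetical simplification that is not the authors' multi-level method. Function names are illustrative.

```python
from typing import Optional, Tuple

Color = Tuple[float, ...]  # RGB in [0, 1]


def composite(alpha: float, fg: Color, bg: Color) -> Color:
    """Compositing equation I = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def foreground_if_background_known(color: Color, alpha: float, bg: Color,
                                   min_alpha: float = 1e-3) -> Optional[Color]:
    """Invert the compositing equation for F when B is known (special case only)."""
    if alpha < min_alpha:
        return None  # F is unobservable where the pixel is (almost) pure background
    # Solve for F and clamp to the valid color range.
    return tuple(
        min(1.0, max(0.0, (c - (1.0 - alpha) * b) / alpha))
        for c, b in zip(color, bg)
    )


if __name__ == "__main__":
    img = composite(0.25, (1.0, 0.5, 0.0), (0.0, 0.0, 1.0))
    print(img)  # (0.25, 0.125, 0.75)
    print(foreground_if_background_known(img, 0.25, (0.0, 0.0, 1.0)))  # (1.0, 0.5, 0.0)
```

In real matting, B is unknown as well, so practical foreground estimators add assumptions such as spatial smoothness of F and B and solve for both jointly; the division by alpha above also shows why F is poorly determined where alpha is small.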

TSDM: Tracking by SiamRPN++ with a Depth-Refiner and a Mask-Generator

Pengyao Zhao, Quanli Liu, Wei Wang, Qiang Guo

Auto-TLDR; TSDM: An RGB-D Tracker Combining SiamRPN++ with Depth-Based Refinement

In generic object tracking, depth (D) information provides informative cues for foreground-background separation and target bounding box regression. However, few trackers so far have used depth information in this role, owing to the lack of a suitable model. In this paper, an RGB-D tracker named TSDM is proposed, which is composed of a Mask-generator (M-g), SiamRPN++, and a Depth-refiner (D-r). The M-g generates background masks and updates them as the target's 3D position changes. The D-r optimizes the target bounding box estimated by SiamRPN++, based on the difference in spatial depth distribution between the target and the surrounding background. Extensive evaluation on the Princeton Tracking Benchmark and the Visual Object Tracking challenge shows that our tracker outperforms the state of the art by a large margin while running at 23 FPS. In addition, a lightweight variant runs at 31 FPS, making it practical for real-world applications. Code and models of TSDM are available at https://github.com/lql-team/TSDM.
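The abstract describes the Mask-generator only at a high level. A minimal sketch, assuming the background mask marks pixels whose depth lies outside a band around the target's depth, and that the target depth is re-estimated as the median depth inside the current bounding box so the mask follows the target as it moves in 3D. `target_depth_from_box`, `background_mask`, and `tolerance` are hypothetical names, not part of the TSDM code.

```python
from statistics import median


def target_depth_from_box(depth, box):
    """Median valid depth inside box = (row0, col0, row1, col1), end-exclusive.

    Zero depth is treated as missing; raises StatisticsError if the box
    contains no valid depth.
    """
    r0, c0, r1, c1 = box
    values = [depth[r][c] for r in range(r0, r1) for c in range(c0, c1)
              if depth[r][c] > 0]
    return median(values)


def background_mask(depth, target_depth, tolerance):
    """1 where a pixel's depth is outside target_depth +/- tolerance, else 0.

    Missing (zero) depth usually ends up masked as background.
    """
    return [[1 if abs(d - target_depth) > tolerance else 0 for d in row]
            for row in depth]


depth = [[1.0, 2.0], [2.1, 5.0]]
mask = background_mask(depth, target_depth=2.0, tolerance=0.3)
```

Re-running `target_depth_from_box` on each new box and rebuilding the mask is one simple way to keep the mask in step with the target.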

Edge-Aware Monocular Dense Depth Estimation with Morphology

Zhi Li, Xiaoyang Zhu, Haitao Yu, Qi Zhang, Yongshi Jiang

Auto-TLDR; Spatio-Temporally Smooth Dense Depth Maps Using Only a CPU

Dense depth maps play an important role in computer vision (CV) and augmented reality (AR). In CV applications, a dense depth map is the cornerstone of 3D reconstruction, allowing real objects to be precisely displayed in the computer. In AR, dense depth maps enable correct occlusion relationships between virtual content and real objects, giving a better user experience. However, the high computational cost has limited the practical computation of dense depth maps. We present a novel algorithm that produces low-latency, spatio-temporally smooth dense depth maps using only a CPU. The depth maps exhibit sharp discontinuities at depth edges while keeping computational complexity low. Our algorithm first obtains a sparse SLAM reconstruction, then extracts coarse depth edges from a down-sampled RGB image using morphological operations. Next, we thin the depth edges and align them with image edges. Finally, a warm-start initialization scheme and an improved optimization solver are adopted to accelerate convergence. A quantitative evaluation shows improved depth-map accuracy compared with other state-of-the-art and baseline techniques.
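The abstract does not say which morphological operations are used. A minimal sketch, assuming a grayscale image, stride-based down-sampling, a 3x3 square structuring element, and a morphological gradient (dilation minus erosion) thresholded into a binary coarse edge map. `downsample`, `morphological_gradient`, `coarse_edges`, and the threshold are hypothetical; the paper works on RGB input and its exact pipeline may differ.

```python
def downsample(img, factor=2):
    """Keep every `factor`-th row and column (simple stride-based down-sampling)."""
    return [row[::factor] for row in img[::factor]]


def _window(img, r, c):
    """Values in the 3x3 neighbourhood of (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def morphological_gradient(img):
    """Dilation (local max) minus erosion (local min) with a 3x3 square element."""
    rows, cols = len(img), len(img[0])
    return [[max(_window(img, r, c)) - min(_window(img, r, c)) for c in range(cols)]
            for r in range(rows)]


def coarse_edges(img, threshold):
    """Binary edge map: 1 where the morphological gradient exceeds the threshold."""
    return [[1 if g > threshold else 0 for g in row]
            for row in morphological_gradient(img)]


# A vertical step edge: left half dark, right half bright.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = coarse_edges(step, threshold=5)
```

The gradient marks a band two pixels wide around the step, which is why a later thinning stage, as the abstract describes, is needed before aligning edges with image edges.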

Fused 3-Stage Image Segmentation for Pleural Effusion Cell Clusters

Sike Ma, Meng Zhao, Hao Wang, Fan Shi, Xuguo Sun, Shengyong Chen, Hong-Ning Dai

Auto-TLDR; Segmentation of Stained and Unstained Cell Clusters in Pleural Effusion Using a 3-Stage Method

The appearance of tumor cell clusters in pleural effusion is usually a vital sign of cancer metastasis. As an indispensable basis, segmentation is of crucial importance for diagnosis, chemical treatment, and prognosis. However, accurate segmentation of unstained cell clusters, which contain more detailed features than fluorescently stained images, remains challenging because of the complex background and unclear boundaries. Therefore, in this paper, we propose a fused 3-stage image segmentation algorithm, namely Coarse segmentation-Mapping-Fine segmentation (CMF), to segment unstained cell clusters from whole-slide images. First, we establish a tumor cell cluster dataset of 107 sets of images, each containing one unstained image, one stained image, and one ground-truth image. Then, based on the features of the unstained and stained cell clusters, we propose a three-stage segmentation method: 1) coarse segmentation on stained images to extract suspicious cell regions as a Region of Interest (ROI); 2) mapping this ROI to the corresponding unstained image to obtain the ROI of the unstained image (UI-ROI); 3) fine segmentation using an improved automatic fuzzy clustering framework (AFCF) on the UI-ROI to obtain precise cell cluster boundaries. Experimental results on the 107 image sets demonstrate that the proposed algorithm achieves better performance on unstained cell clusters, with an F1 score of 90.40%.
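The abstract reports an F1 score but not whether it is computed per pixel or per object. Assuming pixel-wise evaluation of a binary predicted mask against the ground-truth image, F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. `f1_score` is a hypothetical helper, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for two binary masks given as equal-sized nested lists."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # predicted cluster pixel that is in the ground truth
            elif p:
                fp += 1  # predicted cluster pixel that is background
            elif t:
                fn += 1  # ground-truth cluster pixel that was missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # both masks empty: perfect match


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
score = f1_score(pred, truth)  # TP=2, FP=1, FN=1 -> 4/6
```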

Extended Depth of Field Preserving Color Fidelity for Automated Digital Cytology

Alexandre Bouyssoux, Riadh Fezzani, Jean-Christophe Olivo-Marin

Auto-TLDR; Multi-Channel Extended Depth of Field for Digital Cytology Based on the Stationary Wavelet Transform

This paper presents a multi-channel Extended Depth of Field (EDF) method for digital cytology based on the stationary wavelet transform. With a coefficient selection rule adapted to precise color recovery, a sharp image can be reconstructed even from images with transparent overlapping cells. The precision and color fidelity of the proposed method are analyzed. Moreover, an experiment is conducted demonstrating the necessity of volume analysis in cytology for precise segmentation of cell clumps, and the importance of color fidelity in this context is highlighted. The proposed method was tested on Pap-stained urothelial cells and grayscale cervical cells with substantial overlap.
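The abstract names the stationary wavelet transform (SWT) but not its color-adapted selection rule. As a simplified illustration only, the sketch below fuses a 1D, single-channel focal stack with a one-level undecimated Haar transform: approximation coefficients are averaged and, at each position, the detail coefficient with the largest magnitude is kept (a generic maximum-absolute rule, not the paper's rule). Real EDF works on 2D images over several levels; PyWavelets, for example, provides `pywt.swt2` and `pywt.iswt2` for that. The function names are hypothetical.

```python
def haar_swt1(x):
    """One-level undecimated Haar transform with periodic extension.

    Uses a /2 scaling (not 1/sqrt(2)) so that x[n] == a[n] + d[n].
    """
    n = len(x)
    a = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    d = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return a, d


def haar_iswt1(a, d):
    """Inverse: average the two redundant estimates of each sample.

    x[i] = a[i] + d[i] and x[i] = a[i-1] - d[i-1] (index -1 wraps around).
    """
    return [((a[i] + d[i]) + (a[i - 1] - d[i - 1])) / 2 for i in range(len(a))]


def fuse_stack(stack):
    """Fuse equal-length 1D signals taken at different focal planes."""
    coeffs = [haar_swt1(s) for s in stack]
    n = len(stack[0])
    fused_a = [sum(c[0][i] for c in coeffs) / len(coeffs) for i in range(n)]
    # Max-absolute rule: keep the sharpest detail at each position.
    fused_d = [max((c[1][i] for c in coeffs), key=abs) for i in range(n)]
    return haar_iswt1(fused_a, fused_d)


# A sharp step (in focus) fused with a flat, fully defocused slice.
fused = fuse_stack([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 5.0, 5.0]])
```

The fused step is larger than that of a plain average of the two slices, which is the point of selecting detail coefficients rather than blending them.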