On-Demand Learning for Deep Image Restoration

Ruohan Gao                      Kristen Grauman

University of Texas at Austin

In ICCV 2017

[Main] [Supp] [Poster] [Bibtex] [GitHub]

feature
Abstract

While machine learning approaches to image restoration offer great promise, current methods risk training models fixated on performing well only for image corruption of a particular level of difficulty---such as a certain level of noise or blur. First, we examine the weakness of conventional "fixated" models and demonstrate that training general models to handle arbitrary levels of corruption is indeed non-trivial. Then, we propose an on-demand learning algorithm for training image restoration models with deep convolutional neural networks. The main idea is to exploit a feedback mechanism to self-generate training instances where they are needed most, thereby learning models that can generalize across difficulty levels. On four restoration tasks---image inpainting, pixel interpolation, image deblurring, and image denoising---and three diverse datasets, our approach consistently outperforms both the status quo training procedure and curriculum learning alternatives.

The Fixation Problem

Image corruption exists in various degrees of severity, and so in real-world applications the difficulty of restoring images will also vary significantly. For example, as shown in the figure above, an inpainter may face images with varying sizes of missing content, and a deblurring system may encounter varying levels of blurriness. Intuitively, the more missing pixels or the more severe the blur, the more difficult the restoration task. However, the norm in existing deep learning methods is to train a model that succeeds at restoring images exhibiting a particular level of corruption difficulty. In particular, existing systems self-generate training instances with a manually fixed hyper-parameter that controls the degree of corruption---a fixed inpainting size, a fixed percentage of corrupted pixels, or a fixed level of Gaussian white noise, etc. The implicit assumption is that at test time, either i) corruption will be limited to that same difficulty, or ii) some other process will estimate the difficulty level before passing the image to the appropriate, separately trained restoration system. Unfortunately, these are strong assumptions that are difficult if not impossible to meet in practice. As a result, existing methods risk training fixated models: models that perform well only at a particular level of difficulty.

feature

Just how bad is the fixatation problem in image restoration tasks? The above figure helps illustrate. To get these results, we followed the current literature to train deep networks to target a certain degree of corruption for four applications. Specifically, for the image inpainting task, we train a model to inpaint a large central missing block of size 32 x 32. During testing, the resulting model can inpaint the central block of the same size at the same location very well. However, if we remove a block that is slightly shifted away from the central region, or remove a much smaller block, the model fails to inpaint satisfactorily. For the pixel interpolation, deblurring and denoising results, we attempt analogous trials, i.e., training for 80% missing pixels, a single width blur kernel or a single level of noise, respectively, then observe poor performance by the fixated models on examples having different corruption levels.

On-Demand Learning for Image Restoration

To solve the fixation problem, in this work we explore ways to let a deep learning system take control and guide its own training. This includes i) a solution that simply pools training instances from across difficulty levels, ii) a solution that focuses on hard examples, and iii) a curriculum learning solution that intelligently orders the training samples from easy to hard.

Based on our findings, we introduce a new on-demand learning solution for training all-rounder deep networks for image restoration tasks. We divide each restoration task into N sub-tasks of increasing difficulty. During training, we aim to jointly train the deep neural network restoration model to accommodate all N sub-tasks. Our approach relies on a feedback mechanism that, at each epoch of training, lets the system guide its own learning towards the right proportion of sub-tasks per difficulty level. In this way, the system itself can discover which sub-tasks deserve more or less attention.

To implement our idea, we devise a general encoder-decoder network amenable to several restoration tasks. We evaluate the approach on four low-level tasks---inpainting, pixel interpolation, image deblurring and image denoising---and three diverse datasets, CelebFaces Attributes, SUN397 Scenes, and Denoising Benchmark 11 (DB11). Across all task and datasets, the results consistently demonstrate the advantage of our proposed method. On-demand learning helps avoid the common (but thus far overlooked) pitfall of overly specializing deep networks to a narrow band of distortion difficulty.

feature

The figure above shows some qualitative examples output by our on-demand learning method. For each task, the first row shows testing examples of CelebA dataset, and the second row shows examples of SUN397 dataset. For each quintuple, Column 1: Original image from the dataset; Column 2: Corrupted image; Column 3: Restored image using rigid joint training; Column 4: Restored image using a fixated model; Column 5: Restored image using our method. The all-rounder models trained using our proposed algorithm perform well on images with various corruption levels. These illustrate that models trained using our proposed on-demand approach are all-rounders that perform well on images of different degrees of corruption. With a single model, we inpaint blocks of different sizes at arbitrary locations, restore corrupted images with different percentage of deleted pixels, deblur images at various degrees of blurriness and denoise images with various levels of Gaussian white noise.

Downloads

Our code and pre-trained torch models are available on GitHub.

Acknowledgement

This research is supported in part by NSF IIS-1514118. We also gratefully acknowledge the sup- port of the Texas Advanced Computing Center (TACC) and a GPU donation from Facebook.