Benchmarking Super-Resolution Algorithms on Real Data


Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images.

Toward bridging this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study that exploits the ground truth data and a qualitative evaluation in a large-scale observer study. We also rigorously investigate the agreement between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicly available dataset can foster future evaluations.


The images were captured with a monochromatic Basler acA2000-50gm CMOS camera to avoid subsampling introduced by a Bayer pattern and are stored in PNG format (8 bit, grayscale). Each sequence has 40 frames and is available in 4 spatial resolution levels using hardware binning: original (2040×1080), 2×2 binning (1020×540), 3×3 binning (680×360), and 4×4 binning (510×270). The sequences cover multiple types of camera and object motion. Furthermore, besides the regular sequences (inliers), we also provide the same sequences with photometric outliers (5 of 40 frames were captured with significantly less light in the scene) and video compression using H.265/HEVC coding (4 compression levels).
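The four resolution levels follow directly from dividing the full sensor resolution by the binning factor. The short Python sketch below reproduces that arithmetic and, as a rough software analogue, averages each block of pixels. The function names are hypothetical, and block averaging is only an assumption about how binning combines pixels; the camera's hardware binning may behave differently.

```python
def binned_resolution(width, height, factor):
    """Return (width, height) after factor x factor pixel binning."""
    if width % factor or height % factor:
        raise ValueError("image size must be divisible by the binning factor")
    return width // factor, height // factor


def bin_image(image, factor):
    """Approximate binning in software by averaging factor x factor blocks.

    `image` is a list of rows of gray values. Averaging is an assumption
    made for illustration, not a description of the camera hardware.
    """
    height, width = len(image), len(image[0])
    out_w, out_h = binned_resolution(width, height, factor)
    binned = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [image[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        binned.append(row)
    return binned


# Full resolution of the original (unbinned) frames as stated above.
FULL_WIDTH, FULL_HEIGHT = 2040, 1080

for factor in (1, 2, 3, 4):
    print(factor, binned_resolution(FULL_WIDTH, FULL_HEIGHT, factor))
```

For the factors 2, 3, and 4 this yields 1020×540, 680×360, and 510×270, matching the resolution levels listed above.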

In the future, this data set will be gradually extended by more grayscale sequences, full color sequences, and more motion types.

The sequences for all 14 scenes, including all motion types, all resolution levels, all compression levels, and both inliers and outliers, can be downloaded by clicking on the respective sequence name (~2.5 GB per sequence).

Banknotes Books-and-papers Bookshelf Coffee Dolls
Games Globe Globe-fast Loader Newspapers
Pencils Porsche Tea-bottles Truck-budha-duck

Benchmark Results

All evaluation results including our quantitative study as well as the human observer study are available here

Source Code

Our source code, including all evaluation protocols and implementations of the benchmarked SR algorithms, is available on GitHub

Extending the Benchmark with New Algorithms

We encourage other authors to validate their SR methods on the SupER database to broaden our benchmark with novel algorithms. Future evaluation results will be published in our results section. If you would like to contribute and include your own algorithm, please contact us. We currently support two options for publishing new results using the evaluation framework:

  • Option 1: You run the benchmark yourselves using our evaluation scripts. We will review your submitted results and publish them accordingly.
  • Option 2: You provide a wrapper function for the source code of your algorithm (potentially with third-party dependencies and, for learning-based methods, the required training scripts). We then include your method in the benchmark and run the evaluation on our servers.

To facilitate reproducibility, we ask you to provide (a reference to) the source code of your algorithm that was used to obtain the benchmark results.


If you use our data, results or evaluation protocols in your research, please cite our publications:

  • T. Köhler, M. Bätz, F. Naderi, A. Kaup, A. Maier, and C. Riess. "Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data," in IEEE Transactions on Pattern Analysis and Machine Intelligence (to appear). doi: 10.1109/TPAMI.2019.2917037 [PDF]
  • T. Köhler, M. Bätz, F. Naderi, A. Kaup, A. Maier, and C. Riess. "Benchmarking Super-Resolution Algorithms on Real Data," arXiv preprint arXiv:1709.04881, 2017 [PDF]


Michel Bätz

Thomas Köhler


Quantitative Study

Here, we provide the results of our quantitative evaluations including SR images and image quality measures. The results can be downloaded separately for the different algorithms. Each algorithm can be identified by its ID. The ground truth data is accessible via ID = 0. See our evaluation framework for details on how to analyze these results.
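As an illustration of how an SR result can be compared against the ground truth (ID = 0), the Python sketch below computes the peak signal-to-noise ratio (PSNR), a common full-reference quality measure. PSNR is chosen here only as an example and is not necessarily one of the measures contained in the downloads; the function name and the flat-list image representation are likewise assumptions, and loading the PNG files is left out.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two images given as flat sequences of gray values.

    `peak` defaults to 255 because the frames are stored as 8-bit grayscale.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel of the estimate is off by one gray level.
ground_truth = [10, 20, 30, 40]
sr_result = [11, 21, 31, 41]
print(round(psnr(ground_truth, sr_result), 2))
```

Higher values indicate a smaller pixel-wise deviation from the ground truth. PSNR does not necessarily reflect perceived quality, which is one reason the benchmark also includes a human observer study.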

SR0: Ground Truth SR1: EBSR SR2: ScSR SR3: NBSRF SR4: VSRnet SR5: NUISR
SR18: DRCN SR19: VDSR SR20: A+

Human Observer Study

The results of our human observer study can be downloaded here

Time Measurements

Measurements regarding the computation times of the different algorithms can be downloaded here