Fastvideo

Benchmarks for JPEG2000 encoders on CPU and GPU

Below we provide the benchmarks for Fastvideo JPEG2000 Encoder in comparison with other freely available open source J2K encoding solutions. Some of them are CPU-only, while the others use GPU to accelerate JPEG2000 computations.

Approaches for JPEG2000 performance measurements

There are two standard approaches to measuring the performance of JPEG2000 codecs that utilize a GPU. They correspond to the two most common use cases for J2K encoders and decoders.

1. Single image mode processes one image at a time and could be called the "latency-oriented" or "minimum latency" approach. Here we measure the time interval (latency) between the moment the original image is available in RAM and the moment the processed image is available in RAM. The software cannot expect any additional images to be processed at the same time and therefore cannot take advantage of multiple image encoding or decoding. Overlapping the processing of the current image with other activities is undesirable because it would delay the result. We need single image mode in almost all camera applications because, apart from JPEG2000 encoding, we also have to run other image processing algorithms in the same pipeline. You can get more info from our Image & Video Processing SDK.

2. Batch mode processes a batch of images and could be called the "throughput-oriented" or "maximum performance" approach. Here frame rate becomes the more important metric. It is calculated by dividing the number of processed images by the total processing time. Some JPEG2000 codecs are optimized for this use case: exploiting task parallelism gives a better frame rate (throughput) at the expense of longer processing time for each individual image. This is possible because we actually have three devices (CPU, GPU and the bus interface between them) that can work simultaneously in batch mode, whereas at single image mode they are used sequentially for the different stages of the JPEG2000 algorithm. Moreover, the GPU can process several images simultaneously to increase frame rate even more if a single image is too small to load the multitude of GPU cores (especially at the Tier-1 stage). The amount of free GPU memory imposes an important limit on how many images can be processed simultaneously. Batch mode is a must for streaming applications where the pipeline consists of a JPEG2000 encoder or decoder. For a more complicated workflow it's better to use single image mode, though the performance will be lower.

Briefly, JPEG2000 batch mode can exploit the following kinds of task parallelism:

  • both upload to GPU and download from GPU could overlap with JPEG2000 processing on GPU (CUDA Streams)
  • Tier-1 and Tier-2 could be done in parallel: Tier-1 on GPU and multithreaded Tier-2 on CPU at the same time (this is also possible at single image mode)
  • multiple (batch) JPEG2000 processing to increase general GPU occupancy
  • multiple JPEG2000 processing at Tier-1 to improve GPU occupancy for that particular stage
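The effect of overlapping transfers and computations can be illustrated with a toy timing model. The sketch below is not our scheduler: the stage names and stage times are hypothetical numbers chosen only to show why overlapping raises frame rate, while single image latency remains the sum of all stages.

```python
def single_image_latency_ms(stage_ms):
    """Stages run back to back, so latency is the sum of the stage times."""
    return sum(stage_ms.values())


def overlapped_frame_rate_fps(stage_ms):
    """With ideal overlap, one frame completes every max(stage) ms in steady state."""
    return 1000.0 / max(stage_ms.values())


# Hypothetical per-frame stage times in milliseconds (not measured values).
stages = {"upload": 2.0, "gpu_dwt_tier1": 4.0, "cpu_tier2": 1.0, "download": 0.5}

latency = single_image_latency_ms(stages)          # sequential execution
sequential_fps = 1000.0 / latency                  # frame rate without overlap
overlapped_fps = overlapped_frame_rate_fps(stages)  # frame rate with ideal overlap

print(f"latency {latency} ms, {sequential_fps:.1f} fps sequential, "
      f"{overlapped_fps:.1f} fps overlapped")
```

In this idealized model the slowest stage sets the frame rate, which is why improving GPU occupancy at Tier-1 matters most when Tier-1 dominates.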

Neither of the above modes (single image and batch) is fully applicable to CPU-based JPEG2000 solutions, because in such cases everything is done on CPU. That's why we consider multithreaded CPU-based JPEG2000 encoding/decoding to be similar to single image mode.

At the moment we don't consider here the following possible modes for JPEG 2000 benchmarking on GPU:

  • multiple GPU mode
  • multiple tile mode for big images
  • fast parallel J2K processing with RESET, RESTART, CAUSAL and BYPASS modes

Results for these modes will be published as soon as their implementations are ready.

We don't hide anything concerning benchmarking procedures and achieved results. We publish not only timing and performance figures but also full info about hardware, JPEG2000 parameters, test images and testing modes, so that all our users can reproduce our benchmarks.

JPEG 2000 encoding benchmarks

We've carried out time and performance measurements for JPEG2000 encoding of 24-bit images with 2K and 4K resolutions. The results don't include any host I/O latency (image loading to RAM from HDD/SSD and saving back), and we've also excluded host-to-device transfer time. We've made this assumption to reproduce J2K encoder usage in our conventional image processing pipeline, where the initial data reside in GPU memory. Results for GPU-based JPEG2000 encoding software also include Tier-2 time on CPU, because this stage in our implementation is performed on CPU. The tables below show averaged results for the best series of 1000 measurements.
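One way to read "averaged results for the best series of 1000 measurements" is sketched below: run several series of 1000 encodes, average each series, and keep the lowest average. This is an illustrative harness only. The number of series, the function names and the `dummy_encode` placeholder are assumptions, not our benchmarking tool.

```python
import time


def average_time_ms(encode, runs):
    """Average wall-clock time of one call to encode() over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        encode()
    return (time.perf_counter() - start) * 1000.0 / runs


def best_series_average_ms(encode, series=5, runs_per_series=1000):
    """Run several series and report the lowest per-series average."""
    return min(average_time_ms(encode, runs_per_series) for _ in range(series))


def dummy_encode():
    """Placeholder workload standing in for a real encoder call."""
    sum(range(1000))


if __name__ == "__main__":
    print(f"{best_series_average_ms(dummy_encode):.4f} ms per call")
```

Taking the best series rather than the overall mean reduces the influence of background activity on the host.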

JPEG2000 encoding parameters

  • File format – JP2
  • Algorithm 1 – lossy JPEG 2000 compression with CDF 9/7 wavelet
  • Algorithm 2 – lossless JPEG 2000 compression with CDF 5/3 wavelet
  • Compression ratio (lossy encoding) ~ 12.0 which corresponds to visually lossless compression
  • Subsampling mode – 4:4:4
  • Number of DWT levels – 7
  • Codeblock size – 32×32
  • MCT – on
  • PCRD – off
  • Tiling – off
  • Quality layers – one
  • Progression order – LRCP (L = layer, R = resolution, C = component, P = position)
  • Modes of operation – single or batch
  • 2K test image (24-bit) – 2k_wild.ppm
  • 4K test image (24-bit) – 4k_wild.ppm
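For readers who want to reproduce the CPU baseline, the parameters above map roughly onto the `opj_compress` command line tool from OpenJPEG. The sketch below only builds the argument list and does not run anything. The mapping is an assumption based on the parameter list, not the exact command used for these benchmarks, and the output file name is hypothetical. Note that `-n` takes the number of resolutions, which is the number of DWT levels plus one.

```python
def opj_compress_args(src, dst, lossy, dwt_levels=7, codeblock=32, ratio=12):
    """Build an opj_compress argument list for the settings listed above."""
    args = [
        "opj_compress",
        "-i", src,
        "-o", dst,
        "-n", str(dwt_levels + 1),          # resolutions = DWT levels + 1
        "-b", f"{codeblock},{codeblock}",   # codeblock size
        "-p", "LRCP",                       # progression order
    ]
    if lossy:
        # -I selects the irreversible 9/7 wavelet, -r sets the compression ratio.
        args += ["-I", "-r", str(ratio)]
    # Without -I the reversible 5/3 wavelet is used (lossless).
    # MCT is left at the tool's default; that this matches "MCT on" for RGB
    # input is an assumption.
    return args


lossy_cmd = opj_compress_args("2k_wild.ppm", "2k_wild_lossy.jp2", lossy=True)
lossless_cmd = opj_compress_args("2k_wild.ppm", "2k_wild_lossless.jp2", lossy=False)
print(" ".join(lossy_cmd))
```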

Hardware and software

  • CPU Intel Core i7-5930K (Haswell-E, 6 cores, 3.5–3.7 GHz)
  • GPU NVIDIA GeForce GTX 1080 (Pascal, 20 SMM, 2560 cores, 1.6–1.7 GHz)
  • OS Windows 10 (x64)
  • CUDA Toolkit 8.0

JPEG2000 Encoders for comparison

  • OpenJPEG 2.1.0
  • Jasper
  • CUJ2K
  • Fastvideo JPEG2000

JPEG2000 lossy encoding at single image mode for 2K image: 2k_wild.ppm (1920×1080, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | PSNR | MSE | Compression ratio | Hardware |
|---|---|---|---|---|---|---|---|
| OpenJPEG | 695 ms | 8.5 MB/s | 1.4 fps | 39.54 dB | 7.23 | 12.00 | CPU |
| Jasper | 679 ms | 8.7 MB/s | 1.5 fps | 39.53 dB | 7.24 | 12.00 | CPU |
| CUJ2K encoder | 105 ms | 56.5 MB/s | 9.5 fps | 35.60 dB | 17.9 | 12.00 | GPU + CPU |
| Fastvideo JPEG2000 encoder | 7.00 ms | 847 MB/s | 142 fps | 39.50 dB | 7.29 | 12.01 | GPU + CPU |
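The PSNR and MSE columns are related by the standard formula PSNR = 10·log10(MAX² / MSE). The short sketch below assumes a peak value of 255 (8 bits per channel, which matches 24-bit RGB input) and checks the relation against the 2K table above. The function name is illustrative.

```python
import math


def psnr_db(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


# MSE values taken from the 2K lossy table above.
for name, mse in [("OpenJPEG", 7.23), ("Jasper", 7.24),
                  ("CUJ2K", 17.9), ("Fastvideo", 7.29)]:
    print(f"{name}: {psnr_db(mse):.2f} dB")
```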

JPEG2000 lossy encoding at single image mode for 4K image: 4k_wild.ppm (3840×2160, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | PSNR | MSE | Compression ratio | Hardware |
|---|---|---|---|---|---|---|---|
| OpenJPEG | 2780 ms | 8.5 MB/s | 0.4 fps | 45.10 dB | 2.01 | 12.02 | CPU |
| Jasper | 2529 ms | 9.4 MB/s | 0.4 fps | 45.09 dB | 2.02 | 12.02 | CPU |
| CUJ2K encoder | 283 ms | 83.9 MB/s | 3.5 fps | 41.42 dB | 4.69 | 12.05 | GPU + CPU |
| Fastvideo JPEG2000 encoder | 19.2 ms | 1234 MB/s | 52 fps | 45.08 dB | 2.02 | 12.04 | GPU + CPU |

MB/s – MegaBytes per second
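The Performance and Frames per second columns follow from the average encoding time and the uncompressed image size. The sketch below shows the arithmetic. The reported figures are consistent with 1 MB = 2^20 bytes; this is an inference from the numbers, so treat the `bytes_per_mb` default as an assumption. Function names are illustrative.

```python
def image_size_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size of a 24-bit image."""
    return width * height * bytes_per_pixel


def throughput_mb_per_s(width, height, time_ms, bytes_per_mb=2 ** 20):
    """Uncompressed megabytes processed per second."""
    return image_size_bytes(width, height) / bytes_per_mb / (time_ms / 1000.0)


def frames_per_second(time_ms):
    """Frame rate for back-to-back processing of single images."""
    return 1000.0 / time_ms


# Fastvideo lossy encoding times from the tables above.
print(throughput_mb_per_s(1920, 1080, 7.00), frames_per_second(7.00))   # 2K
print(throughput_mb_per_s(3840, 2160, 19.2), frames_per_second(19.2))   # 4K
```

The computed values agree with the tables to within the rounding of the reported encoding times.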

Fig.1: Fastvideo JPEG2000 performance on GeForce GTX 1080 (lossy encoding, single image mode)

J2K performance analysis for single image mode

The figure above shows the encoding speed (JPEG 2000 performance for lossy compression) as a function of image size for the Fastvideo JPEG2000 encoder at single image mode. In most cases, maximum JPEG2000 performance is achieved with codeblock size 32×32. For images with frame size more than 6 MB, the preferred codeblock size at single image mode is 32×32. The figure also shows performance saturation, which sets in at a different image size for each codeblock size. This is a key point for getting better results at batch mode. For 8K image compression with visually lossless parameters, performance saturation is present for any codeblock size at single image mode.

Figure 1 shows that on NVIDIA GeForce GTX 1080 it's possible to reach important milestones at single image mode for visually lossless JPEG2000 encoding. For codeblocks 16×16 performance can exceed 900 MB/s, for codeblocks 32×32 maximum performance exceeds 1300 MB/s, and for codeblocks 64×64 maximum performance can reach 1100 MB/s. Performance saturation for codeblocks 16×16 occurs at 4K resolution for visually lossless compression.

Fig.2: Fastvideo J2K performance as a function of compression ratio (lossy encoding, single image mode)

Fastvideo J2K performance versus image compression ratio

Figure 2 shows Fastvideo JPEG 2000 encoder performance as a function of compression ratio for different image resolutions for lossy compression at single image mode with standard testing conditions as stated above.

Lossless JPEG2000 encoding at single image mode for 2K image: 2k_wild.ppm (1920×1080, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | Compression ratio | Hardware |
|---|---|---|---|---|---|
| OpenJPEG | 962 ms | 6.2 MB/s | 1.0 fps | 2.097 | CPU |
| Jasper | 873 ms | 6.8 MB/s | 1.1 fps | 2.097 | CPU |
| CUJ2K encoder | 127 ms | 46.7 MB/s | 7.9 fps | 2.095 | GPU + CPU |
| Fastvideo JPEG2000 encoder | 10.2 ms | 582 MB/s | 98 fps | 2.098 | GPU + CPU |

Lossless JPEG2000 encoding at single image mode for 4K image: 4k_wild.ppm (3840×2160, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | Compression ratio | Hardware |
|---|---|---|---|---|---|
| OpenJPEG | 3038 ms | 7.8 MB/s | 0.3 fps | 2.776 | CPU |
| Jasper | 2702 ms | 8.8 MB/s | 0.4 fps | 2.776 | CPU |
| CUJ2K encoder | 347 ms | 68.4 MB/s | 2.9 fps | 2.773 | GPU + CPU |
| Fastvideo JPEG2000 encoder | 34.8 ms | 681 MB/s | 28.7 fps | 2.776 | GPU + CPU |

 

Superior performance of JPEG 2000 encoding at batch mode

For batch mode we've carried out performance measurements for JPEG 2000 encoding with exactly the same parameters as for single image mode. The results don't include host I/O latency (image loading to RAM from HDD/SSD and saving back).

To get maximum performance at batch mode, we don't need images as large as for single image mode. For example, a 4K image contains 4 times more pixels than a 2K image, so at batch mode we can expect the encoding time for 2K to be about 4 times less than for 4K. In theory, if at single image mode we can do visually lossless JPEG2000 encoding for 2K at about 140 fps and for 4K at 52 fps (Fig.1, codeblock size 32×32), then it should be possible to reach a frame rate of 52*4=208 fps for 2K by processing 4 images with 2K resolution simultaneously, as a batch. We could expect an even higher speedup for lossy compression if at single image mode the GPU is not completely occupied with 4K images (see Fig. 1) and the batch size for 2K is more than 4. Simultaneous processing on both CPU and GPU, which is possible at batch mode, could give additional acceleration.
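The estimate above is simple arithmetic and can be written out as follows. This is only the back-of-the-envelope model from the paragraph, under the assumption that encoding time scales with pixel count once the GPU is fully loaded. The function name is illustrative.

```python
def expected_batch_fps(large_image_fps, pixel_ratio):
    """Frame rate for small images if time per pixel matches the large image."""
    return large_image_fps * pixel_ratio


pixels_2k = 1920 * 1080
pixels_4k = 3840 * 2160
ratio = pixels_4k // pixels_2k             # a 4K frame holds four 2K frames

fps_4k_single = 52                         # 4K, single image mode (table above)
estimate_2k_batch = expected_batch_fps(fps_4k_single, ratio)

print(ratio, estimate_2k_batch)
```

The measured batch mode result for 2K with codeblocks 32×32 in the table below (304 fps) is higher than this estimate, which is in line with the additional gains from larger batches and CPU/GPU overlap described above.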

Fastvideo JPEG2000 lossy encoding benchmarks at batch mode

| JPEG2000 encoding parameters | Frames per second | Compression ratio |
|---|---|---|
| Lossy encoding, 2K, cb 16×16 | 255 fps | 12.0 |
| Lossy encoding, 2K, cb 32×32 | 304 fps | 12.0 |
| Lossy encoding, 4K, cb 16×16 | 66 fps | 12.0 |
| Lossy encoding, 4K, cb 32×32 | 83 fps | 12.0 |

Fastvideo JPEG2000 lossless encoding benchmarks at batch mode

| JPEG2000 encoding parameters | Frames per second | Compression ratio |
|---|---|---|
| Lossless encoding, 2K, cb 16×16 | 115 fps | 2.011 |
| Lossless encoding, 2K, cb 32×32 | 119 fps | 2.098 |
| Lossless encoding, 4K, cb 16×16 | 32 fps | 2.638 |
| Lossless encoding, 4K, cb 32×32 | 37 fps | 2.776 |

To the best of our knowledge, the above J2K performance benchmarks for lossy and lossless encoding are the fastest among all existing open source and commercial JPEG2000 encoders on CPU or GPU, both for single image mode and for batch mode. To keep this transparent and simple, we have published all info concerning time measurements, together with sample images, JPEG2000 parameters and hardware specifications, so that everyone can reproduce our results and check the performance of other J2K encoders under the same testing conditions. A Windows demo of our J2K encoder on GPU can be downloaded here.

Please let us know about your performance results for other software JPEG2000 encoders that you may have: Aware, Comprimato, Elecard, ERDAS ECW, FFmpeg, Kakadu, Leadtools, Lizardtech, Lurawave, Mainconcept, Morgan, etc.
