GPU JPEG Compression

We have created a fast JPEG codec for NVIDIA GPUs. A parallel implementation of the Baseline JPEG algorithm gives us very fast JPEG compression and decompression on the GPU. Our GPU JPEG codec is much faster than the best commercial multithreaded JPEG codecs for multicore CPUs.

Fast JPEG codec features

  • Baseline JPEG for grayscale (8-bit) and color (24-bit) images
  • 12-bit JPEG compression on GPU
  • Very fast lossy image encoding and decoding with variable compression ratio
  • Data input: 8/24-bit images in CPU/GPU memory or on HDD/SSD
  • Data output: compressed JPEG image (encoding) or uncompressed image (decoding) in CPU/GPU memory or on HDD/SSD
  • Continuous data mode (input one image after another)
  • Standard set of computations for parallel implementation of Baseline JPEG: Color Transform, 2D DCT, Quantization, Zig-zag, AC/DC, DPCM, RLE, Huffman, Byte stuffing, JFIF formatting
  • Maximum input image size is 16,000×16,000; larger sizes are available as an option
  • Compatibility with FFmpeg to read/write MJPEG streams
  • Optimized for the latest NVIDIA GPUs (Kepler, Maxwell, Pascal, Volta)
  • Compatible with Windows 7/8/10 and Linux (Ubuntu/CentOS)
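
To make the middle stages of the list above concrete (2D DCT, Quantization, Zig-zag), here is a minimal CPU sketch in Python for a single 8×8 grayscale block. It is an illustration only and not the codec's GPU implementation: the function names `dct_8x8` and `encode_block` are hypothetical, the DCT is written in its direct, slow form for readability, and the quantization table is the unscaled luminance table from Annex K of the JPEG standard (no quality scaling is applied).

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard.
# Assumption: used unscaled here; a real encoder scales it by the quality setting.
LUMA_QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Zig-zag scan order: walk the anti-diagonals, alternating direction.
# Odd diagonals run top-to-bottom, even diagonals bottom-to-top.
ZIGZAG = sorted(
    ((r, c) for r in range(8) for c in range(8)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
)


def dct_8x8(block):
    """Forward 2D DCT-II of an 8x8 block (direct form, written for clarity)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def encode_block(pixels, qtable=LUMA_QTABLE):
    """Level shift, DCT, quantize, and zig-zag reorder one 8x8 block."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct_8x8(shifted)
    quant = [[round(coeffs[u][v] / qtable[u][v]) for v in range(8)]
             for u in range(8)]
    return [quant[r][c] for r, c in ZIGZAG]
```

For a flat block, only the first (DC) value of the zig-zag sequence is non-zero, and the long run of zeros that follows is what the RLE and Huffman stages compress. Every block can be processed independently up to this point, which is why these stages suit a parallel implementation.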

Benchmarks for JPEG encoding on GeForce GTX 1080

Baseline JPEG encoding of a 24-bit color image at 3840×2160 resolution, with JPEG quality 90% and 4:2:0 subsampling (which corresponds to a compression ratio of about 10:1), now takes just 0.78 ms. If we include DeviceIO latency (copying the image data from host to GPU memory and the result back), the total is 2.95 ms. These results are for single-image mode (no batching, no streaming):

  • Performance for computations only = 30.4 GByte/s
  • Performance with DeviceIO latency = 8.0 GByte/s
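
The throughput figures follow from the uncompressed image size and the measured times. The short Python sketch below reproduces the arithmetic. The unit convention is an assumption on our side: the published numbers match 1 GByte = 1000 × 1024² bytes, while a purely decimal gigabyte (10⁹ bytes) gives slightly higher values. The names `throughput`, `GB_MIXED` and `GB_DECIMAL` are illustrative.

```python
# Uncompressed input: 3840 x 2160 pixels, 3 bytes per pixel (24-bit color).
WIDTH, HEIGHT, BYTES_PER_PIXEL = 3840, 2160, 3
IMAGE_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 24,883,200 bytes

GB_DECIMAL = 1000 ** 3        # 10^9 bytes
GB_MIXED = 1000 * 1024 ** 2   # assumption: the unit that matches the figures above


def throughput(seconds, unit_bytes):
    """Input data rate in the given unit per second."""
    return IMAGE_BYTES / seconds / unit_bytes


for label, seconds in (("computations only", 0.78e-3),
                       ("with DeviceIO latency", 2.95e-3)):
    print(f"{label}: "
          f"{throughput(seconds, GB_MIXED):.1f} GByte/s (mixed unit), "
          f"{throughput(seconds, GB_DECIMAL):.1f} GB/s (decimal)")
```

Both rates refer to the uncompressed input data, not to the size of the JPEG output.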

Comparison with the fastest IP Cores for JPEG compression

The idea of online high-speed JPEG compression is not new, and there are many FPGA implementations of JPEG for that task. Here are several of the fastest IP cores for FPGA:

  • Cast Inc. – JPEG-E Baseline JPEG Compression Core with processing rates up to 750 MSamples/s.
  • Alma-Tech (SVE-JPEG-E, SpeedView Enabled JPEG Encoder Megafunction) – IP Core for FPGA Altera/Xilinx with throughput up to 500 MSamples/s.
  • Visengi JPEG Encoder – JPEG / MJPEG Hardware Compressor IP Core with throughput up to 405 MSamples/s on Virtex-5 FPGA.

We have got much better results on the GPU, though we understand that a GPU is not a universal solution. We consider a GPU an excellent choice for many tasks, particularly for testing and prototyping. It can also be attractive when there are no strict limits on power consumption or physical dimensions.
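
For a rough comparison in the same units as the IP cores, the GPU timings can be converted to MSamples/s. The Python sketch below assumes that one sample is one 8-bit color component of the input image (three samples per pixel). The IP core vendors may count samples differently, so the ratios are only indicative. The names `gpu_msamples` and `IP_CORES` are illustrative, and the timings are the GeForce GTX 1080 figures for a 3840×2160 24-bit image.

```python
# Assumption: 1 sample = one 8-bit color component of the input image.
SAMPLES_PER_IMAGE = 3840 * 2160 * 3

# Peak throughput of the IP cores listed above, in MSamples/s.
IP_CORES = {"Cast JPEG-E": 750, "Alma-Tech SVE-JPEG-E": 500, "Visengi": 405}


def gpu_msamples(seconds_per_image):
    """GPU encoding rate in MSamples/s for one 3840x2160 24-bit image."""
    return SAMPLES_PER_IMAGE / seconds_per_image / 1e6


compute_only = gpu_msamples(0.78e-3)   # computations only
with_io = gpu_msamples(2.95e-3)        # including DeviceIO latency

for name, rate in IP_CORES.items():
    print(f"{name}: GPU is about {compute_only / rate:.0f}x faster "
          f"(computations only), {with_io / rate:.0f}x with DeviceIO latency")
```

Under this assumption the GPU remains roughly an order of magnitude ahead of the fastest listed core even when the host-to-device transfers are included.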

More info about GPU JPEG Codec


  • Full technical support until the client's integration is successfully completed
  • SDK, documentation

