CUDA JPEG codec for NVIDIA GPUs

We have created a fast JPEG codec based on NVIDIA CUDA technology. The CUDA JPEG codec developed by Fastvideo combines strict standards compliance with encoding and decoding speeds well above those of existing commercial solutions. It is a full, performance-oriented implementation of Baseline JPEG. Compression and decompression on the GPU are very fast because every stage of the Baseline JPEG algorithm is implemented in parallel. Our JPEG codec is much faster than the best commercial multithreaded JPEG codecs for multicore CPUs.

Fast JPEG image compression features for CUDA JPEG codec

  • Implementation is 100% compliant with JPEG Baseline standard
  • Baseline JPEG compression and decompression for grayscale (8-bit) and color (24-bit) images with arbitrary width and height
  • Extremely fast lossy image encoding and decoding with variable compression ratio
  • Maximum input image size is 12000 x 12000 pixels; larger sizes are available as an option
  • Up to 65535 restart markers (for fast JPEG decoding)
  • Subsampling modes: 4:4:4, 4:2:2, 4:2:0 (4:1:1)
  • JPEG image quality in the range from 1 to 100%
  • Data input: 8/24-bit images from RAM/HDD/RAID/SSD
  • Data output: final compressed/uncompressed image in RAM/HDD/RAID/SSD
  • Continuous data mode (input one image after another)
  • Standard set of computations for parallel implementation of Baseline JPEG compression and decompression
    • JPEG Encoding on GPU: Input data parsing, Color Transform, Level shift, 2D DCT, Quantization, Zig-zag, AC/DC, DPCM, RLE, Huffman, Byte stuffing, JFIF formatting
    • JPEG Decoding on GPU: JFIF parsing, Restart marker search, Inverse Huffman, Inverse RLE, Inverse DPCM, AC/DC, Inverse Zig-zag, Inverse Quantization, IDCT, Inverse Level shift, Inverse Color Transform, Output formatting
  • Optimized for the latest NVIDIA GPUs (Fermi, Kepler and Titan)
  • Compatible with Windows 7/8 (32/64-bit)
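
Several of the encoder stages listed above (level shift, 2D DCT, quantization, zig-zag) can be sketched for a single 8 x 8 grayscale block. This is a minimal serial Python sketch, not Fastvideo's GPU implementation. The function names are hypothetical, the table is the example luminance table from Annex K of the JPEG standard, and the quality scaling follows the libjpeg convention, which is only an assumption about how a 1 to 100 quality value maps to a quantization table.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard.
BASE_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_table(quality):
    """Scale the base table by quality (libjpeg convention, assumed here)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in BASE_LUMA_Q]


def dct_8x8(block):
    """Direct (slow) forward 2D DCT of an 8 x 8 block."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * total
    return out


def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def encode_block(pixels, quality=50):
    """Level shift, DCT, quantize and zig-zag one block of 8-bit samples."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct_8x8(shifted)
    table = scaled_table(quality)
    quantized = [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
                 for u in range(8)]
    return [quantized[r][c] for r, c in zigzag_order()]
```

For a flat block only the first (DC) coefficient is non-zero, which is why the later DPCM, RLE and Huffman stages compress smooth image areas so well. On a GPU each 8 x 8 block can be handled by its own group of threads, since these stages have no dependencies between blocks.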

We have succeeded in parallelizing all stages of the JPEG algorithm, including entropy encoding and decoding. It was widely believed that the Huffman algorithm could only be serial. In our solution Huffman coding is fully parallel and is no longer a bottleneck. We no longer off-load anything from the GPU to the CPU to make the JPEG codec faster: the CUDA JPEG codec is extremely fast and runs entirely on the GPU.
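
The text does not describe how the parallel Huffman stage works internally. One standard mechanism that the feature list does mention is restart markers: in Baseline JPEG the bitstream is byte-aligned and DC prediction is reset at each restart marker (RST0 to RST7, bytes 0xFFD0 to 0xFFD7), so the pieces between markers can be entropy-decoded independently. The sketch below, with a hypothetical function name, only shows the "Restart marker search" step on the CPU in Python; it is an illustration under that assumption, not the codec's actual method.

```python
def split_at_restart_markers(scan: bytes):
    """Split entropy-coded scan data into independently decodable segments.

    Byte stuffing guarantees that a data byte 0xFF is followed by 0x00,
    so the pair 0xFF, 0xD0..0xD7 inside the scan is always a restart marker.
    """
    segments = []
    start = 0
    i = 0
    while i < len(scan) - 1:
        if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
            segments.append(scan[start:i])  # data before the marker
            i += 2                          # skip the two marker bytes
            start = i
        else:
            i += 1
    segments.append(scan[start:])           # data after the last marker
    return segments


# Hypothetical scan data: two restart markers and one stuffed 0xFF byte.
example = bytes([0x12, 0x34, 0xFF, 0xD0, 0x56, 0xFF, 0x00, 0x78, 0xFF, 0xD1, 0x9A])
segments = split_at_restart_markers(example)
```

Each returned segment could then be handed to a separate decoding thread or thread block. More restart markers mean more independent segments, at the cost of a slightly larger file.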

Benchmarks for JPG encoding and decoding on different GPUs

We measured the following performance for Baseline JPEG encoding and decoding of a 24-bit color image with 7216 x 5408 resolution, JPEG quality 75% and 4:2:0 subsampling.

JPEG encoding results:

  • NVIDIA GeForce GTX 580 - 3820 MBytes/s
  • NVIDIA GeForce GTX Titan - 5750 MBytes/s

JPEG decoding throughput for the same image:

  • NVIDIA GeForce GTX 580 - 3350 MBytes/s
  • NVIDIA GeForce GTX Titan - 4980 MBytes/s

The above results include DeviceIO latency (copying image data from Host to GPU memory and back). They do not include HostIO latency (loading the image to Host from HDD/SSD/RAID and saving it back). Fast decompression is possible only for images that were compressed with the same JPEG codec.
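
To relate these throughput figures to per-image time, the following Python sketch converts MBytes/s into milliseconds per frame. It assumes that throughput is measured against the uncompressed image size and that 1 MByte is 10^6 bytes; neither is stated in the text, and the function name is hypothetical.

```python
def frame_time_ms(width, height, bytes_per_pixel, throughput_mb_s):
    """Time to process one frame, in milliseconds.

    Assumes throughput refers to uncompressed data and 1 MByte = 10**6 bytes.
    """
    size_bytes = width * height * bytes_per_pixel
    return size_bytes / (throughput_mb_s * 1e6) * 1e3


# Encoding figures quoted above for the 7216 x 5408, 24-bit test image.
encode_mb_s = {"GeForce GTX 580": 3820, "GeForce GTX Titan": 5750}
encode_ms = {gpu: frame_time_ms(7216, 5408, 3, rate)
             for gpu, rate in encode_mb_s.items()}

for gpu, ms in encode_ms.items():
    print(f"{gpu}: {ms:.1f} ms per frame")
```

Under these assumptions the 39 Mpix test image is encoded in roughly 31 ms on the GTX 580 and roughly 20 ms on the GTX Titan.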

Comparison with the fastest IP Cores for JPEG image compression

The idea of online high-speed JPEG compression is not new. There are many FPGA implementations of JPEG for that task. Here are several of the fastest IP cores for FPGA:

  • Cast Inc. - JPEG-E Baseline JPEG Compression Core with processing rates up to 750 MSamples/s.
  • Alma-Tech (SVE-JPEG-E, SpeedView Enabled JPEG Encoder Megafunction) - IP Core for FPGA Altera/Xilinx with throughput up to 500 MSamples/s.
  • Visengi JPEG Encoder - JPEG / MJPEG Hardware Compressor IP Core with throughput up to 405 MSamples/s on Virtex-5 FPGA.

We got much better results with a GPU, though we understand that a GPU is not a universal solution. We consider the GPU an excellent choice for many high-performance tasks, particularly where there are no strict limits on power consumption and physical dimensions.
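
The IP core figures are given in MSamples/s while the GPU benchmarks are in MBytes/s. A rough comparison is possible if one sample is taken to be one 8-bit color component, so that 1 MSample/s corresponds to 1 MByte/s of uncompressed data. That equivalence is an assumption, not something the vendors' figures confirm, and the helper names below are hypothetical.

```python
def msamples_to_mbytes_per_s(msamples_per_s, bits_per_sample=8):
    """Convert MSamples/s to MBytes/s, assuming fixed-width samples."""
    return msamples_per_s * bits_per_sample / 8


def speedup(gpu_mb_s, core_msamples_s):
    """Ratio of GPU throughput to IP core throughput on the same scale."""
    return gpu_mb_s / msamples_to_mbytes_per_s(core_msamples_s)


# Peak figures quoted in the text above.
ip_cores = {"Cast JPEG-E": 750, "Alma-Tech SVE-JPEG-E": 500, "Visengi": 405}
gtx_580_encode_mb_s = 3820

for name, rate in ip_cores.items():
    print(f"GTX 580 vs {name}: {speedup(gtx_580_encode_mb_s, rate):.1f}x")
```

With this reading, GTX 580 encoding is about five times faster than the fastest listed IP core. The GPU figures also include transfers between Host and GPU memory, so the comparison is indicative only.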

Options for CUDA JPEG image compressor

We can offer custom software design for a combined debayer and JPEG compressor on NVIDIA GPUs, as well as other solutions:

  • Debayer + JPEG compressor (quality=75%, subsampling 4:4:4) for 2 Mpix image takes 2.7 ms (on NVIDIA GeForce GTX 580)
  • Debayer + JPEG compressor (quality=75%, subsampling 4:4:4) for 4 Mpix image takes 4.6 ms (on NVIDIA GeForce GTX 580)
  • Image preprocessing + Debayer + Image filtering + Color management + JPEG compression
  • GPUDirect option for fast data transfer from a high-speed PCIe camera to the GPU
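
For camera applications the debayer + JPEG timings above are easier to judge as a sustained frame rate. The Python sketch below converts milliseconds per frame into frames per second and Mpix/s. It assumes frames are processed back to back with no other overhead, which the text does not state; the function name is hypothetical.

```python
def sustained_rate(mpix_per_frame, ms_per_frame):
    """Return (frames per second, Mpix per second) for back-to-back frames."""
    fps = 1000.0 / ms_per_frame
    return fps, fps * mpix_per_frame


# Timings quoted above for the GeForce GTX 580.
fps_2mpix, mpix_s_2mpix = sustained_rate(2, 2.7)
fps_4mpix, mpix_s_4mpix = sustained_rate(4, 4.6)

print(f"2 Mpix: {fps_2mpix:.0f} fps, {mpix_s_2mpix:.0f} Mpix/s")
print(f"4 Mpix: {fps_4mpix:.0f} fps, {mpix_s_4mpix:.0f} Mpix/s")
```

Under that assumption the quoted timings correspond to roughly 370 fps for 2 Mpix frames and roughly 217 fps for 4 Mpix frames.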

We can also extend our JPEG compression software with a fast image processing pipeline for high-speed, high-resolution cameras: bad pixel removal, dark frame subtraction, shading correction, white balance, demosaicing, image filtering, gamma, LUT, color management, online resize, OpenGL output, etc.

More info about CUDA JPEG image compression

Support

  • Full technical support up to successful client integration
  • SDK, documentation

For any further information about the CUDA JPEG codec or a free trial, please contact us via email.
