CUDA JPEG codec for NVIDIA GPUs
We have created a fast JPEG codec based on NVIDIA CUDA technology. The CUDA JPEG codec developed by Fastvideo combines strict standards compliance with encoding and decoding speeds well beyond those of existing commercial solutions. It is a full, performance-oriented implementation of Baseline JPEG. Its ultra-fast JPEG compression and decompression on the GPU come from a fully parallel implementation of the Baseline JPEG algorithm. Our JPEG codec is much faster than the best commercial multithreaded JPEG codecs for multicore CPUs.
Fast JPEG image compression features for CUDA JPEG codec
We have succeeded in parallelizing all stages of the JPEG algorithm, including entropy encoding and decoding. It was widely believed that the Huffman algorithm could only run serially. In our solution Huffman coding is fully parallel and is no longer a bottleneck. We no longer offload any work from the GPU to the CPU to make the JPEG codec faster: the CUDA JPEG codec is extremely fast and runs entirely on the GPU.
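To illustrate why Huffman output can be written in parallel at all, here is a minimal sketch of one generic technique: compute the bit length of every codeword, run an exclusive prefix sum to get each codeword's starting bit offset, then let every codeword be written independently. This is not a description of Fastvideo's implementation, which the text does not detail. The function names and the `(bits, length)` input format are assumptions made for the example, and JPEG specifics such as 0xFF byte stuffing and restart markers are left out.

```python
from itertools import accumulate

def exclusive_prefix_sum(values):
    """Exclusive scan: out[i] is the sum of values[0..i-1]."""
    return list(accumulate(values, initial=0))[:-1]

def pack_codes(codes):
    """Pack (bits, length) codewords into a byte string, MSB first.

    Hypothetical helper for illustration only. Returns (data, total_bits).
    """
    lengths = [length for _, length in codes]
    offsets = exclusive_prefix_sum(lengths)  # start bit of each codeword
    total_bits = sum(lengths)
    out = bytearray((total_bits + 7) // 8)

    # Once the offsets are known, each codeword touches only its own bit
    # range, so this loop has no dependency between iterations. On a GPU
    # it could map to one thread per codeword, with atomic OR for bytes
    # shared by neighbouring codewords.
    for (bits, length), offset in zip(codes, offsets):
        for k in range(length):
            bit = (bits >> (length - 1 - k)) & 1
            pos = offset + k
            out[pos // 8] |= bit << (7 - pos % 8)
    return bytes(out), total_bits

# Four codewords: 10, 110, 0, 1111
data, nbits = pack_codes([(0b10, 2), (0b110, 3), (0b0, 1), (0b1111, 4)])
print(data.hex(), nbits)
```

The prefix sum is the only step that needs information from other codewords, and scans are themselves a standard parallel primitive, which is what makes the whole packing stage parallelizable in this scheme.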
Benchmarks for JPEG encoding on GeForce GTX 980 (Windows 7, CUDA 6.5, 32-bit)
We now need just 1.13 ms for Baseline JPEG encoding of a 24-bit color image with 3840 x 2160 resolution, JPEG quality 90% and 4:2:0 subsampling (which corresponds to an image compression ratio of about 10:1). If we include DeviceIO latency (copying image data from host to GPU memory and back), the total compression time is 3.5 ms. We chose these JPEG encoding parameters because they correspond to so-called "visually lossless" compression.
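For readers who prefer throughput figures, the timings above can be converted with simple arithmetic. The sketch below uses only the numbers quoted in the text (3840 x 2160, 24-bit, 1.13 ms, 3.5 ms, ratio ~10:1); the helper names are made up for the example, and the frame rates assume frames are processed strictly one after another, which may not match how the codec is used in practice.

```python
# Figures quoted in the text above.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3          # 24-bit color
GPU_TIME_MS = 1.13           # encoding on the GPU only
TOTAL_TIME_MS = 3.5          # including host <-> GPU copies
COMPRESSION_RATIO = 10       # approximate, as stated in the text

def throughput_gb_per_s(num_bytes, time_ms):
    """Uncompressed input bytes processed per second, in GB/s (10^9 bytes)."""
    return num_bytes / (time_ms / 1000.0) / 1e9

def frames_per_second(time_ms):
    """Frame rate if frames are handled back to back with no overlap."""
    return 1000.0 / time_ms

raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
print(f"raw frame size:      {raw_bytes / 1e6:.2f} MB")
print(f"compressed (approx): {raw_bytes / COMPRESSION_RATIO / 1e6:.2f} MB")
print(f"GPU only:   {throughput_gb_per_s(raw_bytes, GPU_TIME_MS):.2f} GB/s, "
      f"{frames_per_second(GPU_TIME_MS):.0f} fps")
print(f"with copies: {throughput_gb_per_s(raw_bytes, TOTAL_TIME_MS):.2f} GB/s, "
      f"{frames_per_second(TOTAL_TIME_MS):.0f} fps")
```

Under these assumptions the encoder consumes roughly 22 GB/s of raw pixel data on the GPU and roughly 7.1 GB/s once transfers are included, so the host-to-device copies account for about two thirds of the 3.5 ms total.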
These are benchmarks for 2K / 4K / 20 Mpix images, 24-bit (computations on GPU, without DeviceIO latency)
Benchmarks for CUDA Discrete Cosine Transform (2D DCT) on GeForce GTX 980
The latest version of the DCT (Discrete Cosine Transform) from the CUDA JPEG codec shows the following timings on an NVIDIA GeForce GTX 980 for a 4K image with 3840 x 2160 resolution (8-bit or 24-bit):
Timings are for the DCT alone, without DeviceIO latency (that is, without copying image data from RAM to GPU memory and back).
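As a reminder of what this stage computes, here is a plain reference sketch of the 8 x 8 forward 2D DCT (orthonormal DCT-II) that Baseline JPEG applies to each block, written as two separable 1D passes. It is a straightforward textbook formulation for clarity only, not the optimized GPU kernel benchmarked above. The function names are illustrative, and the level shift by 128 that JPEG applies before the transform is omitted.

```python
import math

N = 8  # JPEG block size

def dct_1d(v):
    """Orthonormal 1D DCT-II of an 8-element sequence."""
    out = []
    for u in range(N):
        scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        s = sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for x in range(N))
        out.append(scale * s)
    return out

def dct_2d(block):
    """2D DCT of an 8x8 block: 1D DCT on rows, then on columns.

    Returns coeffs[v][u], where v is the vertical and u the horizontal
    frequency index.
    """
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]
    return [[cols[u][v] for u in range(N)] for v in range(N)]

# A flat block has all its energy in the DC coefficient (8 * value).
flat = [[100.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

Every 8 x 8 block is transformed independently of the others, which is why this stage maps naturally onto a GPU: a 3840 x 2160 plane contains 129,600 such blocks that can be processed concurrently.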
Comparison with the fastest IP Cores for JPEG image compression
The idea of online high-speed JPEG compression is not new. There are many JPEG FPGA implementations for that task. Here are several links to the fastest IP cores on FPGA:
We have achieved much better results with the GPU, though we understand that a GPU is not a universal solution. We consider the GPU an excellent choice for many high-performance tasks, particularly where there are no strict limits on power consumption and physical dimensions.
Options for CUDA JPEG image compressor
We can offer a fast SDK for GPU image processing. Here you can see some benchmarks for combined debayering and JPEG encoding on an NVIDIA GPU (timings include DeviceIO latency):
We can also extend our JPEG compression software with a fast image processing pipeline for high-speed, high-resolution cameras: bad pixel removal, dark frame subtraction, shading correction, white balance, demosaicing, color correction, tone mapping, image filtering, denoising, LUT, gamma, color management, histogram, online resize, crop, rotate, sharpening, OpenGL or GLFW output, integration with FFmpeg, Bayer compression, etc.
We license CUDA JPEG and other components of the GPU Image Processing SDK to software developers, camera manufacturers, internet providers, software integrators, etc. Our SDK is used in a wide range of imaging applications. An evaluation version of the SDK, documentation, licensing information and a quotation are available upon request. We also offer custom software design to an agreed specification. If you need a significant speedup for your image processing application, don't hesitate to contact us.
More info about CUDA JPEG image compression
Roadmap 2015 for further improvements of CUDA JPEG Codec