Fastvideo SDK benchmarks on NVIDIA Quadro RTX 6000
Fastvideo SDK for image and video processing on NVIDIA GPUs offers very high performance and high image quality. We have now tested Fastvideo SDK on the NVIDIA® Quadro RTX™ 6000, which is powered by the NVIDIA Turing™ architecture and the NVIDIA RTX™ platform. This new technology brings the most significant advancement in computer graphics in over a decade to professional workflows, and the new hardware is intended to boost image and video processing performance dramatically. To check that, we benchmarked the most frequently used image processing features.
We measured processing times for the most frequently used image processing algorithms, such as demosaicing, resizing, denoising, JPEG encoding and decoding, and JPEG2000 encoding. These are only a small subset of Fastvideo SDK modules, but they give a useful picture of the speedup on the new hardware.
To evaluate more complex image processing pipelines, we suggest downloading and testing Fast CinemaDNG Processor, which is based on Fastvideo SDK. With that software you can build your own pipeline and benchmark it on your own images.
How we do benchmarking
As usual, performance benchmarks only give an idea of processing speed: exact values depend on the OS, hardware, image content, resolution and bit depth, processing parameters, the time-measurement approach, etc. Each particular image processing task may call for its own type of benchmarking.
To get maximum performance from any GPU software, we need to ensure maximum GPU occupancy, which is not easy to accomplish. That's why we evaluate maximum performance in the following ways:
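Since the time-measurement approach affects the numbers, here is a minimal Python sketch of one common approach: best-of-N timing after a few warm-up runs, plus conversion of a per-frame time into megapixels per second. The helper names `best_time` and `throughput_mpix` are hypothetical, and this is an illustration under those assumptions, not the procedure behind the figures in this article. CUDA kernel launches are asynchronous, so a GPU version would have to synchronize (for example via CUDA events) before reading the clock.

```python
import time

def best_time(fn, repeats=10, warmup=2):
    """Return the fastest wall-clock time of fn() in seconds over `repeats` runs."""
    for _ in range(warmup):
        fn()  # warm-up runs (caches, lazy initialization) are not measured
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

def throughput_mpix(width, height, seconds):
    """Megapixels per second for one width x height frame processed in `seconds`."""
    return width * height / seconds / 1e6
```

For example, a 1920 × 1080 frame processed in 1 ms corresponds to about 2073.6 MPix/s. Taking the minimum rather than the mean filters out interference from other processes, which matches the "best time" style of reporting.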
Hardware and software
In the Fastvideo SDK we have three different GPU-based demosaicing algorithms at the moment:
All these algorithms are implemented for 8-bit and 16-bit workflows, and they correctly process pixels near image borders. To demonstrate performance, we assume that the initial and processed data reside in GPU memory, which is the case for complex pipelines in raw image processing applications.
To check the image quality of each demosaicing algorithm on real data, you can download Fast CinemaDNG Processor from www.fastcinemadng.com together with sample DNG image series for evaluation.
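To show what demosaicing and border handling involve, here is a minimal NumPy sketch of plain bilinear demosaicing. It is a simple stand-in, not one of the Fastvideo SDK algorithms, and it assumes an RGGB Bayer pattern; the function names are hypothetical. Each color is interpolated by normalized convolution (a weighted sum of the available samples divided by the sum of their weights), so pixels near image borders average only the neighbors that actually exist.

```python
import numpy as np

def _conv3x3(img, k):
    """3x3 correlation with zero padding (the kernels used below are symmetric)."""
    h, w = img.shape
    p = np.pad(img, 1)  # zero padding: missing neighbors contribute nothing
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def demosaic_bilinear_rggb(raw):
    """Bilinear demosaic of an RGGB Bayer frame (at least 2x2); returns H x W x 3 floats."""
    raw = np.asarray(raw, dtype=np.float64)
    h, w = raw.shape
    rows, cols = np.mgrid[0:h, 0:w]
    r_mask = ((rows % 2 == 0) & (cols % 2 == 0)).astype(np.float64)
    b_mask = ((rows % 2 == 1) & (cols % 2 == 1)).astype(np.float64)
    g_mask = 1.0 - r_mask - b_mask
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], dtype=np.float64)
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float64)
    out = np.empty((h, w, 3))
    for c, (mask, k) in enumerate([(r_mask, k_rb), (g_mask, k_g), (b_mask, k_rb)]):
        # Weighted average of the available samples of this color; near borders
        # fewer neighbors exist and the division renormalizes the weights.
        out[..., c] = _conv3x3(raw * mask, k) / _conv3x3(mask, k)
    return out
```

At a site that already holds a given color, only the center weight is non-zero, so the measured value is kept unchanged; elsewhere the missing color is the average of its nearest samples.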
JPEG encoding and decoding benchmarks
The JPEG codec from Fastvideo SDK offers very high performance for both encoding and decoding. To get the best results, we need enough data to achieve maximum GPU occupancy, which is a very important issue. Here we present the best total kernel time for JPEG encoding and decoding, with JPEG compression quality q=90%, 4:2:0 subsampling (visually lossless compression) and the optimum number of restart markers.
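Restart markers split a baseline JPEG scan into segments that can be entropy-decoded independently (each marker resets the DC predictors and byte-aligns the stream), which is what lets a GPU decoder work on many segments in parallel. The Python sketch below counts those segments for a given restart interval, measured in MCUs; the function name is hypothetical, and the SDK's choice of the "optimum" interval is not described here.

```python
import math

def restart_segments(width, height, restart_interval, subsampling="4:2:0"):
    """Number of independently decodable segments in an interleaved baseline JPEG scan."""
    # MCU size in pixels depends on the luma sampling factors.
    mcu_w, mcu_h = {"4:4:4": (8, 8), "4:2:2": (16, 8), "4:2:0": (16, 16)}[subsampling]
    mcus = math.ceil(width / mcu_w) * math.ceil(height / mcu_h)
    return math.ceil(mcus / restart_interval)
```

For a 1920 × 1080 frame at 4:2:0 there are 120 × 68 = 8160 MCUs, so a restart interval of 16 MCUs gives 510 segments. A shorter interval exposes more parallelism but adds a 2-byte marker (plus padding) per segment, so the best value is a trade-off between GPU occupancy and file size.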
JPEG2000 encoding benchmarks
We have a high-performance JPEG2000 codec on GPU in Fastvideo SDK. This algorithm partially utilizes the CPU, so total performance is also CPU-dependent, but it is still much faster than any CPU-based J2K codec, such as OpenJPEG. In the tests we used the optimal number of threads, and the compression ratio corresponded to visually lossless compression.
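To relate a compression ratio to output size and stream bitrate, here is a small Python sketch of the arithmetic. The frame format, ratio and frame rate in the example are assumptions chosen for illustration, not the settings used in these tests, and the function name is hypothetical.

```python
def j2k_output_stats(width, height, channels, bits_per_sample, ratio, fps):
    """Estimate compressed frame size (bytes) and stream bitrate (Mbit/s)."""
    raw_bytes = width * height * channels * bits_per_sample / 8
    frame_bytes = raw_bytes / ratio   # compression ratio = raw size / compressed size
    bitrate_mbit = frame_bytes * 8 * fps / 1e6
    return frame_bytes, bitrate_mbit
```

For instance, an 8-bit RGB 3840 × 2160 frame (24,883,200 bytes raw) at an assumed ratio of 10:1 compresses to 2,488,320 bytes, or about 477.8 Mbit/s at 24 fps.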
Resize is a frequently used feature, and here we present our results for GPU-based resizing with the Lanczos algorithm.
"1/2 resolution" means 960 × 540 for 2K and 1920 × 1080 for 4K.
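For reference, here is a minimal NumPy sketch of separable Lanczos resampling. The kernel parameter a = 3, the pixel-center alignment and the edge-replication border rule are assumptions; the Fastvideo SDK implementation may differ in any of them, and the function names are hypothetical. When downscaling, the kernel is stretched by the scale factor so that it also acts as an anti-aliasing filter.

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0 (np.sinc is normalized)."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def resize_axis(img, out_len, axis, a=3):
    """Resample `img` along one axis to `out_len` samples."""
    img = np.moveaxis(np.asarray(img, dtype=np.float64), axis, 0)
    in_len = img.shape[0]
    scale = in_len / out_len
    stretch = max(scale, 1.0)           # widen the kernel when downscaling
    support = a * stretch
    out = np.empty((out_len,) + img.shape[1:])
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5  # align pixel centers of input and output
        idx = np.arange(int(np.floor(center - support)) + 1, int(np.floor(center + support)) + 1)
        w = lanczos_kernel((idx - center) / stretch, a)
        w /= w.sum()                      # normalize so flat areas keep their value
        idx = np.clip(idx, 0, in_len - 1)  # replicate edge pixels at borders
        out[i] = np.tensordot(w, img[idx], axes=1)
    return np.moveaxis(out, 0, axis)

def resize_lanczos(img, out_w, out_h, a=3):
    """Separable 2D Lanczos resize of a single-channel H x W image."""
    return resize_axis(resize_axis(img, out_h, 0, a), out_w, 1, a)
```

Halving a 1920 × 1080 frame with `resize_lanczos(frame, 960, 540)` gives the "1/2 resolution" case for 2K; a GPU version would typically compute each output pixel in its own thread, but the weights are the same.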
Apart from that, we benchmarked the following pipeline, which is common in web applications: JPEG decoding, resize, JPEG encoding.
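For a CPU-side comparison, the same decode, resize and encode pipeline can be sketched with Pillow. This is an illustrative CPU reference, not part of Fastvideo SDK; the function name is hypothetical, and although the settings mirror the parameters above (Lanczos, quality 90, 4:2:0), the output will not be bit-identical to the GPU codec.

```python
import io
from PIL import Image

def jpeg_resize_jpeg(jpeg_bytes, out_size, quality=90):
    """Decode a JPEG, resize it with Lanczos filtering, and re-encode it as JPEG."""
    img = Image.open(io.BytesIO(jpeg_bytes)).convert("RGB")  # decode to 3-channel RGB
    resized = img.resize(out_size, Image.LANCZOS)            # out_size is (width, height)
    buf = io.BytesIO()
    # subsampling=2 selects 4:2:0 chroma subsampling in Pillow's JPEG encoder
    resized.save(buf, format="JPEG", quality=quality, subsampling=2)
    return buf.getvalue()
```

Timing this function over a batch of images gives a CPU baseline for the same pipeline.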
To summarize, Fastvideo SDK shows high performance on the Quadro RTX 6000, though we see room for improvement through further optimization of our CUDA kernels for the Turing architecture.