Benchmark comparison for Jetson Nano, TX1, TX2 and AGX Xavier
NVIDIA has released a series of Jetson hardware modules for embedded applications. NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks. Its high-performance, low-power computing for deep learning and computer vision makes it the ideal platform for mobile compute-intensive projects.
We've developed an Image & Video Processing SDK for NVIDIA Jetson hardware. Here we publish performance benchmarks for the available Jetson modules. As the image processing pipeline for testing, we use a basic camera application, which is a good representative example for benchmarking.
Hardware features for Jetson Nano, TX1, TX2, AGX Xavier
Here we present a brief comparison of Jetson hardware features to show the progress and variety of mobile solutions from NVIDIA. These modules are aimed at different markets and tasks.
Table 1. Hardware comparison for Jetson modules
In camera applications we can usually hide Host-to-Device transfers by using GPU Zero Copy or by overlapping GPU copy and compute. Device-to-Host transfers can be hidden in the same way, by overlapping copy and compute.
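To illustrate why hiding transfers matters, here is a minimal sketch of an idealized timing model, not a measurement and not part of Fastvideo SDK. It assumes that with perfect overlap the steady-state time per frame is set by the slowest stage, while without overlap the stages simply add up. The function name `frame_time_ms` and all numbers below are hypothetical.

```python
def frame_time_ms(h2d_ms, kernel_ms, d2h_ms, overlap=False):
    """Idealized steady-state time per frame, in milliseconds.

    Assumption: with overlap, copies and kernels of neighboring frames
    run fully concurrently, so the slowest stage sets the pace.
    Without overlap, the three stages run one after another.
    """
    if overlap:
        return max(h2d_ms, kernel_ms, d2h_ms)
    return h2d_ms + kernel_ms + d2h_ms


# Hypothetical timings, chosen only to illustrate the model.
serial = frame_time_ms(1.0, 4.0, 0.5)
hidden = frame_time_ms(1.0, 4.0, 0.5, overlap=True)
print(f"serial: {serial} ms, overlapped: {hidden} ms")
```

In this model Zero Copy corresponds to setting `h2d_ms` to zero. Real pipelines rarely reach perfect overlap, so the overlapped figure is better read as a lower bound.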
Hardware and software for benchmarking
NVIDIA Jetson Comparison: Nano vs TX1 vs TX2 vs AGX Xavier
For these NVIDIA Jetson modules we've benchmarked the following basic image processing tasks, which are typical of camera applications: white balance, demosaic (debayer), color correction, optional resize, JPEG encoding, etc. This is not the full set of Fastvideo SDK features; it is just an example that shows what performance we can get from each Jetson. You can also choose a particular debayer algorithm and output compression (JPEG or JPEG2000) for your pipeline.
Table 2. GPU kernel times for 2K image processing (1920×1080, 8/16 bits per channel, milliseconds)
Total time is the sum of the values in the gray rows of the table. We do this to show the maximum performance for a set of image processing modules that corresponds to a real-life camera application.
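The total-time calculation can be sketched as follows. This is only an illustration: the stage names follow the pipeline described above, but the millisecond values are hypothetical placeholders, not figures from Table 2, and the choice of which stages count toward the total is an assumption.

```python
# Hypothetical per-stage GPU kernel times in milliseconds.
# These are placeholders, NOT measurements from Table 2.
stage_ms = {
    "white_balance": 0.5,
    "demosaic": 2.5,
    "color_correction": 1.0,
    "resize": 1.5,        # optional stage, left out of the total below
    "jpeg_encode": 4.0,
}

# Assumed selection of stages that contribute to the total.
selected = ["white_balance", "demosaic", "color_correction", "jpeg_encode"]


def total_ms(times, names):
    """Sum the kernel times of the selected stages."""
    return sum(times[name] for name in names)


def max_fps(total_time_ms):
    """Upper bound on frames per second if only kernel time is counted."""
    return 1000.0 / total_time_ms


total = total_ms(stage_ms, selected)
print(f"total: {total} ms, up to {max_fps(total):.1f} fps")
```

Substituting the measured values for a given Jetson module gives the frame rate that the chosen pipeline could reach on that module when transfers are fully hidden.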
Here we've compared just the basic set of image processing modules from Fastvideo SDK, so that Jetson developers can evaluate the expected performance before building their imaging applications. Image processing from RAW to RGB or to JPEG is a standard task, and developers can now get detailed information about the expected performance of their chosen pipeline from the table above. We haven't tested the Jetson H.264 and H.265 encoders and decoders yet; we are going to make such a comparison soon. Since H.264 and H.265 encoding runs on the hardware-based NVENC, it could be done in parallel with CUDA code, so we should be able to get even better performance.
We have made the same kernel time measurements for NVIDIA GeForce and Quadro GPUs; the results are available in a separate document.
We've released the software for a GPU-based camera application on GitHub, and you can download the source code of the gpu camera sample project. Currently it's implemented for Windows, but we will soon release Linux Ubuntu and L4T versions as well.
One more way to check the performance of Fastvideo SDK on a laptop, desktop or server GPU without any programming is to download the Fast CinemaDNG Processor software with GUI for Windows. That software has a Performance Benchmarks window where you can see the timing for each stage of image processing. This is a more sophisticated way of performance testing, because the image processing pipeline in that software can be quite complicated and you can test any module you need. You can also run tests on images with different resolutions to see how the performance depends on image size, content and other parameters.