Benchmark comparison for Jetson Nano, TX1, TX2 and AGX Xavier

Author: Fyodor Serzhenko

NVIDIA has released a series of Jetson hardware modules for embedded applications. NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks. Its high-performance, low-power computing for deep learning and computer vision makes it the ideal platform for mobile compute-intensive projects.

We've developed an Image & Video Processing SDK for NVIDIA Jetson hardware, and here we publish performance benchmarks for the available Jetson modules. As the image processing pipeline under test, we use a basic camera application, which is a good representative workload for benchmarking.

Jetson Performance Benchmark Comparison: Nano vs TX1 vs TX2 vs Xavier

 

Hardware features for Jetson Nano, TX1, TX2, AGX Xavier

Here we present a brief comparison of Jetson hardware features to show the progress and variety of NVIDIA's mobile solutions. These modules target different markets and tasks.

Table 1. Hardware comparison for Jetson modules

Feature | Jetson Nano | Jetson TX1 | Jetson TX2/TX2i | Jetson AGX Xavier
CPU (ARM) | 4-core ARM Cortex-A57 @ 1.43 GHz | 4-core ARM Cortex-A57 @ 1.73 GHz | 4-core ARM Cortex-A57 @ 2 GHz + 2-core Denver2 @ 2 GHz | 8-core NVIDIA Carmel ARMv8.2 @ 2.26 GHz
GPU | 128-core Maxwell @ 921 MHz | 256-core Maxwell @ 998 MHz | 256-core Pascal @ 1.3 GHz | 512-core Volta @ 1.37 GHz
Memory | 4 GB LPDDR4, 25.6 GB/s | 4 GB LPDDR4, 25.6 GB/s | 8 GB 128-bit LPDDR4, 58.3 GB/s | 16 GB 256-bit LPDDR4, 137 GB/s
Storage | MicroSD | 16 GB eMMC 5.1 | 32 GB eMMC 5.1 | 32 GB eMMC 5.1
Tensor cores | -- | -- | -- | 64
Video encoding (NVENC) | (1x) 4Kp30, (2x) 1080p60, (4x) 1080p30 | (1x) 4Kp30, (2x) 1080p60, (4x) 1080p30 | (1x) 4Kp60, (3x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (4x) 4Kp60, (8x) 4Kp30, (32x) 1080p30
Video decoding (NVDEC) | (1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (2x) 4Kp60, (4x) 4Kp30, (7x) 1080p60 | (2x) 8Kp30, (6x) 4Kp60, (12x) 4Kp30
USB | (4x) USB 3.0 + Micro-USB 2.0 | (1x) USB 3.0 + (1x) USB 2.0 | (1x) USB 3.0 + (1x) USB 2.0 | (3x) USB 3.1 + (4x) USB 2.0
PCI-Express lanes | 4 lanes PCIe Gen 2 | 5 lanes PCIe Gen 2 | 5 lanes PCIe Gen 2 | 16 lanes PCIe Gen 4
Power | 5W / 10W | 10W | 7.5W / 15W | 10W / 15W / 30W
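The GPU row of Table 1 can be reduced to a rough common figure of merit: theoretical peak FP32 throughput = CUDA cores × clock × 2, assuming each core completes one fused multiply-add (two floating-point operations) per cycle. The sketch below applies that formula to the core counts and clocks listed above. It ignores Tensor cores, FP16 rates and memory bandwidth, so the results are upper bounds, not measured performance.

```python
# Rough peak FP32 throughput from the Table 1 GPU row.
# Assumption: 1 fused multiply-add = 2 FLOPs per CUDA core per clock cycle.
GPUS = {
    "Jetson Nano": (128, 0.921),        # (CUDA cores, GPU clock in GHz)
    "Jetson TX1": (256, 0.998),
    "Jetson TX2/TX2i": (256, 1.3),
    "Jetson AGX Xavier": (512, 1.37),
}


def peak_fp32_gflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: cores * clock (GHz) * 2 FLOPs per FMA."""
    return cores * clock_ghz * 2


if __name__ == "__main__":
    for name, (cores, clock) in GPUS.items():
        print(f"{name:18s} {peak_fp32_gflops(cores, clock):7.1f} GFLOPS")
```

This yields roughly 236, 511, 666 and 1403 GFLOPS for Nano, TX1, TX2/TX2i and AGX Xavier. Simple per-pixel image processing kernels are often limited by memory bandwidth rather than arithmetic, so the Memory row matters as much as these peaks.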

In camera applications, Host-to-Device transfers can usually be hidden by using GPU zero-copy memory or by overlapping data copies with GPU computation. Device-to-Host transfers can be hidden the same way, by overlapping copy and compute.
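The benefit of overlapping can be estimated with a simple pipeline model. Run back to back, each frame costs upload + kernels + download. If the two copies and the kernels work concurrently on different frames (an assumption that depends on the available copy engines and on using separate CUDA streams), the steady-state cost per frame approaches the slowest stage. The sketch plugs in the Jetson Nano numbers from Table 2, using the 9.8 ms pipeline total as the kernel time; the function names are hypothetical.

```python
# Simple model of copy/compute overlap.
# Assumption: H2D copy, kernels and D2H copy run concurrently on separate
# engines, each working on a different frame (e.g. via multiple CUDA streams).


def sequential_ms(h2d: float, kernels: float, d2h: float) -> float:
    """Per-frame time when upload, processing and download run back to back."""
    return h2d + kernels + d2h


def overlapped_ms(h2d: float, kernels: float, d2h: float) -> float:
    """Steady-state per-frame time of a 3-stage pipeline: the slowest stage dominates."""
    return max(h2d, kernels, d2h)


def fps(frame_ms: float) -> float:
    """Frames per second for a given per-frame time in milliseconds."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # Jetson Nano values from Table 2 (ms): Host to Device, pipeline total, Device to Host
    h2d, kernels, d2h = 0.2, 9.8, 0.1
    seq = sequential_ms(h2d, kernels, d2h)
    ovl = overlapped_ms(h2d, kernels, d2h)
    print(f"sequential: {seq:.1f} ms ({fps(seq):.1f} fps)")
    print(f"overlapped: {ovl:.1f} ms ({fps(ovl):.1f} fps)")
```

This gives about 10.1 ms (99 fps) sequentially versus 9.8 ms (102 fps) overlapped. Zero-copy goes further by removing the upload altogether, which is possible on Jetson because the CPU and GPU share the same physical memory.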

Hardware and software for benchmarking

  • CPU/GPU: NVIDIA Jetson Nano, TX1, TX2/TX2i, AGX Xavier
  • OS: L4T (Ubuntu 18.04)
  • CUDA Toolkit: 10.0 for Jetson Nano, TX2/TX2i, AGX Xavier
  • Fastvideo SDK: 0.14.2

NVIDIA Jetson Comparison: Nano vs TX1 vs TX2 vs AGX Xavier

For these NVIDIA Jetson modules, we benchmarked the following standard image processing tasks that are typical for camera applications: white balance, demosaicing (debayer), color correction, resize, gamma, JPEG encoding, etc. This is not the full set of Fastvideo SDK features, just an example to show the performance each Jetson can deliver. You can also choose a particular debayer algorithm and output compression (JPEG or JPEG2000) for your pipeline.
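To make the stage names concrete, here is a minimal pure-Python sketch of three of them on toy data: white balance as per-site gains on a Bayer mosaic, color correction with a 3×4 matrix, and gamma via a lookup table. It assumes an RGGB Bayer layout, treats the fourth matrix column as an additive offset, and uses a plain power-law gamma; the function names are hypothetical. It is an illustration of the operations, not the Fastvideo SDK API or its CUDA kernels, and debayering is omitted.

```python
# Toy per-pixel sketches of three camera-pipeline stages (hypothetical names,
# not the Fastvideo SDK API). Debayering is omitted for brevity.
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def white_balance_rggb(raw: List[List[float]], r_gain: float,
                       g_gain: float, b_gain: float) -> List[List[float]]:
    """Multiply each Bayer site by its channel gain, assuming an RGGB layout."""
    out = []
    for y, row in enumerate(raw):
        new_row = []
        for x, value in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                gain = r_gain  # red site: even row, even column
            elif y % 2 == 1 and x % 2 == 1:
                gain = b_gain  # blue site: odd row, odd column
            else:
                gain = g_gain  # the two green sites
            new_row.append(value * gain)
        out.append(new_row)
    return out


def color_correct(pixel: RGB, m: Sequence[Sequence[float]]) -> RGB:
    """3x4 matrix: 3x3 mix of R, G, B plus an additive offset in column 4 (assumed)."""
    r, g, b = pixel
    return tuple(row[0] * r + row[1] * g + row[2] * b + row[3] for row in m)


def gamma_lut(gamma: float, bits: int = 8) -> List[int]:
    """Lookup table for out = top * (in / top) ** (1 / gamma), with top = 2**bits - 1."""
    top = (1 << bits) - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(top + 1)]


if __name__ == "__main__":
    raw = [[100, 200], [200, 50]]  # a single RGGB quad
    print(white_balance_rggb(raw, 2.0, 1.0, 1.5))  # [[200.0, 200.0], [200.0, 75.0]]
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    print(color_correct((10.0, 20.0, 30.0), identity))  # (10.0, 20.0, 30.0)
    lut = gamma_lut(2.2)
    print(len(lut), lut[0], lut[255])  # 256 0 255
```

A lookup table is a natural choice for gamma on 8-bit or 16-bit data: the power function is evaluated once per possible input value instead of once per pixel.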


Table 2. GPU kernel times for 2K image processing (1920×1080, 8/16 bits per channel, milliseconds)

Algorithm and parameters | Jetson Nano | Jetson TX1 | Jetson TX2/TX2i | Jetson AGX Xavier
Host to Device | 0.2 | 0.2 | 0.2 | 0.05
White Balance | 0.6 | 0.32 | 0.24 | 0.08
HQLI Debayer | 1.8 | 0.62 | 0.47 | 0.36
DFPD Debayer | 4.7 | 2.4 | 2.06 | 0.95
MG Debayer | 12.7 | 7.8 | 5.9 | 2.2
Color Correction with 3×4 matrix | 1.7 | 1.05 | 0.81 | 0.25
Resize from 2K to 960×540 | 10.0 | 5.1 | 4.3 | 1.5
Resize from 2K to 1919×1079 | 19.8 | 9.0 | 8.2 | 2.4
Gamma (1920×1080) | 1.4 | 0.96 | 0.84 | 0.2
JPEG Encoding (1920×1080, 90%, 4:2:0) | 4.3 | 2.3 | 1.7 | 0.62
JPEG Encoding (1920×1080, 90%, 4:4:4) | 6.8 | 3.1 | 2.6 | 0.75
JPEG2000 Encoding (lossy, 32×32, single mode) | 81 | 70 | 63 | 11.1
JPEG2000 Encoding (lossless, 32×32, single mode) | 190 | 180 | 163 | 23.3
Device to Host | 0.1 | 0.1 | 0.1 | 0.02
Total for simple camera pipeline (ms) | 9.8 | 5.6 | 4.4 | 1.5

 

The total processing time is calculated from the values in the gray rows of the table. It shows the maximum performance for a specified set of image processing modules that corresponds to a real-life camera application.
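The arithmetic behind such a total can be reproduced from Table 2. Which rows form the simple pipeline is an assumption here: the sketch takes white balance, HQLI debayer, 3×4 color correction, gamma and JPEG 4:2:0 encoding, sums their times per module, and converts the sums to frames per second and megapixels per second for 1920×1080 frames.

```python
# Sum Table 2 stage times for an ASSUMED "simple camera pipeline" and
# convert the totals to frame rate and pixel throughput.
STAGES_MS = {
    # stage: (Nano, TX1, TX2/TX2i, AGX Xavier), milliseconds from Table 2
    "White Balance": (0.6, 0.32, 0.24, 0.08),
    "HQLI Debayer": (1.8, 0.62, 0.47, 0.36),
    "Color Correction 3x4": (1.7, 1.05, 0.81, 0.25),
    "Gamma": (1.4, 0.96, 0.84, 0.2),
    "JPEG 4:2:0, 90%": (4.3, 2.3, 1.7, 0.62),
}
MODULES = ("Jetson Nano", "Jetson TX1", "Jetson TX2/TX2i", "Jetson AGX Xavier")
PIXELS = 1920 * 1080  # 2K frame size used in Table 2


def pipeline_totals_ms() -> dict:
    """Per-module sum of the selected stage times, rounded to 0.01 ms."""
    return {
        module: round(sum(times[i] for times in STAGES_MS.values()), 2)
        for i, module in enumerate(MODULES)
    }


if __name__ == "__main__":
    for module, total in pipeline_totals_ms().items():
        fps = 1000.0 / total
        mpix_per_s = PIXELS / (total * 1e-3) / 1e6
        print(f"{module:18s} {total:5.2f} ms {fps:6.1f} fps {mpix_per_s:7.1f} MPix/s")
```

With this selection the sums are 9.8, 5.25, 4.06 and 1.51 ms. They match the reported totals for Nano and AGX Xavier, while for TX1 and TX2 the reported 5.6 and 4.4 ms are closer to the sums with the Host-to-Device and Device-to-Host rows added (5.55 and 4.36 ms). At the reported totals, that is about 102 fps on Nano and about 667 fps on AGX Xavier for 2K frames.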

Here we've compared just the basic set of image processing modules from Fastvideo SDK to let Jetson developers evaluate expected performance before building their imaging applications. Image processing from RAW to RGB or from RAW to JPEG is a standard task, and developers can now estimate the performance of their chosen pipeline from the table above. We haven't tested the Jetson H.264 and H.265 encoders and decoders yet; we plan to make that comparison soon. Since H.264 and H.265 encoding runs on the hardware NVENC unit, it can be done in parallel with CUDA code, so we expect to get even better performance.
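A rough way to reason about running the hardware encoder alongside CUDA: if NVENC encodes frame N while the CUDA kernels process frame N+1, the steady-state frame time is set by the slower of the two units rather than their sum. The sketch takes the Jetson Nano RAW-to-RGB kernel time from Table 2 (white balance, HQLI debayer, color correction and gamma). The encoder figure is hypothetical: it assumes a single 1080p stream could use the whole Table 1 budget of 4 × 1080p30 = 120 frames/s, which was not measured here.

```python
# Hypothetical model: CUDA processing and hardware NVENC encoding work on
# consecutive frames concurrently, so the slower unit sets the frame rate.


def serial_fps(cuda_ms: float, encode_ms: float) -> float:
    """Frame rate if each frame is processed and then encoded, one after another."""
    return 1000.0 / (cuda_ms + encode_ms)


def concurrent_fps(cuda_ms: float, encode_ms: float) -> float:
    """Steady-state frame rate when both units are busy with different frames."""
    return 1000.0 / max(cuda_ms, encode_ms)


if __name__ == "__main__":
    cuda_ms = 0.6 + 1.8 + 1.7 + 1.4  # Nano: WB + HQLI + color correction + gamma (Table 2)
    encode_ms = 1000.0 / 120  # ASSUMED encoder budget: 4 x 1080p30 from Table 1
    print(f"serial:     {serial_fps(cuda_ms, encode_ms):.1f} fps")
    print(f"concurrent: {concurrent_fps(cuda_ms, encode_ms):.1f} fps")
```

Under these assumptions the encoder, not CUDA, becomes the bottleneck on Nano: about 120 fps when the two units overlap, versus about 72 fps if they ran back to back.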

We have done the same kernel time measurements for NVIDIA GeForce and Quadro GPUs. Here you can get the document with the benchmarks.

Software for Jetson performance comparison

We've released software for a GPU-based camera application on GitHub: both binaries and source code for the gpu camera sample project are available for download. It runs on Windows 7/10, Linux Ubuntu 18.04 and L4T. Apart from a full GPU image processing pipeline for still images from SSD and for live camera output, it offers options for streaming and for glass-to-glass (G2G) measurements to evaluate the real latency of a camera system on Jetson. The software currently works with machine vision cameras from XIMEA, Basler, JAI and Daheng Imaging.

To check the performance of Fastvideo SDK on a laptop, desktop or server GPU without any programming, you can download the Fast CinemaDNG Processor software with a GUI for Windows. It has a Performance Benchmarks window that shows the timing of each image processing stage. This is a more sophisticated way of performance testing, because the image processing pipeline in that software can be quite complex and you can test any module you need. You can also run tests on images of different resolutions to see how performance depends on image size, content and other parameters.
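Before measuring at another resolution, a first-order estimate scales the 2K times of Table 2 by pixel count. That linearity is an assumption: fixed per-launch overheads dominate on small images, and cache or memory-bandwidth effects can break it on large ones, so measured timings should take precedence. The function name below is hypothetical.

```python
# First-order estimate: kernel time proportional to pixel count (an assumption).
REF_W, REF_H = 1920, 1080  # resolution used for Table 2


def scaled_ms(ref_ms: float, width: int, height: int) -> float:
    """Scale a 2K reference time to width x height by the ratio of pixel counts."""
    return ref_ms * (width * height) / (REF_W * REF_H)


if __name__ == "__main__":
    # Jetson AGX Xavier pipeline total from Table 2 (1.5 ms), scaled to 4K UHD
    t4k = scaled_ms(1.5, 3840, 2160)
    print(f"estimated 4K time: {t4k:.1f} ms, ~{1000.0 / t4k:.0f} fps")
```

Since 3840×2160 has exactly four times the pixels of 1920×1080, the estimate is 6.0 ms (about 167 fps); comparing such estimates with actual measurements shows where the linear model stops holding.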
