Benchmark comparison for Jetson Nano, TX1, TX2 and AGX Xavier

NVIDIA has released a series of Jetson hardware modules for embedded applications. NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks. Its high-performance, low-power computing for deep learning and computer vision makes it the ideal platform for mobile compute-intensive projects.

We have developed the Fastvideo Image & Video Processing SDK for Jetson hardware. Here we publish its performance benchmarks for the available Jetson modules. As the image processing pipeline for testing, we use a basic camera application, which is a good example for benchmarking.

Hardware features for Jetson Nano, TX1, TX2, AGX Xavier

Here we present a brief comparison of Jetson hardware features to show the progress and variety of mobile solutions from NVIDIA. These modules are aimed at different markets and tasks.

Table 1. Hardware comparison for Jetson modules

| Hardware feature | Jetson Nano | Jetson TX1 | Jetson TX2/TX2i | Jetson AGX Xavier |
|---|---|---|---|---|
| CPU (ARM) | 4-core ARM Cortex-A57 @ 1.43 GHz | 4-core ARM Cortex-A57 @ 1.73 GHz | 4-core ARM Cortex-A57 @ 2 GHz, 2-core Denver2 @ 2 GHz | 8-core Carmel ARMv8.2 @ 2.26 GHz |
| GPU | 128-core Maxwell @ 921 MHz | 256-core Maxwell @ 998 MHz | 256-core Pascal @ 1.3 GHz | 512-core Volta @ 1.37 GHz |
| Memory | 4 GB LPDDR4, 25.6 GB/s | 4 GB LPDDR4, 25.6 GB/s | 8 GB 128-bit LPDDR4, 58.3 GB/s | 16 GB 256-bit LPDDR4, 137 GB/s |
| Storage | MicroSD | 16 GB eMMC 5.1 | 32 GB eMMC 5.1 | 32 GB eMMC 5.1 |
| Tensor cores | -- | -- | -- | 64 |
| Video encoding (NVENC) | (1x) 4Kp30, (2x) 1080p60, (4x) 1080p30 | (1x) 4Kp30, (2x) 1080p60, (4x) 1080p30 | (1x) 4Kp60, (3x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (4x) 4Kp60, (8x) 4Kp30, (32x) 1080p30 |
| Video decoding (NVDEC) | (1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (2x) 4Kp60, (4x) 4Kp30, (7x) 1080p60 | (2x) 8Kp30, (6x) 4Kp60, (12x) 4Kp30 |
| USB | (4x) USB 3.0 + Micro-USB 2.0 | (1x) USB 3.0 + (1x) USB 2.0 | (1x) USB 3.0 + (1x) USB 2.0 | (3x) USB 3.1 + (4x) USB 2.0 |
| PCI-Express lanes | 4 lanes PCIe Gen 2 | 5 lanes PCIe Gen 2 | 5 lanes PCIe Gen 2 | 16 lanes PCIe Gen 4 |
| Power | 5 W / 10 W | 10 W | 7.5 W / 15 W | 10 W / 15 W / 30 W |

In camera applications we can usually hide Host-to-Device transfers by using GPU Zero Copy or by overlapping GPU copy and compute. Device-to-Host transfers can be hidden by overlapping copy and compute as well.
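Why hiding the transfers matters can be seen from a simplified timing model. The sketch below is not part of Fastvideo SDK: the function name and the timing values are hypothetical, and it assumes that copies and kernels overlap completely, which real hardware may achieve only partially.

```python
# Simplified steady-state model of a per-frame pipeline:
# upload (Host to Device), GPU kernels, download (Device to Host).
# Assumption: with overlap enabled, the three stages work on different
# frames at the same time (for example via separate CUDA streams or
# zero-copy memory) and overlap completely.

def frame_period_ms(h2d_ms, kernels_ms, d2h_ms, overlap):
    """Time between two consecutive finished frames, in milliseconds."""
    if overlap:
        # The slowest stage sets the pace of the whole pipeline.
        return max(h2d_ms, kernels_ms, d2h_ms)
    # Without overlap the stages run back to back for every frame.
    return h2d_ms + kernels_ms + d2h_ms

# Illustrative values only (milliseconds), not measurements.
h2d, kernels, d2h = 0.2, 4.0, 0.1

serial = frame_period_ms(h2d, kernels, d2h, overlap=False)
overlapped = frame_period_ms(h2d, kernels, d2h, overlap=True)

print(f"serial:     {serial:.1f} ms per frame")
print(f"overlapped: {overlapped:.1f} ms per frame")
```

In this model the transfers cost nothing once they are hidden behind the kernels, so the frame rate is limited by the GPU processing time alone.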

Hardware and software for benchmarking

  • CPU/GPU NVIDIA Jetson Nano, TX1, TX2/TX2i, AGX Xavier
  • OS L4T (Ubuntu 18.04)
  • CUDA Toolkit 10.0 for Jetson Nano, TX2/TX2i, AGX Xavier
  • Fastvideo SDK 0.14.2

Performance Comparison: Jetson Nano vs TX1 vs TX2 vs AGX Xavier

For these NVIDIA Jetson modules we have benchmarked the following basic image processing tasks, which are specific to camera applications: white balance, demosaic (debayer), color correction, optional resize, JPEG encoding, etc. This is not the full set of Fastvideo SDK features, just an example that shows what performance we can get from each Jetson. You can also choose the debayer algorithm and the type of output compression for your pipeline.
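To make the per-pixel stages concrete, here is a minimal sketch of white balance, color correction with a 3×4 matrix, and gamma applied to a single RGB pixel. It is a generic illustration, not the Fastvideo SDK implementation: all function names, gains and matrix coefficients are hypothetical, pixel values are assumed to be floats in the range 0..1, gamma is a plain power law, and debayer, resize and JPEG encoding are left out because they need more than one pixel.

```python
# Generic per-pixel sketch of three camera pipeline stages.
# All names and coefficients are hypothetical; this is not Fastvideo SDK code.

def white_balance(rgb, gains):
    """Scale each channel by its own gain."""
    return tuple(c * g for c, g in zip(rgb, gains))

def color_correct(rgb, matrix):
    """Apply a 3x4 matrix: a 3x3 linear part plus an offset column."""
    r, g, b = rgb
    return tuple(row[0] * r + row[1] * g + row[2] * b + row[3] for row in matrix)

def gamma_encode(rgb, gamma=2.2):
    """Clamp to 0..1 and apply a simple power-law gamma curve."""
    return tuple(min(max(c, 0.0), 1.0) ** (1.0 / gamma) for c in rgb)

# Hypothetical white balance gains for R, G, B.
GAINS = (1.8, 1.0, 1.5)

# Hypothetical color correction matrix. The first three values of each
# row sum to 1 and the offsets are 0, so neutral gray stays gray.
CCM = (
    (1.5, -0.3, -0.2, 0.0),
    (-0.2, 1.4, -0.2, 0.0),
    (-0.1, -0.4, 1.5, 0.0),
)

def process_pixel(rgb):
    """White balance, then color correction, then gamma."""
    return gamma_encode(color_correct(white_balance(rgb, GAINS), CCM))

if __name__ == "__main__":
    print(process_pixel((0.20, 0.35, 0.25)))
```

On the GPU each of these stages runs as a kernel over the whole frame, which is what the kernel times in the benchmark measure.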


Table 2. GPU kernel times for 2K image processing (1920×1080, 8/16 bits per channel, milliseconds)

| Algorithm and parameters | Jetson Nano | Jetson TX1 | Jetson TX2/TX2i | Jetson AGX Xavier |
|---|---|---|---|---|
| Host to Device | 0.2 | 0.2 | 0.2 | 0.05 |
| White Balance | 0.6 | 0.32 | 0.24 | 0.08 |
| HQLI Debayer | 1.8 | 0.62 | 0.47 | 0.36 |
| DFPD Debayer | 4.7 | 2.4 | 2.06 | 0.95 |
| MG Debayer | 12.7 | 7.8 | 5.9 | 2.2 |
| Color Correction with 3×4 matrix | 1.7 | 1.05 | 0.81 | 0.25 |
| Resize from 2K to 960×540 | 10.0 | 5.1 | 4.3 | 1.5 |
| Resize from 2K to 1919×1079 | 19.8 | 9.0 | 8.2 | 2.4 |
| Gamma (1920×1080) | 1.4 | 0.96 | 0.84 | 0.2 |
| JPEG Encoding (1920×1080, 90%, 4:2:0) | 4.3 | 2.3 | 1.7 | 0.62 |
| JPEG Encoding (1920×1080, 90%, 4:4:4) | 6.8 | 3.1 | 2.6 | 0.75 |
| JPEG2000 Encoding (lossy, 32×32, single mode) | 81 | 70 | 63 | 11.1 |
| JPEG2000 Encoding (lossless, 32×32, single mode) | 190 | 180 | 163 | 23.3 |
| Device to Host | 0.1 | 0.1 | 0.1 | 0.02 |
| Total for simple camera pipeline (ms) | 9.8 | 5.6 | 4.4 | 1.5 |

The total time is the sum of the values in the rows highlighted in gray in the table. It shows the maximum performance for a set of image processing modules that corresponds to a real-life camera application.
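The arithmetic behind the totals can be sketched as follows. Which rows were highlighted is an assumption here: the sketch takes white balance, HQLI debayer, color correction, gamma and JPEG 4:2:0 encoding as the simple camera pipeline, and reports the sum both with and without the transfer rows. Under this assumption the kernel sum matches the published total for Jetson Nano and lands within a few tenths of a millisecond of it for the other modules. The frame rate is simply 1000 divided by the published total in milliseconds.

```python
# Times in milliseconds copied from Table 2.
# Column order: Nano, TX1, TX2/TX2i, AGX Xavier.
MODULES = ["Jetson Nano", "Jetson TX1", "Jetson TX2/TX2i", "Jetson AGX Xavier"]

# Assumed set of pipeline stages (the choice of rows is an assumption).
STAGES = {
    "White Balance": [0.6, 0.32, 0.24, 0.08],
    "HQLI Debayer": [1.8, 0.62, 0.47, 0.36],
    "Color Correction": [1.7, 1.05, 0.81, 0.25],
    "Gamma": [1.4, 0.96, 0.84, 0.2],
    "JPEG Encoding 4:2:0": [4.3, 2.3, 1.7, 0.62],
}
TRANSFERS = {
    "Host to Device": [0.2, 0.2, 0.2, 0.05],
    "Device to Host": [0.1, 0.1, 0.1, 0.02],
}
PUBLISHED_TOTAL = [9.8, 5.6, 4.4, 1.5]

def column_sum(rows, index):
    """Sum one module's column over all rows of the given dict."""
    return sum(values[index] for values in rows.values())

for i, name in enumerate(MODULES):
    kernels = column_sum(STAGES, i)
    with_copies = kernels + column_sum(TRANSFERS, i)
    fps = 1000.0 / PUBLISHED_TOTAL[i]
    print(
        f"{name}: kernels {kernels:.2f} ms, with transfers {with_copies:.2f} ms, "
        f"published {PUBLISHED_TOTAL[i]} ms, about {fps:.0f} frames per second"
    )
```

The same two dictionaries can be edited to estimate another pipeline, for example by swapping HQLI for DFPD or MG debayer.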

Here we have compared a basic set of image processing modules from Fastvideo SDK so that Jetson developers can evaluate the expected performance before building their imaging applications. Image processing from RAW to RGB or to JPEG is a standard task, and developers can now get detailed performance figures for the chosen pipeline from the table above. We have not yet tested H.264 and H.265 encoders and decoders; we are going to make such a comparison soon.

We have made the same time measurements for NVIDIA GeForce and Quadro GPUs; the results are available as a separate document.

One more way to check the performance of Fastvideo SDK on a laptop, desktop or server GPU is to download the Fast CinemaDNG Processor software with GUI for Windows. That software has a Benchmarks window where you can see the timing for each stage of image processing. This is a more sophisticated way of performance testing, because the image processing pipeline in that software can be quite complicated and you can test any module that you need. You can also run tests on images with different resolutions to see how the performance depends on image size, content and other parameters.
