Jetson AGX Xavier benchmarks for image and video processing
Jetson AGX Xavier is the latest mobile system on a chip from NVIDIA. Its 8-core 64-bit ARM CPU and Volta GPU with Tensor Cores bring high CPU and GPU performance to mobile computing. PC-class imaging applications that require low latency, high performance, low energy consumption, and large amounts of memory can now be developed for mobile devices with Fastvideo SDK for Jetson Xavier, which runs real-time image and video processing on the AGX Xavier GPU.
Jetson AGX Xavier Tech Specs
- GPU 512-core Volta GPU with Tensor Cores
- CPU 8-core ARM v8.2 64-bit CPU, 8MB L2 + 4MB L3
- Memory 16 GB 256-Bit LPDDR4x | 137 GB/s
- Storage 32 GB eMMC 5.1
- DL Accelerator (2x) NVDLA Engines
- Vision Accelerator 7-way VLIW Vision Processor
- Encoder/Decoder (2x) 4Kp60 | HEVC/(2x) 4Kp60
- Size 105 mm x 105 mm
- Deployment Module (Jetson AGX Xavier)
We benchmarked the key components of Fastvideo SDK on Jetson AGX Xavier, using images at 2K and 4K resolutions. For each algorithm we measured the average kernel time needed to process an image that has already been uploaded to GPU memory. These figures are average GPU kernel times, not latency: they do not include the time to upload the image to the GPU.
Jetson Xavier performance benchmarks for 2K images (1920×1080)
- HQLI Demosaic (8-bit, RGGB) – 0.30 ms
- HQLI Demosaic (16-bit, RGGB) – 0.36 ms
- DFPD Demosaic (8-bit, RGGB) – 0.45 ms
- DFPD Demosaic (16-bit, RGGB) – 0.96 ms
- MG Demosaic (16-bit, RGGB) – 2.23 ms
- JPEG encoder (8-bit, quality 90%) – 0.42 ms
- JPEG encoder (24-bit, quality 90%, 4:2:0) – 0.62 ms
- JPEG encoder (24-bit, quality 90%, 4:4:4) – 0.75 ms
- JPEG decoder (8-bit, quality 90%) – 0.86 ms
- JPEG decoder (24-bit, quality 90%, 4:2:0) – 1.35 ms
- JPEG decoder (24-bit, quality 90%, 4:4:4) – 1.37 ms
- 24-bit image resize (algorithm Lanczos3) from 1920×1080 to 960×540 – 1.48 ms
- 24-bit image resize (algorithm Lanczos3) from 1920×1080 to 1919×1079 – 2.34 ms
- Denoise (8-bit, wavelet 9/7, 7 dwt levels) – 1.6 ms
- Denoise (24-bit, wavelet 9/7, 7 dwt levels) – 4.4 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossless, single) – 23.3 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossy, single) – 11.1 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossy, batch) – 10.5 ms
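To read these figures as a frame-rate budget, one can add up the kernel times of the stages in a pipeline and invert the total. The sketch below does this for a hypothetical two-stage pipeline (8-bit HQLI demosaic followed by 24-bit JPEG encoding at 4:2:0), using the 2K times from the list above. The pipeline itself, the dictionary keys, and the function names are assumptions made for illustration and are not part of Fastvideo SDK. The result is only an upper bound: it assumes the kernels run back to back and ignores host-to-device transfers and any other processing stages.

```python
# Average kernel times in ms for 1920x1080, copied from the list above.
# The keys are illustrative labels, not Fastvideo SDK identifiers.
KERNEL_MS_2K = {
    "hqli_demosaic_8bit": 0.30,
    "jpeg_encode_24bit_420": 0.62,
}


def pipeline_time_ms(stages, table):
    """Sum the average kernel times of the given stages (run one after another)."""
    return sum(table[name] for name in stages)


def max_fps(total_ms):
    """Upper bound on frames per second if only kernel time is counted."""
    return 1000.0 / total_ms


stages = ["hqli_demosaic_8bit", "jpeg_encode_24bit_420"]
total = pipeline_time_ms(stages, KERNEL_MS_2K)
print(f"{total:.2f} ms per frame, at most {max_fps(total):.0f} fps")
# prints "0.92 ms per frame, at most 1087 fps"
```

A real application would see a lower rate, since image upload, download of the compressed stream, and camera or disk I/O all add time that these kernel measurements exclude.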
Jetson Xavier performance benchmarks for 4K images (3840×2160)
- HQLI Demosaic (8-bit, RGGB) – 0.53 ms
- HQLI Demosaic (16-bit, RGGB) – 0.93 ms
- DFPD Demosaic (8-bit, RGGB) – 1.43 ms
- DFPD Demosaic (16-bit, RGGB) – 2.18 ms
- MG Demosaic (16-bit, RGGB) – 5.8 ms
- JPEG encoder (8-bit, quality 90%) – 1.23 ms
- JPEG encoder (24-bit, quality 90%, 4:2:0) – 1.83 ms
- JPEG encoder (24-bit, quality 90%, 4:4:4) – 2.67 ms
- JPEG decoder (8-bit, quality 90%) – 2.13 ms
- JPEG decoder (24-bit, quality 90%, 4:2:0) – 4.0 ms
- JPEG decoder (24-bit, quality 90%, 4:4:4) – 4.15 ms
- 24-bit image resize (algorithm Lanczos3) from 3840×2160 to 1920×1080 – 5.45 ms
- 24-bit image resize (algorithm Lanczos3) from 3840×2160 to 3839×2159 – 8.6 ms
- Denoise (8-bit, wavelet 9/7, 7 dwt levels) – 4.6 ms
- Denoise (24-bit, wavelet 9/7, 7 dwt levels) – 13 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossless, single) – 92 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossy, single) – 42 ms
- J2K Encoder (24-bit, wavelet 9/7, 7 dwt levels, cb 32, cr 12, lossy, batch) – 39 ms
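A 4K frame has four times as many pixels as a 2K frame, so it is natural to check how the kernel times scale between the two lists. The following sketch computes the 4K-to-2K time ratio for a few of the entries above. The selection of entries and the helper name are illustrative choices, not something the benchmark itself defines.

```python
# (2K time, 4K time) in ms for a few entries from the two lists above.
TIMES_MS = {
    "HQLI demosaic 8-bit": (0.30, 0.53),
    "DFPD demosaic 8-bit": (0.45, 1.43),
    "JPEG encoder 24-bit 4:4:4": (0.75, 2.67),
    "Lanczos3 resize 2:1": (1.48, 5.45),
    "J2K encoder lossless": (23.3, 92.0),
}

# 3840x2160 has exactly four times the pixels of 1920x1080.
PIXEL_RATIO = (3840 * 2160) / (1920 * 1080)


def scaling(t2k_ms, t4k_ms):
    """How many times longer the 4K kernel takes than the 2K kernel."""
    return t4k_ms / t2k_ms


for name, (t2k, t4k) in TIMES_MS.items():
    print(f"{name}: x{scaling(t2k, t4k):.2f} time for x{PIXEL_RATIO:.0f} pixels")
```

For every entry in this subset the ratio comes out below 4, meaning the time per pixel is lower at 4K than at 2K. The J2K encoder is closest to linear scaling, while the 8-bit HQLI demosaic is furthest from it.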
Jetson Xavier performance benchmarks for 12-bit images (4032×2192)
- JPEG encoder (gray, 12-bit, quality 90%) – 2.07 ms
- JPEG encoder (color, 12-bit, quality 90%, 4:2:0) – 3.1 ms
- JPEG encoder (color, 12-bit, quality 90%, 4:4:4) – 5.0 ms
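Because this resolution differs from the 2K and 4K cases, it can help to express the encoder times as pixel throughput. The sketch below converts each 12-bit JPEG encoder time above into megapixels per second. The function name and dictionary labels are illustrative assumptions, and the result counts kernel time only.

```python
WIDTH, HEIGHT = 4032, 2192

# Average 12-bit JPEG encoder kernel times in ms, from the list above.
ENCODE_MS = {
    "gray": 2.07,
    "color 4:2:0": 3.1,
    "color 4:4:4": 5.0,
}


def mpix_per_second(width, height, kernel_ms):
    """Pixel throughput in megapixels per second for one kernel time."""
    megapixels = width * height / 1e6
    seconds = kernel_ms / 1e3
    return megapixels / seconds


for mode, ms in ENCODE_MS.items():
    rate = mpix_per_second(WIDTH, HEIGHT, ms)
    print(f"{mode}: {rate:.0f} MPix/s")
```

Each frame here holds 8,838,144 pixels, so the 5.0 ms 4:4:4 case corresponds to roughly 1768 MPix/s of kernel throughput.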
Benchmarks for Fastvideo SDK on Jetson Nano, TK1, TX1, TX2