J2K codec performance on Jetson TX2

Author: Fyodor Serzhenko

NVIDIA Jetson TX2 hardware is very promising for imaging and other embedded applications. This high-performance, low-power hardware is used in autonomous solutions, especially in its industrial version, Jetson TX2i. Since J2K compression is a common task for UAV (Unmanned Aerial Vehicle) applications, here we evaluate such a solution and its limitations.

Detailed information about our testing approach for JPEG2000 encoding and decoding on desktop/server NVIDIA GPUs is available at the corresponding links. Here we follow exactly the same procedure, applied to the Jetson hardware.

 

J2K encoding/decoding parameters

  • File format – JP2
  • Lossy JPEG2000 compression with CDF 9/7 wavelet
  • Lossless JPEG2000 compression with CDF 5/3 wavelet
  • Compression ratio (for lossy algorithm) ~ 12.0:1 which corresponds to visually lossless encoding
  • Subsampling mode – 4:4:4
  • Number of DWT resolutions – 7
  • Codeblock size – 32×32
  • MCT – on
  • PCRD – off
  • Tiling – off
  • Window – off
  • Quality layers – one
  • Progression order – LRCP (L = layer, R = resolution, C = component, P = position)
  • Modes of operation – single or multithreaded batch
  • 2K test image (24-bit) – 2k_wild.ppm
  • 4K test image (24-bit) – 4k_wild.ppm

In many cases the compression ratio for visually lossless JPEG2000 encoding can be much higher than the value used here. We therefore suggest testing different parameters to find the best compression ratio at an acceptable image quality. Lowering the quality coefficient gives not only better compression but also a higher frame rate for both encoding and decoding. Our benchmarks show results for the images and parameters listed above; they do not represent the maximum performance, which can be better in many other cases.

Hardware and software

  • NVIDIA Jetson TX2
  • CUDA Toolkit 10.2

JPEG2000 codec benchmarks on NVIDIA Jetson TX2

Mode                                     | 2K (Lossy) | 2K (Lossless) | 4K (Lossy) | 4K (Lossless)
J2K Encoder (single mode)                | 30.1 fps   | 10.2 fps      | 9.1 fps    | 3.1 fps
J2K Encoder (multithreaded batch mode)   | 41.4 fps   | 13.1 fps      | 14.3 fps   | 3.8 fps
J2K Decoder (single mode)                | 12.3 fps   | 5.5 fps       | 3.5 fps    | 1.5 fps
J2K Decoder (multithreaded batch mode)   | 30.1 fps   | 9.0 fps       | 8.6 fps    | 2.4 fps
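
To make the gain from multithreaded batch mode easier to read off the table, here is a small Python sketch that divides each batch-mode frame rate by the matching single-mode one. The numbers are copied from the table above; the names `FPS`, `COLUMNS` and `batch_speedup` are ours and exist only for this illustration.

```python
# Frame rates (fps) copied from the benchmark table above.
# Column order: 2K lossy, 2K lossless, 4K lossy, 4K lossless.
COLUMNS = ("2K lossy", "2K lossless", "4K lossy", "4K lossless")
FPS = {
    "encoder": {
        "single": (30.1, 10.2, 9.1, 3.1),
        "batch": (41.4, 13.1, 14.3, 3.8),
    },
    "decoder": {
        "single": (12.3, 5.5, 3.5, 1.5),
        "batch": (30.1, 9.0, 8.6, 2.4),
    },
}


def batch_speedup(codec):
    """Return {column: batch fps / single fps} for 'encoder' or 'decoder'."""
    single = FPS[codec]["single"]
    batch = FPS[codec]["batch"]
    return {name: b / s for name, s, b in zip(COLUMNS, single, batch)}


if __name__ == "__main__":
    for codec in FPS:
        for name, ratio in batch_speedup(codec).items():
            print(f"{codec:8s} {name:12s} x{ratio:.2f}")
```

With these figures the decoder gains more from batch mode (roughly 1.6x to 2.5x) than the encoder (roughly 1.2x to 1.6x).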

 

Jetson TX2 has a 4-core ARM Cortex-A57 @ 2 GHz and a 2-core Denver2 @ 2 GHz. These two types of cores differ in performance, which should be taken into account. Since the Tier-2 stage of the JPEG2000 algorithm is implemented on the CPU, the frame rate is determined by the performance of both the CPU and the GPU. From that point of view multithreading can be useful (we use up to 12 threads), but in single mode the performance depends on which CPU core is used. So in single mode we need to set the affinity mask to ensure that the fastest CPU core is used.
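
As an illustration of setting an affinity mask on Linux, the following Python sketch pins the current process to one core with the standard `os.sched_setaffinity` call. It is not how the benchmark application itself does it. The core numbering in the example (Denver2 as CPUs 1 and 2) is an assumption about a typical TX2 system image and should be checked with `lscpu`; which core type is actually faster for a given workload should be measured, not taken from this sketch. The helper `pin_to_core` and the constant `HYPOTHETICAL_FAST_CORES` are our own names.

```python
import os


def pin_to_core(preferred_cores):
    """Pin the calling process to the first preferred core that is available.

    Returns the chosen core index, or None if none of the preferred cores
    is in the current affinity mask (in that case nothing is changed).
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    for core in preferred_cores:
        if core in allowed:
            os.sched_setaffinity(0, {core})
            return core
    return None


if __name__ == "__main__":
    # Assumption: Denver2 cores are CPUs 1 and 2, Cortex-A57 are 0, 3, 4, 5.
    # Verify on the target board before relying on this numbering.
    HYPOTHETICAL_FAST_CORES = (1, 2)
    chosen = pin_to_core(HYPOTHETICAL_FAST_CORES)
    print("pinned to core:", chosen)
    print("current mask:", sorted(os.sched_getaffinity(0)))
```

The same effect can be had without code changes by launching the application under `taskset`, which is often more convenient for benchmarking.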

In these tests we restricted memory usage to 2 GB. This was done on the assumption that a Jetson TX2 may have only 4 GB of memory, so this is an important limitation for the whole image processing solution.
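
To give a feel for what a 2 GB budget means, here is a rough Python estimate of how many uncompressed 24-bit frames fit into it. The resolutions (1920×1080 for 2K, 3840×2160 for 4K) are assumptions, since the exact sizes of the test images are not stated here, and the limit is taken as 2 GiB. The result is only an upper bound: it ignores the codec's working buffers, which are not described in this article.

```python
GIB = 1024 ** 3


def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame; 3 bytes per pixel for 24-bit RGB."""
    return width * height * bytes_per_pixel


def frames_in_budget(budget_bytes, width, height, bytes_per_pixel=3):
    """Upper bound on whole raw frames that fit in the budget."""
    return budget_bytes // raw_frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    budget = 2 * GIB  # the 2 GB limit from the tests, taken here as 2 GiB
    # Assumed resolutions, not taken from the article.
    assumed_sizes = {"2K": (1920, 1080), "4K": (3840, 2160)}
    for label, (w, h) in assumed_sizes.items():
        mib = raw_frame_bytes(w, h) / 1024 ** 2
        count = frames_in_budget(budget, w, h)
        print(f"{label}: {mib:.1f} MiB per frame, at most {count} frames")
```

Under these assumptions a 4K frame takes about 24 MiB, so with a dozen worker threads each holding input, intermediate and output buffers, the budget is consumed much faster than the raw frame count suggests.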

Here we have not considered J2K transcoding to H.264 on Jetson. That task requires additional tests, though based on our previous experience with desktop/server GPUs, transcoding performance should not differ significantly from J2K decoding performance, because Jetson has hardware support for H.264 encoding (separate from the GPU), which is accessible via the V4L2 interface and can be used simultaneously with the JPEG2000 decoder.

On request we can offer Fastvideo SDK for Jetson for evaluation: please fill in the form below and send it to us.

Other info from Fastvideo concerning JPEG2000 and Jetson

Contact Form
