GPIO Latency Test for Camera Applications

Author: Fyodor Serzhenko
In robotics applications, even a small delay can cause problems. Often, response time targets below 50 milliseconds are mandatory. This forces developers on Jetson and other platforms to optimize each stage of the workflow rather than focusing solely on throughput. The G2G (glass-to-glass) test is widely used for latency evaluation: it measures the time between the two "glasses", the camera lens and the display. In this test, we typically capture a frame from a monitor, send the data to a PC for computation, and output the processed image to the monitor again. This method encompasses exposure time, sensor readout, data transfer, ISP, encoding, network, decoding, and display. Therefore, benchmarking must isolate each stage to identify the actual bottleneck in the entire pipeline.
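To make the stage-isolation idea concrete, the G2G measurement can be viewed as a latency budget over the stages listed above. The sketch below sums per-stage contributions; every millisecond value is an illustrative placeholder, not a measured figure from this article:

```python
# Hypothetical G2G latency budget. Stage names follow the text;
# all millisecond values are placeholder assumptions for illustration.
G2G_STAGES_MS = {
    "exposure": 0.2,
    "sensor_readout": 3.0,
    "data_transfer": 3.4,
    "isp": 5.0,
    "encoding": 4.0,
    "network": 5.0,
    "decoding": 4.0,
    "display": 16.7,  # up to one refresh period of a 60 Hz monitor
}

def g2g_total(stages):
    """Sum per-stage latencies to get an end-to-end G2G estimate."""
    return sum(stages.values())

print(f"Estimated G2G latency: {g2g_total(G2G_STAGES_MS):.1f} ms")
```

Isolating a stage then amounts to measuring one entry of this budget directly instead of inferring it from the end-to-end total.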
What are the most important limitations for low-latency benchmarks?

To identify the main limitations on the road to minimum latency, we conducted the following G2G test with the shortest pipeline:
We used a Jetson Orin NX 8GB with a connected XIMEA MC031CG-SY camera for testing. The camera has a USB3 interface, a frame rate of 150 fps, an ROI resolution of 1920×1080 at 8 bits per pixel, and an exposure time of 0.2 ms. The monitor's refresh rate was only 60 Hz, because a higher rate is not achievable with that Jetson. The minimum latency for the G2G test turned out to be 70 ms or more; to get better results, we would need a camera with a higher frame rate and a monitor with a much higher refresh rate. Data transfer is fast, there is no processing on the GPU, the camera frame rate is high, and the exposure time is low, so the likely reasons for the poor G2G result are USB3 or OpenGL.

GPIO latency test without a monitor or OpenGL

For latency measurements, let's try bypassing the monitor and OpenGL. This makes sense because many robotics applications don't require a monitor: they just need to capture an image with a camera, transfer it to a Jetson or other hardware, run ISP and/or AI to make a decision, and send a command to a mechanism. To estimate the latency, we first switched off both the ISP and OpenGL. Then we acquired raw Bayer images from the camera and stored each captured frame in Jetson memory to shorten the pipeline. This doesn't affect the main idea, because we can easily distinguish black frames from white ones even in unprocessed raw Bayer data. We ran that test using a GPIO pin on the Jetson and an LED: we switch the LED on and off via GPIO, and we can see in the captured raw image whether the LED is on or off. We used the following scenario:
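The detection step of this LED test can be sketched in a few lines: classify each raw Bayer frame as black (LED off) or white (LED on) by its mean level, and find the first white frame after the GPIO command. The threshold and the synthetic frames below are assumptions for illustration; on the real setup the frames come from the camera driver:

```python
import numpy as np

# Illustrative threshold: mid-scale for 8-bit raw Bayer data (assumption).
WHITE_THRESHOLD = 128

def is_white(frame):
    """Classify a raw Bayer frame as 'white' (LED on) by its mean level."""
    return frame.mean() > WHITE_THRESHOLD

def first_white_index(frames):
    """Return the index of the first frame in which the LED is visible."""
    for i, frame in enumerate(frames):
        if is_white(frame):
            return i
    return None

# Synthetic sequence: the GPIO command is sent right after frame 2
# (the last black frame) is captured; the very next frame is white.
black = np.zeros((1080, 1920), dtype=np.uint8)
white = np.full((1080, 1920), 200, dtype=np.uint8)
frames = [black, black, black, white, white]
print(first_white_index(frames))  # -> 3, i.e. a one-frame interval
```

No demosaicing is needed for this check, which is why the raw Bayer shortcut does not affect the measurement.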
We performed numerous tests with the USB3 camera at frame rates ranging from 30 to 200 fps. In all cases, the interval between the final black frame and the first white frame was consistently one frame: we captured a black frame, sent the command to the LED, and the next acquired frame was white. For 1920×1080 at 150 fps (0.2 ms exposure), this gives a latency of up to 6.6 ms without ISP and without OpenGL. It's worth mentioning that this result includes the USB3 interface latency, so USB3 is definitely not the main bottleneck, at least for this XIMEA camera. The actual latency for that use case is still unclear; we've only determined an upper bound. We also used smaller resolutions, such as 720×576, to reach 200 fps (0.2 ms exposure), and the difference was still just one frame. We also verified that no frames were dropped.

ISP Pipeline on CUDA

The case with the shortest pipeline is important, but it's only suitable for evaluation purposes. Now let's consider a real use case for robotics. This pipeline performs data preprocessing prior to the AI part of the application:
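The one-frame interval translates directly into an upper bound on latency: it cannot exceed one frame period. A minimal calculation for the two frame rates used above:

```python
# A one-frame interval bounds the latency by a single frame period.
def frame_period_ms(fps):
    """Frame period in milliseconds for a given frame rate."""
    return 1000.0 / fps

for fps in (150, 200):
    # 150 fps -> ~6.7 ms (quoted in the text as up to 6.6 ms), 200 fps -> 5.0 ms
    print(f"{fps} fps -> upper bound {frame_period_ms(fps):.1f} ms")
```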
Assuming we are working with 8-bit raw Bayer images at a resolution of 1920×1080 and 150 fps on a Jetson Orin NX 8GB, the total latency for that pipeline is up to 13 ms. After switching on the LED, the next frame comes black, and the one after it is white. So it takes up to 6.6 ms to acquire the raw frame and transfer it from the image sensor to the Jetson, and the GPU-based ISP time for the above pipeline is around 4-5 ms, so we see the result within 13 ms. The total load of the Orin NX GPU is around 75% in that case. We ran the same test with 8-bit raw Bayer images at a resolution of 720×576 and 200 fps on the same Jetson Orin NX 8GB; in that case the total latency is up to 5 ms, including ISP, meaning the very next frame after switching on the LED comes white. The total GPU load is around 60% there. As we can see, the latency is in the range of 1-2 frames both with and without the GPU-based ISP, and this is an upper estimate for the case without OpenGL.

How can OpenGL be tuned to achieve better latency?

Once we have determined that the USB3 interface of the XIMEA camera doesn't significantly impact latency, we can conclude that the main problem with the G2G test could be OpenGL. Let's check what's going on in the OpenGL workflow. These are the main stages of data processing, from GPU memory to the monitor:
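The ISP-case budget above can be sketched as a simple sum: the acquisition/transfer bound (a number of frame periods) plus the CUDA ISP time. The function is a hedged illustration of that arithmetic, using the figures from the text:

```python
# Latency budget sketch for the GPU ISP pipeline case.
def pipeline_latency_ms(fps, n_frame_periods, isp_ms):
    """Upper bound: acquisition/transfer (in frame periods) plus ISP time."""
    return n_frame_periods * 1000.0 / fps + isp_ms

# 1920x1080 @ 150 fps: one frame period (~6.6 ms) + 5 ms worst-case ISP.
print(round(pipeline_latency_ms(150, 1, 5.0), 1))  # -> 11.7, within 13 ms
```

This matches the observation that the measured total stays within 13 ms, i.e. roughly two frame periods at 150 fps.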
In summary, to achieve optimal OpenGL performance in the G2G test, we need a high-refresh-rate monitor, a high-FPS camera with a high-bandwidth interface and low exposure time, VSync disabled, a high-end GPU, and a fast display panel. The G2G test is viable, but we need to pay additional attention to the OpenGL implementation and usage.
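The monitor requirement is easy to quantify: with VSync enabled, presentation waits for the next refresh, so the display stage alone can add up to one full refresh period. A quick calculation shows why a 60 Hz panel dominates a sub-10 ms capture pipeline, and why a high-refresh-rate monitor helps:

```python
# Worst-case added latency from waiting for the next vertical refresh.
def vsync_worst_case_ms(refresh_hz):
    """One refresh period in milliseconds for a given refresh rate."""
    return 1000.0 / refresh_hz

print(round(vsync_worst_case_ms(60), 1))   # -> 16.7 (a 60 Hz panel)
print(round(vsync_worst_case_ms(240), 1))  # -> 4.2  (a 240 Hz panel)
```

Disabling VSync removes this wait at the cost of possible tearing, which is why it appears on the tuning list above.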