GPU Software for Camera Applications

Author: Fyodor Serzhenko

Most camera application software runs without intensive use of the GPU, because CPU performance is often high enough for the image processing task, especially when the image sensor resolution is relatively small. Contemporary CPUs are very powerful and have many cores, so a multithreaded application is the key to realtime processing in camera applications.

Today there are many machine vision and industrial cameras with high resolution image sensors, and they generate a lot of data to process. The following high resolution image sensors are very popular:

AMS CMV20000 (5120 × 3840, 20 MPix, global shutter)
On-Semi VITA 25K (5120 × 5120, 26 MPix, global shutter)
Kodak KAI-47051 CCD (8856 × 5280, 47 MPix, global shutter)
AMS CMV50000 (7920 × 6004, 48 MPix, global shutter)
Gpixel GMAX4651 (8424 × 6032, 51 MPix, global shutter)
Gpixel GMAX3265 (9344 × 7000, 65 MPix, global shutter)
Sony IMX461 (11656 × 8742, 101 MPix, rolling shutter)
Canon 120MXS (13272 × 9176, 122 MPix, rolling shutter)
Sony IMX411 (14192 × 10640, 151 MPix, rolling shutter)

If we try to process the output stream from such image sensors at maximum frame rate and minimum latency, we find that even very powerful multicore CPUs can't meet realtime requirements, and the resulting latency is too high. To build CPU-based software for realtime image processing, developers usually resort to simplified algorithms to stay on time. Almost all high quality algorithms are slow even on multicore CPUs. The slowest algorithms on CPU include demosaicing, denoising, color grading, undistortion, resize and compression.

Modern cameras offer quite a lot of different external interfaces: GigE, USB3, CameraLink, CoaXPress, 10G, 25G, Thunderbolt, PCIe, etc. Some of them have impressive bandwidth, which is intended precisely to match the data rate of high performance image sensors. New generations of image sensors offer higher resolutions and higher frame rates, so the task of realtime processing keeps getting harder.
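To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The resolution is taken from the sensor list above; the frame rate (30 fps) and bit depth (12 bits per pixel) are assumptions chosen purely for illustration, not specifications of any sensor, and the function name is hypothetical.

```python
def raw_data_rate_gbit_per_s(width, height, fps, bits_per_pixel):
    """Uncompressed raw data rate of a sensor stream in Gbit/s (1 Gbit = 1e9 bits)."""
    return width * height * fps * bits_per_pixel / 1e9


# Resolution of the AMS CMV50000 from the list above.
# ASSUMPTION: 30 fps and 12 bits per pixel are illustrative values only.
rate = raw_data_rate_gbit_per_s(7920, 6004, fps=30, bits_per_pixel=12)
print(f"{rate:.1f} Gbit/s")  # prints "17.1 Gbit/s"
```

Under these assumed settings the raw stream is about 17 Gbit/s, which already exceeds the nominal 10 Gbit/s of a 10G link before any protocol overhead is counted.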

The problem is even harder to solve for realtime image processing in multiple camera systems. The image sensors may not have very high resolution and the frame rate could be moderate or high, but the total output stream from all the cameras could be significant. Multiple-PC hardware is not a good solution for such a task, though it could be a way out. Fast image processing is vitally important to cope with such streams in realtime.

Image Processing on FPGA

The first way to solve the problem is to use the camera's internal FPGA for realtime image processing. This is a solution for camera manufacturers, but not for system integrators, software developers or end users. In that case latency can be kept low and image processing performance is high, but FPGA-based algorithms usually offer low or medium image quality and the cost of development is very high. That could be acceptable for some applications, especially for embedded vision solutions, but not always. The FPGA case is a hardware-based solution. It has many advantages, though image quality and ease and cost of development are not among them. Without an FPGA it's much easier to use third-party software in vision systems.

Realtime Image Processing on GPU

The second way is GPU-based image processing. NVIDIA offers a full line of GPUs, from mobile Jetson GPUs to professional high performance Quadro/Tesla hardware. Image processing on GPU is very fast because such algorithms can be executed in parallel, which is a must for high performance. Usually different parts of an image can be considered independent, so they can be processed at the same time. Such an approach significantly accelerates most imaging algorithms and is applicable to any camera.
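The independence of image regions can be illustrated with a small CPU-side analogy. This is only a sketch of the decomposition, not GPU code and not part of any SDK: the per-pixel gain operation, the strip layout and all function names are assumptions made for the example. Python threads will not show a real speedup here; on a GPU the same idea is applied at a much finer granularity, with many pixels handled concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_gain(rows, gain=2, max_value=255):
    """Per-pixel operation: each output pixel depends only on its own input pixel."""
    return [[min(p * gain, max_value) for p in row] for row in rows]


def split_into_strips(image, n_strips):
    """Split a list of rows into up to n_strips contiguous horizontal strips."""
    step = max(1, -(-len(image) // n_strips))  # ceiling division, at least 1
    return [image[i:i + step] for i in range(0, len(image), step)]


def process_in_parallel(image, n_strips=4):
    """Process the strips concurrently and stitch them back in the original order."""
    strips = split_into_strips(image, n_strips)
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        results = list(pool.map(apply_gain, strips))  # map preserves input order
    return [row for strip in results for row in strip]


# Small synthetic image: 8 rows of 6 pixels.
image = [[(y * 6 + x) * 5 for x in range(6)] for y in range(8)]

# Processing strip by strip gives the same result as processing the whole image.
assert process_in_parallel(image) == apply_gain(image)
```

The final assertion is the point of the example: because no pixel needs data from another strip, the work can be divided freely. Algorithms with a spatial neighborhood, such as demosaicing or denoising, would additionally need overlapping borders between strips.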

As an example of high performance, high quality raw image processing, consider the Fast CinemaDNG Processor software, which performs all computations on an NVIDIA GPU and whose core is based on the Fastvideo SDK engine. With that software we can get high quality image processing according to the digital cinema workflow. Note that Fast CinemaDNG Processor offers image quality comparable to the results of raw processing in RawTherapee or Adobe Camera Raw, but much faster.

Below we consider image processing with Fastvideo SDK on an NVIDIA GPU. Software development for a machine vision or industrial camera is usually quite complicated. Nevertheless, there is an interesting approach that combines a GPU-based SDK with a standard camera application to implement fast, high quality image processing. The solution has two constituent parts:

Standard CPU-based camera SDK
Fast CinemaDNG Processor software on GPU (PRO version)

One could use the standard capture application supplied with any camera SDK. That application captures images from the camera and writes the frames to a ring buffer in system memory. We can then take each frame from the ring buffer and send it to the GPU for further processing with the Fast CinemaDNG Processor software. This combines two available solutions into a camera application with a full image processing pipeline in realtime.
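A ring buffer of this kind could look roughly like the sketch below. It is not the buffer of any particular camera SDK: the class name, its methods and the overwrite-oldest policy when the buffer is full are all assumptions made for illustration.

```python
import threading


class FrameRingBuffer:
    """Fixed-capacity ring buffer; when full, the oldest unread frame is overwritten."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0       # index of the next slot to write
        self._count = 0      # number of unread frames
        self.dropped = 0     # frames overwritten before being read
        self._cond = threading.Condition()

    def put(self, frame):
        """Called by the capture side; never blocks."""
        with self._cond:
            if self._count == self._capacity:
                self.dropped += 1          # the oldest unread frame is lost
            else:
                self._count += 1
            self._slots[self._head] = frame
            self._head = (self._head + 1) % self._capacity
            self._cond.notify()

    def get(self, timeout=None):
        """Called by the processing side; returns None if no frame arrives in time."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._count > 0, timeout):
                return None
            tail = (self._head - self._count) % self._capacity
            self._count -= 1
            return self._slots[tail]


# Sequential demonstration: six frames written into four slots.
buf = FrameRingBuffer(capacity=4)
for i in range(6):
    buf.put(f"frame-{i}")

first = buf.get()  # "frame-2": frames 0 and 1 were overwritten
```

In a real application `put` would be called from the capture thread and `get` from the thread that uploads frames to the GPU. Overwriting the oldest frame keeps latency bounded when processing falls behind; an application that must not lose frames would block the producer or grow the buffer instead.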

The data flow in the application is as follows. The camera captures frames and sends them via the external interface to the PC. The software then copies the frames to the GPU for image processing, later collects the processed frames, outputs them to the monitor and stores them on SSD.

GPU-based image processing can be very fast and offer very high quality at the same time. On a good GPU the total performance could reach 1.5–4 GPix/s, though it strongly depends on the complexity of the image processing pipeline. Multiple GPU solutions could significantly improve the performance. This is actually the way to create software for realtime processing in multicamera systems.
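The quoted throughput range translates into frame rates with simple arithmetic. The sketch below uses the Sony IMX411 resolution from the sensor list; the function name is hypothetical, and the result is only an upper bound that ignores transfer time to and from the GPU and any other overhead.

```python
def max_fps(throughput_gpix_per_s, width, height):
    """Upper bound on frame rate for a given pipeline throughput and frame size."""
    return throughput_gpix_per_s * 1e9 / (width * height)


# 1.5 and 4 GPix/s are the ends of the range quoted in the text.
low = max_fps(1.5, 14192, 10640)
high = max_fps(4.0, 14192, 10640)
print(f"{low:.1f} to {high:.1f} fps")  # prints "9.9 to 26.5 fps"
```

The same function gives a quick budget for a multicamera system: divide the result by the number of cameras sharing one GPU, assuming they all have the same resolution.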

Which threads do we need in a camera application?

CPU-based image processing software for any camera application is usually multithreaded. Different threads work on specific tasks, and such parallelism allows us to meet realtime requirements on a multicore CPU. In GPU-based software we can perform most image processing tasks on the GPU, which leaves the following CPU threads in the software:

Image acquisition from the camera
CUDA thread which controls GPU-based image processing
Additional thread to run NVENC encoder (H.264 compression) on GPU
Optional thread for AI on GPU Tensor cores
Display processed frames on the monitor via OpenGL
Several CPU threads to store raw or compressed images or video to SSD
Several CPU threads for reading and parsing RAW frames from SSD for offline processing
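A thread layout like the one listed above is commonly wired together with queues, one worker thread per stage. The sketch below shows only that wiring and is not how Fast CinemaDNG Processor is implemented: the stage bodies are placeholders, all names are hypothetical, and only two of the listed threads (processing and storage) are modelled, with acquisition running in the calling thread.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Run `work` on every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            if outbox is not None:
                outbox.put(STOP)   # propagate shutdown to the next stage
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(frames):
    # Bounded queues apply back-pressure if a later stage is slower.
    captured = queue.Queue(maxsize=8)
    processed = queue.Queue(maxsize=8)
    stored = []

    def process(frame):
        # Placeholder: a real application would launch GPU work here.
        return frame.upper()

    def store(frame):
        # Placeholder: a real application would write to SSD here.
        stored.append(frame)

    threads = [
        threading.Thread(target=stage, args=(process, captured, processed)),
        threading.Thread(target=stage, args=(store, processed, None)),
    ]
    for t in threads:
        t.start()
    for frame in frames:           # acquisition runs in the calling thread
        captured.put(frame)
    captured.put(STOP)
    for t in threads:
        t.join()
    return stored


result = run_pipeline(["raw0", "raw1", "raw2"])
```

With one thread per stage and FIFO queues, frame order is preserved from acquisition to storage. Stages that use several threads, such as the storage threads in the list, would need explicit frame indices to restore the order.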

Since we have already implemented most of the above threads in the Fast CinemaDNG Processor software, we can use it with any machine vision or industrial camera. We just need to take the camera SDK and build a sample application capable of capturing frames and copying the data to an external application. We've already done that for XIMEA high performance cameras, and the solution is working well.
