Applications for fast Image & Video Processing on GPU

We can offer custom solutions for high performance imaging applications in various fields:

Long term high speed video recording
8K/16K GPU codecs (custom or standard-compliant)
High performance and high quality debayering
Raw Bayer Compression for machine vision and 3D applications
High quality image processing for industrial cameras
Solutions for fast machine vision
High performance server for multiple camera systems
Image processing for underwater cameras for aquaculture industry
Fast RAW processing on GPU
Batch conversion of DNG, CR2, CR3, NEF raw images to JPEG/TIFF/JP2
How to get video directly from series of RAW files
High speed JPEG2000 encoding and decoding
Fast J2K Viewer on GPU for geospatial applications
Compression and decompression for Digital Cinema applications
Remote collaborative post production for color grading and reviewing
Capturing, encoding and delivering broadcast quality video
Custom FFmpeg codecs and filters on GPU
Image recompression, crop, sharpening and resize for web
Streaming applications
Fast image loading, decoding and visualizing for 3D and VA/VR
Color grading and toning
Medical imaging
Network displays
Video Walls
Software for film scanners and book scanners
Digital Air Traffic Solutions

To get a better understanding of image processing performance in such solutions, please have a look at the SDK benchmarks.

Long-term video recording for cameras with high frame rate

Many modern high-speed, high-resolution cameras have data rates in the range of 200–4500 MByte/s or even more, and it is quite complicated to capture that stream and save it to HDD/SSD. To solve that problem, one can do real-time JPEG compression on GPU, which increases the duration of video recording by 10–20 times. One more benefit of that solution is that one can use a conventional SSD instead of RAID. On an NVIDIA GeForce RTX 4090 GPU one can do JPEG compression of a 24-bit 65 MPix image within 0.65 ms.
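The effect of compression on recording time can be estimated with simple arithmetic. The sketch below is only an illustration: the 2 TB drive size, the 2000 MByte/s camera rate and the 10:1 compression ratio are assumed values, and `recording_seconds` is a hypothetical helper, not part of any SDK.

```python
def recording_seconds(capacity_bytes, data_rate_bytes_per_s, compression_ratio=1.0):
    """Seconds of video that fit on a drive at the given camera data rate."""
    if data_rate_bytes_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("data rate and compression ratio must be positive")
    # Compression divides the rate at which data reaches the drive.
    stored_rate = data_rate_bytes_per_s / compression_ratio
    return capacity_bytes / stored_rate


SSD_BYTES = 2 * 10**12    # assumed: 2 TB SSD
CAMERA_RATE = 2 * 10**9   # assumed: 2000 MByte/s camera stream

raw_time = recording_seconds(SSD_BYTES, CAMERA_RATE)
jpeg_time = recording_seconds(SSD_BYTES, CAMERA_RATE, compression_ratio=10.0)
print(f"raw: {raw_time:.0f} s, JPEG at 10:1: {jpeg_time:.0f} s")
```

With these assumed numbers the write rate also drops from 2000 MByte/s to 200 MByte/s, which is why a single conventional SSD can be enough.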

Super fast demosaicing and JPEG compression

Software for machine vision and industrial cameras usually uses the following image processing pipeline: read a stream of raw Bayer CFA images, demosaic them, apply JPEG compression, and save each image to disk. We have designed a solution that combines demosaicing and JPEG compression in our GPU software, and got a significant speedup. For example, our software on an NVIDIA GeForce RTX 4090 GPU needs less than 1 ms to carry out high quality demosaicing and JPEG compression (quality 90%, 4:4:4 subsampling, MG demosaicing algorithm, without I/O latency) for an input raw 4K image with Bayer CFA.

Raw Bayer Compression for machine vision and 3D applications

To get an additional speedup for high performance applications, one can split the raw Bayer image into four color planes and compress each of them separately with the JPEG algorithm. In this way we exclude debayering from the pipeline and get less data for JPEG compression, because the four planes together hold one sample per pixel instead of three. The total speedup could be up to 50% in comparison with the standard JPEG compression solution for raw Bayer data.

In that case decompressing and viewing such data is more complicated, but this is not a problem for GPU image processing: it can be done completely on GPU as well.
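The plane split described above can be sketched in a few lines. This is a minimal CPU illustration on nested lists, assuming an image with even width and height; `split_bayer` and `merge_bayer` are hypothetical names, and in a real pipeline each plane would go to a grayscale JPEG encoder on GPU.

```python
def split_bayer(raw):
    """Split a Bayer mosaic (list of rows) into four color planes.

    Plane order follows the 2x2 cell: top-left, top-right, bottom-left,
    bottom-right. For an RGGB sensor that is R, G1, G2, B.
    """
    if len(raw) % 2 or any(len(row) % 2 for row in raw):
        raise ValueError("Bayer image must have even width and height")
    return [
        [row[dx::2] for row in raw[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    ]


def merge_bayer(planes):
    """Inverse of split_bayer: interleave four planes back into one mosaic."""
    p00, p01, p10, p11 = planes
    raw = []
    for top_l, top_r, bot_l, bot_r in zip(p00, p01, p10, p11):
        raw.append([v for pair in zip(top_l, top_r) for v in pair])
        raw.append([v for pair in zip(bot_l, bot_r) for v in pair])
    return raw


mosaic = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
planes = split_bayer(mosaic)
restored = merge_bayer(planes)
```

Each plane has a quarter of the pixels, so the four planes together carry exactly as many samples as the original mosaic. The merge step is what a viewer has to run after decoding, before demosaicing.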

Streaming applications

We are developing high performance GPU-based software for streaming applications:

Fast JPEG and JPEG2000 codecs on GPU
High speed HD-SDI processing
MXF and TS transcoding
Multiview
FFmpeg filters and codecs on GPU
Other GPU software

We can implement a full image processing pipeline on GPU to achieve good image quality, high performance and minimum latency.

VR applications

Fastvideo Image & Video Processing SDK on CUDA is the core of many VR applications with 2K/4K/8K resolutions. High performance and excellent image processing quality are a must for most VR solutions.

Color management system according to camera and monitor DCP and ICC profiles

It's not enough to create a high quality camera for an imaging application: one also has to design powerful software to render and visualize images from the camera. This raises the question of color management, DCP and ICC profiles for camera and monitor, and related matters. To take these things into account, one can incorporate a Color Management System into image processing, and to make it fast we have implemented it on GPU.

High performance server for multiple camera systems

When you need to record data from multiple cameras in real time, it can be a good idea to use a high quality image processing pipeline on GPU. Some of our customers do that to cope with huge amounts of data. They utilize Fastvideo SDK to run the full image processing pipeline on GPU in real time for multiple cameras. We can perform real-time image processing and visualization for two XIMEA xiB cameras with 65 Mpix resolution, 10-bit at 70 fps, on one NVIDIA GeForce RTX 4090.

GPU RAW Processor

We've created GPU RAW Processor, which is capable of real-time image processing and video playback for RAW files. Transforming a series of RAW images into a 16-bit TIFF or 8/24-bit JPEG sequence is usually a time-consuming task. Now it can be done really fast on an NVIDIA GPU. We can also monitor RGB Parade and Histogram and apply wavelet-based denoising in real time or at video playback.

GPU RAW Processor offers fast preview of your RAW frames directly from Windows Explorer: just right-click the folder and see the result in the player. The software also has excellent trimming capabilities to remove unnecessary frames from the footage.

Batch conversion of CR2, CR3, NEF, ARW, DNG raw images to JPEG/TIFF/JP2

A full image processing pipeline for Canon CR2 and CR3, Nikon NEF and Adobe DNG raw data can be run very fast on GPU, and this is the way to ensure high performance conversion to JPG/TIF. The standard pipeline includes raw decoding and preprocessing, WB, demosaicing, denoising, color correction, curves and levels, DCP and LCP support, resizing, sharpening, 3D LUTs, visualizing, etc.

Media & Entertainment, Digital Cinema and JPEG2000

JPEG2000 is widely adopted in digital cinema as well as in post production and broadcast. J2K solutions deliver excellent quality, though they need a lot of compute power and bandwidth. With GPU acceleration, JPEG2000 performance is not a bottleneck any more: our GPU-based JPEG2000 codec runs much faster than any CPU-based solution. You just need a conventional NVIDIA GPU for laptop, desktop or server. We also have special J2K solutions for the mobile GPUs Jetson TX2, NX/AGX Xavier and Orin.

A couple of other important applications are a fast MXF converter and an MXF player. The MXF format is widely used in M&E and Digital Cinema, so our solution can be utilized for real-time MXF reading, writing and transcoding.

JPEG2000 compression according to DCP is also a very interesting application. We do that very fast on GPU.

GPU-based Transcoding

We are developing fast GPU-based software for transcoding applications:

J2K to H264/H265
MXF and TS transcoding to H264/H265
ProRes to H264/H265, ProRes to J2K

We can implement your desired image processing pipeline on GPU to achieve target image quality, bitrate, performance and latency.

Remote collaborative post production for editing, color grading and reviewing

We can capture live feed from HD-SDI or 3G-SDI sources in real time, then write data to local storage and simultaneously copy to NVIDIA GPU for preprocessing and J2K encoding. The software can stream compressed data over commodity internet to a remote PC/server. At the remote post production facility, the stream is received and decoded to make data immediately available for editing or processing. Here you can find more info for that solution.

Capturing, encoding and delivering broadcast quality video from SD, HD and 3G-SDI cameras and grabbers with minimum latency

That solution was developed for single or multi-camera environments and provides full capture, encoding and delivery of multiple HD-SDI video streams with GPU-based processing. We support SD-, HD- and 3G-SDI grabbers at 2K/4K/8K resolutions and more.

FFmpeg codecs and filters on GPU

A huge number of applications are based on FFmpeg. To accelerate FFmpeg, we've implemented GPU-based codecs and filters which can be easily integrated into existing CPU-based software built on FFmpeg.

These are our GPU-based solutions which are fully compatible with FFmpeg:

FFmpeg MXF Transcoder
FFmpeg J2K Codec
FFmpeg J2K Decoder
FFmpeg Remap Filter

Image recompression and resize on GPU for web applications

If you have a really big photo hosting service or a powerful image server, you have to think about how to optimize image loading from your database to the client's browser. Original photos are usually saved in JPEG format and their resolution differs from what the HTML page needs, so you need to do a fast image resize before sending the photo. The complete workflow is the following:

load jpg image from database to PC RAM
jpg image decompression
image crop
image resize
image sharpening
jpg image compression
send the image via network

We offer a GPU solution for that task, and its performance is much better in comparison with standard solutions on CPU. With that approach we can solve one more problem: we can create a super fast thumbnail generator, capable of working with a really big number of JPEG images. That could be interesting for photo catalogs, web shops, etc. For a JPEG image with resolution 1920×1080, quality 90% and 4:4:4 subsampling, we need around 1 ms to resize it to 960×540 on Tesla V100. These are our JPEG resize time measurements for Tesla V100.
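The crop step in the workflow above usually has to be derived from the target size. As a minimal sketch, assuming the goal is the largest centered crop with the same aspect ratio as the output, the crop rectangle can be computed as follows. `center_crop_box` is a hypothetical helper for illustration and not an SDK function.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, width, height) of the largest centered crop
    of the source that matches the destination aspect ratio."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim columns.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is taller than (or equal to) the target: trim rows.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h


same_aspect = center_crop_box(1920, 1080, 960, 540)
square_thumb = center_crop_box(1920, 1080, 500, 500)
print(same_aspect, square_thumb)
```

For the 1920×1080 to 960×540 case no pixels are cropped, while a square thumbnail keeps the central 1080×1080 region. The resize and sharpening steps then operate on that rectangle only.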

JPG image loading, decoding and visualizing

Let us consider a situation when we need to load JPG image and visualize it on the monitor. If we do image decoding on CPU first and then send it to GPU, we have the following image processing pipeline:

first stage: JPG decoding on CPU
second stage: we send decoded image data to GPU
third stage: GPU shows image data on the screen

If we do image decoding on GPU, the pipeline is not the same:

first stage: we send compressed image data to GPU
second stage: JPG decoding on GPU
third stage: GPU shows image data on the screen via OpenGL

Taking into consideration the facts that JPG decompression is faster on GPU and we need less time to send compressed data to GPU instead of uncompressed (we also have less CPU usage), we see that the case with decoding and visualizing on GPU is faster. This is the way to work with 8K and 16K resolutions.

Here we get four benefits:

we need less bandwidth of the system bus
less SSD/HDD space to store compressed data and/or less network bandwidth
now CPU is free to do other tasks
minimum latency to show decompressed data on monitor
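The bus bandwidth argument can be checked with a rough estimate. In the sketch below, the 12 GByte/s effective host-to-GPU throughput and the 10:1 JPEG compression ratio are assumptions chosen for illustration; actual values depend on the hardware and on image content.

```python
def transfer_ms(num_bytes, bus_bytes_per_s):
    """Time in milliseconds to move num_bytes over a bus with the given throughput."""
    return 1000.0 * num_bytes / bus_bytes_per_s


WIDTH, HEIGHT, CHANNELS = 7680, 4320, 3   # 8K frame, 24-bit RGB
BUS_RATE = 12 * 10**9                     # assumed: 12 GByte/s effective throughput
JPEG_RATIO = 10                           # assumed: 10:1 compression

raw_bytes = WIDTH * HEIGHT * CHANNELS
jpeg_bytes = raw_bytes // JPEG_RATIO

raw_ms = transfer_ms(raw_bytes, BUS_RATE)
jpeg_ms = transfer_ms(jpeg_bytes, BUS_RATE)
print(f"uncompressed: {raw_ms:.2f} ms, compressed: {jpeg_ms:.2f} ms per frame")
```

Under these assumptions, sending the compressed frame takes about a tenth of the time needed for the decoded one, and the saving grows with resolution.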

Video conferencing over gigabit network

The idea is quite simple – one can do online JPEG or JPEG2000 compression on GPU and send compressed data frame by frame to offer video conferencing over gigabit network. For that solution we need two PCs or laptops with NVIDIA GPU.
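Whether a stream fits into a gigabit link depends on resolution, frame rate and compression ratio. The following sketch estimates the minimum compression ratio; the 80% usable share of the link and the example video formats are assumptions, and `required_compression_ratio` is a hypothetical helper.

```python
def required_compression_ratio(width, height, bits_per_pixel, fps,
                               link_bits_per_s, utilization=0.8):
    """Minimum compression ratio for a video stream to fit into a network link.

    utilization is the assumed share of the link available for video payload.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    raw_bits_per_s = width * height * bits_per_pixel * fps
    return raw_bits_per_s / (link_bits_per_s * utilization)


GIGABIT = 10**9

full_hd_30 = required_compression_ratio(1920, 1080, 24, 30, GIGABIT)
uhd_60 = required_compression_ratio(3840, 2160, 24, 60, GIGABIT)
print(f"Full HD 30 fps: {full_hd_30:.2f}:1, 4K 60 fps: {uhd_60:.2f}:1")
```

With these assumptions, 24-bit Full HD at 30 fps needs roughly 2:1 compression and 4K at 60 fps roughly 15:1, both within the usual operating range of JPEG and JPEG2000.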

Image compression and image enhancement in the field of medical imaging

Image compression is very important in the field of medical imaging because medical equipment can generate huge amounts of image data. Therefore image compression is a must for that kind of application. As an example one can consider JPEG-DICOM and JPEG2000-DICOM converters or DICOM Viewers.

We can offer the following high performance codecs for PACS software (Ultrasound, Endoscopy, X-Ray, CT, MR, PET):

JPEG on GPU (8/12 bits)
JPEG2000 on GPU (reversible or irreversible, up to 16 bits per channel)
MPEG-4/H.264/H.265 on GPU (8/10 bits)
Lossless JPEG on CPU (up to 16 bits per channel)

One more interesting application for medical imaging is GPU-accelerated image enhancement technology. NVIDIA GPUs and Fastvideo SDK are utilized in Medical Vision software for flexible video endoscopy. That kind of solution could be used for raw image processing in endoscopy as well.

GPU-based software for film scanners and book scanners

Quite a lot of such scanners are based on high resolution machine vision cameras. They usually work at low frame rates in 12-bit mode. That could be OK for book scanners, though for film scanners it's a must to have HDR implemented. It can be done via a multi-exposure approach that composes 16-bit frames for each color plane. To achieve that, we need to get 3 raw 12-bit frames per color channel (9 raw frames in total) at different exposures from the camera to get just one RGB image with maximum dynamic range and high image quality. Such image processing is quite complicated, and we've done several projects in that field.
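To illustrate how three 12-bit exposures can extend the range to 16 bits, here is a deliberately simple per-pixel merge. It is a sketch under stated assumptions, not the method used in those projects: it assumes a linear sensor response, exposure times in the ratio 1:4:16, zero black level and a saturation threshold of 4000, and it picks the longest unsaturated exposure instead of blending. `merge_exposures` is a hypothetical name.

```python
def merge_exposures(samples, exposure_times, saturation=4000):
    """Merge 12-bit samples of one pixel taken at different exposure times.

    samples[i] was captured with exposure_times[i]. The longest exposure
    that is not saturated is rescaled to the scale of the longest exposure
    and returned as a 16-bit value.
    """
    if len(samples) != len(exposure_times) or not samples:
        raise ValueError("need one exposure time per sample")
    longest = max(exposure_times)
    # Visit exposures from longest (least noisy) to shortest.
    order = sorted(range(len(samples)),
                   key=lambda i: exposure_times[i], reverse=True)
    for i in order:
        if samples[i] < saturation:
            value = samples[i] * longest / exposure_times[i]
            return min(65535, round(value))
    # Every exposure is saturated: clip to full scale.
    return 65535


TIMES = (1, 4, 16)  # assumed relative exposure times

shadow = merge_exposures((5, 20, 80), TIMES)
highlight = merge_exposures((1000, 4000, 4095), TIMES)
clipped = merge_exposures((4095, 4095, 4095), TIMES)
print(shadow, highlight, clipped)
```

With a 16x exposure span, an unsaturated 12-bit sample maps to at most 3999 × 16 = 63984, which fits into 16 bits. Production pipelines typically add black level subtraction, weighted blending between exposures and frame alignment.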

Network displays

Our software can do real-time screen capture on the host and send the video stream to a remote PC via a wired or wireless network connection. We compress data on GPU and send it via the network, then we do decompression on remote hardware (usually a custom board or an NVIDIA Tegra GPU) and show the image or video on the monitor. The main idea is to increase the distance between host and display with minimum latency.

Video Walls on GPU

The Video Wall application is basically a PC-based network video server that delivers a variety of content to 30-50 remote displays or even more. Content can include source images or video with resolutions in the range from 4K to 8K. All image processing is done on NVIDIA GPU in real time.