Fast Image and Video Denoising on CUDA

Image and video denoising is used in a wide range of applications. We have developed a CUDA-accelerated denoising kernel that runs on existing NVIDIA CUDA hardware. We have implemented both luma and chroma denoising and achieved very high performance for both image and video processing.
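
Luma and chroma denoising work on separate channels, so RGB data is first converted to YCbCr (the benchmark below uses a separate threshold for each of Y, Cb and Cr). The following is a minimal sketch of that color-space split. It assumes full-range BT.601 (JPEG-style) coefficients. The page does not say which matrix the SDK uses, and `rgb_to_ycbcr` / `ycbcr_to_rgb` are hypothetical names.

```python
# Full-range BT.601 (JPEG-style) RGB <-> YCbCr for 8-bit data.
# ASSUMPTION: this matrix is chosen for illustration only; it is not
# necessarily the one used by the CUDA denoiser.

def rgb_to_ycbcr(r, g, b):
    """Split an RGB pixel into luma (Y) and two chroma channels (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform back to RGB (results are not clamped to 0..255)."""
    cb -= 128.0
    cr -= 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return r, g, b


if __name__ == "__main__":
    # A neutral gray pixel carries no chroma, so Cb and Cr sit at 128.
    print(rgb_to_ycbcr(100, 100, 100))
    # A round trip of a saturated color is exact up to floating-point rounding.
    print(ycbcr_to_rgb(*rgb_to_ycbcr(200, 30, 90)))
```

Working in YCbCr lets a denoiser tune the strength applied to luma, which carries most of the fine detail, independently of the strength applied to the two chroma channels.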

CUDA Denoising Features

  • Input format: 8/12/16-bit per channel input data array from CPU or GPU memory
  • Output format: 24/48-bit output data array in CPU or GPU memory
  • High quality and high speed denoising algorithms
  • GUI to show processed data via OpenGL with minimum latency
  • Timing and performance measurements
  • NVIDIA GPU (Compute Capability >= 3.0)
  • Compatibility with Windows-7/8/10 and Linux Ubuntu/CentOS
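
The listed formats imply a change of bit depth between input and output. The 24-bit and 48-bit outputs presumably correspond to three channels of 8 or 16 bits each. The following is a minimal sketch of rescaling one sample between bit depths. It assumes a linear full-scale mapping with rounding, because the SDK's actual conversion is not described here. `rescale` is a hypothetical helper.

```python
def rescale(sample, in_bits, out_bits):
    """Map an unsigned integer sample from in_bits to out_bits precision.

    ASSUMPTION: linear full-scale mapping, value * (2**out - 1) / (2**in - 1),
    rounded to nearest, so black stays 0 and white maps to the new maximum.
    """
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    if not 0 <= sample <= in_max:
        raise ValueError("sample out of range for the input bit depth")
    # Integer arithmetic with +in_max // 2 implements round-to-nearest.
    return (sample * out_max + in_max // 2) // in_max


if __name__ == "__main__":
    pixel12 = (4095, 2048, 0)  # one 12-bit-per-channel RGB pixel
    print([rescale(v, 12, 8) for v in pixel12])   # 3 x 8 bits = 24-bit output
    print([rescale(v, 12, 16) for v in pixel12])  # 3 x 16 bits = 48-bit output
```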

Benchmarks for fast image denoising with CUDA

Images: 2K image (1920×1080, 24-bit) and 4K image (3840×2160, 24-bit)
Wavelet transform: CDF 9/7
Number of DWT resolutions: 7
DWT thresholds for YCbCr: 10;10;10
Test description: all data in GPU memory, timing includes GPU computations only
Software: OS Windows-10 (64-bit), CUDA-10
Hardware: Intel Core i7 3820, NVIDIA GeForce GTX 1080
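
The parameters above describe a multi-level discrete wavelet transform (DWT) with the CDF 9/7 filter pair, followed by thresholding of the detail coefficients in each of the Y, Cb and Cr planes. The following sketch shows that idea in one dimension. It assumes a lifting-scheme implementation, symmetric boundary extension, unit-DC-gain normalization and soft thresholding, and none of these choices is confirmed by this page. A 2D denoiser would apply the transform along rows and columns of each plane. `dwt97`, `idwt97` and `denoise_1d` are hypothetical names, and the sketch only accepts lengths divisible by 2**levels.

```python
# CDF 9/7 lifting constants (the irreversible JPEG 2000 filter pair).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001  # scaling so that the low-pass band has unit DC gain


def _lift(s, d, coef_d, coef_s):
    """Predict step (update odd samples d from even samples s), then update
    step (update s from d); symmetric extension at both ends."""
    n = len(s)
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[n - 1]
        d[i] += coef_d * (s[i] + right)
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]
        s[i] += coef_s * (left + d[i])


def _unlift(s, d, coef_d, coef_s):
    """Exact inverse of _lift: undo the update step, then undo the predict step."""
    n = len(s)
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]
        s[i] -= coef_s * (left + d[i])
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[n - 1]
        d[i] -= coef_d * (s[i] + right)


def dwt97(x):
    """One forward level for an even-length list -> (low band, high band)."""
    s, d = list(x[0::2]), list(x[1::2])
    _lift(s, d, ALPHA, BETA)
    _lift(s, d, GAMMA, DELTA)
    return [v / K for v in s], [v * K for v in d]


def idwt97(low, high):
    """One inverse level: rebuild the interleaved signal from both bands."""
    s, d = [v * K for v in low], [v / K for v in high]
    _unlift(s, d, GAMMA, DELTA)
    _unlift(s, d, ALPHA, BETA)
    out = [0.0] * (2 * len(s))
    out[0::2], out[1::2] = s, d
    return out


def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t; values with |c| <= t become 0."""
    return max(abs(c) - t, 0.0) * (1 if c >= 0 else -1)


def denoise_1d(x, levels, threshold):
    """Decompose into `levels` resolutions, shrink every detail band, reconstruct."""
    if len(x) % (1 << levels):
        raise ValueError("length must be divisible by 2**levels")
    low, details = list(x), []
    for _ in range(levels):
        low, high = dwt97(low)
        details.append([soft_threshold(c, threshold) for c in high])
    for high in reversed(details):
        low = idwt97(low, high)
    return low


if __name__ == "__main__":
    # A flat signal with small pixel-to-pixel noise, 7 levels, threshold 10.
    noisy = [50.0 + (1.0 if i % 2 else -1.0) for i in range(128)]
    clean = denoise_1d(noisy, levels=7, threshold=10.0)
    print(max(abs(v - 50.0) for v in clean))
```

With a threshold of 0 the transform reconstructs its input exactly up to rounding. Raising the threshold removes detail coefficients whose magnitude is below it, which is where low-amplitude noise tends to end up.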

  • 2K denoising time – 1.78 ms (3.3 GByte/s)
  • 4K denoising time – 5.84 ms (4.0 GByte/s)
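
The throughput figures follow from frame size and processing time. The following quick check assumes that "GByte" here means 2^30 bytes. With decimal gigabytes, the same timings would give about 3.5 and 4.3 GB/s. `throughput_gib_per_s` is a hypothetical helper.

```python
def throughput_gib_per_s(width, height, bits_per_pixel, time_ms):
    """Bytes in one frame divided by processing time, in GiB/s (2**30 bytes)."""
    frame_bytes = width * height * bits_per_pixel // 8
    return frame_bytes / (time_ms / 1000.0) / 2**30


if __name__ == "__main__":
    print(f"2K: {throughput_gib_per_s(1920, 1080, 24, 1.78):.1f} GiB/s")
    print(f"4K: {throughput_gib_per_s(3840, 2160, 24, 5.84):.1f} GiB/s")
```

A 4K frame holds four times as many pixels as a 2K frame but takes only about 3.3 times as long. This is why the larger image shows the higher throughput.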

We have designed this software as part of our CUDA image processing SDK, so customers can now use CUDA-accelerated denoising in their own applications as one stage of a general image processing pipeline.


To test our CUDA-based denoiser, please download the Fast CinemaDNG Processor software from the download page. We have implemented two types of denoiser: one that runs before demosaicing and one that runs after demosaicing. The software currently works with DNG image series, and a sample set of DNG images is also available on the download page.
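
Denoising before demosaicing has to respect the color filter array, because adjacent raw pixels belong to different color channels. One common approach is to split the mosaic into four half-resolution planes, denoise each plane independently and then interleave the planes again. The following is a sketch of that approach only. The RGGB layout, the even image dimensions and the names `split_bayer`, `merge_bayer` and `denoise_before_demosaic` are assumptions for illustration. They are not taken from Fast CinemaDNG Processor.

```python
# ASSUMPTION: RGGB Bayer layout; (row, column) offset of each color in the 2x2 tile.
RGGB = {"R": (0, 0), "G1": (0, 1), "G2": (1, 0), "B": (1, 1)}


def split_bayer(raw, layout=RGGB):
    """Split a mosaic (list of rows, even width and height) into four planes."""
    return {name: [row[dx::2] for row in raw[dy::2]]
            for name, (dy, dx) in layout.items()}


def merge_bayer(planes, layout=RGGB):
    """Inverse of split_bayer: interleave the four planes back into a mosaic."""
    first = next(iter(planes.values()))
    height, width = 2 * len(first), 2 * len(first[0])
    raw = [[0] * width for _ in range(height)]
    for name, (dy, dx) in layout.items():
        for i, prow in enumerate(planes[name]):
            raw[2 * i + dy][dx::2] = prow
    return raw


def denoise_before_demosaic(raw, denoise_plane, layout=RGGB):
    """Denoise each color plane separately so different colors are never mixed."""
    planes = split_bayer(raw, layout)
    return merge_bayer({k: denoise_plane(p) for k, p in planes.items()}, layout)


if __name__ == "__main__":
    raw = [[r * 10 + c for c in range(6)] for r in range(4)]
    print(split_bayer(raw)["R"])
    # With an identity "denoiser" the mosaic comes back unchanged.
    print(denoise_before_demosaic(raw, lambda plane: plane) == raw)
```

A denoiser that runs after demosaicing can instead work on full-resolution RGB or YCbCr data, because every pixel then has all three color components.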
