Video Transcoding on GPU

We have developed a highly optimized video transcoding solution built on a GPU-accelerated video processing pipeline. It significantly outperforms conventional CPU-only video processing pipelines in widely used applications.

GPU Transcoder Features

  • Inputs: H.264/AVC
  • Outputs: H.264 compressed streams encoded on an NVIDIA GPU
  • Frame format: YV12
  • High quality resize algorithm (Lanczos)
  • Compatibility with Windows 7/8/10 (32/64-bit) and Linux (32/64-bit)
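
The Lanczos resize named in the feature list can be illustrated with a short CPU-side sketch. This is plain Python for explanation only, not the SDK's GPU implementation. The kernel radius a = 3 and the helper names lanczos_kernel and resize_row are assumptions made for the example; the page does not state which radius or border handling the transcoder uses.

```python
import math


def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, otherwise 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resize_row(src: list[float], dst_len: int, a: int = 3) -> list[float]:
    """Resample one row of samples to dst_len with normalized Lanczos weights."""
    scale = len(src) / dst_len
    # When downscaling, stretch the kernel so it also low-pass filters the input.
    stretch = max(scale, 1.0)
    support = a * stretch
    out = []
    for i in range(dst_len):
        # Center of output sample i, expressed in source coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = math.floor(center - support) + 1
        hi = math.floor(center + support)
        acc = 0.0
        wsum = 0.0
        for j in range(lo, hi + 1):
            w = lanczos_kernel((j - center) / stretch, a)
            jj = min(max(j, 0), len(src) - 1)  # clamp taps at the row borders
            acc += w * src[jj]
            wsum += w
        out.append(acc / wsum)  # normalize so flat areas keep their value
    return out


# Example: shrink a 1920-sample row to 1280 samples (the 1080p -> 720p width ratio).
row_720 = resize_row([float(v % 256) for v in range(1920)], 1280)
```

A full image resize applies the same one-dimensional pass to rows and then to columns, which is what makes the filter a natural fit for a GPU kernel.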

GPU Transcoding pipeline description

  • Video decoding with libavcodec on CPU
  • Resize filter on CPU or GPU
  • Output video encoding with NVENC on GPU
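
For readers without the SDK, a comparable decode, resize and encode layout can be approximated with the stock FFmpeg command-line tool. The sketch below only builds the command; it is not the pipeline benchmarked on this page. It assumes an FFmpeg build that includes the h264_nvenc encoder, and the file names, the helper name build_transcode_cmd and the choice to drop audio with -an are all hypothetical.

```python
import shlex


def build_transcode_cmd(src: str, dst: str, width: int = 1280, height: int = 720,
                        scaler: str = "lanczos") -> list[str]:
    """Build an ffmpeg argument list: CPU decode, CPU scale, NVENC encode."""
    return [
        "ffmpeg", "-y",                  # overwrite the output file if it exists
        "-i", src,                       # H.264 input, decoded by libavcodec on CPU
        "-vf", f"scale={width}:{height}:flags={scaler}",  # swscale resize on CPU
        "-c:v", "h264_nvenc",            # H.264 encode on the GPU via NVENC
        "-profile:v", "high",
        "-an",                           # assumption: video-only test, audio dropped
        dst,
    ]


cmd = build_transcode_cmd("input_1080p.mp4", "output_720p.mp4")
print(shlex.join(cmd))
```

Passing scaler="bicublin" selects the faster CPU scaler (bicubic for luma, bilinear for chroma). The resulting list can be handed to subprocess.run on a machine with a suitable FFmpeg build.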

The transcoding workflow above can run in several processes at the same time. Results for one-process (1x) and two-process (2x) tests are given below.

Benchmarks for fast video transcoding on GPU

  • Input video: Full HD (1920×1080, 4:2:0, compressed with H.264, MP4 container, High profile)
  • Output video: HD 720p (1280×720, 4:2:0, compressed with H.264, MP4 container, High profile)
  • Resize algorithm for CPU: bicublin = bicubic resize for luma and bilinear resize for chroma
  • Resize algorithm for CPU/GPU: lanczos = Lanczos resampling
  • Test description: input is read from SSD and output is written to SSD
  • One-process and two-process configurations
  • Software: Windows 7 (64-bit), CUDA 7.5
  • Hardware: Intel Core i7 5930K, NVIDIA GeForce GTX 980 GPU, SSD

Transcoding benchmarks with GPU resize

  • 1xCPU decode + 1xGPU resize (lanczos) + 1xNVENC = 460 fps
  • 2xCPU decode + 2xGPU resize (lanczos) + 2xNVENC = 2*270 fps

Transcoding benchmarks with CPU resize

  • 1xCPU decode + 1xFFmpeg resize (bicublin) + 1xNVENC = 300 fps
  • 1xCPU decode + 1xFFmpeg resize (lanczos) + 1xNVENC = 180 fps
  • 2xCPU decode + 2xFFmpeg resize (bicublin) + 2xNVENC = 2*230 fps
  • 2xCPU decode + 2xFFmpeg resize (lanczos) + 2xNVENC = 2*155 fps

Transcoding benchmarks on CPU

  • 1xCPU decode + 1xFFmpeg resize (bicublin) + 1xCPU encode = 55 fps
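
The figures above can be put on a common scale with a little arithmetic. The sketch below reads "2*270 fps" as two processes each sustaining 270 fps, so the total is their sum; that reading is an assumption, as are the dictionary labels. It also reports the speedup over the CPU-only result and the average time available per frame.

```python
# (number of processes, fps per process), copied from the benchmark lists above.
RESULTS = {
    "GPU resize, lanczos, 1x": (1, 460),
    "GPU resize, lanczos, 2x": (2, 270),
    "CPU resize, bicublin, 1x": (1, 300),
    "CPU resize, lanczos, 1x": (1, 180),
    "CPU resize, bicublin, 2x": (2, 230),
    "CPU resize, lanczos, 2x": (2, 155),
    "CPU only, bicublin, 1x": (1, 55),
}


def total_fps(processes: int, fps_per_process: float) -> float:
    """Aggregate throughput, assuming the per-process rates simply add up."""
    return processes * fps_per_process


def summarize(results: dict, baseline_key: str = "CPU only, bicublin, 1x") -> dict:
    """Return total fps, speedup over the baseline and ms per frame for each row."""
    baseline = total_fps(*results[baseline_key])
    rows = {}
    for name, (processes, fps) in results.items():
        total = total_fps(processes, fps)
        rows[name] = {
            "total_fps": total,
            "speedup": total / baseline,
            "ms_per_frame": 1000.0 / total,
        }
    return rows


for name, row in summarize(RESULTS).items():
    print(f"{name}: {row['total_fps']:.0f} fps total, "
          f"{row['speedup']:.1f}x vs CPU only, {row['ms_per_frame']:.2f} ms/frame")
```

Under this reading, the two-process GPU-resize configuration reaches 540 fps in total, roughly 9.8 times the CPU-only figure. Note that this compares a Lanczos resize on GPU with a bicublin resize on CPU, so the two runs do not use the same filter.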

Our solution runs all the important stages in parallel: video decoding on CPU, image/video processing on GPU (frame format conversion, crop, resize and other filtering), and video encoding on GPU with NVENC. For the best performance, CPU threads, GPU kernels and GPU-based NVENC sessions must all be kept busy at the same time, which is achieved by running two or more transcoding processes in parallel.
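
The idea of overlapping the stages can be sketched with one worker thread per stage and bounded queues between them. This is a generic producer/consumer illustration, not the SDK's scheduler: run_pipeline, stage and feed are hypothetical names, and the decode, resize and encode callables stand in for libavcodec, a GPU resize kernel and NVENC.

```python
import queue
import threading

SENTINEL = object()  # unique end-of-stream marker


def feed(frames, outbox):
    """Push every input item into the first queue, then signal end of stream."""
    for frame in frames:
        outbox.put(frame)
    outbox.put(SENTINEL)


def stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(item))


def run_pipeline(frames, decode, resize, encode, depth=4):
    """Run decode -> resize -> encode concurrently; results keep input order."""
    # Bounded queues keep a slow stage from being flooded by a fast one.
    q_in, q_dec, q_res, q_out = (queue.Queue(maxsize=depth) for _ in range(4))
    workers = [
        threading.Thread(target=feed, args=(frames, q_in)),
        threading.Thread(target=stage, args=(decode, q_in, q_dec)),
        threading.Thread(target=stage, args=(resize, q_dec, q_res)),
        threading.Thread(target=stage, args=(encode, q_res, q_out)),
    ]
    for worker in workers:
        worker.start()
    results = []
    while True:
        item = q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Stand-in stages: while frame N is being "encoded", frame N+1 is being
# "resized" and frame N+2 "decoded".
packets = run_pipeline(range(8), lambda p: p + 1, lambda f: f * 2, lambda f: str(f))
```

Error handling is left out for brevity: an exception inside a stage would stall this sketch. Running a second copy of the whole pipeline as a separate process corresponds to the 2x configurations in the benchmarks.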

We designed this software as part of our GPU image processing SDK. Our customers can now use its CUDA-accelerated components to speed up transcoding within the video processing pipelines of their own applications.

Roadmap 2016 for GPU transcoding

  • Implementation of H.265 encoder
  • Further optimizations of the transcoding pipeline
  • Transcoder with built-in denoising option
  • Multi-process and multi-GPU solutions for transcoding