3D LUT on CUDA
The 3D LUT transform is widely used in color grading and gamut mapping applications. For 3D LUT grading, we have developed fast kernels that run on existing NVIDIA CUDA hardware. We support various 3D LUT formats and have achieved very high performance for color grading.
Fast CUDA kernels need all of the LUT data to reside in GPU shared memory. This is possible for 3D LUT cubes with dimensions up to 17×17×17 (float) and 33×33×33 (integer). Each node of the 3D cube holds three int or float values, so even on the latest NVIDIA GPUs not every 3D LUT fits into shared memory.
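The size constraint can be checked with simple arithmetic. The following is a minimal host-side sketch, not the SDK's code: the helper names `lutBytes` and `fitsInSharedMemory` are hypothetical, and the shared memory budget is left as an input because it depends on the GPU and on how the kernel is configured.

```cpp
#include <cstddef>

// Bytes needed for an n x n x n LUT with three channel values per node.
// bytesPerValue is the size of one stored value, e.g. sizeof(float).
constexpr std::size_t lutBytes(std::size_t n, std::size_t bytesPerValue) {
    return n * n * n * 3 * bytesPerValue;
}

// True if the whole LUT fits into a shared memory budget given in bytes.
// The budget is an assumption supplied by the caller, not a fixed constant.
constexpr bool fitsInSharedMemory(std::size_t n,
                                  std::size_t bytesPerValue,
                                  std::size_t budgetBytes) {
    return lutBytes(n, bytesPerValue) <= budgetBytes;
}

// A 17x17x17 float cube: 4913 nodes * 3 values * 4 bytes.
static_assert(lutBytes(17, sizeof(float)) == 58956,
              "unexpected size for a 17-cubed float LUT");
```

For example, `fitsInSharedMemory(17, sizeof(float), 64 * 1024)` is true, while a 65×65×65 float cube needs about 3.3 MB and would not fit in the same budget. The footprint of a 33×33×33 integer cube depends on the integer width, which is not stated here, so pass it as `bytesPerValue`. In a real application the budget could be taken from the device properties at runtime, for instance the `sharedMemPerBlock` field of `cudaDeviceProp`.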
3D LUT Transform Features
Hardware and software
Performance for 2.5D and 3D LUT Transforms on CUDA
Test images: 16-bit RGB, 2432×1366 (2.5K) and 4032×2192 (4K)
We designed this software as part of our CUDA image and video processing SDK. Our customers can now use fast 3D LUT transforms on CUDA in their real-time color grading applications with user-defined 3D LUTs.