Low-latency software for remote collaborative post production

Author: Fyodor Serzhenko

Fastvideo is a team of professionals in GPU image processing, realtime camera applications, digital cinema and high performance imaging solutions. We have been working with production companies for a long time, and we have recently implemented low-latency software for remote collaborative post production.

Today, with restrictions on in-person collaboration, delays in shipping and limitations on travel, a single point of ingest and delivery for an entire production becomes vitally important. The main goal is to offer all services both on-premises and remotely. We believe that in the near future we will see virtual and distributed post production finishing.

When you are shooting a movie on a tight schedule and need to accelerate your post production workflow, a remote collaborative approach is the right solution. You don't need to have all professionals on-site: with a remote approach you can collaborate in realtime wherever your teammates are located. The industry trend towards remote production solutions is clear, and it is not driven by the coronavirus alone. The idea of accelerating post via remote operation is viable, and companies strive to remove the limitations of the conventional workflow - professionals can now choose where and when to work on post production.


Nowadays there are quite a lot of software solutions that offer reliable remote access via local networks or the public internet. Still, most of them were not designed for professional tasks like colour grading, VFX, compositing and more. In post production we need professional hardware that can display 10-bit or 12-bit footage. Skype, Zoom and many other video conferencing solutions are not capable of that, so we've implemented software to solve this problem.

Business goals of remote collaborative post production

  • Content is shared in realtime for collaborative post production workflows
  • Lossless or visually lossless encoding guarantees high image quality and exact colour reproduction
  • Remote colour grading and reviewing reduce travel and rent costs for the team
  • Remote work lets you choose the best professionals for the production
  • Your team can work on multiple projects (time saving and multi-tasking)

Goals from a technical viewpoint

  • Low latency software
  • Fast and reliable data transmission over internal or public network
  • Fast acquisition and processing of SD/HD-SDI and 3G-SDI streams (unpacking, packing, transforms)
  • Realtime J2K encoding and decoding (lossy or lossless)
  • High image quality
  • Precise colour reproduction
  • Maximum bit depth (10-bit or 12-bit per channel)
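
The "unpacking" and "packing" steps in the list above can be illustrated with the v210 pixel format (10-bit YUV 4:2:2), where four little-endian 32-bit words carry 12 samples, that is 6 pixels. The sketch below is a plain CPU reference in Python written for illustration only; it is not Fastvideo's GPU implementation, and the function names are hypothetical.

```python
import struct


def unpack_v210_group(block: bytes):
    """Unpack one 16-byte v210 group into 6 pixels of 10-bit 4:2:2 YCbCr.

    Returns (y, cb, cr): six luma samples and three Cb/Cr samples,
    one chroma pair per two pixels.
    """
    if len(block) != 16:
        raise ValueError("a v210 group is exactly 16 bytes")
    words = struct.unpack("<4I", block)  # four little-endian 32-bit words
    # Each word carries three 10-bit samples in bits 0-9, 10-19 and 20-29.
    s = [(w >> shift) & 0x3FF for w in words for shift in (0, 10, 20)]
    # Sample order: Cb0 Y0 Cr0 | Y1 Cb1 Y2 | Cr1 Y3 Cb2 | Y4 Cr2 Y5
    y = [s[1], s[3], s[5], s[7], s[9], s[11]]
    cb = [s[0], s[4], s[8]]
    cr = [s[2], s[6], s[10]]
    return y, cb, cr


def pack_v210_group(y, cb, cr) -> bytes:
    """Inverse of unpack_v210_group: pack 6 pixels into 16 bytes."""
    order = [cb[0], y[0], cr[0], y[1], cb[1], y[2],
             cr[1], y[3], cb[2], y[4], cr[2], y[5]]
    words = [order[i] | (order[i + 1] << 10) | (order[i + 2] << 20)
             for i in range(0, 12, 3)]
    return struct.pack("<4I", *words)


if __name__ == "__main__":
    luma = [64, 100, 200, 300, 400, 940]
    group = pack_v210_group(luma, [512, 600, 700], [512, 100, 960])
    print(len(group), unpack_v210_group(group)[0])
```

On a GPU the same bit manipulation would typically run with one thread per 16-byte group, which is why this step parallelizes well.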

Task to be solved

The post production industry needs a low-latency, high quality video encode/decode solution for remote work that follows this pipeline:

  • Capture baseband video streams via an HD-SDI or 3G-SDI frame grabber (Blackmagic DeckLink 8K Pro, AJA Kona 4 or Kona 5)
  • Encode live with a J2K codec that supports 10-bit YUV 4:2:2 and 10/12-bit 4:4:4 RGB
  • Send the encoded material via TCP/UDP packets to a receiver/decoder - point-to-point transmission over Ethernet or the public internet
  • Decode the stream at the source colour space/bit depth/resolution/subsampling - Rec.709/Rec.2020, 10-bit 4:2:2 YUV or 10/12-bit 4:4:4 RGB
  • Send the stream to a baseband video playout device (Blackmagic/AJA frame grabber) to display 10-bit YUV 4:2:2 or 10/12-bit 4:4:4 RGB material on an external display
  • Latency requirement: under 300 ms
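
To reason about the 300 ms requirement, it helps to add up the latency of every stage in the pipeline above and compare the sum with the budget. The sketch below does only that arithmetic. The per-stage numbers are placeholders chosen for illustration, not measurements of any real system, and the assumption that capture and playout each cost one frame interval is ours.

```python
def frame_interval_ms(fps: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def check_budget(stage_ms: dict, budget_ms: float = 300.0):
    """Return (total latency, remaining headroom) in milliseconds."""
    total = sum(stage_ms.values())
    return total, budget_ms - total


if __name__ == "__main__":
    one_frame = frame_interval_ms(24.0)
    # Placeholder values, NOT measurements: replace with your own timings.
    stages = {
        "capture (assumed one frame)": one_frame,
        "SDI unpacking": 2.0,
        "J2K encoding": 10.0,
        "network (VPN + internet)": 80.0,
        "J2K decoding": 10.0,
        "SDI packing": 2.0,
        "playout (assumed one frame)": one_frame,
    }
    total, headroom = check_budget(stages)
    for name, ms in stages.items():
        print(f"{name:30s} {ms:7.1f} ms")
    print(f"{'total':30s} {total:7.1f} ms, headroom {headroom:.1f} ms")
```

At 24 fps one frame lasts about 41.7 ms, so the whole 300 ms budget corresponds to roughly seven frame intervals end to end.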

Basic hardware layout: Video Source (Baseband Video) -> Capture device (DeckLink) -> SDI unpacking on GPU -> J2K Encoder on GPU -> Facility Firewall (IPsec VPN) -> Public Internet -> Remote Firewall (IPsec VPN) -> J2K Decoder on GPU -> SDI packing on GPU -> Output device (DeckLink) -> Video Display (Baseband Video)


  • HD-SDI or 3G-SDI frame grabbers: Blackmagic DeckLink 8K Pro, AJA Kona 4, AJA Kona 5
  • NVIDIA GPU: GeForce RTX 2070, Quadro RTX 4000 or better
  • OS: Windows 10 or Linux (Ubuntu/CentOS)
  • Frame Size: 1920×1080 (HD)
  • Frame Rates: 23.976, 24, 25, 29.97, 30 fps
  • Bit-depth: 8/10/12 (encode - ingest), 8/10/12 (decode - display)
  • Pixel formats: RGB or RGBA, v210, R12L
  • Frame compression: lossy or lossless
  • Colour Spaces for 8/10-bit YUV or 8/10/12-bit RGB: Rec.709, DCI-P3, P3-D65, Rec.2020 (optional)
  • Audio: 2-channel PCM or more

How to encode/decode J2K images fast?

CPU-based J2K codecs are quite slow. For example, FFmpeg-based software solutions work with the J2K codec from libavcodec (mj2k) or with OpenJPEG, which are far from being fast. You can test that software to check the latency and the performance yourself. This is not surprising, since the J2K algorithm has very high computational complexity. Even with multiple threads or processes on the CPU, the performance of the J2K solution from libavcodec is still insufficient. This is a problem even for 8-bit frames at 2K resolution, and for 4K images (12-bit, 60 fps) the situation is much worse.

The reason why FFmpeg and other software are not fast at that task is simple - they run on the CPU and they are not optimized for high performance. See the benchmark comparison of J2K encoding and decoding for the OpenJPEG, JasPer, Kakadu, J2K-Codec, CUJ2K and Fastvideo codecs to check the performance for images with 2K and 4K resolutions (J2K lossy/lossless algorithms).

Maximum performance for J2K encoding and decoding in streaming applications is achieved in multithreaded batch mode. This is necessary to ensure massively parallel processing with the JPEG2000 algorithm. Batch processing means that we need to collect several images before processing them, which is not good for latency. Combining batch with multithreading improves the performance further, but the latency gets worse. This is a trade-off between performance and latency for J2K encoding and decoding. For example, in a remote colour grading application we need minimum latency, so we process each J2K frame separately, without batch and without multithreading. In most other cases it's better to choose an acceptable latency and get the best performance with batch and multithreading.
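
The latency cost of batching in a live stream can be estimated with simple arithmetic: the first frame of a batch has to wait until the remaining frames have arrived from the source. The sketch below models only this collection delay. It does not model encoder speed, and the function name is ours, not part of any SDK.

```python
def batch_collect_delay_ms(batch_size: int, fps: float) -> float:
    """Extra wait for the first frame of a batch until the batch is full.

    With a live source, frames arrive one per frame interval, so the first
    frame waits for (batch_size - 1) more frames before processing starts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (batch_size - 1) * 1000.0 / fps


if __name__ == "__main__":
    fps = 24.0
    for batch in (1, 2, 4, 8):
        delay = batch_collect_delay_ms(batch, fps)
        print(f"batch={batch}: +{delay:.1f} ms before encoding can start")
```

At 24 fps a batch of 4 adds 125 ms and a batch of 8 adds about 292 ms before encoding even starts, which is why a pipeline with a 300 ms end-to-end target processes frames one at a time.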

