Low-latency software for remote collaborative post production
Fastvideo company is a team of professionals in GPU image processing, realtime camera applications, digital cinema, high performance imaging solutions. Fastvideo has been helping production companies for quite a long time and recently we've implemented low-latency software to offer collaborative post production.
Today, with restrictions on in-person collaboration, delays in shipping and limitations on travel, single point of ingest and delivery for an entire production becomes vitally important. The main goal is to offer all services both on-premises and remotely. We believe that in the near future we will see virtual and distributed post production finishing.
When you are shooting a movie at tight schedule and you need to accelerate your post production workflow, then remote collaborative approach is a right solution. You don't need to have all professionals on-site, via remote approach you can collaborate at realtime wherever your teammates are located. Industry trend to remote production solutions is clear and it happens not just due to the coronavirus. The idea to accelerate post via remote operation is viable and companies strive to remove various limitations of conventional workflow - now the professionals could choose a place and a time to work remotely on post production.
Nowadays, there are quite a lot of software solutions to offer reliable remote access via local networks or via public internet. Still, most of them were built without an idea about professional usage in tasks like colour grading, VFX, compositing and much more. In post production we need to utilize professional hardware which could visualize 10-bit or 12-bit footages. Skype, ZOOM and many other video conference solutions are not capable of doing that, so we've implemented the software to solve that matter.
Business goals to achieve at remote collaborative post production
Goals from technical viewpoint
Task to be solved
Post industry needs low-latency, high quality video encode/decode solution for remote work according to the following pipeline:
Basic hardware layout: Video Source (Baseband Video) -> Capture device (DeckLink) -> SDI unpacking on GPU -> J2K Encoder on GPU -> Facility Firewall (IPsec VPN) -> Public Internet -> Remote Firewall (IPsec VPN) -> J2K Decoder on GPU -> SDI packing on GPU -> Output device (DeckLink) -> Video Display (Baseband Video)
How to encode/decode J2K images fast?
CPU-based J2K codecs are quite slow. For example, if we consider FFmpeg-based software solutions, they are working with J2K codec from libavcodec (mj2k) or with OpenJPEG, which are far from being fast. Just test that software to check the latency and the performance. It's not surprizing, as soon as J2K algorithm has very high computational complexity. If we implement multiple threads/processes on CPU, the performance of J2K solution from libavcodec is still unsuffcient. This is the problem even for 8-bit frames with 2K resolution, though for 4K images (12-bit, 60 fps) the performance is much worse.
The reason why FFmpeg and other software are not fast at that task is obvious - they are working on CPU and they are not optimized to be high performance software. Here you can see benchmarks comparison for J2K encoding and decoding for OpenJPEG, Jasper, Kakadu, J2K-Codec, CUJ2K, Fastvideo codecs to check the performance for images with 2K and 4K resolutions (J2K lossy/lossless algorithms).
Maximum performance for J2K encoding and decoding at streaming applications could be achieved at multithreaded batch mode. This is a must to ensure massive parallel processing according to JPEG2000 algorithm. If we do batch processing, it means that we need to collect several images, which is not good for latency. If we implement batch with multithreading, it improves the performance, but the latency gets worse. This is actually a trade-off between performance and latency for the task of J2K encoding and decoding. For example, at remote color grading application we need minimum latency, so we need to process each J2K frame separately, without batch and without multithreading. Though in most cases it's better to choose acceptable latency and get the best performance with batch and multithreading.
Other info from Fastvideo about J2K and digital cinema applications