GPU Direct Minimizes FPGA-GPU Data Transfer Latency

Many of our projects are based on GPU computation. This approach delivers excellent performance in cases such as real-time video recording and image processing. Nevertheless, further gains are available at the hardware level as well.

Typically, a digital camera uses a two-stage approach to get data to the monitor. First, the data are written to CPU memory via the PCI-Express bus, and only then are they loaded to the GPU (a second transfer, from CPU memory to GPU memory over PCIe). This double transfer adds significant latency. To avoid that penalty, we have applied NVIDIA GPU Direct technology and created a compatible PCI-Express driver. This lets us send data from the camera in DMA mode directly to GPU memory, bypassing the intermediate copy in CPU memory.
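The cost of the extra hop can be illustrated with a minimal timing model. This is only a sketch: the function names, the frame size, and both bandwidth figures are hypothetical values chosen for illustration, not measurements, and the model ignores protocol overhead, driver latency, and any overlap between transfers.

```python
def transfer_ms(frame_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time in milliseconds to move one frame over a link (no overhead)."""
    return frame_bytes / bandwidth_bytes_per_s * 1000.0


def staged_latency_ms(frame_bytes: int, camera_bw: float, host_to_gpu_bw: float) -> float:
    """Two-stage path: camera -> CPU memory, then CPU memory -> GPU memory."""
    return transfer_ms(frame_bytes, camera_bw) + transfer_ms(frame_bytes, host_to_gpu_bw)


def direct_latency_ms(frame_bytes: int, camera_bw: float) -> float:
    """Direct path: camera -> GPU memory in a single DMA transfer."""
    return transfer_ms(frame_bytes, camera_bw)


# Hypothetical inputs (assumptions, not figures from any real setup).
FRAME_BYTES = 2048 * 2048   # one 8-bit 2048 x 2048 frame
CAMERA_BW = 2.0e9           # assumed camera link bandwidth, bytes/s
HOST_TO_GPU_BW = 6.0e9      # assumed CPU-to-GPU copy bandwidth, bytes/s

staged = staged_latency_ms(FRAME_BYTES, CAMERA_BW, HOST_TO_GPU_BW)
direct = direct_latency_ms(FRAME_BYTES, CAMERA_BW)
print(f"staged: {staged:.3f} ms, direct: {direct:.3f} ms, saved: {staged - direct:.3f} ms")
```

In this simplified model the saving per frame is exactly the time of the second copy, so it grows with frame size and shrinks as the CPU-to-GPU copy gets faster.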

In addition, we have designed two high-speed cameras with a PCI-Express interface. The cameras are based on an Altera Cyclone IV FPGA with a PCI-Express IP core. For the PC connection we use external PCI-Express or DisplayPort cables and a passive PCIe x1/x4 adapter, so in the standard approach all data travel from the image sensor through the FPGA, the PCIe/DisplayPort cable, the PCIe adapter, and the PCIe bus to PC RAM. With the GPU Direct option, we can instead transfer data by DMA from the camera directly to the GPU.

XIMEA also manufactures PCI-Express cameras, which connect to the PC via a PCI-Express or optical cable with PCIe x2 or x4 Gen2 adapters. These cameras are compatible with GPU Direct technology as well, so they can send data via DMA directly from the camera FPGA to GPU memory, bypassing the CPU. For more information, please see the XIMEA SDK for Linux.

XIMEA xiX cameras stream images to the host computer via 2 or 4 lanes on a PCI Express Gen2 bus. With minimal latency and CPU load, the cameras are well suited to embedded vision and multi-camera applications. Thanks to flat flex cabling, the board-level and semi-housed variants can be integrated in tight spaces and mounted in close proximity to one another.
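As a rough guide to what 2 or 4 Gen2 lanes can carry, the sketch below computes the theoretical link ceiling and the frame rate it implies. PCIe Gen2 signals at 5 GT/s per lane with 8b/10b encoding, which gives 500 MB/s per lane before packet overhead. The function names and the example frame geometry are assumptions made for illustration; real payload throughput is lower than this ceiling.

```python
GEN2_RAW_BITS_PER_S = 5.0e9    # PCIe Gen2 signalling rate per lane (5 GT/s)
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line coding: 8 data bits per 10 line bits


def gen2_link_bytes_per_s(lanes: int) -> float:
    """Theoretical Gen2 link bandwidth in bytes/s, before packet overhead."""
    return lanes * GEN2_RAW_BITS_PER_S * ENCODING_EFFICIENCY / 8


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame as stored in memory."""
    return width * height * bytes_per_pixel


def max_fps(lanes: int, width: int, height: int, bytes_per_pixel: int) -> float:
    """Upper bound on frame rate imposed by the link alone."""
    return gen2_link_bytes_per_s(lanes) / frame_bytes(width, height, bytes_per_pixel)


# Hypothetical example: 2048 x 2048 frames, 8-bit (1 byte per pixel).
for lanes in (2, 4):
    print(f"x{lanes}: {gen2_link_bytes_per_s(lanes) / 1e9:.1f} GB/s ceiling, "
          f"up to {max_fps(lanes, 2048, 2048, 1):.0f} fps")
```

If 12-bit data are stored in 16-bit words (an assumption about the memory layout), passing `bytes_per_pixel=2` halves the achievable frame rate for the same resolution.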

To achieve fast data transfer from a PCIe camera to GPU memory over DMA, the developer has to organize a ring buffer on the GPU to store incoming images from the camera. This ensures a minimum-latency solution for image acquisition.
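The bookkeeping behind such a ring buffer can be sketched as follows. This is a host-side illustration only: the class `FrameRing` and its methods are hypothetical names, the slots are modeled as byte offsets into a single GPU allocation that is assumed to exist, and no real driver or CUDA call is made. It also assumes one producer (the camera DMA engine) and one consumer (the processing pipeline), and refuses new frames when the ring is full.

```python
class FrameRing:
    """Tracks which fixed-size frame slots in one GPU allocation are in use."""

    def __init__(self, num_slots: int, frame_bytes: int):
        self.num_slots = num_slots
        self.frame_bytes = frame_bytes
        self.head = 0    # next slot the DMA engine will fill
        self.tail = 0    # next slot the processing pipeline will read
        self.count = 0   # number of filled slots awaiting processing

    def slot_offset(self, slot: int) -> int:
        """Byte offset of a slot from the start of the GPU allocation."""
        return slot * self.frame_bytes

    def begin_write(self):
        """Offset for the next incoming frame, or None if the ring is full."""
        if self.count == self.num_slots:
            return None
        return self.slot_offset(self.head)

    def commit_write(self) -> None:
        """Mark the slot at head as filled once the DMA transfer completes."""
        if self.count == self.num_slots:
            raise RuntimeError("ring is full")
        self.head = (self.head + 1) % self.num_slots
        self.count += 1

    def begin_read(self):
        """Offset of the oldest unprocessed frame, or None if the ring is empty."""
        if self.count == 0:
            return None
        return self.slot_offset(self.tail)

    def release_read(self) -> None:
        """Return the slot at tail to the pool after processing."""
        if self.count == 0:
            raise RuntimeError("ring is empty")
        self.tail = (self.tail + 1) % self.num_slots
        self.count -= 1
```

A typical cycle is `begin_write`, DMA into the returned offset, `commit_write`, then `begin_read`, process on the GPU, `release_read`. Returning `None` on a full ring is one possible policy; overwriting the oldest frame is another, and the right choice depends on whether the application prefers fresh frames or complete sequences.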

Software features

  • Data input: uncompressed 8/12-bit grayscale or Bayer image with arbitrary resolution and high frame rate
  • Direct data transfer from PCIe camera to GPU memory over DMA
  • GPU software for real-time image processing: full image processing pipeline from Fastvideo SDK
  • Compatibility with NVIDIA Quadro/Tesla GPUs
  • Linux PCIe driver with GPU Direct option (Fastvideo, XIMEA)

PC test configuration

  • ASUS P6T Deluxe V2 LGA1366, Core i7 2600, 3.4 GHz, DDR3 8 GB
  • GPU Quadro 6000
  • OS Linux Ubuntu/CentOS (64-bit), CUDA 10.1

For any further information on these solutions please contact us via email.
