Fast Multi-Camera Usage

Hi everyone,

I want to use multiple Ensenso cameras in my C++ software and keep the frame rate as high as possible.

Is the best approach to create one (or two) threads per camera, follow the example here: https://manual.ensenso.com/latest/guides/multithreading/parallelcaptureandprocess.html, and just add the camera ID to each NxLibCommand? Or is there a better way to set up a fast multi-camera system?

Is there an example of how this is done in the NxView software?

Most NxLib commands are parallelized internally, so whether or not you create your own threads makes little difference for performance. What matters more is getting the timing right for when each command runs, and that can be achieved from a single user thread.

For a high frame rate, the most important thing is to make sure that the next image is being captured and transmitted while the stereo matching is running. This can be achieved in different ways and is the case for both examples in the guide you linked.

The principle is perhaps easier to understand by using separate Trigger and Retrieve commands instead of a single Capture command. The basic loop then looks something like this:

Trigger
while(true) {
    Retrieve
    Trigger // Trigger the next frame.
    // Do the processing while the images for the next frame are transmitted.
    ComputeDisparityMap
}
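If it helps, here is a minimal sketch of that loop with actual NxLib commands for a single camera. The function name, the use of the Cameras parameter (itmCameras) for a single serial number, and the omission of error handling and a stop condition are assumptions of this sketch, not something taken from the loop above.

#include "nxLib.h"

#include <string>

void captureLoop(std::string const& serial)
{
    // Trigger the first frame before entering the loop.
    NxLibCommand trigger(cmdTrigger);
    trigger.parameters()[itmCameras] = serial;
    trigger.execute();

    while (true) {
        // Wait for the triggered images to be transmitted.
        NxLibCommand retrieve(cmdRetrieve);
        retrieve.parameters()[itmCameras] = serial;
        retrieve.execute();

        // Trigger the next frame immediately.
        NxLibCommand triggerNext(cmdTrigger);
        triggerNext.parameters()[itmCameras] = serial;
        triggerNext.execute();

        // Stereo matching runs while the images of the next frame are transmitted.
        NxLibCommand computeDisparityMap(cmdComputeDisparityMap);
        computeDisparityMap.parameters()[itmCameras] = serial;
        computeDisparityMap.execute();
    }
}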

Extending this for multiple cameras depends on whether they should be synchronized or not.

  • When the cameras should be synchronized (e.g. because they are hardware triggered or you want to process their data with the same frame rate), you can use the same loop and simply include multiple cameras in each command (see the sketch after this list). This is exactly what NxView does.
  • When the cameras should run independently (each one as fast as possible), the easiest way is to create a user thread for each of them and run the loops independently. As you already noted, in this case you have to specify the camera in each of the NxLibCommands.
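For the synchronized case, a possible sketch is the loop below, which lists all serial numbers in each command's Cameras parameter (assuming, as in the sketch above, that itmCameras accepts either a single serial or an array of serials; the helper and its name are hypothetical):

#include "nxLib.h"

#include <string>
#include <vector>

void captureLoopSynchronized(std::vector<std::string> const& serials)
{
    // Helper that runs one command with all cameras listed in its Cameras parameter.
    auto run = [&](std::string const& commandName) {
        NxLibCommand cmd(commandName);
        for (int i = 0; i < static_cast<int>(serials.size()); ++i) {
            cmd.parameters()[itmCameras][i] = serials[i];
        }
        cmd.execute();
    };

    run(cmdTrigger);
    while (true) {
        run(cmdRetrieve);
        run(cmdTrigger);             // Trigger the next frame for all cameras at once.
        run(cmdComputeDisparityMap); // Matching overlaps with the next frame's transfer.
    }
}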

Thank you very much! I’ve now implemented it according to the principle you suggested with the while loop, but with one thread per camera so that cameras can be started and stopped dynamically (since we sometimes use different sets of cameras simultaneously).

I have another question: is there a way to obtain the depth image directly? I’m currently using a second processing thread (synchronized with semaphores) that works as shown in the code below. However, copying from a PointMap into the depth image wastes a lot of performance, since I only use the z values. Is there a more efficient way to obtain the depth image?

NxLibCommand computeDisparityMap(cmdComputeDisparityMap);
computeDisparityMap.parameters()[itmCamera] = cameraSerial;
computeDisparityMap.execute();
                    
NxLibCommand computePointMap(cmdComputePointMap);
computePointMap.parameters()[itmCamera] = cameraSerial;
computePointMap.execute();

std::vector<float> pointMap;
int width, height;
double timestamp;

camera[itmImages][itmPointMap].getBinaryDataInfo(&width, &height, nullptr, nullptr, nullptr, nullptr);
camera[itmImages][itmPointMap].getBinaryData(pointMap, &timestamp);

pc->depth = new uint16_t[width * height];

// Keep only the z channel of each XYZ point; the float value is truncated to uint16_t
// (invalid pixels are not handled here).
for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        pc->depth[x + y * width] = static_cast<uint16_t>(pointMap[(x + y * width) * 3 + 2]);
    }
}

Since we further process the point cloud on the GPU, is it also possible to get a pointer to the CUDA point cloud / depth image in order to avoid the copying entirely (downloading to RAM, then uploading back to VRAM)?

Is there a way to directly obtain the depth image?

Not with the ComputePointMap command. If you wanted to use the functionality of RenderPointMap, it has a Z-only mode, but this would cause additional overhead for your application.

is it also possible to get the pointer to the CUDA point cloud / depth image

Not at the moment.


For your use case, the following approach might be interesting:

  • Skip ComputePointMap and get the disparity map instead.
  • Copy the disparity map back to the GPU. It is smaller than the point map (16 bit fixed point instead of three float channels), so this will be much faster.
  • Reproject the points yourself. This is very fast on the GPU when you don’t need to copy the result back (see the sketch after this list).
    • For the reprojection you need to apply the reprojection matrix and the camera’s link transformation, if it has one.
    • Invalid pixels in the disparity map have the value NxLibInvalidDisparityScaled and should not be reprojected.
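To make the last two points more concrete, below is a rough CPU-side sketch of the reprojection math (on the GPU this would be one thread per pixel). The tree path used for the reprojection matrix, its row/column order, the fixed-point scale factor of the 16 bit disparities and the omission of the camera's link transformation are assumptions to verify against your NxLib version; the function and output format are hypothetical.

#include "nxLib.h"

#include <vector>

void reprojectDisparityMap(NxLibItem const& camera, std::vector<float>& pointsXYZ)
{
    int width = 0, height = 0;
    double timestamp = 0;
    std::vector<short> disparity;
    camera[itmImages][itmDisparityMap].getBinaryDataInfo(&width, &height, nullptr, nullptr, nullptr, nullptr);
    camera[itmImages][itmDisparityMap].getBinaryData(disparity, &timestamp);

    // Assumed location and row-major order of the 4x4 reprojection matrix.
    double Q[4][4];
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            Q[r][c] = camera[itmCalibration]["Stereo"]["Reprojection"][r][c].asDouble();
        }
    }

    double const disparityScale = 16.0; // Assumed fixed-point factor of the scaled disparities.

    pointsXYZ.assign(static_cast<size_t>(width) * height * 3, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            short d = disparity[x + y * width];
            if (d == NxLibInvalidDisparityScaled) continue; // Skip invalid pixels, leave them at zero.

            // Homogeneous reprojection: [X Y Z W]^T = Q * [x y disparity 1]^T.
            double in[4] = { static_cast<double>(x), static_cast<double>(y), d / disparityScale, 1.0 };
            double out[4] = { 0, 0, 0, 0 };
            for (int r = 0; r < 4; ++r) {
                for (int c = 0; c < 4; ++c) out[r] += Q[r][c] * in[c];
            }

            size_t i = (static_cast<size_t>(y) * width + x) * 3;
            pointsXYZ[i + 0] = static_cast<float>(out[0] / out[3]);
            pointsXYZ[i + 1] = static_cast<float>(out[1] / out[3]);
            pointsXYZ[i + 2] = static_cast<float>(out[2] / out[3]);
        }
    }
}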