Working with colour and depth

Hi!
I have a C57 and I’m trying to integrate the nxLib SDK into our software using C++.
I have a few questions:

  1. We need both colour and depth frames. When initialising the camera, do we have to find the serial number of the stereo device and run cmdOpen with that as a parameter, then run another cmdOpen with [serial]-Color? Then, when streaming, does cmdCapture work for both depth and colour, or do we need to do something for each device? Do we get the depth frame from the depth device and the colour frame from the colour device? Sorry if I missed a sample or guide that explains this; I have tried looking but haven't found anything.

  2. For various reasons (our software is used with multiple 3D camera SDKs) we handle frame alignment and projection/deprojection ourselves, so we need the colour and depth intrinsics, plus the extrinsics (rotation & translation) between them. I wanted to check which intrinsics are correct for the depth image (i.e. what we should use to deproject depth to points), as there are several potential options in the tree: Monocular/Left or Right and Stereo/Left or Right, all of which have slightly different values.

  3. I have the colour->depth extrinsics from [colour device]/ResolvedLink (hopefully that is correct). Is [stereo device]/ResolvedLink the depth->colour extrinsics?

Thanks!
James

Hi,

  1. Yes, the stereo and color devices are separate and you can use them independently in the API. If you want to use both, you can specify both serials to the commands (most commands also take a list of cameras, so you can handle both with a single execution). The images and 3D data are inside of the respective camera nodes.

    You can find some general information on the two devices for a C-series camera in this guide.

  2. The reprojection matrix for the disparity map is in Calibration/Dynamic/Stereo/Reprojection and the camera matrix for the left rectified images is in Calibration/Dynamic/Stereo/Left/Camera (see the sketch at the end of this post for reading these from the tree). Also see this guide for the pixel layout of the different stereo camera data.

    Unfortunately the description of the calibration nodes in the manual is not very clear at the moment. To summarize the difference between the different nodes of the stereo calibration:

    • Monocular nodes are for raw images and Stereo nodes are for rectified images / 3D data.
    • Dynamic includes the dynamic recalibration; the nodes outside of it contain the original calibration.
  3. I am not sure if I understand the question correctly here.

    <Serial>-Color/Link contains the relative position of the color sensor to the stereo camera (assuming a factory calibration where the color sensor link targets the stereo camera).

    The ResolvedLink nodes contain the workspace (world) position of each device. For the color sensor this is the same as above by default, since the stereo camera does not have a link by default. If you change its link you can move both the stereo camera and its color sensor together, which would affect both their ResolvedLink. Also see this guide on the general concept of the link tree in the NxLib.
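To make this more concrete, here is a minimal sketch (not tested against a device; the serial number is a placeholder, and I am using string literals for the node names that do not appear as tree-item constants above) of opening both devices with one command and reading the nodes mentioned in 2. and 3.:

#include "nxLib.h"
#include <string>

void openAndReadCalibration()
{
	nxLibInitialize(true); // if not already initialized elsewhere

	std::string stereoSerial = "123456";               // placeholder
	std::string colorSerial = stereoSerial + "-Color"; // colour device of the same camera

	// Open both devices with a single command by passing a list of serials.
	NxLibCommand open(cmdOpen);
	open.parameters()[itmCameras][0] = stereoSerial;
	open.parameters()[itmCameras][1] = colorSerial;
	open.execute();

	NxLibItem stereoCam = NxLibItem()[itmCameras][stereoSerial];
	NxLibItem colorCam = NxLibItem()[itmCameras][colorSerial];

	// Reprojection matrix for the disparity map and camera matrix of the
	// left rectified image (see 2. above).
	NxLibItem reprojection = stereoCam[itmCalibration][itmDynamic][itmStereo]["Reprojection"];
	NxLibItem leftCamera = stereoCam[itmCalibration][itmDynamic][itmStereo][itmLeft][itmCamera];

	// Pose of the colour sensor relative to the stereo camera and the
	// resolved workspace poses of both devices (see 3. above).
	NxLibItem colorToStereo = colorCam["Link"];
	NxLibItem colorWorldPose = colorCam["ResolvedLink"];
	NxLibItem stereoWorldPose = stereoCam["ResolvedLink"];

	// Individual matrix entries can be read via asDouble(), e.g.
	double fx = leftCamera[0][0].asDouble();
}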

Hi Daniel, thanks for your reply; sorry it’s taken a while to get back to you.

I think I have all the intrinsic/extrinsic stuff sorted now, thanks for that.

I’m hitting another roadblock now. When I run getBinaryDataInfo on the colour image it just spits out zeroes, though it works for the depth image. Here’s the relevant code:

// Open stereo & colour devices
NxLibCommand openCmd(cmdOpen);

std::string camString = "[\"" + deviceSerialNumber + "\",\"" + _colourSerialNumber + "\"]";
openCmd.parameters()[itmCameras].setJson(camString, true);
openCmd.execute();

NxLibItem colourCamera = NxLibItem()[itmCameras][_colourSerialNumber];
NxLibItem depthCamera = NxLibItem()[itmCameras][_serialNumber];

// Set colour camera to trigger with depth
colourCamera[itmParameters][itmCapture][itmTriggerMode] = "Internal";

std::vector<std::byte> colourBuffer;
std::vector<std::byte> depthBuffer;

while(true)
{
	NxLibCommand(cmdCapture).execute();

	// Compute disparity map (this also computes the rectified images)
	NxLibCommand(cmdComputeDisparityMap).execute();

	// Get image from each device
	NxLibItem colourImage = colourCamera[itmImages][itmRectified];
	NxLibItem depthImage = depthCamera[itmImages][itmDisparityMap];

	// Get frame info
	int colourWidth, colourHeight, colourChannels, colourBytesPerElement;
	colourImage.getBinaryDataInfo(&colourWidth, &colourHeight, &colourChannels, &colourBytesPerElement, nullptr, nullptr);
	colourImage.getBinaryData(colourBuffer, 0);

	int depthWidth, depthHeight, depthChannels, depthBytesPerElement;
	depthImage.getBinaryDataInfo(&depthWidth, &depthHeight, &depthChannels, &depthBytesPerElement, nullptr, nullptr);
	depthImage.getBinaryData(depthBuffer, 0);
}

I found a doc saying to set the colour TriggerMode to Internal, which I’ve done. Do I need to add any parameters to the capture command, or do something else, to make it capture both depth and colour images?

Also, for the disparity data, should it be interpreted as 16-bit float values representing the Z distance (in mm?) for each pixel? Or are they int/uints which need to be multiplied by a depth scale factor to get the z values? If the latter, where do we get the factor from?

Thanks!
James

Hi,

setting the TriggerMode is correct, although you could use the constant valInternal instead of the string literal.

The problem with your code is that the color camera has acquired a raw image, which you could read from /Cameras/<Color>/Images/Raw. But the Rectified one (which for a color camera only does the undistortion) was not computed yet. For the stereo camera, rectification is done implicitly by cmdComputeDisparityMap (and it is good practice to only run that command for improved performance), but for the color camera, you have to run the undistortion explicitly using the cmdRectifyImages command. See this guide for the available image types and the commands computing them.
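In your loop that could look roughly like this (just a sketch, reusing the variable names from your snippet):

NxLibCommand(cmdCapture).execute();

// Rectification of the stereo images happens implicitly here.
NxLibCommand(cmdComputeDisparityMap).execute();

// For the color camera the undistortion has to be requested explicitly.
// Limiting the command to the color serial avoids redoing the stereo rectification.
NxLibCommand rectify(cmdRectifyImages);
rectify.parameters()[itmCameras][0] = _colourSerialNumber;
rectify.execute();

// Images/Rectified of the color camera is now filled and
// getBinaryDataInfo will return the actual size.
NxLibItem colourImage = colourCamera[itmImages][itmRectified];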


Also, for the disparity data, should it be interpreted as 16-bit float values representing the Z distance (in mm?) for each pixel? Or are they int/uints which need to be multiplied by a depth scale factor to get the z values? If the latter, where do we get the factor from?

The disparity map contains “disparities”, i.e. the offset between left and right rectified image in pixels. You have two options here:

  • Compute the PointMap using cmdComputePointMap. It is a 3 channel float image containing X, Y, Z values, which are easy to use (see the sketch after this list).
  • Use the disparity values and reproject them to 3D points yourself using the reprojection matrix. This is mainly a performance optimization when you want to combine reprojection with additional processing or to reduce the amount of data to send to the next processing step. If you want to do this, please note the fixed-point format and the special value for an invalid disparity described in the documentation.
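For the first option, a minimal sketch (again reusing the variables from your snippet; coordinates are in millimeters and invalid pixels are NaN):

// After cmdComputeDisparityMap has run:
NxLibCommand(cmdComputePointMap).execute();

// The point map is a float image with 3 channels (X, Y, Z).
NxLibItem pointMap = depthCamera[itmImages][itmPointMap];

int width, height, channels, bytesPerElement;
pointMap.getBinaryDataInfo(&width, &height, &channels, &bytesPerElement, nullptr, nullptr);

std::vector<float> points; // width * height * 3 floats
pointMap.getBinaryData(points, nullptr);

// Z value of pixel (x, y):
// float z = points[3 * (y * width + x) + 2];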

Hi Daniel, thanks for your help up to now.

I’ve noticed that the points from cmdComputePointMap come out with a strange rotation and translation applied. I’m guessing this is due to the transformation to world coordinates mentioned here. Is it possible to set a parameter to leave the point cloud in the camera coordinate system? If not, do you provide any helper functions to apply transforms to points, or do we have to do that manually? I’m guessing we would have to apply the inverse of the transformation in the camera’s Link node to get back?

The reason I’m asking: to fit in with the way our software works with other camera systems, we need just the z values (which I get from the point map), and later on we reconstruct x and y using the intrinsics (which I’m getting from stereoCameraBaseNode[itmCalibration][itmDynamic][itmStereo][itmLeft]). Currently I’m trying to compare our calculated xyz points with the ones we get from the camera to check accuracy, and I’m getting wildly different values, I think because of the world transformation.
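For reference, the x/y reconstruction we do is just the standard pinhole deprojection, roughly like this (fx, fy, cx, cy taken from that Left/Camera matrix):

struct Point3 { float x, y, z; };

// Standard pinhole deprojection of pixel (u, v) with depth z.
Point3 deproject(float u, float v, float z, float fx, float fy, float cx, float cy)
{
	return { (u - cx) * z / fx, (v - cy) * z / fy, z };
}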

Thanks again and sorry for all these questions, I am trying to read and join together all the various docs to piece together how to use this camera in a similar way to the other ones we support but it’s quite confusing!

Hi,

yes, cmdComputePointMap applies the camera link in addition to the reprojection. By default this should be an identity transformation, but it can be different if you performed a workspace calibration.

To change this you can:

  • Clear the camera’s link in the NxView workspace calibration section (or manually by setting it to an identity in the tree and running cmdStoreCalibration).
  • If you still need the link for another purpose you can also run cmdComputePointMap with the parameter Relative set to true (see the snippet below). I just saw that this is actually not documented, but it exists and skips applying the transformation.
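That would look like this (a short sketch; as mentioned, the Relative parameter is not in the documentation yet, so I am writing the node name as a string):

// Compute the point map in the camera coordinate system by skipping
// the camera link transformation.
NxLibCommand computePointMap(cmdComputePointMap);
computePointMap.parameters()["Relative"] = true;
computePointMap.execute();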