Is the NxLib shared library thread-safe?

Hi,
we have been using NxLib in a Java application for several years. After upgrading to Java 17, our application now crashes on RedHat Linux without any logs or error messages. We suspect that this might be related to how Java 17 and RHEL handle access to native shared libraries. There may be a race condition or unsafe concurrent memory access that causes the OS to immediately terminate the process.

Our question:
Is the native NxLib shared library internally thread-safe, or does it require external synchronization when accessed from multiple threads?

Hi,

the NxLib is designed to be thread safe and can be accessed from multiple threads.

Are you sure that the NxLib is the source of the problem? In this case, do you have a minimal example of how we can reproduce this?

Kind regards

Joel

Hi Joel,
thanks for the immediate reply.
No, I’m not 100% sure that it is an issue with the camera access. There is some evidence that it might be somewhere in the NxLib, but it occurs only under certain circumstances:

  • The old software based on Java 11 worked under RHEL. The new one, based on Java 17, works under Windows and Debian.
  • I wrote a small test program that successfully takes hundreds of images under RHEL and Java 17.
  • As soon as the camera connection is managed by a more complex camera handler in a service framework, it can take only a few images (1-4) before it crashes.
  • However, the service without a camera, in a simulation mode with images loaded from the filesystem, works without any issues.
  • We also made a test with a virtual camera, which was successful.
  • In general, the crashes are silent with no error log. This could be a sign that it is an issue with access to the native library. It seems that RHEL is stricter than Debian when it comes to native memory access: if there is a violation, the process is killed immediately.
  • With special JVM parameters we got an hs_err_pid log. It names libNxLib64.so as the source of the error: "# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (17.0.9+9, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)

Problematic frame:

C [libNxLib64.so+0xd7a950]"

An unsynchronized access to the native lib, or to the memory managed by the lib, is one possible source of the error. However, if the native lib is thread-safe, that becomes less likely.
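One way to rule out unsynchronized access from the Java side, purely as a diagnostic, would be to funnel every native call through a single dedicated thread and see whether the crash still occurs. The following is a minimal sketch of that idea. The class name `SerializedNativeAccess` and the thread name are made up for illustration, and the actual NxLib calls are left to the caller, so nothing here is NxLib API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical helper: runs every submitted call on one dedicated thread,
 * so that no two calls into the native library can overlap.
 */
final class SerializedNativeAccess implements AutoCloseable {

    // A single worker thread executes all submitted calls strictly in order.
    private final ExecutorService nativeThread =
            Executors.newSingleThreadExecutor(runnable -> {
                Thread t = new Thread(runnable, "nxlib-native");
                t.setDaemon(true);
                return t;
            });

    /** Runs the given call on the dedicated thread and waits for its result. */
    <T> T call(Callable<T> nativeCall) throws Exception {
        Future<T> result = nativeThread.submit(nativeCall);
        return result.get(); // blocks the caller until the call has finished
    }

    @Override
    public void close() {
        nativeThread.shutdown();
    }
}
```

The camera handler would then wrap each access, for example `access.call(() -> grabImage())`, where `grabImage()` stands for whatever code currently talks to the library. If the crash persists even with all calls serialized on one thread, a race between our own threads becomes a less likely explanation.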

FYI: I created a file-based camera (from an N46 and an IDS color camera), and that worked in the service scenario too. Is there a difference in how the NxLib manages memory access for a file camera compared to a real camera?

Hello Dennis,

the cameras indeed handle memory and thread synchronization quite differently, so a test with a file camera will not reveal concurrency bugs in the implementation of the different hardware cameras. That information alone might make our search easier, though.

Can you tell me which NxLib version you are using and the model name of the IDS color camera? We did fix a race in our UEye camera code that might have explained your crashes, but that was three years ago.

Have you set up your Linux so that it generates a core dump? For me that usually involves running ulimit -c unlimited and changing /proc/sys/kernel/core_pattern to point to some file. If you could provide us with a core dump, that would help us greatly. I can send you an upload link via DM if you need one.
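Since the limit has to be in effect for the JVM process itself, it can help to check the settings from inside a Java process started the same way as the service. Here is a small sketch that reads the soft core-size limit from /proc/self/limits and the current core pattern. The class and method names are made up for illustration; the two /proc files are standard on Linux:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: reports whether this process is allowed to write core dumps. */
final class CoreDumpCheck {

    /** Returns the soft limit from the lines of /proc/<pid>/limits, or null if the row is missing. */
    static String softCoreLimit(List<String> limitsLines) {
        final String key = "Max core file size";
        for (String line : limitsLines) {
            if (line.startsWith(key)) {
                // Remaining columns: soft limit, hard limit, units.
                String[] columns = line.substring(key.length()).trim().split("\\s+");
                return columns[0];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String soft = softCoreLimit(Files.readAllLines(Path.of("/proc/self/limits")));
        String pattern = Files.readString(Path.of("/proc/sys/kernel/core_pattern")).trim();

        System.out.println("soft core limit: " + soft);
        System.out.println("core_pattern:    " + pattern);
        if ("0".equals(soft)) {
            System.out.println("Core dumps are disabled for this process.");
        }
        if (pattern.startsWith("|")) {
            // A leading '|' means the kernel pipes the dump to a helper program.
            System.out.println("Core dumps are piped to a helper program, not written as plain files.");
        }
    }
}
```

If the reported soft limit is 0, the ulimit setting did not reach the JVM, which can happen when the service is started by something other than the shell in which ulimit was run.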

Regards,
Raphael

Hi Raphael,
we are using a N46-1202-16-BL and a UI558xSE-C as color camera. Currently we are running NxLib 3.5.1419 (but we did an unsuccessful test with the newest NxLib as well) and uEye driver 4.96.3985.
I will see if I can get the core dumps and maybe install the newest NxLib again.

Hello Dennis,

just as an FYI, the fix for the data race from three years ago is already part of NxLib 3.5.1419.

Regards,

Joel

I found some new strange behavior:
I have an N35 and an N46. If I run both together with a uEye color camera, the Java VMs die immediately after a measurement is triggered (each Ensenso sensor runs together with a color camera in one VM).
If I run the N-sensors without the color cameras, I can take multiple images. Now the very strange behavior: NxView is open (not connected to any camera, just open) and I close my Java-based connection. The camera is closed (it is displayed as “available” in NxView) and the nxLibFinalize method was executed successfully. When I then restart the Java connection, it crashes on the first image. When NxView is not running and I restart the Java connection, I can take multiple images without any problems. This is reproducible: I tried both variants (NxView on/off) several times and it was always the same.

It seems like the NxView prevents RHEL from clearing the handle on the shared library or something.

I will write more tests. For further analysis: is there a debug version of libNxLib64.so? Anything that shows the actual memory access or something similar?

Hi Dennis,

thanks for the hints. I’ll inspect this on Monday.

Regards,

Joel

Hi Dennis,

we do have a debug version of the NxLib, but only for internal use. I need more information to make progress on this topic. Ideally that would be a core dump; we are happy to provide file server access for uploading so that the core dump is not published in the forum (send me a direct message in that case). Otherwise I would try to reproduce this behaviour, but for that I would need more information about the calling system, the environment of the client programmes, the versions of the OS (both host and Docker, if used) and the versions of the other components involved. Which commands are called on the NxLib, and in which threads? It may then be possible to reproduce the crash with a minimal example on our side. A simple test by me did not trigger the problem.
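To collect the information about which calls run in which threads, a thin tracing wrapper on the Java side might be enough. The sketch below is only an illustration: `NativeCallTrace` and its methods are made-up names, and the wrapped call is whatever code currently talks to the library. Each entry is written to stderr before the call starts, so the last line in the output before a crash points at the call that was in flight:

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/** Hypothetical helper: records which thread enters and leaves each wrapped call. */
final class NativeCallTrace {

    // Thread-safe queue, because calls may be traced from several threads at once.
    private final ConcurrentLinkedQueue<String> entries = new ConcurrentLinkedQueue<>();

    /** Logs entry and exit of the given call together with the calling thread's name. */
    <T> T traced(String label, Supplier<T> call) {
        String thread = Thread.currentThread().getName();
        record(thread + " > " + label);
        try {
            return call.get();
        } finally {
            record(thread + " < " + label);
        }
    }

    private void record(String entry) {
        entries.add(entry);
        // Written immediately so the line survives a hard crash of the VM.
        System.err.println(System.nanoTime() + " " + entry);
    }

    /** Returns a copy of everything recorded so far, in order of recording. */
    List<String> snapshot() {
        return List.copyOf(entries);
    }
}
```

A call site would look like `trace.traced("capture", () -> grabImage())`, with `grabImage()` standing in for the existing code. Overlapping `>` lines from different threads without a `<` in between show which calls actually run concurrently.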

Thank you

Joel