Going Native - Foreign Function & Memory API (FFM)

Be not the first by whom the new are tried, nor yet the last to lay the old aside.

Alexander Pope

When I started doing research for my post on JNI, I heard about some newfangled thing called the Foreign Function and Memory API (FFM). Apparently it does all the same things as JNI, but purely in Java code, so you have all the conveniences of modern Java development without all the hassles of compiling and linking two different languages and getting them to play nicely together. So after finishing my experiments with JNI, I was excited to give it a try.

For a refresher on the native matrix library, see the section The native code in the introduction to this series.

Background

The concepts behind FFM have been kicking around for several Java versions, going back at least to Java 17. The API was finalized in Java 22, although as of Java 24 code that accesses native functions is still considered restricted: it produces warnings at runtime unless the JVM is started with a specific flag (--enable-native-access=ALL-UNNAMED).

There are several blog posts about using FFM, but they all seem to copy the same examples from the official Java website. Thus I was truly on my own this time.

An aside about AI programming aids

Well, not completely on my own. I made extensive use of AI programming aids during this project, particularly a couple of installations of ChatGPT. I have been slow to get on the AI train, and I am still highly skeptical of many of the claims made about it. But I freely admit that I could not have completed this project or the JNI project without its help. There is just so much detailed, obscure, and esoteric knowledge about compiling, linking, tool flags, and platform idiosyncrasies that no one person can know it all. While my Google-searching skills are decent, I don't believe I could have found the answers I needed, within the bounds of my patience, to bring this to a conclusion. While ChatGPT is not perfect (it is limited by published APIs and documentation, and it can get confused about the requirements of different software versions), it was definitely a big help to me!

The Arena

The basic idea of FFM is that you take over the management of native memory in Java code instead of native code. This starts with an Arena, which can be opened and disposed of like any other resource in a try-with-resources block. Also in the Java code, you lay out the memory of the structs you'll be using.

// JAVA_INT and ADDRESS are static imports from java.lang.foreign.ValueLayout
GroupLayout RASHUNAL_LAYOUT = MemoryLayout.structLayout(
    JAVA_INT.withName("numerator"),
    JAVA_INT.withName("denominator")
);

GroupLayout GAUSS_FACTORIZATION_LAYOUT = MemoryLayout.structLayout(
    ADDRESS.withName("PI"),
    ADDRESS.withName("L"),
    ADDRESS.withName("D"),
    ADDRESS.withName("U")
);

try (Arena arena = Arena.ofConfined()) {
    ...
}

MemoryLayout is an interface with static methods to lay out primitives, structs, arrays, and other entities. The Arena object is then used to allocate blocks of native memory using a layout as a map.

int[][][] data = <data read from input>;
int height = data.length;
int width = data[0].length;
int elementCount = height * width;

long elementSize = RASHUNAL_LAYOUT.byteSize();
long elementAlign = RASHUNAL_LAYOUT.byteAlignment();
long totalBytes = elementSize * (long) elementCount;
MemorySegment elems = arena.allocate(totalBytes, elementAlign);
long numOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
for (int i = 0; i < elementCount; ++i) {
    int row = i / width;
    int col = i % width;
    int[] element = data[row][col];
    int numerator = element[0];
    int denominator = element.length == 1 ? 1 : element[1];
    MemorySegment elementSlice = elems.asSlice(i * elementSize, elementSize);
    elementSlice.set(JAVA_INT, numOffset, numerator);
    elementSlice.set(JAVA_INT, denOffset, denominator);
}

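The layout machinery can also describe the whole array at once. As a minimal sketch (not the code from the repository), the same allocation could use MemoryLayout.sequenceLayout and let the layout compute each element's offset:

// Sketch only: an equivalent allocation using a sequence layout instead of raw byte math.
// Assumes the same RASHUNAL_LAYOUT, elementCount, and arena as above.
SequenceLayout arrayLayout = MemoryLayout.sequenceLayout(elementCount, RASHUNAL_LAYOUT);
MemorySegment seqElems = arena.allocate(arrayLayout);
for (int i = 0; i < elementCount; ++i) {
    long numOffset = arrayLayout.byteOffset(
            MemoryLayout.PathElement.sequenceElement(i),
            MemoryLayout.PathElement.groupElement("numerator"));
    long denOffset = arrayLayout.byteOffset(
            MemoryLayout.PathElement.sequenceElement(i),
            MemoryLayout.PathElement.groupElement("denominator"));
    seqElems.set(JAVA_INT, numOffset, 1); // numerator for element i
    seqElems.set(JAVA_INT, denOffset, 1); // denominator for element i
}
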
With JNI, all of this was done in C; now it's all done in Java code. It's a lot of steps, and it gets pretty far down into the weeds, but there are advantages to doing it all in Java. Pick your poison.

Native functions come into the Java code as method handles. They are created by requesting downcall handles (handles for calls from Java into native code) from a Linker object. To create one you need the full signature of the native function, with the return value of the call first.

Linker linker = Linker.nativeLinker();
SymbolLookup lookup = OpenNativeLib("rmatrix", arena); // I'll come back to this later
MemorySegment newRMatrixLocation = lookup.find("new_RMatrix").orElseThrow();
MethodHandle new_RMatrix_handle = linker.downcallHandle(newRMatrixLocation, FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG, ADDRESS));

After getting a Linker object, you need to open the native library and bring it into the JVM. OpenNativeLib is a static method I wrote on the utility class this code comes from, and I'll come back to its details later.

linker.downcallHandle accepts a MemorySegment, a FunctionDescriptor, and a variable-length list of Linker.Option values. It returns a MethodHandle that can be used to call into the native function.

The SymbolLookup returned by OpenNativeLib is used to search the native library for functions and constants. It's a simple name lookup, and it returns an Optional<MemorySegment> with whatever it finds.

The FunctionDescriptor is fairly self-explanatory: it's the signature of a native function, built from constants in java.lang.foreign.ValueLayout, with the return value first, followed by the arguments. ADDRESS is a general value for a C pointer. new_RMatrix accepts longs representing the height and width of the matrix to be constructed and a pointer to an array of Rashunals, and returns a pointer to the newly allocated RMatrix.

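To make the mapping concrete, here is how I read the descriptor against the C signature; the prototype in the comment is my assumption based on the prose above, not copied from the rmatrix headers:

// Assumed C prototype (check the rmatrix headers for the authoritative version):
//     RMatrix *new_RMatrix(long height, long width, const Rashunal **data);
// Return layout first, then one layout per argument:
FunctionDescriptor newRMatrixDescriptor = FunctionDescriptor.of(
        ADDRESS,    // returns RMatrix *
        JAVA_LONG,  // height
        JAVA_LONG,  // width
        ADDRESS     // pointer to the element data
);
MethodHandle handle = linker.downcallHandle(
        lookup.find("new_RMatrix").orElseThrow(), // Optional<MemorySegment> from the lookup
        newRMatrixDescriptor);                    // Linker.Option varargs could follow here if needed
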
Once the handle for new_RMatrix is in hand, it can be called to allocate a new RMatrix:

new_RMatrix_handle.invoke((long) height, (long) width, elems);
// compiles, but blows up when run

Not so fast! elems represents an array of Rashunal structs laid out in sequence in native memory. But what new_RMatrix expects is a pointer to an array of Rashunal pointers, not the array of Rashunals themselves. So that array of pointers also needs to be constructed:

MemorySegment ptrArray = arena.allocate(ADDRESS.byteSize() * elementCount, ADDRESS.byteAlignment());
for (int i = 0; i < elementCount; ++i) {
    MemorySegment elementAddr = elems.asSlice(i * elementSize, elementSize);
    ptrArray.setAtIndex(ADDRESS, i, elementAddr);
}
MemorySegment nativeRMatrix = (MemorySegment) new_RMatrix_handle.invoke((long) height, (long) width, ptrArray);

In a similar way, I got handles to RMatrix_gelim to factor the input matrix and RMatrix_height, RMatrix_width, and RMatrix_get to get information about the four matrices in the factorization. There was one wrinkle when getting information about structs returned by pointer from these methods:

MemorySegment factorZero = (MemorySegment) RMatrix_gelim_handle.invoke(rmatrixPtr);
MemorySegment factor = factorZero.reinterpret(GAUSS_FACTORIZATION_LAYOUT.byteSize(), arena, null);
long piOffset = GAUSS_FACTORIZATION_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("PI"));
...
MemorySegment piPtr = factor.get(ADDRESS, piOffset);
...

When a native method returns a pointer to a struct, the handle returns a zero-length memory segment that has no information about the struct pointed to by that memory. It needs to be reinterpreted as the struct itself using the MemoryLayout that corresponds to the struct. Then the struct can be interpreted using offsets in the reverse of the process used to set data.

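For completeness, here is roughly how I would expect those handles to be built; the descriptors are reconstructed from how the handles are invoked below, not copied from the repository:

// Reconstructed descriptors (assumptions based on the calls below; the real code may differ):
MethodHandle RMatrix_gelim_handle = linker.downcallHandle(
        lookup.find("RMatrix_gelim").orElseThrow(),
        FunctionDescriptor.of(ADDRESS, ADDRESS));                       // factorization pointer from an RMatrix pointer
MethodHandle RMatrix_height_handle = linker.downcallHandle(
        lookup.find("RMatrix_height").orElseThrow(),
        FunctionDescriptor.of(JAVA_LONG, ADDRESS));                     // long from an RMatrix pointer
MethodHandle RMatrix_width_handle = linker.downcallHandle(
        lookup.find("RMatrix_width").orElseThrow(),
        FunctionDescriptor.of(JAVA_LONG, ADDRESS));                     // long from an RMatrix pointer
MethodHandle RMatrix_get_handle = linker.downcallHandle(
        lookup.find("RMatrix_get").orElseThrow(),
        FunctionDescriptor.of(ADDRESS, ADDRESS, JAVA_LONG, JAVA_LONG)); // Rashunal pointer from (RMatrix pointer, row, col)
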
Then I worked on the code to translate them back to Java objects:

long numeratorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denominatorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
long height = (long) RMatrix_height_handle.invoke(mPtr);
long width = (long) RMatrix_width_handle.invoke(mPtr);
JRashunal[] data = new JRashunal[Math.toIntExact(height * width)];
for (long i = 1; i <= height; ++i) {
    for (long j = 1; j <= width; ++j) {
        MemorySegment elementZero = (MemorySegment) RMatrix_get_handle.invoke(mPtr, i, j);
        MemorySegment element = elementZero.reinterpret(RASHUNAL_LAYOUT.byteSize(), arena, null);
        int numerator = element.get(JAVA_INT, numeratorOffset);
        int denominator = element.get(JAVA_INT, denominatorOffset);
        data[Math.toIntExact((i - 1) * width + (j - 1))] = new JRashunal(numerator, denominator);
    }
}
JRashunalMatrix jrm = new JRashunalMatrix(Math.toIntExact(height), Math.toIntExact(width), data);

Each offset is the position, within the struct, of the field of interest, in this case the numerator and denominator of the Rashunal struct. In this way I was able to complete a round trip from Java objects to native code and back.

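For reference, the Java-side types used above are plain value holders. A minimal sketch of what they could look like (the real classes in the repository may well be richer) is:

// Hypothetical sketches of the Java-side classes referenced above;
// the real JRashunal and JRashunalMatrix may differ.
public record JRashunal(int numerator, int denominator) { }

public record JRashunalMatrix(int height, int width, JRashunal[] data) { }
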
So how do you load the native code? I thought it would be as simple as the guides say.

var lookup = SymbolLookup.libraryLookup("rmatrix", arena);

Unfortunately, that's not the way it turned out. Many ChatGPT questions and answers followed, but apparently there is a big difference between SymbolLookup.libraryLookup and

System.loadLibrary("jnirmatrix");

which is how I loaded the native library compiled from the JNI header. That used C tools to find rmatrix and rashunal, which are well-understood and have stood the test of time.

According to ChatGPT, System.loadLibrary does a lot of additional work on behalf of the programmer, including formatting library names correctly, looking for code in platform-specific locations, and handling symlinks. FFM deliberately dials back on that and leaves those details to the programmer. The Javadoc for SymbolLookup.libraryLookup says it defers to dlopen on POSIX systems and LoadLibrary on Windows systems. Those loaders search the path and some environment variables for libraries, but they do none of the name mapping (libLib.so or libLib.dylib or Lib.dll for a library named "Lib") that System.loadLibrary does. This made a bad first impression, but system-specific code turns out to be the way to do it in .NET too, so it's not too bad. /usr/local/lib is on the search path in Linux, but I installed the libraries in a nonstandard location on Windows, so I had to add their directories to PATH.

String osSpecificLibrary;
String osName = System.getProperty("os.name");
if (osName.contains("Linux")) {
    osSpecificLibrary = "lib" + library + ".so";
} else if (osName.contains("Mac OS")) {
    osSpecificLibrary = "lib" + library + ".dylib";
} else if (osName.contains("Windows")) {
    osSpecificLibrary = library + ".dll";
} else {
    throw new IllegalStateException("Unsupported OS: " + osName);
}
return SymbolLookup.libraryLookup(osSpecificLibrary, arena);

On Windows that meant putting the DLL directories on PATH before running Gradle:

> $env:PATH += ";C:/Users/john.todd/local/rmatrix/bin;C:/Users/john.todd/local/rashunal/bin"
> ./gradlew ...

Trying to get this to work on a Mac was an odyssey all its own. Modern versions of macOS (since OS X El Capitan) have something called System Integrity Protection (SIP), which the developers in Cupertino have wisely put into place to protect us all from ourselves. The Google AI answer for "what is sip macos" says it "Prevents unauthorized code execution: SIP prevents malicious software from running unauthorized code on your Mac", which I guess includes loading dependent libraries from the JVM.

I could load RMatrix using an absolute path to the dylib, but I couldn't load Rashunal from there because RMatrix uses rpaths (run-time search paths) to refer to the libraries it depends on. The library search path can be supplemented in other situations (like the JNI application) with DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH, but SIP keeps that from working in certain contexts, such as the JVM (invoked in a particular way). After many big detours into rewriting rpaths to @loader_path references or absolute paths and granting the JVM entitlements that allowed loading paths from DYLD_LIBRARY_PATH, I finally discovered that java and /usr/bin/java on my Mac are not the same as /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java. Specifically, the first two carry the SIP restrictions, but the last one doesn't, and it just works with the osSpecificLibrary defined above. Having already spent a lot of time trying to discover how to bypass SIP, I wasn't going to look any further into how to get the /usr/bin/java shim to work. So the following command worked from the command line on the Mac. Gradle could probably be convinced to do it too, but it didn't by default, and I wasn't interested in investigating further.

$ /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java \
  -cp app/build/classes/java/main \
  --enable-native-access=ALL-UNNAMED \
  org.jtodd.ffm.ffmrmatrix.App \
  /Users/john/workspace/rmatrix/driver/example.txt
Input matrix:
[ {-2} {1/3} {-3/4} ]
[ {6} {-1} {8} ]
[ {8} {3/2} {-7} ]


PInverse:
[ {1} {0} {0} ]
[ {0} {0} {1} ]
[ {0} {1} {0} ]


Lower:
[ {1} {0} {0} ]
[ {-3} {1} {0} ]
[ {-4} {0} {1} ]


Diagonal:
[ {-2} {0} {0} ]
[ {0} {17/6} {0} ]
[ {0} {0} {23/4} ]


Upper:
[ {1} {-1/6} {3/8} ]
[ {0} {1} {-60/17} ]
[ {0} {0} {1} ]

Cleaning up

Like Java's good old garbage collector, the Arena will clean up any memory allocated directly from it, like the Rashunal array or the pointer array in the code segments above. But memory allocated inside the native code is opaque to the Java code, and it will leak if it's not cleaned up. To do that, you need handles to any library-specific cleanup functions or to the C standard library's free function. FFM has a special Linker method, defaultLookup, to look up the platform's standard libraries; note also the special-purpose FunctionDescriptor.ofVoid method for describing native functions that return void:

MemorySegment freeRMatrixLocation = lookup.find("free_RMatrix").orElseThrow(); // the library's cleanup function for RMatrix structs
MethodHandle freeRMatrixHandle = linker.downcallHandle(freeRMatrixLocation, FunctionDescriptor.ofVoid(ADDRESS));

var clib = linker.defaultLookup();
MemorySegment freeLocation = clib.find("free").orElseThrow();
MethodHandle freeHandle = linker.downcallHandle(freeLocation, FunctionDescriptor.ofVoid(ADDRESS));

freeRMatrixHandle.invoke(rmatrixPtr);
freeHandle.invoke(rashunalElement);

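A refinement I didn't pursue, sketched here on the assumption that the library exposes a matching cleanup function: reinterpret accepts a cleanup action in place of the null passed earlier, and the Arena runs that action when it closes, so the native struct gets freed without an explicit call.

// Sketch only: attach the free call to the arena instead of invoking it by hand.
// free_Gauss_Factorization_handle is a hypothetical handle to a library cleanup function.
MemorySegment factor = factorZero.reinterpret(
        GAUSS_FACTORIZATION_LAYOUT.byteSize(),
        arena,
        seg -> {
            try {
                free_Gauss_Factorization_handle.invoke(seg);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        });
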
I briefly looked at using Valgrind to verify that I wasn't leaking anything further. Apparently the JVM itself spawns a lot of false (?) alarms. I grepped the output for any mentions of librmatrix or librashunal and didn't find any, so hopefully this approach doesn't leak too badly.

Reflection

My first impression of FFM was pretty bad. I had to do a lot more investigating and ChatGPT querying to get this to work on all my platforms than I did with JNI. I'm not sure if any further improvements to Java, FFM, or the operating systems will take away some of the pain. Maybe just time, experience, and more bloggers will make this easier for future developers.

It is nice being able to write all your marshaling and unmarshaling code in a single language, rather than having to write both Java and C code to do it. Nevertheless, an FFM developer still needs to keep C concepts in mind, particularly freeing natively-allocated memory and linking to the libraries. But that seems to be the common thread when connecting to native code.

Code repository

https://github.com/proftodd/GoingNative/tree/main/ffm_rmatrix

This post was originally hosted at https://the-solitary-programmer.blogspot.com/2025/09/going-native-foreign-function-memory.html.

Posts in this series