The R FFI Implementation
Introduction
FastR can interface to native C and Fortran code in a number of ways, for example, access to C library APIs not supported by the Java JDK, access to LaPack functions, and the .Call
, .Fortran
, .C
builtins. Each of these are defined by a Java interface,e.g. CallRFFI
for the .Call
builtin. To facilitate experimentation and different implementations, the implementation of these interfaces is defined by a factory class, RFFIFactory
, that is chosen at run time via the fastr.ffi.factory.class
system property, or the FASTR_RFFI
environment variable.
The factory is responsible for creating an instance of the RFFI
interface that in turn provides access to implementations of the underlying interfaces such as CallRFFI
. This structure allows
for each of the individual interfaces to be implemented by a different mechanism. Currently the default factory class is TruffleNFI_RFFIFactory
which uses the Truffle NFI system to implement the transition to native code.
No native code mode
FastR can be configured to avoid running any unmanaged code coming from GNU R or packages. It is described in more detail here.
Native Implementation
The native implementation of the R FFI is contained in the fficall
directory of
the com.oracle/truffle.r.native
project. It's actually a bit more than that as it also contains code copied from GNU R, for example code that is sufficiently simple that it is neither necessary nor desirable to implement in Java. As this has evolved a better name for
fficallwould probably be
main`
for compatibility with GNU R.
There are five sub-directories in fficall/src
:
include
common
truffle_nfi
truffle_llvm
fficall/include
directory
The include
should be thought as analgous to GNU R's src/include
, i.e. internal headers needed by the code in src/main
.
What we are trying to do by redefining them here is provide a boundary so that we don`t accidently capture code from GNU R that
is specific to the implementation of GNU R that is different in FastR, e.g., the representation of R objects. Evidently not every
piece of GNU R code or an internal header has that characteristic but this strategy allows us some control to draw the boundary as
tight as possible. Obviously we want to avoid duplicating (copying) code, as this requires validating the copy when migrating GNU R versions,
so there are three levels of implementation choice for the content of the header in this directory:
- Leave empty. This allows a
#include
to succeed and, if code does not actually use any symbols from the header, is ok. - Indirect to the real GNU R header. This is potentially dangerous but a simple default for code that uses symbols from the header.
- Extract specific definitions from the GNU R header into a cut-down version. While this copies code it may be necessary to avoid unwanted aspects of the GNU R header. In principle this can be done by a "copy with sed" approach.
The indirection requires the use of the quote form of the #include
directive. To avoid using a path that is GNU R version dependent,
the file ``gnurheaders.mkprovides a make variable
GNUR_HEADER_DEFS` with a set of appropriate -`D CFLAGS`.
Ideally, code is always compiled in such a way that headers are never implicitly read from GNU R, only via the include
directory.
Unfortunately this cannot always be guaranteed as a directive of the form include "foo.h" (as opposed to include <foo.h>) in the
GNU R C code will always access a header in the same directory as the code being compiled. I.e., only the angle-bracket form can be controlled
by the -I
compiler flag. If this is a problem, the only solution is to "copy with sed" the .c
file and convert the quote form to the
angle bracket form.
common
directory
The common
contains code that has no explicit dependencies specific to Truffle NFI or Truffle LLVM and has been extracted for reuse in other implementations.
This code is mostly copied/included from GNU R. N.B. Some modified files have a _fastr
suffix to avoid a clash with an existing file in GNU R that would match
the Makefile rule for compiling directly from the GNU R file.
truffle_nfi
directory.
The truffle_nfi
contains the implementation that is based on the Truffle Native Function Interface.
truffle_llvm
directory
The truffle_llvm
contains the native side of the variant that is based on the Truffle LLVM implementation. It is described in more detail here
RFFI Initialization
Not all of the individual interfaces need to be instantiated on startup. The getXXXRFFI()
method of RFFI
is responsible for instantiating the XXX
interface (e.e.g Call
).
However, the factory can choose to instantiate the interfaces eagerly if desired. The choice of factory class is made by RFFIFactory.initialize()
which is called when the
initial RContext
is being created by PolyglotEngine
. Note that at this stage, very little code can be executed as the initial context has not yet been fully created and registered with PolyglotEngine
.
In general, state maintained by the RFFI
implementation classes is RContext
specific and to facilitate this RFFIFactory
defines a newContextState
method that is called by RContext
.
Again, at the point this is called the context is not active and so any execution that requires an active context must be delayed until the initialize
method is called on the ContextState
instance.
Typically special initialization may be required on the initialization of the initial context, such as loading native libraries, and also on the initialization of a SHARED_PARENT_RW
context kind.
Sharing data structures from Java with the native code
Every subclass of RObject
abstract class, notably RVector<ArrayT>
subclasses, may have a so called NativeMirror
object associated with it.
This object is created once the RObject
is passed to the native code. Initially FastR assigns a unique number (ID) to such RObject
and keeps this
ID in the NativeMirror
object, the ID is then passed to the native code as opaque pointer. The native code then may call R-API function,
e.g. Rf_eval
passing it an opaque pointer, this transitions back to Java and FastR finds RObject
corresponding to the value stored in the
opaque pointer and passes this RObject
the FastR implementation of Rf_eval
in JavaUpCallsRFFIImpl#Rf_eval
. Note that any opaque pointer can only
be obtained using R-API function, e.g. allocVec
, which up-calls to Java and FastR creates RObject
, corresponding NativeMirror
with ID and
passes that as the opaque pointer back to the native code.
Every subclass of RVector<ArrayT>
can be materialized into native memory. In such case, its data
field that normally holds a reference to
the managed data, e.g. int[]
, is set to null
and its NativeMirror
object will hold address to the off-heap data as a long
value.
All the operations, e.g. getDataAt
, on such vector will now reach to the native memory instead to the managed array.
This materialization happens, for example, when the native code calls INTEGER
R-API function, which is supposed to return a
pointer to the backing (native) array. The finalizer of the NativeMirror
object is responsible for freeing the native memory.