@@ -41,7 +41,7 @@ FastR is available in two forms:
FastR is intended eventually to be a drop-in replacement for GNU R. Currently, however, the implementation is incomplete. Notable limitations are:
1. Graphics support is mostly missing, limited to output to the "pdf" device.
1. Graphics support: FastR supports only the grid package and grid-based packages; the graphics package is not supported. The FastR grid package implementation is purely Java-based; see its [documentation](documentation/graphics.md) for more details and limitations.
2. Many packages either do not install, particularly those containing native (C/C++) code, or fail tests due to bugs and limitations in FastR. In particular, popular packages such as `data.table` and `Rcpp` currently do not work with FastR.
## Running FastR
...
...
@@ -56,7 +56,7 @@ FastR is primarily aimed at long-running applications. The runtime performance b
Building FastR from source is supported on Mac OS X (El Capitan onwards), and various flavors of Linux.
FastR uses a build tool called `mx` (cf `maven`) which can be downloaded from [here](http://github.com/graalvm/mx).
`mx` manages software in _suites_, which are normally one-to-one with a `git` repository. FastR depends fundamentally on the [truffle](http://github.com/graalvm/truffle) suite. However, performance also depends on the [Graal compiler](http://github.com/graalvm/graal) as without it, FastR operates in interpreted mode only. The conventional way to arrange the Git repos (suites) is as siblings in a parent directory, which we will call `FASTR_HOME`.
`mx` manages software in _suites_, which are normally one-to-one with a `git` repository. FastR depends fundamentally on the [truffle](http://github.com/graalvm/graal) suite. However, performance also depends on the [Graal compiler](http://github.com/graalvm/graal) as without it, FastR operates in interpreted mode only. The conventional way to arrange the Git repos (suites) is as siblings in a parent directory, which we will call `FASTR_HOME`.
## Pre-Requisites
FastR shares some code with GnuR, for example, the default packages and the Blas library. Therefore, a version of GnuR (currently
FastR can interface to native C and Fortran code in a number of ways, for example, access to C library APIs not supported by the Java JDK, access to LaPack functions, and the `.Call`, `.Fortran`, `.C` builtins. Each of these is defined by a Java interface, e.g. `CallRFFI` for the `.Call` builtin. To facilitate experimentation and different implementations, the implementation of these interfaces is defined by a factory class, `RFFIFactory`, that is chosen at run time via the `fastr.ffi.factory.class` system property, or the `FASTR_RFFI` environment variable.
The factory is responsible for creating an instance of the `RFFI` interface that in turn provides access to implementations of the underlying interfaces such as `CallRFFI`. This structure allows
for each of the individual interfaces to be implemented by a different mechanism. Currently the default factory class is `JNI_RFFIFactory` which uses the Java JNI system to implement the transition to native code.
for each of the individual interfaces to be implemented by a different mechanism. Currently the default factory class is `TruffleNFI_RFFIFactory` which uses the Truffle NFI system to implement the transition to native code.
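
The factory selection described above can be illustrated with a minimal, self-contained sketch. It is not the FastR implementation: the nested interfaces, the registry keys (`nfi`, `llvm`), the default, and the rule that the system property takes precedence over the environment variable are all assumptions made for illustration. Only the property name `fastr.ffi.factory.class` and the variable name `FASTR_RFFI` come from the text.

```java
import java.util.Map;
import java.util.function.Supplier;

final class FactorySelectionSketch {
    // Hypothetical stand-ins for CallRFFI, RFFI and RFFIFactory.
    interface CallFfi { String mechanism(); }
    interface Ffi { CallFfi callFfi(); }
    interface FfiFactory { Ffi create(); }

    // Builds a factory whose single interface reports the mechanism in use.
    static FfiFactory factoryFor(String mechanism) {
        CallFfi call = () -> mechanism;
        Ffi ffi = () -> call;
        return () -> ffi;
    }

    // Registry keyed by a short name; the keys are assumptions of this sketch.
    static final Map<String, Supplier<FfiFactory>> REGISTRY = Map.of(
        "nfi", () -> factoryFor("Truffle NFI"),
        "llvm", () -> factoryFor("Truffle LLVM"));

    // Assumed precedence: system property, then environment variable, then default.
    static String chooseKey(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty()) {
            return sysProp;
        }
        if (envVar != null && !envVar.isEmpty()) {
            return envVar;
        }
        return "nfi";
    }

    static FfiFactory select(String sysProp, String envVar) {
        String key = chooseKey(sysProp, envVar);
        Supplier<FfiFactory> supplier = REGISTRY.get(key);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown FFI factory: " + key);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        FfiFactory factory = select(System.getProperty("fastr.ffi.factory.class"),
                                    System.getenv("FASTR_RFFI"));
        System.out.println(factory.create().callFfi().mechanism());
    }
}
```

The sketch looks factories up in a registry to stay self-contained; a factory chosen by class name, as the property name suggests, would more likely be loaded reflectively.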
# No native code mode
FastR can be configured to avoid running any unmanaged code coming from GNU R or packages. It is described in more detail [here](managed_ffi.md).
# Native Implementation
The native implementation of the [R FFI](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) is contained in the `fficall` directory of
the `com.oracle.truffle.r.native` project. It's actually a bit more than that as it also contains code copied from GNU R, for example that supports graphics or is sufficiently
the `com.oracle.truffle.r.native` project. It's actually a bit more than that as it also contains code copied from GNU R, for example code that is sufficiently
simple that it is neither necessary nor desirable to implement in Java. As this has evolved, a better name for `fficall` would probably be `main`
for compatibility with GNU R.
There are five sub-directories in `fficall/src`:
* `include`
* `common`
* `jni`
* `truffle_nfi`
* `truffle_llvm`
...
...
@@ -44,16 +43,12 @@ by the `-I` compiler flag. If this is a problem, the only solution is to "copy w
angle bracket form.
## The `common` directory
`common` contains code that has no explicit JNI dependencies and has been extracted for reuse in other implementations. This code is mostly
copied/included from GNU R. N.B. Some modified files have a `_fastr` suffix to avoid a clash with an existing file in GNU R that would match
`common` contains code that has no explicit dependencies specific to Truffle NFI or Truffle LLVM and has been extracted for reuse in other implementations.
This code is mostly copied/included from GNU R. N.B. Some modified files have a `_fastr` suffix to avoid a clash with an existing file in GNU R that would match
the Makefile rule for compiling directly from the GNU R file.
## The `jni` directory
`jni` contains the implementation that is based on and has explicit dependencies on Java JNI. It is described in more detail [here](jni_ffi.md). This is the default implementation.
## The `truffle_nfi` directory
`truffle_nfi` contains the implementation that is based on the Truffle Native Function Interface. It is enabled by setting `FASTR_RFFI=nfi` and doing a clean build.
The implementation is currently incomplete.
`truffle_nfi` contains the implementation that is based on the Truffle Native Function Interface.
## The `truffle_llvm` directory
...
...
@@ -64,4 +59,23 @@ Not all of the individual interfaces need to be instantiated on startup. The `ge
However, the factory can choose to instantiate the interfaces eagerly if desired. The choice of factory class is made by `RFFIFactory.initialize()` which is called when the
initial `RContext` is being created by `PolyglotEngine`. Note that at this stage, very little code can be executed as the initial context has not yet been fully created and registered with `PolyglotEngine`.
In general, state maintained by the `RFFI` implementation classes is `RContext` specific and to facilitate this `RFFIFactory` defines a `newContextState` method that is called by `RContext`. Again, at the point this is called the context is not active and so any execution that requires an active context must be delayed until the `initialize` method is called on the `ContextState` instance. Typically special initialization may be required on the initialization of the initial context, such as loading native libraries, and also on the initialization of a `SHARED_PARENT_RW` context kind.
In general, state maintained by the `RFFI` implementation classes is `RContext` specific and to facilitate this `RFFIFactory` defines a `newContextState` method that is called by `RContext`.
Again, at the point this is called the context is not active and so any execution that requires an active context must be delayed until the `initialize` method is called on the `ContextState` instance.
Typically special initialization may be required on the initialization of the initial context, such as loading native libraries, and also on the initialization of a `SHARED_PARENT_RW` context kind.
# Sharing data structures from Java with the native code
Every subclass of the `RObject` abstract class, notably the `RVector<ArrayT>` subclasses, may have a so-called `NativeMirror` object associated with it.
This object is created once the `RObject` is passed to the native code. Initially FastR assigns a unique number (ID) to such an `RObject` and keeps this
ID in the `NativeMirror` object; the ID is then passed to the native code as an opaque pointer. The native code may then call an R-API function,
e.g. `Rf_eval`, passing it an opaque pointer. This transitions back to Java, where FastR finds the `RObject` corresponding to the value stored in the
opaque pointer and passes this `RObject` to the FastR implementation of `Rf_eval` in `JavaUpCallsRFFIImpl#Rf_eval`. Note that an opaque pointer can only
be obtained using an R-API function, e.g. `allocVec`, which up-calls to Java, where FastR creates an `RObject` and a corresponding `NativeMirror` with an ID and
passes that ID back to the native code as the opaque pointer.
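
The ID-based hand-off described above can be sketched as a small handle table. This is only an illustration under stated assumptions: `HandleTableSketch`, `ManagedObject` and `Mirror` are hypothetical stand-ins for FastR's bookkeeping, `RObject` and `NativeMirror`, and the strong `HashMap` used here would keep every object alive, which a real implementation would presumably avoid.

```java
import java.util.HashMap;
import java.util.Map;

final class HandleTableSketch {
    // Hypothetical stand-in for NativeMirror: holds the ID handed to native code.
    static final class Mirror {
        final long id;
        Mirror(long id) { this.id = id; }
    }

    // Hypothetical stand-in for RObject: the mirror is created lazily.
    static class ManagedObject {
        Mirror mirror;
    }

    private final Map<Long, ManagedObject> byId = new HashMap<>();
    private long nextId = 1;

    // Called when an object is first passed to native code: assign an ID once
    // and return it as the "opaque pointer".
    long toNative(ManagedObject obj) {
        if (obj.mirror == null) {
            obj.mirror = new Mirror(nextId++);
            byId.put(obj.mirror.id, obj);
        }
        return obj.mirror.id;
    }

    // Called on an up-call from native code: map the opaque value back to the object.
    ManagedObject fromNative(long handle) {
        ManagedObject obj = byId.get(handle);
        if (obj == null) {
            throw new IllegalArgumentException("Unknown handle: " + handle);
        }
        return obj;
    }
}
```

Passing the same object twice yields the same ID, so native code can compare opaque pointers for identity.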
Every subclass of `RVector<ArrayT>` can be materialized into native memory. In that case, its `data` field, which normally holds a reference to
the managed data, e.g. `int[]`, is set to `null` and its `NativeMirror` object holds the address of the off-heap data as a `long` value.
All operations on such a vector, e.g. `getDataAt`, now access the native memory instead of the managed array.
This materialization happens, for example, when the native code calls the `INTEGER` R-API function, which is supposed to return a
pointer to the backing (native) array. The finalizer of the `NativeMirror` object is responsible for freeing the native memory.
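
A rough sketch of this materialization, assuming a direct `IntBuffer` as a stand-in for the raw off-heap address (the text says FastR stores a `long` address instead). The class `IntVectorSketch` and its methods are hypothetical; only the `data` field and `getDataAt` are named in the text.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

final class IntVectorSketch {
    private int[] data;           // managed storage; null once materialized
    private IntBuffer nativeData; // off-heap storage; null until materialized

    IntVectorSketch(int[] data) {
        this.data = data.clone();
    }

    // Copies the managed array off-heap on first use and drops the managed copy.
    // Repeated calls return the same buffer, mirroring the requirement that the
    // same vector always yields the same raw pointer.
    IntBuffer materialize() {
        if (nativeData == null) {
            IntBuffer buf = ByteBuffer.allocateDirect(data.length * Integer.BYTES)
                                      .order(ByteOrder.nativeOrder())
                                      .asIntBuffer();
            buf.put(data);
            buf.rewind();
            nativeData = buf;
            data = null;
        }
        return nativeData;
    }

    boolean isMaterialized() {
        return nativeData != null;
    }

    // Element access goes to whichever storage is currently authoritative.
    int getDataAt(int index) {
        return nativeData != null ? nativeData.get(index) : data[index];
    }
}
```

Unlike the finalizer-based scheme in the text, the direct buffer here is released by the JVM, so the sketch has no explicit free step.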
The R FFI is rather baroque and defined in a large set of header files in the `include` directory that is a sibling of `fficall`.
In GNU R, the implementation of the functions is spread over the GNU R C files in `src/main`. To ease navigation of the FastR implementation,
in general, the implementation of the functions in a header file `Rxxx.h` is stored in the file `Rxxx.c`.
The points of entry from Java are defined in the file `rfficall.c`. Various utility functions are defined in `rffiutils.{h,c}`.
## JNI References
Java object values are passed to native code using JNI local references that are valid for the duration of the call. The reference protects the object from garbage collection. Evidently if native code holds on to a local reference by storing it in a native variable,
that object might be collected, possibly causing incorrect behavior (at best) later in the execution. It is possible to convert a local reference to a global reference that preserves the object across multiple JNI calls but this risks preventing objects from being collected. The global variables defined in the R FFI, e.g. `R_NilValue` are necessarily handled as global references. Other values are left as local references, with some risk that native code might capture a value that would then be collected once the call completes.
## Vector Content Copying
The R FFI provides access to vector contents as raw C pointers, e.g., `int *`. This requires the use of the JNI functions to access/copy the underlying data. In addition it requires that multiple calls on the same SEXP always return the same raw pointer.
Similar to the discussion on JNI references, the raw data is released at the end of the call. There is currently no provision to retain this data across multiple JNI calls.