Commit 58b84044 authored by Mick Jordan

Merge pull request #539 in G/fastr from ~MICK.JORDAN_ORACLE.COM/fastr:feature/doc to master

* commit '5d5a6d49':
  move more README files into documentation, add fledgling section on project structure
  doc: add developer section
  doc: add deeloper section
  update README, add documentation/.project
parents 0284e9fb 5d5a6d49
......@@ -30,7 +30,7 @@ FastR is primarily aimed at long-running applications. The runtime performance b
Building FastR from source is supported on Mac OS X (El Capitan onwards), and various flavors of Linux.
FastR uses a build tool called `mx` (cf `maven`) which can be downloaded from [here](http://github.com/graalvm/mx).
`mx` manages software in _suites_, which are normally one-to-one with a `git` repository. FastR depends fundamentally on the [truffle](http://github.com/graalvm/truffle) suite. However, performance also depends on the [graal compiler](http://github.com/graalvm/graal-core) as without it, FastR operates in interpreted mode only. The conventional way to arrange the Git repos (suites) is as siblings in a parent directory, which we will call `FASTR_HOME`.
`mx` manages software in _suites_, which are normally one-to-one with a `git` repository. FastR depends fundamentally on the [truffle](http://github.com/graalvm/truffle) suite. However, performance also depends on the [Graal compiler](http://github.com/graalvm/graal-core) as without it, FastR operates in interpreted mode only. The conventional way to arrange the Git repos (suites) is as siblings in a parent directory, which we will call `FASTR_HOME`.
## Pre-Requisites
FastR shares some code with GnuR, for example, the default packages and the Blas library. Therefore, a version of GnuR (currently
......@@ -74,7 +74,14 @@ Use the following sequence of commands to download and build an interpreted vers
$ cd fastr
$ mx build
The build will clone the Truffle repository and also download various required libraries.
The build will clone the Truffle repository and also download various required libraries, including GNU R, which is built first. Any problems with the GNU R configure step likely relate
to dependent packages, so review the previous section. For FastR development, GNU R only needs to be built once, but an `mx clean` will, by default, remove it. This can be prevented by setting
the `GNUR_NOCLEAN` environment variable to any value.
It is possible to build FastR in "release mode" which builds and installs the GNU R "recommended" packages and also creates a `fastr-release.jar` file that contains everything that is needed to
run FastR, apart from a Java VM. In particular it captures the package dependencies, e.g., `pcre` and `libgfortran`, so that when the file is unpacked on another system it will work regardless of whether the packages are installed on that system. For some systems that depend on FastR, e.g., GraalVM, it is a requirement to build in release mode as they depend on this file. To build in release mode, set the `FASTR_RELEASE` environment variable to any value. Note that this can be done at any time without doing a complete clean and rebuild. Simply set the variable and execute `mx build`.
## Running FastR
After building, running the FastR console can be done either with `bin/R` or with `mx r` or `mx R`. Using `mx` makes available some additional options that are of interest to FastR developers.
FastR supports the same command line arguments as R, so running an R script is done with `bin/R -f <file>` or `bin/Rscript <file>`.
......@@ -85,7 +92,7 @@ FastR supports the same command line arguments as R, so running an R script is d
## Further Documentation
Further documentation on FastR, its limitations and additional functionality is [here](Index.md).
Further documentation on FastR, its limitations and additional functionality is [here](documentation/Index.md).
## Contributing
......
fficall contains the implementation of the R FFI, as described in https://cran.r-project.org/doc/manuals/r-release/R-exts.html.
It's actually a bit more than that as it also contains code copied from GnuR, for example that supports graphics or is sufficiently
simple that it is neither necessary nor desirable to implement in Java. As this has evolved a better name for 'fficall' would be 'main'
for compatibility with GnuR.
There are four sub-directories:
include
common
jni
variable_defs
include
=======
'include' should be thought of as analogous to GnuR's src/include, i.e. internal headers needed by the code in 'src/main'.
What we are trying to do by redefining them here is provide a boundary so that we don't accidentally capture code from GnuR that
is specific to the implementation of GnuR that is different in FastR, e.g., the representation of R objects. Evidently not every
piece of GnuR code or an internal header has that characteristic but this strategy allows us some control to draw the boundary as
tight as possible. Obviously we want to avoid duplicating (copying) code, as this requires validating the copy when migrating GnuR versions,
so there are three levels of implementation choice for the content of the header in this directory:
* Leave empty. This allows a #include to succeed and, if code does not actually use any symbols from the header, is ok.
* Indirect to the real GnuR header. This is potentially dangerous but a simple default for code that uses symbols from the header.
* Extract specific definitions from the GnuR header into a cut-down version. While this copies code it may be necessary
to avoid unwanted aspects of the GnuR header. In principle this can be done by a 'copy with sed' approach.
The indirection requires the use of the quote form of the #include directive. To avoid using a path that is GnuR version dependent,
the file gnurheaders.mk provides a make variable GNUR_HEADER_DEFS with a set of appropriate -D CFLAGS.
Ideally, code is always compiled in such a way that headers never implicitly read from GnuR, only via the 'include' directory.
Unfortunately this cannot always be guaranteed as a directive of the form include "foo.h" (as opposed to include <foo.h>) in the
GnuR C code will always access a header in the same directory as the code being compiled. I.e., only the angle-bracket form can be controlled
by the -I compiler flag. If this is a problem, the only solution is to 'copy with sed' the .c file and convert the quote form to the
angle bracket form.
common
======
'common' contains code that has no explicit JNI dependencies and has been extracted for reuse in other implementations. This code is mostly
copied/included from GnuR. N.B. Some modified files have a "_fastr" suffix to avoid a clash with an existing file in GnuR that would match
the Makefile rule for compiling directly from the GnuR file.
jni
===
'jni' contains the implementation that is based on and has explicit dependencies on Java JNI.
The R FFI is rather baroque and defined in a large set of header files in the 'include' directory that is a sibling of 'fficall'.
In GnuR, the implementation of the functions is spread over the GnuR C files in 'src/main'. To ease navigation of the FastR implementation,
in general, the implementation of the functions in a header file 'Rxxx.h' is stored in the file 'Rxxx.c'.
The points of entry from Java are defined in the file rfficall.c. Various utility functions are defined in rffiutils.{h,c}.
variable_defs
=============
The GnuR FFI defines a large number of (extern) variables the definitions of which, in GnuR, are scattered across the source files.
In FastR these are collected into one file, variable_defs.h. However, the actual initialization of the variables is, in general, implementation
dependent. In order to support a JNI and a non-JNI implementation, the file is stored in a separate directory.
This is a multi-step process to build GnuR in such a way that FASTR can use some of the libraries.
After building GnuR we extract configuration information for use in building packages in the FastR environment.
This goes into the file platform.mk, which is included in the Makefile's for the standard packages built for FastR.
The main change is to define the symbol FASTR to ensure that some important modifications to Rinternals.h are made
(e.g. changing an SEXP to a JNI jobject).
The header files that are included when compiling the code of native packages in the FastR environment.
The starting position is that these files are identical to those in GnuR and that the FastR implementation
differences are entirely encapsulated in the method implementations in the fficall library. It is TBD whether
this can be completely transparent but, if not, the goal would be for minimal changes to the standard header files.
This directory tree contains the default packages for FastR. Each package directory contains a '.gz' file that was
created from the corresponding GnuR 'library' directory, plus necessary C source and header files, most notably 'init.c',
also copied from GnuR. Since these files reference functions in the GnuR implementation, 'init.c' is recompiled
in the FastR environment and the resulting '.so' replaces the one from the '.gz' file in the FastR 'library' directory.
Absolutely minimal changes are made to the C source, typically just to define (as empty functions), rather than reference,
the C functions that are passed to R_registerRoutines. This step is still necessary in FastR as it causes R symbols that are
referenced in the R package code to become defined.
Note that 'datasets' and 'fastr' don't actually have any native code, but it is convenient to store them here. Note also that
'fastr', obviously, does not originate from GnuR, so its build process is completely different.
Given that we only support MacOS/Linux, it is expedient to just store the tar'ed content of the GnuR library directories
for those targets as 'source' files in the distribution. In time, when FastR can create packages directly, the build will
change to work that way.
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>FastR Documentation</name>
<comment></comment>
<projects>
<project>mx</project>
<project>mx.graal</project>
<project>mx.jvmci</project>
</projects>
<buildSpec>
<buildCommand>
<name>org.python.pydev.PyDevBuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.python.pydev.pythonNature</nature>
</natures>
</projectDescription>
......@@ -4,3 +4,4 @@
[Limitations](Limitations.md)
[For Developers](dev/Index.md)
# FastR Developer Documentation
## Index
* [Project Structure](structure.md)
* [Building](building.md)
* [R FFI Implementation](ffi.md)
# Introduction
This section contains more information regarding the build process. The `mx build` command will build both the Java projects and the native projects.
# Details on Building the Native Code
## Building GNU R
The `com.oracle.truffle.r.native/gnur` directory contains the `Makefile` for building GNU R in such a way that
parts are reusable by FastR. The GNU R source code is download by
It is a multi-step process to build GNU R in such a way that FASTR can use some of the libraries.
After building GNU R we extract configuration information for use in building packages in the FastR environment.
This goes into the file `platform.mk`, which is included in the `Makefile`s for the standard packages built for FastR.
The main change is to define the symbol `FASTR` to ensure that some important modifications to `Rinternals.h` are made
(e.g. changing a `SEXP` to a `void*`).
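
The effect of the `FASTR` symbol can be pictured with a small sketch. This is not the actual content of `Rinternals.h`; it only assumes that the header switches the `SEXP` typedef on `FASTR`, which is what the text above describes. The helper functions `pass_through` and `sexp_is_opaque` are hypothetical and exist only to make the sketch checkable.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: in a real build FASTR would come from the compiler flags
 * (e.g. -DFASTR). It is defined here so the sketch compiles on its own. */
#define FASTR 1

#ifdef FASTR
typedef void *SEXP;            /* opaque handle, never dereferenced by native code */
#else
typedef struct SEXPREC *SEXP;  /* GNU R: pointer to its internal object record */
#endif

/* Hypothetical helper: native code can only hand the value back and forth. */
SEXP pass_through(SEXP value) {
    assert(value != NULL);
    return value;
}

/* Hypothetical helper: reports which variant of the typedef was selected. */
int sexp_is_opaque(void) {
#ifdef FASTR
    return 1;
#else
    return 0;
#endif
}
```

With the opaque form, package code that only passes `SEXP` values to FFI functions compiles unchanged, while code that reaches into GNU R's object layout fails to compile instead of misbehaving at run time.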
## Building the Standard GNU R Packages
This directory tree contains the default packages for FastR. Most packages contain native (C/Fortran) code that
must be recompiled for FastR to ensure that the FFI calls are handled correctly. The regenerated `package.so` file overwrites
the file in the `library/package/libs` directory; otherwise the directory contents are identical to GNU R.
As far as possible the native recompilation references the corresponding source files in the `com.oracle.truffle.r.native/gnur`
directory. In a few cases these files have to be modified, but every attempt is made to avoid wholesale copying of GNU R source files.
Note that `datasets` doesn't actually have any native code, but it is convenient to store it here to mirror GNU R.
# The R FFI Implementation
# Introduction
The implementation of the [R FFI](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) is contained in the `fficall` directory of
the `com.oracle.truffle.r.native` project. It's actually a bit more than that as it also contains code copied from GNU R, for example that supports graphics or is sufficiently
simple that it is neither necessary nor desirable to implement in Java. As this has evolved a better name for `fficall` would probably be `main`
for compatibility with GNU R.
There are four sub-directories in `fficall/src`:
* `include`
* `common`
* `variable_defs`
* `jni`
## The `fficall/include` directory
`include` should be thought of as analogous to GNU R's `src/include`, i.e. internal headers needed by the code in `src/main`.
What we are trying to do by redefining them here is provide a boundary so that we don't accidentally capture code from GNU R that
is specific to the implementation of GNU R that is different in FastR, e.g., the representation of R objects. Evidently not every
piece of GNU R code or an internal header has that characteristic but this strategy allows us some control to draw the boundary as
tight as possible. Obviously we want to avoid duplicating (copying) code, as this requires validating the copy when migrating GNU R versions,
so there are three levels of implementation choice for the content of the header in this directory:
* Leave empty. This allows a `#include` to succeed and, if code does not actually use any symbols from the header, is ok.
* Indirect to the real GNU R header. This is potentially dangerous but a simple default for code that uses symbols from the header.
* Extract specific definitions from the GNU R header into a cut-down version. While this copies code it may be necessary to avoid unwanted aspects of the GNU R header. In principle this can be done by a "copy with sed" approach.
The indirection requires the use of the quote form of the `#include` directive. To avoid using a path that is GNU R version dependent,
the file `gnurheaders.mk` provides a make variable `GNUR_HEADER_DEFS` with a set of appropriate `-D` `CFLAGS`.
Ideally, code is always compiled in such a way that headers are never implicitly read from GNU R, only via the `include` directory.
Unfortunately this cannot always be guaranteed as a directive of the form `#include "foo.h"` (as opposed to `#include <foo.h>`) in the
GNU R C code will always access a header in the same directory as the code being compiled. I.e., only the angle-bracket form can be controlled
by the `-I` compiler flag. If this is a problem, the only solution is to "copy with sed" the `.c` file and convert the quote form to the
angle bracket form.
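
The "indirect to the real header" choice can be sketched as follows. The contents of `gnurheaders.mk` are not shown here, so the macro name `HYPOTHETICAL_GNUR_HEADER` is an assumption, and a standard header stands in for a GNU R header so that the sketch compiles by itself. The point is only that a `-D` flag can carry the quoted path, which keeps the version-dependent location out of the source file.

```c
#include <assert.h>

/* Assumption: the build would pass something like
 *   -DHYPOTHETICAL_GNUR_HEADER='"/path/to/gnur/src/include/Foo.h"'
 * The fallback below uses a standard header as a stand-in. */
#ifndef HYPOTHETICAL_GNUR_HEADER
#define HYPOTHETICAL_GNUR_HEADER "limits.h"
#endif

/* Quote-form include whose path comes from the macro, not from the source. */
#include HYPOTHETICAL_GNUR_HEADER

/* Hypothetical helper: uses a symbol that only the indirected header provides. */
int indirected_int_max(void) {
    assert(INT_MAX >= 32767);
    return INT_MAX;
}
```

An empty stub header is the degenerate case of the same idea: the `#include` succeeds, but no symbols are made visible.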
## The `common` directory
`common` contains code that has no explicit JNI dependencies and has been extracted for reuse in other implementations. This code is mostly
copied/included from GNU R. N.B. Some modified files have a `_fastr` suffix to avoid a clash with an existing file in GNU R that would match
the Makefile rule for compiling directly from the GNU R file.
## The `variable_defs` directory
The GNU R FFI defines a large number of (extern) variables the definitions of which, in GNU R, are scattered across the source files.
In FastR these are collected into one file, `variable_defs.h`. However, the actual initialization of the variables is, in general, implementation
dependent. In order to support a JNI and a non-JNI implementation, the file is stored in a separate directory.
## The `jni` directory
`jni` contains the implementation that is based on and has explicit dependencies on Java JNI. It is described in more detail [here](jni_ffi.md).
# Notes on the JNI implementation
# Introduction
The R FFI is rather baroque and defined in a large set of header files in the `include` directory that is a sibling of `fficall`.
In GNU R, the implementation of the functions is spread over the GNU R C files in `src/main`. To ease navigation of the FastR implementation,
in general, the implementation of the functions in a header file `Rxxx.h` is stored in the file `Rxxx.c`.
The points of entry from Java are defined in the file `rfficall.c`. Various utility functions are defined in `rffiutils.{h,c}`.
## JNI References
Java object values are passed to native code using JNI local references that are valid for the duration of the call. The reference protects the object from garbage collection. Evidently if native code holds on to a local reference by storing it in a native variable,
that object might be collected, possibly causing incorrect behavior (at best) later in the execution. It is possible to convert a local reference to a global reference that preserves the object across multiple JNI calls but this risks preventing objects from being collected. The global variables defined in the R FFI, e.g. R_NilValue are necessarily handled as global references. However, by default, other values are left as local references, although this can be changed by setting the variable alwaysUseGlobal in rffiutils.c to a non-zero value.
that object might be collected, possibly causing incorrect behavior (at best) later in the execution. It is possible to convert a local reference to a global reference that preserves the object across multiple JNI calls but this risks preventing objects from being collected. The global variables defined in the R FFI, e.g. `R_NilValue` are necessarily handled as global references. Other values are left as local references, with some risk that native code might capture a value that would then be collected once the call completes.
## Vector Content Copying
The R FFI provides access to vector contents as raw C pointers, e.g., int *. This requires the use of the JNI functions to access/copy the underlying data. In addition it requires that multiple calls on the same SEXP always return the same raw pointer.
The R FFI provides access to vector contents as raw C pointers, e.g., `int *`. This requires the use of the JNI functions to access/copy the underlying data. In addition it requires that multiple calls on the same SEXP always return the same raw pointer.
Similar to the discussion on JNI references, the raw data is released at the end of the call. There is currently no provision to retain this data across multiple JNI calls.
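
The two requirements above (the same raw pointer for repeated requests on one `SEXP`, and release at the end of the call) can be illustrated without JNI. The sketch below is not the FastR code: the names `cached_int_data`, `release_call_copies` and `cached_copy_count` are hypothetical, the fixed-size table is a simplification, and a plain `memcpy` stands in for whatever JNI array access the real implementation uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CACHED 16          /* arbitrary limit for the sketch */

typedef void *SEXP;            /* opaque handle, as in the FastR headers */

typedef struct {
    SEXP handle;               /* which object the copy belongs to */
    int *data;                 /* native copy of the vector contents */
} CacheEntry;

static CacheEntry cache[MAX_CACHED];
static int cache_len = 0;

/* Returns the native copy for `handle`, creating it on first request so that
 * later requests during the same call see the identical pointer. */
int *cached_int_data(SEXP handle, const int *source, size_t n) {
    assert(source != NULL);
    for (int i = 0; i < cache_len; i++) {
        if (cache[i].handle == handle) {
            return cache[i].data;
        }
    }
    if (cache_len == MAX_CACHED) {
        return NULL;           /* table full; a real implementation would grow it */
    }
    int *copy = malloc((n ? n : 1) * sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, source, n * sizeof *copy);
    cache[cache_len].handle = handle;
    cache[cache_len].data = copy;
    cache_len++;
    return copy;
}

/* Called when the native call returns: every copy made during it is dropped. */
void release_call_copies(void) {
    for (int i = 0; i < cache_len; i++) {
        free(cache[i].data);
    }
    cache_len = 0;
}

/* Number of live copies, exposed only so the behavior can be checked. */
int cached_copy_count(void) {
    return cache_len;
}
```

Any pointer obtained from `cached_int_data` is invalid after `release_call_copies`, which mirrors the statement that raw data is not retained across JNI calls.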
# Introduction
The FastR codebase is structured around IDE `projects`, which are contained in directories beginning with `com.oracle.truffle.r`.
The expectation is that source code will be viewed and edited in an IDE (we will use Eclipse as the example) and the `mx` tool
has support for automatically generating the IDE project metadata via the `ideinit` command. N.B. if you run this before you have built the system with `mx build`
do not be surprised that it will compile some Java classes. It does this to gather information about Java annotation processors that is necessary for
correct rebuilding within the IDE.
The majority of the projects are "Java" projects, but any project with `native` in its name contains native code, e.g. C code, and is (ultimately) built
using `make`. `mx` handles this transparently. Note, however, that editing and building the native code in an IDE requires support for C development to have
been installed, e.g., the CDT plugin for Eclipse.