Commit f6cbd6f6 authored by Stepan Sindelar

[GR-2798] Update documentation.

PullRequest: fastr/1302
parents 1e395ae9 a4aaf1fc
@@ -42,7 +42,7 @@ FastR is available in two forms:
FastR is intended eventually to be a drop-in replacement for GNU R. Currently, however, the implementation is incomplete. Notable limitations are:
1. Graphics support: FastR supports only grid and grid-based packages; the graphics package is not supported. The FastR grid package implementation is purely Java based, see its [documentation](documentation/graphics.md) for more details and limitations.
-2. Many packages either do not install, particularly those containing native (C/C++) code, or fail tests due to bugs and limitations in FastR. In particular popular packages such as `data.table` and `Rcpp` currently do not work with FastR.
+2. Some packages either do not install, or fail tests due to bugs and limitations in FastR. In particular, support for popular packages such as `data.table` and `Rcpp` is a work in progress.
## Running FastR
@@ -73,8 +73,8 @@ prior to the build. These are:
The xz package, version 5.2.2 or later
The curl package, version 7.50.1 or later
-If any of these are missing the GNU R build will fail which will cause the FastR build to fail also. If the build fails, more details can be found in `gnur_configure.log`
-file in the `com.oracle.truffle.r.native/gnur/R-{version}` directory. Note that your system may have existing installations of these packages, possibly in standard system locations,
+If any of these are missing the GNU R build will fail which will cause the FastR build to fail also. If the build fails, more details can be found in log files in
+the `libdownloads/R-{version}` directory. Note that your system may have existing installations of these packages, possibly in standard system locations,
but older versions. These must either be upgraded or newer versions installed with the package manager on your system. Since different systems use different package
managers some of which install packages in directories that are not scanned by default by the C compiler and linker, it may be necessary to inform the build of these
locations using the following environment variables:
......
@@ -2,9 +2,8 @@
This document describes the process of building FastR with a focus on the GNUR integration.
The description is organized in a top-down manner beginning with outlining the `mx build` command and
-then delving into individual scripts that patch and build parts of GNUR.
+then delving into individual scripts that patch and build parts of GNUR. The last section describes the
+"release" build used to create the final release artifact including recommended packages.
+See also [building](building.md), [release](../../com.oracle.truffle.r.release/README.md)
## `mx build`
@@ -146,12 +145,7 @@ _Other required sources_:
It builds `libR` and optionally (JNI) `libjniboot`.
-See:
-* [ffi](ffi.md)
-* [managed ffi](managed_ffi.md)
-* [truffle llvm ffi](truffle_llvm_ffi.md)
-* [truffle nfi](truffle_nfi.md)
+See also: [ffi](ffi.md).
The `FASTR_RFFI` variable controls which version of FFI is built: `managed` (i.e. no native), `llvm` and `nfi`.
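As a rough illustration of the `FASTR_RFFI` setting, the following Python sketch checks a value against the three kinds named in the text. The helper name `rffi_kind` is hypothetical and is not part of the FastR build; the behaviour when the variable is unset is not stated in the text, so the sketch simply returns `None` in that case.

```python
import os

# The three FFI kinds named in the text; this sketch rejects anything else.
KNOWN_RFFI_KINDS = ("managed", "llvm", "nfi")


def rffi_kind(env=None):
    """Return the FFI kind requested via FASTR_RFFI, or None if it is unset.

    Hypothetical helper for illustration only; the real build reads the
    variable in its own scripts and makefiles.
    """
    env = os.environ if env is None else env
    value = env.get("FASTR_RFFI")
    if value is None:
        # The default chosen by the real build is not stated in the text.
        return None
    if value not in KNOWN_RFFI_KINDS:
        raise ValueError(
            "FASTR_RFFI=%r is not one of %s" % (value, ", ".join(KNOWN_RFFI_KINDS))
        )
    return value
```

Passing an explicit dictionary, for example `rffi_kind({"FASTR_RFFI": "llvm"})`, keeps the check independent of the current shell environment.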
@@ -215,6 +209,10 @@ GNUR library (binary) files to the FastR library directory. It also defines a co
The package sources are compiled and linked into the corresponding dynamic library (`<package>.so`).
Finally and optionally (Darwin, non-LLVM), the library is installed using the system tools.
+For each package, the source and header files copied from GNUR can be identified by looking at
+the git history of the `gnur` branch. Whether and how those were patched can be found in the git history
+of the `master` branch. The following packages have some special handling or caveats worth mentioning.
#### Package `base`
In the pre-build stage, it changes GnuR's build script `$(GNUR_HOME_BINARY)/src/library/base/makebasedb.R`
@@ -234,86 +232,15 @@ _Patched files_:
_Other required sources_:
-* The headers reachable from `$(GNUR_HOME)/src/library/graphics`
-* C sources from `$(GNUR_HOME)/src/library/graphics/src`: `base.c, graphics.c, init.c, par.c, plot.c, plot3d.c, stem.c`
* The headers defined in `fficall/src/include/gnurheaders.mk`
#### Package `grDevices`
_Other required sources_:
-* The header files reachable from `$(GNUR_HOME)/src/library/grDevices`
* `$(GNUR_HOME)/src/main/gzio.h`
-* All Cairo C sources: `$(GNUR_HOME)/src/library/grDevices/src/cairo/*.c`
-* Other C sources from `$(GNUR_HOME)/src/library/grDevices/src`: `axis_scales.c, chull.c, colors.c, devCairo.c, devPS.c, devPicTeX.c, devQuartz.c, devices.c, init.c, stubs.c`
* The headers defined in `fficall/src/include/gnurheaders.mk`
-#### Package `grid`
-_Patched files_:
-* `grid.c`, `state.c` using sed (`sed_grid`, `sed_state`)
-_Other required sources_:
-* `grid.h`
-* `gpar.c, just.c, layout.c, matrix.c, register.c, unit.c, util.c, viewport.c`
-#### Package `methods`
-_Other required sources_:
-* `init.c`
-* `methods.h`
-#### Package `parallel`
-_Patched files_:
-* `glpi.h`, `rngstream.c`
-_Other required sources_:
-* `init.c`
-* `parallel.h`
-#### Package `splines`
-_Patched files_:
-* `splines.c`
-#### Package `stats`
-_Patched files_:
-* `fft.c` using `ed_fft`
-* `modreg.h`, `nls.h`, `port.h`, `stats.h`, `ts.h`
-_Other required sources_:
-* Fortran sources: `bsplvd.f, bvalue.f, bvalus.f, eureka.f, hclust.f, kmns.f, lminfl.f, loessf.f, ppr.f, qsbart.f, sgram.f, sinerp.f, sslvrg.f, stl.f, stxwx.f`
-* C sources: `init.c, isoreg.c, kmeans.c, loessc.c, monoSpl.c, sbart.c`
-* All headers
-#### Package `tools`
-_Patched files_:
-* `gramRd.c`
-_Other required sources_:
-* `init.c`
-* `tools.h`
-#### Package `utils`
-_Other required sources_:
-* `init.c`
-* `utils.h`
### Building `run`
This build prepares the FastR directory structure mimicking that of GNUR. It creates and
@@ -340,7 +267,22 @@ _Other required sources_:
* `$(GNUR_HOME)/doc/*` (processed by `configure`)
* From `$(GNUR_HOME)/share/`: directories `R, Rd, make, java, encodings`
-## Installing recommended packages
+## Release build
+The *FASTR_RELEASE* mx distribution is built only when the `FASTR_RELEASE` environment variable is exported.
+The building logic for *FASTR_RELEASE* resides in the Python class `ReleaseBuildTask`, and the output is a jar
+file that, when unzipped, contains a stand-alone FastR distribution including everything that is needed to
+run FastR.
+This build requires the `PKG_LDFLAGS_OVERRIDE` environment variable, for example on macOS
+export PKG_LDFLAGS_OVERRIDE=-L/opt/local/lib
+or on some Linux distributions
+export PKG_LDFLAGS_OVERRIDE="\"-L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu/\""
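The two environment requirements of the release build can be checked before starting a long build. The following is a minimal Python sketch under stated assumptions: the function name `release_build_problems` is hypothetical, and the check only mirrors what the text says (the distribution is built only when `FASTR_RELEASE` is exported, and the build requires `PKG_LDFLAGS_OVERRIDE`). It does not reproduce the logic of `ReleaseBuildTask`.

```python
import os


def release_build_problems(env=None):
    """List reasons why the release build would be skipped or would fail.

    Hypothetical pre-flight check; it inspects only the two variables
    named in the text and does not call mx.
    """
    env = os.environ if env is None else env
    problems = []
    if "FASTR_RELEASE" not in env:
        problems.append(
            "FASTR_RELEASE is not exported, so the FASTR_RELEASE distribution is not built"
        )
    if not env.get("PKG_LDFLAGS_OVERRIDE"):
        problems.append(
            "PKG_LDFLAGS_OVERRIDE is not set, but the release build requires it"
        )
    return problems
```

An empty list means both variables are present; the value of `PKG_LDFLAGS_OVERRIDE` itself is platform specific and is not validated here.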
+### Installing recommended packages
Note: This build resides in a separate project: `com.oracle.truffle.r.native.recommended`.
......
# Introduction
This section contains more information regarding the build process. The `mx build` command will build both the Java projects and the native projects.
# Details on Building the Native Code
## Building GNU R
The `com.oracle.truffle.r.native/gnur` directory contains the `Makefile` for building GNU R in such a way that
parts are reusable by FastR. The GNU R source code is downloaded by [TODO]
Building GNU R in such a way that FastR can use some of the libraries is a multi-step process.
After building GNU R we extract configuration information for use in building packages in the FastR environment.
This goes into the file `platform.mk`, which is included in the `Makefile`s for the standard packages built for FastR.
The main change is to define the symbol `FASTR` to ensure that some important modifications to `Rinternals.h` are made
(e.g. changing a `SEXP` to a `void*`).
## Building the Standard GNU R Packages
This directory tree contains the default packages for FastR. Most packages contain native (C/Fortran) code that
must be recompiled for FastR to ensure that the FFI calls are handled correctly. The regenerated `package.so` file overwrites
the file in the `library/package/libs` directory; otherwise the directory contents are identical to GNU R.
As far as possible the native recompilation references the corresponding source files in the `com.oracle.truffle.r.native/gnur`
directory. In a few cases these files have to be modified, but every attempt is made to avoid wholesale copying of GNU R source files.
Note that `datasets` doesn't actually have any native code, but it is convenient to store it here to mirror GNU R.
# PACKAGE TEST DOCUMENTATION
## INSTALLED PACKAGES CACHE
### Description
Avoids re-installing packages for every test. Packages are cached for a specific native API version, i.e., a checksum of the native header files.
Directory structure:
- pkg-cache-dir
--+- version.table
--+- libraryVERSION0
----+- packageArchive0.gz
----+- packageArchive1.gz
----+- ...
--+- libraryVERSION1
----+- packageArchive0.gz
----+- packageArchive1.gz
----+- ...
--+- ...
The API checksum must be provided because we do not want to rely on some R package to compute it.
### Usage
Run `mx pkgtest --cache-pkgs version=<checksum>,dir=<pkg-cache-dir>,size=<cache-size>`, e.g.
```
mx pkgtest --cache-pkgs version=730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525,dir=/tmp/cache_dir
```
The `version` key specifies the API version to use, i.e., a checksum of the header files of the native API (mandatory, no default).
The `dir` key specifies the directory of the cache (mandatory, no default).
The `size` key specifies the number of different API versions for which to cache packages (optional, default=`2L`).
### Details
The version must be provided externally such that the R script does not rely on any package.
The version must reflect the native API in the sense that if two R runtimes have the same native API version, then the packages can be used for both runtimes.
@@ -210,4 +210,40 @@ To debug why a test fails requires first that the package is installed locally p
First, note that, by default, the `installpkgs` command itself introduces an extra level of sub-process in order to prevent a failure from aborting the entire install command when installing/testing multiple packages.
You can see this by setting the environment variable `FASTR_LOG_SYSTEM` to any value. The first sub-process logged will be running the command `com.oracle.truffle.r.test.packages/r/install.package.R` and the second will be the one running `R CMD INSTALL --install-tests` of the digest package. For ease of debugging you can set the `--run-mode` option to `internal`, which executes the first phase of the install in the process running `installpkgs`.
Similar considerations apply to the testing phase. By default a sub-process is used to run the `com.oracle.truffle.r.test.packages/r/test.package.R` script, which then runs the actual test using a sub-process to invoke `R CMD BATCH`. Again the first sub-process can be avoided using `--run-mode internal`. N.B. If you run the tests for `digest` you will see that there are four separate sub-processes used to run different tests. The latter three are the specific tests for digest that were made available by installing with `--install-tests`. Not all packages have such additional tests.
Note that there is no way to avoid the tests being run in sub-processes, so setting the `-d` option to the `installpkgs` command will have no effect on those. Instead set the environment variable `MX_R_GLOBAL_ARGS=-d`, which will cause the sub-processes to run under the debugger. Note that you will not (initially) see the `Listening for transport dt_socket at address: 8000` message on the console, but activating the debug launch from the IDE will connect to the sub-process.
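The environment settings described above can be collected in one place before launching the command. The following Python sketch is only an illustration: the helper `debug_environment` is hypothetical, the `mx installpkgs` spelling of the command line is an assumption, and the package-selection options are omitted because the text does not show them.

```python
import os


def debug_environment(base_env=None, log_subprocesses=True, debug_subprocesses=True):
    """Return a copy of the environment prepared for debugging package tests.

    Hypothetical helper; it only sets the two variables named in the text.
    """
    env = dict(os.environ if base_env is None else base_env)
    if log_subprocesses:
        # Any value enables logging of the sub-processes.
        env["FASTR_LOG_SYSTEM"] = "1"
    if debug_subprocesses:
        # Makes the test sub-processes run under the debugger.
        env["MX_R_GLOBAL_ARGS"] = "-d"
    return env


# Assumed command line: runs the first install phase in the installpkgs
# process itself. Package-selection options are intentionally left out.
INSTALL_ARGS = ["mx", "installpkgs", "--run-mode", "internal"]
```

The resulting dictionary and argument list could then be handed to `subprocess.run(INSTALL_ARGS + package_options, env=debug_environment())`, where `package_options` stands for whatever package selection the actual run needs.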
### INSTALLED PACKAGES CACHE
#### Description
Avoids re-installing packages for every test. Packages are cached for a specific native API version, i.e., a checksum of the native header files.
Directory structure:
- pkg-cache-dir
--+- version.table
--+- libraryVERSION0
----+- packageArchive0.gz
----+- packageArchive1.gz
----+- ...
--+- libraryVERSION1
----+- packageArchive0.gz
----+- packageArchive1.gz
----+- ...
--+- ...
The API checksum must be provided because we do not want to rely on some R package to compute it.
#### Usage
Run `mx pkgtest --cache-pkgs version=<checksum>,dir=<pkg-cache-dir>,size=<cache-size>`, e.g.
```
mx pkgtest --cache-pkgs version=730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525,dir=/tmp/cache_dir
```
The `version` key specifies the API version to use, i.e., a checksum of the header files of the native API (mandatory, no default).
The `dir` key specifies the directory of the cache (mandatory, no default).
The `size` key specifies the number of different API versions for which to cache packages (optional, default=`2L`).
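To make the key semantics concrete, here is a small Python sketch that parses a `--cache-pkgs` value into its three settings. It is an illustration only: the function `parse_cache_pkgs` is hypothetical, the real option is processed by the package test scripts, and rejecting unknown keys is a choice made for this sketch.

```python
def parse_cache_pkgs(option):
    """Parse 'version=<checksum>,dir=<pkg-cache-dir>,size=<cache-size>'.

    Hypothetical parser: 'version' and 'dir' are mandatory, 'size' is
    optional and defaults to 2, as described in the text.
    """
    settings = {}
    for item in option.split(","):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError("malformed entry: %r" % item)
        if key not in ("version", "dir", "size"):
            raise ValueError("unknown key: %r" % key)
        settings[key] = value
    for required in ("version", "dir"):
        if required not in settings:
            raise ValueError("missing mandatory key: %r" % required)
    # Number of different API versions for which packages are cached.
    settings["size"] = int(settings.get("size", 2))
    return settings
```

For the example command above, the parsed result has `dir` set to `/tmp/cache_dir` and `size` falling back to 2.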
#### Details
The version must be provided externally such that the R script does not rely on any package.
The version must reflect the native API in the sense that if two R runtimes have the same native API version, then the packages can be used for both runtimes.
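Since the version has to be computed outside of R, one possible way to obtain it is a small script that hashes the native API header files. The sketch below is an assumption throughout: the text does not say which hash function or which set of files FastR uses, so SHA-224 is chosen here only because it yields 56 hexadecimal characters like the example value, and `include_dir` is a hypothetical directory containing the headers.

```python
import hashlib
from pathlib import Path


def native_api_checksum(include_dir):
    """Return a hex digest over all '*.h' files below include_dir.

    Assumed scheme: files are visited in sorted order so the result is
    stable, and each file contributes its relative path and its content.
    """
    root = Path(include_dir)
    digest = hashlib.sha224()
    for header in sorted(root.rglob("*.h")):
        digest.update(header.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(header.read_bytes())
    return digest.hexdigest()
```

Two runtimes whose header trees are byte-for-byte identical get the same value under this scheme, which matches the requirement that equal versions imply interchangeable cached packages.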