binsegRcpp inside a C++ program
As a part of the RcppDeepState project, which has been graciously funded by R Consortium, we are creating an easy way to use DeepState to fuzz test R packages with C++ code defined using Rcpp. DeepState requires the C++ programmer to define a “test harness” which is C++ source code without a main function, but with tests/expectations. The DeepState framework then compiles that test harness to an executable (with a main function) to which fuzz testing libraries can send input.
Usually R runs as main, so for RcppDeepState we need to instead compile another program and link it to the R shared library. In the previous blog post I showed how to do this using the C interface provided with the R source code. In this blog post I investigate how to go one step further with RInside. The goal will be to compile and run a simple C++ program with a main function that calls one of the C++ functions from the binsegRcpp package.
The first thing to do is download the R source code,
R-3.6.3.tar.gz,
which I saved to ~/R
and then I compiled it using the standard
commands:
cd ~/R
wget https://cloud.r-project.org/src/base/R-3/R-3.6.3.tar.gz
tar xf R-3.6.3.tar.gz
./configure --prefix=$HOME --enable-R-shlib
make
make install
Note in the above I used --prefix=$HOME
to install R under my home
directory, and I used --enable-R-shlib
to get ~/lib/R/lib/libR.so
which is the shared object file for R (necessary for embedding R into
other programs). The next step is to download and install RInside,
> install.packages("RInside")
trying URL 'http://cloud.r-project.org/src/contrib/RInside_0.2.16.tar.gz'
Content type 'application/x-gzip' length 80576 bytes (78 KB)
==================================================
downloaded 78 KB
Loading required package: grDevices
* installing *source* package ‘RInside’ ...
** package ‘RInside’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
/home/tdhock/lib/R/bin/Rscript tools/RInsideAutoloads.r > RInsideAutoloads.h
Loading required package: grDevices
/home/tdhock/lib/R/bin/Rscript tools/RInsideEnvVars.r > RInsideEnvVars.h
Loading required package: grDevices
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I. -I../inst/include/ -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c MemBuf.cpp -o MemBuf.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I. -I../inst/include/ -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c RInside.cpp -o RInside.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I. -I../inst/include/ -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c RInside_C.cpp -o RInside_C.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I. -I../inst/include/ -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I. -I../inst/include/ -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c compiler.cpp -o compiler.o
g++ -std=gnu++11 -shared -L/home/tdhock/lib/R/lib -L/usr/local/lib -o RInside.so MemBuf.o RInside.o RInside_C.o RcppExports.o compiler.o -L/home/tdhock/lib/R/lib -lR
g++ -std=gnu++11 -o libRInside.so MemBuf.o RInside.o RInside_C.o RcppExports.o compiler.o -shared -L/usr/local/lib -L"/home/tdhock/lib/R/lib" -lR
ar qc libRInside.a MemBuf.o RInside.o RInside_C.o RcppExports.o compiler.o
cp libRInside.so ../inst/lib
cp libRInside.a ../inst/lib
rm libRInside.so libRInside.a
installing to /home/tdhock/lib/R/library/00LOCK-RInside/00new/RInside/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Loading required package: grDevices
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Loading required package: grDevices
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
Loading required package: grDevices
** testing if installed package keeps a record of temporary installation path
* DONE (RInside)
The downloaded source packages are in
‘/tmp/Rtmp4Tx4Nx/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
>
Then in the small C++ program below (saved in a file called
binsegRcppInside.cpp) I define a main function that calls
rcpp_binseg_normal
, which is a function defined in the C++ code of
the binsegRcpp R
package.
#include <RInside.h>
#include <iostream>
// this is a prototype which tells the compiler the inputs/outputs of rcpp_binseg_normal.
Rcpp::List rcpp_binseg_normal(const Rcpp::NumericVector data_vec, const Rcpp::IntegerVector max_segments);
void run_tests(Rcpp::List result, int n_data){
// test 1: loss values must be decreasing.
Rcpp::NumericVector loss = result["loss"];
std::cout << loss << " loss\n";
for(int i=1; i<n_data; i++){
if(loss[i-1] < loss[i]){
std::cout << "TEST FAILURE: loss increasing!\n";
}
}
// test 2: vector of segment ends should start with n, n/2 when data
// are 0, 1, ..., N
Rcpp::IntegerVector end = result["end"];
std::cout << end << " segment ends\n";
if(end[0]+1 != n_data){
std::cout << "TEST FAILURE: first end should be last data point!\n";
}
if(end[1]+1 != n_data/2){
std::cout << "TEST FAILURE: second end should be middle data point!\n";
}
}
int main(int argc, char *argv[]){
RInside R(argc, argv);
// Create a sample data set to pass to the binary segmentation
// algorithm: 0, 1, ..., n_data.
int n_data = 4;
Rcpp::NumericVector data_vec(n_data);
for(int i=0; i<n_data; i++){
data_vec[i] = i;
}
// Set max segments to the number of data points.
Rcpp::IntegerVector max_segments(1);
max_segments[0] = n_data;
// Run and test binary segmentation algorithm.
Rcpp::List result = rcpp_binseg_normal(data_vec, max_segments);
run_tests(result, n_data);
}
In the code above the main function begins by creating an instance of
the RInside class, which initializes the embedded R interpreter. That
is provided by the #include <RInside>
which also includes functions
from the Rcpp namespace. The next part of the code creates a synthetic
data_vec
which is a NumericVector
of size n_data=4
, and
initializes max_segments
as an IntegerVector
of length 1 with
value same as n_data
. Finally we run the rcpp_binseg_normal
function and then call run_tests
which prints and checks the
results. To get that code to compile we first need to get a copy of
binsegRcpp, using the shell commands:
cd ~/R
git clone https://github.com/tdhock/binsegRcpp.git
R CMD INSTALL binsegRcpp
which gives me the output:
tdhock@maude-MacBookPro:~/R$ git clone https://github.com/tdhock/binsegRcpp.git
Cloning into 'binsegRcpp'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 56 (delta 25), reused 50 (delta 19), pack-reused 0
Unpacking objects: 100% (56/56), done.
tdhock@maude-MacBookPro:~/R$ R CMD INSTALL binsegRcpp
Loading required package: grDevices
* installing to library ‘/home/tdhock/lib/R/library’
* installing *source* package ‘binsegRcpp’ ...
** using staged installation
** libs
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c binseg_normal.cpp -o binseg_normal.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c binseg_normal_cost.cpp -o binseg_normal_cost.o
g++ -std=gnu++11 -I"/home/tdhock/lib/R/include" -DNDEBUG -I"/home/tdhock/lib/R/library/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c rcpp_interface.cpp -o rcpp_interface.o
g++ -std=gnu++11 -shared -L/home/tdhock/lib/R/lib -L/usr/local/lib -o binsegRcpp.so RcppExports.o binseg_normal.o binseg_normal_cost.o rcpp_interface.o -L/home/tdhock/lib/R/lib -lR
installing to /home/tdhock/lib/R/library/00LOCK-binsegRcpp/00new/binsegRcpp/libs
** R
** byte-compile and prepare package for lazy loading
Loading required package: grDevices
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Loading required package: grDevices
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
Loading required package: grDevices
** testing if installed package keeps a record of temporary installation path
* DONE (binsegRcpp)
tdhock@maude-MacBookPro:~/R$
After having done that installation, the binsegRcpp/src
directory
now has compiled object files, which we can use to get the compilation to work:
tdhock@maude-MacBookPro:~/R$ ls binsegRcpp/src/
binseg_normal_cost.cpp binseg_normal_cost.o binseg_normal.h binsegRcpp.so RcppExports.o rcpp_interface.o
binseg_normal_cost.h binseg_normal.cpp binseg_normal.o RcppExports.cpp rcpp_interface.cpp
tdhock@maude-MacBookPro:~/R$
My Makefile for compiling binsegRcppInside.cpp looks like
R_HOME=/home/tdhock/lib/R
COMMON_FLAGS=binsegRcppInside.o -L${R_HOME}/library/RInside/lib -Wl,-rpath=${R_HOME}/library/RInside/lib -L${R_HOME}/lib -Wl,-rpath=${R_HOME}/lib -lR -lRInside
binsegRcppInside: binsegRcppInside.o
g++ -o binsegRcppInside ${COMMON_FLAGS} /home/tdhock/R/binsegRcpp/src/*.o
./binsegRcppInside
binsegRcppLinked: binsegRcppInside.o
g++ -o binsegRcppLinked ${COMMON_FLAGS} ${R_HOME}/library/binsegRcpp/libs/binsegRcpp.so
./binsegRcppLinked
binsegRcppInside.o: binsegRcppInside.cpp
g++ -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include -I${R_HOME}/library/RInside/include binsegRcppInside.cpp -o binsegRcppInside.o -c
Note there are three recipes above. The last one is for creating the binsegRcppInside.o object file. The first two are two different ways to compile that object file into an executable (binsegRcppInside or binsegRcppLinked). The first/Inside version uses the object files (binsegRcpp/src/*.o) whereas the second/Linked version uses the shared library (binsegRcpp.so). The first/Inside version compiles and runs with output that looks like:
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$ rm -f *.o && make -f binsegRcppInside.Makefile binsegRcppInside
g++ -I/home/tdhock/lib/R/include -I/home/tdhock/lib/R/library/Rcpp/include -I/home/tdhock/lib/R/library/RInside/include binsegRcppInside.cpp -o binsegRcppInside.o -c
g++ -o binsegRcppInside binsegRcppInside.o /home/tdhock/R/binsegRcpp/src/*.o -L/home/tdhock/lib/R/library/RInside/lib -Wl,-rpath=/home/tdhock/lib/R/library/RInside/lib -L/home/tdhock/lib/R/lib -Wl,-rpath=/home/tdhock/lib/R/lib -lR -lRInside
./binsegRcppInside
-9 -13 -13.5 -14 loss
3 1 0 2 segment ends
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$
Note that doing the above makes a binary/executable binsegRcppInside which contains copies of the machine code / objects defined in the binsegRcpp C++ code. This method only works if we can get access to the package source code (which should be possible for all CRAN packages as long as we have an internet connection).
Another way to achieve the same result is to link our compiled binary to the binsegRcpp.so shared object file (second/Linked version in Makefile recipes above), via:
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$ rm -f *.o && make -f binsegRcppInside.Makefile binsegRcppLinked
g++ -I/home/tdhock/lib/R/include -I/home/tdhock/lib/R/library/Rcpp/include -I/home/tdhock/lib/R/library/RInside/include binsegRcppInside.cpp -o binsegRcppInside.o -c
g++ -o binsegRcppLinked binsegRcppInside.o -L/home/tdhock/lib/R/library/RInside/lib -Wl,-rpath=/home/tdhock/lib/R/library/RInside/lib -L/home/tdhock/lib/R/lib -Wl,-rpath=/home/tdhock/lib/R/lib -lR -lRInside /home/tdhock/lib/R/library/binsegRcpp/libs/binsegRcpp.so
./binsegRcppLinked
-9 -13 -13.5 -14 loss
3 1 0 2 segment ends
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$
The two methods clearly produce the same result. The difference is only in HOW the result is computed (Linked version uses a shared object whereas Inside version does not) and the size of the resulting executables:
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$ ldd binsegRcppInside|grep lib/R
libR.so => /home/tdhock/lib/R/lib/libR.so (0x00007ffb2a2f1000)
libRInside.so => /home/tdhock/lib/R/library/RInside/lib/libRInside.so (0x00007ffb2a0cd000)
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$ ldd binsegRcppLinked|grep lib/R
/home/tdhock/lib/R/library/binsegRcpp/libs/binsegRcpp.so (0x00007fd0e76af000)
libR.so => /home/tdhock/lib/R/lib/libR.so (0x00007fd0e7026000)
libRInside.so => /home/tdhock/lib/R/library/RInside/lib/libRInside.so (0x00007fd0e6e02000)
tdhock@maude-MacBookPro:~/R/R-3.6.3/tests/Embedding$ du binsegRcppLinked binsegRcppInside
92 binsegRcppLinked
1192 binsegRcppInside
In conclusion, we have shown how to use RInside to embed the R
interpreter into a C++ program with a main function. Our main function
called a C++ function rcpp_binseg_normal
which was defined in an R
package, and then ran some tests on the result. For the RcppDeepState
project we will be doing something similar, but there are two key
differences. First, rather than explicitly defining a data set to use
as input (above we used the data set 0, 1, 2, 3), we will use
DeepState_*
functions that ask fuzz testing libraries to generate
random/learned inputs. Second, rather than explicitly defining a main
function, as in the code above, we will define a DeepState test
harness (without a main function), and the DeepState tools will
automatically generate a main function to use with the fuzz testing
tools.