Toby Dylan Hocking | Simple methods for defining small data by row

It is often useful to define a small data frame literally in R code, and we show a simple way to do this in base R.

A few examples

In our recent paper about spatially explicit stochastic disease models (currently in peer review), there is some R code that defines time windows. It was my suggestion to present the data/code as follows, with each time window on a line:

one.window <- function(start, end, r0) data.frame(start, end, r0)
library(lubridate)
(time.window.args <- rbind(# Specify the components of 5 time windows
  one.window(mdy("1-1-20"),mdy("1-31-20"),3.0),
  one.window(mdy("2-1-20"),mdy("2-15-20"),0.8),
  one.window(mdy("2-16-20"),mdy("3-10-20"),0.8),
  one.window(mdy("3-11-20"),mdy("3-21-20"),1.4),
  one.window(mdy("3-22-20"),mdy("5-1-20"),1.4)))

##        start        end  r0
## 1 2020-01-01 2020-01-31 3.0
## 2 2020-02-01 2020-02-15 0.8
## 3 2020-02-16 2020-03-10 0.8
## 4 2020-03-11 2020-03-21 1.4
## 5 2020-03-22 2020-05-01 1.4

Another example comes from code to make a figure showing label errors in an upcoming paper about a Functional Labeled Optimal Partitioning (FLOPART) algorithm.

lab <- function(chromStart, chromEnd, annotation){
  data.frame(chrom="chr11", chromStart, chromEnd, annotation)
}
(new.labels <- rbind(
  lab(100000, 200000, "noPeaks"),
  lab(206000, 207000, "peakStart"),
  lab(208000, 220000, "peakEnd"),
  lab(300000, 308250, "peakStart"),
  lab(308260, 320000, "peakEnd")))

##   chrom chromStart chromEnd annotation
## 1 chr11     100000   200000    noPeaks
## 2 chr11     206000   207000  peakStart
## 3 chr11     208000   220000    peakEnd
## 4 chr11     300000   308250  peakStart
## 5 chr11     308260   320000    peakEnd

A third example comes from code to make a timings figure for our upcoming paper about gradient-based optimization of the Area Under the Minimum (AUM) of false positive and false negative functions.

finfo <- function(Problem, file.csv, col.name, col.value){
  data.frame(Problem, file.csv, col.name, col.value)
}
(csv.file.info <- rbind(
  finfo("Changepoint detection","figure-aum-grad-speed-data.csv","pred.type","pred.rnorm"),
  finfo("Binary classification","figure-aum-grad-speed-binary-cpp-data.csv","prediction.order","unsorted")))

##                 Problem                                  file.csv         col.name  col.value
## 1 Changepoint detection            figure-aum-grad-speed-data.csv        pred.type pred.rnorm
## 2 Binary classification figure-aum-grad-speed-binary-cpp-data.csv prediction.order   unsorted

Remove repetition

In the code above we need to write the column names twice: once in the function arguments, and again in the function body. How can we remove this repetition? We can use R meta-programming, as in the function below:

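row_fun <- function(...){
  # Capture the arguments unevaluated, dropping the function name.
  form.list <- as.list(match.call()[-1])
  # Bare symbols such as start are columns without a default value.
  sym <- sapply(form.list, is.symbol)
  # Use each symbol as the argument name, with NA as its default.
  names(form.list)[sym] <- form.list[sym]
  form.list[sym] <- NA
  # Build a function with these arguments and defaults.
  make_row <- function(){}
  formals(make_row) <- form.list
  # Its body is the call data.frame(arg1, arg2, ...).
  symbol.names <- c("data.frame", names(form.list))
  body(make_row) <- as.call(lapply(symbol.names, as.symbol))
  make_row
}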

The function above creates and returns a function which outputs a data frame:

(win <- row_fun(start, end, r0))

## function (start = NA, end = NA, r0 = NA) 
## data.frame(start, end, r0)
## <environment: 0x9f39120>

win(1, 2, 3)

##   start end r0
## 1     1   2  3
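
To see the pieces that row_fun relies on, here is a minimal sketch in base R. The names show_args and by_hand are hypothetical and used only for illustration. The first part shows what match.call captures, and the second part builds a function by assigning its formals and body, the same way make_row is built.

```r
# show_args is a hypothetical helper: it returns its arguments unevaluated.
show_args <- function(...) as.list(match.call()[-1])
arg.list <- show_args(start, end, r0, chrom="chr11")
# TRUE for bare symbols (start, end, r0), FALSE for the named default (chrom).
is.sym <- sapply(arg.list, is.symbol)
print(is.sym)

# by_hand is a hypothetical function, built the same way as make_row.
by_hand <- function(){}
formals(by_hand) <- list(x=NA, y=NA)
body(by_hand) <- as.call(lapply(c("data.frame", "x", "y"), as.symbol))
print(by_hand)        # shows the generated arguments and body
print(by_hand(1, 2))  # one row with columns x and y
print(by_hand(1))     # y is not given, so its default NA is used
```

Because the arguments are never evaluated, the symbols start, end and r0 do not need to exist as variables when show_args (or row_fun) is called.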

This may be useful if you want to define several different tables with the same column names:

lab <- row_fun(chromStart, chromEnd, annotation, chrom="chr11") 
(fig1.labels <- rbind(
  lab(100, 200, "noPeaks"),
  lab(300, 350, "peakStart")))

##   chromStart chromEnd annotation chrom
## 1        100      200    noPeaks chr11
## 2        300      350  peakStart chr11

(fig2.labels <- rbind(
  lab(100, 150, "noPeaks"),
  lab(200, 250, "peakEnd")))

##   chromStart chromEnd annotation chrom
## 1        100      150    noPeaks chr11
## 2        200      250    peakEnd chr11

The code above does not have the repetition of column names, but it does require repeating the row-making function name, lab.
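
Since the columns are ordinary function arguments with default values, a row can also override a default by name, or leave out a trailing column. The sketch below illustrates this. The table name fig3.labels and the value "chr12" are made up for this example, and row_fun is copied from above only so that the block runs on its own.

```r
# row_fun is copied from the text above so that this block is self-contained.
row_fun <- function(...){
  form.list <- as.list(match.call()[-1])
  sym <- sapply(form.list, is.symbol)
  names(form.list)[sym] <- form.list[sym]
  form.list[sym] <- NA
  make_row <- function(){}
  formals(make_row) <- form.list
  symbol.names <- c("data.frame", names(form.list))
  body(make_row) <- as.call(lapply(symbol.names, as.symbol))
  make_row
}
lab <- row_fun(chromStart, chromEnd, annotation, chrom="chr11")
# fig3.labels and "chr12" are hypothetical, for illustration only.
(fig3.labels <- rbind(
  lab(100, 200, "noPeaks"),                  # chrom takes its default, chr11
  lab(300, 350, "peakStart", chrom="chr12"), # default overridden by name
  lab(400, 450)))                            # annotation omitted, so it is NA
```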

Comparison with tribble

There is a similar function, tribble, in the tibble package:

(fig1.labels <- tibble::tribble(
  ~chromStart, ~chromEnd, ~annotation, ~chrom,
  100, 200, "noPeaks", "chr11",
  300, 350, "peakStart", "chr11"))

## # A tibble: 2 x 4
##   chromStart chromEnd annotation chrom
##        <dbl>    <dbl> <chr>      <chr>
## 1        100      200 noPeaks    chr11
## 2        300      350 peakStart  chr11

(fig2.labels <- tibble::tribble(
  ~chromStart, ~chromEnd, ~annotation, ~chrom,
  100, 150, "noPeaks", "chr11",
  200, 250, "peakEnd", "chr11"))

## # A tibble: 2 x 4
##   chromStart chromEnd annotation chrom
##        <dbl>    <dbl> <chr>      <chr>
## 1        100      150 noPeaks    chr11
## 2        200      250 peakEnd    chr11

The code above requires repeating the column names for each table, and it does not allow for a simple definition of a default column value.