My main contributions to free/open-source software are R packages that provide implementations of the methods described in my research papers (see below).

R and statistical software research community

xgboost: Extreme Gradient Boosting

To support our paper about Survival Regression with Accelerated Failure Time Model in XGBoost (Journal of Computational and Graphical Statistics, 2022), we created the AFT objectives in xgboost (Documentation, Video).
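As a rough illustration of what an AFT objective computes, here is a minimal pure-Python sketch of the negative log-likelihood for one observation. It assumes the model ln(T) = prediction + sigma * Z with Z standard normal; the function names (`aft_negative_log_likelihood`, `normal_cdf`, `normal_pdf`) and argument names are hypothetical and are not the xgboost API or its implementation.

```python
import math

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def aft_negative_log_likelihood(prediction, lower, upper, sigma=1.0):
    """Loss for one observation whose survival time lies in [lower, upper].

    prediction is the predicted log survival time (assumed model:
    ln(T) = prediction + sigma * Z, with Z standard normal).
    lower == upper means uncensored; upper == math.inf means right-censored.
    """
    if lower == upper:
        # Uncensored: minus the log density of T evaluated at the observed time.
        z = (math.log(lower) - prediction) / sigma
        return -math.log(normal_pdf(z) / (sigma * lower))

    def time_cdf(t):
        # P(T <= t) under the assumed log-normal model.
        if t <= 0:
            return 0.0
        if t == math.inf:
            return 1.0
        return normal_cdf((math.log(t) - prediction) / sigma)

    # Censored: minus the log probability that T falls in the interval.
    return -math.log(time_cdf(upper) - time_cdf(lower))
```

For example, `aft_negative_log_likelihood(0.0, 1.0, math.inf)` is the loss for a right-censored observation known only to exceed time 1 when the predicted log time is 0.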

aum: Area Under Min(FP,FN)

To support our paper (in progress) about optimizing the Area Under Min(FP,FN) function (a differentiable surrogate for ROC-AUC), we created the aum R package, which has a C++ implementation of directional derivatives, and a Python torch function which can be used with automatic differentiation.
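To make the Area Under Min(FP,FN) idea concrete, here is a minimal pure-Python sketch for binary classification. It assumes unnormalized error counts and that an example is predicted positive when its score plus a constant c is greater than zero; the function name `aum_binary` is hypothetical, and this is neither the aum package's C++ code nor its torch function.

```python
def aum_binary(scores, labels):
    """Area under min(FP, FN), viewed as a function of a constant c
    that is added to every score before thresholding at zero.

    scores: real-valued predictions; labels: 1 for positive, 0 for negative.
    """
    # Each example flips from predicted negative to predicted positive at c = -score.
    breakpoints = sorted((-score, label) for score, label in zip(scores, labels))
    false_positives = 0  # as c -> -infinity, nothing is predicted positive
    false_negatives = sum(1 for label in labels if label == 1)
    area = 0.0
    previous_c = None
    for c, label in breakpoints:
        if previous_c is not None:
            # FP and FN are constant between consecutive breakpoints.
            area += min(false_positives, false_negatives) * (c - previous_c)
        if label == 1:
            false_negatives -= 1  # this positive is now predicted correctly
        else:
            false_positives += 1  # this negative is now predicted incorrectly
        previous_c = c
    return area
```

In this sketch the area is zero when every positive example has a larger score than every negative example, and it grows with the size of the gaps between wrongly ordered scores.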

SPARSEMODr: SPAtial Resolution-SEnsitive Models of Outbreak Dynamics

To support our paper about infectious disease modeling, we created the SPARSEMODr R package. Preprint medRxiv, Biology Methods & Protocols (2022).

RcppDeepState: fuzz testing compiled code in R packages

To support our R Consortium-funded project about fuzz testing C++ functions in R packages that use Rcpp, we created the RcppDeepState R package and GitHub Action.

LOPART: Labeled Optimal Partitioning

To support our Computational Statistics (2022) paper about Labeled Optimal Partitioning, we created the LOPART R package.

gfpop: Graph-constrained Functional Pruning Optimal Partitioning

To support our paper about graph-constrained optimal changepoint detection, we created the gfpop and gfpopgui R packages. arXiv:2002.03646

PeakSeg: up-down constrained changepoint detection

The PeakSeg R packages contain algorithms for inferring optimal segmentation models subject to the constraint that up changes must be followed by down changes, and vice versa. This ensures that the model can be interpreted in terms of peaks (after up changes) and background (after down changes).

  • PeakSegDP provides a heuristic quadratic time algorithm for computing models from 1 to S segments for a single sample. This was the original algorithm described in our ICML’15 paper, but it is neither fast nor optimal, so in practice we recommend using the newer packages below instead.
  • PeakSegOptimal provides log-linear time algorithms for computing optimal models with multiple peaks for a single sample, to support our JMLR’20 paper.
  • PeakSegDisk provides an on-disk implementation of optimal log-linear algorithms for computing multiple peaks in a single sample (same as PeakSegOptimal but works for much larger data sets because disk is used for storage instead of memory). Journal of Statistical Software Vol. 101, Issue 10.
  • PeakSegJoint provides a fast heuristic algorithm for computing models with a single common peak in 0,…,S samples. arXiv:1506.01286.
  • PeakSegPipeline provides a supervised machine learning pipeline for genome-wide peak calling in multiple samples and cell types, as described in our PSB’20 paper.
  • FLOPART, Functional Labeled Optimal Partitioning, provides a supervised peak detection algorithm with label constraints. (paper in progress)
  • CROCS supports our BMC Bioinformatics (2021) paper, and provides an interface to various peak detection models as well as an implementation of our proposed algorithm, Changepoints for a Range Of ComplexitieS.
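The up-down constraint described above can be illustrated with a minimal Python sketch that checks a sequence of segment means and reports which segments would be interpreted as peaks. The function names are hypothetical and this only checks the constraint on a given model; it does not implement any of the PeakSeg inference algorithms.

```python
def change_directions(segment_means):
    """Return +1 for an up change, -1 for a down change, 0 for no change,
    for each pair of adjacent segment means."""
    return [
        1 if after > before else -1 if after < before else 0
        for before, after in zip(segment_means, segment_means[1:])
    ]

def satisfies_up_down(segment_means):
    """True if every up change is followed by a down change, and vice versa."""
    directions = change_directions(segment_means)
    if 0 in directions:
        return False  # adjacent segments with equal means are not a change
    return all(a == -b for a, b in zip(directions, directions[1:]))

def peak_segments(segment_means):
    """Indices of segments that follow an up change (interpreted as peaks)."""
    return [
        index + 1
        for index, direction in enumerate(change_directions(segment_means))
        if direction == 1
    ]
```

For instance, the means `[1, 5, 2, 7, 1]` satisfy the constraint, with segments 1 and 3 (zero-based) being peaks, whereas `[1, 5, 6, 2]` has two consecutive up changes and does not.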

PeakError: label error computation for peak models

To support our Bioinformatics (2017) paper about a labeling method for supervised peak detection, we created the R package PeakError which computes the number of incorrect labels for a given set of predicted peaks.
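A simplified Python sketch of this kind of label error computation follows. It assumes only two label types, "peaks" (at least one predicted peak should overlap the region) and "noPeaks" (no predicted peak should overlap the region), and half-open intervals; the function names are hypothetical, and the sketch is not the PeakError package's interface.

```python
def overlaps(peak, region):
    """True if two half-open intervals (start, end) share at least one position."""
    return peak[0] < region[1] and region[0] < peak[1]

def count_label_errors(predicted_peaks, labels):
    """Count incorrect labels for a set of predicted peaks.

    predicted_peaks: list of (start, end) tuples.
    labels: list of (start, end, annotation) tuples, where annotation is
    "peaks" or "noPeaks" (a simplifying assumption of this sketch).
    """
    false_positives = 0
    false_negatives = 0
    for start, end, annotation in labels:
        n_overlapping = sum(overlaps(peak, (start, end)) for peak in predicted_peaks)
        if annotation == "noPeaks" and n_overlapping > 0:
            false_positives += 1  # a peak was predicted where none should be
        elif annotation == "peaks" and n_overlapping == 0:
            false_negatives += 1  # no peak was predicted where one should be
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "errors": false_positives + false_negatives,
    }
```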

clusterpath: convex clustering

To support our ICML’11 paper about the “clusterpath,” a convex formulation of hierarchical clustering, we created the clusterpath R package, available on R-Forge.

rankSVMcompare: support vector machines for ranking and comparing

To support our paper about a Support Vector Machine (SVM) algorithm for ranking and comparing (in preparation, arXiv:1401.8008), we created the rankSVMcompare R package.

animint: animated interactive grammar of graphics

To support our JCGS paper about animated and interactive extensions to the grammar of graphics, and our useR2016 tutorial on interactive graphics, we created the animint R package. The more recent version is animint2.

fpop: functional pruning optimal partitioning

To support our Statistics and Computing (2016) paper about a functional pruning optimal partitioning algorithm, we created the fpop R package.

mmit: max margin interval trees

To support our NeurIPS’17 paper about max margin interval trees, we created the mmit R package and Python module.

penaltyLearning: supervised changepoint detection

To support our ICML’13 paper, useR2017 tutorial, and JCGS’21 paper about learning penalty functions for changepoint detection, we created the penaltyLearning R package.

iregnet: elastic net regularized interval regression

To support our paper about elastic net regularized interval regression models (in preparation), we created the iregnet R package.

binsegRcpp: binary segmentation

To provide an efficient baseline implementation of binary segmentation for use in various papers, such as Labeled Optimal Partitioning and Linear time model selection, we created the binsegRcpp R package.
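For readers unfamiliar with binary segmentation, here is a minimal Python sketch for detecting changes in mean with the square loss. It greedily splits whichever segment gives the largest decrease in squared error. The function names are hypothetical, and for simplicity the sketch recomputes candidate splits at every iteration, so it is not the efficient algorithm implemented in binsegRcpp.

```python
from itertools import accumulate

def best_split(cumsum, start, end):
    """Best single changepoint in data[start:end].

    Returns (loss_decrease, split_index); split_index is None when the
    segment has fewer than two data points.
    """
    total = cumsum[end] - cumsum[start]
    best_decrease, best_index = 0.0, None
    for t in range(start + 1, end):
        left = cumsum[t] - cumsum[start]
        right = total - left
        # Decrease in squared error from replacing one mean with two means.
        decrease = (
            left * left / (t - start)
            + right * right / (end - t)
            - total * total / (end - start)
        )
        if best_index is None or decrease > best_decrease:
            best_decrease, best_index = decrease, t
    return best_decrease, best_index

def binary_segmentation(data, max_segments):
    """Return sorted changepoint indices for a model with up to max_segments segments."""
    cumsum = [0.0] + list(accumulate(data))
    segments = [(0, len(data))]
    changepoints = []
    while len(segments) < max_segments:
        candidates = []
        for segment in segments:
            decrease, index = best_split(cumsum, *segment)
            if index is not None:
                candidates.append((decrease, index, segment))
        if not candidates:
            break  # every segment has a single data point
        _, index, (start, end) = max(candidates)
        segments.remove((start, end))
        segments += [(start, index), (index, end)]
        changepoints.append(index)
    return sorted(changepoints)
```

Each returned index is the position of the first data point of a new segment.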

directlabels: automatic label placement on figures

To support my poster “Adding direct labels to plots” which won Best Student Poster at useR 2011, we created the directlabels R package.

inlinedocs: documentation generation

To support our Journal of Statistical Software (2013) paper about documentation generation for R, we created the inlinedocs R package.

namedCapture: regular expressions for text parsing

To support our R Journal paper about R packages for regular expressions, we created the namedCapture R package, and provided various contributions to base R:

nc: named capture regular expressions for text parsing and data reshaping

To support our R Journal paper about data reshaping using regular expressions, we created the nc R package. To get a more efficient and fully-featured implementation of data reshaping, we contributed C code and the new measure function to the data.table package (since version 1.14.1 in 2021).

Python pandas str.extractall method for regular expressions

I wrote the str.extractall method for regular expressions, which was merged into pandas in 2016.
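A short usage sketch of `str.extractall` with a named capture group follows; the example series, its index, and the pattern are illustrative only. It requires pandas to be installed.

```python
import pandas as pd

# Each subject string may contain zero, one, or several matches.
subjects = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])

# The named group "digit" becomes a column of the result; every match gets
# its own row, indexed by the original label plus a "match" counter.
matches = subjects.str.extractall(r"[ab](?P<digit>\d)")
print(matches)
```

Subjects with no match (here "c1") contribute no rows, in contrast to `str.extract`, which returns only the first match for each subject.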