My main contributions to free/open-source software are R packages that provide implementations of the methods described in my research papers (see below).

R community

  • Since 2012, I have been a co-administrator and mentor for the R project in Google Summer of Code, helping to teach college students all over the world how to write R packages. Because of this work, the R Foundation gave me the email address.
  • I was president of the organizing committee for “R in Montreal 2018,” a local conference for useRs and developeRs of R.
  • I am an editor for the Journal of Statistical Software.


The PeakSeg R packages contain algorithms for inferring optimal segmentation models subject to the constraint that up changes must be followed by down changes, and vice versa. This ensures that the model can be interpreted in terms of peaks (after up changes) and background (after down changes).

  • PeakSegDP provides a heuristic quadratic time algorithm for computing models from 1 to S segments for a single sample. This was the original algorithm described in our ICML’15 paper, but it is neither fast nor optimal, so in practice we recommend using the newer packages below instead.
  • PeakSegOptimal provides log-linear time algorithms for computing optimal models with multiple peaks for a single sample. arXiv:1703.03352
  • PeakSegDisk provides an on-disk implementation of optimal log-linear algorithms for computing multiple peaks in a single sample (same as PeakSegOptimal but works for much larger data sets because disk is used for storage instead of memory). arXiv:1810.00117
  • PeakSegJoint provides a fast heuristic algorithm for computing models with a single common peak in 0,…,S samples. arXiv:1506.01286
  • PeakSegPipeline provides a supervised machine learning pipeline for genome-wide peak calling in multiple samples and cell types, as described in our PSB’20 paper.
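
The up-down constraint described above can be illustrated with a short sketch. It is written in plain Python for illustration only and is not the API of any PeakSeg package: the function names are hypothetical, and the input is assumed to be the list of segment means of an already-computed segmentation. The sketch checks that every change is followed by a change in the opposite direction, and reports the segments that follow an up change (the peaks).

```python
def change_signs(segment_means):
    """Sign of each change between adjacent segments: +1 up, -1 down, 0 none."""
    signs = []
    for before, after in zip(segment_means, segment_means[1:]):
        if after > before:
            signs.append(1)
        elif after < before:
            signs.append(-1)
        else:
            signs.append(0)
    return signs


def satisfies_up_down(segment_means):
    """True if every change is non-zero and adjacent changes have opposite signs."""
    signs = change_signs(segment_means)
    if any(s == 0 for s in signs):
        return False
    return all(a == -b for a, b in zip(signs, signs[1:]))


def peak_segments(segment_means):
    """0-based indices of the segments that come right after an up change."""
    return [i + 1 for i, s in enumerate(change_signs(segment_means)) if s == 1]


# Hypothetical segment means: background, peak, background, peak, background.
means = [1.0, 5.0, 2.0, 7.0, 3.0]
print(satisfies_up_down(means))
print(peak_segments(means))
```

A model with two consecutive up changes, such as segment means 1, 3, 5, 2, fails the check because the middle segment could not be labeled as either peak or background.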


To support our Bioinformatics (2017) paper about a labeling method for supervised peak detection, we created the R package PeakError, which computes the number of incorrect labels for a given set of predicted peaks.
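
As a rough illustration of what counting incorrect labels can look like, here is a minimal Python sketch. It is not the PeakError API. The function name, the tuple layout, and the interval boundary conventions are assumptions made for this example. The four label types (noPeaks, peaks, peakStart, peakEnd) and their rules are assumed here as follows: a noPeaks label is wrong if any predicted peak overlaps it, a peaks label is wrong if no predicted peak overlaps it, and a peakStart or peakEnd label is wrong unless exactly one predicted peak starts or ends inside it.

```python
def count_label_errors(peaks, labels):
    """Count labels that disagree with the predicted peaks.

    peaks:  list of (start, end) predicted peaks.
    labels: list of (start, end, annotation) labeled regions.
    Returns a dict with false positive, false negative, and total error counts.
    """
    false_pos = 0
    false_neg = 0
    for l_start, l_end, annotation in labels:
        if annotation in ("noPeaks", "peaks"):
            # Count predicted peaks that overlap the labeled region.
            n = sum(1 for p_start, p_end in peaks
                    if p_start < l_end and l_start < p_end)
            if annotation == "noPeaks":
                false_pos += int(n > 0)
            else:
                false_neg += int(n == 0)
        elif annotation in ("peakStart", "peakEnd"):
            # Count predicted peak starts (or ends) inside the labeled region.
            if annotation == "peakStart":
                n = sum(1 for p_start, _ in peaks if l_start <= p_start < l_end)
            else:
                n = sum(1 for _, p_end in peaks if l_start < p_end <= l_end)
            false_neg += int(n == 0)
            false_pos += int(n > 1)
        else:
            raise ValueError(f"unknown annotation: {annotation}")
    return {"fp": false_pos, "fn": false_neg, "errors": false_pos + false_neg}


# Hypothetical predictions and labels on one chromosome.
predicted = [(10, 20), (50, 60)]
regions = [
    (0, 5, "noPeaks"),     # correct: no peak overlaps
    (8, 12, "peakStart"),  # correct: exactly one peak starts here
    (18, 22, "peakEnd"),   # correct: exactly one peak ends here
    (30, 40, "peaks"),     # false negative: no peak overlaps
    (45, 65, "noPeaks"),   # false positive: a peak overlaps
]
print(count_label_errors(predicted, regions))
```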


To support our ICML’11 paper about the “clusterpath,” a convex formulation of hierarchical clustering, we created the clusterpath R package, available on R-Forge.


To support our paper about a Support Vector Machine (SVM) algorithm for ranking and comparing (in preparation, arXiv:1401.8008), we created the rankSVMcompare R package.


To support our JCGS paper about animated and interactive extensions to the grammar of graphics, and our useR 2016 tutorial on interactive graphics, we created the animint R package.


To support our Statistics and Computing (2016) paper about a functional pruning optimal partitioning algorithm, we created the fpop R package.


To support our NeurIPS’17 paper about max margin interval trees, we created the mmit R package and Python module.


To support our ICML’13 paper and useR 2017 tutorial about learning penalty functions for changepoint detection, we created the penaltyLearning R package.


To support our paper about elastic net regularized interval regression models (in preparation), we created the iregnet R package.


To support my poster “Adding direct labels to plots,” which won Best Student Poster at useR 2011, we created the directlabels R package.


To support our Journal of Statistical Software (2013) paper about documentation generation for R, we created the inlinedocs R package.


To support our R Journal submission about R packages for regular expressions, we created the namedCapture R package.


To support our R Journal submission about data reshaping using regular expressions, we created nc, which is a redesign of our previous namedCapture package.