# Software

##### Free/open-source software for statistical machine learning and data visualization

My main contributions to free/open-source software are R packages that provide implementations of the methods described in my research papers (see below).

### R community

- Since 2012, I have been a co-administrator and mentor for the R project in Google Summer of Code, helping to teach college students all over the world how to write R packages. In recognition of this work, the R Foundation gave me the toby.hocking@r-project.org email address.
- I am president of the organizing committee for “R in Montreal 2018,” a local conference for useRs and developeRs of R.

### PeakSeg

The PeakSeg R packages contain algorithms for inferring optimal segmentation models subject to the constraint that up changes must be followed by down changes, and vice versa. This ensures that the model can be interpreted in terms of peaks (after up changes) and background (after down changes).

- PeakSegDP provides a heuristic quadratic time algorithm for computing models from 1 to S segments for a single sample. This was the original algorithm described in our ICML’15 paper, but it is neither fast nor optimal, so in practice we recommend using our newer packages below instead.
- PeakSegOptimal provides log-linear time algorithms for computing optimal models with multiple peaks for a single sample. arXiv:1703.03352
- PeakSegDisk provides an on-disk implementation of optimal log-linear algorithms for computing multiple peaks in a single sample (same as PeakSegOptimal but works for much larger data sets because disk is used for storage instead of memory). arXiv:1810.00117
- PeakSegJoint provides a fast heuristic algorithm for computing models with a single common peak in 0,…,S samples. arXiv:1506.01286
- PeakSegPipeline provides a pipeline for genome-wide peak calling using PeakSeg. (work in progress)
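
As a rough illustration of the up-down constraint described above, here is a minimal Python sketch. It is not code from the PeakSeg packages: the function name `label_segments` and the representation of a model as a list of segment means are assumptions made for this example. It checks that consecutive changes alternate in direction, and labels each segment after the first as peak (after an up change) or background (after a down change).

```python
def label_segments(segment_means):
    """Label every segment after the first as 'peak' or 'background'.

    Hypothetical helper for illustration only. Raises ValueError when two
    consecutive changes go in the same direction, which would violate the
    up-down constraint.
    """
    labels = []
    previous_direction = None
    for before, after in zip(segment_means, segment_means[1:]):
        if after == before:
            raise ValueError("adjacent segments must have different means")
        direction = "up" if after > before else "down"
        if direction == previous_direction:
            raise ValueError(
                f"two consecutive {direction} changes violate the constraint"
            )
        # Segments after an up change are peaks; after a down change, background.
        labels.append("peak" if direction == "up" else "background")
        previous_direction = direction
    return labels


# Usage: a model with five segments whose changes alternate up, down, up, down.
print(label_segments([1.0, 5.0, 2.0, 7.0, 1.5]))
```

A sequence of means such as `[1, 2, 3]` contains two consecutive up changes, so this sketch rejects it.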

### PeakError

To support our Bioinformatics (2017) paper about a labeling method for supervised peak detection, we created the PeakError R package, which computes the number of incorrect labels for a given set of predicted peaks.
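
To give a feel for what counting incorrect labels means, here is a simplified Python sketch. It is not the PeakError package or its API: the function name `count_incorrect_labels`, the tuple formats, and the restriction to two label types (`"peaks"` and `"noPeaks"`) are assumptions made for this example. A `"noPeaks"` label is counted as incorrect if any predicted peak overlaps it, and a `"peaks"` label is counted as incorrect if no predicted peak overlaps it.

```python
def count_incorrect_labels(peaks, labels):
    """Count labels that disagree with the predicted peaks.

    peaks:  list of (start, end) predicted peak intervals.
    labels: list of (start, end, annotation) regions, where annotation is
            "peaks" or "noPeaks" (an assumed, simplified label set).
    """
    errors = 0
    for label_start, label_end, annotation in labels:
        # Number of predicted peaks that overlap this labeled region.
        overlapping = sum(
            1 for start, end in peaks if start < label_end and label_start < end
        )
        if annotation == "noPeaks":
            if overlapping > 0:
                errors += 1  # false positive: a peak where none was labeled
        elif annotation == "peaks":
            if overlapping == 0:
                errors += 1  # false negative: no peak where one was labeled
        else:
            raise ValueError(f"unknown annotation: {annotation}")
    return errors


# Usage with made-up positions.
predicted = [(10, 20), (50, 60)]
labeled = [
    (0, 5, "noPeaks"),
    (15, 25, "peaks"),
    (30, 40, "peaks"),
    (55, 70, "noPeaks"),
]
print(count_incorrect_labels(predicted, labeled))
```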

### Clusterpath

To support our ICML’11 paper about the “clusterpath,” a convex formulation of hierarchical clustering, we created the clusterpath R package, available on R-Forge.

### rankSVMcompare

To support our paper about a Support Vector Machine (SVM) algorithm for ranking and comparing (in preparation, arXiv:1401.8008), we created the rankSVMcompare R package.

### animint

To support our paper about animated and interactive extensions to the grammar of graphics (in preparation), and our useR2016 tutorial on interactive graphics, we created the animint R package.

### fpop

To support our Statistics and Computing (2016) paper about a functional pruning optimal partitioning algorithm, we created the fpop R package.
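
For readers unfamiliar with optimal partitioning, the following Python sketch shows the plain dynamic programming recursion, which takes quadratic time. It deliberately leaves out functional pruning, so it is not the algorithm implemented in the fpop package. The square loss, the constant `penalty` per changepoint, and the function name `optimal_partitioning` are assumptions chosen for this example.

```python
def optimal_partitioning(data, penalty):
    """Return sorted changepoint positions minimizing square loss + penalty.

    Unpruned quadratic-time recursion, for illustration only.
    A changepoint at position t means a new segment starts at data[t].
    """
    n = len(data)
    # Cumulative sums let us evaluate any segment cost in constant time.
    cum, cum_sq = [0.0], [0.0]
    for x in data:
        cum.append(cum[-1] + x)
        cum_sq.append(cum_sq[-1] + x * x)

    def segment_cost(start, end):
        # Sum of squared deviations from the mean of data[start:end].
        total = cum[end] - cum[start]
        return cum_sq[end] - cum_sq[start] - total * total / (end - start)

    # best_cost[0] = -penalty so that the first segment is not penalized.
    best_cost = [-penalty] + [0.0] * n
    last_change = [0] * (n + 1)
    for end in range(1, n + 1):
        candidates = [
            (best_cost[start] + penalty + segment_cost(start, end), start)
            for start in range(end)
        ]
        best_cost[end], last_change[end] = min(candidates)

    # Backtrack from the end of the data to recover the changepoints.
    changes = []
    end = n
    while end > 0:
        start = last_change[end]
        if start > 0:
            changes.append(start)
        end = start
    return sorted(changes)


# Usage: one clear change in mean between the third and fourth data points.
print(optimal_partitioning([0, 0, 0, 10, 10, 10], penalty=1.0))
```

Larger values of `penalty` yield fewer changepoints; a sufficiently large value yields none.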

### mmit

To support our paper about max margin interval trees (in preparation), we created the mmit R package and Python module.

### penaltyLearning

To support our ICML’13 paper and useR2017 tutorial about learning penalty functions for changepoint detection, we created the penaltyLearning R package.

### iregnet

To support our paper about elastic net regularized interval regression models (in preparation), we created the iregnet R package.

### Directlabels

To support my poster “Adding direct labels to plots,” which won Best Student Poster at useR 2011, we created the directlabels R package.

### inlinedocs

To support our Journal of Statistical Software (2013) paper about documentation generation for R, we created the inlinedocs R package.