Toby Dylan Hocking's LASSO Blog

Learning Algorithms, Statistical Software, Optimization

A custom DataLoader for mlr3torch

July 24, 2025 — 00:00

Stratified sampling for imbalanced classification
Creating large imbalanced data benchmarks

July 14, 2025 — 00:00

Tutorial with OpenML
Load-balanced parallel machine learning benchmarks

May 30, 2025 — 00:00

A new approach using filelock and batchtools
Debrief May 2025

May 23, 2025 — 00:00

Zurich, Munich, Mons
Parallel machine learning benchmarks

May 20, 2025 — 00:00

A new approach using targets and crew.cluster
New parallel computing frameworks

May 16, 2025 — 00:00

batchtools, clustermq, rush, mirai, crew, targets
Centralized vs de-centralized parallelization

May 15, 2025 — 00:00

Exploring rush
mlr3 tutorials

April 26, 2025 — 00:00

Links to other blogs
Comparing change-point pruning methods using square loss

April 15, 2025 — 00:00

Pruned Exact Linear Time (PELT) and Functional Pruning Optimal Partitioning (FPOP)
Organizing computational research projects

April 11, 2025 — 00:00

Guide for research students
Volcano plots

April 8, 2025 — 00:00

Redesign of SOAK paper results
Torch learning with binary classification

April 3, 2025 — 00:00

Implementing AUM loss in mlr3torch
Creating imbalanced data benchmarks

March 21, 2025 — 00:00

Tutorial with MNIST
Comparing neural network architectures using mlr3torch

March 20, 2025 — 00:00

Convolutional network versus linear model
Comparing pruning methods for optimal partitioning

February 25, 2025 — 00:00

Pruned Exact Linear Time (PELT) and Functional Pruning Optimal Partitioning (FPOP)
Are pipe operations linear or quadratic?

February 19, 2025 — 00:00

A demonstration of atime on mlr3torch
Bike ride map and time series data viz

January 19, 2025 — 00:00

A demonstration of animint2 and sf
Configuring eduroam

January 7, 2025 — 00:00

On cell phones and linux
Benchmarking data.table with polars, duckdb, and pandas

December 2, 2024 — 00:00

Demonstrating advantages of data.table
Implementing optimal partitioning in R

November 7, 2024 — 00:00

Comparison with Pruned Exact Linear Time (PELT)
Cross-validation experiments with torch learners

October 30, 2024 — 00:00

Demonstration of mlr3torch + mlr3resampling
Overhead of auto-grad in torch

October 28, 2024 — 00:00

Comparison with explicit gradients
AUC and AUM in torch

October 10, 2024 — 00:00

Demonstration of auto-grad
Ordinary least squares algorithms

September 17, 2024 — 00:00

Comparing computation time in R
Collaborations not allowed

September 9, 2024 — 00:00

Parsing a web page with regex
Code of conduct / conduite

September 6, 2024 — 00:00

Lecture obligatoire pour participants du labo LASSO / Required reading for LASSO lab participants
Visualizing prediction error

August 30, 2024 — 00:00

And clearly showing differences between algorithms
History of supervised change-point detection

August 20, 2024 — 00:00

Using git bisect to find a survival bug
Generate publications page

August 15, 2024 — 00:00

Parsing bibtex and generating markdown
Writing comprehensible tests

August 14, 2024 — 00:00

Documenting key code magic numbers in animint2 tests
Rust versus Go

August 7, 2024 — 00:00

Similarities and Differences
How reproducible are benchmarks?

August 6, 2024 — 00:00

Comparing atime results on different computers
Collapse reshape benchmark

August 5, 2024 — 00:00

Comparison with data.table
Porting base R regex code to nc

July 29, 2024 — 00:00

Case study with a complex regex
Benchmarking a change in data.table

July 17, 2024 — 00:00

Progress reporting for group by operations
Mammouth tutorial

July 16, 2024 — 00:00

Cluster computing for students at UdeS
Research student application

June 25, 2024 — 00:00

Please read if you want to do research under my supervision
HTML to Markdown

June 24, 2024 — 00:00

Regex for porting my lab web site
Short bio

June 20, 2024 — 00:00

Some text to use for talk introductions
Directions to my office in Sherbrooke

June 20, 2024 — 00:00

With a map in English!
New code for various kinds of cross-validation

April 15, 2024 — 00:00

Cross-validation in R with mlr3
Capturing regular expressions

February 6, 2024 — 00:00

Extracting data from loosely structured text
The importance of hyper-parameter tuning

January 29, 2024 — 00:00

And parallelizing machine learning experiments in R
When is it useful to train with combined subsets?

January 26, 2024 — 00:00

An exploration using cross-validation
Parsing check logs using regular expressions

January 19, 2024 — 00:00

A demonstration of nc R package
Unable to load shared object, Undefined symbol

January 4, 2024 — 00:00

Creating and explaining a linker error
Reshape performance comparison

January 3, 2024 — 00:00

Demonstration of asymptotic timing comparisons
Cross-validation with variable size train sets

December 28, 2023 — 00:00

Determining how many samples are necessary for optimal prediction
Upgrading R arrow

December 20, 2023 — 00:00

More build debugging
Partial matching on data frame row names

December 19, 2023 — 00:00

Comparing efficiency using atime
Interpretable learning algorithms with built-in feature selection

November 30, 2023 — 00:00

Regularized linear model and decision tree
Generalization to new subsets in R

November 29, 2023 — 00:00

Coding non-standard cross-validation
Comparing machine learning frameworks in R

November 28, 2023 — 00:00

for loop, mlr3, tidymodels
data.table CRAN diffs

November 1, 2023 — 00:00

Verifying consistency between CRAN and github
data.table asymptotic timings

October 8, 2023 — 00:00

Motivational figures
Debugging python code in emacs

September 27, 2023 — 00:00

Fixing a bug and building old emacs
Count unique students

August 29, 2023 — 00:00

Regex and data table summarization
Essential emacs key commands

August 28, 2023 — 00:00

Cheat sheet for my students
Splitting an R package

June 26, 2023 — 00:00

Recommendations from experience with spatstat
Installing Rmpi on the cluster

June 17, 2023 — 00:00

This package needs special treatment on compute nodes
Segfault using R arrow

May 2, 2023 — 00:00

Reproducing and fixing an error
Re-building vignettes on windows

May 1, 2023 — 00:00

Fixing mysterious error
Modifying default gcc compilation flags

April 24, 2023 — 00:00

When compiling R packages
Installing Ubuntu on an old Mac

April 7, 2023 — 00:00

Step by step instructions
spack package manager

April 6, 2023 — 00:00

contrast with conda
Checking R package on M1 Mac

April 3, 2023 — 00:00

Web services for R package developers
Comparing asymptotic timings of CSV read/write functions

March 20, 2023 — 00:00

Some surprising differences
Debugging C code

February 8, 2023 — 00:00

valgrind and gdb are essential tools
CRAN Meta-data

February 6, 2023 — 00:00

Backing up MRAN
Cross-validation experiments on the cluster

November 4, 2022 — 00:00

NAU monsoon tutorial
Generalization to new subsets

November 3, 2022 — 00:00

Cross-validation in python
R Package Release History

November 2, 2022 — 00:00

Extracting and plotting data from CRAN web site
Submitting python jobs on monsoon

November 1, 2022 — 00:00

And anaconda setup
Cloud Storage

October 30, 2022 — 00:00

Different options for internal and external sharing
Indirect reverse dependencies

October 19, 2022 — 00:00

Computing the entire graph, and histogram tutorial
GUI for WSL on Windows 10

September 22, 2022 — 00:00

use cygwin instead of vcxsrv
Reformatting NEWS files

September 15, 2022 — 00:00

Regular expression example
R packages on github

August 17, 2022 — 00:00

How to query CRAN meta-data
Positive and negative log transform

July 14, 2022 — 00:00

Non-linear transformations for heat maps and signed p-values
Research Mentorship Plan

June 17, 2022 — 00:00

Required reading for potential students
Historical reverse imports

May 13, 2022 — 00:00

Analysis of R package usage over time
Learning with Area Under the Min

April 19, 2022 — 00:00

How to use torch with a non-standard loss
Torch randomness

March 7, 2022 — 00:00

Reproducible neural network learning
No argument unpacking in C

March 7, 2022 — 00:00

But there is in R and Python
AUM in Torch

February 21, 2022 — 00:00

Auto-grad of a non-differentiable loss function
Erdos number

February 18, 2022 — 00:00

A distance calculator
Plotting the probability simplex

February 15, 2022 — 00:00

An application of matrix inversion
Link-time optimization

January 25, 2022 — 00:00

Fixing warnings from CRAN checks
Finding symbols in object files

January 18, 2022 — 00:00

Using objdump to find cerr
Ten years of R project in Google Summer of Code

September 3, 2021 — 00:00

Some success stories from my participation
Simple methods for defining small data by row

July 24, 2021 — 00:00

Comparison with base R and tribble
The C book

June 29, 2021 — 00:00

Documentation of stringize macros
Evidential machine learning

June 25, 2021 — 00:00

An alternative to probability
Stress testing reshape operations on list columns

June 23, 2021 — 00:00

Advantages of updated data.table::melt
Defining data by row and regex by sub-pattern

May 29, 2021 — 00:00

Avoiding separation of related concepts in code
Update about data reshaping and visualization in R and python

May 28, 2021 — 00:00

data.table, tidyr, nc, pandas, datatable, plotnine, altair, bokeh
Convex clustering theory

May 19, 2021 — 00:00

Recent results on trees and cluster shapes
R packages that depend on system libraries

May 18, 2021 — 00:00

How to pass CRAN checks
The UCR Time Series Archive

March 23, 2021 — 00:00

A benchmark for classification algorithms
Multi-threaded sorting

March 2, 2021 — 00:00

Thread safety of qsort variants
Faster AUM computation?

February 15, 2021 — 00:00

Log-linear C++ STL containers vs linear time radix sort
New ideas for classification

February 12, 2021 — 00:00

Weston-Watkins multiclass SVM and AUC optimization
Emulating the python interactive console

January 27, 2021 — 00:00

My hack using the code module
New packages for data storage and reshaping

October 23, 2020 — 00:00

tidyfast, tidyfst, fst, arrow, feather, parquet
Computing K-means train/validation error

September 18, 2020 — 00:00

Alternatives to for loops in R
Parsing CRAN maintainers

September 14, 2020 — 00:00

Regular expressions using nc R package
emacspeak

May 11, 2020 — 00:00

Teaching my son to type in emacs
Random train/validation/test assignment

April 22, 2020 — 00:00

Different methods tried by my students
C/C++ completion in emacs

April 16, 2020 — 00:00

Configuration details
Data manipulation libraries

April 9, 2020 — 00:00

Translating between data.table, pandas, dplyr
Custom evaluation metrics in TensorFlow

April 7, 2020 — 00:00

Implementing the exact area under the ROC curve
Fast parameter exploration

April 3, 2020 — 00:00

Caching and parallel execution
binsegRcpp inside a C++ program

March 31, 2020 — 00:00

Embedding Rcpp code into a main function
Embedding R

March 30, 2020 — 00:00

Compiling a program that links to R
R batchtools on Monsoon

February 19, 2020 — 00:00

Cluster computing tutorial for NAU students
Arizona time

January 28, 2020 — 00:00

Why does the internet tell people the wrong time?
Emacs local variables

January 10, 2020 — 00:00

Custom configurations for R
Ubuntu setup and LaTeX debugging

January 7, 2020 — 00:00

Installing and configuring a 10 year old Mac
X forwarding on windows

December 13, 2019 — 00:00

Installing and configuring cygwin
Scientific poster suggestions

December 2, 2019 — 00:00

A helpful video
Tinyverse

October 12, 2019 — 00:00

Complex software dependencies considered harmful
useR 2019 debrief

August 5, 2019 — 00:00

Interesting talks I saw in Toulouse
R in Docker on Mac

July 29, 2019 — 00:00

Reproducing valgrind messages using an R-hub image
R package installation on windows considered harmful

May 29, 2019 — 00:00

Warning for unsuccessful DLL copy should be an error
tikzDevice on windows

May 13, 2019 — 00:00

Fixing missing packages
future.batchtools

February 6, 2019 — 00:00

Simple parallel R code on a computer cluster
OpenMP

January 28, 2019 — 00:00

Simple parallel for loops in C++
gdb with R

January 25, 2019 — 00:00

how to find line numbers of assertion errors
Eigen and UNDEBUG

January 22, 2019 — 00:00

Turning on runtime assertion errors for compiled code in R packages
survivalsvm

January 18, 2019 — 00:00

Support vector machine for survival analysis
Testing PeakSegPipeline on Travis with SLURM

November 15, 2018 — 00:00

Also batchtools and texinfo
Tweet when donation received

October 19, 2018 — 00:00

My first google script
Setting default web browser in LXDE

October 17, 2018 — 00:00

Need to create a .desktop file
Keyboard remapping on windows

August 27, 2018 — 00:00

Changing caps lock to control on windows
Training Benchmark for Deep neural networks

April 19, 2018 — 00:00

A new benchmark data set for neural network training
The biglasso package

March 14, 2018 — 00:00

An on-disk implementation for huge data
True reproducibility in R

December 6, 2017 — 00:00

The switchr package and manifests
Seasonal temperature variations where I have lived

November 25, 2017 — 00:00

Using R to download and plot temperature data from wikipedia
Loon

November 17, 2017 — 00:00
What Science Is

October 20, 2017 — 00:00
Compiling R

September 18, 2017 — 00:00
R-GSOC-2017

September 6, 2017 — 00:00

R project in Google Summer of Code 2017
useR! 2017 debrief

August 29, 2017 — 00:00

summary of interesting work I saw in Brussels
Combining data tables in R

August 17, 2017 — 00:00

rbind inside the for loop is much slower than outside
new web site

August 10, 2017 — 00:00

now more complete and informative
