To support our paper about gradient-based optimization of the Area Under the Minimum (AUM) of the False Positives and Negatives, I recently wrote some C++ code which does the computation. It uses the std::map container from the Standard Template Library. The container is a key-value store which keeps its elements sorted by key (using a red-black tree internally). In my application the keys are prediction thresholds, and the values are total differences in false positives and false negatives which occur at that threshold (summed over all examples). If there are N thresholds with error differences over all examples, then we need to perform N find/insert operations, each of which is a log-time operation, O(log N). The overall time complexity of the algorithm is therefore log-linear O(N log N).

I had previously implemented this algorithm in pure R code which uses the excellent data.table package. I wanted to compare the new C++ implementation to the R implementation (which uses C code under the hood). So I wrote a vignette which does speed comparisons. I expected that the two algorithms would have the same log-linear asymptotic time complexity. The average timings in seconds for the two methods (penaltyLearning, aum) on several data sizes (N.pred) are shown below, along with the speedup (penaltyLearning/aum).

   N.pred penaltyLearning        aum   speedup
1:     10      0.07699782 0.00006470 1190.0745
2:     31      0.07808582 0.00007602 1027.1747
3:    100      0.11383554 0.00013244  859.5254
4:    316      0.22861260 0.00032450  704.5072
5:   1000      0.59400916 0.00097234  610.9068

I was not surprised to observe that the C++ code (aum) was orders of magnitude faster than the R code (penaltyLearning). That is to be expected due to the overhead of interpreting/evaluating R code. However I was surprised to observe that the speedup consistently decreases as the data size (N.pred) increases.

This can be explained by remembering that R/data.table uses the linear time radix sort, which is asymptotically faster than the C++ STL container. For small data sizes (as above) the log-linear C++ code is faster; for large data sizes (1 million or more?) the R/data.table code may actually be faster.

This begs the question, should we use radix sort instead of the log-linear C++ STL map container? Theoretically yes. However I do not know of any off-the-shelf radix sort C/C++ library function. The problem with radix sort is that the implementation is highly dependent on the data type to sort (here a double). I found an interesting tutorial, Radix Sort Revisited, which explains how to implement a radix sort (for float), but it would introduce substantial complexity to the code. Another option would be to use the radix sort code in R/data.table, but I would prefer a pure C/C++ solution (no dependency on R). Maybe boost::sort::spreadsort::float_sort would do the job? It is “an extremely fast hybrid radix sort algorithm” with best, average, and worst case time complexity of N, N sqrt(Log N), min(N log N, N key_length). For now I’ll settle for log-linear time and the STL for simplicity and portability.