C++ STL map versus sort

To support a new research paper I recently implemented efficient C and C++ codes for computing Area Under the Minimum (AUM) of False Positives and False Negatives. The first function I coded was aum_map, which uses the C++ Standard Template Library (STL) map object, which keeps unique thresholds in sorted order.

Is this a good use of the STL map object? In general I recommend using the STL red-black tree objects (map, set, etc) when elements need to be kept sorted in a particular order, and the most extreme element needs to be found after every insert. Thus the AUM computation is an example where the STL map is overkill – in this case we only need to query the sorted order of the data structure after all elements have been inserted (not after intermediate inserts). The STL map would be better suited to an application where the map needs to be queried after every insert, such as in binary segmentation.

In the case of AUM computation we could instead use simpler data structures (arrays) and one call to a sort function. So to pursue this idea, which I presumed should be faster, I coded another version, aum_sort. Originally I coded it using std::sort in C++ using an anonymous function,

#include <algorithm>//std::sort
#include <vector>
double *out_thresh;
int err_N;
std::vector<int> out_indices(err_N);
std::sort(out_indices.begin(), out_indices.end(),
  [&out_thresh](int left, int right){
    return out_thresh[left] < out_thresh[right];
  }
);

Using std::sort in this way, we specify the begin/end as the first two arguments, and then we specify a comparison function as the third argument. The anonymous comparison function gets access to the out_thresh variable by reference via the [&out_thresh] syntax, which is very convenient!

From C++ to C, compiler/doc issues with qsort variants

Then I began to try to simplify so that the code could compile as standard C – the idea would be a more portable code (would not require C++ compiler). Just use qsort from the C standard library instead of std::sort from C++, simple, right? Actually it is a bit more complicated.

On my Ubuntu laptop there was a linux man page for qsort_r – the r is for re-entrant, meaning that it should be thread safe. Its prototype is

void qsort_r(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *, void *),
           void *arg);

The first three arguments are analogous to the first two in std::sort (specify begin/end of array along with size of each element). The fourth argument is a comparison function, and the fifth argument is a pointer to some data which is passed as the third argument of the comparison function. For example in my code I used the following comparison function and qsort_r call,

int compare_indices(const void *left, const void *right, void *ptr){
  double *out_thresh = (double*)ptr;
  return out_thresh[* (int*)left] > out_thresh[* (int*)right];
}
int aum_sort(int err_N, int *out_indices, double *out_thresh){
  qsort_r(out_indices, err_N, sizeof(int), compare_indices, out_thresh);
}

The idea is that the left and right are actually pointers to int (indices from 0 to N-1 used to select elements of several arrays of size N, sorted by values of out_thresh). Note that the sign of the inequality is reversed with respect to the anonymous C++ comparison function we saw earlier.

All of this works fine on Ubuntu, but it did not compile when I tried on windows, using g++ in rtools40. The compiler told me:

"C:/rtools40/mingw64/bin/"g++ -std=gnu++11  -I"C:/PROGRA~1/R/R-40~1.2/include" -DNDEBUG -fopenmp -I'C:/Program Files/R/R-4.0.2/library/Rcpp/include'        -O2 -Wall  -mfpmath=sse -msse2 -mstackrealign -c aum_sort.cpp -o aum_sort.o
aum_sort.cpp:65:3: error: 'qsort_r' was not declared in this scope
   qsort_r(out_indices, err_N, sizeof(int), compare_indices);
   ^~~~~~~
aum_sort.cpp:65:3: note: suggested alternative: 'qsort_s'

Hm, so qsort_r does not exist on windows, but qsort_s does? The s is for safe, and it is documented on cppreference,

errno_t qsort_s( void *ptr, rsize_t count, rsize_t size,
                 int (*comp)(const void *, const void *, void *),
                 void *context );

So I thought, that is basically the same prototype as qsort_r so I can just use qsort_s instead, right?

qsort_s(out_indices, err_N, sizeof(int), compare_indices, out_thresh);

Actually that gave me a compilation error as well,

aum_sort.cpp:65:44: error: invalid conversion from 'int (*)(const void*, const void*)' to 'int (*)(void*, const void*, const void*)' [-fpermissive]
   qsort_s(out_indices, err_N, sizeof(int), compare_indices, out_thresh);
                                            ^~~~~~~~~~~~~~~

The error says that the comparison function I provided has prototype int (*)(const void*, const void*, void*) but qsort_s requires a comparison function with prototype int (*)(void*, const void*, const void*) (non-const pointer for first argument not third). Where does this come from? Why is it inconsistent with the docs?

Well it seems that the qsort_s standard shown on cppreference is not obeyed by all compilers, but the Microsoft docs seem to be consistent with my compiler,

void qsort_s(
   void *base,
   size_t num,
   size_t width,
   int (__cdecl *compare )(void *, const void *, const void *),
   void * context
);

So I got it working on windows by changing the order of the arguments in the comparison function. Then I went back to my Ubuntu laptop and I tried to compile it, but again I got an error, this time saying that qsort_s is not available, try qsort_r instead! So after doing a few web searches, it seems that qsort_r is non-standard, with Mac OS and BSD providing prototypes that are consistent with qsort_s and inconsistent with linux qsort_r:

//Mac
void
qsort_r(void *base, size_t nel, size_t width, void *thunk,
  int (*compar)(void *, const void *, const void *));
//BSD
void
qsort_r(void *base, size_t nmemb, size_t size, void *thunk,
  int (*compar)(void *, const void *, const void *));

How confusing! I hope that a standard is adopted at some point.

Thread-safe C qsort

So to get a portable C code that can be used in all common platforms (mac, windows, linux) that R supports, I decided to use regular qsort instead of the s/r variants. The easiest way to do that is by creating a (file-local) static variable that can be written from my aum_sort function, then read from my comparison function,

#include <stdlib.h>//qsort
#include <stdio.h>//printf
#include <pthread.h>//pthread_self
static double *sort_thresh; 
int compare_indices(const void *left, const void *right){
  return sort_thresh[* (int*)left] > sort_thresh[* (int*)right];
}
int aum_sort(int err_N, int *out_indices, double *out_thresh){
  sort_thresh = out_thresh;
  printf("Thread=%d, ptr=%p\n", pthread_self(), &sort_thresh);
  qsort(out_indices, err_N, sizeof(int), compare_indices);
}

This works fine as well, but there is one potential issue: what happens if aum_sort is called in several threads at the same time? Since the sort_thresh variable is static, it would be shared between threads. If the second thread writes to sort_thresh before the first thread has finished the sort, then the results could be incorrect! How to avoid this potential issue?

First, can we observe the issue? If we call aum_sort from R, it is unlikely, since parallel programming in R usually involves forking rather than threading. That is when you call parallel::mclapply in R, there is no multi-threading, only forking. To observe the issue we can create a new Rcpp interface function with multi-threading via OpenMP,

#include <omp.h> 
// [[Rcpp::export]]
void multithread
(const Rcpp::DataFrame err_df,
 const Rcpp::NumericVector pred_vec,
 int threads){
#pragma omp parallel for
  for(int i=0; i<threads; i++){
    aum_sort_interface(err_df, pred_vec);
  }
}

Note that to compile this code we need to add the following to src/Makevars in the R package:

PKG_CPPFLAGS=-fopenmp
PKG_LIBS=-lgomp -lpthread

So if we call this function from R we get

> aum:::multithread(models, predictions, 2)
Thread=2, ptr=0000000061c8b340
Thread=1, ptr=0000000061c8b340

That shows that the two threads access the same pointer, which is dangerous! How to avoid this? We can use the __thread keyword, as explained in the gcc docs,

static __thread double *sort_thresh;

Re-compiling and running the code then gives us:

> aum:::multithread(models, predictions, 2)
Thread=1, ptr=00000000137536c8
Thread=2, ptr=0000000013753628

The problem has been fixed! Wow, C is complicated! C++ is much easier!