No argument unpacking in C
Today I was wondering if it is possible to do argument unpacking in C. What is argument unpacking? It is a super useful programming technique which allows you to use data instead of code to define how you want to call a function. For example in python we can unpack a list to positional arguments via
def fun(left, right):
return left*2 + right
args = [5, 3]
fun(args[0], args[1])
## 13
fun(*args) # same as above!
## 13
And we can unpack a dict to named arguments via
kwargs = {"right":3, "left":5}
fun(right=kwargs["right"], left=kwargs["left"])
## 13
fun(**kwargs) # same as above!
## 13
In R the equivalent is
fun <- function(left, right)left*2 + right
## Unpacking un-named list to positional arguments.
args.list <- list(5, 3)
fun(args.list[[1]], args.list[[2]])
## [1] 13
do.call(fun, args.list) # same as above!
## [1] 13
## Unpacking named list to named arguments.
kwargs.list <- list(right=3, left=5)
fun(right=kwargs.list$right, left=kwargs.list$left)
## [1] 13
do.call(fun, kwargs.list) # same as above!
## [1] 13
Exercise for the reader: you can do named/positional unpacking at the same time in both R (via a list with named and un-named elements) and python (by using star and double star in the same call). That is probably confusing and I would recommend avoiding if at all possible.
Variable number of arguments
I was wondering, can we do something similar in C? I did not see any mention in the C book Chapter 4, function nor Section 5.6, Pointers to functions. Section 9.9 Variable numbers of arguments discusses the inverse concept: defining a function which allows a variable number of arguments. Whereas argument unpacking is used to turn data into function arguments, variable arguments allows turning code/arguments into data. For example in python we can designate a list variable to capture all positional arguments via single star:
def sum_of_squares(*args):
return sum([x**2 for x in args])
sum_of_squares(1)
## 1
sum_of_squares(2, 3)
## 13
And we can designate a dict variable to capture all keyword arguments using the double star:
def print_names_values(**kwargs):
for name, value in kwargs.items():
print("name=%s, value=%s"%(name,value))
print_names_values(a=1)
## name=a, value=1
print_names_values(b=2, c=3)
## name=b, value=2
## name=c, value=3
Similarly in R we can do
## ... with un-named arguments:
sum_of_squares <- function(...)sum(c(...)^2)
sum_of_squares(1)
## [1] 1
sum_of_squares(2,3)
## [1] 13
## ... with named arguments:
print_names_values <- function(...){
named.vec <- c(...)
out.vec <- sprintf("name=%s value=%s\n", names(named.vec), named.vec)
cat(out.vec, sep="")
}
print_names_values(a=1)
## name=a value=1
print_names_values(b=2, c=3)
## name=b value=2
## name=c value=3
In R it is somewhat more complicated than python as ...
is a unique
data type, which is typically used with c(...)
or list(...)
to
create a (possibly named) vector or list from the arguments.
In C we use stdarg.h
macros/functions va_start(va_list my_list, last_arg_before_dots)
,
value=va_arg(my_list, type)
, va_end(my_list)
. Note that there must
be at least one other argument which defines how many additional
arguments there are, for example an int number of arguments, or the
first arg of printf which is parsed to determine the number of format
substrings such as %d
present.
How does R implement argument unpacking for C functions?
The R system is implemented in C, and it supports calling user-defined
C/C++/FORTRAN functions, each with a user-defined number of
arguments. For example
penaltyLearning/src/interface.cpp
defines several interface functions, each with a different number of
arguments. We use a R_CMethodDef
array to associate each C function
pointer with a name and an expected number of arguments,
R_CMethodDef cMethods[] = {
{"modelSelectionQuadratic_interface", (DL_FUNC) &modelSelectionQuadratic_interface, 5},
{"modelSelectionFwd_interface", (DL_FUNC) &modelSelectionFwd_interface, 6},
{"modelSelection_interface", (DL_FUNC) &modelSelection_interface, 5},
{"largestContinuousMinimum_interface", (DL_FUNC) &largestContinuousMinimum_interface, 4},
{NULL, NULL, 0}
};
Similarly, when we use the excellent Rcpp interface, a
RcppExports.cpp
file is generated with a similar R_CallMethodDef
array:
static const R_CallMethodDef CallEntries[] = {
{"_FLOPART_get_label_code", (DL_FUNC) &_FLOPART_get_label_code, 0},
{"_FLOPART_FLOPART_interface", (DL_FUNC) &_FLOPART_FLOPART_interface, 6},
{NULL, NULL, 0}
};
Again the code above associates each C function pointer with a name and an expected number of arguments. So in the C source code of R, there must be something that calls these functions using these pointers with a list of data from R, similar to argument unpacking. How does that work?
Looking in src/main/dotcode.c in R source code shows that there is a huge block of code with a switch over the number of arguments:
SEXP attribute_hidden R_doDotCall(DL_FUNC ofun, int nargs, SEXP *cargs,
SEXP call) {
VarFun fun = NULL;
SEXP retval = R_NilValue; /* -Wall */
fun = (VarFun) ofun;
switch (nargs) {
case 0:
retval = (SEXP)ofun();
break;
case 1:
retval = (SEXP)fun(cargs[0]);
break;
case 2:
retval = (SEXP)fun(cargs[0], cargs[1]);
break;
...
case 65:
retval = (SEXP)fun(
cargs[0], cargs[1], cargs[2], cargs[3], cargs[4],
cargs[5], cargs[6], cargs[7], cargs[8], cargs[9],
cargs[10], cargs[11], cargs[12], cargs[13], cargs[14],
cargs[15], cargs[16], cargs[17], cargs[18], cargs[19],
cargs[20], cargs[21], cargs[22], cargs[23], cargs[24],
cargs[25], cargs[26], cargs[27], cargs[28], cargs[29],
cargs[30], cargs[31], cargs[32], cargs[33], cargs[34],
cargs[35], cargs[36], cargs[37], cargs[38], cargs[39],
cargs[40], cargs[41], cargs[42], cargs[43], cargs[44],
cargs[45], cargs[46], cargs[47], cargs[48], cargs[49],
cargs[50], cargs[51], cargs[52], cargs[53], cargs[54],
cargs[55], cargs[56], cargs[57], cargs[58], cargs[59],
cargs[60], cargs[61], cargs[62], cargs[63], cargs[64]);
break;
default:
errorcall(call, _("too many arguments, sorry"));
}
return retval;
}
So the C source code of R does not show any evidence of any special unpacking syntax.
How does python do it?
How does Python call C code? For example I recently coded
interface.c
which defines a python/C++ interface function
ModelSelectionInterface. It is declared as using METH_VARARGS
which
means there are always two arguments, as documented in Common Object
Structures:
“This is the typical calling convention, where the methods have the
type PyCFunction. The function expects two PyObject* values. The first
one is the self object for methods; for module functions, it is the
module object. The second parameter (often called args) is a tuple
object representing all arguments.” That doc page explains that Python
allows several other METH_SOMETHING
values, each of which have a
specified number of arguments, one or more of which is a tuple/array
representing any number of arguments. So python does not have a limit
on the number of arguments, except for the size of a tuple/array.
In python the number and types of arguments are defined by
PyArg_ParseTuple
which is arguably cleaner than the R solution.
What about C++?
Does argument unpacking exist in C++? Yes, since C++17 or C++20. There
is some support via
std::apply (for
callables, here is another blog post with a discussion and
example)
and
std::make_from_tuple
(for instantiation/construction).
Could R be re-implemented in C++ to
take advantage of this new feature, and avoid that huge switch block?
For the .Call
interface it could, because it could use std::apply
with a tuple of SEXP. For the .C
interface it could do something
similar with tuple of pointers to int/double/etc.
What about the inverse operation, variable number of arguments? The
C++ equivalent of C’s va_arg
etc is described as variadic on
parameter_pack.