The ugliest C feature: <tgmath.h>

<tgmath.h> is a header provided by the standard C library, introduced in C99 to allow easier porting of Fortran numerical software to C.

Fortran, unlike C, provides “intrinsic functions”, which are a part of the language and behave more like operators. While ordinary (“external”) functions behave similarly to C functions with respect to types (the types of arguments and parameters must match and the restult type is fixed), intrinsic functions accept arguments of several types and their return type may depend on the type of their arguments. For example Fortran 77 provides, among others, an INT function which accepts Integer, Real, Double or Complex arguments and always returns an Integer, and a SIN function which accepts Real, Double or Complex arguments and returns a value of the same type.

This helps the programmer somewhat because the function calls don’t have to be changed if variable types change. On the other hand user-defined functions can’t behave this way, so the additional flexibility is really limited to single subroutines that don’t need to call user-defined functions.

Some C programmers would call the feature ugly from the above description already, for the same reason integrating printf into the language would be ugly.

This functionality was incorporated in C99 together with other features for better support of numerical computation and it is provided in the abovementioned <tgmath.h> header. Provided are goniometric and logarithmic functions, functions for rounding and a few others. The header defines macros that shadow the existing functions from <math.h>; e.g. the cos macro behaves like the cos function when its parameter has type double, like cosf for float, cosl for long double, ccos for double _Complex, ccosf for float _Complex, ccosl for long double _Complex. Finally, when the parameter has any integer type, the cos function is called, as if the parameter were implicitly converted to double.

The second reason why this feature is ugly is that it attempts to imitate functions, but the imitation is imperfect and even dangerous: If you try to pass the generic macro cos as a function parameter, you actually always supply the cos function operating on doubles because the macro expansion doesn’t happen when cos is not followed by a left parenthesis.

The final reason why this feature is ugly is that such macros can’t be implemented in strictly conforming C, they have to rely on some kind of compiler support – and experience (e.g. the speed with which bugs in the glibc implementation are discovered) seems to suggest this features is used very rarely and doesn’t deserve to be a part of the “core language”, especially because the underlying feature is not available. (Contrast this to <stdarg.h>, which is available for portable use.)

Now, if the feature is both ugly and not used in practice, why mention it at all? I’m writing this article because I have examined the glibc implementation and it is such an ingenious hack that I feel it should be recorded for posterity, in some better way than this commit message:

2000-08-01  Ulrich Drepper  <drepper@redhat.com>
            Joseph S. Myers  <jsm28@cam.ac.uk>

* math/tgmath.h: Make standard compliant. Don't ask how.

The most straightforward way to mimic the Fortran compilers is to use a very simple macro: (I’ll be using cos for most examples; the semantics of other macros is similar.)

#define cos(X) __tgmath_cos(X)

The compiler would then treat __tgmath_cos as a built-in operator and transform it to one of the function calls in the frontend.

The cleanest solution I have seen proposed is to add basic function overloading support to the compiler frontend which can be activated using a vendor extension (the compiler would otherwise be required by the C standard to diagnose incompatible declarations of a single identifier):

#define cos(X) __tgmath_cos(X)
#pragma compiler_vendor overload __tgmath_cos
double __tgmath_cos (double x)
{return (cos) (x); }
float __tgmath_cos (float x)
{return cosf (x); }
long double __tgmath_cos (long double x)
{return cosl (x); }

(Warm-up exercise: Why are the parentheses around cos in the definition of __tgmath_cos (double) necessary?)

Of course implementing this for the sole purpose of <tgmath.h> is a lot of work (although it can perhaps share code with the C++ frontend, if any). Nobody should want to use such an unportable extension in their C programs, and there are probably very few programs using <tgmath.h>, so it is probably not worth the effort to extend the compiler this way.

The glibc implementation has had to rely to extensions present in already used gcc versions, so the implementation is rather complex.

First, let’s implement a macro that can choose the right function type. Because C doesn’t support conditional macro expansion, the conditions will have to be in the expanded code. We want something like

#define cos(X) \
  (X is real) ? ( \
    (X has type double \
     || X has an integer type) \
    ? (cos) (X) \
    : (X has type long double) \
    ? cosl (X) \
    : cosf (X) \
  ) : (
    (X has type double _Complex) \
    ?  ccos (X)

and it turns out writing the conditions above is quite easy:

  • X is real” is sizeof (X) == sizeof (__real__ (X))
  • X has an integer type” is (__typeof__ (X))1.1 == 1.

    (Medium-difficulty exercise: (__typeof__ (X))0.1 == 0 is incorrect. Why?)

    (glibc actually uses __builtin_classify_type, an internal gcc built-in, in some cases, and something similar to the above tests in other cases.)

  • X has type double/long double/float” can be determined using sizeof. This won’t work precisely on architectures where some C types are mapped to the same hardware type, but on those architectures there is no difference between operations on those types, and the outside C program can’t recognize the difference. By the C “as-if” rule, this is good enough.

Good, so our cos macro now can select the correct function and call it. Unfortunately it always returns a result of type long double _Complex because the ? : operators above return a value with the type of the “usual arithmetic conversions” between the second and third operands of the operator.

We can avoid these type conversions in favor of a type we choose by using another gcc extension, statement expressions:

#define cos(X) ({ result_type __var; \
  if (X is real) { \
    if ((X has type double) \
        || (X has an integer type)) { \
      __var = (cos) (X); \
    else if (X has type long double) \
      __var = cosl (X); \

  __var; })

Now the result of the macro will always have the type result_type, and the problem is solved.

Is it?

It’s not, actually. How do we define result_type? For floating-point types we can just use __typeof__ (X), but we want double for integer arguments, and C doesn’t have a ? : operator for types, does it?

The two exercises above are not there because I’m a teacher and I’d like to check your progress or something, they are there to prepare you for the final, difficult exercise – or to scare you off before you get to this part (well, I’m sure I have bored everyone to death and nobody is reading this anyway :) ). Although the context of the exercise is a big hint already, it is probably not enough, so here goes:

Difficult exercise: What is the difference between results of

1 ? (int *)0 : (void *)0

and

1 ? (int *)0 : (void *)1

and why does the difference exist?

Unlike the first two exercises, which can be solved with a bit of research and experimentation, this one (especially the “why” part) probably requires reading of the C standard, so I’ll explain it here.

First, it is necessary to explain the following concepts:

  • An integer constant expression is simply an integer expression with a constant value, from the point of view of the compiler: the compiler is able to compute the constant value without any optimization except for constant folding. In particular the expression doesn’t use values of any variables.
  • A null pointer is a pointer value equal to the integer 0. A null pointer can have any pointer type.
  • A null pointer constant is a syntactic construct. The value of a null pointer constant, when converted to a pointer type, is a null pointer (“null pointer, the value”, described above). The NULL macro expands to a null pointer.

    Because a null pointer constant is a syntactic construct, it has a syntactic definition: it is either an integer constant expression with value 0, or such an expression cast to void *. For example 0, 0L, 1 - 1, (((1 - 1))), (void *)0 and (void *)(1 - 1) are null pointer constants, 1, (int *)0 and (void *)1 are not null pointer constants.

    (OK, so it is not really a syntactic construct when it’s definition refers to a value of an expression, but it’s best to think of it as if it were a syntactic construct. In most cases the “integer constant expression with value 0″ is a literal 0 anyway.)

Now we can look at the standard’s section 6.5.15, paragraph 6, which refers to the conditional operator ? : and says, among other things:

If both the second and third operands are pointers …, the result type is a pointer … . Furthermore, if both operands are pointers to compatible types …, the result type is a pointer to … the composite type; if one operand is a null pointer constant, the result has the type of the other operand; otherwise, … the result type is a pointer to … void.

Thus, in

1 ? (int *)0 : (void *)0

the third operand is a null pointer constant, so the result is (int *)0. In

1 ? (int *)0 : (void *)1

the third operand is not a null pointer constant, so the result is (void *)0. This is our conditinal operator for types, now we just need to polish it a bit.

Note that (X has an integer type) is an integer constant expression. Therefore the result of

1 ? (__typeof__ (X) *)0 : (void *)(X has an integer type)

is (void *)0 if X has an integer type and (__typeof__ (X) *)0 otherwise, and the result of

1 ? (int *)0 : (void *)(!(X has an integer type))

is (int *)0 if X has an integer type and (void *)0 otherwise. Note that in each case the result of one of the expressions is (void *)0.

Let’s call the above expressions E1 and E2. Then the result of

1 ? (__typeof__ (E1))0 : (__typeof__ (E2))0

is (int *)0 if X has an integer type and (__typeof__ (X) *)0 otherwise; again, note that one of the second and third expressions is a null pointer constant.

Finally, we can define result_type as

__typeof__ (*(1 ? (__typeof__ (E1))0 : (__typeof__ (E2))0))

That’s it. The rules for macros with more than one argument are a bit more complicated, but all the building blocks are described above.

This entry was posted in Programming. Bookmark the permalink.

17 Responses to The ugliest C feature: <tgmath.h>

  1. Ádám says:

    Hahahahaaaaa, I just LOVED this article :D . Managed to tackle the first two excercises, but was like WTF when saw the third one. That paragraph from the standard is interesting, though.

  2. Drew Thaler says:

    OK, I’ll bite. :-) Why is (typeof (X))0.1 == 0 incorrect?

    It can’t be a positive/negative zero thing because you’re controlling the inputs and you know that 0.1 will become +0 and not -0. I can’t think of any types or special values that it would fail for. I looked for oddball gcc warnings that it might trigger but didn’t find any. What am I missing?

  3. mitr says:

    (typeof X))0.1 == 0 fails to handle a corner case: (_Bool)0.1 == 1

    This is because conversion to _Bool works similarly as interpreting of arbitrary values as the condition of if – anything non-zero (even 0.1) is considered true.

    Positive/negative zeros don’t enter into this, you can verify that (+0.0 == copysign(0.0, -1.0)).

  4. Drew Thaler says:

    D’oh. Missed that one because I was thinking about numbers and cos(_Bool) makes no sense. But you’re right, it’s an integer type. Thanks, that was bugging me.

  5. quark says:

    It’s confusing. Since (int*)0 is also a null pointer,

    1 ? (int *)0 : (void *)0

    can be (void*) 0 as well.

    Seems g++ regard it as (void*): codepad.org/AGMQBCwB

  6. mitr says:

    The post talks about C, not C++; the rules in C++ are different.

    In particular, only an integer constant expression with zero value is a null pointer constant, the “or such an expression cast to void *” clause is missing in C++. Therefore, neither (void *)0 nor (int *)0 is an integer constant in C++.

    The rules for the ? : operator are similar (see 5.16p6 and 5.9p2 in C++98), but the fact that (void *)0 is not a null pointer constant in C++ breaks this technique.

    Of course, in C++ <cmath> always provides the overloaded definitions of trigonometric functions, making <tgmath.h> unnecessary.

  7. quark says:

    Leave C++ aside. I mean “if one operand is a null pointer constant, the result has the type of the other operand” is confusing when both operands are null pointer constants as the example “1 ? (int *)0 : (void *)0″ shows.

  8. mitr says:

    The point is that (int *)0 is not a “null pointer constant” as that term is used in the ISO C standard.

  9. quark says:

    Sorry for my careless understanding. Realize null pointer constant is only (void*) or integer type and is not null pointer despite these two phrases look similar.

    Thanks for your explanation :)

  10. Rhys Ulerich says:

    It would be slick to see this approach used to provide, e.g., precision-agnostic BLAS bindings. From a peek at http://fossies.org/unix/misc/glibc-2.14.1.tar.gz/dox/math2tgmath8h_source.html it appears many of the tgmath.h macros could be adopted for this purpose without too much fuss. The only catch might be the additional separate handling of floats vs doubles and float _Complex vs double _Complex. It would be nice to write “axpy” and get a saxpy, daxpy, caxpy, or zaxpy as appropriate.

  11. mitr says:

    The new ISO C standard (“C11″) allows this (also making <tgmath.h> less ugly in the process), using the new _Generic keyword.

  12. Kyle Horne says:

    This is an interesting article on the implementation of a Fortran feature in C. I spend most of my coding time in Fortran, and can actually add some more modern information to this discussion, since we are now using Fortran 2003, and contemplating Fortran 2008. Neither of these languages resemble Fortran 77 all that closely, but there we are.

    In Fortran versions since Fortran 90, operator and procedure overloading have been supported through the use of interface blocks. The interface is given a generic name by which its routines will be accessed. The selection of routine from the list is handled at compile time and corresponds to an argument type match. For example, if I wanted to implement a function that operated on both integers and reals, I could do so with a module such as the following:

    module example_mod
       implicit none
    
    interface f
          module procedure fi
          module procedure fr
    end interface
    
    contains
    
    real function fr(x)
          real,intent(in)::x
          fr = x**2-1.0
    end function fr
    
    integer function fi(x)
          integer,intent(in)::x
          fi = x**2-1.0
       end function fi
    
    end module example_mod
    

    This module could then be used in a program such as the following:

    program example*prg
       use example*mod
       implicit none
    
    write(*,*) f(1)
    write(*,*) f(1.0)
    end program example_prg
    
  13. Kyle Horne says:

    That last comment I made was not formatted the way that I expected, there should be a lot of newlines in there that are missing… :(

  14. mitr says:

    Thanks for this addition!

    I have tried to restore formatting. If I have broken the example, please tell me and I’ll try to fix it again.

  15. Ed Schouten says:

    Yes. <tgmath.h> is ugly.

    On FreeBSD, we used to implement <tgmath.h> on top of __builtinchooseexpr and __builtintypescompatible_p, but the problem with this is that for most of the macros in this header, the expansion becomes quite big. Already for cos(x), preprocessor expansion causes the code to become 3166 characters long. This also caused GCC/Clang to become quite slow.

    As FreeBSD 10 ships with Clang by default, we also now have a version that is based on top of C11 _Generic. Even though one of the C11 proposals included an example how to implement <tgmath.h> on top of _Generic, we have actually implemented it in a way that’s a bit simpler.

    To briefly summarize it, for functions with multiple arguments, we depend on one of the existing operators (+) to properly promote the type for us. To make this work properly for non-float-types, we use another _Generic to ensure that the inputs are always converted to float types.

    See lines 65-84 for the C11 version of our <tgmath.h>:

    http://svnweb.freebsd.org/base/head/include/tgmath.h?view=markup#l65

  16. kmackay says:

    Since we want the result type to be double if X is an integer type, shouldn’t E2 be 1 ? (double *)0 : (void *)(!(X has an integer type))?

  17. DWG says:

    The worst thing about this feature is it’s a flourish. Before C99, you could port your FORTRAN code to a C compiler, you just had to understand the bugs in type handling in your FORTRAN code. After C99, you could leave all your bugs in your FORTRAN. Since the C99 committee declined to specify how the tgmath.h file would be implemented (hint : there was a chance to add a new and useful type-handling feature), this part of the specification adds nothing and subtracts simplicity from the language. Thanks Fortransients !

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>