Featuring a few C features

in #programming, 3 years ago

Maybe you are a seasoned C programmer, maybe you are not. Either way, there may be a few features of modern C (and sometimes of good old C) that you are not aware of, or that you knew but have forgotten because they are just curiosities.

Let's see some of them, as a July reminder.

What about notation to get an element of an array?

You know this:

  char e_of_hello = "hello"[1];

But maybe you don't know that you can get the very same result by the following:

  char e_of_hello = 1["hello"];

Exactly. You can think of a[b] as a shorthand notation for *(a+b). Then, you must remember that a + b = b + a… Wait a moment! — you could say — one is a string and the other is a number! You can't sum apples and oranges!

Indeed, C has no string type! C has arrays of characters, and "abc" is a special notation for an array made of the three characters a, b, and c, followed by a terminating null character.

But still — you may argue — it's an array, which isn't an integer like 1!

Indeed, but in almost every expression an array such as "abc" is converted to a pointer to its first element. A pointer is nothing but a number, or at least C handles it as if it were a number (from a CPU point of view, you can always say that an address is an integer). Since C is such a “low level” language when it comes to pointers, it allows pointer arithmetic: you can add an integer to a pointer, or subtract one from it, obtaining another pointer to a different memory location. This can be a little bit dangerous, and it breaks strong typing a bit… but C isn't such a strongly typed programming language, if you think about it.

You can write this:

printf("%c\n", *("hello" + 1));

which is exactly like

printf("%c\n", "hello"[1]);
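If you want to convince yourself that the three spellings really are the same thing, a minimal sketch like the following should do. The helper name same_element is made up for this post, and the caller is assumed to pass an index that lies inside the array.

```c
#include <stddef.h>

/* Returns 1 when the three spellings of "element i of s" agree.
   same_element is a hypothetical helper, not a library function.
   The caller must pass an index i that is inside the array. */
int same_element(const char *s, size_t i)
{
        return s[i] == *(s + i) && s[i] == i[s];
}
```

Calling same_element("hello", 1) compares the character e with itself three times, once per notation.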

Enough of this: you've got the idea. But please, do not ever write things like 1["hello"] in your code.

Designators

Modern C (C99 and later) has designators. That is, you can initialize an array by “designating” its elements. Like this:

        int a[10] = {
                [5] = 1,
                [9] = 1
        };

The undesignated elements are initialized with 0, while a[5] and a[9] get 1.
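Here is a small sketch to check the claim about the undesignated elements. The function names are invented for illustration. The second function shows a related detail: when the array size is left out, the largest designator decides the length.

```c
/* Adds up an array in which only a[5] and a[9] were designated;
   the other eight elements start at 0. */
int sum_designated(void)
{
        int a[10] = { [5] = 1, [9] = 1 };
        int sum = 0;
        for (int i = 0; i < 10; i++)
                sum += a[i];
        return sum;
}

/* With no explicit size, the largest designator fixes the length:
   b has four elements, b[0] to b[3]. */
int length_from_designator(void)
{
        int b[] = { [3] = 7 };
        return (int)(sizeof b / sizeof b[0]);
}
```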

Do not forget also that

    int a[10];

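declares an array of ten integers (indexed from 0 to 9), but, for a local variable, it doesn't initialize it. You can do it by writing:

    int a[10] = {0};

(The empty form = {} does the same; GCC and Clang have long accepted it as an extension, but it became standard only with C23.) You need to know that this “always” works, with structs too, and that all “undesignated” elements are initialized to “0” (the “0” of a type need not be all-bits-zero for every type, but this is another story).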

Structs also have designators:

        struct {
                const char *n;
                int l;
        } s = {
                .l = 5
        };
        printf("%s : %d\n", s.n ? s.n : "NULL", s.l);

This will print NULL : 5. We initialized l with a designator, but we left n out. Hence, it was initialized with the “0” value for a pointer (which could be an actual binary 0 on many machines).
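Two more details may be worth a sketch: designators can be written in any order, and they can reach into nested members. The struct names and members below are assumptions made up for this example.

```c
/* Hypothetical types, invented for this illustration. */
struct point { int x, y; };
struct shape {
        const char *name;
        struct point origin;
        int sides;
};

/* Designators may appear in any order and may name nested members.
   origin.x is not mentioned, so it gets the zero value. */
static const struct shape square = {
        .sides = 4,
        .origin.y = 2,
        .name = "square"
};
```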

Struct copy

We tend to forget that structs are treated like values. Let us suppose we had:

        struct h {
                const char *n;
                int l;
        } s = {
                .l = 5
        };

as before. We can overwrite this s with s1 like this:

struct h s1 = { .n = "hello", .l = sizeof "hello" };
s = s1;

Of course, you must be aware that the const char *, which is a pointer, is copied as a pointer: after the assignment, s.n and s1.n point to the same memory. That memory was created by a string literal, hence the pointer needs to be const: writing into the "hello" memory is undefined behavior, and on many machines your program would crash.

If you want memory which can be modified, you need something more like this:

struct { char s[100]; int l; } s2 = { .s = "hello" };

Do not forget, anyway, that we are initializing things, and an array member can be initialized only “directly”, so to speak, from a string literal or a brace-enclosed list of values. That is, the following is an error:

const char known[] = "hello";
struct { char s[100]; int l; } s2 = { .s = known };

Unfortunately, even if known is an array (whose size is deduced), once you write known in an expression it is “degraded” (it decays) to a pointer and loses its “array nature”; without this, there's no clue about how many array elements have to be copied. Briefly, it can't be done in an initializer.
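What you can do instead is copy the characters at run time. The following is only a sketch of one way to do it: make_buf and the tag struct buf are hypothetical names, and storing the length in l is my assumption about what that member is for.

```c
#include <string.h>

struct buf { char s[100]; int l; };

/* Builds a struct buf from an arbitrary string at run time.
   make_buf is a hypothetical helper; text longer than 99 characters
   is truncated so that the result is always null-terminated. */
struct buf make_buf(const char *text)
{
        struct buf b = { .l = 0 };      /* s is zero-filled as well */
        size_t n = strlen(text);

        if (n >= sizeof b.s)
                n = sizeof b.s - 1;
        memcpy(b.s, text, n);           /* b.s[n] is already '\0' */
        b.l = (int)n;
        return b;
}
```

Since structs are values, returning b by value copies the whole array inside it, which is exactly what the initializer could not do for us.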

A common and useful idiom is to keep a “null” struct around to reset other structs; e.g.,

struct whatever z0 = {0};
struct whatever running;
// ...
// maybe inside a loop
  running = z0; // clean the struct
  running.v = 5;
  if (something) {
    running.x = other;
  }
  do_something(running);

First, we ensure that running, a struct whatever which is used again and again, is in an initial “state”, that is, it contains certain default values (in this example, it is “zero”-ed); then we assign only the members that differ, according to the program logic.
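To see why the reset matters, here is a compilable sketch of the same idiom. The members of struct whatever, the body of do_something, and the function run_rounds are all assumptions invented for this example.

```c
/* Hypothetical struct; the members are assumptions for this sketch. */
struct whatever { int v; int x; };

static const struct whatever z0 = {0};  /* the "null" struct */

/* Stand-in for do_something(): just combines the two members. */
static int do_something(struct whatever w)
{
        return w.v * 10 + w.x;
}

/* Only odd rounds set x. Because running is reset at the top of
   every round, the x of one round never leaks into the next. */
int run_rounds(int rounds)
{
        struct whatever running;
        int total = 0;

        for (int i = 0; i < rounds; i++) {
                running = z0;           /* clean the struct */
                running.v = 5;
                if (i % 2)
                        running.x = i;
                total += do_something(running);
        }
        return total;
}
```

With three rounds the values passed on are 50, 51 and 50; without the reset, the third round would still carry x = 1 from the second. If you prefer not to keep z0 around, a compound literal such as running = (struct whatever){0}; has the same effect.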

Overloading!

Overloading is often considered an “OO” feature, but this is wrong: non-OO languages may have function overloading too. Modern C (since C11) has it, à la Fortran, using the _Generic keyword, usually wrapped in a macro.

À la Fortran means this: in Fortran, you specify a single symbol name (the overloaded function) and then which variant must be used for a certain type of argument(s). Users will use the single symbol name, which is just a front end, and behind the curtains the function actually called will be another one, according to the type of its arguments.

By the way, this approach avoids part of the name mangling chaos: you decide the name of the specific functions. No surprises, no linking problems ahead, at least related to functions' overloads. To understand the problem better, see e.g. the Wikipedia article on name mangling.

In modern C you can overload a symbol name using type-generic functions. This is actually used by some math functions declared in <tgmath.h>, for instance sin: users don't need to bother about using the right sin, because the compiler will figure it out according to the type of the argument.

If you use sin(doubleVar), the function sin will be called. If you use sin(floatVar), the function sinf will be called; if you use sin(longDoubleVar), it will be sinl.
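One way to observe this selection without calling anything is to look at the type of the result. The sketch below assumes you include <tgmath.h>; the three function names are made up for the demonstration.

```c
#include <stddef.h>
#include <tgmath.h>

/* sizeof does not evaluate its operand, so these functions only
   inspect the type of the call that the type-generic sin selects. */
size_t size_of_sin_float(void)       { return sizeof(sin(1.0f)); }
size_t size_of_sin_double(void)      { return sizeof(sin(1.0));  }
size_t size_of_sin_long_double(void) { return sizeof(sin(1.0L)); }
```

Each function should return the size of the type of its argument: a float argument gives a float result, and so on.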

This magic can be achieved with _Generic, which looks like this:

#define func(X) _Generic((X), int: funci, \
                              long: funci, \
                              double: funcd, \
                              float: funcf, \
                              default: funcv)(X)

void funcv() {
        puts("i don't know...");
}

void funci(long a) {
        puts("long/int version");
}

void funcd(double a) {
        puts("double version");
}

void funcf(float a) {
        puts("float version");
}

And we can try with this fragment:

        func((int)5);
        func((long)6);
        func((float)1.1);
        func((double)5.5);
        func("hello");

which outputs:

long/int version
long/int version
float version
double version
i don't know...

Notice the void funcv() function. This is a rather controversial feature of C from ancient times: it declares not a function that takes no arguments, but a function that takes unspecified arguments. Up to C17, a function that takes no arguments must be written as void funcv(void) (see the void in the ()?). But in our case we can't do that, because we need a function which is able to digest an argument (of any type). Beware that C23 changes this: there, () means the same as (void), and this trick no longer compiles.

As you can imagine, you can play a lot of tricks with _Generic because, well, it is “just” an expression that picks one of its alternatives at compile time, and you can wrap it in a macro.

#define gunc(X, Y) _Generic((X), int: gunci,            \
                                 double: guncd)((X),     \
                   (_Generic((Y), int: 1,                \
                                 default: -1)))


void gunci(int a, int b) {
        printf("%d (%d)\n", a + b, b);
}
void guncd(double a, int b) {
        printf("%lf (%d)\n", a*(double)(b), b);
}

Now gunc looks like a very odd two-argument function. The actual functions gunci and guncd are selected according to the type of the first argument; then, these functions receive a second argument which is 1 if the type of gunc's second argument is int, and -1 otherwise. The example code of gunci and guncd is meaningless, as is the choice of behaviour on the second argument… But you can see that you can play with types to select not just functions, but values. _Generic behaves like a macro which selects an expression according to the type of its first operand.
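A less meaningless use of “selecting a value” could be a macro that gives the name of a type. This is just a sketch: type_name is a hypothetical macro, and the list of types it knows is arbitrary.

```c
/* type_name is a hypothetical macro: it selects a string literal,
   not a function, according to the type of X. */
#define type_name(X) _Generic((X),              \
                int: "int",                     \
                long: "long",                   \
                float: "float",                 \
                double: "double",               \
                char *: "char *",               \
                default: "something else")
```

A small surprise you can discover with it: type_name('a') gives "int", because in C a character constant has type int.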

A C macro can produce text which can't compile, but _Generic is a little bit more robust. Just a little bit more.

You can write code which fails if the type isn't the one you want to succeed:

#define compil(X) _Generic((X), int: 1, default: abort())

This would leave the expression 1 if X is int; otherwise it will insert a call to abort. Now, you can have

1;

and the compiler will still be happy, but

int c = abort();

Is this ok? abort() is void abort(void), and the compiler will complain:

error: void value not ignored as it ought to be

So,

int c = compil(argc);

is ok, because argc is int, and it will be like

int c = 1;

But

int c = compil(argv);

won't compile. Contrast with

compil(argc);
compil(argv);

which compiles, but it will abort when executed because argv isn't int.
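If what you want is a check that always fails at compile time, whatever the context, one possible approach is to combine _Generic with C11's _Static_assert. This is a swapped-in technique, not the compil macro above; require_int and twice are hypothetical names.

```c
/* require_int is a hypothetical macro: compilation stops with the
   given message unless X has type int. Nothing is left for run time. */
#define require_int(X) \
        _Static_assert(_Generic((X), int: 1, default: 0), #X " must be an int")

int twice(int n)
{
        require_int(n);         /* passes: n is an int */
        /* require_int(2.0);       would be rejected at compile time */
        return 2 * n;
}
```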

Conclusion

After all these years, C is still a language we couldn't live without. Think about just this: the Linux kernel is written in C.

It's a powerful language, maybe too powerful for the web and Java generations. It's “simple”, yet it has dark, complex corners…

