2008-03-26

Useful C extensions (gcc specific)

You think that knowing ANSI C is enough? Well, okay, it's enough. But I found that some gcc extensions are very useful.

All gcc C extensions are listed in the manual.

Here are the extensions I use:

Labels as values
I remember the old assembler days, when you could jmp to to an address from an array. Nowadays in C, you can create an array of pointers to functions and call pointer from array. But jmp is faster and sometimes you don't want to create many small, almost useless functions.

In gcc you can use a trick that's called labels as values. It's like that:

int main(){
int value = 2;
const void *labels[] = {&&val_0, &&val_1, &&val_2};
goto *labels[value];
val_0:
printf("The value is 0\n");
goto end;
val_1:
printf("The value is 1\n");
goto end;
val_2:
printf("The value is 2\n");
goto end;
end:
return(0);
}

Macros as functions
Here is a standard example of a macro with statements that returns a value. It's also a great example of typeof operator in gcc.
#define MAX(a,b)                    \
({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b ? _a : _b; })

Case ranges
Gcc allows to use ranges in case statement. In ANSI C you'd have to enumerate all the values. This code is much prettier:
int main(){
int value = 7;
switch(value){
case 0 ... 5:
printf("The value is 0 - 5\n");
break;
case 6 ... 10:
printf("The value is 6 - 10\n");
break;
default:
printf("The value is ?\n");
}
return(0);
}

Likely
Have you ever wondered how does gcc convert if statement to assembler, how many branches are created by the compiler? For example here, how many jmp instructions processor must do in the foo() function?
int foo(int i){
if(i == 0)
return(1);
return(2);
}

int main(){
foo(0);
}
In this case you know more than a compiler. You know that all callings to the foo() are going to be with "0" as the parameter. You can suggest compiler to optimize this by using __builtin_expect and possibly reduce number of jumps.
int foo(int i){
if(__builtin_expect(i == 0, 1))
return(1);
return(2);
}
There are often defined macros:
#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

Vector extensions
This very powerful extension allows to use mmx/sse code in a portable way. Gcc would choose the best possible extensions during compile time. When you compile code without mmx/sse options, the binary will remain compatible with i386. The downside is that this extension doesn't allow to use all of the features of mmx/sse.

From my point of view the most important feature is to use the bitwise commands. For example this would do binary or on 128 bits:
typedef union{
int v __attribute__ ((vector_size (16)));
int i[4];
} __vec_int;


int main(){
__vec_int a, b, c;
a.i[0] = 1; a.i[1] = 1; a.i[2] = 1; a.i[3] = 1;
b.i[0] = 1; b.i[1] = 2; b.i[2] = 3; b.i[3] = 4;

/* do OR here on 128 bits */
c.v = b.v | a.v;
...
}
Sometimes there is a need of using more advanced mmx commands than vector extensions allow. For example you may want to do four compares at a time. Unfortunately this code is specific for sse2 (-msse2).
int main(){
__vec_int a,b;
a.v = set_vec_int(1,2,3,4);
b.v = set_vec_int(4,3,2,1);

/* compare greater than on __m128i */
__vec_int c;
c.v = _mm_cmpgt_epi32(a.v, b.v); /* 0xFFFFFFFF if gt, 0 else*/

printf("a=[%i,%i,%i,%i]\n", a.s[0], a.s[1], a.s[2], a.s[3]);
printf("b=[%i,%i,%i,%i]\n", b.s[0], b.s[1], b.s[2], b.s[3]);
printf("c=[%u,%u,%u,%u]\n", c.s[0] ? 1 : 0, c.s[1] ? 1 : 0, c.s[2] ? 1 : 0, c.s[3] ? 1 : 0);

return(0);
}
Command used here _mm_cmpgt_epi32 is ported from Intel compiler, the best documentation for this dialect I found on Microsoft sites. You may want to use gcc equivalent. By the way, don't forget that the sse2 registers (__m128i) must be aligned to 16 bytes (-mpreferred-stack-boundary=4).


Cache stuff
Sometimes it's possible to optimize code by prefetching data. It's quite simple to prefetch data to one line of cache:
__builtin_prefetch(&data[offset], 1); /* 0 - read only, 1 - rw */
On the other hand you may want to store the data without polluting the caches. If you only want to write data (and it's not currently loaded to cache) it could be faster than standard storing. I use this defs (oh. it's mmx specific):
#ifndef MMX
#define STORE(ptr, val) \
do{ \
*(ptr) = (val); \
}while(0)
#else
#define STORE(ptr, val) \
do{ \
_mm_stream_si32( (ptr), (val) ); \
}while(0)

#endif

More?
There are more interesting extensions. Like advanced assembler operands or macros with variable number of arguments.


8 comments:

Anonymous said...

Sweet... some cool stuff I never knew about. Thanks!

Anonymous said...

In the MAX macro it should be typeof instead of typedef.

Anyway the whole idea of using incompatible C extensions sucks.

majek said...

Thanks, I fixed the MAX macro.

Anonymous said...

Hopefully this artical can help popularize these to the point they're included in the standard. It was helpful to see new ideas, but I'll not be using them until they're standard.

Justin Wick said...

I personally don't use C code if I want cross-platform compatibility, but many of these tricks can be made safe on supported compilers through the judicious use of defines - anything involving branch prediction, cache prefetching, or vector operations.

I don't like being tied to a single compiler, but if you're only using one compiler for your project, why not get the most you can out of it?

Anonymous said...

for the last example about cache, which version of GCC did you use? I compile with GCC 4.1.2p2, and got error message.
#ifndef MMX
#define STORE(ptr, val) \
do{ \
*(ptr) = (val); \
}while(0)
#else
#define STORE(ptr, val) \
do{ \
_mm_stream_si32( (ptr), (val) ); \
}while(0)

#endif

Anonymous said...

Writing portable code doesn't mean you can't take advantage of platform-specific features. For example, you can safely define macros that use __builtin_expect() or __builtin_prefetch() but do nothing when not compiling with GCC. Some features of GCC are useful to ensure correctness of your code and avoid stupid mistakes, for example, format string checking. Other compilers might not generate the same useful warnings but the code will work nonetheless.

Of course some features are rather gimmicks like the support of ranges with case.

cto_maverick said...

Before I came across your blog I used the labels as values feature to implement dynamic code-reordering which is useful where data/inputs can be evaluated to provide early exits from a block of code.

-David

//
// http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
// uses gcc "labels as values" feature 5.3
//
void reorder_jump_table(unsigned char *schedule) {

//
// in order to skip over unnecessary bitmap comparisons a jump table is created
// this jump-table can then be reordered dynamically to skip trivial bitmap comparisons
// the implementation uses a gcc-specific feature "labels as values" 5.3
// http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
// http://popcnt.org/2008/03/useful-c-extensions-gcc-specific.html
//
static void *jump_table[] = { &&j0, &&j1, &&j2, &&j3, &&j4, &&j5, &&j6, &&j7,
&&j8, &&j9, &&j10, &&j11, &&j12, &&j13, &&j14, &&j15,
&&terminate
};

goto *jump_table[*schedule++]; j0: printf("j0\n"); // row 0
goto *jump_table[*schedule++]; j1: printf("j1\n");
goto *jump_table[*schedule++]; j2: printf("j2\n");
goto *jump_table[*schedule++]; j3: printf("j3\n");
goto *jump_table[*schedule++]; j4: printf("j4\n"); // row 1
goto *jump_table[*schedule++]; j5: printf("j5\n");
goto *jump_table[*schedule++]; j6: printf("j6\n");
goto *jump_table[*schedule++]; j7: printf("j7\n");
goto *jump_table[*schedule++]; j8: printf("j8\n"); // row 2
goto *jump_table[*schedule++]; j9: printf("j9\n");
goto *jump_table[*schedule++]; j10: printf("j10\n");
goto *jump_table[*schedule++]; j11: printf("j11\n");
goto *jump_table[*schedule++]; j12: printf("j12\n"); // row 3
goto *jump_table[*schedule++]; j13: printf("j13\n");
goto *jump_table[*schedule++]; j14: printf("j14\n");
goto *jump_table[*schedule++]; j15: printf("j15\n");
terminate: printf("end\n");

} // reorder_jump_table()