You think that knowing ANSI C is enough? Well, okay, it's enough. But I found that some gcc extensions are very useful.
All gcc C extensions are listed in the manual.
Here are the extensions I use:
Labels as values
I remember the old assembler days, when you could jmp to to an address from an array. Nowadays in C, you can create an array of pointers to functions and call pointer from array. But jmp is faster and sometimes you don't want to create many small, almost useless functions.
In gcc you can use a trick that's called labels as values. It's like that:
int main(){
int value = 2;
const void *labels[] = {&&val_0, &&val_1, &&val_2};
goto *labels[value];
val_0:
printf("The value is 0\n");
goto end;
val_1:
printf("The value is 1\n");
goto end;
val_2:
printf("The value is 2\n");
goto end;
end:
return(0);
}
Macros as functionsHere is a standard example of a macro with statements that returns a value. It's also a great example of
typeof operator in
gcc.
#define MAX(a,b) \
({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b ? _a : _b; })
Case rangesGcc allows to use
ranges in case statement. In ANSI C you'd have to enumerate all the values.
This code is much prettier:
int main(){
int value = 7;
switch(value){
case 0 ... 5:
printf("The value is 0 - 5\n");
break;
case 6 ... 10:
printf("The value is 6 - 10\n");
break;
default:
printf("The value is ?\n");
}
return(0);
}
LikelyHave you ever wondered how does
gcc convert
if statement to assembler, how many branches are created by the compiler? For example here, how many
jmp instructions processor must do in the
foo() function?
int foo(int i){
if(i == 0)
return(1);
return(2);
}
int main(){
foo(0);
}
In this case you know more than a compiler. You know that all callings to the
foo() are going to be with "0" as the parameter. You can suggest compiler to optimize this by using
__builtin_expect and possibly reduce number of jumps.
int foo(int i){
if(__builtin_expect(i == 0, 1))
return(1);
return(2);
}
There are often defined macros:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Vector extensionsThis very powerful extension allows to use
mmx/sse code in a portable way.
Gcc would choose the best possible extensions during compile time. When you compile code without mmx/sse options, the binary will remain compatible with i386. The downside is that this extension doesn't allow to use all of the features of mmx/sse.
From my point of view the most important feature is to use the bitwise commands. For example
this would do binary or on 128 bits:
typedef union{
int v __attribute__ ((vector_size (16)));
int i[4];
} __vec_int;
int main(){
__vec_int a, b, c;
a.i[0] = 1; a.i[1] = 1; a.i[2] = 1; a.i[3] = 1;
b.i[0] = 1; b.i[1] = 2; b.i[2] = 3; b.i[3] = 4;
/* do OR here on 128 bits */
c.v = b.v | a.v;
...
}
Sometimes there is a need of using more advanced mmx commands than vector extensions allow. For example you may want to do
four compares at a time. Unfortunately this code is specific for sse2 (
-msse2).
int main(){
__vec_int a,b;
a.v = set_vec_int(1,2,3,4);
b.v = set_vec_int(4,3,2,1);
/* compare greater than on __m128i */
__vec_int c;
c.v = _mm_cmpgt_epi32(a.v, b.v); /* 0xFFFFFFFF if gt, 0 else*/
printf("a=[%i,%i,%i,%i]\n", a.s[0], a.s[1], a.s[2], a.s[3]);
printf("b=[%i,%i,%i,%i]\n", b.s[0], b.s[1], b.s[2], b.s[3]);
printf("c=[%u,%u,%u,%u]\n", c.s[0] ? 1 : 0, c.s[1] ? 1 : 0, c.s[2] ? 1 : 0, c.s[3] ? 1 : 0);
return(0);
}
Command used here
_mm_cmpgt_epi32 is ported from Intel compiler, the best documentation for this dialect I found on
Microsoft sites. You may want to use
gcc equivalent. By the way, don't forget that the sse2 registers (
__m128i) must be aligned to 16 bytes (
-mpreferred-stack-boundary=4).
Cache stuffSometimes it's possible to optimize code by prefetching data. It's
quite simple to prefetch data to one line of cache:
__builtin_prefetch(&data[offset], 1); /* 0 - read only, 1 - rw */
On the other hand you may want to store the data without polluting the caches. If you only want to write data (and it's not currently loaded to cache) it could be faster than standard storing. I use this defs (oh. it's mmx specific):
#ifndef MMX
#define STORE(ptr, val) \
do{ \
*(ptr) = (val); \
}while(0)
#else
#define STORE(ptr, val) \
do{ \
_mm_stream_si32( (ptr), (val) ); \
}while(0)
#endif
More?There are more interesting extensions. Like
advanced assembler operands or
macros with variable number of arguments.