Structs and Unions in C

in #structs, 7 years ago

In previous threads, we have seen how arrays can hold more than one value in a single container. However, arrays can only hold values of the same datatype. In C, we can also store values of different datatypes in the same container. To do this, we use structures and unions.

Structures

A structure is an aggregate datatype, like an array. In an array, each individual item can be selected using its subscript or through pointer indirection. In a structure, each member is given a name through which it is accessed, since members can have different sizes, which would make accessing them through pointer arithmetic very cumbersome.
A struct declaration may contain a tag, which gives the structure type a name so it can be referred to later; a body delimited by curly braces, containing its members; and a list of variables.
The general form of a structure declaration is:


struct tag {members} variables;

The tag and the variable list are optional.
Let's create a simple structure to demonstrate:


struct {
    int x;
    char y;
    double z;
} s;

Here we have a structure containing 3 members: an int, a char and a double. The variable list contains a single variable, s. Declaring the structure with the variable s is like declaring a variable of any other type, say an int, and naming it s:


int s;

And just as we can declare more than one variable of the same type in one line:


int s1, s2, s3;

we can also do the same with structs:


struct {
    int x;
    char y;
    double z;
} s1, s2, s3;

Now we have 3 distinct struct variables, each containing members of the same datatypes.

The tag gives the struct type a name, so later declarations can refer to it even if no variable list was declared together with the struct. Variables can then be declared in separate statements by writing the keyword struct followed by the tag:


struct tagged {
    int x;
    int y;
    double z;
};
struct tagged s1;

struct tagged s2;


Now we have a struct with no variable list, but with a tag called tagged.
Arrays of structs, pointers to structs and even structs inside structs are allowed. However, a struct cannot contain itself as one of its members, so the following code would be invalid:


struct tagged {
    int x;
    int y;
    double z;
    struct tagged pg;   /* error: a struct cannot contain itself */
} f;


Arrays of structs, pointers to structs and structs nested inside other structs, on the other hand, cause no problem:


struct lolo {
    int po;
} p;

struct tagged {
    int x;
    int y;
    double z;
    struct lolo pg;     /* a struct nested inside another struct */
} f[20], *g;            /* an array of 20 structs and a pointer to one */

struct tagged s1;

If you are coming from an object-oriented programming language, structs may remind you of the classes and objects you are used to working with. In fact, languages like C# have their own struct datatypes. However, those are more advanced than C structs and closer to classes themselves: depending on the language, they can have methods (functions that belong to a type), constructors, and more.

We can also declare later variables of a struct that has no tag by using the keyword typedef (as you may remember, typedef defines a new name for an existing C datatype):


typedef struct {
    int point1;
    int point2;
    int point3;
} triangle;

triangle t1;
triangle t2;

Accessing members of a struct

The members of a struct can be accessed with the dot (.) operator.


#include <stdio.h>

typedef struct {
    int point1;
    int point2;
    int point3;
} triangle;

int main(void)
{
    triangle t1;

    /* assign each member through the dot operator */
    t1.point1 = 15;
    t1.point2 = 30;
    t1.point3 = 23;

    printf("triangle points are %d, %d, %d\n", t1.point1, t1.point2, t1.point3);
    return 0;
}


As we can see, the left operand of the dot is the struct variable and the right operand is the name of the member.
Now let's create a struct inside another struct. To access the members of the inner struct, all we have to do is keep adding a dot for each level of nesting:


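#include <stdio.h>

struct s1 {
    int x;
};

typedef struct {
    struct s1 n;    /* a struct s1 nested inside nest */
} nest;

int main(void)
{
    nest n1;
    n1.n.x = 13;    /* one dot per level: n1, then n, then x */

    printf("nest member: %d\n", n1.n.x);
    return 0;
}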


In the last example, we have a struct s1 containing an int x, and a typedef'd struct nest containing a member n of type struct s1. We then declare a variable n1 of type nest. Since the member we want is inside a struct that is itself inside n1, all we have to do is add another dot to reach x inside n from n1.

Struct members can also be initialized when the struct variable is declared. All you have to do is declare your struct, name the variable, and list the values for each member, in declaration order, inside curly braces:


#include <stdio.h>

struct s1 {
    int a;
    int b;
    int c;
} st = {5, 10, 15};   /* a = 5, b = 10, c = 15, in declaration order */

int main(void)
{
    printf("%d, %d, %d\n", st.a, st.b, st.c);
    return 0;
}


If you are perceptive, you may have noticed another important behavior: when you declare a struct, the struct becomes a datatype itself. Again, if you are coming from an object-oriented language, the same happens when you declare a class. Your struct can now be used in the same contexts as any other datatype, and can even be passed as an argument to a function.

Accessing struct members indirectly

With the dot operator, we can access struct members directly. But what happens if we only have a pointer to a struct, like in the example below:


struct s1 {
    int x;
    int y;
};

struct s1 st;
st.x = 88;
st.y = 16;

struct s1 *ps = &st;

In this case, we have two options. Both perform the same operation; all that changes is the way we write our code, so use whichever works for you. We can dereference the pointer with the * operator inside parentheses (the dot operator has higher precedence than *, so *ps.x would be parsed as *(ps.x)), or we can use the arrow operator (->), which dereferences the pointer and selects the member in one step:


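#include <stdio.h>

struct s1 {
    int x;
    int y;
};

int main(void)
{
    struct s1 st;
    st.x = 88;
    st.y = 16;

    struct s1 *ps = &st;

    /* explicit dereference: the parentheses are required */
    printf("struct element 1: %d\n", (*ps).x);
    printf("struct element 2: %d\n", (*ps).y);

    /* arrow operator: shorthand for (*ps).member */
    printf("struct element 1: %d\n", ps->x);
    printf("struct element 2: %d\n", ps->y);

    return 0;
}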


But why use a pointer to a struct at all? Since structs can become big data structures with many members, some operations on them can be expensive. Passing a struct to a function, for example, copies every member each time the function is called. So, just as with a big array, it is usually better to pass a pointer to the struct instead of the whole struct and all its members.

Structs with references to themselves

A struct cannot contain another struct of the same type inside itself. Think about it from the compiler's point of view: while it is still reading the struct's definition, it doesn't yet know the struct's size or all of its members, so how could it place a complete copy of the struct inside itself?
Even if it tried, the size would be infinite: the inner copy would contain another copy, which would contain another, and so on forever, so the compiler rejects such a definition. The solution is to insert a pointer, which has a known size, pointing to the struct type itself:


struct s1 {
    int x;
    int y;
    struct s1 *ms;
};

Struct Alignment

To demonstrate the importance of member alignment in structs, I would like to begin by showing the following sample code:


#include <stdio.h>

struct s1 {
    int a;
    char b;
    int c;
    char d;
};

struct s2 {
    int a;
    int b;
    char c;
    char d;
};

int main(void)
{
    struct s1 d1;
    struct s2 d2;

    /* sizeof yields a size_t, so %zu is the matching format */
    printf("struct 1 size: %zu\n", sizeof(d1));
    printf("struct 2 size: %zu\n", sizeof(d2));

    return 0;
}


In this code, we have 2 structs with the same datatypes, 2 ints and 2 chars each. However, on IA-32 (and on most common platforms, where an int is 4 bytes and 4-byte aligned), struct 1 has a size of 16 bytes and struct 2 has a size of 12 bytes. The only difference between them is the order in which their members are declared. So why is struct 1 bigger than struct 2? Struct 1's members are not ordered to keep padding to a minimum, while struct 2's are. IA-32 is optimized to access a datatype in a single memory access, although it can also access data by fetching memory more than once. So in struct 2, although the sum of its members' sizes is 10 bytes (int = 4 bytes, char = 1 byte), its size is 12 bytes:

[Illustration: memory layout of struct 2]

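What happens is that the members are laid out in memory for optimal retrieval, so each can be read in a single access. Memory is addressed in 4-byte steps, so the two ints start at offsets 0 and 4, the two chars take offsets 8 and 9, and the compiler adds 2 bytes of padding at the end so the total size stays a multiple of 4 (which keeps the ints aligned when structs are placed one after another in an array).
Now let's take a look at the memory layout of struct 1's members: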

[Illustration: memory layout of struct 1]

In this rough illustration I made (sorry, I'm not much of a designer), we can see its members are still aligned on 4-byte boundaries. The first int takes the first 4 bytes of the struct. The char declared after it takes 1 byte, and the compiler then leaves a gap of 3 bytes so that the next int starts on a 4-byte boundary. The same happens after the last char, which gives 4 + 1 + 3 + 4 + 1 + 3 = 16 bytes. What would happen if that gap didn't exist? If the next member was a 4-byte int, for example, reading it would require fetching memory twice:

[Illustration: struct 1 without padding, with an int split across two memory addresses]

In the illustration, we can see the int would take the 3 free bytes remaining at memory address A2 plus 1 byte at address A3, forcing the access of 2 addresses, A2 and A3. Since the compiler lays out members so that IA-32 can fetch each one in a single access, it leaves gaps of unused memory instead, as in the image of struct 1.
If we want to reduce those gaps as much as possible, a good rule of thumb is to declare struct members like we did in struct 2: the bigger datatypes first and the smallest ones last.

UNIONS

Unions are declared in the same way as structs, and they can also contain members of different datatypes:


union u1 {
    int a;
    int b;
    int c;
};

Union members can be accessed with the dot operator:


union u {
    int a;
    float b;
    
};
union u u1;
u1.a = 21;
u1.b = 27.0;

A union allocates only enough space for its biggest member (rounded up, if needed, to satisfy alignment). If your union contains a char and an int, it will occupy 4 bytes on a platform with a 4-byte int; if it contains an int and a double, its size will be 8 bytes:


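#include <stdio.h>

union u {
    int a;
    double b;
};

int main(void)
{
    union u u1;
    u1.a = 21;
    u1.b = 27.0;

    /* sizeof yields a size_t, so %zu is the matching format */
    printf("u1 size: %zu\n", sizeof(u1));
    return 0;
}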


That means a union occupies less space than a struct with the same members, as long as it has more than one member. In fact, this is one of its main characteristics: unlike a struct, a union stores all of its members at the same memory address.
Here is another way in which unions differ from structs: since all members occupy the same memory, assigning a value to one member overwrites the value previously stored in the others:


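#include <stdio.h>

union u {
    int a;
    int b;
};

int main(void)
{
    union u u1;
    u1.a = 21;
    u1.b = 29;   /* a and b share the same bytes, so this also replaces a */

    printf("%d, %d\n", u1.a, u1.b);   /* prints 29, 29 */
    return 0;
}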


Unions can also be initialized in one line, like structs. The value in the braces initializes the first member declared in the union, which here is a:


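#include <stdio.h>

union u {
    short a;
    int b;
};

int main(void)
{
    union u u1 = {21};   /* the initializer sets the first member, a */

    /* a covers only part of b's storage (2 of 4 bytes on a typical
       platform); the rest of b is not set by the initializer, so the
       value printed for b is not guaranteed to be 21 */
    printf("%d, %d\n", u1.a, u1.b);
    return 0;
}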


Unions can be an alternative way to implement a kind of polymorphism in C. In higher-level programming languages, you are probably used to working with polymorphism: depending on how an object is called, it can behave differently, for example when a method is implemented in different ways and called through one of the object's parents, interfaces, and so on. With C unions, however, this kind of behavior is very limited. Structs and unions provide a way not only to group different datatypes into one, but also, in a primitive way, to aggregate code and behavior, for example by passing them to functions.