Strings in C


As we saw in the last thread, strings in the C programming language are stored in arrays of chars. A string can also be written as a string literal, but literals must not be modified, unlike arrays. The characters of a string are stored in sequence, followed by a NUL byte (\0) that terminates it. The standard header for string manipulation is "string.h", and it works together with "stdio.h" and its I/O operations.

String length

In the last thread we let the compiler work out the array size automatically. When you declare the size yourself, you must take the NUL byte into account, so a 7-character word needs 8 elements:


char word[8] = {'m','e','s','s','a','g','e',0};


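You also have some flexibility when declaring strings, just as you do with arrays. A string literal already ends with an implicit NUL byte, so the explicit \0 below is redundant but harmless: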


char word[] = "hello\0";
char word[6] = "hello\0";
char word[6] = {'h','e','l','l','o',0};


We must pay attention to the size in the declaration. What happens if the size is too small and chops off one or more characters? If you chop the NUL byte, printf() cannot tell where the string ends, so it keeps reading past the array and prints garbage, usually whatever uninitialized values sit next to it in memory (formally this is undefined behavior). I mention it because it happens very often: people count from position 0 and declare the array one element too short. You may also chop characters of the string itself, and most compilers will warn about it. Let's see what happens:


char word[4] = "hello\0";


img1.jpg

See how much garbage was printed? That's why I prefer letting the compiler set the size with the [] notation. Let's declare the size correctly and get rid of the errors and garbage values:


char word[6] = "hello\0";
printf("%s\n", word);
return 0;


img2.jpg

We may also initialize strings with our old friends, pointers:


char *word = "hello";
printf("%s\n", word);


This declaration already takes the NUL byte into account, and you do not have to worry about the size either. Keep in mind that the pointer refers to a string literal, which must not be modified.

stdio.h string related functions

gets()

The gets function reads a string from the standard input stream and stores it in the buffer you specify, most commonly an array. It reads until a newline \n is found, generally from the ENTER key when you finish typing, or until end of file is reached; the newline itself is not stored. Note that gets() cannot limit how much it reads, which is why it was removed from the language in the C11 standard. Its prototype is:


char *gets(char *str);


You must be careful when using gets() with an automatically sized array. Let's check the example below:


char word[] = "";

printf("Enter input : ");
gets(word);

printf("%s", word);


In this example, some may think the array will size itself to fit the input. The size is indeed set automatically at the declaration, but to a single byte (just the NUL), and gets() cannot resize it later. So unless you enter nothing at the console, the input is written past the end of the array and overflows it. We must give it a size:


char word[50] = "";

printf("Enter input : ");
gets(word);

printf("%s\n", word);


img3.jpg

You must also take the size of the input into account. If you declare a buffer that is too small and type a huge string, you face the same overflow problem. We can work around this by dynamically allocating memory at runtime, as we will see in later tutorials, which lets you size arrays while your program is running. Stay tuned :D

fgets()

fgets() looks a lot like gets(). It is safer, since you must pass the buffer size as an argument, but you can still pass a size larger than the actual storage. Here is its prototype:


char *fgets(char *str, int n, FILE *stream);


The first argument is the destination buffer, as with gets(). The second is the size of that buffer: fgets() reads at most n - 1 characters and then appends the NUL byte. Unlike gets(), it keeps the newline if it fits, and anything you type beyond that limit is not stored (it stays in the input stream for the next read). The third parameter is the file stream. Our applications can have many file streams, and you can even open your own to point to a file, a network connection and such. By default we work with the 3 standard streams every application gets: stdin is the input stream, stdout is the output stream and stderr is the error stream. You can do all sorts of things with them, including redirecting one to another. Let's see an example that reads a string from stdin and prints it to the screen:


char word[5] = "";

printf("Enter input : ");
fgets(word, 5, stdin);

printf("%s\n", word);


img4.jpg

As we said, you must be careful to pass a size that matches the actual storage. If we pass a size bigger than the storage, an overflow can still happen:


char word[5] = "";

printf("Enter input : ");
fgets(word, 10, stdin);

printf("%s\n", word);


puts()

puts() looks a lot like printf(). It prints a string to the stdout stream, but unlike printf() it takes no format string and appends a newline \n character for you. Its prototype is:


int puts(const char *str);


Now with an example:


char word[] = "hello puts";
puts(word);
return 0;


img5.jpg

fputs()

fputs() works like puts(), but you choose the stream it writes to, such as a text file, so the output of your console application can go straight into a file. Let's see a simpler example that just writes to the stderr stream instead of the stdout used by puts():


char word[] = "hello puts";
fputs(word, stderr);
return 0;


fputs() will not append a \n at the end of your string, so add it to the string yourself if you want one:


char word[] = "hello puts\n";
fputs(word, stderr);
return 0;


img6.jpg

functions of string.h library

string.h is the standard string library for C. Its functions let us manipulate strings directly. Let's see the most important ones.

strcpy()

strcpy() copies a string into another, overwriting whatever is in the destination. If the destination held a longer string than the source, the copied NUL byte now marks the end of the string, so the leftover characters of the original are still in memory but no longer part of it. The destination must be modifiable storage, such as an array or memory reached through a pointer, never a string literal. Here is its prototype:


char *strcpy(char *dst, const char *src);


It is simple to use, but be careful not to copy a bigger string into a smaller one:


char word[] = "hello\0";
char word2[] = "am\0";

printf("%s\n", word);

strcpy(word, word2);
printf("%s\n", word);

return 0;


img7.jpg

This function is a well-known source of problems, as described in "Secure Coding in C and C++" by Robert C. Seacord. You must always make sure the destination can hold the whole source, NUL byte included. If the source is larger, the copy runs past the end of the destination and overwrites whatever is stored in memory beyond it. An attacker can exploit such an overflow to overwrite data the program relies on and make it execute instructions of their choosing. strcpy() returns a pointer to the destination string. Here is the same code, but passing strcpy() directly to printf() to show the returned pointer:


char word[] = "hello\0";
char word2[] = "am\0";

printf("%s\n", word);
printf("%s\n", strcpy(word, word2));


If you don't want to worry about the exact string size, you can declare the destination array with enough room for the operation. Even though you might waste a little space, it is safer than risking an overflow:


char word[] = "hello\0";
char word2[25] = "am\0";

printf("%s\n", word);

strcpy(word2, word);
printf("%s\n", word2);


strcat()

strcat() concatenates strings: it appends the source string to the end of the destination string:


char *strcat(char *dst, char const *src);


strcat() has the same overflow problem. You must ensure the destination has room for the combined string so it does not overwrite other memory addresses. Now you have two or more strings whose sizes you must evaluate: the sources and the destination. Like strcpy(), strcat() returns a pointer to the destination:


char word[10] = "hello\0";
char word2[3] = "am\0";

printf("%s\n", word);

printf("%s\n", strcat(word, word2));


img8.jpg

An incorrect size causes an overflow when you concatenate into a container that is too small. Here the result needs 8 bytes but the array has only 7:


char word[7] = "hello\0";
char word2[3] = "am\0";

printf("%s\n", word);

printf("%s\n", strcat(word, word2));

return 0;


strlen()

strlen() returns the length of a string, not counting the NUL byte. Here is its prototype:


size_t strlen(const char *str);


This is the simplest function of the library. It returns a size_t, so print it with %zu:


char str[] = "Hello World";
printf("%zu\n", strlen(str));


img9.jpg

strcmp()

We can check whether two strings have the same content with strcmp(). The == operator does not work here because it compares the addresses of the strings, not their characters, so we call this function with the two strings to compare:


int strcmp(const char *s1, const char *s2);


If they are equal, strcmp() returns 0. If the first string is greater than the second (it sorts later, character by character), it returns a value greater than 0, and it returns a value less than 0 if the first string is less than the second. This is not very intuitive, since many will write:


strcmp(a, b)


and expect it to be true when the strings are equal, while it is actually 0 (false) in that case. strcmp() poses no danger of overflowing an array, as long as both strings are terminated with a NUL byte (\0).

Restricting the size of string functions and avoiding overflow

strcpy(), strcat() and strcmp() each have a variant that does the same job but takes an extra length argument to limit how many characters can be copied or compared.

strncpy()

strncpy() is the sized version of strcpy() function. Its prototype is:


char *strncpy(char *dst, const char *src, size_t n);


This function copies characters from the source into the destination, overwriting it, until it has written exactly n characters. If the source is shorter than n, the rest is filled with NUL bytes; if the source is n characters or longer, no NUL byte is added, so you must terminate the result yourself. If n is bigger than the destination container, an overflow will occur (careful with automatic initialization!). Pay attention to the size parameter and make sure the array holding the string is big enough for the result. In the example below, only the first 5 characters of str1 are replaced and the rest of "hellows" survives:


char str1[50] = "hellows";
char str2[50] = "my application";

printf("%s\n", strncpy(str1, str2, 5));


img10.jpg

strncat()

Unlike strncpy(), strncat() always appends the NUL byte to the result: it appends up to n characters of the source string to the destination and then the NUL byte. It works like strcat(), but you must specify the size. Its prototype is:


char *strncat(char *dest, const char *src, size_t n);


The function appends at most the number of characters you pass as argument, so the destination needs room for them plus the NUL byte:


char str1[50] = "hello";
char str2[50] = " my application";

strncat(str1, str2, 7);

printf("%s\n", str1);


img11.jpg

strncmp()

strncmp() compares two strings up to the number of characters you pass as argument. If the strings are equal up to that size but differ after it, the function returns 0. Otherwise it behaves just like strcmp(): a value greater than 0 if the first string is greater than the second, less than 0 if the first string is less than the second:


int strncmp(const char *s1, const char *s2, size_t n);


Let's see an example, since this can be a little more difficult than strcmp():


char str1[50] = "hello my app";
char str2[50] = "hello world";

printf("%d\n", strncmp(str1, str2, 5));


img12.jpg

Note how the function returns 0, as if the 2 strings were equal. They are indeed equal up to the size, but let's increase the size of the comparison to see what happens:


char str1[50] = "hello my app";
char str2[50] = "hello world";

printf("%d\n", strncmp(str1, str2, 10));

return 0;


img13.jpg

There. Now we get a result of -1, since "m" is less than "w", the first character where they differ. The exact number may vary between implementations; only the sign is guaranteed.

Working with substrings

strstr()

strstr() receives 2 parameters: the first is the string to search in, and the second is the substring to find. If it is found, the function returns a pointer to its first occurrence inside the first string, which reads as a substring running from there to the end of the string. Its prototype is:


char *strstr(const char *str1, const char *str2);


Let's extract a substring then:


const char str1[50] = "Hellow substring";
const char str2[50] = "o";

printf("Extracting substring from str1: %s\n", strstr(str1, str2));

return 0;


img14.jpg

We can see how the function searches str1 until it finds the first occurrence of the pattern in str2. It then returns the substring starting at that first occurrence, which can be stored in a variable. If the function finds nothing, it returns NULL.

strchr()

If you only need to find a character, strchr() receives a string and an integer (the integer holds a character value) and searches for it, returning a pointer to the first occurrence of the character in the string. If the first occurrence is at position 4, the result is the pointer to the string plus 4, and reading the string from there gives you the rest of it. If the function cannot find the character, it returns a NULL pointer. The search is case sensitive, meaning the function distinguishes between upper case and lower case characters:


char word[20] = "Rat return to race";
char *returned_value = strchr(word, 'r');
char *noreturn = strchr(word, 'p');

printf("strchr returned: %s\n", returned_value);
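/* passing NULL to %s is undefined behavior, so print a placeholder instead */
printf("strchr returned: %s\n", noreturn ? noreturn : "(null)");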
return 0;


img15.jpg

strrchr()

strrchr() works much like strchr(), except it searches from the end of the string. It finds the last occurrence of the character and returns a pointer to it:


char word[20] = "Rat return to race";
char *returned_value = strrchr(word, 'r');
char *noreturn = strrchr(word, 'p');

printf("strrchr returned: %s\n", returned_value);
printf("strrchr returned: %s\n", noreturn ? noreturn : "(null)");
return 0;


img16.jpg

strstr()

strstr() finds a substring, much like the search functions of any high level programming language. The second argument is the substring we would like to find, not a single character, so it must be a string in double quotes. It also returns NULL if the substring is not found:


char word[20] = "Rat return to race";
char *returned_value = strstr(word, "turn");
char *noreturn = strstr(word, "p");

printf("strstr returned: %s\n", returned_value);
printf("strstr returned: %s\n", noreturn ? noreturn : "(null)");
return 0;


img17.jpg

The most basic string functions in C have been covered. In languages like Perl, where string manipulation is one of the main strengths, you can do all of this and more, sometimes with less work. Nevertheless, it is very productive to learn how these functions work in C in order to write fast and adaptable string manipulation. You will also run into these functions very often when debugging at assembly level, since much of the software on your system calls into the same C library routines.