Allocating Arrays with calloc(3) Instead of malloc(3)

Allocating Memory for an Array in C

Two common ways to allocate memory for an array in C are fixed-size (compile-time) allocation and dynamic allocation. With fixed-size allocation, the amount of memory that needs to be allocated is known at compile time. For example, we can declare an array with a fixed size inside a function (such an array has automatic storage duration and is released when the function returns):

void my_function(void)
{
    int my_array[10];
    /* ... use my_array ... */
}

With dynamic allocation, on the other hand, the amount of memory that needs to be allocated is determined while the program is running. Allocation is often accomplished using malloc(3):

#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* malloc, free */

void my_function(size_t num_elements)
{
    int *my_array;
    my_array = malloc(num_elements * sizeof(int));
    if (my_array == NULL) {
        return;  /* allocation failed */
    }
    /* ... use my_array ... */
    free(my_array);
}

But in this blog post, we’ll explore why the idiom of malloc(3)’ing an amount of memory equal to the number of elements in the array multiplied by the size of each element should be avoided, and why calloc(3) should be used instead.

The Differences Between malloc(3) and calloc(3)

When we write the name of a C standard function or a UNIX command followed by a number in parentheses, the number refers to the section of the UNIX manual where its man(1) page can be found. So, to read the man(1) pages for the functions malloc(3) and calloc(3), we can run:

% man 3 malloc
% man 3 calloc

The man(1) pages tell us that there are two key differences between the malloc(3) function and the calloc(3) function.

First, malloc(3) takes a single size_t argument: the number of bytes of memory to allocate. The calloc(3) function, on the other hand, takes two size_t arguments: the number of elements to allocate, and the size of each element in bytes.

The second difference is that calloc(3) sets every byte of the allocated memory to zero, whereas malloc(3) leaves the contents of the allocated memory indeterminate: it doesn’t initialize them, so they hold whatever happens to be there.

Arithmetic Overflow with malloc(3)

For this blog post, we’ll focus on the first difference between malloc(3) and calloc(3): that malloc(3) takes a single argument instead of two arguments. That single argument represents the number of bytes to allocate. So, when we allocate memory for an array, we need to multiply the number of elements in the array by the size of each element, as in:

malloc(num_elements * sizeof(int));

If the number of elements in the array, num_elements, is large enough, an arithmetic overflow can occur: the product of the number of elements and the size of each element can be too large to fit in a size_t. Because size_t is an unsigned type, C doesn’t report an error in this case; the result silently wraps around (it is reduced modulo SIZE_MAX + 1), leaving a value much smaller than the one we intended.

The value of num_elements might be very large because our application needs more resources than we anticipated when we wrote it. Or the value of num_elements might be controlled by a malicious actor, for example, if it is input directly by the user.

Let’s see what happens in the following situation:

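#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free */

int main(void)
{
    int *my_array;
    size_t num_elements = 9223372036854775809U;      /* 2^63 + 1 */
    size_t alloc_size = num_elements * sizeof(int);  /* silently wraps */
    printf("We will allocate %zu bytes\n", alloc_size);
    my_array = malloc(alloc_size);
    /* ... use my_array ... */
    free(my_array);
    return 0;
}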

On my 64-bit computer, which has 64-bit size_t values and 32-bit int values, the output of this program is:

We will allocate 4 bytes

The number of bytes being allocated is small and incorrect because the number 9223372036854775809 (that is, 2^63 + 1), represented in hexadecimal, is:

0x 8000 0000 0000 0001

Because an int is 32 bits on my computer, sizeof(int) is 4 bytes. The above large value multiplied by 4 is, mathematically:

0x 2 0000 0000 0000 0004

However, a size_t on my computer can hold only 64 bits, so every bit of the product above the lowest 64 bits is discarded (the multiplication is effectively performed modulo 2^64). This truncation causes the result of the multiplication to be:

0x 0000 0000 0000 0004

That value, 4, is what ends up in alloc_size, and it is the number of bytes of memory that we incorrectly wind up allocating.

Buffer Overflows with malloc(3)

Let’s consider an application that first gets, from the user, the maximum number of values to read in, and then reads values until it encounters a zero or reaches that maximum. The structure of this program could look like the following:

#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free */

size_t get_max_num_values(void);
int get_value(void);
void do_something(int *values, size_t num_values);

int main(void)
{
    int *values;
    int value;
    size_t max_num_values, alloc_size, num_values;

    max_num_values = get_max_num_values();
    alloc_size = max_num_values * sizeof(int);  /* can silently wrap around */
    values = malloc(alloc_size);

    printf("Bytes allocated = %zu\n", alloc_size);

    num_values = 0;
    while (num_values < max_num_values) {
        value = get_value();
        if (value == 0) {
            break;
        }
        values[num_values++] = value;
    }

    do_something(values, num_values);
    free(values);
    return 0;
}

As we saw above, if get_max_num_values(void) returns 9223372036854775809 (for example, because a user, malicious or otherwise, typed that number), only 4 bytes of memory will be allocated: room for a single int. The while loop, however, will keep writing values past the end of those 4 bytes, starting with the second value, until the user inputs a 0.

A program that writes beyond the space it allocated, overwriting other values in memory, has a type of flaw called a “buffer overflow”.

Writing past the end of an allocated block is undefined behaviour in C. Because we have no guarantees about where in memory those 4 bytes of dynamically allocated space will be located, or what lies next to them, the out-of-bounds writes can cause non-deterministic and irreproducible behaviour in our program, which makes the incorrect behaviour difficult to debug.

Worse, buffer overflows are also a serious security concern: a malicious user who can overwrite memory beyond an allocated array may be able to make the program misbehave in a manner of their choosing.

Allocating Arrays with calloc(3)

Rather than performing an unsafe multiplication to determine how much space to allocate for an array, we can use the standard function calloc(3). Instead of a single argument representing how many bytes of space to allocate, calloc(3) takes two arguments: the number of elements in the array to be allocated, and the size of each element. For example,

int *my_array;
my_array = calloc(10, sizeof(int));
/* ... use my_array ... */
free(my_array);

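is roughly the dynamic equivalent of

int my_array[10];

with one difference: calloc(3) sets every element to zero, whereas an array declared this way inside a function starts out uninitialized.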

Just like malloc(3), calloc(3) returns NULL if the memory allocation fails. Unlike malloc(3), however, calloc(3) is given the number of elements and the size of each element separately, so it can detect when their product would overflow a size_t; in that case the allocation fails and calloc(3) returns NULL.

For example, running:

int *my_array;
size_t num_elements = 9223372036854775809U;
my_array = calloc(num_elements, sizeof(int));

will set my_array to NULL on my 64-bit machine. In our program, we should verify that the return value from a calloc(3) call is non-NULL before using the allocated array. If the return from calloc(3) is non-NULL, we’re guaranteed that an array of the proper size has been allocated.

Missing NULL Checks

But what if we make a programming error and forget to check that the value returned by calloc(3) is non-NULL before accessing the array, as in the example above? Dereferencing a NULL pointer is undefined behaviour, but on typical desktop and server operating systems, which leave the lowest page of memory unmapped, it makes our program terminate with a segmentation fault instead of letting it write data to unsafe memory locations.

So, if we were to follow the previous three lines of code with the following:

my_array[0] = 42;

the attempt to dereference the NULL pointer to access the array would trigger a segmentation fault. Running the four lines of code above on my computer, compiled as an executable called segfault_demo, results in the following:

zsh: segmentation fault  ./segfault_demo

Having our program crash with a segmentation fault in this case is a good thing. An incorrect program that crashes is almost always better than an incorrect program that continues running but behaves incorrectly. The former is far easier to debug.

Exercise

Earlier in this blog post, we wrote a program that takes user input until it receives a zero or until it reads a maximum number of input integers. I encourage students to rewrite this program with the following changes:

  1. Use calloc(3) instead of malloc(3) to allocate the array;
  2. Verify that the call to calloc(3) succeeds by checking for a NULL return;
  3. Complete the implementations of get_max_num_values(void) and get_value(void) to read size_t and int inputs from the user on the console, respectively; and,
  4. Complete the implementation of do_something(int *, size_t) to use the input data in some way, such as echoing it back to the console.

Hint: the man(1) pages for printf(3) and scanf(3) tell us that the specifier for a size_t in these two standard functions is "%zu".

Summary

One of the first idioms we usually learn for dynamically allocating arrays in C is to multiply the number of elements in the array by the size of each element. That is, we learn to allocate an array of ints like the following:

int *my_array = malloc(num_elements * sizeof(int));

But, unless we can be absolutely certain that this multiplication won’t cause an arithmetic overflow, we should avoid this idiom. An arithmetic overflow in this multiplication can lead to allocating the incorrect amount of memory. This can cause non-deterministic program behaviour, or even allow a malicious user to make the program misbehave in a manner of their choosing.

Instead of performing this unsafe multiplication, we should use calloc(3) to allocate arrays:

int *my_array = calloc(num_elements, sizeof(int));

If the multiplication causes an arithmetic overflow, the allocation will fail, and calloc(3) will return NULL. We can check for a NULL return from calloc(3) and behave accordingly.

Even if the check for a NULL return is mistakenly missing from our code, on typical systems attempting to dereference the NULL pointer to access the array will at least make our program crash with a segmentation fault instead of behaving incorrectly.

For more tips, and to arrange for personalized tutoring for yourself or your study group, check out Vancouver Computer Science Tutoring.