Introduction
In this post we're going to understand and implement the linked list data structure in C through a phone book application.
The language was intentionally selected -- it's great that the rich API of languages like Java and C++ provide us very convenient ways to work with linked list and other data structures. However, I believe that it is still extremely valuable to understand what's happening under the hood, what the benefits and disadvantages are so we can always choose the most appropriate data structure or algorithm.
From this perspective C is a perfect language since there's nothing that hides the implementation details of the data structure like in high-level languages. In fact, we have to provide the implementation.
The audience of the this post is somewhere at the beginner level programmers who have a basic understanding of arrays and pointers in C.
The Phone Book Application
Instead of defining the linked list right away, I think it's best to understand and learn something new by an example. That's what we will do. We'll create a phone book program in which first we use arrays to store phone records and discover its limitations. To address the problems found with the arrays we'll meet the linked list.
We'll define some requirements that'll start from simple but will eventually get us to the linked list.
Let's get started! Suppose we are tasked as follows:
Create a command line phone book application in C that's able to:
- Store names and numbers
- Retrieve phone entities
- Delete phone record by name
Given these requirements, the skeleton of our phone book application will look like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct phone_entry {
char name[256];
char number[256];
};
#define MAX_ENTRIES 3
struct phone_entry entries[MAX_ENTRIES];
int cnt_entries = 0;
void insert_to_phonebook() {
}
void list_entries_phonebook() {
}
void delete_from_phonebook() {
}
int main(void) {
int choice;
char tmp[1024];
while (1) {
printf("1. Create new entry\n");
printf("2. List entries\n");
printf("3. Delete entry\n");
printf("4. Exit\n");
printf("Your choice: \n");
choice = atoi(fgets(tmp, 1024, stdin));
switch (choice) {
case 1: {
insert_to_phonebook();
break;
}
case 2: {
list_entries_phonebook();
break;
}
case 3: {
delete_from_phonebook();
break;
}
case 4: {
return 0;
break;
}
}
}
return 0;
}
Now let's go thorough these tasks and see how we can implement them.
Storing Entities
Consider that we have to handle at most 3 entries. That said, you could initiate the phone book as follows:
struct phone_entry entries[MAX_ENTRIES];
Where MAX_ENTRIES
is set to 3. The code to record a new entry could be something like that:
void insert_to_phonebook() {
char name[256];
char number[256];
if (cnt_entries > MAX_ENTRIES) {
printf("Can't add more entires to the phonebook!\n");
return;
}
printf("Name:\n");
fgets(name, 256, stdin);
name[strlen(name) - 1] = 0;
printf("Number:\n");
fgets(number, 256, stdin);
number[strlen(number) - 1] = 0;
strcpy(entries[cnt_entries].name, name);
strcpy(entries[cnt_entries].number, number);
printf("Record added!\n\n");
cnt_entries++;
}
The variable cnt_entries
tracks how many entries we have in the phone book.
We use the fgets
function to get the name and the number from the user via the standard input. As alternatives, the standard library of C also provides the gets
or scanf
functions to get user input. The manual page of the gets
function is pretty clear though: Never use this function. In short, it's not safe and thus must be avoided.
The scanf
is not considered safe either -- that is, buffer overflow can easily happen if the input string is longer than the size of the destination character array.
The fgets
, on the other hand, exactly specifies the maximum amount of characters that'll be read.
It's interesting to note that the fgets
also stores the new-line character. Since we do not need it, we'll just simply overwrite it with the terminating 0
character.
The very first requirement is now completed, we are now able to add entries to the phone book. Great!
Listing Entities
Accessing and printing the elements is also fairly easy:
void list_entries_phonebook() {
printf("\nEntries of the phonebook:\n");
int i = 0;
for (i = 0; i < cnt_entries; i++) {
printf("Record %i\n", i);
printf("Name: %s\n", entries[i].name);
printf("Number: %s\n\n", entries[i].number);
}
}
Nothing special here, we traverse the array of the phone entries and print the name and the number respectively.
At this point our application looks like that:
Removing Entities
Let's think about what happens if we, say, want to remove the second entry from the phone book:
How do we do that? Well, we can't remove elements from an array in C. The array is fixed size. What we can rather do is we overwrite (and re-use) the second entry's slot.
Overwrite with what? We have two options and they depend on whether the order of the element in the array matters. The two options are:
- If the order does not matter, then just move the last entry to the second position.
- If the order matters, shift the whole phone book array from the right to the left to fill in the gap (the entry we intend to remove).
Let's take a look at the first approach. Suppose we do not care about the order. That's good, because it'll ridiculously simplify the work to be done:
The implementation is:
strcpy(entries[idx_to_remove].name, entries[cnt_entries].name);
strcpy(entries[idx_to_remove].number, entries[cnt_entries].number);
cnt_entries--;
Now consider the situation when the order does matter. That is, we want to see the same order when printing the phone entries as they were added.
In this case we'll need to copy element three to element two's position, element fourth to element third's position.
For small size of arrays, we can use this method, it works just fine. However, imagine this for 100k records! It would certainly have negative impact on the performance since moving that amount of data is a very expensive operation.
The reference implementation for the deletion:
void delete_from_phonebook() {
char name[1024];
int record_index = -1;
int i = 0;
printf("\nEnter the name of the record you want to delete:\n");
fgets(name, 1024, stdin);
name[strlen(name) - 1] = 0;
for (i = 0; i < cnt_entries; i++) {
if (!strcmp(entries[i].name, name)) {
record_index = i;
break;
}
}
if (record_index == -1) {
printf("Couldn't find %s\n", name);
return;
}
for (i = record_index + 1; i < cnt_entries; i++) {
strcpy(entries[i - 1].name, entries[i].name);
strcpy(entries[i - 1].number, entries[i].number);
}
printf("Record %s succesfully deleted!\n\n", name);
cnt_entries--;
}
At this point our phone book application is basically feature complete. We can add, list and remove entries. However, we've seen two problems with the array-based solution:
- The size of the entry array is fixed.
- The deletion can be very slow if the order of the phone entries should be preserved.
Luckily, these problems can be addressed with the linked list data structure.
Using Linked List Instead of Array
Our struct stores now two fields. What if in addition to the name and number we also store the reference of the next record? In other words, we'll have a new member (a pointer) in the struct that points to the next phone book entry as shown on the figure:
This also means that we no longer need to group these entities into an array since they can be traversed by dereferencing the next pointer of each element.
How can we add a new element? Just simply set the list's last element's next pointer to our new element.
Let's look at a possible implementation of the insertion operation:
struct phone_entry *head = NULL;
struct phone_entry *tail = NULL;
void insert_to_phonebook_linkedlist() {
struct phone_entry *entry = (struct phone_entry *)malloc(sizeof(struct phone_entry));
struct phone_entry *tmp;
printf("Name:\n");
fgets(entry->name, 1024, stdin);
entry->name[strlen(entry->name) - 1] = 0;
printf("Number:\n");
fgets(entry->number, 1024, stdin);
entry->number[strlen(entry->number) - 1] = 0;
entry->next = NULL;
if (head == NULL) {
head = entry;
tail = entry;
} else {
tail->next = entry;
tail = entry;
}
}
First we allocate memory to store a phone record, then save the name and the number in the record. Note that it's very important to set the next element to NULL
.
We have two auxiliary variables: the head
and tail
. Their name speak for themselves. The former points to the very first element in the list, while the latter stores the reference of the last item.
We could've omitted the tail variable, but remember we said that when inserting to the list we just simply set the last element's next pointer to our new element. Since we always have the reference of the last element in the tail variable, it makes fairly easy to attach a new element to the list. Otherwise we would need to traverse the entire list to get to the end of it.
If the list is empty (which means head is NULL) then the element to insert will be the only entry in the list -- it'll be the head and tail at the same time.
If the list already contains elements, then just add the new element to the end and make sure the tail variable now points to the new (last) element.
Similarly, if we want to delete an item we just set the previous item's next field to point to the subsequent element of the item we want to delete. Say we want to remove the second entry:
We just set the first entry's next member to point to the third entry.
The benefit is clear:
- Using arrays would require shifting the whole array from the right to the left, that may be very expensive.
- Using pointers requires simply changing the previous element's next pointer.
That's a major difference! And the order is preserved.
Here's a sample implementation that shows how to delete from the linked list:
void delete_from_phonebook_linkedlist() {
char name[1024];
struct phone_entry *tmp, *prev;
printf("\nEnter the index of the record you want to delete:\n");
fgets(name, 1024, stdin);
name[strlen(name) - 1] = 0;
tmp = head;
prev = NULL;
while (tmp) {
if (!strcmp(tmp->name, name)) {
tail = prev;
if (prev) {
prev->next = tmp->next;
} else {
head = tmp->next;
}
free(tmp);
printf("Successfully deleted %s\n", name);
break;
}
prev = tmp;
tmp = tmp->next;
}
}
We introduce the tmp
variable that'll be used to traverse the list. We can't use the global head
variable for that purpose because we'd lose the reference to the list's first element.
The prev
variable is another auxiliary variable, we'll make it point to the previous element when iterating through the list.
If the name given by the user matches the name in the current record then we found the element to remove. It's pointed by the tmp
variable.
In case there's a previous element (which basically means this is not the first element in the list) then have it's next member point to tmp's next element.
If there was no previous element (the item to delete is the first element) then we just change the head pointer to point to the second element.
Since we also maintain a tail variable, we also need to check whether tmp was the last element in the list. If so, then the prev entry will take over this role; let's assign tail to prev.
Definition
And we've just arrived at the concept of linked list: A linear data structure in which each record stores a reference to the next member.
Unlike the arrays, we do not have direct access to the entities. We can reach them by traversing the list and dereferencing the pointers.
In short, here are the most important summary of linked list as a general data structure.
Advantages:
- Adding new element happens in constant time (creating the new entity and modifying the last element's next pointer).
- Very efficient when it comes to deleting entities, it only requires to change a single pointer.
Disadvantages:
- There's no direct access to the elements, we have to traverse the list through the next pointers to locate a certain element.
- Costs more memory than arrays due to storing the bookkeeping references.
Comparing the Implementations
I built a test phone book consisting of 30 000 entries and executed each operation (create, retrieve and delete) of the array and linked list implementation on it. There's no big surprise, the test results correspond to what we've discussed so far:
Insertion and retrieving is head-to-head, but for the delete operation there's a huge difference between the two variants. Re-ordering the array has its price.
Source Code
The source code is available here.
Further Read
Summary
Knowing the low-level details of the basic data structures are essential when it comes to choosing the proper one to solve a particular problem.
As almost every other data structure and algorithm, linked list has also some trade-offs. It consumes more memory than arrays and the elements can be accessed sequentially, however, it performs way better when entries have to be removed.