development-programming-data-structures.html
* created: 2025-05-24T13:41
* modified: 2025-06-07T17:37
title
Data structures
description
A higher level abstraction on top of our data types. While data types denote what kind of data it is; data structures focus on how to organize that data to, for example, quickly sort or search it.
related notes
Staying organized
Data structures are a high level abstraction on top of our data-types, which focus on the problem how to efficiently store and access our data. This commonly happens inside some sort of list or tree like structure which enforces some sort of restriction, to allow for certain properties.
Defining what a node is.
I would define a node as an element which has some sort pointer to a previous or following node. This makes nodes, by definition, a tightly coupled structure, because it does not only care about its own value, but also where to find subsequent values.
Building a train (Linked-Lists)
A list is a way of grouping stuff in an ordered way. To represent such a structure we need to define a starting point (the head of our structure), some sort of item (a list node) and a way of traversing it (an iterator).
Let's look at trains.
They make for a great example, because you could easily imagine the locomotive of the train as the head, each consecutive wagon as a node, and the ticket inspector as an iterator. Think about which rules apply to the ticket inspector while he is traversing the train.
Let's say the ticket inspector needs to check if everybody has a valid ticket. How would he do that?
Example:
- Start at the locomotive.
- Enter the following section.
- Check Ticket.
- If the end of the Train is reached, stop.
- Else go back to step 2.
You would probably agree that these steps seem quite verbose when laid out in such a manner, but it is important to understand how these structure behave when searching, traversing or inserting.
So, what if we wanted to insert another wagon somewhere in-between the existing ones. Once you find the right spot, adding a new wagon is quick and simple. You just unlink the next wagon, insert a new one, and link everything back together. This makes inserting wagons in known positions quite efficient.
We could follow this allegory through for all common operations, but I don't want to, so here is a list of common operation and how they behave in terms of time complexity (steps we need to take):
Operation |
Time Complexity |
Search (by value) |
O(n) |
Insertion at head |
O(1) |
Insertion at tail |
O(n) or O(1)* |
Insertion at middle |
O(n) |
Deletion at head |
O(1) |
Deletion at tail |
O(n) or O(1)* |
*If the linked list maintains a tail pointer, insertion at tail can be O(1). |
|
This note should explain time complexity and the big O notation further: development-programming-time-complexity.
Plant a tree and you feed a fish for a day (Trees)
What is a tree, and how would you build one? A tree like structure is constructed out of one main root node branches of into multiple child nodes, which each could have multiple other nodes attached themselves. If they don't have any subsequent child nodes they are commonly refereed to as leaf node.
If we enforced a rule like each node in a binary tree can have zero, one, or two children, and no node can have more than two children. To insert a new value, start at the root and recursively move left or right depending on whether the new value is less than or greater than the current node’s value, placing the new node in the first empty spot found following this rule.
We would end up with something now as a binary tree, which has the following time complexity:
Operation |
Time Complexity |
Search (by value) |
O(log n) or O(n)* |
Insertion |
O(log n) or O(n)* |
Deletion |
O(log n) or O(n)* |
Traversal (in-order, pre/post-order) |
O(n) |
Access min/max |
O(log n) or O(n)* |
You don't need to hold it, just say where I can find it.
Nodes often hold references or smart pointers to a value instead of the value itself. This is could be done via a Box<T>
(heap allocation in rust) or a reference counted pointer like Rc<T>
if multiple values share the same value. This avoids copying large values into such a structure.