Data Structures Semester 2 2020

Monday, May 4, 2020

AVL Tree

AVL Tree

An AVL Tree is a type of a balanced binary tree. We need them so that the height of the BSTs are not that much so that search times are faster.

Insertion Operation:

The basic concept is after we insert the new node like a normal binary tree, we have to balance the tree again. After an insertion, the only nodes that might be unbalanced are the ones that are on the path from the inserted node. By re-balancing the tree at the deepest levels of these nodes, it restores the balance of the tree.

How to re-balance the tree?
Let the node to be re-balanced be T

There are 4 possibilities:

1. The inserted node is located at the left sub tree of the left child of T.

2. The inserted node is located at the right sub tree of the right child of T.

3. The inserted node is located at the right sub tree of the left child of T.

4. The inserted node is located at the left sub tree of the right child of T.

To re-balance, we do what is called rotation. For the 1st and 2nd possibilities can be fixed with a single rotation while the 3rd and 4th possibility are fixed with double rotation.

After inserting a new node, we have to restore the balance condition. We do this by first tracing the the new node to the root of the tree. For each node on this path, check if the heights og the left or right subtree are different at most 1. If yes, proceed to the parent node, however, if the difference is more than 1, we have to fix the subtree by single or double rotation.

LL rotation: the new node is inserted in the left sub-tree of the left sub-tree of the critical node

RR rotation: the new node is inserted in the right sub-tree of the right sub-tree of the critical node.

LR rotation: the new node is inserted in the right sub-tree of the left sub-tree of the critical node

RL rotation: the new node is inserted in the left sub-tree of the right sub-tree of the critical node

Single rotation:

Double rotation:

Deletion Operation:

This process is the same as deleting a node from a binary search tree but with roation afterwards to re-balance the tree.

Monday, April 6, 2020

Summary Semester 1 Data Structures

LINKED LIST

Linked list is a collection of elements. Its difference with an array is that it uses nodes instead of the actual elements to record the collection of data. These nodes are not stored right next to each other but instead can be stored in different locations in the memory. The data stored can only be accessed in a sequential manner.

To allocate memory dynamically in C/C++, we use malloc.

To create a single link list in C/C++, we have to define the node. We do this by doing a making a struct and making a variable of a struct pointer to point to the next element in the list. We also have to initialize the head of the list, where we start reading the list from.

When we want to add a new value to the list, we have to first allocate the memory for the node, assign values to it, then connect it to the list. For the example of putting it in front of the head:

This will insert the new element in front of the head, making it the head or where the list starts.

To delete the value, we have to find the value we want to delete and use the free function in the malloc library. After that, we have to reconnect the elements so that the sequence is not broken. There are two conditions that it can be in: it is a head node, it is not a head node.

3 more types of linked list:

- Circular Single Linked List

The last node (tail) of the tail is connected back to the head so the sequence goes in a circular motion.

- Double Linked List

Instead of only having a head pointer, there is also a prev pointer to point to the node previous of the current one,

- Circular Double Linked List

Similar to the circular single linked list but there is also a prev pointer on each node.

STACK AND QUEUE

A stack is a collection of data with two important functions: push and pop. The data are stored as LIFO (last in first out). We can use both an array and a linked list to make the stack.

Array - There is a top variable to indicate the top of the stack and a max variable for the maximum number of elements in the array. If the top is at NULL, it means the stack is empty, if it is at max-1, the stack is full (arrays end at max-1).

Linked List - Like a regular single linked list, each node will have a value and a pointer to the next node. The head is used at the top. The new nodes are inserted before the head (push) and all deletions are always done at the head (pop). If top = NULL, the stack is empty.

Prefix - operator before operand [+ 1 2]

Infix - operator between operand [1 + 2]

Postfix - operator after operand [ 1 2 +]

We use prefix and postfix so we don't need to use brackets and it is easier for computers to evaluate.

To convert infix to postfix, search for highest precedence operator, put operator behind operands, keep repeating.

Using stack:

To convert infix to postfix, search for highest precedence operator, put operator before operands, keep repeating. To apply to stack, we do the same thing but read the input from right to left.

A queue is a collection of data with two important functions: push and pop. Its difference with stack is that its data is stored as FIFO (first in first out). All new elements are inserted from the rear and all deletion are from the front.

Array - there is a front and rear to indicate the front and back of the collection. We insert behind the rear index.

Linked List - Similar to stack but it has two variables, start and rear.

A circular queue is a queue that wraps around so that more elements can be inserted.

A deque is a queue where you can insert and deete from both the fron and rear.

A priority queue is a queue where the elements inserted have a priority to be processed and so are processed according to that.

TREE AND BINARY TREE

A tree starts with a root node and has lines called edges that connect to the children nodes called leaves. Nodes that come from the same parent are called sibling nodes. The degree of a node is the total sub tree of the node, The height of a tree is the max number of nodes.

A binary tree is a data structure where every node has at most two children.

Types of binary tree:

- Perfect

Every level are of same depth.

- Complete

All levels except for maybe the last are completely filled and all nodes are as far left as possible.

- Skewed

Every node has 1 child at most

- Balanced

No children are farther away from the root than other children.

Implemented using link list:

Implementation of pre/post/infix:

A threaded binary tree is a tree with NULL pointers.

A binary search tree is where there is fast searching, rapid sorting, easy insertion and easy deletion. The left subtree of every node are all elements that are lower value than the node and the right subtree are all the greater value nodes. Its three main functions are find, insert and remove.

HASH AND HASH TABLES

Hashing is method that is usually used to store and retrieve strings that are transformed to shorter length values or keys to represent them. These shortened strings act as indexes or addresses in the hash table for the original string. This method id used as it is easier to find shorter length items than their original length.

A hash table is where we store the string in an array format. The index of the table/array is the hashed key while the value of the index is the original string.

Hash function are functions to hash a string.

Types of hash functions:

- Mid-square
The string is squared and the a specified number of bits from the middle part is taken to get the hash key. If the key is a string, it is converted into a number.

- Division
Get the added up ascii of the string and modulo it by a specified number (usually the table size).

- Folding
Divide the value into equal digit parts, except for the last part which can contain less or more digits. Add all the parts together and ignore the last carry. The result is the hashed value.

- Digit Extraction
Obtain a predefined digit from the original value and the hashed value is obtained.

- Rotating Hash
Use another hash method then reverse the obtained value to obtain the hashed value.

If we continuously use the same hash method for many strings, some of the hash values might be the same.This is called a collision.

There are two way to handle a collision:
- Linear Probing
Look for the next available slot in the table and put the value there instead.

This method though makes it harder to find a specific string as when you calculate the hash key of a string, the hash key might not be where it is stored in the table, causing you to have to loop through the table until you find the wanted value. This makes it very ineffective if there are many collisions.

- Chaining
Make a link list in the slot where collisions occur.

Sunday, March 15, 2020

Hashing and Hash Table

- Division
Get the added up ascii of the string and modulo it by a specified number (usually the table size).

- Digit Extraction
Obtain a predefined digit from the original value and the hashed value is obtained.

- Rotating Hash
Use another hash method then reverse the obtained value to obtain the hashed value.

When a collision happens a link list is used to store the next value with same hash key. This is so that when we want a specified string and it experiences collision, we only need to iterate through the link list in the slot, not the entire table.

In blockchain use in cryptocurrency, hashing is used to encrypt transactions so that the item, when hashed, will produce an output of a fixed length. The output will always be the specified length no matter how long the original input is. This makes it keep track of the transactions as you can recall/trace the hash.