# Data Structures

http://cseweb.ucsd.edu/~kube/cls/100/Lectures

**Terminology**: Child, Descendant, Parent, Ancestor, Node, Root, Leaf, Sub-tree, Depth, Height.

Best/Worst/Average performance, and memory cost estimations:

**Big-O** (upper bound): g(N) = O(f(N)) if there exist constants k and n0 such that g(n) <= k·f(n) for all n > n0.

**Big-Omega** (lower bound): f(N) = Omega(g(N)) if there exist constants k and n0 such that f(n) >= k·g(n) for all n > n0.

**Big-Theta**: f(N) = Theta(g(N)) if both of the above hold.

**sorting heuristics:**

http://www.sorting-algorithms.com/

**interview questions:**

http://www.geeksforgeeks.org/write-a-c-program-to-print-all-permutations-of-a-given-string/

(follow links on right)

## C++ STL containers:

- **vector**
- **deque**
- **list**: linked list
- **stack**
- **queue**
- **priority_queue**
- **set**: BST
- **multiset**
- **map**
- **multimap**
- **bitset**

## Trees

**Linked lists**: A->B->C

**Skip lists**: A->B->C->D, plus an A->D accelerator link.

If every 2^n-th node skipped ahead by 2^n, search would behave like a binary search tree, but maintaining that exact layout on insert is overkill.

Node level = number of forward pointers. The level of an inserted node is instead chosen randomly.

Insert/Delete/Search: O(log N) expected.

**Doubly-linked list**: each node also keeps a pointer to its predecessor.

**Binary Trees**: c<-a->b

Each node holds a key to manipulate. For a balanced tree, height = log2(nodes) => H = O(log N).

**BST** (Binary Search Tree): keys are ordered; all keys in the left subtree are smaller, all keys in the right subtree are bigger. A node holds (*parent, *left, *right, key/data, property).

- Find: compare the search key against each node's key and branch left or right.
- Insert: follow find until a NULL link, then attach there (no duplicates).
- Delete: a node with two children is replaced by its in-order successor, which is then removed.
- Find next key (in-order successor): the leftmost node of the right subtree if it exists, otherwise the nearest ancestor reached via a left link.

**Balanced BST**

**AVL tree**: a balanced BST where, at every node, the two subtree heights differ by no more than 1.

Keep sub-trees balanced with rotations: swap a child and its parent, and the middle sub-tree changes parent (a single rotation handles the l-l and r-r cases). For an l-r or r-l case, do a double rotation to pull the middle node up. Keep the sub-tree height stored in each node.

**Red-black tree**: O(log N) worst case for insert, find, delete. Rules are more lax than AVL: the root is black; if a parent is red, its children are black; all paths from a node down to NULL must contain the same number of black nodes.

**Splay tree**: a BST where, on a find, the found node is moved to the root by splaying, to speed up the next search for it.

**K-D tree** (k-dimensional tree): branching logic at level L uses key L mod k. Used to store nodes with k-dimensional coordinates, allowing sorting and region partitioning.

**Heap**: a partially ordered binary tree (or equivalent) where A is the parent of B only if A is greater (max-heap) or lesser (min-heap). It is an efficient implementation of a priority queue, allowing fast find/delete of the max or min.

Often implemented with an array: the children of index n are at 2n+1 and 2n+2, and the parent of index i is at (i-1)/2; swaps move nodes up or down. Insertion places the new element in the first free slot, then fixes the heap with swaps.

**Treaps**: a BST on the keys, but each node also has a priority, and a parent's priority is >= that of all its children. In other words, instead of AVL rotations for height, nodes are rotated based on priority. Insert is O(height of tree). Used for split/join operations: to split at key K, insert a node (K, infinity); to join two trees, make a dummy root with both trees as children, then remove it.

**RST** (randomized search tree): a treap whose priorities are chosen at random on insert. Expected depth = O(log N) = cost of find, delete, insert, split, join, etc.

**B-tree** (balanced tree): nodes have a variable number of children, between m and M, so all leaves are at the same level. A B-tree node is sized to fit in a 4KB disk block; the high branch-out factor limits the number of disk IOs.

- B+ tree: a B-tree variant where data is held only in the leaves; internal nodes hold keys for routing.

**2-3 tree**: M = L = 3; a node has 2 or 3 children.

**Tries** (or radix trees):

Find a key, held in a leaf node, by traversing the tree, where each node represents one letter/bit of the word/number. The digit or letter set used is the radix R, and it sets how many edges leave each node (for b bits consumed per node, R = 2^b edges). If an element has at most B digits, a search takes at most B comparisons (the trie height).

**Alphabet trie**: {a..z} -> {a..z} (represents word dictionaries; 26 edges per node).

**Ternary trie**: each node holds one digit; on a match move down the middle edge and test the next digit, else go left or right.

**Bitwise trie**: represents numbers; 2 edges per node (binary).

Note: a shortcut is to give the root 32 edges and encode the leading zeroes there, reducing the search depth.

**DST** (digital search trees): use the ith bit to choose the branch at the ith vertex; each node holds a key.

## Graphs

A node is a **vertex**; vertices are connected by **edges**.

- **DAG** (Directed Acyclic Graph): edges are directed and the flow does not loop. Cyclic graphs are often broken into DAGs in compilers.
- **Dense graph**: ~V^2 edges. Use an adjacency matrix: a boolean 2D array saying whether two vertices are connected by an edge.
- **Sparse graph**: ~V edges. Use an adjacency list: each vertex keeps a list of its adjacent vertices.
- **Weighted graph**: each edge has a weight.

Shortest path in a graph G(V,E) from a node to any other.

- **Depth first**: like a pre-order traversal of a tree. You need to backtrack using a stack. If you find a shorter path to a node, you must retry the paths from that node.
- **Breadth first**, O(E + V): in a tree this means scanning all of one level before the next. It is better for shortest-path detection in a graph. Visit all adjacent nodes using a queue; once visited, don't revisit a node.
- **Best first**: for weighted shortest path.

Shortest path in weighted graph G(V,E)

Dijkstra’s greedy algorithm always extends the best path found so far. It uses a priority queue of <cost, pointer to vertex> entries, and marks each vertex with its current best cost/distance, the previous node on that best path, and done once solved.

## Hash Tables & Maps

A map is also called a table, dictionary, or associative memory. It holds <key, value> pairs.

Searching by index is O(1); searching a sorted list is O(log N), an unsorted one O(N). To implement a hash table, create B buckets, slightly more than the number of keys stored. On a collision, chain entries in a linked list or store in an unused bucket.

Load factor lambda = n/m: the number of entries n divided by the number of buckets m.

Hash function: must be deterministic, fast, uniformly distributed, and consistent with equality (equal keys hash equally).

Hash(string) = sum of (ASCII value * 31^position) // use Horner's rule to reduce the arithmetic

Searching:

- binary search of sorted data: O(log N)