# Data Structures

http://cseweb.ucsd.edu/~kube/cls/100/Lectures

Terminology: Child, Descendant, Parent, Ancestor, Node, Root, Leaf, sub-tree, Depth, Height.

Best/Worst/Average performance, and memory cost estimations:

Big-O (upper bound): g(N) = O(f(N)) if there exist constants k and n0 such that g(n) <= k·f(n) for all n > n0. Example: 3N^2 + 5N = O(N^2).
Big-Omega (lower bound): f(N) = Omega(g(N)) if g(N) = O(f(N)).
Big-Theta (tight bound): f(N) = Theta(g(N)) if both of the above hold.

sorting algorithm animations:
http://www.sorting-algorithms.com/

interview questions:
http://www.geeksforgeeks.org/write-a-c-program-to-print-all-permutations-of-a-given-string/

## C++ STL containers:

• vector: dynamic array
• deque: double-ended queue
• list: doubly linked list
• stack
• queue
• priority_queue: heap
• set: sorted unique keys (typically a red-black BST)
• multiset: set allowing duplicate keys
• map: sorted <key, value> pairs
• multimap: map allowing duplicate keys
• bitset

## Trees

• Linked lists: A->B->C
• Skip lists: A->B->C->D, plus accelerator links like A->D.
If every 2^n-th node skips 2^n ahead, search behaves like binary search, but maintaining that exact layout is overkill.
Node level = number of forward pointers; the level of an inserted node is chosen randomly.
Insert/Delete/Search: O(log N) expected.
• Binary Trees: c<-a->b
Each node holds a key. If the tree is balanced, height ~= log2(nodes) => H = O(log N); worst case (a degenerate chain) H = O(N).

• BST Binary Search Tree: keys are ordered; all keys in the left subtree are smaller, all in the right subtree are bigger. A node holds (*parent, *left, *right, key/data, properties).
• Find: compare the search key vs the node key, go left or right.
• Insert: find until NULL, attach there (no dupes).
• Delete: a leaf is simply removed; a node with one child is spliced out; a node with two children has its key replaced by the in-order successor, which is then deleted.
• Find next key (in-order successor): the leftmost node of the right subtree, or, if there is no right child, the first ancestor reached via a left link.
• Balanced BST
• AVL tree: a balanced BST where sibling subtree heights differ by at most 1.
Keep sub-trees balanced with rotations: swap a child and its parent, and the middle subtree changes parent (left-left and right-right cases). For a left-right or right-left case, do a double rotation to pull the middle node up. Each node stores its subtree height.
• Red-black tree: O(log N) worst case for insert, find, delete. More lax rules than AVL: the root is black; a red node's children are black; all paths from a node down to NULL contain the same number of black nodes.
• Splay tree: a BST, but on a find the node found is rotated up to the root (splaying), which speeds up repeated searches for recently accessed keys.
• K-D (k-dimensional) trees: branching at level L compares dimension L mod k of the key. Used to store nodes with k-dimensional coordinates, enabling sorting and region partitioning.
• Heap: a partially ordered binary tree (or equivalent), where parent A >= child B (max-heap) or A <= B (min-heap). It is an efficient implementation of a priority queue: fast find/delete of the max or min.
Often implemented with an array: the children of index i are at 2i+1 and 2i+2, the parent at (i-1)/2; nodes move up/down by swaps. Insert at the first free slot, then fix the heap by swapping upward (sift-up).
• Treaps: a BST on the keys where each node also carries a priority, and a parent's priority >= all its children's (heap order). Instead of AVL-style balancing, nodes are rotated into place based on priority. Inserts cost O(height of tree). Useful for split/join operations: to split at key K, insert a node (K, infinity) so it becomes the root; to join, make a dummy root over both trees and remove it.
• RST Randomized search tree: a treap whose priorities are assigned randomly at insert. Expected depth = O(log N) = cost of find, delete, insert, split, join…
• B-tree: a balanced tree whose nodes have a variable number of children (m to M), so all leaves stay at the same level. Sizing a node to fit a 4KB disk block gives high fan-out, which limits the number of disk I/Os.
• B+ tree: a B-tree variant where all data lives in the leaves; internal nodes hold only keys, and the leaves are linked for fast range scans.
• 2-3 tree: M=L=3; a node has 2 or 3 children.
• Tries (or radix trees):
Find a key, held in a leaf, by traversing the tree; each step consumes one letter or bit of the word/number. The radix R is the branching factor: consuming b bits per step gives R = 2^b edges out of a node. If an element has at most B digits, the height (and the number of comparisons) is at most B.

• Alphabet trie: {a..z} -> {a…z} (represent word dictionaries, 26 edges)
• Ternary trie: each node holds a digit; on a match, move down the middle and advance to the next digit, otherwise go left/right.
• Bitwise trie (represents numbers, 2 edges per bit)
A shortcut: give the root 32 edges and encode the leading zeroes in it, reducing the search depth.
• (DST (digital search trees): use the ith bit to choose the ith vertex, each node holds a key)

## Graphs

A node is a vertex; vertices are connected by edges.

• DAG Directed Acyclic Graph
Edges are directed and there are no cycles. Cyclic graphs are often broken into DAGs in compilers.
• Dense graph: ~V^2 edges
adjacency matrix: a boolean 2D array saying whether two vertices are connected by an edge.
• Sparse graph: ~ V edges
adjacency list: each vertex has a list of its adjacent vertices.
• Weighted graph: edge has weight

Shortest path in a graph G(V,E) from a node to any other.

• depth first
Like a pre-order traversal of a tree; backtrack using a stack (or recursion). Inefficient for shortest paths: if you find a shorter path to a node, you must retry the paths from that node.
• breadth first O(E + V)
In a tree this means scanning each level fully before the next. Better for shortest-path detection in an unweighted graph: visit all adjacent nodes using a queue, and never revisit a visited node.
• best first (for weighted shortest path)

Shortest path in weighted graph G(V,E)

Dijkstra’s greedy algorithm repeatedly extends the best (cheapest) path found so far. It uses a priority queue of <vertex, cost>, and marks each vertex with its current best cost/distance, the previous node on that best path, and done once solved. It requires non-negative edge weights.

## Hash Tables & Maps

A map is also called a table, dictionary, or associative memory. It holds <key, value> pairs.

Searching by index is O(1); searching a sorted list is O(log N), an unsorted one O(N). To implement a hash table, create B buckets, slightly more than the number of keys stored, and hash each key to a bucket. On collision, chain entries in a linked list (separate chaining) or store the entry in another unused bucket (open addressing).

Load factor lambda = n/m: number of entries n divided by number of buckets m.

hash function: must be deterministic, fast, uniformly distributed, and consistent with equality (equal keys produce equal hashes).

Hash(string) = sum of ASCII values * 31^position // use Horner's rule (h = h*31 + c per character) to reduce the math work

Searching:

• binary search of sorted data: O(log N)