- 1. Linear data structures and nonlinear data structures
- 2. Sparse Array and Queue
- 3. Linked List
- 4. Stack
- 5. Recursion
- 6. Sorting Algorithms
- 7. Searching Algorithms
- 8. Hash Table
- 9. Tree
- 10. Heap
The term data structure refers to a data collection with well-defined operations and behavior or properties. A data structure is a unique way of storing or organizing the data in computer memory so that we can use it effectively.
- Linear structures, as the most commonly used data structures, are characterized by a one-to-one linear relationship between data elements.
- Linear structures have two different storage structures, namely sequential storage structure and chain storage structure.
- A linear table with sequential storage is called a sequence table, its characteristic is that the logically adjacent data elements are also adjacent in physical order.
- A linear table with chained storage is called a linked list, where the stored elements are not necessarily physically adjacent and the element nodes hold the data elements and the address information of the adjacent elements.
- There are four types of linear data structures: array, queue, linked list and stack
Non-linear data structures do not arrange data sequentially as linear data structures do; they are multilevel data structures.
Non-linear structures include: two-dimensional array, multi-dimensional array, generalized list, tree and graph.
A sparse array is an array of data in which many elements have a value of zero. This is in contrast to a dense array, where most of the elements have non-zero values or are “full” of numbers. A sparse array may be treated differently than a dense array in digital data handling.
How to handle sparse arrays:
- Record the number of rows and columns of the array and the number of distinct values.
- Record the row, column and value of each element with a different value in a small array, thus reducing the space required.
- Iterate through the original two-dimensional array to get the number of non-zero elements `sum`.
- Create the sparse array `sparseArr int[sum+1][3]` based on `sum`.
- Store the non-zero data of the two-dimensional array into the sparse array.
- Create the original two-dimensional array based on the first row of the sparse array.
- Continue reading the remaining rows of the sparse array and recover the original two-dimensional array.
Conversion between two-dimensional array and sparse array
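The conversion steps above can be sketched in Java; the class and method names (`SparseArray`, `toSparse`, `fromSparse`) are illustrative, not from the notes:

```java
public class SparseArray {

    // Convert a 2-D array into sparse form: row 0 holds the dimensions and
    // the count of non-zero values, each following row holds (row, col, value).
    public static int[][] toSparse(int[][] grid) {
        int sum = 0;                          // number of non-zero elements
        for (int[] row : grid)
            for (int v : row)
                if (v != 0) sum++;

        int[][] sparse = new int[sum + 1][3];
        sparse[0][0] = grid.length;
        sparse[0][1] = grid[0].length;
        sparse[0][2] = sum;

        int count = 0;                        // index of the next sparse row
        for (int i = 0; i < grid.length; i++)
            for (int j = 0; j < grid[0].length; j++)
                if (grid[i][j] != 0) {
                    count++;
                    sparse[count][0] = i;
                    sparse[count][1] = j;
                    sparse[count][2] = grid[i][j];
                }
        return sparse;
    }

    // Recover the original 2-D array from the first row (dimensions) and the
    // remaining rows (positions and values) of the sparse array.
    public static int[][] fromSparse(int[][] sparse) {
        int[][] grid = new int[sparse[0][0]][sparse[0][1]];
        for (int i = 1; i < sparse.length; i++)
            grid[sparse[i][0]][sparse[i][1]] = sparse[i][2];
        return grid;
    }
}
```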
Logically, a queue is a FIFO (First In First Out) data structure and we can physically implement it either as an array or a linked list. Whatever way we use to implement a queue, insertions always take place at the “rear” end and deletions always from the “front” end of the queue.
- In queues based on arrays, `maxSize` is the maximum capacity of the queue.
- Since the output and input of the queue are handled from the front and the rear respectively, two pointers `front` and `rear` are needed to record the front and rear indices; `front` changes as data is removed and `rear` changes as data is added.
- The process of putting data into the queue is called `addQueue` and requires two steps:
  - Move the rear pointer backwards: `rear + 1`; when `front == rear`, the queue is empty.
  - If the rear pointer `rear` is smaller than the maximum index of the queue `maxSize - 1`, store the data at the position pointed to by `rear`; otherwise the data cannot be stored. If `rear == maxSize - 1`, the queue is full.
A circular queue is the extended version of a regular queue where the last element is connected to the first element. Thus forming a circle-like structure.
- The circular queue solves the major limitation of the normal queue. In a normal queue, after a bit of insertion and deletion, there will be non-usable empty space.
- The circular increment is performed by modulo division with the queue size.
- Adjust the meaning of the `front` pointer: `front` tracks the first element of the queue, that is, `arr[front]` is the first element of the queue; the initial value of `front` is 0.
- Adjust the meaning of the `rear` pointer: `rear` tracks the position after the last element in the queue; the initial value of `rear` is 0.
- The queue is full when `(rear + 1) % maxSize == front`
- The queue is empty when `rear == front`
- The number of valid elements in the queue is `(rear + maxSize - front) % maxSize`
Circular Queues based on Arrays
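A minimal Java sketch of such an array-based circular queue, following the `front`/`rear` conventions above (the class name is illustrative; one array slot is left unused so that full and empty can be told apart):

```java
public class CircularQueue {
    private final int maxSize;
    private final int[] arr;
    private int front = 0;  // index of the first element
    private int rear = 0;   // index one past the last element

    public CircularQueue(int maxSize) {
        this.maxSize = maxSize;
        this.arr = new int[maxSize];
    }

    public boolean isFull()  { return (rear + 1) % maxSize == front; }
    public boolean isEmpty() { return rear == front; }

    // number of valid elements currently in the queue
    public int size() { return (rear + maxSize - front) % maxSize; }

    public void add(int value) {
        if (isFull()) throw new IllegalStateException("queue is full");
        arr[rear] = value;
        rear = (rear + 1) % maxSize;   // circular increment via modulo
    }

    public int poll() {
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        int value = arr[front];
        front = (front + 1) % maxSize;
        return value;
    }
}
```

Because one slot is sacrificed, a queue created with `maxSize = 4` holds at most 3 elements.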
In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next.
- Linked list is a data structure consisting of a collection of nodes
- Each node contains data and next (a reference field to the next node in the sequence)
- The nodes of a linked list are not necessarily stored consecutively
- Linked lists come in two forms: with a head (dummy) node and without a head node
- Create a `head` node as the head of the linked list
- Add the subsequent nodes in sequence to the list
- Insertion: find the position of the element to be inserted through traversal and an auxiliary variable `temp`, then `newNode.next = temp.next; temp.next = newNode`
- Find the node by traversal
- Modify the node information
- Find the previous node `temp` of the node to be deleted, then `temp.next = temp.next.next`
- The deleted node isn't referenced any more and will be reclaimed by the garbage collection mechanism
The time complexity of deleting a node will be O(N), because we have to traverse the linked list from the head node
to find out the previous node temp.
The space complexity is O(1) because we only need constant space to store our pointers.
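The insertion and deletion steps above can be sketched as a singly linked list with a dummy head node (class and method names are illustrative):

```java
public class SinglyLinkedList {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    private final Node head = new Node(0);  // dummy head, holds no data

    // Append at the end: walk to the last node, then link the new node.
    public void add(int data) {
        Node temp = head;
        while (temp.next != null) temp = temp.next;
        temp.next = new Node(data);
    }

    // Insert after position index (0 = right after the head).
    public void insertAt(int index, int data) {
        Node temp = head;
        for (int i = 0; i < index && temp.next != null; i++) temp = temp.next;
        Node newNode = new Node(data);
        newNode.next = temp.next;   // newNode.next = temp.next
        temp.next = newNode;        // temp.next = newNode
    }

    // Delete: find the previous node of the target, then bypass the target.
    public boolean delete(int data) {
        Node temp = head;
        while (temp.next != null) {
            if (temp.next.data == data) {
                temp.next = temp.next.next;  // unlinked node gets garbage collected
                return true;
            }
            temp = temp.next;
        }
        return false;
    }

    public int size() {
        int n = 0;
        for (Node cur = head.next; cur != null; cur = cur.next) n++;
        return n;
    }
}
```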
Pros of Doubly Linked List:
- A singly linked list can only be traversed in one direction; a doubly linked list can be traversed either forwards or backwards
- Deleting an element from a singly linked list relies on an auxiliary pointer to the previous node; a node in a doubly linked list can delete itself
- Comparison of Time Complexity between linked lists and Array
- Traversal is the same as for a singly linked list, difference: bidirectional
- Add: Add to the end of the list by default
- move to the last node of the doubly linked list, then `temp.next = newHeroNode; newHeroNode.pre = temp`
- Modify: same as in singly linked list
- Deletion:
- doubly linked lists can be self-deleted
- locate the node `temp` which should be deleted, then `temp.pre.next = temp.next; temp.next.pre = temp.pre`
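A sketch of appending and self-deletion on a doubly linked list, using the `next`/`pre` wiring shown above (names are illustrative; a guard is added for the tail case where `temp.next == null`):

```java
public class DoublyLinkedList {
    static class Node {
        int data;
        Node next, pre;
        Node(int data) { this.data = data; }
    }

    private final Node head = new Node(0);  // dummy head

    // Append: move to the last node, then wire both directions.
    public void add(int data) {
        Node temp = head;
        while (temp.next != null) temp = temp.next;
        Node newNode = new Node(data);
        temp.next = newNode;     // temp.next = newNode
        newNode.pre = temp;      // newNode.pre = temp
    }

    // Delete: locate the node itself, then let it unlink itself.
    public boolean delete(int data) {
        for (Node temp = head.next; temp != null; temp = temp.next) {
            if (temp.data == data) {
                temp.pre.next = temp.next;                       // temp.pre.next = temp.next
                if (temp.next != null) temp.next.pre = temp.pre; // skip for the tail node
                return true;
            }
        }
        return false;
    }

    public int size() {
        int n = 0;
        for (Node cur = head.next; cur != null; cur = cur.next) n++;
        return n;
    }
}
```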
In computer science and mathematics, the Josephus problem (or Josephus permutation) is a theoretical problem related to a certain counting-out game.
People are standing in a circle waiting to be executed. Counting begins at a specified point in the circle and proceeds around the circle in a specified direction. After a specified number of people are skipped, the next person is executed. The procedure is repeated with the remaining people, starting with the next person, going in the same direction and skipping the same number of people, until only one person remains, and is freed.
Example:
- n=5, 5 people sitting in a circle
- k=1, counting begins from the first person
- m=2, the person who calls 2 is out
- counting-out sequence:
- 2=>4=>1=>5=>3
- Creation of a circular singly linked list
- Create the first node, let the first pointer point to this node, and form a ring
- For each new node created, add the node to the existing ring
- Traversal of a circular singly linked list
- Let an auxiliary pointer `cur` track the first node
- Traverse with the help of a while loop that terminates when `cur.next == first`
Circular Singly Linked List and Josephus Problem
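A possible Java solution of the Josephus problem on a circular singly linked list, using a helper pointer that trails one node behind the counting pointer (class and method names are illustrative):

```java
public class Josephus {
    static class Node {
        int no;
        Node next;
        Node(int no) { this.no = no; }
    }

    // n people, counting starts at person k, every m-th person leaves.
    // Returns the order in which people leave the circle.
    public static int[] countOut(int n, int k, int m) {
        // build the ring 1..n
        Node first = new Node(1);
        first.next = first;
        Node cur = first;
        for (int i = 2; i <= n; i++) {
            Node node = new Node(i);
            node.next = first;
            cur.next = node;
            cur = node;
        }
        Node helper = cur;                 // last node: helper.next == first
        for (int i = 1; i < k; i++) {      // move the start to person k
            first = first.next;
            helper = helper.next;
        }
        int[] order = new int[n];
        for (int idx = 0; idx < n; idx++) {
            for (int i = 1; i < m; i++) {  // count m-1 steps forward
                first = first.next;
                helper = helper.next;
            }
            order[idx] = first.no;         // first is out
            first = first.next;
            helper.next = first;           // unlink the node that left
        }
        return order;
    }
}
```

With the example from the notes (n=5, k=1, m=2) the counting-out sequence is 2, 4, 1, 5, 3.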
- Stack is an abstract data type which follows the last in first out (LIFO) ordering
- A stack restricts the insertion and deletion of elements **to the same end**. The side that allows insertion and deletion is called the Top, and the fixed side is called the Bottom.
- The entire functionality of a stack depends on the push and pop methods.
- Processing Function Calls
- To backtrack to the previous task/state, e.g., in a recursive code
- Expression Evaluation Algorithm: Calculators employing reverse Polish notation use a stack structure to hold values.
- Traversal of the binary trees
- Depth First Search
- Define a variable `top` to represent the top of the stack, initialized to -1
- Push: `top++; stack[top] = data`
- Pop: `int value = stack[top]; top--; return value`
- Code
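The three bullet points translate into a compact array-based stack (the class name is illustrative; overflow and underflow simply throw):

```java
public class ArrayStack {
    private final int[] stack;
    private int top = -1;   // -1 means the stack is empty

    public ArrayStack(int maxSize) { this.stack = new int[maxSize]; }

    public boolean isEmpty() { return top == -1; }
    public boolean isFull()  { return top == stack.length - 1; }

    public void push(int data) {
        if (isFull()) throw new IllegalStateException("stack is full");
        top++;
        stack[top] = data;
    }

    public int pop() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        int value = stack[top];
        top--;
        return value;
    }

    // look at the top element without removing it
    public int peek() {
        if (isEmpty()) throw new IllegalStateException("stack is empty");
        return stack[top];
    }
}
```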
- Traverse the expression using an auxiliary pointer `index`
- If a number is encountered, push it onto the number stack
- If an operator is encountered:
  - if the operator stack is empty, push the operator onto it
  - if the operator stack is not empty, compare the new operator with the operator at the top of the operator stack. If the priority of the current operator is less than or equal to that of the operator on the stack, pop two numbers from the number stack and one operator from the operator stack, calculate, push the result onto the number stack, and then push the current operator onto the operator stack. If the priority of the current operator is greater than that of the operator on the stack, push it directly onto the operator stack.
- When the expression has been scanned, pop the remaining numbers and operators from the number stack and the operator stack in sequence and calculate.
- The number left in the number stack at the end is the final result
- Code: Stack based Calculator
- In this notation, operator is prefixed to operands, i.e. operator is written ahead of operands.
- Prefix notation is also known as Polish Notation.
- Example: `(3 + 4) * 5 - 6` => `- * + 3 4 5 6`
- Evaluation of Prefix Expression:
- Put a pointer P at the end of the expression
- If character at P is an operand push it to Stack
- If the character at P is an operator pop two elements from the Stack. Operate on these elements according to the operator, and push the result back to the Stack
- Decrement P by 1 and go to Step 2 as long as there are characters left to be scanned in the expression.
- The Result is stored at the top of the Stack, return it
- End
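A sketch of this right-to-left evaluation for space-separated integer tokens (the class name and tokenization are assumptions; only `+ - * /` are handled):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PrefixEval {
    public static int evaluate(String expr) {
        String[] tokens = expr.trim().split("\\s+");
        Deque<Integer> stack = new ArrayDeque<>();
        for (int p = tokens.length - 1; p >= 0; p--) {  // pointer P starts at the end
            String t = tokens[p];
            if (t.matches("-?\\d+")) {
                stack.push(Integer.parseInt(t));        // operand: push
            } else {
                int a = stack.pop();                    // operator: pop two operands,
                int b = stack.pop();                    // operate, push the result back
                switch (t) {
                    case "+": stack.push(a + b); break;
                    case "-": stack.push(a - b); break;
                    case "*": stack.push(a * b); break;
                    case "/": stack.push(a / b); break;
                    default: throw new IllegalArgumentException("unknown operator " + t);
                }
            }
        }
        return stack.pop();                             // result is on top
    }
}
```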
- Operators are written in-between their operands. This is the usual way we write expressions, e.g. `(3 + 4) * 5 - 6`
- Operators are written after their operands, also called Reverse Polish Notation
- Example: `(3 + 4) * 5 - 6` => `3 4 + 5 * 6 -`
- Evaluation of Postfix Expression: just like the evaluation of a prefix expression, but the expression is scanned left-to-right
- Examples:
| infix | postfix |
|---|---|
| a+b | ab+ |
| a+(b-c) | abc-+ |
| a+(b-c)*d | abc-d*+ |
| a+d*(b-c) | adbc-*+ |
| a=1+3 | a13+= |
Code: Postfix Expression Calculator
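A sketch of such a postfix calculator for space-separated integer tokens (the class name and tokenization are assumptions; only `+ - * /` are handled; note that the second operand is popped first):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    public static int evaluate(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String t : expr.trim().split("\\s+")) {    // scan left to right
            if (t.matches("-?\\d+")) {
                stack.push(Integer.parseInt(t));        // operand: push
            } else {
                int b = stack.pop();   // second operand comes off the stack first
                int a = stack.pop();
                switch (t) {
                    case "+": stack.push(a + b); break;
                    case "-": stack.push(a - b); break;
                    case "*": stack.push(a * b); break;
                    case "/": stack.push(a / b); break;
                    default: throw new IllegalArgumentException("unknown operator " + t);
                }
            }
        }
        return stack.pop();
    }
}
```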
- Initialize two stacks: the operator stack `s1` and the stack `s2` that stores intermediate results
- Scan the infix expression from left to right
- Push each operand into `s2` when it is encountered
- When an operator is encountered, compare its priority with the top-of-stack operator of `s1`:
  - If `s1` is empty, or the top-of-stack operator is a left bracket "(", push the operator directly onto the stack
  - If its priority is higher than that of the top-of-stack operator, push the operator onto `s1` as well
  - If its priority is not higher than that of the top-of-stack operator, pop the top operator of `s1` and push it into `s2`, then repeat the comparison
- When brackets are encountered:
  - if "(", push it onto `s1`
  - if ")", pop operators from `s1` and push them into `s2` until a left bracket is encountered, then discard the pair of brackets
- Repeat steps 2-5 until the rightmost end of the expression
- Pop the remaining operators of `s1` in turn and push them into `s2`
- Pop the elements of `s2` in sequence; the reverse of this output order is the postfix expression
Code: Conversion of Infix Expression to Postfix Expression
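The steps above can be sketched as follows. Instead of a second stack `s2` whose contents would only be reversed at the end, the sketch appends directly to an output list, which yields the same postfix sequence (class name and tokenization are assumptions; only `+ - * /` and brackets are handled):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class InfixToPostfix {
    private static int priority(String op) {
        switch (op) {
            case "*": case "/": return 2;
            case "+": case "-": return 1;
            default: return 0;  // "(" gets the lowest priority
        }
    }

    public static List<String> convert(List<String> infix) {
        Deque<String> s1 = new ArrayDeque<>();   // operator stack
        List<String> out = new ArrayList<>();    // plays the role of s2
        for (String t : infix) {
            if (t.matches("-?\\d+")) {
                out.add(t);                      // operands go straight to the output
            } else if (t.equals("(")) {
                s1.push(t);
            } else if (t.equals(")")) {
                while (!s1.peek().equals("(")) out.add(s1.pop());
                s1.pop();                        // discard the "("
            } else {                             // operator: pop while not higher priority
                while (!s1.isEmpty() && priority(t) <= priority(s1.peek()))
                    out.add(s1.pop());
                s1.push(t);
            }
        }
        while (!s1.isEmpty()) out.add(s1.pop()); // flush remaining operators
        return out;
    }
}
```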
Recursion is one of the most important concepts in computer science. Simply speaking, recursion is the process of a function calling itself.
In mathematics and computer science, a class of objects or methods exhibits recursive behavior when it can be defined by two properties:
- A simple base case (or cases) — a terminating scenario that does not use recursion to produce an answer
- A recursive step — a set of rules that reduces all successive cases toward the base case.
- When a program executes a method, a separate space is opened. (stack)
- the data (local variables) in each space are independent and do not affect each other
- If a non-primitive data type (e.g. array) is used in a method, the data of that reference type is shared.
- Recursion must approach the exit condition, otherwise it recurses infinitely and a `StackOverflowError` occurs
- When a method finishes executing or encounters `return`, it returns to its caller, and the result is handed back to whoever made the call
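A minimal example showing both properties (the class and method names are illustrative): the base case terminates the recursion, and each recursive step moves `n` toward it while every call keeps its own independent local `n` on the stack.

```java
public class Factorial {
    public static long factorial(int n) {
        if (n <= 1) return 1;          // base case: terminates the recursion
        return n * factorial(n - 1);   // recursive step: n approaches the exit condition
    }
}
```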
Code: Maze
The eight queens puzzle is the problem of placing eight chess queens on an 8×8 chessboard so that no two queens threaten each other; thus, a solution requires that no two queens share the same row, column, or diagonal. There are 92 solutions. The problem was first posed in the mid-19th century. In the modern era, it is often used as an example problem for various computer programming techniques.
- The first queen is placed in the first column of the first row
- The second queen is placed in the first column of the second row, determine whether the requirements are met. If not, continue to put the second queen in the second column of the second row, the third column... Until the second queen is in the eligible position.
- Place the third queen in the first column of the third row, the second column... Then determine if the condition is met, until it is in the eligible position. Repeat the same steps until the eighth queen is in the correct position.
- When the first correct solution is obtained, backtrack to the previous step and continue searching; in this way all correct solutions with the first queen in the first column are obtained.
- Place the first queen in the second column of the first row and repeat steps 1 to 4, and so on for the remaining columns.
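The row-by-row placement with backtracking can be sketched as follows; `queens[row]` stores the column of the queen in that row, and returning from a recursive call is what backtracks to the next column (the class name is illustrative):

```java
public class EightQueens {
    private final int n;
    private final int[] queens;   // queens[row] = column of the queen in that row
    private int solutions = 0;

    public EightQueens(int n) {
        this.n = n;
        this.queens = new int[n];
    }

    // A new queen at (row, col) must not share a column or a diagonal with
    // any queen already placed (rows are distinct by construction).
    private boolean isSafe(int row, int col) {
        for (int r = 0; r < row; r++) {
            if (queens[r] == col) return false;                     // same column
            if (Math.abs(queens[r] - col) == row - r) return false; // same diagonal
        }
        return true;
    }

    private void place(int row) {
        if (row == n) { solutions++; return; }  // all queens placed: one solution
        for (int col = 0; col < n; col++) {
            if (isSafe(row, col)) {
                queens[row] = col;
                place(row + 1);   // returning from here backtracks and tries
            }                     // the next column in this row
        }
    }

    public int countSolutions() {
        solutions = 0;
        place(0);
        return solutions;
    }
}
```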
Sorting is the process of arranging items systematically. In computer science, sorting algorithms put elements of a list in a specific order.
- Internal Sorting: Sorting algorithms that use main memory exclusively during the sort are called internal sorting algorithms.
- External Sorting: Sorting algorithms that use external memory during the sort are called external sorting algorithms.
Time complexity measures the time taken to execute each statement of code in an algorithm.
- Post-hoc statistical approach: This approach is feasible, but faces two problems:
- the need to actually run the program
- The time statistics obtained depends on the computer hardware, software and other environmental factors.
- Ex-ante estimation method: the algorithm's time complexity is estimated beforehand to evaluate and compare algorithms
The time required for an algorithm is directly related to the number of statements executed in the algorithm. The more statements are executed in an algorithm, the more time it takes. The number of times a statement is executed in an algorithm is called Time Complexity and is denoted as T(n).
Calculate the sum of 1 to 100:

- for loop, T(n) = n + 1:

```java
int total = 0;
int end = 100;
for (int i = 1; i <= end; i++) {
    total += i;
}
```

- direct calculation, T(n) = 1:

```java
int total = 0;
int end = 100;
total = (1 + end) * end / 2;
```

3 characteristics:

- constant terms can be ignored
- low order terms can be ignored
- the coefficient of the highest order term can be ignored
- In general, the time complexity of an algorithm is a function of the input size n, denoted by T(n). If there is some auxiliary function f(n) such that n tends to infinity, the limit of T(n)/f(n) is a constant not equal to zero, denoted as T(n)=O(f(n)), O(f(n)) is the asymptotic time complexity of the algorithm, abbreviated as time complexity.
- T(n) can be different, but the time complexity may be the same.
- The method to calculate the time complexity:
- Replace all constants with the constant 1
- Keep only the highest order term
- Remove the coefficients of the highest order term
- The common time complexity, from smallest to largest, is listed below. As n increases, the execution efficiency of
the algorithm decreases as the time complexity increases.
- Constant Time Complexity O(1)
- Logarithmic Time Complexity O(log2 n)
- Linear Time Complexity O(n)
- Linear Logarithmic Time Complexity O(n log2 n)
- Quadratic Time Complexity O(n^2)
- Cubic Time Complexity O(n^3)
- K-th Order Time Complexity O(n^k)
- Exponential Time Complexity O(2^n)
- Average case scenario is the running time of the algorithm with equal probability of occurrence for all input instances.
- Worst case analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the worst possible state and maximum work has to be done to put things right.
- Whether the average time complexity is consistent with the worst-case time complexity depends on the algorithm.
- The space complexity of an algorithm or a computer program is the amount of memory space required to solve an instance of the computational problem as a function of characteristics of the input.
- Similar to time complexity, space complexity is often expressed asymptotically in big O notation, such as O(n), O(n log n), O(2^n), etc., where n is a characteristic of the input influencing space complexity.
- When doing algorithm analysis, the essential discussion is time complexity; from a user experience perspective, what matters is the speed of program execution. Some caching products (Redis, Memcached) and algorithms (radix sort) are classic examples of trading space for time.
Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted. The algorithm, which is a comparison sort, is named for the way smaller or larger elements "bubble" to the top of the list.
- Optimization: an optimization algorithm for bubble sort is to set a flag that indicates that the sequence is ordered when no exchange of elements occurs during a traverse to the end.
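A sketch of bubble sort with this flag optimization (the class name is illustrative):

```java
public class BubbleSort {
    public static void sort(int[] arr) {
        for (int i = 0; i < arr.length - 1; i++) {
            boolean swapped = false;           // flag: was anything exchanged this pass?
            for (int j = 0; j < arr.length - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {     // adjacent elements in the wrong order
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break;               // no swap: the sequence is already ordered
        }
    }
}
```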
The algorithm divides the input list into two parts: a sorted sublist of items at the front (left) of the list and a sublist of the remaining unsorted items at the right of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest element in the unsorted sublist, swapping it with the leftmost unsorted element, and moving the sublist boundaries one element to the right.
Insertion sort is a simple sorting algorithm that works similar to the way you sort playing cards in your hands. The array is virtually split into a sorted and an unsorted part. Values from the unsorted part are picked and placed at the correct position in the sorted part.
Shell sort is an in-place comparison sort. It can be seen as either a generalization of sorting by exchange (bubble sort) or sorting by insertion (insertion sort). The method starts by sorting pairs of elements far apart from each other, then progressively reducing the gap between elements to be compared. By starting with far apart elements, it can move some out-of-place elements into position faster than a simple nearest neighbor exchange. Donald Shell published the first version of this sort in 1959. The running time of Shellsort is heavily dependent on the gap sequence it uses. For many practical variants, determining their time complexity remains an open problem.
Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. For this reason, it is sometimes called partition-exchange sort. The sub-arrays are then sorted recursively. This can be done in-place, requiring small additional amounts of memory to perform the sorting.
Conceptually, a merge sort works as follows:
- Divide the unsorted list into n sublists, each containing one element (a list of one element is considered sorted).
- Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.
- In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix.
- For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered.
- Radix sort has also been called bucket sort and digital sort.
A linear search or sequential search is a method for finding an element within a list. It sequentially checks each element of the list until a match is found or the whole list has been searched.
Prerequisite: the elements in the array are in sorted order. The binary search approach:
- First determine the index of the middle element of the array: `mid = (left + right) / 2`
- Compare the value to be found `findVal` with `arr[mid]`:
  - `findVal > arr[mid]`: recurse into the right half
  - `findVal < arr[mid]`: recurse into the left half
  - `findVal == arr[mid]`: value found, return `mid`
- If the whole array has been searched (`left > right`) and `findVal` is still not found, end
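The recursive approach above as a sketch (illustrative class name; the array must already be sorted ascending):

```java
public class BinarySearch {
    // Returns the index of findVal in arr[left..right], or -1 if absent.
    public static int search(int[] arr, int left, int right, int findVal) {
        if (left > right) return -1;           // range exhausted: not found
        int mid = (left + right) / 2;
        if (findVal > arr[mid]) {
            return search(arr, mid + 1, right, findVal);  // recurse to the right
        } else if (findVal < arr[mid]) {
            return search(arr, left, mid - 1, findVal);   // recurse to the left
        } else {
            return mid;                        // value found
        }
    }
}
```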
In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.
Hashing is a process used to uniquely identify objects and store each object at some pre-calculated unique index called its key. So the object is stored in the form of a key-value pair, and the collection of such items is called “Dictionary”. Each object can be searched using that key in O(1).
The major target of Hash Tables is to minimize the searching time of any sort of data that needs to be fetched.
Hash Tables are generally implemented using Arrays, as they are the only data structures that provide access to elements in constant O(1) time.
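A minimal sketch of such an array-backed hash table with separate chaining: an array of linked-list buckets indexed by `key % size`. The class, the integer keys and the modulo hash function are illustrative simplifications, not a full implementation:

```java
public class SimpleHashTable {
    static class Node {
        int key;
        String value;
        Node next;
        Node(int key, String value) { this.key = key; this.value = value; }
    }

    private final Node[] buckets;

    public SimpleHashTable(int size) { this.buckets = new Node[size]; }

    // hash function: map the key to a bucket index
    private int hash(int key) { return Math.abs(key) % buckets.length; }

    public void put(int key, String value) {
        int idx = hash(key);
        for (Node cur = buckets[idx]; cur != null; cur = cur.next) {
            if (cur.key == key) { cur.value = value; return; }  // overwrite existing key
        }
        Node node = new Node(key, value);   // collision: prepend to the bucket's chain
        node.next = buckets[idx];
        buckets[idx] = node;
    }

    public String get(int key) {
        for (Node cur = buckets[hash(key)]; cur != null; cur = cur.next)
            if (cur.key == key) return cur.value;
        return null;
    }
}
```

On average, lookups stay O(1) as long as the chains remain short relative to the number of buckets.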
In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.
- Pro: fast access to elements by index. For sorted arrays, binary search can be used to increase the speed.
- Con: inefficient when inserting values, since subsequent elements of the array need to be moved
- Pro: optimizes array storage to some extent; insertion and deletion are efficient compared to array storage
- Con: inefficient when searching for values, which requires a traversal from the head node
Trees improve the efficiency of data storage and retrieval: a binary tree, for example, ensures fast data retrieval while also keeping insertion, deletion and modification fast.
- Nodes: Hold data
- Root: The uppermost node of a tree
- Parent Node: A node which is connected to one or more nodes on the lower level (Child Nodes).
- Child Node: A node which is linked to an upper node (Parent Node)
- Sibling Node: Nodes that have the same Parent Node
- Leaf Node: A node that doesn’t have any Child Node
- Sub-tree: A subtree is a portion of a tree that can be viewed as a complete tree on its own. Any node in a tree, together with all the connected nodes below it, comprise a subtree of the original tree.
- Degree: The degree of a node refers to the total number of sub-trees of a node
- Depth: The number of connections (edges) from the root to a node is known as the depth of that node.
- Level: (Depth Of Node) + 1
- Create the binary tree
- Pre-Order Traversal
- Visit the current node and display its data.
- Traverse the left sub-tree of currentNode, recursively using the PreOrder() function.
- Traverse the right sub-tree of currentNode, recursively using the PreOrder() function.
- In-Order Traversal
- Traverse the left sub-tree of currentNode, recursively using the InOrder() function.
- Visit the current node and read.
- Traverse the right sub-tree of currentNode, recursively using the InOrder() function.
- Post-Order Traversal
- Traverse the left sub-tree of currentNode, recursively using PostOrder() function.
- Traverse the right sub-tree of currentNode, recursively using PostOrder() function.
- Visit current node and print its value.
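The three traversals can be sketched on a small node class (names are illustrative; each traversal appends visited values to a StringBuilder instead of printing):

```java
public class BinaryTree {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static void preOrder(Node node, StringBuilder out) {
        if (node == null) return;
        out.append(node.value).append(' ');  // visit the current node first
        preOrder(node.left, out);
        preOrder(node.right, out);
    }

    static void inOrder(Node node, StringBuilder out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.append(node.value).append(' ');  // visit between the two subtrees
        inOrder(node.right, out);
    }

    static void postOrder(Node node, StringBuilder out) {
        if (node == null) return;
        postOrder(node.left, out);
        postOrder(node.right, out);
        out.append(node.value).append(' ');  // visit the current node last
    }
}
```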
- Pre-Order Search
- determine whether the value of the current node is equal to the value to be found, if equal, return, if not:
- if the left child node is not empty, recursive pre-order search, if found, return, if not:
- if the right child node is not empty, recursive pre-order search
- In-Order Search
- if the left child node is not empty, recursive in-order search, if found, return, if not:
- determine whether the value of the current node is equal to the value to be found, if equal, return, if not:
- if the right child node is not empty, recursive in-order search
- Post-Order Search
- if the left child node is not empty, recursive post-order search, if found, return, if not:
- if the right child node is not empty, recursive post-order search, if found, return, if not:
- determine whether the value of the current node is equal to the value to be found; if equal, return it, otherwise return `null`
- if the node to be deleted is a leaf node, just delete the node
- if the node to be deleted is a non-leaf node, delete the whole sub-tree
Consider: if the tree consists only of a root node and the root is the node to be deleted, set the root to null
- Since the binary tree is unidirectional, we should determine whether the children of the current node need to be deleted, rather than determining whether the current node is a node to be deleted.
- If the left child of the current node is not empty and is the node to be deleted, `this.left = null` and return
- If the right child of the current node is not empty and is the node to be deleted, `this.right = null` and return
- If no node has been deleted in the two steps above, recurse the deletion into the left subtree
- If the left subtree did not delete the node either, recurse the deletion into the right subtree
- Heaps are advanced data structures based on Binary Trees, which is why they are commonly known as Binary Heaps.
- All the nodes are ordered according to the rules listed below:
- A Heap tree must be a Complete Binary Tree.
- The nodes must be ordered according to the Heap Property.
Complete Binary Tree:
is a tree where each node has at most two children and all levels are completely filled, except possibly the last one, where the nodes are filled from the left so that the left side is never empty.
Heap Property:
- Max Heap Property: All the parent node keys must be greater than or equal to their child node keys. So the root node will always contain the largest element present in the Heap.
- Min Heap Property: All the parent node keys are less than or equal to their child node keys. So the root node will always contain the smallest element present in the Heap.
- Heaps can be implemented using Arrays. The contents of a heap with n nodes are stored in such a way that all the parent nodes occur in the first half of the array (the first n/2 positions), while the leaves occupy the last n/2 positions. So the last parent will be at position n/2 - 1.
- The node at the kth position will have its children placed as follows:
- The Left child at 2k+1
- The Right child at 2k+2
The major uses of Heaps are:
- Order statistics: Heaps are primarily used for efficiently finding the smallest or largest element in an array.
- Priority Queues: Priority queues can be efficiently implemented using a Binary Heap because it supports insert(), delete(), extractMax(), and decreaseKey() operations in O(log n) time.
- Sorting: HeapSort uses the Heap data structure to sort an array in place.
- Build a max heap from the array to be sorted
```
Array = {1, 3, 5, 4, 6, 13, 10, 9, 8, 15, 17}
```

The corresponding complete binary tree is:

```
             1
           /   \
          3     5
         / \   / \
        4   6 13  10
       / \ / \
      9  8 15 17
```

The task is to build a max heap from the above array.

Total Nodes = 11.
Last non-leaf node index = (11/2) - 1 = 4.
Therefore, last non-leaf node = 6.

To build the heap, heapify only the nodes [1, 3, 5, 4, 6] in reverse order.

Heapify 6: swap 6 and 17.

```
             1
           /   \
          3     5
         / \   / \
        4  17 13  10
       / \ / \
      9  8 15 6
```

Heapify 4: swap 4 and 9.

```
             1
           /   \
          3     5
         / \   / \
        9  17 13  10
       / \ / \
      4  8 15 6
```

Heapify 5: swap 13 and 5.

```
             1
           /   \
          3     13
         / \   / \
        9  17  5  10
       / \ / \
      4  8 15 6
```

Heapify 3: first swap 3 and 17, then swap 3 and 15.

```
             1
           /   \
         17     13
         / \   / \
        9  15  5  10
       / \ / \
      4  8 3  6
```

Heapify 1: first swap 1 and 17, then swap 1 and 15, finally swap 1 and 6.

```
            17
           /   \
         15     13
         / \   / \
        9   6  5  10
       / \ / \
      4  8 3  1
```
- The max value is now the key at the root node
- Swap the root with the last element of the array and exclude it from the heap
- Reconstruct the max heap from the remaining n - 1 elements
- Repeat the swap and reconstruction steps until the whole array is sorted
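The build-and-extract procedure above as a sketch: `heapify` sifts a node down within the first `length` elements, and `sort` first builds the max heap from the last non-leaf node backwards, then repeatedly swaps the root to the end and re-heapifies (the class name is illustrative):

```java
public class HeapSort {
    // Sift arr[i] down within arr[0..length-1] so the subtree rooted at i
    // satisfies the max-heap property.
    static void heapify(int[] arr, int length, int i) {
        int temp = arr[i];
        for (int k = 2 * i + 1; k < length; k = 2 * k + 1) {
            if (k + 1 < length && arr[k] < arr[k + 1]) k++;  // pick the larger child
            if (arr[k] > temp) {
                arr[i] = arr[k];   // move the child up
                i = k;
            } else {
                break;
            }
        }
        arr[i] = temp;             // temp settles at its final position
    }

    public static void sort(int[] arr) {
        // build the max heap: heapify all non-leaf nodes in reverse order
        for (int i = arr.length / 2 - 1; i >= 0; i--)
            heapify(arr, arr.length, i);
        // repeatedly move the max (root) to the end and shrink the heap
        for (int end = arr.length - 1; end > 0; end--) {
            int tmp = arr[0];
            arr[0] = arr[end];
            arr[end] = tmp;
            heapify(arr, end, 0);
        }
    }
}
```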