Time complexity, big-O notation
- General idea: time it takes for an algorithm to run.
- Complexity: the number of steps involved in a process, or the number of pieces that make up a system
Complexity makes a difference in everyday use:
Fast, high-quality services vs.
those that lag, crash, and are unreliable
- In data analytics, we will often work with large amounts of data
- The performance of an algorithm matters more with large data
Does this algorithm scale up when it is given a large input?
- Therefore, $O()$ notation indicates how running time grows with the input size, $n$
Definitions: Big-O notation
- $O(1)$: constant time, independent of input size, $n$
(e.g. getting an item from an array with an index)
- $O(\log(n))$: less than linear; grows only as fast as the exponent needed to reach $n$
(e.g., binary tree search; $\log_2(4)=2$, $\log_2(16)=4$)
- $O(\sqrt{n})$: more than log, less than $n$ (e.g., processing one row of an $n$-element square matrix)
- $O(n)$: linear time with size (e.g., summing every element in a list)
- $O(n^k)$: polynomial time ($k$: constant) (e.g., nested loops over whole input)
- $O(2^n)$: exponential time (worst case; e.g., finding the best route between two points)
Common algorithms
- Index lookup: $O(1)$
- Hash lookup: $O(1)$ (in Python: dicts and sets)
- Binary search: $O(\log_2 n)$ (see the sketch after this list)
- Search in array: $O(n)$
- Test whether a number is prime (trial division): $O(\sqrt{n})$ (see the sketch after this list)
- Nested loops: $O(n^k)$
- Find diFFeRent cAPiTAlizations of words: $O(2^n)$
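To make two of the entries above concrete, here is a minimal sketch (the function names are mine, not from the course materials): binary search halves the remaining range on each step, and a trial-division primality test only has to try divisors up to $\sqrt{n}$.
```python
import math

def binary_search(items, target):
    """O(log n): halve the sorted search range on every iteration."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

def is_prime(n):
    """O(sqrt(n)): any composite n has a divisor no larger than sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(is_prime(97))                       # True
```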
Fork & practice!
Share a REPL to work together with teammates. Submit your team's fork link once on Piazza.
Bitwise operators
Can you solve the word-capitalizations problem using binary numbers?
Take advantage of computers' binary representation of data! (A sketch follows the operator list below.)
Bitwise operators in Python:
```python
x & 4   # non-zero only if x has its 3rd bit (value 4) on
x | 15  # turns on the lowest 4 bits
x >> 4  # shifts x right by 4 bits (a fast way to divide by 2**4 = 16)
x & ~x  # always 0
x ^ x   # always 0
```
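As a hint for the capitalization question above, here is a minimal sketch (the function name is mine): treat each letter as one bit, so an $n$-letter word has $2^n$ variants, one per value of an $n$-bit counter.
```python
def capitalizations(word):
    """Yield all 2**n upper/lower-case variants of an n-letter word."""
    n = len(word)
    for mask in range(2 ** n):        # each n-bit mask is one variant
        chars = []
        for i, ch in enumerate(word):
            if mask & (1 << i):       # bit i decides the case of letter i
                chars.append(ch.upper())
            else:
                chars.append(ch.lower())
        yield "".join(chars)

print(list(capitalizations("hi")))    # ['hi', 'Hi', 'hI', 'HI']
```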
Big-O can also apply to memory size
- Same idea: $O(1)$, $O(n)$, etc
- Hashing usually requires additional space (see the size check below)
- Do not use complex data structures unnecessarily!
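As a rough illustration (exact byte counts vary by platform and Python version), sys.getsizeof shows the extra space a set's hash table needs compared to a list of the same items:
```python
import sys

data = list(range(1000))
print(sys.getsizeof(data))       # size in bytes of the list object itself
print(sys.getsizeof(set(data)))  # the set's hash table is noticeably larger
```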
Constant time: The holy grail
Why does indexing an array take constant time, $O(1)$?
- In homogeneous arrays, each item takes the same memory space!
- You can multiply the index by the item size to find the offset of an item:
$$ \text{address of item at index } i = \text{array starting address} + i \times \text{size of one item} $$
Hint: In Python, a list can contain heterogeneous items, but each item is an object reference, and references all take up equal space.
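A minimal sketch using Python's array module (which stores homogeneous C values) to show the offset arithmetic in action:
```python
from array import array

a = array('d', [1.5, 2.5, 3.5, 4.5])  # homogeneous array of C doubles
i = 2
offset = i * a.itemsize               # byte offset of item i from the buffer start
print(a.itemsize, offset)             # typically: 8 16
print(a[i])                           # 3.5, found by address arithmetic, not scanning
```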
What’s hashing?
- A hash is a math function that can calculate a number out of (almost) anything
- the number would be unique (or close to it) for each input
hash("hello") -> 46487
- the hash number can be used (with some tricks) as an index
- then you can jump straight to the item in $O(1)$ instead of searching (toy sketch after the list below)
Other uses:
- Comparing complex objects
- Sorting (only for values)
- Checksumming files downloaded from the internet
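A toy sketch of the "tricks": fold the hash into a valid slot number. (This ignores collisions entirely, which real dicts and sets must handle.)
```python
def toy_index(key, table_size=8):
    # Fold the key's hash into a slot number; the slot varies across runs
    # because Python randomizes string hashes.
    return hash(key) % table_size

table = [None] * 8
table[toy_index("hello")] = "world"   # store in O(1)
print(table[toy_index("hello")])      # retrieve in O(1): 'world'
```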
About sets
Pros:
- faster lookup times
- ensures uniqueness of items
- awesome methods (union, intersection, difference; demo below)
Cons:
- uses more memory
- slower to build and append to than a list
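A quick demo of those methods (set printing order may vary):
```python
a = {"alice", "bob", "carol"}
b = {"bob", "dave"}

print("bob" in a)   # O(1) membership test: True
print(a & b)        # intersection: {'bob'}
print(a | b)        # union: {'alice', 'bob', 'carol', 'dave'}
print(a - b)        # difference: {'alice', 'carol'}
```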
Exercise time! Choose one:
Compare speed of:
- the programs you wrote in the last exercise; compare different input sizes
- list vs set operations:
- Searching an item OR
- Appending an item
- finding common elements between two lists by:
- writing an $O(n^2)$ algorithm AND
- using set operations (hint: intersection)
To find speed:
- Initialize a long list, then create a set from it
- Start a timer
- Loop a large number of times (e.g. 100,000)
- Put only one operation inside the loop to measure (e.g., "hello" in my_list)
- Stop the timer and find elapsed time
- Report the time per operation by dividing by the repeat count you selected above
OR use defbench, which GGC graduate Shaun Mitchell made!
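Here is a minimal sketch of the manual timing recipe above (the list size and repeat count are scaled down so the O(n) list scan finishes quickly):
```python
import time

my_list = list(range(10_000))
my_set = set(my_list)              # build a set from the same data

REPS = 1_000
start = time.perf_counter()
for _ in range(REPS):
    found = -1 in my_list          # O(n): scans the whole list (worst case: not found)
list_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(REPS):
    found = -1 in my_set           # O(1): a single hash lookup
set_time = time.perf_counter() - start

print(f"list: {list_time / REPS:.3e} s per lookup")
print(f"set:  {set_time / REPS:.3e} s per lookup")
```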