(License: CC BY-SA 4.0)
anaconda-navigator on command line
Follow sections on the official Notebook Examples tutorial:
Do ONE of the following:
From Python for Data Analysis, 2nd Ed, chapter 4:
1 dimension: 1 bracket, 1 index
arr1d = np.array([1, 2, 3])
arr1d[x]
2 dimensions: 2 brackets, 2 indices
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
arr2d[x, y]
3 dimensions: …
arr3d = np.array([[[ 1,  2,  3],
                   [ 4,  5,  6]],
                  [[ 7,  8,  9],
                   [10, 11, 12]]])
arr3d[x, y, z]
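The placeholders x, y, and z above stand for integer positions. A minimal runnable sketch follows; the specific index values are chosen here purely for illustration and are not from the book:

```python
import numpy as np

arr1d = np.array([1, 2, 3])
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
arr3d = np.array([[[ 1,  2,  3],
                   [ 4,  5,  6]],
                  [[ 7,  8,  9],
                   [10, 11, 12]]])

# One index per dimension; positions start at 0.
first = arr1d[0]         # 1
middle = arr2d[1, 1]     # row 1, column 1 -> 5
last = arr3d[1, 1, 2]    # block 1, row 1, column 2 -> 12

# Supplying fewer indices than dimensions returns a sub-array.
row = arr2d[0]           # the first row, shape (3,)
block = arr3d[0]         # the first 2x3 block, shape (2, 3)

print(first, middle, last)
print(row.shape, block.shape)
```

Indexing with fewer indices than dimensions is often the first step before slicing, since the result is itself an ndarray.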
ndarray objects and doing math on them
Start by working in teams on the whiteboard and then submit individually by forking this or creating an online notebook.
Solve ONE of these problems (thanks math people!):
Make sure to:
Do 3 examples of each below and briefly explain each with one sentence:
Vectors: $\vec{x} = [ 1, 2, 3 ]$
Can do bulk operations using math magic:
inner/dot product: $$ \vec{x} \cdot \vec{y} = \sum_i x_i y_i $$
Example: sum of products to find total price
quantity = np.array([1, 1, 5, 2])
prices = np.array([10, 15, 1.25, 20])
total = np.dot(prices, quantity)
Result: 71.25
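As a quick check that np.dot matches the sum-of-products formula, the same total can be computed with an elementwise multiply followed by a sum. This is only an illustrative sketch reusing the arrays above; the names per_item, total_by_sum, and total_by_dot are introduced here for the comparison.

```python
import numpy as np

quantity = np.array([1, 1, 5, 2])
prices = np.array([10, 15, 1.25, 20])

# Elementwise products x_i * y_i, then their sum.
per_item = prices * quantity
total_by_sum = per_item.sum()

# The same value from the dot product.
total_by_dot = np.dot(prices, quantity)

print(per_item)                      # cost of each line item
print(total_by_sum, total_by_dot)    # 71.25 71.25
```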
outer product: $$ \vec{x} \times \vec{y} = [x_i y_j]_{ij} $$
Example:
item_prices = np.full((1,5), 50) # all $50
inflation_per_month = np.array([1.1, 1.3, 1.3, 1.4]) # monthly inflation
new_prices_per_month = np.outer(inflation_per_month, item_prices)
Result:
array([[55., 55., 55., 55., 55.],
       [65., 65., 65., 65., 65.],
       [65., 65., 65., 65., 65.],
       [70., 70., 70., 70., 70.]])
$$ A=\left[ \begin{array}{ccc} a_{11} & \cdots & a_{1n} \newline \vdots & \ddots & \vdots \newline a_{m1} & \cdots & a_{mn} \newline \end{array} \right] $$
Uses:
Must have matching inner dimensions, results in a matrix: $$ A_{m\times n} \times B_{n\times o} = C_{m\times o} $$
Each element of output matrix is the result of one inner product: $$ c_{ij} = \sum_k a_{ik} b_{kj} $$
Rows of $A$ matched to columns of $B$ to create single elements of $C$: $$ \left[ \begin{array}{c} \bbox[5px,yellow,border:2px solid red]{\begin{array}{ccc} a_{11} & \cdots & a_{1n} \end{array}} \newline \begin{array}{ccc} \vdots & \ddots & \vdots \newline a_{m1} & \cdots & a_{mn} \newline \end{array} \end{array} \right] \times \left[ \begin{array}{cc} \bbox[5px,yellow,border:2px solid red]{\begin{array}{c} b_{11} \newline \vdots \newline b_{n1} \end{array}} & \begin{array}{cc} \cdots & b_{1o} \newline \ddots & \vdots \newline \cdots & b_{no} \newline \end{array} \end{array} \right] = \left[ \begin{array}{ccc} \bbox[5px,yellow,border:2px solid red]{c_{11}} & \cdots & c_{1o} \newline \vdots & \ddots & \vdots \newline c_{m1} & \cdots & c_{mo} \newline \end{array} \right] $$
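The shape rule and the row-times-column picture above can be checked on a small case. This is a minimal sketch with made-up matrices A (2x3) and B (3x2); the values carry no meaning beyond the illustration.

```python
import numpy as np

# A is 2x3 and B is 3x2, so the inner dimensions (3 and 3) match.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])

C = A @ B                # same as np.matmul(A, B)
print(C.shape)           # (2, 2): outer dimensions of A and B
print(C)

# Each c_ij is the inner product of row i of A and column j of B.
c_00 = np.dot(A[0, :], B[:, 0])
print(c_00 == C[0, 0])   # True
```

Swapping the operands (B @ A) is also valid here but gives a 3x3 result, which is a reminder that matrix multiplication is not commutative.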
[…] calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. (from original paper)
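One way to picture the "simple iterative algorithm" in the quote is power iteration: multiply a rank vector by the normalized link matrix repeatedly until it stops changing. The sketch below is a simplified illustration, not the method as specified in the paper. The three-page link graph is hypothetical, and the damping factor used in practice is left out.

```python
import numpy as np

# Hypothetical 3-page web. M[i, j] is the share of page j's links
# that point to page i, so every column sums to 1.
# Page 0 links to pages 1 and 2, page 1 links to 2, page 2 links to 0.
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

rank = np.full(3, 1 / 3)       # start with equal rank for every page

for _ in range(1000):
    new_rank = M @ rank        # one step of the iteration
    if np.allclose(new_rank, rank, atol=1e-12, rtol=0):
        break                  # the vector has stopped changing
    rank = new_rank

print(rank.round(3))           # approximately [0.4, 0.2, 0.4]

# At convergence, M @ rank is (nearly) rank itself: an eigenvector
# of M with eigenvalue 1.
print(np.allclose(M @ rank, rank))
```

Pages 0 and 2 end up with the highest rank in this toy graph because each receives all of the outgoing links of another page.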
Loop through your data and calculate mean and standard deviation (or regression, min, max, etc.).
vector = [1, 2, 3]
total = 0  # named total to avoid shadowing the built-in sum()
for element in vector:
    total += element
mean = total / len(vector)
Use vector operations to do the same thing more concisely and efficiently. $$ \mu = \sum_{i=1..N} x_i / N $$
import numpy as np
vector = np.array([1,2,3])
mean = np.sum(vector) / len(vector)
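For comparison, NumPy also provides np.mean, which should give the same value as both versions above. A small sketch on the same three-element vector; the names mean_loop, mean_sum, and mean_builtin are introduced here only to hold the three results side by side.

```python
import numpy as np

vector = np.array([1, 2, 3])

# Explicit loop, as in the first version.
total = 0
for element in vector:
    total += element
mean_loop = total / len(vector)

# Vectorized versions.
mean_sum = np.sum(vector) / len(vector)
mean_builtin = np.mean(vector)

print(mean_loop, mean_sum, mean_builtin)   # 2.0 2.0 2.0
```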
Use vectorized numpy operations to calculate standard deviation $$ \sigma = \sqrt{ \sum_{i=1..N} ( x_i - \mu )^2 / ( N - 1 ) } $$ where $N$ is the number of elements in $ \vec{x} $ and $ \mu $ is its mean.
Practice this with your team on a whiteboard or laptop, along with the two exercises below.
Use the dot product to calculate total miles covered by all cars:
road_miles gives a list of different road segments and their length in miles.
cars_roads gives the number of cars that passed on each of the road segments.
Example:
road_miles = [108, 5, 10, 52]
cars_roads = [543, 433, 104, 390]
We expect the population to increase by 3% every year. Make a matrix of predictions for each county for the next three years by using:
ga_population is a list of population numbers (in thousands) for each county.
Example:
# dekalb, fulton, gwinnett
ga_population = [757, 1065, 964]
np.std(vector, ddof=1) in your notebook
Series and DataFrame
.loc, and .iloc)
DataFrame object from any data
Series object from your DataFrame
ndarray object and show indexing example
DataFrame