Data plotting
- See Python for Data Analysis, Chapter 9
Main library is matplotlib:
- Controls how figures are laid out and decorated (axes, labels, colors, line styles, etc)
- Can be used directly
- Pandas uses it indirectly, but still allows using matplotlib commands
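As a minimal sketch of the last point: pandas plotting methods draw through matplotlib and return the matplotlib Axes they drew on, so ordinary matplotlib commands can decorate the result. The data and the column names `year` and `sales` are made up for illustration, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"year": [2019, 2020, 2021, 2022],
                   "sales": [10, 14, 9, 17]})

# pandas draws the line through matplotlib and returns the Axes it used
ax = df.plot(x="year", y="sales", marker="o")

# Plain matplotlib commands applied to that same Axes
ax.set_title("Sales per year")
ax.set_ylabel("Units sold")
ax.grid(True)
```

In an interactive session, `ax.figure.savefig(...)` or `matplotlib.pyplot.show()` would then render the result.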
Anatomy of a plot in matplotlib
- figure: The graphical area for one figure
- axis: Labeled XY plotting area (an Axes object in matplotlib); several can be superimposed or tiled in one figure
- plot: Individual shapes (lines, rectangles, etc.) drawn on an axis
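The three layers above can be sketched as follows, assuming the common `plt.subplots` entry point. The data values and labels are invented for illustration, and the `Agg` backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Figure: one graphical area, here holding two tiled axes (1 row, 2 columns)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Plot on the left axes: a dashed line
x = [0, 1, 2, 3]
line, = ax_left.plot(x, [0, 1, 4, 9], color="tab:blue",
                     linestyle="--", label="x squared")
ax_left.set_xlabel("x")
ax_left.set_ylabel("y")
ax_left.legend()

# Plot on the right axes: bars, which are drawn as rectangles
bars = ax_right.bar(["a", "b", "c"], [3, 5, 2], color="tab:orange")
ax_right.set_title("Bars are rectangles")

fig.tight_layout()
```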
Data wrangling
- See Python for Data Analysis, Chapter 8
Hierarchical indexing:
- partial indexing
- unstack() method converts to DataFrame
- stack() is the reverse
- swaplevel() for reordering hierarchical indices
- sort_index() for sorting by one index
- Summary statistics with vector operators, such as sum(level=, axis=) (the level argument was removed in pandas 2.0; groupby(level=).sum() replaces it)
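A small sketch of these operations on a made-up Series with a two-level index (the level names `key1` and `key2` are illustrative). The per-level sum uses `groupby(level=...)`, since the `level` argument of `sum()` is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical Series with a two-level (hierarchical) index
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.MultiIndex.from_product(
        [["a", "b", "c"], [1, 2]], names=["key1", "key2"]
    ),
)

# Partial indexing: select on the outer level only
part = s["b"]                 # Series indexed by key2

# unstack() moves the inner level into columns, giving a DataFrame
wide = s.unstack()            # rows: key1, columns: key2

# stack() is the reverse and returns to a Series
back = wide.stack()

# swaplevel() reorders the levels; sort_index() sorts by one of them
swapped = s.swaplevel("key1", "key2").sort_index(level="key2")

# Summary statistic per level
totals = s.groupby(level="key1").sum()
```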
Combining and merging
- merge() combines objects by keys (indices), like the SQL join operator
- inner, left, right, and outer joins possible
- concat() for stacking objects
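To illustrate the join types and stacking, here is a short sketch on two made-up frames that share a `key` column. The column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Inner join: only keys present in both frames
inner = pd.merge(left, right, on="key", how="inner")

# Left join: every key from the left frame, NaN where right has no match
left_join = pd.merge(left, right, on="key", how="left")

# Outer join: union of keys from both frames
outer = pd.merge(left, right, on="key", how="outer")

# concat() stacks objects: rows on top of each other by default,
# or side by side with axis=1
stacked = pd.concat([left, left], ignore_index=True)
side_by_side = pd.concat([left, right], axis=1)
```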
Reshape and pivot
- stack() vs unstack() for reshaping
- “long” vs “wide” format
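The two formats can be sketched as follows. The frame and its columns `date`, `item`, and `value` are invented for illustration. `pivot()` is shown next to `unstack()` as one possible shortcut for the same long-to-wide step.

```python
import pandas as pd

# "Long" format: one row per (date, item) observation
long = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "item": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Long -> wide: index by both keys, then unstack the item level into columns
wide = long.set_index(["date", "item"])["value"].unstack("item")

# The same reshaping in one call
wide2 = long.pivot(index="date", columns="item", values="value")

# Wide -> long: stack the columns back into rows
back = wide.stack().reset_index(name="value")
```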
Data aggregation
- See Python for Data Analysis, Chapter 10
Groupby: split-apply-combine
- selecting column(s) and index levels
- aggregation functions
- apply() for arbitrary functions
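A brief sketch of split-apply-combine on a made-up frame. The column names and the helper `score_range` are hypothetical and serve only as an example of an arbitrary function.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    "team": ["red", "red", "blue", "blue", "blue"],
    "player": ["a", "b", "c", "d", "e"],
    "score": [10, 20, 5, 15, 25],
})

# Split by team, select one column, aggregate, combine into a Series
mean_score = df.groupby("team")["score"].mean()

# Several aggregation functions at once
summary = df.groupby("team")["score"].agg(["min", "max", "sum"])

# apply() with an arbitrary function run on each group
def score_range(scores):
    return scores.max() - scores.min()

spread = df.groupby("team")["score"].apply(score_range)

# Grouping by an index level instead of a column
indexed = df.set_index(["team", "player"])
by_level = indexed.groupby(level="team")["score"].sum()
```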
Pivot tables and cross-tabulation
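As a minimal sketch, `pd.pivot_table` aggregates values over two grouping keys and `pd.crosstab` counts how often each combination occurs. The data and the column names `day`, `meal`, and `tip` are made up for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "meal": ["lunch", "dinner", "lunch", "lunch", "dinner"],
    "tip": [2.0, 3.0, 1.0, 3.0, 4.0],
})

# Pivot table: mean tip for each (day, meal) combination
table = pd.pivot_table(df, values="tip", index="day",
                       columns="meal", aggfunc="mean")

# Cross-tabulation: number of rows for each (day, meal) combination
counts = pd.crosstab(df["day"], df["meal"])
```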