Data plotting
- See Python for Data Analysis, Chapter 9
Main library is matplotlib:
- Controls how figures are laid out and decorated (axes, labels, colors, line styles, etc.)
- Can be used directly
- pandas calls it internally for its own plot methods, and matplotlib commands can still be applied to the result
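A minimal sketch of the last point, assuming made-up data and the non-interactive "Agg" backend (chosen here only so the example runs without a display). The pandas plot method returns a matplotlib Axes object, which then takes ordinary matplotlib commands:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, for illustration only
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})

# pandas draws through matplotlib and returns a matplotlib Axes
ax = df.plot(x="x", y="y")

# Ordinary matplotlib commands work on the returned object
ax.set_title("y = x squared")
ax.set_ylabel("y")
```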
Anatomy of a plot in matplotlib
- figure: The overall drawing area (window or image file) that holds the plot
- axes: A labeled XY plotting area inside a figure; several can be superimposed or tiled
- plot: Individual shapes (lines, rectangles, etc.) drawn on an axes
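The three layers can be seen in a short sketch. The data, labels, and colors below are invented for illustration, and the "Agg" backend is an assumption that lets the code run without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

# figure: the whole drawing area; axes: two plotting areas tiled side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# plot elements on the first axes: two superimposed lines
ax_left.plot([0, 1, 2], [0, 1, 4], linestyle="--", color="tab:blue", label="squares")
ax_left.plot([0, 1, 2], [0, 1, 2], color="tab:orange", label="identity")
ax_left.set_xlabel("x")
ax_left.legend()

# plot elements on the second axes: three rectangles (bars)
ax_right.bar(["a", "b", "c"], [3, 1, 2])

fig.tight_layout()
```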
Data wrangling
- See Python for Data Analysis, Chapter 8
Hierarchical indexing:
- partial indexing
unstack() pivots an index level into columns (a Series becomes a DataFrame)
stack() is the reverse
swaplevel() for reordering hierarchical indices
sort_index(level=) for sorting by one index level
- Summary statistics by index level, such as groupby(level=).sum() (the older sum(level=) form was removed in pandas 2.0)
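The operations in this list can be tried on a small Series. This is a sketch with made-up values and level names ("key", "num"); it assumes a recent pandas, where level-wise sums are written as groupby(level=...).sum():

```python
import pandas as pd

# Made-up Series with a two-level (hierarchical) index
s = pd.Series(
    [1, 2, 3, 4],
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1), ("b", 2)],
        names=["key", "num"],
    ),
)

part = s["a"]            # partial indexing: every row whose outer key is "a"
wide = s.unstack()       # inner level "num" becomes the columns of a DataFrame
back = wide.stack()      # stack() moves the columns back into the index

# Swap the two levels, then sort by the level that is now outermost
swapped = s.swaplevel("key", "num").sort_index(level="num")

# Summary statistic per outer key
totals = s.groupby(level="key").sum()
```

Here `wide` has one row per key and one column per num, and `totals` holds 3 for "a" and 7 for "b".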
Combining and merging
merge() combines rows by matching keys (columns or indices), like the SQL join operator
- inner, left, right, and outer joins possible
concat() for stacking objects
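A short sketch of the join types and of concat(), using two invented tables that share a "key" column (the column names are assumptions for illustration):

```python
import pandas as pd

# Two made-up tables with a common key column
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")      # only keys in both: b, c
outer = pd.merge(left, right, on="key", how="outer")      # all keys, NaN where missing
left_join = pd.merge(left, right, on="key", how="left")   # every row of left is kept

# concat() stacks objects along an axis without matching keys
stacked = pd.concat([left, left], ignore_index=True)  # rows on top of each other
side_by_side = pd.concat([left, right], axis=1)       # columns next to each other
```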
Reshape and pivot
stack() vs unstack()
pivot() and melt() for reshaping
- “long” vs “wide” format
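To illustrate the two formats, here is a sketch that assumes pivot() and melt() are the reshaping methods meant; the table and its column names are made up:

```python
import pandas as pd

# "Long" format: one row per (date, item) observation
long_df = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "item": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# "Wide" format: one row per date, one column per item
wide_df = long_df.pivot(index="date", columns="item", values="value")

# melt() goes back from wide to long
back_to_long = wide_df.reset_index().melt(
    id_vars="date", var_name="item", value_name="value"
)
```

Long format is usually easier to append to and to group by, while wide format is easier to read and to plot column by column.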
Data aggregation
- See Python for Data Analysis, Chapter 10
Groupby: split-apply-combine
- selecting column(s) and index levels
- aggregation functions
apply() arbitrary functions
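A minimal split-apply-combine sketch on invented data; the column names and the helper `spread` are hypothetical and serve only to show each step:

```python
import pandas as pd

# Made-up scores for two teams over two years
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [1, 2, 1, 2],
    "score": [10.0, 20.0, 30.0, 50.0],
})

# Split by team, select one column, apply a built-in aggregation
means = df.groupby("team")["score"].mean()

# Several aggregation functions at once
stats = df.groupby("team")["score"].agg(["min", "max"])

# Grouping by an index level instead of a column
by_year = df.set_index(["team", "year"])["score"].groupby(level="year").sum()

# apply() with an arbitrary function (hypothetical helper)
def spread(group):
    """Return the range of values within one group."""
    return group.max() - group.min()

spreads = df.groupby("team")["score"].apply(spread)
```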
Pivot tables and cross-tabulation
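Both can be sketched on a small made-up table (the columns "day", "smoker", and "tip" are assumptions for illustration). pivot_table() aggregates a value column over row and column groups, while crosstab() counts how often each combination occurs:

```python
import pandas as pd

# Made-up observations
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "smoker": ["yes", "no", "yes", "no", "no"],
    "tip": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# Mean tip for each (day, smoker) combination
table = df.pivot_table(values="tip", index="day", columns="smoker", aggfunc="mean")

# Number of rows for each (day, smoker) combination
counts = pd.crosstab(df["day"], df["smoker"])
```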