Data plotting
- See Python for Data Analysis, Chapter 9
Main library is matplotlib:
- Controls how figures are laid out and decorated (axes, labels, colors, line styles, etc.)
- Can be used directly
- Pandas uses it indirectly (its plot() methods draw with matplotlib), but also allows using matplotlib commands directly on the result
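A minimal sketch of pandas plotting through matplotlib, assuming recent pandas and matplotlib versions; the series is made up. `Series.plot()` returns a matplotlib `Axes`, so ordinary matplotlib methods can decorate the same plot afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration
s = pd.Series([1, 4, 9, 16], index=[0, 1, 2, 3], name="squares")

# pandas draws with matplotlib and hands back a matplotlib Axes
ax = s.plot(style="o-")

# Plain matplotlib commands then decorate the same plot
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("x**2")
fig = ax.get_figure()
```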
Anatomy of a plot in matplotlib
- figure: The top-level graphical area (window or image) that holds one or more axes
- axes (matplotlib Axes): A labeled XY plotting area; several can be superimposed or tiled
- plot: Individual shapes (lines, rectangles, etc.) drawn on an axes
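The three layers can be seen in a small sketch, assuming matplotlib with the non-interactive Agg backend; the data are made up. One figure holds two tiled axes from `plt.subplots()`, a second y axis is superimposed on the right one with `twinx()`, and lines and bars are the individual shapes drawn on each axes.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# figure: one graphical area holding two tiled axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# plot elements: a dashed line and three bars drawn on the left axes
ax_left.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin")
ax_left.bar([1, 2, 3], [0.5, -0.2, 0.8], width=0.4, color="tab:gray")
ax_left.set_xlabel("x")
ax_left.set_ylabel("value")
ax_left.legend()

# superimposed axes: twinx() shares the x axis but adds a second y axis
ax_right.plot(x, np.sin(x), color="tab:blue")
ax_twin = ax_right.twinx()
ax_twin.plot(x, 100 * np.cos(x), color="tab:red")
ax_right.set_ylabel("sin(x)")
ax_twin.set_ylabel("100 cos(x)")
```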
Data wrangling
- See Python for Data Analysis, Chapter 8
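Hierarchical indexing:
- partial indexing (selecting on the outer index level only)
- unstack() method converts a hierarchically indexed Series to a DataFrame (the inner level becomes the columns)
- stack() is the reverse
- swaplevel() for reordering hierarchical indices
- sort_index(level=...) for sorting by one index level
- Summary statistics by level with reduction methods, such as sum(level=, axis=); pandas 2.0 removed the level= argument, so use groupby(level=...).sum() instead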
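A minimal sketch of the hierarchical-indexing operations listed above, on a made-up Series with a two-level index. It assumes pandas 2.x or later, so per-level sums go through `groupby(level=...)` rather than the removed `sum(level=...)`.

```python
import pandas as pd

# Made-up Series with a two-level (hierarchical) index
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 1), ("c", 2)],
    names=["key", "num"],
)
s = pd.Series([10, 20, 30, 40, 50, 60], index=index)

# Partial indexing: select on the outer level only
part = s["a"]  # indexed by "num": 1 -> 10, 2 -> 20

# unstack(): the inner level "num" becomes the columns of a DataFrame
wide = s.unstack()  # 3 rows (a, b, c) x 2 columns (1, 2)

# stack(): the reverse, back to a Series with a two-level index
back = wide.stack()

# swaplevel() reorders the levels; sort_index(level=...) sorts by one level
swapped = s.swaplevel("key", "num").sort_index(level="num")

# Per-level summary statistics (replacement for the removed sum(level=...))
totals = s.groupby(level="key").sum()  # a: 30, b: 70, c: 110
col_totals = wide.sum(axis=0)          # column 1: 90, column 2: 120
```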
Combining and merging
- merge() combines rows by keys (columns or indices), like the SQL join operator
- inner, left, right, and outer joins possible
- concat() for stacking objects along an axis
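A short sketch of `pd.merge()` with each join type and of `pd.concat()`, using two made-up tables that share an `id` key column. Index-based joins use `left_index=True`/`right_index=True` in place of `on`.

```python
import pandas as pd

# Made-up tables sharing the key column "id"
left = pd.DataFrame({"id": [1, 2, 3], "name": ["ann", "bob", "cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})

# merge() joins on the shared key; `how` selects the SQL-style join type
inner = pd.merge(left, right, on="id", how="inner")    # ids 2, 3
left_j = pd.merge(left, right, on="id", how="left")    # ids 1, 2, 3
right_j = pd.merge(left, right, on="id", how="right")  # ids 2, 3, 4
outer = pd.merge(left, right, on="id", how="outer")    # ids 1, 2, 3, 4

# Joining on indices instead of columns
by_index = pd.merge(left.set_index("id"), right.set_index("id"),
                    left_index=True, right_index=True)

# concat() stacks objects along an axis (axis=0 appends rows)
more = pd.DataFrame({"id": [5], "name": ["dee"]})
stacked = pd.concat([left, more], ignore_index=True)
```

Rows without a match on the other side get `NaN` in the missing columns, as in the `left` and `outer` results.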
Reshape and pivot
- stack() and unstack() reshape via hierarchical indexing
- “long” vs “wide” format (pivot() converts long to wide, melt() the reverse)
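A minimal sketch of converting between long and wide format with `DataFrame.pivot()` and `DataFrame.melt()`, on made-up data. The last line shows that `set_index()` plus `unstack()` gives the same wide table through the index.

```python
import pandas as pd

# "Long" format: one row per (date, item) observation; made-up data
long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# pivot(): long -> wide, one column per distinct "item"
wide_df = long_df.pivot(index="date", columns="item", values="value")

# melt(): wide -> long; "date" must be a column again, hence reset_index()
long_again = wide_df.reset_index().melt(
    id_vars="date", var_name="item", value_name="value"
)

# The same wide table via the index: set_index() then unstack()
wide_via_unstack = long_df.set_index(["date", "item"])["value"].unstack()
```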
Data aggregation
- See Python for Data Analysis, Chapter 10
Groupby: split-apply-combine
- selecting column(s) and index levels
- aggregation functions
- apply() for arbitrary functions
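A small split-apply-combine sketch on made-up data: group by a column, select a column, aggregate with built-in functions, group by an index level, and finally run an arbitrary (hypothetical) function `points_range` on each group with `apply()`.

```python
import pandas as pd

# Made-up scores for illustration
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "player": ["p1", "p2", "p3", "p4", "p5"],
    "points": [10, 20, 5, 15, 25],
})

# Split on "team", select the "points" column, apply mean, combine
mean_points = df.groupby("team")["points"].mean()

# Several aggregation functions at once with agg()
summary = df.groupby("team")["points"].agg(["sum", "max", "count"])

# Grouping by an index level instead of a column
by_level = df.set_index(["team", "player"]).groupby(level="team")["points"].sum()

# apply() runs an arbitrary function on each group; here, the points range
def points_range(points):
    return points.max() - points.min()

spread = df.groupby("team")["points"].apply(points_range)
```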
Pivot tables and cross-tabulation
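A minimal sketch of `DataFrame.pivot_table()` and `pd.crosstab()` on a small made-up table of tips. `pivot_table()` aggregates a value column over row and column groups (`margins=True` adds "All" totals), while `crosstab()` counts how often each combination occurs.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "smoker": ["yes", "no", "no", "no", "yes"],
    "tip": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# pivot_table(): mean tip by day (rows) and smoker (columns),
# with an extra "All" row and column from margins=True
table = df.pivot_table(index="day", columns="smoker", values="tip",
                       aggfunc="mean", margins=True)

# crosstab(): frequency counts of day/smoker combinations
counts = pd.crosstab(df["day"], df["smoker"])
```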