Data plotting, wrangling, and aggregation

ITEC 3160 Python Programming for Data Analysis,
Cengiz Günay

(License: CC BY-SA 4.0)


Data plotting

  • See Python for Data Analysis, Chapter 9

Main library is matplotlib:

  • Controls how figures are laid out and decorated (axes, labels, colors, line styles, etc.)
  • Can be used directly
  • pandas plotting methods call it internally, and matplotlib commands can still be applied to the resulting plots

Anatomy of a plot in matplotlib

  1. figure: The overall graphical area, which holds one or more axes
  2. axes: A labeled pair of X and Y axes inside the figure; several can be superimposed or tiled (subplots)
  3. plot: Individual shapes (lines, rectangles, etc.) drawn on an axes
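
The three levels above can be sketched in a few lines. This is a minimal illustration, not course-provided code: the data values, variable names, and labels are made up, and the "Agg" backend is selected only so the script runs without a display (omit that line in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

# 1. figure, 2. axes: one figure holding two axes tiled side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# 3. plots: individual shapes drawn on each axes (made-up data)
x = [0, 1, 2, 3]
lines = ax_left.plot(x, [0, 1, 4, 9], color="tab:blue", linestyle="--", label="squares")
bars = ax_right.bar(x, [3, 1, 2, 5], color="tab:orange")

# Decoration: labels, legend, and title are set on the axes
ax_left.set_xlabel("x")
ax_left.set_ylabel("x squared")
ax_left.legend()
ax_right.set_title("bar chart")
```

In an interactive session, plt.show() would display the figure.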

Types of Plots

(see plotting in pandas using matplotlib)
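
As a rough sketch of how pandas plotting sits on top of matplotlib, the example below draws the same made-up DataFrame as a line plot and as a bar plot, then decorates the result with ordinary matplotlib calls. The column names, index values, and the "Agg" backend line are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: two columns indexed by year
df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [4, 2, 3, 1]},
                  index=[2019, 2020, 2021, 2022])

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

df.plot(ax=ax_line)             # default kind is a line plot, one line per column
df.plot(kind="bar", ax=ax_bar)  # grouped bars, one bar per column per row

# The axes are plain matplotlib objects, so matplotlib commands still apply
ax_line.set_title("line")
ax_bar.set_ylabel("value")
```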

Data wrangling

  • See Python for Data Analysis, Chapter 8

Hierarchical indexing:

  • partial indexing
  • unstack() method converts a hierarchically indexed Series to a DataFrame by moving the inner index level to the columns
  • stack() is the reverse
  • swaplevel() for reordering hierarchical indices
  • sort_index() for sorting by one index level
  • Summary statistics by level, such as groupby(level=).sum() (older pandas versions also accepted sum(level=, axis=), which was removed in pandas 2.0)
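
The operations listed above can be tried on a small Series with a two-level index. This is a minimal sketch with made-up values; the level names "key" and "num" are assumptions chosen for the example.

```python
import pandas as pd

# Made-up Series with a two-level (hierarchical) index
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key", "num"])
s = pd.Series([10, 20, 30, 40], index=index)

partial = s["a"]       # partial indexing: select everything under outer key "a"
wide = s.unstack()     # inner level "num" becomes the columns of a DataFrame
back = wide.stack()    # stack() reverses unstack()

# Swap the two levels, then sort by the level that is now outermost
swapped = s.swaplevel("key", "num").sort_index(level="num")

# Summary statistic per outer level
totals = s.groupby(level="key").sum()
```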

Combining and merging

  • merge() combines rows by matching keys (columns or indices), like the SQL join operator
    • inner, left, right, and outer joins possible
  • concat() for stacking objects
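
To make the join types concrete, here is a small sketch using two made-up tables that share an "id" column. The table and column names are hypothetical, picked only for illustration.

```python
import pandas as pd

# Made-up tables: id 1 has no grade, id 4 has no student record
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cho"]})
grades = pd.DataFrame({"id": [2, 3, 4], "grade": [90, 85, 70]})

inner = pd.merge(students, grades, on="id", how="inner")  # only ids in both tables
left = pd.merge(students, grades, on="id", how="left")    # all students, NaN grade if missing
right = pd.merge(students, grades, on="id", how="right")  # all grades, NaN name if missing
outer = pd.merge(students, grades, on="id", how="outer")  # every id from either table

# concat() stacks objects along an axis without matching keys
stacked = pd.concat([students, students], ignore_index=True)
```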

Reshape and pivot

  • stack vs unstack
  • reshape
  • “long” vs “wide” format
    • pivot vs melt
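
The "long" and "wide" formats can be compared with a short round trip. This is an illustrative sketch on made-up data; the column names "date", "item", and "value" are assumptions.

```python
import pandas as pd

# "Long" format: one row per (date, item) observation
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# pivot: long -> wide, one row per date and one column per item
wide = long.pivot(index="date", columns="item", values="value")

# melt: wide -> long again, turning the item columns back into rows
long_again = wide.reset_index().melt(
    id_vars="date", var_name="item", value_name="value"
)
```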

Data aggregation

  • See Python for Data Analysis, Chapter 10

Groupby: split-apply-combine

  • selecting column(s) and index levels
  • aggregation functions
  • apply() arbitrary functions
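
A minimal split-apply-combine sketch follows, covering the three bullets above. The data, the column names "team" and "score", and the lambda are all made up for the example.

```python
import pandas as pd

# Made-up data: scores for two teams
df = pd.DataFrame({
    "team": ["red", "red", "blue", "blue"],
    "score": [1.0, 3.0, 10.0, 20.0],
})

# Split by a column, select one column, apply a built-in aggregation
means = df.groupby("team")["score"].mean()

# Several aggregation functions at once
summary = df.groupby("team")["score"].agg(["min", "max", "sum"])

# apply() with an arbitrary function: range of scores within each group
spread = df.groupby("team")["score"].apply(lambda s: s.max() - s.min())

# Grouping by an index level instead of a column
by_level = df.set_index("team").groupby(level="team")["score"].sum()
```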

Pivot tables and cross-tabulation

  • pivot_table
  • crosstab
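
The difference between the two can be seen on a small made-up table: pivot_table aggregates a value column, while crosstab counts how often each combination occurs. The column names below are hypothetical.

```python
import pandas as pd

# Made-up data: tips by day and meal
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "meal": ["lunch", "dinner", "lunch", "lunch", "dinner"],
    "tip": [2.0, 4.0, 1.0, 3.0, 5.0],
})

# pivot_table: mean tip for each (day, meal) combination
table = df.pivot_table(values="tip", index="day", columns="meal", aggfunc="mean")

# crosstab: number of rows for each (day, meal) combination
counts = pd.crosstab(df["day"], df["meal"])
```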