Data wrangling
- See Python for Data Analysis, Chapter 8
Hierarchical indexing:
- partial indexing
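unstack() pivots the innermost index level into columns, turning a hierarchically indexed Series into a DataFrame
stack() is the reverse: it moves the columns back into the innermost index level
swaplevel() for interchanging two levels of a hierarchical index
sort_index(level=) for sorting by a single index level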
- Summary statistics by index level with vectorized aggregations, such as
groupby(level=).sum() (the older sum(level=, axis=) form was removed in pandas 2.0)
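
A minimal sketch of the hierarchical indexing operations listed above. The Series, its values, and the level names key1 and key2 are made up for illustration; the per-level sum uses groupby(level=), which works in current pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series with a two-level (hierarchical) index.
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["key1", "key2"]
)
s = pd.Series(np.arange(6.0), index=index)

# Partial indexing: select everything under the outer label "a".
part = s["a"]

# unstack() moves the inner level into columns, giving a DataFrame.
wide = s.unstack()

# stack() reverses the operation and returns a Series again.
long = wide.stack()

# swaplevel() exchanges the two levels; sort_index(level=...) then
# sorts by the chosen level.
swapped = s.swaplevel("key1", "key2").sort_index(level="key2")

# Summary statistics per level: group by an index level, then aggregate.
totals = s.groupby(level="key1").sum()

print(wide)
print(totals)
```

Here wide has one row per key1 value and one column per key2 value, and totals holds one sum per outer label.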
Combining and merging
merge() combines rows by matching keys (columns or indices), like the SQL join operator
- inner, left, right, and outer joins possible
concat() for stacking objects
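
The four join types and concat() can be sketched as follows. The two small tables and their column names (key, lval, rval) are hypothetical; only the how= argument changes between the merges.

```python
import pandas as pd

# Hypothetical tables that share a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [10, 20, 30]})

# inner: only keys present in both tables.
inner = pd.merge(left, right, on="key", how="inner")

# left / right: keep every key from one side, fill the other with NaN.
left_join = pd.merge(left, right, on="key", how="left")
right_join = pd.merge(left, right, on="key", how="right")

# outer: the union of keys from both tables.
outer = pd.merge(left, right, on="key", how="outer")

# Merging on the index instead of a column.
by_index = pd.merge(
    left.set_index("key"), right.set_index("key"),
    left_index=True, right_index=True,
)

# concat() stacks objects: rows by default, columns with axis=1.
stacked = pd.concat([left, left], ignore_index=True)
side_by_side = pd.concat([left, right], axis=1)

print(outer)
```

Rows without a match on the other side get NaN in the columns that come from that side, which is why a merge often precedes the missing-data handling covered under data cleaning.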
Reshape and pivot
stack vs unstack
reshape
pivot vs melt
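
A short sketch of pivot vs melt, assuming a made-up long-format table with columns date, item, and value. pivot() spreads one column's values across new columns; melt() gathers columns back into rows.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, item) pair.
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# pivot: long -> wide, one row per date and one column per item.
wide = long.pivot(index="date", columns="item", values="value")

# melt: wide -> long, the inverse reshaping.
back = wide.reset_index().melt(
    id_vars="date", var_name="item", value_name="value"
)

print(wide)
print(back)
```

The melted frame contains the same (date, item, value) triples as the original, though the row order may differ.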
Data cleaning
- Missing data with N/A, NaN, and NULL values
- Filtering missing data out
- Filling in missing data values
- Eliminating duplicates
- Replacing values
- Adding new calculated columns
- Cosmetics (axis labels, etc.)
- Discretization
- Outliers
- Random sampling and shuffling
- String manipulation and regular expressions
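
Several of the cleaning steps above can be sketched on one small table. The data, the sentinel value -999, the bin edges, and the cap of 40 are all assumptions chosen for illustration, not recommended defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicate row, missing ages, untidy
# names, and a sentinel value (-999) standing in for "unknown".
df = pd.DataFrame({
    "name": ["  Ann", "bob ", "bob ", "Cy", "Dee"],
    "age": [25.0, np.nan, np.nan, 41.0, -999.0],
})

# Eliminating duplicates.
df = df.drop_duplicates().reset_index(drop=True)

# Replacing values: turn the sentinel into a proper missing value.
df["age"] = df["age"].replace(-999.0, np.nan)

# Filtering missing data out.
complete = df.dropna()

# Filling in missing data values, here with the column mean.
filled = df.fillna({"age": df["age"].mean()})

# String manipulation: trim whitespace and normalize capitalization.
filled["name"] = filled["name"].str.strip().str.title()

# Regular expressions: flag names that start with a vowel.
filled["vowel_start"] = filled["name"].str.contains(r"^[AEIOU]")

# Adding a new calculated column.
filled["age_months"] = filled["age"] * 12

# Discretization: bin ages into labeled intervals.
filled["group"] = pd.cut(
    filled["age"], bins=[0, 30, 40, 100],
    labels=["young", "middle", "senior"],
)

# Outliers: cap values above an assumed threshold.
filled["age_capped"] = filled["age"].clip(upper=40)

# Random sampling and shuffling: frac=1 returns all rows in random order.
shuffled = filled.sample(frac=1, random_state=0)

print(filled)
```

Filling with the mean is only one option; dropping the rows or using a group-specific value may suit other data better.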