- See Python for Data Analysis, Chapters 6 and 7
Some topics:
- Loading and saving in different data formats
- Common options for loading
- Handling exceptions in formatting
- Selecting index columns
- Reading from URLs
- Reading from databases
- Binary formats (e.g., HDF5)
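A few of the loading topics above can be sketched with pandas. This is a minimal illustration, not the book's own example: the inline data, the `?` missing-value marker, and the `readings` table name are all made up, and an in-memory SQLite database stands in for a real one.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical inline data standing in for a file on disk.
raw = io.StringIO(
    "# exported by a hypothetical tool\n"
    "id;city;temp\n"
    "1;Oslo;4.5\n"
    "2;Lima;?\n"
    "3;Cairo;N/A\n"
)

# Common loading options: custom separator, comment lines to skip,
# an index column, and an extra marker to treat as missing.
df = pd.read_csv(
    raw,
    sep=";",
    comment="#",
    index_col="id",
    na_values=["?"],  # "N/A" is already recognized by default
)

# Saving and reloading in a text format (a buffer instead of a file).
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
round_trip = pd.read_csv(buf, index_col="id")

# Writing to and reading from a database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
df.to_sql("readings", conn)
from_db = pd.read_sql_query(
    "SELECT id, city, temp FROM readings", conn, index_col="id"
)
conn.close()

# Note: pd.read_csv also accepts a URL string in place of a file path,
# and df.to_hdf / pd.read_hdf cover HDF5 (they need the PyTables package).
```

Passing a file path instead of the `io.StringIO` buffers gives the same behavior against real files.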
Data cleaning
- Missing data with N/A, NaN, and NULL values
- Filtering missing data out
- Filling in missing data values
- Eliminating duplicates
- Replacing values
- Adding new calculated columns
- Cosmetics (axis labels, etc.)
- Discretization
- Outliers
- Random sampling and shuffling
- String manipulation and regular expressions
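The cleaning topics above can be strung together into one small pipeline. The sketch below uses made-up data and several assumptions chosen only for illustration: `-1` as a sentinel for an unknown age, the median as the fill value, a fixed income cap of 200,000 for outliers, and arbitrary age bins.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: a duplicate row, a sentinel age, a missing
# income, an extreme income, and inconsistent name formatting.
df = pd.DataFrame(
    {
        "name": ["  alice SMITH", "Bob Jones ", "Bob Jones ",
                 "carol wu", "Dan Roe", "eve king"],
        "age": [34, 51, 51, -1, 29, 42],
        "income": [52000.0, 48000.0, 48000.0, 61000.0, 990000.0, np.nan],
    }
)

# Eliminating duplicates.
clean = df.drop_duplicates().reset_index(drop=True)

# Replacing values: turn the sentinel -1 into a proper missing value.
clean["age"] = clean["age"].replace(-1, np.nan)

# Filling in missing values (median income), then filtering out rows
# that still lack an age.
clean["income"] = clean["income"].fillna(clean["income"].median())
clean = clean.dropna(subset=["age"]).reset_index(drop=True)

# Outliers: cap income at an assumed upper bound.
clean["income"] = clean["income"].clip(upper=200_000)

# Adding a new calculated column.
clean["income_k"] = clean["income"] / 1000

# Discretization: bin ages into labeled groups (bins are right-inclusive).
clean["age_group"] = pd.cut(
    clean["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"]
)

# String manipulation and a regular expression to pull out the surname.
clean["name"] = clean["name"].str.strip().str.title()
clean["surname"] = clean["name"].str.extract(r"(\w+)$", expand=False)

# Random shuffling (all rows) and sampling (two rows), seeded for repeatability.
shuffled = clean.sample(frac=1, random_state=0)
sample = clean.sample(n=2, random_state=0)
```

The order matters here: the median is computed before the outlier cap is applied and before rows are dropped, so changing the sequence changes the filled value.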