This article follows on from the first article in the series. Before any meaningful analysis can be done, it is often necessary to clean and transform the source data. This process can be viewed as six activities: discover, structure, clean, enrich, validate and publish.
a) Importing Data
- DataCamp tutorial and cheat sheets on how to import data using Python and pandas.
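As a quick illustration of importing data, here is a minimal sketch using pandas. The CSV content, column names and the `orders` variable are made up for this example; in practice you would pass a file path to `read_csv` rather than an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """order_id,customer,amount
1,Alice,10.5
2,Bob,
3,Carol,7.25
"""

# read_csv accepts a file path or any file-like object;
# StringIO keeps this example self-contained.
orders = pd.read_csv(io.StringIO(csv_text))

# A first look at what was loaded: column types and missing values per column.
print(orders.dtypes)
print(orders.isna().sum())
```

Checking the inferred types and the count of missing values straight after import is a cheap way to spot problems before they reach the analysis.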
b) Data Management
- Towards Data Science article examining the append function in pandas.
- DataCamp provides videos examining merges and joins in more detail.
- DataCamp blog post on joining data.
- A breakdown of the different types of joins with further examples.
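To make the ideas above concrete, here is a small sketch of stacking rows and joining tables in pandas. All table and column names are invented for illustration. Note that `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` for the same purpose.

```python
import pandas as pd

# Hypothetical tables, invented for this example.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]}
)
new_customers = pd.DataFrame({"customer_id": [5], "name": ["Eve"]})
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "customer_id": [1, 1, 4], "amount": [5.0, 7.5, 3.0]}
)

# Stack rows from two tables with the same columns (the job append used to do).
all_customers = pd.concat([customers, new_customers], ignore_index=True)

# Inner join: keep only orders whose customer_id exists in customers.
inner = orders.merge(customers, on="customer_id", how="inner")

# Left join: keep every order; unmatched orders get a missing name.
left = orders.merge(customers, on="customer_id", how="left")

# Outer join: keep everything from both sides; indicator=True adds a
# _merge column showing where each row came from.
outer = orders.merge(customers, on="customer_id", how="outer", indicator=True)

print(inner)
print(left)
print(outer["_merge"].value_counts())
```

The `indicator` column is handy when validating a join, because rows marked `left_only` or `right_only` show which keys failed to match.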
c) Identifying Data Quality Errors
- Towards Data Science article discussing how pandas can be used to cleanse data, including handling duplicates and missing data, with additional information on outliers.
- A reference on how the duplicated function works and the parameters it takes.
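As a rough sketch of how duplicates and missing data might be checked in pandas, the example below uses a small made-up table; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data containing one fully duplicated row and one missing value.
df = pd.DataFrame(
    {
        "customer": ["Alice", "Bob", "Bob", "Carol", "Alice"],
        "city": ["Leeds", "York", "York", None, "Leeds"],
        "amount": [10.0, 20.0, 20.0, 15.0, 12.0],
    }
)

# duplicated() marks rows that repeat an earlier row across all columns.
# The default keep="first" leaves the first occurrence unmarked.
dup_mask = df.duplicated()

# keep=False marks every member of a duplicate group, including the first.
dup_all = df.duplicated(keep=False)

# subset limits the comparison to the chosen columns.
dup_customer = df.duplicated(subset=["customer"])

# drop_duplicates() removes the rows that duplicated() would mark.
deduped = df.drop_duplicates()

# Count missing values in each column.
missing_per_column = df.isna().sum()

print(df[dup_all])
print(missing_per_column)
```

Inspecting the flagged rows with `keep=False` before dropping anything makes it easier to confirm that the duplicates are genuine errors rather than legitimate repeat records.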
Do you think cleaning data is important? Let me know your thoughts in the comments below.

