Data Preprocessing Overview
Once a data set has been cleaned and explored well enough for the task at hand, it must be converted into a form that a machine learning model can use. This involves the following steps:
- Import the machine learning libraries and load the data set.
- Perform a validation split (train/test split).
- Transform the features so that numerical values are standardized and non-numerical values are encoded as numerical data types.
- Use ETL-style pipelines to build a reusable preprocessing object.
- Finally, use the preprocessor to transform the data into the needed format.
After completing these steps, the data will be ready to plug into a machine learning model!
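The steps above can be sketched in a few lines. This overview does not name a library, so the example below assumes pandas and scikit-learn. The toy data frame, its column names (`age`, `income`, `city`, `purchased`), and the 75/25 split are hypothetical stand-ins for a real cleaned data set. Treat it as a minimal illustration rather than a definitive implementation.

```python
# Step 1: import libraries and load the data set.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for a cleaned, explored data set.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 40, 33],
    "income": [28000, 52000, 61000, 75000, 39000, 82000, 58000, 45000],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "purchased": [0, 1, 1, 1, 0, 1, 1, 0],
})
X = df.drop(columns="purchased")  # features
y = df["purchased"]               # target

# Step 2: split before fitting any transformer, so that statistics
# learned from the training rows never see the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Steps 3 and 4: describe the transformations once, in a reusable object.
numeric_features = ["age", "income"]
categorical_features = ["city"]

preprocessor = ColumnTransformer(
    transformers=[
        # Standardize numeric columns to zero mean and unit variance.
        ("num", Pipeline([("scale", StandardScaler())]), numeric_features),
        # Encode categories as 0/1 columns; categories unseen during
        # fitting are encoded as all zeros instead of raising an error.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Step 5: fit on the training data only, then apply the same fitted
# transformation to both splits.
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

print(X_train_processed.shape, X_test_processed.shape)
```

Fitting the preprocessor on the training split alone and reusing it on the test split is what makes the object reusable: the same fitted `preprocessor` can later transform new, unseen data in exactly the same way.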