Preprocessing

Data Preprocessing Overview

Once a data set has been cleaned and explored well enough for the task at hand, it must be processed into a form that a machine learning model can use to solve the stated problem. This involves the following steps:

Import the machine learning libraries and load the data set.
Perform a validation split (train/test split).
Transform features so that they are standardized and represented as numerical data types.
Use ETL-style pipelines to create a preprocessing object that can be reused.
Finally, use the preprocessor to transform the data into the needed format.

After completing these steps, the data will be ready to plug into a machine learning model!
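
The five steps above can be sketched in code. The example below is a minimal illustration, not a definitive implementation: it assumes pandas and scikit-learn as the libraries (the text does not name them), and the small data set, its column names (age, income, city, purchased), and the choice of scaler and encoder are all hypothetical stand-ins.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Step 1: load the data set. A tiny hypothetical table stands in for a real
# file; in practice this would usually be something like pd.read_csv(...).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 60],
    "income": [40000, 52000, 88000, 91000, 61000, 45000, 79000, 99000],
    "city": ["Austin", "Boston", "Austin", "Denver",
             "Boston", "Denver", "Austin", "Boston"],
    "purchased": [0, 0, 1, 1, 0, 0, 1, 1],
})
X = df.drop(columns="purchased")  # features
y = df["purchased"]               # target

# Step 2: validation split, done before fitting any transformer so that
# nothing is learned from the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 3: define how each kind of feature is transformed.
numeric_columns = ["age", "income"]
categorical_columns = ["city"]

# Numeric features: fill missing values with the median, then standardize.
numeric_pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
)
# Categorical features: one-hot encode into numeric columns. Categories that
# appear only in the test data are encoded as all zeros instead of raising.
categorical_encoder = OneHotEncoder(handle_unknown="ignore")

# Step 4: combine the transformers into one reusable preprocessing object.
preprocessor = ColumnTransformer([
    ("numeric", numeric_pipeline, numeric_columns),
    ("categorical", categorical_encoder, categorical_columns),
])

# Step 5: fit on the training data only, then apply the same fitted
# transformation to both splits.
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

print(X_train_processed.shape, X_test_processed.shape)
```

Fitting the preprocessor on the training split alone, and only calling transform on the test split, keeps information from the test rows out of the learned scaling and encoding. The same fitted object can later be reused on new data.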
