CodeBuzz Helpers

Data Preprocessing Overview

Once a data set has been cleaned and explored enough for the task at hand, it must be processed into a form that a machine learning model can use. This involves the following steps:

  1. Import the machine learning libraries and load the data set.
  2. Perform a validation split (train/test split).
  3. Transform features so that they are standardized and stored as numerical data types.
  4. Use ETL-style pipelines to build a preprocessing object that can be reused.
  5. Finally, use the preprocessor to transform the data into the required format.

After completing these steps, the data will be ready to plug into a machine learning model!
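
As a rough illustration, the five steps above might look like the sketch below. It assumes pandas and scikit-learn, which this overview does not name, and uses a small made-up data set; the column names (`age`, `income`, `city`, `purchased`) are hypothetical stand-ins for a real file.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Step 1: load the data set (a hypothetical toy frame instead of a real file).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 60],
    "income": [40000, 52000, 81000, 90000, 61000, 45000, 75000, 98000],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "purchased": [0, 0, 1, 1, 0, 0, 1, 1],
})
X = df.drop(columns="purchased")
y = df["purchased"]

# Step 2: split before fitting any transformer, so the test rows
# do not influence the statistics learned from the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 3: describe how each kind of feature should be transformed.
numeric_features = ["age", "income"]
categorical_features = ["city"]

numeric_pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
categorical_encoder = OneHotEncoder(handle_unknown="ignore")

# Step 4: combine the pieces into one reusable preprocessing object.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_pipeline, numeric_features),
        ("cat", categorical_encoder, categorical_features),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Step 5: fit on the training data only, then apply to both splits.
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)
```

Fitting on the training split and only calling `transform` on the test split is a deliberate choice here: it keeps the scaling statistics and category list from leaking information out of the held-out rows.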