SK-Legos

Utilities to do common ML tasks

You can find:

1. train_test_split, which also resets the dataframes’ indexes
2. MakeFrame
3. ImputeMissingValues
4. Cat2Num
5. Other scikit-lego blocks that I use a lot
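The index-resetting split in the first item can be sketched roughly as follows. This is only an illustration built on scikit-learn's `train_test_split` and pandas' `reset_index`; the helper name `split_and_reset` is hypothetical and may not match how SK-Legos implements it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_reset(df, **kwargs):
    """Split a dataframe and renumber each part's index from 0 (hypothetical helper)."""
    train, test = train_test_split(df, **kwargs)
    # drop=True discards the shuffled original index instead of keeping it as a column
    return train.reset_index(drop=True), test.reset_index(drop=True)


df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
train, test = split_and_reset(df, test_size=3, random_state=0)
```

Without the reset, each part keeps the shuffled row labels of the original dataframe, which tends to cause misaligned rows when the parts are later combined with freshly built frames.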


MakeFrame

 MakeFrame (column_names)

Convert sklearn’s output to a pandas DataFrame. Especially useful when working with an ensemble of models.

Usage

Call MakeFrame as the last component in your pipeline with the desired column names.

pipeline = Pipeline([
    ...,
    ('output', MakeFrame(['outlier', 'class'])),
])
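To make the idea concrete, here is a minimal sketch of a transformer that behaves the way MakeFrame is described. The class `MakeFrameSketch` is a hypothetical stand-in written for this example, not the library's code, and the `StandardScaler` step and column names are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MakeFrameSketch(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for MakeFrame: wrap array output in a DataFrame."""

    def __init__(self, column_names):
        self.column_names = column_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return pd.DataFrame(np.asarray(X), columns=self.column_names)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
pipeline = Pipeline([
    ("scale", StandardScaler()),              # returns a plain NumPy array
    ("output", MakeFrameSketch(["a", "b"])),  # restores named columns
])
frame = pipeline.fit_transform(X)
```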

ImputeMissingValues

 ImputeMissingValues (num_mode=mean, cat_mode='MISSING')

*DataFrame input, DataFrame output.*

During fit:

1. Store the imputable value for each column.

During transform:

2. Impute missing values with the stored value.
3. Create a '{col}_na' boolean column that tells whether each cell contained a missing value.
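The three steps above can be sketched in plain pandas. `ImputeSketch` is a hypothetical re-creation for illustration. It assumes that numeric columns are filled with `num_mode` applied to the non-missing values, that all other columns are filled with `cat_mode`, and that a `_na` flag is added for every column; the real class may differ on each of these points.

```python
import numpy as np
import pandas as pd


class ImputeSketch:
    """Hypothetical re-creation of the fit/transform steps described above."""

    def __init__(self, num_mode=np.mean, cat_mode="MISSING"):
        self.num_mode = num_mode
        self.cat_mode = cat_mode

    def fit(self, df, y=None):
        # Step 1: store one fill value per column
        self.fill_values_ = {}
        for col in df.columns:
            if pd.api.types.is_numeric_dtype(df[col]):
                self.fill_values_[col] = self.num_mode(df[col].dropna())
            else:
                self.fill_values_[col] = self.cat_mode
        return self

    def transform(self, df):
        out = df.copy()
        for col, value in self.fill_values_.items():
            missing = out[col].isna()
            out[col] = out[col].fillna(value)  # Step 2: impute
            out[f"{col}_na"] = missing         # Step 3: flag imputed cells
        return out


df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["Pune", None, "Delhi"]})
imputed = ImputeSketch().fit(df).transform(df)
```

Because the fill values are learned in `fit`, the same values are reused when transforming a held-out set, so no statistics leak from test data into the imputation.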


LambdaTransformer

 LambdaTransformer (fn)

*Base class for all estimators in scikit-learn.

Inheriting from this class provides default implementations of:

  • setting and getting parameters used by GridSearchCV and friends;
  • textual and HTML representation displayed in terminals and IDEs;
  • estimator serialization;
  • parameters validation;
  • data validation;
  • feature names validation.

Read more in the scikit-learn User Guide (rolling_your_own_estimator).*
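Only the signature `LambdaTransformer (fn)` is documented here. Assuming it applies `fn` to its input during transform, scikit-learn's built-in `FunctionTransformer` behaves in a comparable way, and the sketch below uses it purely as an analogy. The column name `amount` and the log transform are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def log_amount(df):
    """Return a copy of df with the 'amount' column replaced by log(1 + amount)."""
    return df.assign(amount=np.log1p(df["amount"]))


# FunctionTransformer is scikit-learn's own class, used here only as an analogy
# for what a transformer constructed from a single function might do.
step = FunctionTransformer(log_amount)

df = pd.DataFrame({"amount": [0.0, 9.0, 99.0]})
result = step.fit_transform(df)
```

A named function is used instead of a lambda so that the step can still be pickled along with the rest of a pipeline.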


Cat2Num

 Cat2Num ()


SplitDateColumn

 SplitDateColumn (column_names, has_date, has_time, date_format=None)
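The signature suggests that each listed column is parsed as a datetime and expanded into separate parts. The function below is a speculative sketch of that idea, not the library's implementation: the output column names (`_year`, `_month`, `_day`, `_hour`, `_minute`) and the choice to drop the source column are assumptions.

```python
import pandas as pd


def split_date_column(df, column_names, has_date, has_time, date_format=None):
    """Hypothetical sketch: expand datetime columns into numeric parts."""
    out = df.copy()
    for col in column_names:
        # date_format=None lets pandas infer the format
        parsed = pd.to_datetime(out[col], format=date_format)
        if has_date:
            out[f"{col}_year"] = parsed.dt.year
            out[f"{col}_month"] = parsed.dt.month
            out[f"{col}_day"] = parsed.dt.day
        if has_time:
            out[f"{col}_hour"] = parsed.dt.hour
            out[f"{col}_minute"] = parsed.dt.minute
        out = out.drop(columns=col)  # assumption: the raw column is not kept
    return out


df = pd.DataFrame({"created": ["2024-01-15 08:30", "2023-12-31 23:59"]})
parts = split_date_column(
    df, ["created"], has_date=True, has_time=True, date_format="%Y-%m-%d %H:%M"
)
```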
