SK-Legos

Utilities to do common ML tasks

This package includes:

1. train_test_split, which also resets the dataframes’ indexes
2. MakeFrame
3. ImputeMissingValues
4. Cat2Num
5. Other scikit-lego blocks that I use a lot
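As a rough illustration of an index-resetting split, here is a minimal sketch built on scikit-learn's own `train_test_split`. The helper name `split_and_reset` is hypothetical and the package's actual implementation may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_reset(df, **kwargs):
    """Hypothetical helper: split a DataFrame, then renumber each part from 0."""
    train, test = train_test_split(df, **kwargs)
    # drop=True discards the old (shuffled) index instead of keeping it as a column.
    return train.reset_index(drop=True), test.reset_index(drop=True)


df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
train, test = split_and_reset(df, test_size=3, random_state=0)
```

Without the reset, each part keeps the shuffled row labels of the original frame, which can misalign rows when the parts are later combined with arrays by position.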

MakeFrame

 MakeFrame (column_names)

Converts sklearn’s output to a pandas DataFrame. Especially useful when working with an ensemble of models.

Usage

Call MakeFrame as the last component in your pipeline with the desired column names.

pipeline = Pipeline([
    ...,
    ('output', MakeFrame(['outlier', 'class'])),
])

Refer to this notebook for an example
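To make the idea concrete, the following is a minimal sketch of a transformer that wraps array output in a DataFrame. `MakeFrameSketch` is a hypothetical stand-in written for illustration, not the package's code, and the `StandardScaler` step is only a placeholder for whatever precedes it in a real pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MakeFrameSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for MakeFrame: wrap a 2-D array in a DataFrame."""

    def __init__(self, column_names):
        self.column_names = column_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        # Label the array's columns with the names given at construction time.
        return pd.DataFrame(np.asarray(X), columns=self.column_names)


pipeline = Pipeline([
    ("scale", StandardScaler()),  # placeholder upstream step
    ("output", MakeFrameSketch(["a", "b"])),
])
out = pipeline.fit_transform(np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
```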

ImputeMissingValues

 ImputeMissingValues (num_mode=<function mean>, cat_mode='MISSING')

DataFrame input, DataFrame output.

During fit:

1. Store the imputable value for each column

During transform:

2. Impute missing values with the imputable value
3. Create a ’{col}_na’ boolean column that tells whether each cell contained a missing value
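The fit and transform steps above can be sketched as follows. This is an illustrative stand-in rather than the package's implementation: the class name `ImputeSketch` is hypothetical, and it assumes that `num_mode` is a function applied to each numeric column, that `cat_mode` is a constant used for every other column, and that a `{col}_na` flag is added for every column.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ImputeSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for ImputeMissingValues, not the library code."""

    def __init__(self, num_mode=np.mean, cat_mode="MISSING"):
        self.num_mode = num_mode
        self.cat_mode = cat_mode

    def fit(self, X, y=None):
        # Step 1: store one fill value per column.
        self.fill_values_ = {}
        for col in X.columns:
            if pd.api.types.is_numeric_dtype(X[col]):
                self.fill_values_[col] = self.num_mode(X[col].dropna())
            else:
                self.fill_values_[col] = self.cat_mode
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's frame untouched
        for col, value in self.fill_values_.items():
            # Step 3: record where values were missing before filling them.
            X[f"{col}_na"] = X[col].isna()
            # Step 2: fill the gaps with the value stored during fit.
            X[col] = X[col].fillna(value)
        return X


df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["Pune", None, "Delhi"]})
out = ImputeSketch().fit_transform(df)
```

Storing the fill values during fit means the same values are reused on validation or test data, so nothing leaks from those sets into the imputation.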

LambdaTransformer

 LambdaTransformer (fn)

Base class for all estimators in scikit-learn.

Inheriting from this class provides default implementations of:

setting and getting parameters used by GridSearchCV and friends;
textual and HTML representation displayed in terminals and IDEs;
estimator serialization;
parameters validation;
data validation;
feature names validation.

Read more in the scikit-learn User Guide (section rolling_your_own_estimator).
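The signature suggests a transformer that wraps an arbitrary function. Below is a minimal sketch under that assumption; `LambdaSketch` is a hypothetical name and the real class may behave differently, for example during fit.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class LambdaSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for LambdaTransformer: apply fn during transform."""

    def __init__(self, fn):
        self.fn = fn

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return self.fn(X)


df = pd.DataFrame({"price": [1.0, 2.0]})
doubled = LambdaSketch(lambda frame: frame * 2).fit_transform(df)
```

scikit-learn ships a comparable built-in, `sklearn.preprocessing.FunctionTransformer`, which may be enough when no custom behavior is needed.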

Cat2Num

 Cat2Num ()

SplitDateColumn

 SplitDateColumn (column_names, has_date, has_time, date_format=None)
