SK-Legos
Utilities for common ML tasks.

You can find:

1. train_test_split, which also resets the DataFrames’ indexes
2. MakeFrame
3. ImputeMissingValues
4. Cat2Num
5. Other scikit-lego blocks that I use a lot
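A minimal sketch of what an index-resetting split could look like. It assumes the helper wraps scikit-learn's train_test_split and calls reset_index(drop=True) on each returned part. The name train_test_split_reset is hypothetical and is not necessarily this package's API.

```python
import pandas as pd
from sklearn.model_selection import train_test_split as sk_train_test_split


def train_test_split_reset(X, y, **kwargs):
    """Hypothetical helper: split X and y, then reset every index to 0..n-1."""
    X_train, X_test, y_train, y_test = sk_train_test_split(X, y, **kwargs)
    # drop=True discards the shuffled original index instead of
    # keeping it as an extra column
    return tuple(
        part.reset_index(drop=True) for part in (X_train, X_test, y_train, y_test)
    )


X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series(range(10), name="target")
X_train, X_test, y_train, y_test = train_test_split_reset(
    X, y, test_size=0.3, random_state=0
)
print(X_train.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```

Because X and y are split together before resetting, rows stay aligned: X_train.iloc[i] still corresponds to y_train.iloc[i].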
MakeFrame
MakeFrame (column_names)
Convert scikit-learn’s output to a pandas DataFrame. This is especially useful when working with an ensemble of models.
Usage
Call MakeFrame as the last component in your pipeline with the desired column names.
    pipeline = Pipeline([
        ...,
        ('output', MakeFrame(['outlier', 'class'])),
    ])
- Refer to this notebook for an example
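MakeFrame's internals are not shown here. As a rough sketch, assuming it simply wraps the incoming array in a DataFrame with the given column names, a stand-in transformer could look like the following. The class name MakeFrameSketch and the StandardScaler step are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MakeFrameSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for MakeFrame: wrap an array in a DataFrame."""

    def __init__(self, column_names):
        self.column_names = column_names

    def fit(self, X, y=None):
        # Nothing to learn; the column names are fixed at construction time
        return self

    def transform(self, X):
        # Label each output column with the user-supplied name
        return pd.DataFrame(np.asarray(X), columns=self.column_names)


pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("output", MakeFrameSketch(["x1", "x2"])),
])
out = pipeline.fit_transform(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(out.columns.tolist())  # ['x1', 'x2']
```

Placing the step last means every upstream step can keep working on plain arrays, and only the final output gets named columns.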
ImputeMissingValues
ImputeMissingValues (num_mode=<function mean at 0x121057570>, cat_mode='MISSING')
*DataFrame input, DataFrame output.*

During fit:

1. Store an imputation value for each column.

During transform:

2. Impute missing values with the stored imputation value.
3. Create a '{col}_na' boolean column that flags which cells originally contained a missing value.
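The fit/transform steps above can be sketched as a small transformer. This is an illustration, not the package's code. It assumes num_mode is applied to numeric columns (defaulting to np.mean, which the signature's `<function mean ...>` suggests but does not confirm), that cat_mode is used as the fill value for all other columns, and that a '{col}_na' flag is added for every column seen during fit. The class name ImputeMissingValuesSketch is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ImputeMissingValuesSketch(BaseEstimator, TransformerMixin):
    """Hypothetical re-implementation of the steps described above."""

    def __init__(self, num_mode=np.mean, cat_mode="MISSING"):
        self.num_mode = num_mode
        self.cat_mode = cat_mode

    def fit(self, X, y=None):
        # Step 1: store one imputation value per column
        self.fill_values_ = {}
        for col in X.columns:
            if pd.api.types.is_numeric_dtype(X[col]):
                self.fill_values_[col] = self.num_mode(X[col].dropna())
            else:
                self.fill_values_[col] = self.cat_mode
        return self

    def transform(self, X):
        X = X.copy()  # never modify the caller's DataFrame
        for col, value in self.fill_values_.items():
            missing = X[col].isna()  # record gaps before they are filled
            # Step 2: fill the gaps with the value learned during fit
            X[col] = X[col].fillna(value)
            # Step 3: keep a boolean flag of which cells were missing
            X[f"{col}_na"] = missing
        return X


df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["a", None, "b"]})
out = ImputeMissingValuesSketch().fit_transform(df)
print(out)
```

Learning the fill values in fit and reusing them in transform means a test set is imputed with training statistics, which avoids leaking information from the test data.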
LambdaTransformer
LambdaTransformer (fn)
*Base class for all estimators in scikit-learn.

Inheriting from this class provides default implementations of:

- setting and getting parameters used by GridSearchCV and friends;
- textual and HTML representation displayed in terminals and IDEs;
- estimator serialization;
- parameters validation;
- data validation;
- feature names validation.

Read more in the scikit-learn User Guide section on rolling your own estimator.*
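The entry above only shows the signature LambdaTransformer (fn). A plausible reading, stated here as an assumption, is a stateless transformer that applies fn to X at transform time. scikit-learn's own FunctionTransformer(func=...) covers the same idea. The sketch below uses the hypothetical name LambdaTransformerSketch.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LambdaTransformerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical: apply a user-supplied function to X at transform time."""

    def __init__(self, fn):
        self.fn = fn

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data
        return self

    def transform(self, X):
        return self.fn(X)


doubler = LambdaTransformerSketch(lambda X: X * 2)
result = doubler.fit_transform(np.array([0, 1, 2]))
print(result)  # [0 2 4]
```

Storing fn as a constructor parameter lets get_params and clone see it, so the step can be tuned or copied like any other pipeline component.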
Cat2Num
Cat2Num ()
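Only the signature Cat2Num () is documented. Judging from the name, it converts categorical columns to numbers; the sketch below assumes ordinal integer codes learned during fit, with -1 for categories (or missing values) not seen during fit. Both the encoding scheme and the class name Cat2NumSketch are assumptions, not the package's confirmed behavior.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class Cat2NumSketch(BaseEstimator, TransformerMixin):
    """Hypothetical: replace each non-numeric column with integer category codes."""

    def fit(self, X, y=None):
        # Learn the categories of each non-numeric column, in order of appearance
        self.categories_ = {
            col: pd.Index(X[col].dropna().unique())
            for col in X.columns
            if not pd.api.types.is_numeric_dtype(X[col])
        }
        return self

    def transform(self, X):
        X = X.copy()
        for col, cats in self.categories_.items():
            # get_indexer returns -1 for values not seen during fit (and for NaN)
            X[col] = cats.get_indexer(X[col])
        return X


train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
enc = Cat2NumSketch().fit(train)
print(enc.transform(train)["color"].tolist())  # [0, 1, 0]
```

Mapping unseen categories to -1 keeps transform from failing on new data, at the cost of treating all unknown values as one code.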
SplitDateColumn
SplitDateColumn (column_names, has_date, has_time, date_format=None)
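Only the signature SplitDateColumn (column_names, has_date, has_time, date_format=None) is documented. The parameter names suggest that each listed column is parsed as a date/time and split into components. The sketch below assumes it parses with pandas.to_datetime (date_format=None lets pandas infer the format), replaces each original column with year/month/day columns when has_date is true and hour/minute columns when has_time is true, and uses the hypothetical name SplitDateColumnSketch. The output column names are also assumptions.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SplitDateColumnSketch(BaseEstimator, TransformerMixin):
    """Hypothetical: expand datetime-like columns into numeric component columns."""

    def __init__(self, column_names, has_date, has_time, date_format=None):
        self.column_names = column_names
        self.has_date = has_date
        self.has_time = has_time
        self.date_format = date_format

    def fit(self, X, y=None):
        # Stateless: parsing rules are fixed by the constructor parameters
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.column_names:
            # pop removes the original column and returns it for parsing
            parsed = pd.to_datetime(X.pop(col), format=self.date_format)
            if self.has_date:
                X[f"{col}_year"] = parsed.dt.year
                X[f"{col}_month"] = parsed.dt.month
                X[f"{col}_day"] = parsed.dt.day
            if self.has_time:
                X[f"{col}_hour"] = parsed.dt.hour
                X[f"{col}_minute"] = parsed.dt.minute
        return X


df = pd.DataFrame({"ts": ["2021-03-04 05:06", "2022-12-31 23:59"]})
splitter = SplitDateColumnSketch(
    ["ts"], has_date=True, has_time=True, date_format="%Y-%m-%d %H:%M"
)
out = splitter.fit_transform(df)
print(out.columns.tolist())
```

Passing an explicit date_format avoids format inference, which is faster and fails loudly on unexpected inputs.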