1. What is ColumnTransformer?
The scikit-learn library provides a class called 'ColumnTransformer'. It applies transformers to columns of an array or pandas DataFrame.
This estimator allows different columns or column subsets of the input to be transformed separately; the features generated by each transformer are then concatenated to form a single feature space. This is useful for heterogeneous or columnar data, where several feature extraction mechanisms or transformations need to be combined into a single transformer.
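To make the "transform separately, then concatenate" idea concrete, here is a minimal sketch. The toy DataFrame and the step names ("scale", "onehot") are made up for illustration, and sparse_threshold=0 is set only so that the result comes back as a dense array that is easy to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: one numerical column and one categorical column
df = pd.DataFrame({
    "age": [20, 30, 40],
    "city": ["Seoul", "Busan", "Seoul"],
})

ct = ColumnTransformer(
    transformers=[
        ("scale", MinMaxScaler(), ["age"]),     # 1 output column
        ("onehot", OneHotEncoder(), ["city"]),  # 2 output columns (Busan, Seoul)
    ],
    sparse_threshold=0,  # always return a dense array
)

# Each transformer sees only its own columns; outputs are stacked side by side
features = ct.fit_transform(df)
print(features.shape)  # (3, 3)
```

The first output column is the scaled "age", followed by the one-hot columns for "city", in the order the transformers were listed.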
2. How to use ColumnTransformer?
ColumnTransformer
- sklearn.compose.ColumnTransformer(transformers, *, remainder='drop', ...)
- Parameters
- transformers : list of (name, transformer, columns) tuples
- name : Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.
- transformer : Estimator that must support fit and transform (e.g. MinMaxScaler, StandardScaler, OneHotEncoder, ...). The special strings 'drop' and 'passthrough' are also accepted.
- columns : Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name.
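The parameters above can be sketched on a small NumPy array. This is only an illustration under assumed values: the array X and the step name "std" are hypothetical. It shows positional column selection, remainder='passthrough' (instead of the default 'drop'), and how the name lets you reach the inner transformer's parameters through set_params.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical array with three numerical columns
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0]])

ct = ColumnTransformer(
    transformers=[("std", StandardScaler(), [0])],  # integer = positional column
    remainder="passthrough",  # keep the other columns instead of dropping them
)

# Column 0 is standardized; columns 1 and 2 are appended unchanged
out = ct.fit_transform(X)

# The name "std" gives access to the scaler's parameters: <name>__<parameter>
ct.set_params(std__with_mean=False)
out_no_center = ct.fit_transform(X)
```

The same <name>__<parameter> keys are what a grid search would use in its parameter grid.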
# Select categorical columns with relatively low cardinality (convenient but arbitrary)
categorical_cols = [cname for cname in X_train_full.columns if X_train_full[cname].nunique() < 10 and X_train_full[cname].dtype == 'object']
# Select numerical columns
numerical_cols = [cname for cname in X_train_full.columns if X_train_full[cname].dtype in ['int64', 'float64']]
# Import libraries
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
# Preprocessing for numerical data
numerical_transformer = SimpleImputer(strategy = 'constant')
# Preprocessing for categorical data
categorical_transformer = Pipeline(steps = [('imputer', SimpleImputer(strategy = 'most_frequent')),
                                            ('onehot', OneHotEncoder(handle_unknown = 'ignore'))])
# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers = [('num', numerical_transformer, numerical_cols),
                    ('cat', categorical_transformer, categorical_cols)])
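The snippet above only defines the preprocessor, and X_train_full is not shown. As a rough, self-contained sketch of what fitting it looks like, the example below rebuilds the same preprocessor on a tiny made-up DataFrame (X_toy, X_new and their columns "rooms" and "type" are hypothetical stand-ins, not the original data).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for X_train_full, with one missing value per column
X_toy = pd.DataFrame({
    "rooms": [3.0, np.nan, 2.0, 4.0],
    "type": pd.Series(["house", "flat", np.nan, "flat"], dtype=object),
})

numerical_transformer = SimpleImputer(strategy="constant")  # fills NaN with 0
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["rooms"]),
    ("cat", categorical_transformer, ["type"]),
])

# Fit on the toy data; the result may be sparse, so convert it for inspection
transformed = preprocessor.fit_transform(X_toy)
dense = transformed.toarray() if hasattr(transformed, "toarray") else transformed

# A category never seen during fit is encoded as all zeros (handle_unknown='ignore')
X_new = pd.DataFrame({"rooms": [5.0], "type": pd.Series(["cabin"], dtype=object)})
new_row = preprocessor.transform(X_new)
new_dense = new_row.toarray() if hasattr(new_row, "toarray") else new_row
```

Here dense has three columns: the imputed "rooms" values followed by the one-hot columns for "flat" and "house".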
Source: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html