[Sklearn] Hyperparameter Tuning using Grid Search

1. What is Hyperparameter Tuning?

A hyperparameter is a setting that is external to the model and whose value cannot be estimated from the data. Its value has to be set before the learning process begins. Examples include C in Support Vector Machines, k in K-Nearest Neighbors, and the number of hidden layers in neural networks.
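To make the distinction concrete, here is a minimal sketch. It assumes scikit-learn's bundled iris dataset, which the post itself does not use, and the values of C and n_neighbors are arbitrary illustrations. Hyperparameters are passed to the constructor before fitting, while attributes ending in an underscore, such as support_vectors_, are learned from the data.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: the iris dataset stands in for whatever data you are modelling
X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by hand and passed to the constructor before fitting
svm = SVC(C=1.0)
knn = KNeighborsClassifier(n_neighbors=5)

# get_params() exposes the hyperparameters of any scikit-learn estimator
print(svm.get_params()["C"])
print(knn.get_params()["n_neighbors"])

# Model parameters: estimated from the data during fit()
svm.fit(X, y)
print(svm.support_vectors_.shape)
```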

 

Grid search is an exhaustive way to find the hyperparameters that give the model its best score: every combination of the specified candidate values is tried and evaluated.
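The idea can be sketched by hand before reaching for a helper. The example below is only an illustration: the iris dataset, the KNeighborsClassifier estimator, and the candidate values are all assumptions chosen for brevity. ParameterGrid enumerates every combination, and each one is scored with 5-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris and KNN are placeholders for your own data and model
X, y = load_iris(return_X_y=True)

# Illustrative grid: 3 x 2 = 6 candidate combinations
grid = ParameterGrid({"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]})

best_score, best_params = -1.0, None
for params in grid:
    # Mean accuracy over 5 cross-validation folds for this combination
    score = cross_val_score(KNeighborsClassifier(**params), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

print(len(grid), best_params, round(best_score, 3))
```

The cost grows multiplicatively with each added hyperparameter, which is why the number of candidate values per hyperparameter is usually kept small.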

 

2. How to Do a Grid Search?

The scikit-learn library provides a class called GridSearchCV. It performs an exhaustive search over specified parameter values for an estimator. GridSearchCV implements the "fit" and "score" methods. It also implements "score_samples", "predict", and "predict_proba" if they are implemented in the estimator used.

 

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
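As a rough illustration of how these methods behave after fitting, the sketch below assumes the iris dataset, an SVC estimator, and three arbitrary values of C (none of which come from the post). With the default refit=True, predict and score on the search object are forwarded to the best estimator found.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: iris and SVC are placeholders for your own data and model
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# With refit=True (the default), predict and score delegate to best_estimator_
predictions = search.predict(X_test)
same = (predictions == search.best_estimator_.predict(X_test)).all()
print(same, search.score(X_test, y_test))
```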

 

GridSearchCV

  • sklearn.model_selection.GridSearchCV(estimator, param_grid, *, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)
  • Parameters
    • estimator : The estimator to tune; it is assumed to implement the scikit-learn estimator interface
    • param_grid : Dictionary with parameter names as keys and lists of parameter settings to try as values
    • n_jobs : Number of jobs to run in parallel; -1 uses all processors
    • cv : Determines the cross-validation splitting strategy; None uses the default 5-fold cross-validation
  • Attributes
    • cv_results_ : A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame
    • best_estimator_ : Estimator that was chosen by the search, i.e. the one that gave the best score on the held-out data (not available if refit=False)
    • best_score_ : Mean cross-validated score of best_estimator_
    • best_params_ : Parameter setting that gave the best results on the held-out data
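The attributes above can be inspected as in the following sketch. It assumes pandas is installed and uses the iris dataset with a DecisionTreeClassifier and three arbitrary max_depth values purely for illustration; none of these choices come from the post.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris and a decision tree are placeholders for your own setup
X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 3, 4]},
    cv=5,
)
search.fit(X, y)

# One row per parameter combination, sorted so the best candidate comes first
results = pd.DataFrame(search.cv_results_).sort_values("rank_test_score")
print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]])

# best_score_ is the mean_test_score of the candidate at best_index_
print(search.best_params_, search.best_score_)
```

Looking at std_test_score alongside mean_test_score helps judge whether the winning combination is clearly better or only marginally ahead of its neighbours.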

 

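from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumption: the post does not show how `forest` was built; a random forest classifier is used here.
# X_train and y_train are assumed to be defined already, with at least 18 features.
forest = RandomForestClassifier(random_state=42)

# 6 depths x 15 feature counts = 90 combinations, each evaluated with 5-fold CV
forest_params = {"max_depth": range(6, 12), "max_features": range(4, 19)}
forest_grid = GridSearchCV(forest, forest_params, cv=5, n_jobs=-1, verbose=True)

forest_grid.fit(X_train, y_train)
forest_grid.best_params_, forest_grid.best_score_
# Result reported in the original post: ({'max_depth': 9, 'max_features': 6}, 0.951)

# With refit=True (the default), best_estimator_ is retrained on all of X_train
best_accuracy_model = forest_grid.best_estimator_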

 

 
