1. What is Scaling?
Scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the preprocessing step.
Because the values of raw features can span very different ranges, the objective functions of some machine learning algorithms do not work properly without normalization. Another reason to apply feature scaling is that gradient descent typically converges much faster with it than without it.
2. Approaches to Scaling
Rescaling (Min-Max Scaler)
Rescaling is the simplest method: it maps the values of each feature onto a fixed range, typically [0, 1] or [-1, 1]. The general formula for rescaling to [0, 1] is:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
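The formula above can be sketched directly in NumPy. This is only an illustration on made-up numbers: the helper name `min_max_scale` and its `low`/`high` parameters are hypothetical and not part of scikit-learn, and the handling of a constant feature is an assumption of this sketch.

```python
import numpy as np

def min_max_scale(x, low=0.0, high=1.0):
    """Hypothetical helper: linearly rescale a 1-D array to [low, high]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Constant feature: the formula would divide by zero,
        # so this sketch maps every value to `low` (an assumption).
        return np.full_like(x, low)
    unit = (x - x.min()) / span        # the [0, 1] formula
    return unit * (high - low) + low   # stretch and shift to [low, high]

x = np.array([10.0, 20.0, 30.0, 50.0])
scaled = min_max_scale(x)              # values: 0, 0.25, 0.5, 1
scaled_sym = min_max_scale(x, -1, 1)   # values: -1, -0.5, 0, 1
print(scaled, scaled_sym)
```

The smallest value always maps to the lower bound and the largest to the upper bound, so a single extreme outlier can squeeze all other values into a narrow band.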
Mean Normalization
Mean normalization centers each feature on its mean and then divides by its range:
$$x' = \frac{x-\bar{x}}{\max(x) - \min(x)}$$
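As a small worked example of mean normalization, the sketch below applies the formula to made-up numbers. The helper name `mean_normalize` is hypothetical, and returning zeros for a constant feature is an assumption of this sketch.

```python
import numpy as np

def mean_normalize(x):
    """Hypothetical helper: subtract the mean, then divide by the range."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Constant feature: avoid division by zero (assumption of this sketch).
        return np.zeros_like(x)
    return (x - x.mean()) / span

x = np.array([10.0, 20.0, 30.0, 60.0])
normalized = mean_normalize(x)   # mean is 30, range is 50
print(normalized)                # values: -0.4, -0.2, 0, 0.6
```

The result has mean 0 and a total spread (max minus min) of exactly 1, but unlike min-max rescaling its endpoints are not fixed in advance.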
Standardization
Feature standardization gives each feature in the data zero mean and unit variance. This method is widely used for normalization in many machine learning algorithms. To compute it, determine the mean $\bar{x}$ and standard deviation $\sigma$ of each feature, then subtract the mean from each value and divide by the standard deviation.
$$x' = \frac{x-\bar{x}}{\sigma}$$
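A minimal NumPy sketch of standardization on made-up numbers follows. The helper name `standardize` is hypothetical. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's StandardScaler uses; returning zeros for a constant feature is an assumption of this sketch.

```python
import numpy as np

def standardize(x):
    """Hypothetical helper: subtract the mean and divide by the std."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()  # population standard deviation (ddof=0)
    if sigma == 0:
        # Constant feature: avoid division by zero (assumption of this sketch).
        return np.zeros_like(x)
    return (x - x.mean()) / sigma

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = standardize(x)          # mean is 5, standard deviation is 2
print(z)                    # values: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
print(z.mean(), z.std())    # approximately 0 and 1
```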
3. How to Use the Scalers
MinMaxScaler
The scikit-learn library provides a min-max scaling class called MinMaxScaler. It transforms features by scaling each one to a given range.
- sklearn.preprocessing.MinMaxScaler(feature_range = (0, 1))
- Parameters
  - feature_range : Desired range of transformed data. The default is (0, 1).
- Methods
  - fit(X) : Compute the minimum and maximum to be used for later scaling.
  - fit_transform(X) : Fit to data, then transform it.
  - transform(X) : Scale features of X according to feature_range.
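from sklearn.preprocessing import MinMaxScaler
# X_train and X_test are assumed to be existing feature arrays
scaler = MinMaxScaler()
# Learn each feature's min and max from the training data, then scale it
X_train = scaler.fit_transform(X_train)
# Reuse the training min and max for the test data
X_test = scaler.transform(X_test)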
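To make the role of feature_range and of the fitted minimum and maximum concrete, here is a small sketch on made-up arrays (the data values are assumptions chosen for easy arithmetic). It also shows that test values lying outside the training range end up outside feature_range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: two features on very different scales
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [5.0, 500.0]])
X_test = np.array([[3.0, 200.0],
                   [6.0, 600.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

print(scaler.data_min_, scaler.data_max_)  # per-feature min and max seen in fit
print(X_train_scaled)  # every column spans exactly [-1, 1]
print(X_test_scaled)   # second row is 1.5 in both columns, outside [-1, 1]
```

Because the test row [6, 600] exceeds the training maximum of each feature, it is mapped to 1.5 rather than being clipped to 1.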
StandardScaler
The scikit-learn library provides a standardization class called StandardScaler. It standardizes features by removing the mean and scaling to unit variance.
- sklearn.preprocessing.StandardScaler()
- Methods
  - fit(X) : Compute the mean and standard deviation to be used for later scaling.
  - fit_transform(X) : Fit to data, then transform it.
  - transform(X) : Standardize X by centering and scaling with the fitted mean and standard deviation.
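from sklearn.preprocessing import StandardScaler
# X_train and X_test are assumed to be existing feature arrays
scaler = StandardScaler()
# Learn each feature's mean and standard deviation from the training data
X_train = scaler.fit_transform(X_train)
# Only transform the test data: refitting on it would use different statistics
X_test = scaler.transform(X_test)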
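The sketch below, on made-up one-feature data, illustrates why the test set should go through transform with the statistics learned from the training set rather than through a second fit. The arrays are assumptions chosen so the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: a single feature
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[3.0], [6.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)      # uses the training mean and std

print(scaler.mean_, scaler.scale_)            # mean 3, std sqrt(2)
print(X_test_scaled.ravel())                  # 0 and 3 / sqrt(2)

# For contrast only: refitting on the test set uses the test mean and std,
# so the same raw values land on a different scale than the training data.
X_test_refit = StandardScaler().fit_transform(X_test)
print(X_test_refit.ravel())                   # -1 and 1
```

With the fitted scaler, the raw value 3 maps to 0 in both the training and test sets; after refitting on the test set it maps to -1, so a model trained on the scaled training data would see inconsistent inputs.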
Sources:
- https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02
- https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
- https://www.kaggle.com/code/rtatman/data-cleaning-challenge-scale-and-normalize-data