[TensorFlow] Overfitting and Underfitting
1. Interpreting the Learning Curves
We might think about the information in the training data as being of two kinds: signal and noise. The signal is the part that generalizes, the part that can help our model make predictions from new data. The noise is the part that is only true of the training data; it is all of the random fluctuation that comes from real-world data, and all of the incidental, non-informative patterns that can't actually help the model make predictions. The noise is the part that might look useful but really isn't.
We train a model by choosing weights or parameters that minimize the loss on a training set. To accurately assess a model's performance, however, we need to evaluate it on a new set of data, the validation data.
As we train a model, we plot the loss on the training set epoch by epoch. To this we'll add a plot of the validation loss too. These plots are called the learning curves. To train deep learning models effectively, we need to be able to interpret them.
Now, the training loss will go down either when the model learns signal or when it learns noise. But the validation loss will go down only when the model learns signal. (Whatever noise the model learned from the training set won't generalize to new data.) So, when a model learns signal both curves go down, but when it learns noise a gap opens up between the curves. The size of the gap tells us how much noise the model has learned.
Ideally, we would create models that learn all of the signal and none of the noise. That will practically never happen. Instead we make a trade: we can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us: the cost exceeds the benefit, and the validation loss begins to rise.
This trade-off indicates that there can be two problems when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough signal. Overfitting the training set is when the loss is not as low as it could be because the model learned too much noise. The trick to training deep learning models is finding the best balance between the two.
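In Keras, the learning curves can be drawn from the History object returned by model.fit. The snippet below is a minimal sketch, not part of the original lesson; it assumes a history variable produced by a fit call that included validation_data, and uses pandas for plotting.

import pandas as pd

# Assumes `history` is the object returned by model.fit(..., validation_data=...)
history_df = pd.DataFrame(history.history)

# Plot training and validation loss together to read the learning curves:
# both falling means the model is learning signal; a widening gap means noise.
history_df.loc[:, ['loss', 'val_loss']].plot(title='Learning curves')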
2. Capacity
A model's capacity refers to the size and complexity of the patterns it is able to learn. For neural networks, this is largely determined by how many neurons it has and how they are connected together. If it appears that our network is underfitting the data, we should try increasing its capacity.
We can increase the capacity of a network either by making it wider (more units in existing layers) or by making it deeper (adding more layers). Wider networks have an easier time learning more linear relationships, while deeper networks prefer more non-linear ones.
from tensorflow import keras
from tensorflow.keras import layers

# Baseline model
model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])

# Wider: more units in the hidden layer
wider_model = keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])

# Deeper: an additional hidden layer
deeper_model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
3. Early Stopping
We mentioned that when a model is too eagerly learning noise, the validation loss may start to increase during training. To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called early stopping.
Once we detect that the validation loss is starting to rise again, we can reset the weights back to where the minimum occurred. This ensures that the model won't continue to learn noise and overfit the data.
Training with early stopping also means we're in less danger of stopping the training too early, before the network has finished learning signal. So besides preventing overfitting from training too long, early stopping can also prevent underfitting from not training long enough. Just set the training epochs to some large number, and early stopping will take care of the rest.
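To make the idea concrete, here is a hand-rolled sketch of the logic that the built-in callback automates. It is only an illustration: it assumes a compiled Keras model and the X_train, y_train, X_valid, y_valid arrays used in the next section. In practice we use the EarlyStopping callback shown below instead.

best_val_loss = float('inf')
best_weights = None
wait = 0
patience = 20  # epochs to wait without improvement before stopping

for epoch in range(500):
    # Train for one epoch, then measure the validation loss
    model.fit(X_train, y_train, epochs=1, verbose=0)
    val_loss = model.evaluate(X_valid, y_valid, verbose=0)
    if val_loss < best_val_loss - 0.001:  # improved by at least min_delta
        best_val_loss = val_loss
        best_weights = model.get_weights()  # remember the best weights so far
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            break  # validation loss stopped improving

# Reset the weights back to where the minimum occurred
model.set_weights(best_weights)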
4. Applying Early Stopping
In Keras, we can include early stopping in our training through a callback. A callback is just a function we want to run every so often while the network trains.
from tensorflow import keras
from tensorflow.keras import layers, callbacks
early_stopping = callbacks.EarlyStopping(
    min_delta=0.001,  # minimum amount of change to count as an improvement
    patience=20,  # how many epochs to wait before stopping
    restore_best_weights=True,
)
model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(1),
])

model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=500,
    callbacks=[early_stopping],  # put your callbacks in a list
    verbose=0,  # turn off training log
)
Source: https://www.kaggle.com/learn