Transfer learning with VGG16 architecture

Brayan Florez
5 min read · Sep 26, 2020

Abstract

I used the transfer learning technique with the VGG16 architecture on the CIFAR-10 dataset, which consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. On the test data, I achieved a loss of 0.0579 and an accuracy of 0.9108.

Introduction

Current approaches in deep learning make essential use of machine learning methods. To improve performance and save time, there is a technique called transfer learning, where pre-trained models are used as the starting point for tasks such as image recognition and natural language processing. This saves a lot of training time, since it is possible to freeze layers of architectures that have already been built and tested, and only train the new layers you add on top, as in the sketch below.
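The general pattern looks roughly like this (a minimal sketch in Keras; the input size and dense layer sizes here are only illustrative, not the exact model I describe later):

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load a pre-trained backbone without its classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained convolutional layers

# Add a new classifier head that is trained on the target task.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])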

I was given the following task: “train a convolutional neural network to classify the CIFAR 10 dataset using transfer learning”.

Materials and methods

I was allowed to use any of the Keras applications, so I decided to use the VGG16 architecture, which is described below.

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes.

The 10 classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

The model had to reach a validation accuracy of over 87%, which I achieved thanks to the VGG16 architecture.

Results

Case 1:

Applying batch normalization only once, before the VGG16 architecture, I got the following results.

I froze almost every layer of VGG16, leaving only the last two trainable, so the pre-trained weights are kept and only those two layers are updated during training. The model starts with an UpSampling2D layer, which helped a lot because it enlarges the 32x32 CIFAR-10 images before they reach VGG16, followed by a BatchNormalization layer that normalizes the inputs to roughly zero mean and unit variance. Then I built the fully connected layers with Dropout, which sets the output of each hidden neuron to zero with a probability of 20% to reduce overfitting. A sketch of this architecture is shown below.
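Here is a minimal Keras sketch of this Case 1 setup as I describe it above. The upsampling factor and the hidden dense layer sizes are assumptions for illustration, not necessarily the exact values that produced the logs below:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG16 backbone sized for the upsampled CIFAR-10 images.
base = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))
for layer in base.layers[:-2]:
    layer.trainable = False          # freeze everything except the last two layers

model = models.Sequential([
    layers.UpSampling2D(size=(2, 2), input_shape=(32, 32, 3)),  # 32x32 -> 64x64 (assumed factor)
    layers.BatchNormalization(),     # single batch normalization before the VGG16 backbone
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # assumed hidden sizes for Case 1
    layers.Dropout(0.2),             # drop 20% of activations to reduce overfitting
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),
])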

Epoch 1/30 100/100 [==============================] - 11s 113ms/step - loss: 0.2090 - accuracy: 0.5227 - val_loss: 0.1188 - val_accuracy: 0.7578 
Epoch 2/30 100/100 [==============================] - 11s 111ms/step - loss: 0.1288 - accuracy: 0.7392 - val_loss: 0.1035 - val_accuracy: 0.7867
Epoch 3/30 100/100 [==============================] - 11s 112ms/step - loss: 0.1051 - accuracy: 0.7909 - val_loss: 0.0896 - val_accuracy: 0.8195
Epoch 4/30 100/100 [==============================] - 12s 123ms/step - loss: 0.0939 - accuracy: 0.8169 - val_loss: 0.0796 - val_accuracy: 0.8375
Epoch 5/30 100/100 [==============================] - 11s 114ms/step - loss: 0.0760 - accuracy: 0.8537 - val_loss: 0.0719 - val_accuracy: 0.8633
...
Epoch 30/30 100/100 [==============================] - 11s 114ms/step - loss: 0.0114 - accuracy: 0.9802 - val_loss: 0.0748 - val_accuracy: 0.9133

When evaluating the model on the test data, the results were the following:

79/79 [==============================] - 3s 41ms/step - loss: 0.0736 - accuracy: 0.9089

Case 2:

Applying batch normalization every time, before the VGG16 architecture and before each dense layer, I got the following results.

Also, note that I changed the units of the dense layers to 128 and 64, respectively. A sketch of this variant follows.
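A minimal sketch of the Case 2 model, keeping the same frozen VGG16 backbone and upsampling assumptions as in the Case 1 sketch, with batch normalization before the backbone and before each dense layer, and the dense units reduced to 128 and 64:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))
for layer in base.layers[:-2]:
    layer.trainable = False

model = models.Sequential([
    layers.UpSampling2D(size=(2, 2), input_shape=(32, 32, 3)),
    layers.BatchNormalization(),     # before the VGG16 backbone
    base,
    layers.Flatten(),
    layers.BatchNormalization(),     # before the first dense layer
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.BatchNormalization(),     # before the second dense layer
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),
])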

Epoch 1/30 100/100 [==============================] - 23s 234ms/step - loss: 0.2678 - accuracy: 0.3877 - val_loss: 0.1896 - val_accuracy: 0.6187 
Epoch 2/30 100/100 [==============================] - 23s 228ms/step - loss: 0.1651 - accuracy: 0.6716 - val_loss: 0.1092 - val_accuracy: 0.7828
Epoch 3/30 100/100 [==============================] - 23s 227ms/step - loss: 0.1301 - accuracy: 0.7628 - val_loss: 0.0995 - val_accuracy: 0.8203
Epoch 4/30 100/100 [==============================] - 25s 249ms/step - loss: 0.1156 - accuracy: 0.8004 - val_loss: 0.0904 - val_accuracy: 0.8336
Epoch 5/30 100/100 [==============================] - 23s 227ms/step - loss: 0.0962 - accuracy: 0.8416 - val_loss: 0.0802 - val_accuracy: 0.8570
...
Epoch 30/30 100/100 [==============================] - 23s 227ms/step - loss: 0.0143 - accuracy: 0.9864 - val_loss: 0.0560 - val_accuracy: 0.9109

While training, this model starts a little worse than the previous one, but it improves as the epochs progress.

In the end, the validation accuracy barely improves, but the validation loss improves noticeably.

When evaluating the model on the test data, the results were the following:

79/79 [==============================] - 7s 89ms/step - loss: 0.0579 - accuracy: 0.9108

In both cases, the evaluation runs over 79 batches, since the 10000 test images are evaluated with a batch size of 128 (10000 / 128 rounds up to 79 steps).

In both cases, I realized the learning rate played a big role: as I changed it, all the values of the model (training loss, training accuracy, validation loss, and validation accuracy) changed a lot. The best learning rate I found, after roughly a hundred trials, was 3e-5 with the RMSprop optimizer. I also tried Adam, but RMSprop worked better. A sketch of the training setup is shown below.
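For reference, here is a minimal sketch of how the model from either case could be compiled and trained with these settings. The loss function, pixel scaling, and use of the test set for validation are assumptions for illustration; `model` is the Sequential model from the sketches above:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 and one-hot encode the labels.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# RMSprop with a learning rate of 3e-5 worked best in my trials;
# the loss here is an assumed standard choice.
model.compile(optimizer=RMSprop(learning_rate=3e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))

# 10000 test images / batch size 128 -> 79 evaluation batches, as in the logs above.
model.evaluate(x_test, y_test, batch_size=128)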

Conclusions

Adding batch normalization before every dense layer helps to improve the model a bit. I also tried using more than 3 dense layers, but the results were not as good, so 3 layers are enough to get a model with over 90% validation accuracy, which is not bad.

Literature Cited

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.