Unit III — Neural Networks in Practice
Chapter 11 — Transfer Learning & Fine-tuning
Objectives
Understand transfer learning strategies · Fine-tune MobileNetV2 and VGG16 · Know deployment considerations for production models
1. Transfer Learning Concept
Transfer learning repurposes a model pre-trained on a large source task (e.g., ImageNet) for a different but related target task. The pre-trained model has already learned rich feature representations (edges, textures, shapes, objects) that transfer well.
Two main strategies:
- Feature extraction: freeze all pre-trained layers; replace and train only the classification head. Fast, works well with small target datasets.
- Fine-tuning: unfreeze some (or all) pre-trained layers and train with a low learning rate. Better accuracy when target dataset is larger or domain is different.
When to use which strategy
Small dataset + similar domain → feature extraction only.
Small dataset + different domain → fine-tune top layers only.
Large dataset + similar domain → fine-tune more layers.
Large dataset + different domain → fine-tune all layers (or train from scratch).
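These rules of thumb can be captured in a small lookup helper (purely illustrative — the function name and string labels are ours, not from any library):

```python
def transfer_strategy(dataset_size: str, domain: str) -> str:
    """Map dataset size ('small'/'large') and domain similarity
    ('similar'/'different') to a recommended transfer-learning strategy."""
    rules = {
        ('small', 'similar'): 'feature extraction only',
        ('small', 'different'): 'fine-tune top layers only',
        ('large', 'similar'): 'fine-tune more layers',
        ('large', 'different'): 'fine-tune all layers (or train from scratch)',
    }
    return rules[(dataset_size, domain)]

print(transfer_strategy('small', 'similar'))  # feature extraction only
```

In practice these boundaries are fuzzy; treat the output as a starting point and validate against held-out data.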
2. MobileNetV2 Workflow
MobileNetV2 uses inverted residuals + depthwise separable convolutions. Very efficient (3.4M params) — designed for mobile/edge deployment.
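The efficiency of depthwise separable convolutions is easy to see with a back-of-the-envelope parameter count (the layer shapes below are illustrative, not taken from the actual MobileNetV2 configuration; biases are ignored):

```python
def standard_conv_params(k, c_in, c_out):
    # A standard conv learns one k×k×c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise: one k×k filter per input channel;
    # pointwise: a 1×1 conv that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 32, 64)        # 18432
sep = depthwise_separable_params(3, 32, 64)  # 2336
print(std, sep, round(std / sep, 1))         # roughly an 8x reduction
```

Savings of this order at every layer are what bring MobileNetV2 down to ~3.4M parameters.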
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import Adam

base = MobileNetV2(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))
base.trainable = False  # freeze base

x = base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.3)(x)
out = layers.Dense(num_classes, activation='softmax')(x)
model = Model(base.input, out)
model.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, epochs=10, validation_data=val_ds)

# Fine-tuning: unfreeze the last 20 layers, keep the rest frozen
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False
model.compile(Adam(1e-5), 'sparse_categorical_crossentropy',
              metrics=['accuracy'])  # re-compile with a low learning rate
model.fit(train_ds, epochs=5, validation_data=val_ds)
3. VGG16 Workflow
VGG16 is a heavier architecture (138M params) with 13 conv + 3 FC layers, all using 3×3 kernels. Good baseline; widely used in research for feature extraction.
from tensorflow.keras.applications import VGG16
base = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))
base.trainable = False
# Note: VGG16 does not use BatchNorm — add it in the head
x = layers.Flatten()(base.output)
x = layers.Dense(512, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(num_classes, activation='softmax')(x)
model = Model(base.input, out)
VGG16 is large (528 MB) — prefer MobileNetV2 for deployment; use VGG16 as a feature extractor for research/benchmarking.
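One reason the VGG16 head above is heavy: with a 224×224 input, the base's final feature map is 7×7×512, so Flatten feeds 25,088 values into the first Dense layer. A quick parameter count (pure arithmetic, ignoring the BatchNorm layer) compares it with a GlobalAveragePooling head of the same width:

```python
# VGG16 (include_top=False, 224×224 input) outputs a 7×7×512 feature map
h, w, c = 7, 7, 512
units = 512

# Flatten -> Dense(512): every one of the 7*7*512 activations gets a weight
flatten_head = (h * w * c) * units + units  # weights + biases
# GlobalAveragePooling2D -> Dense(512): only 512 pooled values go in
gap_head = c * units + units

print(flatten_head)  # 12845568
print(gap_head)      # 262656
```

A GAP head is roughly 49× smaller here, which is one reason the MobileNetV2 example in Section 2 uses GlobalAveragePooling2D.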
4. Deployment Considerations
- Model size: MobileNetV2 (14 MB) vs VGG16 (528 MB) — matters for mobile/edge.
- Quantisation: convert float32 → int8 weights (TensorFlow Lite) — 4× smaller, ~2× faster inference.
- Pruning: remove near-zero weights → sparse model, compressible.
- TFLite / ONNX: cross-platform inference formats; TFLite for Android/iOS; ONNX for cross-framework.
- Preprocessing pipeline: must match training: same normalization, resize, color format (RGB vs BGR for VGG).
# Save for deployment
import tensorflow as tf

model.save('my_model.h5')  # Keras HDF5 (legacy single-file format)
model.save('my_model')     # TensorFlow SavedModel directory

# Convert to TFLite (optionally with post-training quantisation)
converter = tf.lite.TFLiteConverter.from_saved_model('my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantisation
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
Worked Example — Dog vs Cat with MobileNetV2
import tensorflow as tf

# Data pipeline (augmentation can be added via tf.keras.layers.Random* layers)
train_ds = tf.keras.utils.image_dataset_from_directory(
    'data/train', image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    'data/val', image_size=(224, 224), batch_size=32)

# Normalise to [-1,1] as MobileNetV2 expects
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y)).prefetch(2)
val_ds = val_ds.map(lambda x, y: (preprocess(x), y)).prefetch(2)

# ... build model as above, compile, fit
# Typically achieves >97% val accuracy on dogs-vs-cats in 10 epochs
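The normalisation step above can be sanity-checked without TensorFlow: mobilenet_v2.preprocess_input applies the Keras 'tf' scaling mode, which maps pixel values from [0, 255] to [-1, 1]. A minimal re-implementation of that arithmetic (for understanding only — use the real preprocess_input in the pipeline):

```python
def mobilenet_v2_scale(pixel):
    # Equivalent to Keras' mode='tf' preprocessing: x / 127.5 - 1
    return pixel / 127.5 - 1.0

print(mobilenet_v2_scale(0))      # -1.0
print(mobilenet_v2_scale(255))    # 1.0
print(mobilenet_v2_scale(127.5))  # 0.0
```

Feeding raw [0, 255] pixels to a model trained on [-1, 1] inputs silently destroys accuracy, which is why the deployment checklist in Section 4 insists the preprocessing pipeline match training exactly.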
Viva Questions
- What is negative transfer and when does it occur?
- Why should you use a lower learning rate when fine-tuning pre-trained layers?
- What is the difference between feature extraction and fine-tuning in transfer learning?
- Why does MobileNetV2 preprocess_input expect inputs in [-1,1] range?
- Name two model compression techniques for deployment on mobile devices.