Chapter 9 — CNNs: Convolution, Pooling, Architectures
Unit III · Neural Networks in Practice
1. Convolution Layer
A convolutional layer applies learned filters (kernels) across the spatial dimensions of the input. (In deep-learning convention the kernel is not flipped, so this is strictly cross-correlation, but it is universally called convolution.) For a 2D input \(I\) and kernel \(K\) of size \(f\times f\):
\[(I * K)[i,j] = \sum_{m=0}^{f-1}\sum_{n=0}^{f-1} I[i+m, j+n]\cdot K[m,n]\]
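The sum above can be sketched in a few lines of NumPy (a minimal stride-1, no-padding cross-correlation; `conv2d_valid` is an illustrative helper, not a library function):

```python
import numpy as np

def conv2d_valid(I, K):
    """Stride-1, no-padding 2D cross-correlation, mirroring the formula above."""
    n, f = I.shape[0], K.shape[0]
    out = n - f + 1  # output size for p = 0, s = 1
    O = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            O[i, j] = np.sum(I[i:i+f, j:j+f] * K)
    return O

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((3, 3))                 # unnormalised 3x3 summing kernel
print(conv2d_valid(I, K).shape)     # (2, 2): 4 - 3 + 1 = 2
```

In practice frameworks vectorise this (im2col, FFT); the loop form is only meant to mirror the formula term by term.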
Key parameters: filters (number of output channels), kernel_size, stride (step size), padding (same vs valid). Output spatial size:
\[\text{out} = \left\lfloor\frac{n - f + 2p}{s}\right\rfloor + 1\]
Local connectivity: each output depends on a small region — efficient, exploits spatial structure.
from tensorflow.keras import layers  # assumed import for the Keras snippets below

layers.Conv2D(32, kernel_size=3, strides=1, padding='same', activation='relu')
Exam-ready points
- A conv layer with 32 filters of size 3×3 on a 28×28×1 input: 32×(3×3×1+1) = 320 parameters.
- "same" padding: output has same spatial size as input. "valid": no padding, output shrinks.
- 1×1 convolutions: channel mixing without spatial processing — used in Inception/ResNet.
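The parameter and output-size counts above can be checked with two small helpers (illustrative names, not a library API):

```python
def conv_params(f, c_in, filters):
    # Each filter has f*f*c_in weights plus one bias.
    return filters * (f * f * c_in + 1)

def conv_out(n, f, p, s):
    # floor((n - f + 2p) / s) + 1
    return (n - f + 2 * p) // s + 1

print(conv_params(3, 1, 32))   # 320, matching the 28x28x1 example above
print(conv_out(28, 3, 1, 1))   # 28: 'same' padding (p=1 for f=3) keeps size at stride 1
print(conv_out(28, 3, 0, 1))   # 26: 'valid' padding shrinks the map
```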
2. Pooling and Flattening
Max Pooling: takes the maximum value in each pool window — retains the most prominent feature activation and provides a degree of local translation invariance.
Average Pooling: takes the mean — smoother representation.
Global Average Pooling (GAP): reduces each channel's spatial map to a single number — replaces Flatten + Dense in modern architectures, reducing overfitting.
layers.MaxPooling2D(pool_size=2, strides=2) # halves spatial dimensions
layers.GlobalAveragePooling2D() # modern alternative to Flatten
Exam-ready points
- Pooling has no learnable parameters.
- After GAP on a 7×7×512 feature map → 512-dim vector (no 7×7×512 = 25,088 flattening needed).
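Both points can be checked on toy arrays in NumPy (a shape-level sketch, not the Keras implementation; the reshape trick assumes the spatial size divides evenly by the pool size):

```python
import numpy as np

x = np.random.rand(7, 7, 512)           # one channels-last feature map

# Global Average Pooling: mean over the two spatial axes -> one value per channel
gap = x.mean(axis=(0, 1))
print(gap.shape)                        # (512,) instead of a 25,088-dim flatten

# 2x2 max pooling with stride 2 on an 8x8 map: halves each spatial dimension
y = np.random.rand(8, 8)
pooled = y.reshape(4, 2, 4, 2).max(axis=(1, 3))
print(pooled.shape)                     # (4, 4)
```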
3. CNN Architecture Patterns
| Architecture | Year | Key innovation | Depth |
|---|---|---|---|
| LeNet-5 | 1998 | First successful CNN (digits) | 5 |
| AlexNet | 2012 | ReLU, Dropout, GPU training | 8 |
| VGGNet | 2014 | All 3×3 kernels, uniform design | 16/19 |
| GoogLeNet | 2014 | Inception modules, GAP | 22 |
| ResNet | 2015 | Skip connections → 152 layers | 50/101/152 |
| MobileNet | 2017 | Depthwise separable conv → mobile | 28 |
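MobileNet's efficiency can be seen by counting weights: a standard \(f\times f\) convolution uses \(f^2 \cdot C_{in} \cdot C_{out}\) weights, while a depthwise separable convolution uses \(f^2 \cdot C_{in}\) (depthwise) plus \(C_{in} \cdot C_{out}\) (pointwise 1×1). A quick check, ignoring biases for simplicity:

```python
def standard_conv(f, c_in, c_out):
    # One f x f x c_in filter per output channel.
    return f * f * c_in * c_out

def depthwise_separable(f, c_in, c_out):
    # Depthwise: one f x f filter per input channel; pointwise: 1x1 conv.
    return f * f * c_in + c_in * c_out

f, c_in, c_out = 3, 128, 256
print(standard_conv(f, c_in, c_out))        # 294912
print(depthwise_separable(f, c_in, c_out))  # 33920, roughly 8.7x fewer
```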
4. Practical CNN Checklist
- Input: normalise to [0,1] or zero-mean/unit-var; use data augmentation (flip, crop, rotate).
- Architecture: Conv → BN → ReLU blocks; reduce spatial size gradually; increase channels gradually.
- Regularisation: Dropout after FC layers; L2 weight decay; data augmentation.
- Optimiser: Adam or SGD with momentum + cosine LR decay.
- Evaluation: confusion matrix, per-class precision/recall for imbalanced datasets.
data_aug = keras.Sequential([
layers.RandomFlip('horizontal'),
layers.RandomRotation(0.1),
layers.RandomZoom(0.1),
])
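The normalisation step from the checklist can be sketched in NumPy for uint8, channels-last images (statistics must come from the training set; the variable names are illustrative):

```python
import numpy as np

x = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Scale to [0, 1]
x01 = x.astype(np.float32) / 255.0

# Zero-mean / unit-variance per channel, with a small epsilon for stability
mean = x01.mean(axis=(0, 1, 2), keepdims=True)
std = x01.std(axis=(0, 1, 2), keepdims=True)
x_std = (x01 - mean) / (std + 1e-7)
```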
Worked Example — Small CNN for CIFAR-10
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
layers.Conv2D(32, 3, padding='same', activation='relu', input_shape=(32,32,3)),
layers.Conv2D(32, 3, padding='same', activation='relu', input_shape=(32,32,3)),
layers.BatchNormalization(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(2), layers.Dropout(0.2),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.BatchNormalization(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(2), layers.Dropout(0.3),
layers.GlobalAveragePooling2D(),
layers.Dense(128, activation='relu'), layers.Dropout(0.4),
layers.Dense(10, activation='softmax')
])
model.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
# Achieves ~80% val accuracy on CIFAR-10 in 30 epochs
Viva Questions
- How many parameters does a Conv2D layer with 64 filters (3×3) applied to a 32-channel input have?
- What is the benefit of parameter sharing in CNNs?
- Explain how ResNet's skip connections solve the vanishing gradient problem.
- What is depthwise separable convolution (used in MobileNet) and why is it efficient?
- Compare Global Average Pooling vs Flatten before the output Dense layer.