Unit III — Neural Networks in Practice

Chapter 9 — CNNs: Convolution, Pooling, Architectures


Objectives
Understand convolution and pooling operations · Know key CNN architectural patterns (LeNet → ResNet) · Apply CNNs to image classification tasks

1. Convolution Layer

A convolutional layer applies learned filters (kernels) across the spatial dimensions of the input. For a 2D input \(I\) and kernel \(K\) of size \(f\times f\) (deep-learning "convolution" is, strictly, cross-correlation: the kernel is not flipped):

\[(I * K)[i,j] = \sum_{m=0}^{f-1}\sum_{n=0}^{f-1} I[i+m, j+n]\cdot K[m,n]\]
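The sum above can be sketched directly in NumPy. This is a minimal "valid" (no-padding) implementation, with the illustrative helper name `conv2d_valid`; real frameworks use far faster routines:

```python
import numpy as np

def conv2d_valid(I, K):
    """Naive 'valid' 2D convolution implementing the double sum above
    (deep-learning convention: no kernel flip, i.e. cross-correlation)."""
    f = K.shape[0]
    H, W = I.shape
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+f, j:j+f] * K)
    return out

# A vertical-edge detector on a tiny 4x4 image (left half 0, right half 1)
I = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
K = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]], dtype=float)
print(conv2d_valid(I, K))  # every 3x3 window straddles the edge -> all -3
```

Note the output is 2×2, matching the "valid" size formula below with \(n=4, f=3, p=0, s=1\).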

Key parameters: filters (number of output channels), kernel_size, stride (step size), padding (same vs valid). Output spatial size:

\[\text{out} = \left\lfloor\frac{n - f + 2p}{s}\right\rfloor + 1\]
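The size formula reduces to one line of integer arithmetic (illustrative helper name `conv_out_size`):

```python
def conv_out_size(n, f, p, s):
    """Output spatial size: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1

# 28x28 input, 3x3 kernel
print(conv_out_size(28, 3, p=1, s=1))  # 'same' padding: 28
print(conv_out_size(28, 3, p=0, s=1))  # 'valid' padding: 26
print(conv_out_size(28, 3, p=1, s=2))  # stride 2: 14
```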

Why convolutions?
  • Parameter sharing: one filter detects the same feature anywhere in the image.
  • Local connectivity: each output depends on a small region — efficient, exploits spatial structure.
layers.Conv2D(32, kernel_size=3, strides=1, padding='same', activation='relu')
Exam-ready points
  • A conv layer with 32 filters of size 3×3 on a 28×28×1 input: 32×(3×3×1+1) = 320 parameters.
  • "same" padding: output has same spatial size as input. "valid": no padding, output shrinks.
  • 1×1 convolutions: channel mixing without spatial processing — used in Inception/ResNet.
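The parameter count in the first exam point can be checked with quick arithmetic (illustrative helper name `conv2d_params`):

```python
def conv2d_params(filters, k, in_channels):
    """Conv2D parameter count: filters * (k*k*in_channels + 1 bias each)."""
    return filters * (k * k * in_channels + 1)

print(conv2d_params(32, 3, 1))   # 320, matching the exam point above
print(conv2d_params(64, 3, 32))  # 64 * (3*3*32 + 1) = 18496
```

Note the count is independent of the input's spatial size (28×28 is irrelevant): that is parameter sharing in action.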

2. Pooling and Flattening

Max Pooling: takes the maximum value in each pool window — retains the most prominent feature activation. Provides spatial invariance.

Average Pooling: takes the mean — smoother representation.
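Both operations can be sketched in NumPy for a 2×2 window with stride 2 (a reshape trick that assumes the spatial size divides evenly):

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [2, 2, 4, 1]])

# Split the 4x4 map into 2x2 blocks, then reduce each block
blocks = x.reshape(2, 2, 2, 2)          # (row-block, row, col-block, col)
print(blocks.max(axis=(1, 3)))          # max pool:  [[4 5] [2 4]]
print(blocks.mean(axis=(1, 3)))         # avg pool:  [[2.5 2.] [1.25 2.5]]
```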

Global Average Pooling (GAP): reduces each channel's spatial map to a single number — replaces Flatten + Dense in modern architectures, reducing overfitting.

layers.MaxPooling2D(pool_size=2, strides=2)   # halves spatial dimensions
layers.GlobalAveragePooling2D()               # modern alternative to Flatten
Exam-ready points
  • Pooling has no learnable parameters.
  • After GAP on a 7×7×512 feature map → 512-dim vector (no 7×7×512 = 25,088 flattening needed).
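The GAP exam point above can be verified shape-wise in NumPy, where GAP is simply a mean over the spatial axes:

```python
import numpy as np

fmap = np.random.rand(7, 7, 512)   # a 7x7x512 feature map

gap = fmap.mean(axis=(0, 1))       # GAP: one number per channel
print(gap.shape)                   # (512,)

flat = fmap.reshape(-1)            # Flatten, for comparison
print(flat.shape)                  # (25088,) = 7*7*512
```

A Dense(10) head on top of GAP needs 512×10+10 weights versus 25,088×10+10 after Flatten, which is why GAP reduces overfitting.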

3. CNN Architecture Patterns

| Architecture | Year | Key innovation                     | Depth      |
|--------------|------|------------------------------------|------------|
| LeNet-5      | 1998 | First successful CNN (digits)      | 5          |
| AlexNet      | 2012 | ReLU, Dropout, GPU training        | 8          |
| VGGNet       | 2014 | All 3×3 kernels, uniform design    | 16/19      |
| GoogLeNet    | 2014 | Inception modules, GAP             | 22         |
| ResNet       | 2015 | Skip connections → 152 layers      | 50/101/152 |
| MobileNet    | 2017 | Depthwise separable conv → mobile  | 28         |
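ResNet's skip connection computes \(y = F(x) + x\). A minimal NumPy sketch of the idea (a hypothetical `residual_block` on vectors with tiny random weights, not the real ResNet block, which uses convolutions and batch norm):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = F(x) + x. Since dy/dx = dF/dx + I, the identity term keeps
    gradients flowing even when dF/dx is near zero."""
    relu = lambda z: np.maximum(z, 0)
    return relu(x @ W1) @ W2 + x   # the '+ x' is the skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((4, 4)) * 0.01   # tiny weights: F(x) is near 0
W2 = rng.standard_normal((4, 4)) * 0.01
y = residual_block(x, W1, W2)
print(np.allclose(y, x, atol=1e-2))  # True: the block defaults to identity
```

This "defaults to identity" behaviour is why very deep stacks (152 layers) remain trainable: an unhelpful block can simply pass its input through.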

4. Practical CNN Checklist

  • Input: normalise to [0,1] or zero-mean/unit-var; use data augmentation (flip, crop, rotate).
  • Architecture: Conv → BN → ReLU blocks; reduce spatial size gradually; increase channels gradually.
  • Regularisation: Dropout after FC layers; L2 weight decay; data augmentation.
  • Optimiser: Adam or SGD with momentum + cosine LR decay.
  • Evaluation: confusion matrix, per-class precision/recall for imbalanced datasets.
data_aug = keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])
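The evaluation bullet in the checklist can be sketched with scikit-learn (assuming it is installed; the labels below are toy values for illustration):

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]   # ground-truth class labels
y_pred = [0, 1, 1, 1, 2, 0]   # model predictions

# Rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))

# Per-class precision/recall/F1, the right view for imbalanced data
print(classification_report(y_true, y_pred))
```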

Worked Example — Small CNN for CIFAR-10

model = keras.Sequential([
    layers.Conv2D(32, 3, padding='same', activation='relu', input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2), layers.Dropout(0.2),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2), layers.Dropout(0.3),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'), layers.Dropout(0.4),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Achieves ~80% val accuracy on CIFAR-10 in 30 epochs

Viva Questions

  1. How many parameters does a Conv2D layer with 64 filters (3×3) applied to a 32-channel input have?
  2. What is the benefit of parameter sharing in CNNs?
  3. Explain how ResNet's skip connections solve the vanishing gradient problem.
  4. What is depthwise separable convolution (used in MobileNet) and why is it efficient?
  5. Compare Global Average Pooling vs Flatten before the output Dense layer.