Unit IV — Deep Learning for NLP & Computer Vision
Chapter 15 — Generative Models: GANs
Objectives
Understand the adversarial training framework · Know common GAN variants and stability tricks · Trace the image generation pipeline
1. GAN Basics
Generative Adversarial Networks (Goodfellow et al., 2014) consist of two networks in competition:
- Generator \(G\): takes random noise \(z \sim p_z\) → produces fake samples \(G(z)\) that try to look real.
- Discriminator \(D\): takes a sample (real or generated) → outputs probability of being real.
Minimax objective:
\[\min_G \max_D \;\mathcal{L}(D,G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1-D(G(z)))]\]
G wants to fool D (push \(D(G(z)) \to 1\)); D wants to distinguish real from fake. At equilibrium, \(D\) outputs 0.5 everywhere.
Nash Equilibrium
Exam-ready points
At convergence: \(p_G = p_{\text{data}}\) and \(D(x) = 0.5\) everywhere — D can no longer distinguish real from generated samples.
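This equilibrium follows from the optimal discriminator for a fixed \(G\), obtained by maximising the value function pointwise (a standard derivation, sketched here):

\[D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}\]

Substituting \(D^*\) back into the objective gives \(\mathcal{L}(D^*, G) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_G)\), which is minimised exactly when \(p_G = p_{\text{data}}\); then \(D^*(x) = 1/2\) everywhere.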
- In practice, G minimises \(-\log D(G(z))\) (non-saturating loss) instead of \(\log(1-D(G(z)))\) for better early gradients.
- Latent space \(z\): low-dimensional (e.g., 100-dim) noise vector.
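The non-saturating point can be checked with a one-line derivative comparison (pure-Python sketch; the value s = 0.01 stands in for a typical early-training discriminator output on fakes):

```python
# Early in training, D easily rejects fakes, so s = D(G(z)) is tiny.
# Compare the gradient magnitude of the two generator losses w.r.t. s:
s = 0.01

# Saturating loss  L = log(1 - s):   |dL/ds| = 1 / (1 - s)
sat_grad = abs(-1.0 / (1.0 - s))     # ~1.01: weak learning signal

# Non-saturating loss  L = -log(s):  |dL/ds| = 1 / s
nonsat_grad = abs(-1.0 / s)          # 100.0: strong learning signal

print(sat_grad, nonsat_grad)
```

The smaller \(s\) is, the larger the gap, which is exactly when G most needs gradient.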
2. Training GANs (Stability)
GAN training is notoriously unstable. Common failure modes:
- Mode collapse: G produces limited variety — only a few types of samples. Fix: mini-batch discrimination, Wasserstein loss.
- Vanishing gradients for G: D becomes too good → near-zero gradients for G. Fix: balance D/G training, use non-saturating loss.
- Oscillation: D and G chase each other without converging.
Best practices for stable training:
- Use DCGAN guidelines: strided conv (no pool), BatchNorm in G and D, LeakyReLU in D, ReLU in G, tanh output of G.
- Label smoothing: use 0.9 for real labels (not 1.0) to regularise D.
- Wasserstein GAN (WGAN): uses Wasserstein distance; removes log, adds weight clipping or gradient penalty (WGAN-GP) — more stable, meaningful loss curves.
- Train D for \(k\) steps per G step (typical: k=1–5).
- WGAN Critic (not Discriminator): outputs unbounded score, not probability.
- Gradient penalty (WGAN-GP): \(\mathcal{L}_{\text{GP}} = \lambda\,\mathbb{E}[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2]\).
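The gradient penalty above can be sketched in TensorFlow (a minimal illustration, not a full training setup; the interpolation scheme and default \(\lambda = 10\) follow common WGAN-GP practice):

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty on interpolates x_hat = eps*real + (1-eps)*fake."""
    eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    x_hat = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(x_hat)                       # x_hat is not a Variable
        score = critic(x_hat, training=True)
    grads = tape.gradient(score, x_hat)
    # Per-sample L2 norm of the critic's gradient w.r.t. its input
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return lam * tf.reduce_mean((norm - 1.0) ** 2)
```

The penalty pushes the critic's gradient norm toward 1 on points between the real and generated distributions, enforcing the 1-Lipschitz constraint softly instead of clipping weights.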
3. Image Generation Pipeline (DCGAN)

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Generator: z(100) → 7×7×256 → 7×7×128 → 14×14×64 → 28×28×1
def build_generator(z_dim=100):
    model = keras.Sequential([
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(z_dim,)),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=1, padding='same', use_bias=False),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Conv2DTranspose(1, (5, 5), strides=2, padding='same',
                               use_bias=False, activation='tanh'),
    ])
    return model

# Discriminator: 28×28×1 → feature maps → scalar logit
def build_discriminator():
    model = keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=2, padding='same', input_shape=(28, 28, 1)),
        layers.LeakyReLU(), layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(), layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1),  # no sigmoid → use from_logits=True in the loss
    ])
    return model
```
Notable GAN variants:
- Conditional GAN (cGAN) — condition generation on a class label.
- Pix2Pix — paired image-to-image translation.
- CycleGAN — unpaired image-to-image translation.
- StyleGAN — high-quality face synthesis with a disentangled latent space.
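To illustrate the cGAN idea, here is a minimal conditional generator sketch: the class label is embedded and concatenated with the noise vector before upsampling. Layer sizes and names are hypothetical, chosen for a 28×28 MNIST-like output, not taken from a specific paper implementation.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_conditional_generator(z_dim=100, n_classes=10, embed_dim=50):
    z_in = keras.Input(shape=(z_dim,))
    label_in = keras.Input(shape=(), dtype='int32')
    # Learn a dense embedding of the class label, then fuse it with z
    emb = layers.Embedding(n_classes, embed_dim)(label_in)
    h = layers.Concatenate()([z_in, emb])
    h = layers.Dense(7 * 7 * 64, activation='relu')(h)
    h = layers.Reshape((7, 7, 64))(h)
    h = layers.Conv2DTranspose(32, 5, strides=2, padding='same', activation='relu')(h)
    out = layers.Conv2DTranspose(1, 5, strides=2, padding='same', activation='tanh')(h)
    return keras.Model([z_in, label_in], out)
```

The discriminator is conditioned the same way, so D judges "real and consistent with this label" rather than just "real".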
Exam-ready points
- FID (Fréchet Inception Distance): lower is better; measures distribution overlap between real and generated images.
- IS (Inception Score): higher is better; measures image quality + diversity.
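FID has a closed form for Gaussians, \(\mathrm{FID} = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2}\big)\), which can be sketched directly in NumPy/SciPy. In practice the feature matrices come from an Inception-v3 activation layer; here they are arbitrary (N, D) arrays standing in for those features.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature matrices."""
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):   # sqrtm can return tiny imaginary noise
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2 * covmean))
```

Identical feature sets give a distance of (numerically) zero; shifting the generated features away from the real ones grows the first term quadratically.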
Worked Example — GAN training loop

```python
import tensorflow as tf

BATCH, z_dim = 64, 100
generator = build_generator()          # from Section 3
discriminator = build_discriminator()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    noise = tf.random.normal([BATCH, z_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noise, training=True)
        r_out = discriminator(real_images, training=True)
        f_out = discriminator(fake, training=True)
        # D: real → 1, fake → 0;  G: non-saturating loss (wants fake → 1)
        d_loss = bce(tf.ones_like(r_out), r_out) + bce(tf.zeros_like(f_out), f_out)
        g_loss = bce(tf.ones_like(f_out), f_out)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
```
Exercises
- Write the GAN minimax objective and explain each term.
- What is mode collapse? How does WGAN address it?
- Compare DCGAN and vanilla GAN architectures.
Viva Questions
- Explain the roles of the Generator and Discriminator in a GAN.
- What is Nash Equilibrium in the context of GANs?
- What is mode collapse and how do you detect it?
- How does WGAN improve training stability over vanilla GAN?
- What is FID and why is it preferred over Inception Score for evaluation?