Unit IV — Deep Learning for NLP & Computer Vision


Chapter 15 — Generative Models: GANs


Objectives
Understand the adversarial training framework · Know common GAN variants and stability tricks · Trace the image generation pipeline

1. GAN Basics

Generative Adversarial Networks (Goodfellow et al., 2014) consist of two networks in competition:

  • Generator \(G\): takes random noise \(z \sim p_z\) → produces fake samples \(G(z)\) that try to look real.
  • Discriminator \(D\): takes a sample (real or generated) → outputs probability of being real.

Minimax objective:

\[\min_G \max_D \;\mathcal{L}(D,G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1-D(G(z)))]\]

G tries to fool D by pushing \(D(G(z))\) toward 1; D tries to assign high probability to real samples and low probability to fakes.

Nash Equilibrium
At convergence: \(p_G = p_{\text{data}}\) and \(D(x) = 0.5\) everywhere — D can no longer distinguish real from generated samples.
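For a fixed \(G\), the optimal discriminator follows by maximising the objective pointwise in \(x\) — a one-line derivation worth memorising:

\[D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}\]

Substituting \(p_G = p_{\text{data}}\) gives \(D^*(x) = 1/2\), exactly the equilibrium value above.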
Exam-ready points
  • In practice, G minimises \(-\log D(G(z))\) (non-saturating loss) instead of \(\log(1-D(G(z)))\) for better early gradients.
  • Latent space \(z\): low-dimensional (e.g., 100-dim) noise vector.
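The gradient argument behind the non-saturating loss can be checked numerically. A minimal pure-Python sketch (function names are illustrative):

```python
import math

def saturating_g_loss(d_fake):
    # Original minimax term: G minimises log(1 - D(G(z)))
    return math.log(1.0 - d_fake)

def non_saturating_g_loss(d_fake):
    # Heuristic used in practice: G minimises -log D(G(z))
    return -math.log(d_fake)

# Early in training D easily rejects fakes, so D(G(z)) is near 0.
# Compare the gradient magnitudes w.r.t. D(G(z)) at d = 0.01:
d = 0.01
grad_saturating     = -1.0 / (1.0 - d)   # ≈ -1.01  → weak learning signal for G
grad_non_saturating = -1.0 / d           # ≈ -100   → strong learning signal for G
```

The non-saturating loss gives G a gradient roughly 100× larger in this regime, which is why it is the default in practice.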

2. Training GANs (Stability)

GAN training is notoriously unstable. Common failure modes:

  • Mode collapse: G produces limited variety — only a few types of samples. Fix: mini-batch discrimination, Wasserstein loss.
  • Vanishing gradients for G: D becomes too good → near-zero gradients for G. Fix: balance D/G training, use non-saturating loss.
  • Oscillation: D and G chase each other without converging.

Best practices for stable training:

  • Use DCGAN guidelines: strided conv (no pool), BatchNorm in G and D, LeakyReLU in D, ReLU in G, tanh output of G.
  • Label smoothing: use 0.9 for real labels (not 1.0) to regularise D.
  • Wasserstein GAN (WGAN): uses Wasserstein distance; removes log, adds weight clipping or gradient penalty (WGAN-GP) — more stable, meaningful loss curves.
  • Train D for \(k\) steps per G step (typical: k=1–5).
Exam-ready points
  • WGAN Critic (not Discriminator): outputs unbounded score, not probability.
  • Gradient penalty (WGAN-GP): \(\mathcal{L}_{\text{GP}} = \lambda\,\mathbb{E}[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2]\).
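The gradient-penalty formula can be sketched with NumPy using a toy linear critic whose input gradient is known analytically; the critic and constants here are illustrative, not part of the chapter's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear critic D(x) = w · x, so ∇_x D(x) = w everywhere (||w|| = 5 here).
w = np.array([3.0, 4.0])
def critic_grad(x):
    return w

LAMBDA = 10.0                      # standard WGAN-GP coefficient
real = rng.normal(size=(8, 2))
fake = rng.normal(size=(8, 2))

# Interpolate: x_hat = eps * real + (1 - eps) * fake, eps ~ U[0, 1] per sample
eps = rng.uniform(size=(8, 1))
x_hat = eps * real + (1 - eps) * fake

# Penalty pulls the critic's gradient norm toward 1 at the interpolates
grad_norms = np.array([np.linalg.norm(critic_grad(x)) for x in x_hat])
gp = LAMBDA * np.mean((grad_norms - 1.0) ** 2)   # 10 * (5 - 1)^2 = 160
```

In a real WGAN-GP the gradient is obtained by automatic differentiation through the critic (e.g. a nested `tf.GradientTape`), not analytically.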

3. Image Generation Pipeline (DCGAN)

from tensorflow import keras
from tensorflow.keras import layers

# Generator: z(100) → 7×7×256 → 7×7×128 (stride 1) → 14×14×64 → 28×28×1 (tanh)
def build_generator(z_dim=100):
    model = keras.Sequential([
        layers.Dense(7*7*256, use_bias=False, input_shape=(z_dim,)),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Reshape((7,7,256)),
        layers.Conv2DTranspose(128,(5,5), strides=1, padding='same', use_bias=False),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5,5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(), layers.LeakyReLU(),
        layers.Conv2DTranspose(1,  (5,5), strides=2, padding='same',
                               use_bias=False, activation='tanh')
    ]); return model

# Discriminator: 28×28×1 → feature maps → scalar
def build_discriminator():
    model = keras.Sequential([
        layers.Conv2D(64,(5,5), strides=2, padding='same', input_shape=(28,28,1)),
        layers.LeakyReLU(), layers.Dropout(0.3),
        layers.Conv2D(128,(5,5), strides=2, padding='same'),
        layers.LeakyReLU(), layers.Dropout(0.3),
        layers.Flatten(), layers.Dense(1)   # no sigmoid → use from_logits
    ]); return model

Notable GAN variants: Conditional GAN (cGAN) — condition on class label; CycleGAN — unpaired image-to-image translation; StyleGAN — high-quality face synthesis with disentangled latent space; Pix2Pix — paired image translation.
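The simplest form of cGAN conditioning is concatenating a label encoding onto the latent vector before it enters G. A NumPy shape sketch (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 100))         # batch of 4 latent vectors
labels = np.eye(10)[[3, 1, 4, 1]]     # one-hot class labels (10 classes)
g_input = np.concatenate([z, labels], axis=1)   # shape (4, 110): noise + class
```

D is conditioned the same way (label concatenated to, or embedded alongside, its input), so both networks see which class each sample claims to be.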

Exam-ready points
  • FID (Fréchet Inception Distance): lower is better; measures distribution overlap between real and generated images.
  • IS (Inception Score): higher is better; measures image quality + diversity.
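FID reduces to the Fréchet distance between two Gaussians fitted to Inception features: \(\|\mu_1-\mu_2\|^2 + \operatorname{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})\). A self-contained NumPy sketch — the eigendecomposition-based square root here is a stand-in for `scipy.linalg.sqrtm`:

```python
import numpy as np

def _sqrtm(a):
    # Matrix square root via eigendecomposition; assumes the product of
    # covariances is diagonalisable. Production code uses scipy.linalg.sqrtm.
    vals, vecs = np.linalg.eig(a)
    return ((vecs * np.sqrt(vals.astype(complex))) @ np.linalg.inv(vecs)).real

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians fitted to Inception activations."""
    diff = mu1 - mu2
    return float(diff @ diff
                 + np.trace(sigma1 + sigma2 - 2.0 * _sqrtm(sigma1 @ sigma2)))

mu, cov = np.zeros(3), np.eye(3)
fid_same    = fid(mu, cov, mu, cov)        # identical statistics → 0.0
fid_shifted = fid(mu, cov, mu + 2.0, cov)  # mean shift of 2 in each of 3 dims
```

In practice \(\mu\) and \(\Sigma\) are estimated from Inception-v3 activations over thousands of real and generated images.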

Worked Example — GAN training loop

import tensorflow as tf

# Setup (constants and models assumed from the DCGAN section above)
z_dim = 100; BATCH = 256
bce   = keras.losses.BinaryCrossentropy(from_logits=True)  # D outputs logits
g_opt = keras.optimizers.Adam(1e-4)
d_opt = keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    noise = tf.random.normal([BATCH, z_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake   = generator(noise, training=True)
        r_out  = discriminator(real_images, training=True)
        f_out  = discriminator(fake,        training=True)
        # D: real → 1, fake → 0
        d_loss = bce(tf.ones_like(r_out), r_out) + bce(tf.zeros_like(f_out), f_out)
        # G: non-saturating loss — wants D to label fakes as real
        g_loss = bce(tf.ones_like(f_out), f_out)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))

Exercises

  1. Write the GAN minimax objective and explain each term.
  2. What is mode collapse? How does WGAN address it?
  3. Compare DCGAN and vanilla GAN architectures.

Viva Questions

  1. Explain the roles of the Generator and Discriminator in a GAN.
  2. What is Nash Equilibrium in the context of GANs?
  3. What is mode collapse and how do you detect it?
  4. How does WGAN improve training stability over vanilla GAN?
  5. What is FID and why is it preferred over Inception Score for evaluation?