Chapter 4 — Biological Neuron → Computational Unit
Unit II · Introduction to Neural Networks
1. Biological Neuron
A biological neuron has: dendrites (receive signals), cell body / soma (integrates inputs), axon (transmits output), and synapses (connection strength between neurons). A neuron fires an action potential when the total input exceeds a threshold.
- Synapse strength → weights \(w\)
- Soma integration → weighted sum \(z = w^\top x + b\)
- Action potential → activation function \(a = \sigma(z)\)
- Axon → output signal
- McCulloch & Pitts (1943) proposed the first mathematical model of a neuron.
- Hebb's rule (1949): "Neurons that fire together, wire together" — precursor to weight learning.
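The mapping above can be condensed into a few lines of code. A minimal sketch of one artificial neuron (the input values, weights, and bias below are illustrative, not from the notes):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """One artificial neuron: weighted sum (soma) + sigmoid (action potential)."""
    z = np.dot(w, x) + b          # soma: integrate weighted inputs
    return 1 / (1 + np.exp(-z))   # activation: sigmoid "firing" output

x = np.array([0.5, -1.0, 2.0])    # dendrite inputs (example values)
w = np.array([0.4, 0.3, 0.9])     # synapse strengths
b = -0.5                          # bias shifts the firing threshold
print(artificial_neuron(x, w, b))
```

Here \(z = 0.2 - 0.3 + 1.8 - 0.5 = 1.2\), so the output is \(\sigma(1.2) \approx 0.769\).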
2. Computational Unit & Activation Functions
The artificial neuron computes: \(a = f\!\left(\sum_{j} w_j x_j + b\right)\) where \(f\) is the activation function.
| Activation | Formula | Range | Note |
|---|---|---|---|
| Sigmoid | \(\frac{1}{1+e^{-z}}\) | (0,1) | Saturates → vanishing grad |
| Tanh | \(\frac{e^z - e^{-z}}{e^z+e^{-z}}\) | (-1,1) | Zero-centred; still saturates |
| ReLU | \(\max(0,z)\) | [0,∞) | Default for hidden layers; dying ReLU risk |
| Leaky ReLU | \(\max(\alpha z, z)\) | (-∞,∞) | Fixes dying ReLU (\(\alpha\)≈0.01) |
| Softmax | \(\frac{e^{z_k}}{\sum_j e^{z_j}}\) | (0,1) | Output layer for multi-class |
- Without non-linear activations, a deep network collapses to a single linear transform.
- ReLU's gradient is 1 for \(z>0\), 0 otherwise — sparse activation, fast to compute.
- Dying ReLU: a neuron stuck at 0 because \(z<0\) always → use He initialisation or Leaky ReLU.
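The dying-ReLU fix can be seen numerically. A minimal sketch comparing ReLU and Leaky ReLU (with \(\alpha = 0.01\) as in the table) on negative inputs:

```python
import numpy as np

z = np.array([-3.0, -1.0, 0.0, 1.0])

relu  = np.maximum(0, z)         # z < 0: output 0 and gradient 0 (can "die")
leaky = np.maximum(0.01 * z, z)  # z < 0: small slope keeps gradients flowing

print("ReLU: ", relu)    # [-3, -1] are clamped to 0
print("Leaky:", leaky)   # [-0.03, -0.01] keep a small signal
```

Because the leaky branch never has zero gradient, a neuron with persistently negative pre-activations can still recover during training.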
3. Model Capacity and Depth
Universal Approximation Theorem (Cybenko, 1989; Hornik, 1991): A single hidden layer with enough neurons can approximate any continuous function on a compact domain. However, the required width may be exponential — depth is exponentially more efficient.
Depth provides compositional representations: early layers detect edges, middle layers detect shapes, deep layers detect objects. This matches how the visual cortex is organised.
Exam-ready points
- Capacity ≈ number of distinct functions the model can represent.
- High capacity → risk of overfitting on small data.
- Depth (not just width) is the key advantage of DL: some functions representable with \(O(\log n)\) layers require on the order of \(O(2^n)\) neurons in a shallow network.
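The earlier point that a network without non-linearities collapses to a single linear transform can be verified directly: stacking two linear layers equals one layer whose weight matrix is the product. A minimal sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1 weights (no activation between layers)
W2 = rng.standard_normal((2, 4))   # layer 2 weights
x  = rng.standard_normal(3)

deep   = W2 @ (W1 @ x)             # "two-layer" linear network
single = (W2 @ W1) @ x             # equivalent one-layer network
print(np.allclose(deep, single))   # True: depth added no capacity
```

Inserting a non-linearity such as ReLU between the layers breaks this equality, which is exactly why activations are required.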
Worked Example — Comparing activations numerically
```python
import numpy as np

z = np.array([-2, -1, 0, 1, 2])
sigmoid = 1 / (1 + np.exp(-z))   # [0.119, 0.269, 0.5, 0.731, 0.881]
tanh = np.tanh(z)                # [-0.964, -0.762, 0, 0.762, 0.964]
relu = np.maximum(0, z)          # [0, 0, 0, 1, 2]

print("Sigmoid:", sigmoid.round(3))
print("Tanh:   ", tanh.round(3))
print("ReLU:   ", relu)
```
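The softmax row of the table can be checked the same way. A minimal sketch, using the max-subtraction trick for numerical stability (an implementation detail not covered in the notes above):

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])      # logits for a 3-class problem
exp = np.exp(z - z.max())          # subtract max: avoids overflow, same result
softmax = exp / exp.sum()

print(softmax.round(3))            # ~[0.659, 0.242, 0.099]
print(softmax.sum())               # probabilities sum to 1
```

The largest logit always receives the highest probability, and subtracting the maximum leaves the output unchanged because softmax is invariant to adding a constant to all logits.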
Exercises
- Draw the biological neuron and label its parts. Map each part to the artificial neuron.
- Compute sigmoid, tanh, and ReLU for inputs {-3, 0, 3}.
- Explain why a network without non-linear activations is equivalent to a single-layer model.
Viva Questions
- What is the vanishing gradient problem and which activations cause it?
- Why is ReLU preferred over sigmoid for hidden layers?
- State the Universal Approximation Theorem.
- What is the dying ReLU problem and how is it fixed?
- Why do deeper networks need fewer neurons in total than shallow ones to represent the same function?