Chapter 4 — Biological Neuron → Computational Unit
Unit II · Introduction to Neural Networks
1. Biological Neuron
A biological neuron has: dendrites (receive signals), cell body / soma (integrates inputs), axon (transmits output), and synapses (connection strength between neurons). A neuron fires an action potential when the total input exceeds a threshold.
- Synapse strength → weights \(w\)
- Soma integration → weighted sum \(z = w^\top x + b\)
- Action potential → activation function \(a = \sigma(z)\)
- Axon → output signal
- McCulloch & Pitts (1943) proposed the first mathematical model of a neuron.
- Hebb's rule (1949): "Neurons that fire together, wire together" — precursor to weight learning.
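The mapping above can be condensed into a few lines of code. A minimal sketch of one artificial neuron (the input values, weights, and bias below are illustrative, not from the notes):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """One artificial neuron: weighted sum (soma) + sigmoid (action potential)."""
    z = np.dot(w, x) + b          # soma: integrate weighted inputs
    return 1 / (1 + np.exp(-z))   # activation: sigmoid "firing" output

x = np.array([0.5, -1.0, 2.0])    # dendrite inputs (example values)
w = np.array([0.4, 0.3, 0.9])     # synapse strengths
b = -0.5                          # bias shifts the firing threshold
print(artificial_neuron(x, w, b))
```

Here \(z = 0.2 - 0.3 + 1.8 - 0.5 = 1.2\), so the output is \(\sigma(1.2) \approx 0.769\).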
2. Computational Unit & Activation Functions
The artificial neuron computes: \(a = f\!\left(\sum_{j} w_j x_j + b\right)\) where \(f\) is the activation function.
| Activation | Formula | Range | Note |
|---|---|---|---|
| Sigmoid | \(\frac{1}{1+e^{-z}}\) | (0,1) | Saturates → vanishing grad |
| Tanh | \(\frac{e^z - e^{-z}}{e^z+e^{-z}}\) | (-1,1) | Zero-centred; still saturates |
| ReLU | \(\max(0,z)\) | [0,∞) | Default for hidden layers; dying ReLU risk |
| Leaky ReLU | \(\max(\alpha z, z)\) | (-∞,∞) | Fixes dying ReLU (\(\alpha\)≈0.01) |
| Softmax | \(\frac{e^{z_k}}{\sum_j e^{z_j}}\) | (0,1) | Output layer for multi-class |
- Without non-linear activations, a deep network collapses to a single linear transform.
- ReLU's gradient is 1 for \(z>0\), 0 otherwise — sparse activation, fast to compute.
- Dying ReLU: a neuron stuck at 0 because \(z<0\) always → use He initialisation or Leaky ReLU.
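The dying-ReLU fix can be seen numerically. A minimal sketch comparing ReLU and Leaky ReLU (with \(\alpha = 0.01\) as in the table) on negative inputs:

```python
import numpy as np

z = np.array([-3.0, -1.0, 0.0, 1.0])

relu  = np.maximum(0, z)         # z < 0: output 0 and gradient 0 (can "die")
leaky = np.maximum(0.01 * z, z)  # z < 0: small slope keeps gradients flowing

print("ReLU: ", relu)    # [-3, -1] are clamped to 0
print("Leaky:", leaky)   # [-0.03, -0.01] keep a small signal
```

Because the leaky branch never has zero gradient, a neuron with persistently negative pre-activations can still recover during training.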
3. Model Capacity and Depth
Universal Approximation Theorem (Cybenko, 1989; Hornik, 1991): A single hidden layer with enough neurons can approximate any continuous function on a compact domain. However, the required width may be exponential — depth is exponentially more efficient.
Depth provides compositional representations: early layers detect edges, middle layers detect shapes, deep layers detect objects. This matches how the visual cortex is organised.
Exam-ready points
- Capacity ≈ number of distinct functions the model can represent.
- High capacity → risk of overfitting on small data.
- Depth (not just width) is the key advantage of DL: some functions representable with \(O(\log n)\) layers require on the order of \(O(2^n)\) neurons in a shallow network.
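The earlier point that a network without non-linearities collapses to a single linear transform can be verified directly: stacking two linear layers equals one layer whose weight matrix is the product. A minimal sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1 weights (no activation between layers)
W2 = rng.standard_normal((2, 4))   # layer 2 weights
x  = rng.standard_normal(3)

deep   = W2 @ (W1 @ x)             # "two-layer" linear network
single = (W2 @ W1) @ x             # equivalent one-layer network
print(np.allclose(deep, single))   # True: depth added no capacity
```

Inserting a non-linearity such as ReLU between the layers breaks this equality, which is exactly why activations are required.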
Worked Example — Comparing activations numerically
```python
import numpy as np

z = np.array([-2, -1, 0, 1, 2])
sigmoid = 1 / (1 + np.exp(-z))   # [0.119, 0.269, 0.5, 0.731, 0.881]
tanh = np.tanh(z)                # [-0.964, -0.762, 0, 0.762, 0.964]
relu = np.maximum(0, z)          # [0, 0, 0, 1, 2]

print("Sigmoid:", sigmoid.round(3))
print("Tanh:   ", tanh.round(3))
print("ReLU:   ", relu)
```
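The softmax row of the table can be checked the same way. A minimal sketch, using the max-subtraction trick for numerical stability (an implementation detail not covered in the notes above):

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])      # logits for a 3-class problem
exp = np.exp(z - z.max())          # subtract max: avoids overflow, same result
softmax = exp / exp.sum()

print(softmax.round(3))            # ~[0.659, 0.242, 0.099]
print(softmax.sum())               # probabilities sum to 1
```

The largest logit always receives the highest probability, and subtracting the maximum leaves the output unchanged because softmax is invariant to adding a constant to all logits.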
Exercises
- Draw the biological neuron and label its parts. Map each part to the artificial neuron.
- Compute sigmoid, tanh, and ReLU for inputs {-3, 0, 3}.
- Explain why a network without non-linear activations is equivalent to a single-layer model.
Viva Questions
- What is the vanishing gradient problem and which activations cause it?
- Why is ReLU preferred over sigmoid for hidden layers?
- State the Universal Approximation Theorem.
- What is the dying ReLU problem and how is it fixed?
- Why do deeper networks need fewer neurons in total than shallow ones to represent the same function?