Chapter 5 — McCulloch–Pitts, Threshold Logic, Perceptron
Unit II · Introduction to Neural Networks
1. McCulloch–Pitts (MCP) Unit
The MCP neuron (McCulloch & Pitts, 1943) is the simplest computational neuron. It has binary inputs \(x_i \in \{0,1\}\), fixed weights \(w_i \in \{-1,+1\}\) (excitatory/inhibitory), and fires (output = 1) when the weighted sum meets or exceeds the threshold \(\theta\):
\[y = \begin{cases}1 & \sum_i w_i x_i \ge \theta \\ 0 & \text{otherwise}\end{cases}\]
MCP units can compute Boolean functions. AND: all weights = 1, threshold = \(n\) (the number of inputs). OR: all weights = 1, threshold = 1. NOT: inhibitory weight \(-1\), threshold 0.
- AND gate: \(y=1\) iff all inputs are 1.
- OR gate: \(y=1\) iff at least one input is 1.
- NAND is functionally complete — any Boolean function can be expressed using only NAND gates.
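The gate constructions above can be sketched directly. The helper name `mcp_unit` below is my own; it is a minimal illustration of the firing rule, not a reference implementation:

```python
import numpy as np

def mcp_unit(x, w, theta):
    """McCulloch–Pitts unit: fire (1) iff the weighted input sum meets the threshold."""
    return int(np.dot(w, x) >= theta)

# AND over 2 inputs: weights 1, threshold 2
print(mcp_unit([1, 1], [1, 1], theta=2))  # 1
print(mcp_unit([1, 0], [1, 1], theta=2))  # 0
# OR: weights 1, threshold 1
print(mcp_unit([0, 1], [1, 1], theta=1))  # 1
# NOT: inhibitory weight -1, threshold 0
print(mcp_unit([1], [-1], theta=0))       # 0
print(mcp_unit([0], [-1], theta=0))       # 1
```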
2. Threshold Logic Unit (TLU)
A Threshold Logic Unit generalises MCP by allowing real-valued weights. It represents a halfspace (linear threshold function). Any linear threshold function can be expressed as: \(y = H(w^\top x - \theta)\) where \(H\) is the Heaviside step function.
The set of all functions computable by a single TLU is the set of linearly separable Boolean functions — these are a strict subset of all Boolean functions (XOR is not linearly separable).
Exam-ready points
- For \(n\) binary inputs there are \(2^{2^n}\) Boolean functions, but only \(O(2^{n^2})\) of them are linearly separable.
- Bias term \(b = -\theta\) absorbs the threshold: \(y = H(w^\top x + b)\).
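A quick way to see that XOR falls outside the linearly separable class is a brute-force search over a grid of candidate weights and biases. This is an illustration, not a proof (a proof would argue over all real weights); the helper names `tlu` and `representable` are my own:

```python
import itertools
import numpy as np

def tlu(x, w, b):
    """Heaviside-step unit: y = H(w.x + b)."""
    return int(np.dot(w, x) + b >= 0)

def representable(truth_table, grid):
    """Search a weight/bias grid for a single TLU matching the truth table."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(tlu(x, (w1, w2), b) == t for x, t in truth_table):
            return True
    return False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
grid = np.linspace(-2, 2, 21)    # coarse grid suffices for 2 binary inputs

print(representable(AND, grid))  # True  — e.g. w=(1,1), b=-2 works
print(representable(XOR, grid))  # False — no single halfspace fits XOR
```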
3. Perceptron
Rosenblatt's Perceptron (1958) adds learning to the TLU. It uses the sign activation \(\hat{y} = \text{sign}(w^\top x + b)\) and a simple update rule:
\[w \leftarrow w + \eta\,(y - \hat{y})\,x\]
where \(\eta\) is the learning rate and \(y\) is the true label (\(\pm 1\)). Since the update fires only when \(\hat{y} \ne y\), with \(\pm 1\) labels it is equivalent (up to a factor of 2 absorbed into \(\eta\)) to adding \(\eta\,y\,x\) on each mistake.
Perceptron Convergence Theorem: If data is linearly separable, the perceptron will converge to a separating hyperplane in a finite number of steps.
```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=100):
    """Train a perceptron on labels y in {-1, +1}; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = np.sign(w @ xi + b)
            if y_hat != yi:          # update only on a misclassified sample
                w += lr * yi * xi
                b += lr * yi
    return w, b
```
Exam-ready points
- Perceptron update: only triggers on misclassified samples.
- Does NOT converge on non-linearly separable data.
- Multi-layer Perceptron (MLP) solves XOR by stacking layers with non-linear activations.
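The XOR construction can be made concrete with two threshold units feeding a third: XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2). A minimal sketch with hand-set weights (the function name `xor_mlp` and the specific weights are my own choices, not the unique solution):

```python
import numpy as np

def step(z):
    """Heaviside step activation."""
    return int(z >= 0)

def xor_mlp(x1, x2):
    """Two-layer threshold network: XOR = OR(x1,x2) AND NOT AND(x1,x2)."""
    x = np.array([x1, x2])
    h_or  = step(np.dot([1, 1], x) - 1)   # fires if at least one input is 1
    h_and = step(np.dot([1, 1], x) - 2)   # fires only if both inputs are 1
    # output unit: h_or AND (NOT h_and)
    return step(1 * h_or - 2 * h_and - 1)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))  # 0, 1, 1, 0 respectively
```

The hidden layer re-represents the inputs so that the output unit's problem becomes linearly separable, which is exactly why stacking layers escapes the single-perceptron limitation.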
Worked Example — Perceptron learns AND
```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])          # AND with ±1 labels
w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        if np.sign(w @ xi + b) != yi:  # mistake-driven update
            w += lr * yi * xi
            b += lr * yi
print(f"w={w}, b={b:.2f}")
# Verify: all samples correctly classified after convergence
```
Exercises
- Design an MCP network (two-layer) that computes XOR.
- Apply the Perceptron learning rule step by step for the OR function starting from \(w=[0,0], b=0\).
- State the Perceptron Convergence Theorem and its conditions.
Viva Questions
- How does the Perceptron differ from the MCP neuron?
- What is the Perceptron Convergence Theorem?
- Why can a single perceptron not learn XOR?
- What role does the bias play in the perceptron model?
- How did Minsky and Papert's 1969 book affect neural network research?