Unit II — Introduction to Neural Networks

Chapter 5 — McCulloch–Pitts, Threshold Logic, Perceptron



Objectives
Understand the McCulloch–Pitts neuron · Implement Boolean logic gates as threshold units · Derive and apply the Perceptron learning rule

1. McCulloch–Pitts (MCP) Unit

The MCP neuron (1943) is the simplest computational neuron. It has binary inputs \(x_i \in \{0,1\}\), fixed weights \(w_i \in \{-1,+1\}\) (excitatory/inhibitory), and fires (output = 1) if the weighted sum meets or exceeds the threshold \(\theta\):

\[y = \begin{cases}1 & \sum_i w_i x_i \ge \theta \\ 0 & \text{otherwise}\end{cases}\]

MCP units can compute Boolean functions. AND: all weights \(1\), threshold \(\theta = n\) (the number of inputs). OR: all weights \(1\), threshold \(\theta = 1\). NOT: a single inhibitory weight \(-1\) with threshold \(\theta = 0\).

Limitation
Weights and threshold are fixed — MCP cannot learn. XOR cannot be computed by a single MCP unit.
Exam-ready points
  • AND gate: \(y=1\) iff all inputs are 1.
  • OR gate: \(y=1\) iff at least one input is 1.
  • NAND is functionally complete — any Boolean function can be expressed using only NAND gates.
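The gate constructions above can be checked directly in code. A minimal sketch (the helper name `mcp_fire` is ours, not standard terminology):

```python
import numpy as np

def mcp_fire(x, w, theta):
    """McCulloch-Pitts unit: output 1 iff the weighted input sum reaches theta."""
    return int(np.dot(w, x) >= theta)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_table = [mcp_fire(x, [1, 1], 2) for x in inputs]   # weights 1, theta = n = 2
or_table  = [mcp_fire(x, [1, 1], 1) for x in inputs]   # weights 1, theta = 1
not_table = [mcp_fire([x], [-1], 0) for x in (0, 1)]   # inhibitory weight -1, theta = 0
print(and_table, or_table, not_table)  # [0, 0, 0, 1] [0, 1, 1, 1] [1, 0]
```

Each truth table matches the corresponding gate, confirming the weight/threshold choices stated above.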

2. Threshold Logic Unit (TLU)

A Threshold Logic Unit generalises MCP by allowing real-valued weights. It represents a halfspace (linear threshold function). Any linear threshold function can be expressed as: \(y = H(w^\top x - \theta)\) where \(H\) is the Heaviside step function.

The set of all functions computable by a single TLU is the set of linearly separable Boolean functions — these are a strict subset of all Boolean functions (XOR is not linearly separable).

Exam-ready points
  • For \(n\) binary inputs, there are \(2^{2^n}\) Boolean functions but only \(O(2^{n^2})\) are linearly separable.
  • Bias term \(b = -\theta\) absorbs the threshold: \(y = H(w^\top x + b)\).
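For \(n = 2\) the count of linearly separable functions can be verified by brute force. This sketch sweeps a coarse weight/bias grid (the grid range is our arbitrary choice, but it suffices to realize every separable 2-input function) and collects the distinct truth tables a single TLU can produce:

```python
import itertools
import numpy as np

def tlu(w, b, x):
    """Threshold logic unit in the bias form y = H(w.x + b)."""
    return int(np.dot(w, x) + b >= 0)

inputs = list(itertools.product([0, 1], repeat=2))
# Sweep a coarse grid of (w1, w2, b) and record every truth table realized.
grid = np.arange(-2.0, 2.5, 0.5)
realizable = set()
for w1, w2, b in itertools.product(grid, repeat=3):
    realizable.add(tuple(tlu((w1, w2), b, x) for x in inputs))
print(len(realizable))  # 14 of the 2^(2^2) = 16 Boolean functions; XOR and XNOR are missing
```

The two missing truth tables are exactly XOR \((0,1,1,0)\) and XNOR \((1,0,0,1)\), the non-linearly-separable functions of two inputs.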

3. Perceptron

Rosenblatt's Perceptron (1958) adds learning to the TLU. It uses the sign activation and a simple update rule:

\[w \leftarrow w + \eta\,(y - \hat{y})\,x\]

where \(\eta\) is the learning rate, \(y\) is the true label (\(\pm 1\)), and \(\hat{y} = \text{sign}(w^\top x)\). On a correctly classified sample \(y - \hat{y} = 0\) and nothing changes; on a mistake \(y - \hat{y} = 2y\), so the rule is equivalent to the common mistake-driven form \(w \leftarrow w + \eta\,y\,x\) (the constant 2 is absorbed into \(\eta\)), which is the form used in the code below.

Perceptron Convergence Theorem: If data is linearly separable, the perceptron will converge to a separating hyperplane in a finite number of steps.

import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=100):
    """Train a perceptron on labels y in {-1, +1}; weights change only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = np.sign(w @ xi + b)  # np.sign(0) == 0, so a point on the boundary counts as a mistake
            if y_hat != yi:
                w += lr * yi * xi
                b += lr * yi
    return w, b
Exam-ready points
  • Perceptron update: only triggers on misclassified samples.
  • Does NOT converge on non-linearly separable data.
  • Multi-layer Perceptron (MLP) solves XOR by stacking layers with non-linear activations.
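Since stacking layers is what finally solves XOR, it is instructive to watch the single perceptron fail on it first. A small sketch (epoch count and learning rate are arbitrary choices of ours):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])  # XOR with ±1 labels
w, b, lr = np.zeros(2), 0.0, 0.1
mistakes_per_epoch = []
for _ in range(50):
    mistakes = 0
    for xi, yi in zip(X, y):
        if np.sign(w @ xi + b) != yi:  # update only on misclassified samples
            w += lr * yi * xi
            b += lr * yi
            mistakes += 1
    mistakes_per_epoch.append(mistakes)
print(mistakes_per_epoch[-5:])  # never reaches 0: XOR is not linearly separable
```

A zero-mistake epoch would mean the current \((w, b)\) is a separating hyperplane, which cannot exist for XOR, so at least one update happens in every epoch, forever.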

Worked Example — Perceptron learns AND

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])  # AND with ±1 labels
w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        if np.sign(w @ xi + b) != yi:  # mistake-driven update
            w += lr * yi * xi
            b += lr * yi
print(f"w={w}, b={b:.2f}")
# Verify: all samples correctly classified after convergence
assert all(np.sign(w @ xi + b) == yi for xi, yi in zip(X, y))

Exercises

  1. Design an MCP network (two-layer) that computes XOR.
  2. Apply the Perceptron learning rule step by step for the OR function starting from \(w=[0,0], b=0\).
  3. State the Perceptron Convergence Theorem and its conditions.

Viva Questions

  1. How does the Perceptron differ from the MCP neuron?
  2. What is the Perceptron Convergence Theorem?
  3. Why can a single perceptron not learn XOR?
  4. What role does the bias play in the perceptron model?
  5. How did Minsky and Papert's 1969 book affect neural network research?