Era 09 / 15 · Connectionism Returns · 1986

Backpropagation

Multilayer networks became trainable by sending the error backward through their layers.

Beat 1 · Concrete

A ball finds the valley

Loss is height. Follow the slope downhill and you settle at the lowest point you can reach.

Figure: Gradient descent on a loss landscape. A curved surface; a ball has rolled from a high starting point down into the lowest valley, where it rests. Label: minimum.

coral = high loss (far) · chartreuse = settled at the minimum · teal = lowest reachable loss
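The ball-in-a-valley picture can be sketched in a few lines. This is only an illustration: the bowl-shaped `loss`, the starting point, the learning rate `lr` and the step count are all assumptions chosen for the example, not values taken from the figure.

```python
def loss(w):
    # Hypothetical bowl-shaped loss whose lowest point is at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Slope of the loss above at position w.
    return 2.0 * (w - 3.0)

def descend(w, lr=0.1, steps=100):
    """Roll downhill: repeatedly take a small step against the slope."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Start high on the left wall of the bowl and let the "ball" roll.
w_final = descend(w=-4.0)
print(w_final, loss(w_final))
```

On a single bowl like this one the ball always reaches the bottom. On a bumpier surface the same loop stops in whichever valley lies downhill from the start, which is why the legend says "lowest reachable loss".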

Beat 2 · Abstract

Error flows backward

Predict forward, then push the error right→left, nudging every weight on the way.

Figure: A multilayer network doing a forward pass, then backpropagation. Two input nodes connect to three hidden nodes, which connect to one output node. A teal signal travels left to right to make a prediction; then a coral error signal travels right to left, and each connection's weight (its thickness) is nudged as the error passes through it. Labels: forward · predict; backward · send error.

teal = forward prediction · coral = error flowing back · sand = weights (thickness)
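To make the two passes concrete, here is a minimal sketch shrunk to one node per layer (input, hidden, output) so every number can be followed by hand. The sigmoid activation, the squared-error loss and the starting values of `x`, `target`, `w1`, `w2` and `lr` are assumptions made for illustration; the figure itself does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Forward pass: input -> hidden -> output, one node per layer."""
    h = sigmoid(w1 * x)   # hidden activation
    y = sigmoid(w2 * h)   # prediction
    return h, y

def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2

def backward(x, target, w1, w2):
    """Backward pass: carry the error from the output back to each weight."""
    h, y = forward(x, w1, w2)
    delta_out = (y - target) * y * (1.0 - y)     # error at the output node
    grad_w2 = delta_out * h                      # blame for the last weight
    delta_hid = delta_out * w2 * h * (1.0 - h)   # error passed back to the hidden node
    grad_w1 = delta_hid * x                      # blame for the first weight
    return grad_w1, grad_w2

# Hypothetical values chosen only for illustration.
x, target, w1, w2, lr = 1.0, 1.0, 0.5, -0.3, 0.5
before = loss(x, target, w1, w2)
g1, g2 = backward(x, target, w1, w2)
w1, w2 = w1 - lr * g1, w2 - lr * g2   # nudge each weight downhill
after = loss(x, target, w1, w2)
print(before, after)
```

Note that `delta_hid` is built from `delta_out`: the hidden node's share of the error is computed from the error already found at the output, which is the right-to-left flow shown in coral.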

Beat 3 · Interactive

Train it to do XOR

Step the gradient and watch the loss fall and settle as the network learns XOR, the problem a single perceptron couldn't solve.

Figure: Training loss descending to a minimum on XOR. A loss curve drops from a high coral value down to a low chartreuse value at the teal minimum line; all four XOR cases end solved. Labels: loss, minimum.

coral = loss (wrong) · chartreuse = solved / settled · teal = minimum loss
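The interactive demo can be approximated offline with a small training loop. What follows is a sketch under stated assumptions, not the page's own implementation: it uses the 2-3-1 layout from the earlier figure, sigmoid units, squared-error loss, per-example updates, and hypothetical settings for `epochs`, `lr` and `seed`.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 3  # assumed to match the 2-3-1 network drawn earlier

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(rng):
    """Small random weights; the last entry of each row is a bias."""
    w_hid = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(HIDDEN)]
    w_out = [rng.uniform(-1, 1) for _ in range(HIDDEN + 1)]
    return w_hid, w_out

def forward(x, w_hid, w_out):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hid]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(HIDDEN)) + w_out[HIDDEN])
    return h, y

def mean_loss(w_hid, w_out):
    return sum(0.5 * (forward(x, w_hid, w_out)[1] - t) ** 2 for x, t in XOR) / len(XOR)

def train(epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w_hid, w_out = init(rng)
    losses = [mean_loss(w_hid, w_out)]  # loss at epoch 0
    for _ in range(epochs):
        for x, t in XOR:
            h, y = forward(x, w_hid, w_out)
            delta_out = (y - t) * y * (1.0 - y)
            # Hidden errors use the output weights before they are updated.
            delta_hid = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(HIDDEN)]
            for j in range(HIDDEN):
                w_out[j] -= lr * delta_out * h[j]
            w_out[HIDDEN] -= lr * delta_out
            for j in range(HIDDEN):
                w_hid[j][0] -= lr * delta_hid[j] * x[0]
                w_hid[j][1] -= lr * delta_hid[j] * x[1]
                w_hid[j][2] -= lr * delta_hid[j]
        losses.append(mean_loss(w_hid, w_out))
    return w_hid, w_out, losses

w_hid, w_out, losses = train()
predictions = {x: round(forward(x, w_hid, w_out)[1], 2) for x, _ in XOR}
print(losses[0], losses[-1])
print(predictions)
```

Plotting `losses` against the epoch number gives a curve of the kind the figure describes. Whether all four cases end solved depends on the starting weights, so treat `seed`, `lr` and `epochs` as knobs to adjust if a run stalls.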

Footnotes & lineage
1986
Rumelhart, Hinton & Williams
"Learning representations by back-propagating errors" popularised the method and reignited connectionism.
The engine
The chain rule
Backprop is just calculus' chain rule applied layer by layer to assign blame for the error to every weight.
Era 05 callback
Hidden layers solve XOR
A single perceptron can't solve XOR, because no single straight line separates its two classes. Add a hidden layer trained by backprop and the wall falls.
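The chain-rule footnote above can be written out layer by layer. The notation here is an assumption introduced for this sketch: layer l has weights W, biases b, pre-activations z and activations a; sigma is the activation function; E is the loss; L is the output layer; and delta is the "blame" assigned to a layer.

```latex
% Assumed notation, not taken from the original page.
\begin{align}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\!\left(z^{(l)}\right)
  && \text{forward pass} \\
\delta^{(L)} &= \nabla_{a^{(L)}} E \odot \sigma'\!\left(z^{(L)}\right)
  && \text{error at the output} \\
\delta^{(l)} &= \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot \sigma'\!\left(z^{(l)}\right)
  && \text{error sent one layer back} \\
\frac{\partial E}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top}, \qquad
\frac{\partial E}{\partial b^{(l)}} = \delta^{(l)}
  && \text{blame per weight and bias}
\end{align}
```

Each delta is computed from the one to its right, so a single backward sweep yields the gradient for every weight in the network.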