Era 12 / 15 · The Deep Learning Revolution 2012

The Deep Learning Revolution

Deep nets + GPUs + ImageNet — error fell off a cliff.

Beat 1 · Concrete

Seeing, layer by layer

A pixel image becomes a label by composing features — edges build textures build parts build “cat”.

Figure: A feature hierarchy activating from input image to label. Five stacked stages (input image, edges, textures, parts, and the label “cat”) light up in sequence, each activating from the stage before it. Depth is the composition of features.
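The first stage of that hierarchy, edges, can be sketched in a few lines. This is a minimal illustration, not how any particular network is implemented: the tiny `image`, the hand-written `edge_filter`, and the helper `convolve_rows` are all hypothetical, and a trained network learns its filters from data rather than having them written by hand.

```python
# Hypothetical 4x6 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written 1x2 filter that responds to a dark-to-bright step.
# Assumption for illustration: real first-layer filters are learned.
edge_filter = [-1, 1]


def convolve_rows(img, kernel):
    """Slide the kernel along each row (valid cross-correlation, stride 1)."""
    k = len(kernel)
    return [
        [sum(kernel[j] * row[i + j] for j in range(k))
         for i in range(len(row) - k + 1)]
        for row in img
    ]


def relu(x):
    """Keep positive responses, zero out the rest."""
    return max(0, x)


# The "edges" stage: filter response followed by a nonlinearity.
edge_map = [[relu(v) for v in row] for row in convolve_rows(image, edge_filter)]

for row in edge_map:
    print(row)
```

Each output row is nonzero only at the column where dark meets bright. Later stages would apply the same filter-then-nonlinearity step to maps like this one, which is how edges can combine into textures and parts.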

Beat 2 · Abstract

The cliff at 2012

Top-5 ImageNet error sat near 26 percent through 2011, then AlexNet cut it sharply in 2012, and it kept falling, passing the human level of about 5 percent by 2015.

Figure: ImageNet classification-error timeline, 2010 to 2017. Top-5 classification error is roughly flat near 26 percent through 2011, then falls sharply at 2012 when AlexNet arrives, continuing down past the human level of about 5 percent by 2015 and reaching roughly 2 percent by 2017. Vertical axis: classification error, 0% to 30%. Horizontal axis: year, 2010 to 2017. Annotations: human ≈ 5%; 2012 · AlexNet, the cliff.

Beat 3 · Interactive

Reveal the depth

Pick an image, then reveal one layer at a time — watch the features fire toward the correct label.

Figure: An interactive feed-forward pipeline from image to label. A schematic network: the chosen image feeds three hidden layers of three units each, then three candidate labels (cat, dog, car). Revealing layers in order lights the units; at full depth the correct label is selected.
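The schematic network can be sketched as a plain feed-forward pass. This is a hedged illustration under stated assumptions: the weights are random and untrained, the three-number `image` is a stand-in for real pixel features, and the names `reveal` and `classify` are hypothetical. Because nothing here is trained, the label it picks is arbitrary; only a trained network would select the correct one.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

LABELS = ["cat", "dog", "car"]


def random_layer(n_in, n_out):
    """Hypothetical untrained weights: n_out rows of n_in weights each."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense_relu(weights, inputs):
    """One hidden layer: weighted sum per unit, then ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]


def softmax(scores):
    """Turn raw label scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [random_layer(3, 3) for _ in range(3)]  # three hidden layers, three units each
readout = random_layer(3, 3)                     # one score per candidate label


def reveal(inputs, depth):
    """Run only the first `depth` hidden layers and return their activations."""
    activations = inputs
    for weights in hidden[:depth]:
        activations = dense_relu(weights, activations)
    return activations


def classify(inputs):
    """Full depth: all hidden layers, then label probabilities."""
    features = reveal(inputs, len(hidden))
    scores = [sum(w * x for w, x in zip(row, features)) for row in readout]
    return softmax(scores)


image = [0.9, 0.1, 0.4]  # stand-in for pixel features of the chosen image
for depth in range(len(hidden) + 1):
    print(depth, reveal(image, depth))

probs = classify(image)
print(LABELS[probs.index(max(probs))], probs)
```

Calling `reveal(image, depth)` with depth 0 through 3 mirrors revealing one layer at a time: each call reuses the activations of the layers before it, so the label only becomes available at full depth.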


Footnotes — the three things that lined up

2012

AlexNet

Krizhevsky, Sutskever & Hinton won ILSVRC 2012 by a landslide with a deep convolutional net: roughly 15 percent top-5 error against about 26 percent for the runner-up. It was the result that convinced the field depth wins.

Fuel

ImageNet + GPUs

Roughly 1.2 million labelled training images gave the data; two consumer GPUs gave the compute. Data that was finally large enough met an architecture that was finally deep enough.

Tricks

ReLU & dropout

ReLU activations do not saturate for positive inputs, so gradients keep flowing through many layers; dropout fought overfitting by randomly silencing units during training. Small ideas that made deep networks actually trainable.
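Both tricks fit in a short sketch. The comparison below is deliberately simplified: it multiplies one local derivative across 8 layers and ignores the weights, which is an assumption made only to isolate the effect of the activation function. The `dropout` helper uses the "inverted" variant that rescales surviving units during training; that is a common modern formulation, assumed here for illustration rather than taken from the AlexNet paper.

```python
import math
import random


def relu(x):
    """ReLU activation: pass positive inputs through, zero out the rest."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def chained_grad(local_grad, pre_activation, layers):
    """Product of one local derivative repeated across `layers` layers.

    Simplification: weights are ignored to isolate the activation's effect.
    """
    return local_grad(pre_activation) ** layers


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Best case for sigmoid (x = 0) versus an active ReLU unit (x = 1), 8 layers deep.
sigmoid_signal = chained_grad(sigmoid_grad, 0.0, 8)  # 0.25 ** 8
relu_signal = chained_grad(relu_grad, 1.0, 8)        # 1.0 ** 8
print(sigmoid_signal, relu_signal)

rng = random.Random(0)
dropped = dropout([1.0, 2.0, 3.0, 4.0], 0.5, rng)
print(dropped)
```

Even in the sigmoid's best case the chained derivative shrinks to 0.25 to the 8th power, about 0.000015, while an active ReLU path keeps it at 1. Dropout is applied only during training; with the inverted variant, no rescaling is needed at test time.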