1. Why Go Deeper? 🤔
Slides 1-3
Last time, we reached 98.5% accuracy on MNIST using a standard neural network (MLP) with many tricks. However, simply adding more layers makes training difficult due to unstable gradients.
The Challenge: Standard networks ignore the spatial structure of images (we flattened each 28x28 image into a 784-dimensional vector). To go further, we need architectures that understand that structure.
2. Convolutional Neural Networks (CNN) 🖼️
Slides 4-13
Key Concepts
- Local Receptive Fields: Neurons only look at a small window (e.g., 5x5) of the input, not the whole image.
- Shared Weights: The same filter is used across the entire image to detect features (edges, curves) anywhere.
- Pooling: Simplifies the information (e.g., Max Pooling takes the largest value in a 2x2 grid).
Implementation: CNN on MNIST
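A minimal sketch of such a network, assuming Keras/TensorFlow; the layer sizes here are illustrative, not necessarily the exact architecture from the slides:

```python
# Minimal CNN for MNIST, assuming Keras/TensorFlow.
# Layer sizes are illustrative; the slides' exact architecture may differ.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Local receptive fields: each neuron sees a 5x5 window of the image.
    layers.Conv2D(32, kernel_size=(5, 5), activation='relu',
                  input_shape=(28, 28, 1)),
    # Pooling: keep the max value in each 2x2 region, halving the resolution.
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(5, 5), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Classifier head on top of the learned feature maps.
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Because the filter weights are shared across the whole image, this model has far fewer parameters than a dense layer of comparable reach.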
Variations
3. Recurrent Neural Networks (RNN) 🔁
Slides 14-25
The Problem with Time
Standard networks have no memory: they process each input independently. RNNs have loops, allowing information to persist from one step to the next.
Issue: Vanishing Gradient
As information loops back over many time steps, the gradient is repeatedly multiplied by small weight terms and shrinks toward zero.
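A back-of-the-envelope illustration: if each backward step scales the gradient by a factor below 1, the product decays exponentially with sequence length (the 0.9 factor below is illustrative):

```python
# Toy illustration of the vanishing gradient: repeatedly scaling a
# gradient by a factor < 1 drives it toward zero. The 0.9 factor is
# illustrative, standing in for |weight * activation derivative|.
grad = 1.0
for step in range(100):
    grad *= 0.9
print(grad)  # ~2.7e-05 after 100 time steps
```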
The Solution: LSTM
Long Short-Term Memory units have gates to control information flow:
- Forget Gate: What to throw away.
- Input Gate: What to store.
- Output Gate: What to output.
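In equations (the standard LSTM formulation; the W and b are learned weights and biases, σ the sigmoid, ⊙ the elementwise product):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

The additive update of the cell state $c_t$ is what lets gradients flow across many time steps without vanishing.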
4. The AI Composer 🎵
Slides 26-44
Can we train an LSTM on MIDI files to compose new music? The model predicts the next note/chord given a sequence of previous notes.
Utility: midi_phraser.py
Step 1: Decoding MIDI Files
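A sketch of the decoding step, assuming the music21 library; the file path is illustrative, and the course's midi_phraser.py may differ in detail:

```python
# Decode a MIDI file into a flat list of note/chord tokens,
# assuming the music21 library; the file path is illustrative.
from music21 import converter, note, chord

midi = converter.parse('ff7_theme.mid')
tokens = []
for element in midi.flat.notes:
    if isinstance(element, note.Note):
        tokens.append(str(element.pitch))          # e.g. 'E4'
    elif isinstance(element, chord.Chord):
        # Encode a chord as dot-joined pitch classes, e.g. '4.7.11'
        tokens.append('.'.join(str(n) for n in element.normalOrder))
```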
Step 2: Preparing Data (Final Fantasy 7)
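Continuing the sketch: the token list becomes (sequence → next token) training pairs. The sequence length of 100 is an assumption:

```python
# Turn the token list into (sequence -> next token) training pairs.
# sequence_length is illustrative; `tokens` comes from the decoding step.
import numpy as np

sequence_length = 100
vocab = sorted(set(tokens))
token_to_int = {t: i for i, t in enumerate(vocab)}

X, y = [], []
for i in range(len(tokens) - sequence_length):
    seq_in = tokens[i:i + sequence_length]
    seq_out = tokens[i + sequence_length]
    X.append([token_to_int[t] for t in seq_in])
    y.append(token_to_int[seq_out])

# Shape for the LSTM: (samples, time steps, features), scaled to [0, 1].
X = np.reshape(X, (len(X), sequence_length, 1)) / float(len(vocab))
y = np.eye(len(vocab))[y]   # one-hot targets
```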
Step 3: LSTM Model & Generation
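A minimal model-and-generation sketch in Keras, reusing the variables from step 2; layer widths, epochs, and output length are illustrative:

```python
# Next-token LSTM, assuming Keras/TensorFlow; sizes are illustrative.
from tensorflow.keras import layers, models
import numpy as np

model = models.Sequential([
    layers.LSTM(256, input_shape=(sequence_length, 1),
                return_sequences=True),
    layers.Dropout(0.3),
    layers.LSTM(256),
    layers.Dense(len(vocab), activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, batch_size=64)

# Generation: feed a seed sequence, append the prediction, slide forward.
int_to_token = {i: t for t, i in token_to_int.items()}
pattern = list(X[0].flatten())
generated = []
for _ in range(200):
    x = np.reshape(pattern, (1, len(pattern), 1))
    idx = int(np.argmax(model.predict(x, verbose=0)))
    generated.append(int_to_token[idx])
    pattern = pattern[1:] + [idx / float(len(vocab))]
```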
Mozart Violin Concertos 🎻
Utility: Checking Chords
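One way such a check might look with music21 (a sketch with illustrative pitches, not the actual utility):

```python
# Inspect a generated chord with music21 (illustrative pitches).
from music21 import chord

c = chord.Chord(['C4', 'E4', 'G4'])
print(c.commonName)  # 'major triad'
print(c.root())      # C4
```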
5. Generative Adversarial Networks (GANs) 🎨
Slides 45-56
Two networks compete against each other:
- Generator: Tries to create fake data that fools the discriminator.
- Discriminator: Tries to tell real data from fake data.
The Data: Font Characters
Simple GAN (Dense Layers)
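A minimal dense GAN sketch, assuming Keras/TensorFlow; the 28x28 image size and layer widths are assumptions:

```python
# Minimal dense GAN, assuming Keras/TensorFlow; sizes are illustrative.
from tensorflow.keras import layers, models

latent_dim = 100

# Generator: noise vector -> flattened 28x28 image in [-1, 1].
generator = models.Sequential([
    layers.Dense(256, activation='relu', input_dim=latent_dim),
    layers.Dense(512, activation='relu'),
    layers.Dense(28 * 28, activation='tanh'),
])

# Discriminator: flattened image -> probability that it is real.
discriminator = models.Sequential([
    layers.Dense(512, activation='relu', input_dim=28 * 28),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Combined model trains the generator to fool the (frozen) discriminator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')
```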
Deep Convolutional GAN (DCGAN)
Uses Conv2DTranspose (learned upsampling) layers to generate images from noise.
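A sketch of the generator side, again assuming Keras/TensorFlow with illustrative sizes:

```python
# DCGAN-style generator: project noise to a small feature map, then
# upsample with Conv2DTranspose. Assuming Keras/TF; sizes illustrative.
from tensorflow.keras import layers, models

dcgan_generator = models.Sequential([
    layers.Dense(7 * 7 * 128, input_dim=100),
    layers.Reshape((7, 7, 128)),
    # Each strided transpose convolution doubles the spatial resolution.
    layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding='same',
                           activation='relu'),   # 7x7  -> 14x14
    layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding='same',
                           activation='tanh'),   # 14x14 -> 28x28
])
```

The discriminator is a regular CNN, so the DCGAN is essentially the simple GAN above with both halves swapped for convolutional architectures.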