1. Why Go Deeper? 🤔
Slides 1-3
Last time, we reached 98.5% accuracy on MNIST using a standard neural network (MLP) with many tricks. However, simply adding more layers makes training difficult due to unstable gradients.
The Challenge: Standard networks ignore the spatial structure of images (we flattened each 28x28 image into a 784-dimensional vector). To go further, we need architectures that understand that structure.
2. Convolutional Neural Networks (CNN) 🖼️
Slides 4-13
Key Concepts
- Local Receptive Fields: Neurons only look at a small window (e.g., 5x5) of the input, not the whole image.
- Shared Weights: The same filter is used across the entire image to detect features (edges, curves) anywhere.
- Pooling: Simplifies the information (e.g., Max Pooling takes the largest value in a 2x2 grid).
Implementation: CNN on MNIST
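A minimal sketch of such a network, assuming Keras/TensorFlow; the layer sizes here are illustrative, not necessarily the exact architecture from the slides:

```python
# Minimal CNN for MNIST, assuming Keras/TensorFlow.
# Layer sizes are illustrative; the slides' exact architecture may differ.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Local receptive fields: each neuron sees a 5x5 window of the image.
    layers.Conv2D(32, kernel_size=(5, 5), activation='relu',
                  input_shape=(28, 28, 1)),
    # Pooling: keep the max value in each 2x2 region, halving the resolution.
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(5, 5), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Classifier head on top of the learned feature maps.
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Because the filter weights are shared across the whole image, this model has far fewer parameters than a dense layer of comparable reach.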
Variations
3. Recurrent Neural Networks (RNN) 🔁
Slides 14-25
The Problem with Time
Standard networks have no memory: they process each input independently. RNNs have loops, allowing information to persist from one step to the next.
Issue: Vanishing Gradient
As information loops back over many time steps, the gradient is repeatedly multiplied by small weight terms and shrinks toward zero.
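A back-of-the-envelope illustration: if each backward step scales the gradient by a factor below 1, the product decays exponentially with sequence length (the 0.9 factor below is illustrative):

```python
# Toy illustration of the vanishing gradient: repeatedly scaling a
# gradient by a factor < 1 drives it toward zero. The 0.9 factor is
# illustrative, standing in for |weight * activation derivative|.
grad = 1.0
for step in range(100):
    grad *= 0.9
print(grad)  # ~2.7e-05 after 100 time steps
```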
The Solution: LSTM
Long Short-Term Memory units have gates to control information flow:
- Forget Gate: What to throw away.
- Input Gate: What to store.
- Output Gate: What to output.
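In equations (the standard LSTM formulation; the W and b are learned weights and biases, σ the sigmoid, ⊙ the elementwise product):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

The additive update of the cell state $c_t$ is what lets gradients flow across many time steps without vanishing.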
4. The AI Composer 🎵
Slides 26-44
Can we train an LSTM on MIDI files to compose new music? The model predicts the next note/chord given a sequence of previous notes.
Utility: midi_phraser.py
Step 1: Decoding MIDI Files
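A sketch of the decoding step, assuming the music21 library; the file path is illustrative, and the course's midi_phraser.py may differ in detail:

```python
# Decode a MIDI file into a flat list of note/chord tokens,
# assuming the music21 library; the file path is illustrative.
from music21 import converter, note, chord

midi = converter.parse('ff7_theme.mid')
tokens = []
for element in midi.flat.notes:
    if isinstance(element, note.Note):
        tokens.append(str(element.pitch))          # e.g. 'E4'
    elif isinstance(element, chord.Chord):
        # Encode a chord as dot-joined pitch classes, e.g. '4.7.11'
        tokens.append('.'.join(str(n) for n in element.normalOrder))
```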
Step 2: Preparing Data (Final Fantasy 7)
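Continuing the sketch: the token list becomes (sequence → next token) training pairs. The sequence length of 100 is an assumption:

```python
# Turn the token list into (sequence -> next token) training pairs.
# sequence_length is illustrative; `tokens` comes from the decoding step.
import numpy as np

sequence_length = 100
vocab = sorted(set(tokens))
token_to_int = {t: i for i, t in enumerate(vocab)}

X, y = [], []
for i in range(len(tokens) - sequence_length):
    seq_in = tokens[i:i + sequence_length]
    seq_out = tokens[i + sequence_length]
    X.append([token_to_int[t] for t in seq_in])
    y.append(token_to_int[seq_out])

# Shape for the LSTM: (samples, time steps, features), scaled to [0, 1].
X = np.reshape(X, (len(X), sequence_length, 1)) / float(len(vocab))
y = np.eye(len(vocab))[y]   # one-hot targets
```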
Step 3: LSTM Model & Generation
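A minimal model-and-generation sketch in Keras, reusing the variables from step 2; layer widths, epochs, and output length are illustrative:

```python
# Next-token LSTM, assuming Keras/TensorFlow; sizes are illustrative.
from tensorflow.keras import layers, models
import numpy as np

model = models.Sequential([
    layers.LSTM(256, input_shape=(sequence_length, 1),
                return_sequences=True),
    layers.Dropout(0.3),
    layers.LSTM(256),
    layers.Dense(len(vocab), activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, batch_size=64)

# Generation: feed a seed sequence, append the prediction, slide forward.
int_to_token = {i: t for t, i in token_to_int.items()}
pattern = list(X[0].flatten())
generated = []
for _ in range(200):
    x = np.reshape(pattern, (1, len(pattern), 1))
    idx = int(np.argmax(model.predict(x, verbose=0)))
    generated.append(int_to_token[idx])
    pattern = pattern[1:] + [idx / float(len(vocab))]
```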
Mozart Violin Concertos 🎻
Utility: Checking Chords
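One way such a check might look with music21 (a sketch with illustrative pitches, not the actual utility):

```python
# Inspect a generated chord with music21 (illustrative pitches).
from music21 import chord

c = chord.Chord(['C4', 'E4', 'G4'])
print(c.commonName)  # 'major triad'
print(c.root())      # C4
```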
5. Generative Adversarial Networks (GANs) 🎨
Slides 45-56
Two networks compete against each other:
- Generator: Tries to create fake data that fools the discriminator.
- Discriminator: Tries to tell real data from fake data.
The Data: Font Characters
Simple GAN (Dense Layers)
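A minimal dense GAN sketch, assuming Keras/TensorFlow; the 28x28 image size and layer widths are assumptions:

```python
# Minimal dense GAN, assuming Keras/TensorFlow; sizes are illustrative.
from tensorflow.keras import layers, models

latent_dim = 100

# Generator: noise vector -> flattened 28x28 image in [-1, 1].
generator = models.Sequential([
    layers.Dense(256, activation='relu', input_dim=latent_dim),
    layers.Dense(512, activation='relu'),
    layers.Dense(28 * 28, activation='tanh'),
])

# Discriminator: flattened image -> probability that it is real.
discriminator = models.Sequential([
    layers.Dense(512, activation='relu', input_dim=28 * 28),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Combined model trains the generator to fool the (frozen) discriminator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')
```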
Deep Convolutional GAN (DCGAN)
Uses Conv2DTranspose (learned upsampling) layers to generate images from noise.
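A sketch of the generator side, again assuming Keras/TensorFlow with illustrative sizes:

```python
# DCGAN-style generator: project noise to a small feature map, then
# upsample with Conv2DTranspose. Assuming Keras/TF; sizes illustrative.
from tensorflow.keras import layers, models

dcgan_generator = models.Sequential([
    layers.Dense(7 * 7 * 128, input_dim=100),
    layers.Reshape((7, 7, 128)),
    # Each strided transpose convolution doubles the spatial resolution.
    layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding='same',
                           activation='relu'),   # 7x7  -> 14x14
    layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding='same',
                           activation='tanh'),   # 14x14 -> 28x28
])
```

The discriminator is a regular CNN, so the DCGAN is essentially the simple GAN above with both halves swapped for convolutional architectures.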