NUMERICAL ANALYSIS

Deep Structured Learning

Going deeper with Convolutional Networks, Music Generation, and GANs.

1. Why Go Deeper? 🤔

Slides 1-3

Last time, we reached 98.5% accuracy on MNIST using a standard Neural Network (MLP) with many tricks. However, simply adding more layers makes training difficult due to unstable gradients.

The Challenge: Standard networks ignore the spatial structure of images (we flattened 28x28 into 784). To go further, we need architectures that understand structure.

2. Convolutional Neural Networks (CNN) 🖼️

Slides 4-13

Key Concepts

  • Local Receptive Fields: Neurons only look at a small window (e.g., 5x5) of the input, not the whole image.
  • Shared Weights: The same filter is used across the entire image to detect features (edges, curves) anywhere.
  • Pooling: Simplifies the information (e.g., Max Pooling takes the largest value in a 2x2 grid).
(Diagram: a filter sliding over the input image → feature map.)

Implementation: CNN on MNIST
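
The notebook code itself is not reproduced in these notes. A minimal Keras sketch of a small CNN on MNIST (filter counts and layer sizes are illustrative, not necessarily those used in the lecture) could look like this:

```python
# Minimal sketch (assumes TensorFlow/Keras); architecture details are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # (60000, 28, 28, 1), scaled to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    # 5x5 local receptive fields with shared weights -> 32 feature maps
    layers.Conv2D(32, (5, 5), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),             # keep the max of each 2x2 block
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # one output per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

The Conv2D/MaxPooling2D pairs implement the local receptive fields, shared weights, and pooling described above; only the final Dense layer is fully connected.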

Variations

3. Recurrent Neural Networks (RNN) 🔁

Slides 14-25

The Problem with Time

Standard networks have no memory: each input is processed independently of the ones that came before it. RNNs add loops, allowing information to persist from one time step to the next.

Issue: Vanishing Gradient

As gradients are backpropagated through many time steps, they are multiplied again and again by the same (typically small) recurrent weights and shrink toward zero.
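
To make this concrete (a standard argument, not taken from the slides): for a simple recurrence $h_t = \sigma(W h_{t-1} + U x_t)$, backpropagation through time chains one Jacobian per step,

$$\frac{\partial \mathcal{L}}{\partial h_0}
  = \frac{\partial \mathcal{L}}{\partial h_T}\prod_{t=1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
  = \frac{\partial \mathcal{L}}{\partial h_T}\prod_{t=1}^{T} \operatorname{diag}\!\big(\sigma'(W h_{t-1} + U x_t)\big)\, W,$$

and when these factors have norm below 1 (small weights, saturating activations) the product shrinks exponentially with the number of time steps $T$.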

The Solution: LSTM

Long Short-Term Memory units have gates to control information flow:

  • Forget Gate: What to throw away.
  • Input Gate: What to store.
  • Output Gate: What to output.
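
In the standard textbook formulation (not copied from the slides), with sigmoid $\sigma$, elementwise product $\odot$, input $x_t$, hidden state $h_{t-1}$, and cell state $c_{t-1}$, the gates and updates are

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}$$

Because the cell state $c_t$ is updated additively, gradients can flow across many time steps without vanishing.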

4. The AI Composer 🎵

Slides 26-44

Can we train an LSTM on MIDI files to compose new music? The model predicts the next note/chord given a sequence of previous notes.

Utility: midi_phraser.py

Step 1: Decoding MIDI Files
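
The contents of midi_phraser.py are not included in these notes. A minimal decoding sketch using the music21 library (an assumption; the notes do not name the parser, and the file name below is hypothetical) could look like this:

```python
# Sketch only: parse a MIDI file into a flat list of note/chord strings.
# Assumes the music21 library; the lecture's midi_phraser.py may work differently.
from music21 import converter, note, chord

def decode_midi(path):
    """Return the piece as strings: 'E4' for notes, '0.4.7' (pitch classes) for chords."""
    stream = converter.parse(path)               # read and parse the MIDI file
    events = []
    for element in stream.flat.notes:            # notes and chords in time order
        if isinstance(element, note.Note):
            events.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            events.append('.'.join(str(n) for n in element.normalOrder))
    return events

notes = decode_midi("ff7_theme.mid")             # hypothetical file name
print(notes[:20])
```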

Step 2: Preparing Data (Final Fantasy 7)
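
A typical preparation step (a sketch, not necessarily the lecture's exact preprocessing): map each distinct note/chord string to an integer and slice the pieces into fixed-length windows, with the note that follows each window as the prediction target. The window length of 50 is an assumed value.

```python
# Sketch: turn the list of note/chord strings into (window, next-note) training pairs.
import numpy as np

SEQUENCE_LENGTH = 50                             # assumed window size

def prepare_sequences(events, sequence_length=SEQUENCE_LENGTH):
    vocab = sorted(set(events))                  # every distinct note/chord string
    to_int = {name: i for i, name in enumerate(vocab)}

    inputs, targets = [], []
    for i in range(len(events) - sequence_length):
        window = events[i:i + sequence_length]
        inputs.append([to_int[e] for e in window])            # previous notes
        targets.append(to_int[events[i + sequence_length]])   # note to predict

    x = np.array(inputs, dtype=np.float32) / len(vocab)       # normalize inputs
    x = x.reshape(-1, sequence_length, 1)        # (samples, time steps, features)
    y = np.array(targets)                        # class index per sample
    return x, y, vocab

# x, y, vocab = prepare_sequences(notes)
```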

Step 3: LSTM Model & Generation
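
A sketch of the model and generation loop under the same assumptions (two stacked LSTM layers and greedy decoding are illustrative choices, not taken from the slides):

```python
# Sketch: stacked LSTMs predict the next note; generation slides a window forward.
import numpy as np
from tensorflow.keras import layers, models

def build_model(sequence_length, vocab_size):
    model = models.Sequential([
        layers.LSTM(256, input_shape=(sequence_length, 1), return_sequences=True),
        layers.LSTM(256),
        layers.Dense(vocab_size, activation="softmax"),  # probability of each note/chord
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model

def generate(model, seed, vocab, length=200):
    """seed: a window of SEQUENCE_LENGTH integer-encoded notes; returns decoded strings."""
    pattern = list(seed)
    output = []
    for _ in range(length):
        x = np.reshape(pattern, (1, len(pattern), 1)) / len(vocab)
        probs = model.predict(x, verbose=0)[0]
        index = int(np.argmax(probs))            # greedy choice; sampling also works
        output.append(vocab[index])
        pattern = pattern[1:] + [index]          # slide the window forward
    return output

# model = build_model(SEQUENCE_LENGTH, len(vocab))
# model.fit(x, y, epochs=50, batch_size=64)
# melody = generate(model, seed_integers, vocab)   # seed_integers: any encoded window
```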


Mozart Violin Concertos 🎻

Utility: Checking Chords
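
The notes do not describe what this utility does. One plausible reading, sketched here purely as an assumption, is a helper that names the chords appearing in the encoded output using music21:

```python
# Sketch (assumption about the utility's purpose): name chords in the encoded output.
from music21 import chord

def chord_name(encoded):
    """'0.4.7' (pitch classes) -> 'major triad'; single notes pass through unchanged."""
    if '.' not in encoded:
        return encoded                           # a single note such as 'E4'
    pitch_classes = [int(p) for p in encoded.split('.')]
    return chord.Chord(pitch_classes).commonName

print(chord_name('0.4.7'))                       # major triad
print(chord_name('0.3.7'))                       # minor triad
```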

5. Generative Adversarial Networks (GANs) 🎨

Slides 45-56

Two networks compete against each other:

  • Generator: tries to create fake data that fools the discriminator.
  • Discriminator: tries to tell real data from fake data.
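
In the original GAN formulation this is a minimax game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator $D$ is trained to output 1 on real samples and 0 on generated ones, while the generator $G$ is trained to push $D(G(z))$ toward 1.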

The Data: Font Characters

Simple GAN (Dense Layers)
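
A minimal sketch of a dense GAN in Keras (layer widths, the 28x28 image shape, and the alternating training pattern are assumptions; real images are expected scaled to [-1, 1]):

```python
# Sketch: dense (fully connected) GAN; all sizes are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

LATENT_DIM = 100
IMG_SHAPE = (28, 28)

generator = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(LATENT_DIM,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(IMG_SHAPE[0] * IMG_SHAPE[1], activation="tanh"),  # pixels in [-1, 1]
    layers.Reshape(IMG_SHAPE),
])

discriminator = models.Sequential([
    layers.Flatten(input_shape=IMG_SHAPE),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # probability that the input is real
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Generator + frozen discriminator: used only to train the generator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images, batch_size=64):
    noise = np.random.normal(size=(batch_size, LATENT_DIM))
    fake_images = generator.predict(noise, verbose=0)
    # 1) Discriminator: real -> 1, fake -> 0.
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2) Generator: make the frozen discriminator output 1 on fakes.
    gan.train_on_batch(np.random.normal(size=(batch_size, LATENT_DIM)),
                       np.ones((batch_size, 1)))
```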

Deep Convolutional GAN (DCGAN)

Using Conv2DTranspose layers (learned upsampling) to generate images from noise.
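
A sketch of a DCGAN-style generator (filter counts and the 28x28 output size are assumptions):

```python
# Sketch: DCGAN-style generator. Conv2DTranspose layers learn to upsample a small
# feature map built from the noise vector into a full image.
from tensorflow.keras import layers, models

LATENT_DIM = 100

dcgan_generator = models.Sequential([
    layers.Dense(7 * 7 * 128, input_shape=(LATENT_DIM,)),
    layers.Reshape((7, 7, 128)),                 # small spatial map from noise
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.Conv2DTranspose(64, (5, 5), strides=2, padding="same"),  # 7x7 -> 14x14
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.Conv2DTranspose(1, (5, 5), strides=2, padding="same",
                           activation="tanh"),   # 14x14 -> 28x28, values in [-1, 1]
])

# The discriminator typically mirrors this with strided Conv2D layers instead of pooling.
```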

🧠 Knowledge Check