Chapter 1

Introduction to Distributed Algorithms & MPI

Unlock the power of supercomputing. Learn how to break massive problems into smaller pieces and solve them simultaneously using the Message Passing Interface (MPI).

The Vocabulary of Speed

Process

An independent instance of a program running on a computer. In MPI, we often call these "tasks". Each process has its own separate memory space.

Thread

A "lightweight process". Multiple threads exist within a single process and share the same memory address space. Think of them as workers in the same room.

Parallelism

Breaking a problem into discrete parts that can be solved concurrently. Instructions from each part execute simultaneously on different CPUs.
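
To make these terms concrete, here is a minimal MPI "hello world" sketch, assuming an MPI implementation such as MPICH or Open MPI is installed (compile with mpicc, launch with mpirun). Each process is an independent copy of the program with its own rank (ID number) and its own private memory:

```c
/* Minimal MPI sketch: every process runs this same program,
 * but each is an independent task with its own memory. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);               /* start the MPI runtime */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    /* Each process prints independently; output order is not guaranteed. */
    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down the MPI runtime */
    return 0;
}
```

Launching it with, say, mpirun -np 4 ./hello starts four separate processes, each executing the full program in its own address space.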

Serial vs. Parallel Execution

Serial Computing

Task A → Task B → Task C

One instruction at a time. The CPU must finish A before starting B.

Parallel Computing

Core 1: Task A  |  Core 2: Task B  |  Core 3: Task C

Simultaneous execution. Multiple resources solving sub-problems at the exact same moment.
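
One way to realize this picture in MPI is to branch on the process rank so each process works on a different sub-problem. A minimal sketch, where task_a, task_b, and task_c are hypothetical placeholders for real work:

```c
/* Sketch: assigning independent tasks to MPI processes by rank.
 * task_a, task_b, task_c are hypothetical placeholders.
 * Run with: mpirun -np 3 ./tasks */
#include <mpi.h>
#include <stdio.h>

static void task_a(void) { printf("Task A running\n"); }
static void task_b(void) { printf("Task B running\n"); }
static void task_c(void) { printf("Task C running\n"); }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank executes a different sub-problem at the same moment. */
    switch (rank) {
        case 0: task_a(); break;
        case 1: task_b(); break;
        case 2: task_c(); break;
        default: break; /* any extra processes simply idle */
    }

    MPI_Finalize();
    return 0;
}
```

With three processes, all three tasks run at the exact same moment on different cores, rather than one after another.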

Hardware Architectures

How memory is organized dictates how we write code. There are two main paradigms.

1. Shared Memory

CPU 1, CPU 2, CPU 3 ↔ Global Memory (RAM)
  • Like a team working in one room with a shared whiteboard.
  • All processors access the same global memory address space.
  • Fast data sharing, but hard to scale (memory bottlenecks).
  • Example: your laptop (multi-core CPU); see the sketch after this list.
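
The standard tool for this paradigm is OpenMP, which we compare with MPI below. A minimal OpenMP sketch in C (compile with an OpenMP flag such as -fopenmp on GCC or Clang):

```c
/* Sketch: shared-memory parallelism with OpenMP.
 * All threads live in one process and share the array `a`. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    double a[1000];

    /* Loop iterations are split among threads; because memory is
     * shared, no explicit communication is needed. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        a[i] = 2.0 * i;
    }

    printf("a[999] = %.1f (up to %d threads)\n", a[999], omp_get_max_threads());
    return 0;
}
```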

2. Distributed Memory

(CPU + local Mem) ↔ Network ↔ (CPU + local Mem)
  • Like a team in separate buildings communicating via phone.
  • Each processor has its own local memory.
  • Requires "Message Passing" to move data; see the send/receive sketch after this list.
  • Highly scalable (Supercomputers).
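
Because each process's memory is private, data moves only when one process explicitly sends it and another explicitly receives it. A minimal MPI sketch of this handshake (run with at least two processes):

```c
/* Sketch: explicit message passing between two processes.
 * Run with: mpirun -np 2 ./sendrecv */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;
        /* Rank 0's memory is invisible to rank 1, so the data
         * must travel over the network as a message. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```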

Programming Models: The Right Tool for the Job

Feature        | OpenMP (Shared Memory)                 | MPI (Message Passing)
---------------|----------------------------------------|---------------------------------------------
Communication  | Implicit (via shared variables)        | Explicit (via send/receive functions)
Primary Usage  | Loop parallelization on a single node  | Scalable applications across clusters
Difficulty     | Easier (incremental parallelization)   | Steeper (requires explicit data management)
Hardware       | Shared memory only                     | Both distributed and shared memory

Amdahl's Law: The Speed Limit

No matter how many processors you add, your speedup is limited by the serial part of your program.

Speedup = 1 / ( (1-P) + P/N )
P = Parallel Portion
N = Number of Processors

Takeaway: If 10% of your code must run serially (e.g., input/output, startup), then P = 0.9 and the maximum theoretical speedup is 1/(1-P) = 10x, even with infinite processors!
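
A quick way to see that ceiling is to evaluate the formula directly. Here is a small sketch that tabulates the speedup for P = 0.9 as N grows:

```c
/* Sketch: evaluating Amdahl's Law, Speedup = 1 / ((1-P) + P/N). */
#include <stdio.h>

int main(void) {
    const double P = 0.9; /* 90% of the work parallelizes */
    const int counts[] = {1, 10, 100, 1000, 1000000};
    const int n = sizeof counts / sizeof counts[0];

    for (int i = 0; i < n; i++) {
        const int N = counts[i];
        const double speedup = 1.0 / ((1.0 - P) + P / N);
        printf("N = %7d  ->  speedup = %.2fx\n", N, speedup);
    }
    /* As N grows, speedup approaches 1 / (1 - P) = 10x and never exceeds it. */
    return 0;
}
```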

Knowledge Check

Test your understanding of the concepts.