
Commit c0a74f0

jstac and claude authored
Improve clarity and fix typos in need_for_speed lecture (#523)
Fix typos (oursives, stray "for", missing period, "in" vs "is") and rewrite the parallelization and hardware accelerator sections for precision: sharpen multiprocessing vs multithreading distinction, add concrete examples, explain what a "core" is, and streamline GPU coverage.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 94dd7d2 commit c0a74f0

File tree

1 file changed (+48, -105 lines)


lectures/need_for_speed.md

Lines changed: 48 additions & 105 deletions
@@ -79,11 +79,11 @@ We need Python's scientific libraries for two reasons:
 1. Python is small
 2. Python is slow
 
-**Python in small**
+**Python is small**
 
 Core python is small by design -- this helps with optimization, accessibility, and maintenance
 
-Scientific libraries provide the routines we don't want to -- and probably shouldn't -- write oursives
+Scientific libraries provide the routines we don't want to -- and probably shouldn't -- write ourselves
 
 * numerical integration, interpolation, linear algebra, root finding, etc.
 
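The routines listed in the context above (numerical integration, root finding, and so on) are one-liners in SciPy; a minimal sketch, assuming SciPy is installed:

```python
from scipy import integrate, optimize

# Numerical integration: integrate x**2 over [0, 1] (exact answer is 1/3).
value, abs_error = integrate.quad(lambda x: x**2, 0, 1)
print(value)

# Root finding: solve x**2 - 2 = 0 on [0, 2] (the root is sqrt(2)).
root = optimize.brentq(lambda x: x**2 - 2, 0, 2)
print(root)
```

This is exactly the kind of routine the lecture says we shouldn't write ourselves: both calls wrap battle-tested compiled implementations.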
@@ -127,37 +127,17 @@ Here's how they fit together:
 We will discuss all of these libraries at length in this lecture series.
 
-## Pure Python is slow
+## Why is Pure Python Slow?
 
-As mentioned above, one major attraction of the scientific libraries is greater execution speeds.
+As mentioned above, numerical code written in pure Python is relatively slow.
 
-We will discuss how scientific libraries can help us accelerate code.
+Let's try to understand what's driving slow execution speeds.
 
-For this topic, it will be helpful if we understand what's driving slow execution speeds.
+### Type Checking
 
+One source of overhead in pure Python operations is type checking.
 
-### High vs low level code
-
-Higher-level languages like Python are optimized for humans.
-
-This means that the programmer can leave many details to the runtime environment
-
-* specifying variable types
-* memory allocation/deallocation
-* etc.
-
-In addition, pure Python is run by an [interpreter](https://en.wikipedia.org/wiki/Interpreter_(computing)), which executes code statement-by-statement.
-
-This makes Python flexible, interactive, easy to write, easy to read, and relatively easy to debug.
-
-On the other hand, the standard implementation of Python (called CPython) cannot
-match the speed of compiled languages such as C or Fortran.
-
-
-### Where are the bottlenecks?
-
-Why is this the case?
-
+Let's try to understand the issues.
 
 #### Dynamic typing
 
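The type-checking overhead this hunk introduces can be seen in a small sketch: because Python resolves operator meaning at runtime, one function body serves several type combinations, and the interpreter re-inspects the operand types on every call.

```python
# In pure Python the meaning of `a + b` is resolved at runtime:
# the interpreter checks the operand types on every single call.
def add(a, b):
    return a + b

print(add(1, 2))        # integer addition       -> 3
print(add(1.5, 2.5))    # float addition         -> 4.0
print(add("ab", "cd"))  # string concatenation   -> "abcd"
```

A compiled language fixes the types (and hence the machine instructions) once, ahead of time, which is where much of the speed gap comes from.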
@@ -245,7 +225,7 @@ To illustrate, let's consider the problem of summing some data --- say, a collection of integers.
 In C or Fortran, an array of integers is stored in a single contiguous block of memory
 
 * For example, a 64 bit integer is stored in 8 bytes of memory.
-* An array of $n$ such integers occupies $8n$ *consecutive* memory slots.
+* An array of $n$ such integers occupies $8n$ consecutive bytes.
 
 Moreover, the data type is known at compile time.
 
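The layout described by the revised bullet ($8n$ consecutive bytes) can be checked directly with NumPy, whose arrays use the same contiguous storage; a small sketch:

```python
import numpy as np

# n 64-bit integers occupy 8n consecutive bytes in one contiguous block.
n = 1_000
a = np.arange(n, dtype=np.int64)

print(a.itemsize)               # 8 bytes per element
print(a.nbytes)                 # 8 * n = 8000 bytes in total
print(a.flags['C_CONTIGUOUS'])  # True: stored contiguously in memory
```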
@@ -336,7 +316,7 @@ The idea of vectorization dates back to MATLAB, which uses vectorization extensively.
 NumPy uses a similar model, inspired by MATLAB
 
 
-### Vectorization vs for pure Python loops
+### Vectorization vs pure Python loops
 
 Let's try a quick speed comparison to illustrate how vectorization can
 accelerate code.
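The speed comparison the renamed section refers to can be sketched along these lines (timings are machine-dependent, but the vectorized version is typically faster by one to two orders of magnitude):

```python
import time
import numpy as np

n = 1_000_000
x = np.random.uniform(0, 1, n)

# Pure Python loop: every iteration pays interpreter and type-checking costs.
start = time.time()
total = 0.0
for xi in x:
    total += xi
loop_time = time.time() - start

# Vectorized: the same sum runs inside NumPy's compiled C loop.
start = time.time()
total_vec = x.sum()
vec_time = time.time() - start

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```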
@@ -431,61 +411,45 @@ Let's review the two main kinds of CPU-based parallelization commonly used in
 scientific computing and discuss their pros and cons.
 
 
-#### Multiprocessing
-
-Multiprocessing means concurrent execution of multiple threads of logic using more than one processor.
-
-Multiprocessing can be carried out on one machine with multiple CPUs or on a
-cluster of machines connected by a network.
-
-With multiprocessing, *each process has its own memory space*, although the physical memory chip might be shared.
+#### Multithreading
 
+Multithreading means running multiple threads of execution within a single process.
 
-#### Multithreading
+All threads share the same memory space, so they can read from and write to the same arrays without copying data.
 
-Multithreading is similar to multiprocessing, except that, during execution, the
-threads all *share the same memory space*.
+For example, when a numerical operation on a large array runs on a modern laptop, the workload can be split across the machine's multiple CPU cores, with each core handling a portion of the array.
 
+```{note}
 Native Python struggles to implement multithreading due to some [legacy design
 features](https://wiki.python.org/moin/GlobalInterpreterLock).
-
 But this is not a restriction for scientific libraries like NumPy and Numba.
-
-Functions imported from these libraries and JIT-compiled code run in low level
+Functions imported from these libraries and JIT-compiled code run in low-level
 execution environments where Python's legacy restrictions don't apply.
+```
 
 
-#### Advantages and Disadvantages
+#### Multiprocessing
 
-Multithreading is more lightweight because most system and memory resources
-are shared by the threads.
+Multiprocessing means running multiple independent processes, each with its own separate memory space.
 
-In addition, the fact that multiple threads all access a shared pool of memory
-is extremely convenient for numerical programming.
+Because memory is not shared, processes communicate by passing data between them.
 
-On the other hand, multiprocessing is more flexible and can be distributed
-across clusters.
+Multiprocessing can run on a single machine or be distributed across a cluster of machines connected by a network.
 
-For the great majority of what we do in these lectures, multithreading will
-suffice.
 
+#### Which Should We Use?
 
-### Hardware Accelerators
+For numerical work on a single machine, multithreading is usually preferred --- it is lightweight and the shared memory model is very convenient.
 
-While CPUs with multiple cores have become standard for parallel computing, a
-more dramatic shift has occurred with the rise of specialized hardware
-accelerators.
+Multiprocessing becomes important when scaling beyond a single machine.
 
-These accelerators are designed specifically for the kinds of highly parallel
-computations that arise in scientific computing, machine learning, and data
-science.
+For the great majority of what we do in these lectures, multithreading will suffice.
 
-#### GPUs and TPUs
 
-The two most important types of hardware accelerators are
+### Hardware Accelerators
 
-* **GPUs** (Graphics Processing Units) and
-* **TPUs** (Tensor Processing Units).
+A more dramatic source of parallelism comes from specialized hardware
+accelerators, particularly **GPUs** (Graphics Processing Units).
 
 GPUs were originally designed for rendering graphics, which requires performing
 the same operation on many pixels simultaneously.
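The workload-splitting example added in this hunk (each core handling a portion of an array) can be sketched with the standard library's thread pool; NumPy releases the interpreter lock inside `np.sum`, so the chunks can genuinely run in parallel on separate cores:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Threads share the process's memory, so each worker sums a slice
# of the *same* array -- no data is copied between workers.
x = np.arange(10_000_000, dtype=np.int64)
chunks = np.array_split(x, 4)

with ThreadPoolExecutor(max_workers=4) as ex:
    partial_sums = list(ex.map(np.sum, chunks))

print(sum(partial_sums) == x.sum())  # True
```

Under the multiprocessing model described above, the workers would instead be separate processes, and each chunk would have to be serialized and passed to its worker.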
@@ -494,63 +458,42 @@ the same operation on many pixels simultaneously.
 :scale: 40
 ```
 
-Scientists and engineers realized that this same architecture --- many simple
-processors working in parallel --- is ideal for scientific computing tasks
-
-TPUs are a more recent development, designed by Google specifically for machine learning workloads.
-
-Like GPUs, TPUs excel at performing massive numbers of matrix operations in parallel.
-
-
-#### Why TPUs/GPUs Matter
-
-The performance gains from using hardware accelerators can be dramatic.
+This architecture --- thousands of simple cores executing the same instruction
+on different data points --- turns out to be ideal for scientific computing.
 
-For example, a modern GPU can contain thousands of small processing cores,
-compared to the 8-64 cores typically found in CPUs.
+```{note}
+A **core** is an independent processing unit within a chip --- a circuit that
+can execute instructions on its own. A CPU typically has a small number of
+powerful cores, each capable of handling complex sequences of operations. A GPU
+instead packs thousands of smaller, simpler cores, each designed to perform
+basic arithmetic operations. The GPU's power comes from having all of these
+cores work on different pieces of the same problem simultaneously.
+```
 
-When a problem can be expressed as many independent operations on arrays of
+When a computation can be expressed as independent operations on large arrays of
 data, GPUs can be orders of magnitude faster than CPUs.
 
-This is particularly relevant for scientific computing because many algorithms
-naturally map onto the parallel architecture of GPUs.
-
-
-### Single GPUs vs GPU Servers
-
-There are two common ways to access GPU resources:
+**TPUs** (Tensor Processing Units), designed by Google for machine learning,
+follow a similar philosophy, optimizing for massive parallel matrix operations.
 
-#### Single GPU Systems
 
-Many workstations and laptops now come with capable GPUs, or can be equipped with them.
+### Accessing GPU Resources
 
-A single modern GPU can dramatically accelerate many scientific computing tasks.
-
-For individual researchers and small projects, a single GPU is often sufficient.
+Many workstations and laptops now come with capable GPUs, and a single modern
+GPU is often sufficient for individual research projects.
 
 Modern Python libraries like JAX, discussed extensively in this lecture series,
 automatically detect and use available GPUs with minimal code changes.
 
-
-#### Multi-GPU Servers
-
-For larger-scale problems, servers containing multiple GPUs (often 4-8 GPUs per server) are increasingly common.
+For larger-scale problems, multi-GPU servers (often 4--8 GPUs per machine) are
+increasingly common.
 
 ```{figure} /_static/lecture_specific/need_for_speed/dgx.png
 :scale: 40
 ```
 
-
-With appropriate software, computations can be distributed across multiple GPUs, either within a single server or across multiple servers.
-
-This enables researchers to tackle problems that would be infeasible on a single GPU or CPU.
-
-
-### Summary
-
-GPU computing is becoming far more accessible, particularly from within Python.
-
-Some Python scientific libraries, like JAX, now support GPU acceleration with minimal changes to existing code.
+With appropriate software, computations can be distributed across multiple GPUs,
+either within a single server or across a cluster.
 
 We will explore GPU computing in more detail in later lectures, applying it to a
 range of economic applications.
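The automatic device detection mentioned in the context lines can be sketched as follows; this assumes JAX is installed, and the same code simply falls back to the CPU when no accelerator is present:

```python
import jax
import jax.numpy as jnp

# JAX selects the best available backend automatically:
# "gpu" or "tpu" if present, otherwise "cpu".
print(jax.default_backend())
print(jax.devices())

# The same array code runs unchanged on whichever device was found.
x = jnp.linspace(0.0, 1.0, 1_000)
print(float(jnp.sum(x ** 2)))
```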
