Commit 21f1ea0

Misc edits to need for speed and related lectures (#521)
* Misc edits to need for speed
* misc
1 parent 7de88b0 commit 21f1ea0

2 files changed (+84, -132 lines changed)

lectures/need_for_speed.md

Lines changed: 44 additions & 61 deletions
@@ -27,7 +27,7 @@ premature optimization is the root of all evil." -- Donald Knuth
 
 ## Overview
 
-It's probably safe to say that Python is the most popular language for scientific computing.
+Python is the most popular language for many aspects of scientific computing.
 
 This is due to
 
@@ -74,29 +74,30 @@ Let's briefly review Python's scientific libraries.
 
 ### Why do we need them?
 
-One reason we use scientific libraries is because they implement routines we want to use.
+We need Python's scientific libraries for two reasons:
 
-* numerical integration, interpolation, linear algebra, root finding, etc.
+1. Python is small
+2. Python is slow
+
+**Python is small**
 
-For example, it's usually better to use an existing routine for root finding than to write a new one from scratch.
+Core Python is small by design -- this helps with optimization, accessibility, and maintenance.
 
-(For standard algorithms, efficiency is maximized if the community can
-coordinate on a common set of implementations, written by experts and tuned by
-users to be as fast and robust as possible!)
+Scientific libraries provide the routines we don't want to -- and probably shouldn't -- write ourselves:
 
-But this is not the only reason that we use Python's scientific libraries.
+* numerical integration, interpolation, linear algebra, root finding, etc.
 
-Another is that pure Python is not fast.
+**Python is slow**
 
-So we need libraries that are designed to accelerate execution of Python code.
+Another reason we need the scientific libraries is that pure Python is relatively slow.
 
-They do this using two strategies:
+Scientific libraries accelerate execution using three main strategies:
 
-1. using compilers that convert Python-like statements into fast machine code for individual threads of logic and
-2. parallelizing tasks across multiple "workers" (e.g., CPUs, individual threads inside GPUs).
+1. Vectorization: providing compiled machine code and interfaces that make this code accessible
+2. JIT compilation: compilers that convert Python-like statements into fast machine code at runtime
+3. Parallelization: shifting tasks across multiple threads / CPUs / GPUs / TPUs
 
-We will discuss these ideas extensively in this and the remaining lectures from
-this series.
+We will discuss these ideas in depth below.
 
 
 ### Python's Scientific Ecosystem
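To make the "Python is small" point concrete, here is a minimal sketch, assuming NumPy is installed. It hands a linear solve to `np.linalg.solve` instead of hand-coding Gaussian elimination. The matrix size, seed and diagonal shift are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative problem (arbitrary size and seed): solve A x = b for a random 200 x 200 system.
rng = np.random.default_rng(seed=42)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonal shift keeps A well conditioned
b = rng.standard_normal(n)

# Core Python has no matrix type or solver; NumPy delegates this to compiled LAPACK code.
x = np.linalg.solve(A, b)

# Verify the solution through the residual A x - b.
residual = np.max(np.abs(A @ x - b))
print(f"max residual: {residual:.2e}")
```

The design point is the one made above: a tested, tuned library routine replaces code we would otherwise have to write and debug ourselves.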
@@ -123,7 +124,7 @@ Here's how they fit together:
 * Pandas provides types and functions for manipulating data.
 * Numba provides a just-in-time compiler that plays well with NumPy and helps accelerate Python code.
 
-We will discuss all of these libraries extensively in this lecture series.
+We will discuss all of these libraries at length in this lecture series.
 
 
 ## Pure Python is slow
@@ -189,15 +190,13 @@ a, b = ['foo'], ['bar']
 a + b
 ```
 
-(We say that the operator `+` is *overloaded* --- its action depends on the
-type of the objects on which it acts)
 
-As a result, when executing `a + b`, Python must first check the type of the objects and then call the correct operation.
+As a result, when executing `a + b`, Python must first check the type of the
+objects and then call the correct operation.
 
-This involves a nontrivial overhead.
+This involves overhead.
 
-If we repeatedly execute this expression in a tight loop, the nontrivial
-overhead becomes a large overhead.
+If we repeatedly execute this expression in a tight loop, the overhead becomes large.
 
 
 #### Static types
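This standard-library sketch shows why evaluating `a + b` requires a runtime type check. The same operator resolves to a different method depending on the type of the left operand. It is a simplification: the full rule for binary operators can also fall back to the right operand's `__radd__`.

```python
# The same expression `a + b` applied to four different types.
pairs = [(1, 2), (1.5, 2.5), ("foo", "bar"), (["foo"], ["bar"])]

dispatched = {}
for a, b in pairs:
    # Roughly what the interpreter does every time it evaluates `a + b`:
    # look up the addition method on type(a), then call it.
    add_method = type(a).__add__
    dispatched[type(a).__name__] = add_method(a, b)

print(dispatched)
# {'int': 3, 'float': 4.0, 'str': 'foobar', 'list': ['foo', 'bar']}
```

In a tight loop this lookup is repeated on every iteration. A compiler for a statically typed language can resolve it once, ahead of time.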
@@ -243,38 +242,29 @@ To illustrate, let's consider the problem of summing some data --- say, a collec
 
 #### Summing with Compiled Code
 
-In C or Fortran, these integers would typically be stored in an array, which
-is a simple data structure for storing homogeneous data.
+In C or Fortran, an array of integers is stored in a single contiguous block of memory.
 
-Such an array is stored in a single contiguous block of memory
-
-* In modern computers, memory addresses are allocated to each byte (one byte = 8 bits).
 * For example, a 64 bit integer is stored in 8 bytes of memory.
 * An array of $n$ such integers occupies $8n$ *consecutive* memory slots.
 
-Moreover, the compiler is made aware of the data type by the programmer.
-
-* In this case 64 bit integers
+Moreover, the data type is known at compile time.
 
 Hence, each successive data point can be accessed by shifting forward in memory
 space by a known and fixed amount.
 
-* In this case 8 bytes
 
 #### Summing in Pure Python
 
 Python tries to replicate these ideas to some degree.
 
-For example, in the standard Python implementation (CPython), list elements are placed in memory locations that are in a sense contiguous.
+For example, in the standard Python implementation (CPython), list elements are
+placed in memory locations that are in a sense contiguous.
 
 However, these list elements are more like pointers to data rather than actual data.
 
 Hence, there is still overhead involved in accessing the data values themselves.
 
-This is a considerable drag on speed.
-
-In fact, it's generally true that memory traffic is a major culprit when it comes to slow execution.
-
+Such overhead is a major culprit when it comes to slow execution.
 
 
 ### Summary
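The layouts compared in "Summing with Compiled Code" and "Summing in Pure Python" can be inspected directly. This is a minimal sketch, assuming NumPy is installed and a 64-bit CPython build. The `sys.getsizeof` figures are implementation details, so only their relative size matters. A NumPy `int64` array is one contiguous buffer with a fixed 8-byte stride. A list, by contrast, stores pointers to separately allocated integer objects.

```python
import sys
import numpy as np

n = 1_000

# Compiled-style storage: n 64-bit integers in one contiguous buffer.
arr = np.arange(n, dtype=np.int64)
print(arr.itemsize, arr.nbytes, arr.strides, arr.flags["C_CONTIGUOUS"])  # 8 8000 (8,) True

# Element i sits at (start address) + 8 * i, so a view starting at
# element 5 begins exactly 40 bytes after the start of the array.
offset = arr[5:].ctypes.data - arr.ctypes.data
print(offset)  # 40

# Python list: the list object holds n pointers, and each int is a separate object.
lst = list(range(n))
pointer_array_bytes = sys.getsizeof(lst)
int_object_bytes = sum(sys.getsizeof(v) for v in lst)
print(pointer_array_bytes, int_object_bytes)
```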
@@ -295,15 +285,11 @@ synonymous with parallelization.
 
 This task is best left to specialized compilers!
 
-Certain Python libraries have outstanding capabilities for parallelizing scientific code -- we'll discuss this more as we go along.
-
-
 
 
 ## Accelerating Python
 
-In this section we look at three related techniques for accelerating Python
-code.
+In this section we look at three related techniques for accelerating Python code.
 
 Here we'll focus on the fundamental ideas.
 
@@ -325,10 +311,11 @@ Many economists usually refer to array programming as "vectorization."
 In computer science, this term has [a slightly different meaning](https://en.wikipedia.org/wiki/Automatic_vectorization).
 ```
 
-The key idea is to send array processing operations in batch to pre-compiled
-and efficient native machine code.
+The key idea is to send array processing operations in batch to pre-compiled and
+efficient native machine code.
 
-The machine code itself is typically compiled from carefully optimized C or Fortran.
+The machine code itself is typically compiled from carefully optimized C or
+Fortran.
 
 For example, when working in a high level language, the operation of inverting a
 large matrix can be subcontracted to efficient machine code that is pre-compiled
@@ -346,6 +333,7 @@ The idea of vectorization dates back to MATLAB, which uses vectorization extensi
 ```{figure} /_static/lecture_specific/need_for_speed/matlab.png
 ```
 
+NumPy uses a similar model, inspired by MATLAB.
 
 
 ### Vectorization vs for pure Python loops
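This minimal sketch illustrates the batch idea, assuming NumPy is installed. It computes the same sum of squares twice: once with an interpreted loop and once with a single vectorized call. The array size is arbitrary and timings vary by machine, so only the two results are compared.

```python
import math
import time
import numpy as np

n = 1_000_000
x = np.random.default_rng(0).uniform(size=n)
x_list = x.tolist()  # the same values as plain Python floats

def sum_of_squares_loop(values):
    """Interpreted loop: every iteration re-checks types and dispatches `*` and `+`."""
    total = 0.0
    for v in values:
        total += v * v
    return total

start = time.perf_counter()
loop_result = sum_of_squares_loop(x_list)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = float(np.sum(x * x))  # one batch call into compiled code
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f} s, vectorized: {vec_seconds:.4f} s")
print(math.isclose(loop_result, vec_result, rel_tol=1e-9))  # True
```

The tiny numerical difference between the two totals comes from summation order: NumPy uses pairwise summation, while the loop adds values sequentially.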
@@ -423,19 +411,17 @@ can be run) has slowed dramatically in recent years.
 Chip designers and computer programmers have responded to the slowdown by
 seeking a different path to fast execution: parallelization.
 
-Hardware makers have increased the number of cores (physical CPUs) embedded in each machine.
+This involves:
 
-For programmers, the challenge has been to exploit these multiple CPUs by
-running many processes in parallel (i.e., simultaneously).
+1. increasing the number of CPUs embedded in each machine
+1. connecting hardware accelerators such as GPUs and TPUs
 
-This is particularly important in scientific programming, which requires handling
-
-* large amounts of data and
-* CPU intensive simulations and other calculations.
+For programmers, the challenge has been to exploit this hardware by
+running many processes in parallel.
 
 Below we discuss parallelization for scientific computing, with a focus on
 
-1. the best tools for parallelization in Python and
+1. tools for parallelization in Python and
 1. how these tools can be applied to quantitative economic problems.
 
 
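Here is a minimal sketch of splitting one job across several workers, assuming NumPy is installed. It uses threads from the standard library's `concurrent.futures`. The threads can only run simultaneously because NumPy's compiled routines (such as the BLAS call behind `np.dot`) typically release Python's global interpreter lock, and whether this yields a speedup depends on the machine. The function name `partial_sum_of_squares` and the one-chunk-per-CPU scheme are illustrative choices.

```python
import os
from concurrent.futures import ThreadPoolExecutor
import numpy as np

x = np.random.default_rng(1).standard_normal(4_000_000)

def partial_sum_of_squares(chunk):
    """Work assigned to one worker: a compiled reduction over its slice of the data."""
    return float(np.dot(chunk, chunk))

# One chunk per available CPU; os.cpu_count() can return None, hence the fallback.
n_workers = os.cpu_count() or 1
chunks = np.array_split(x, n_workers)

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    parallel_total = sum(pool.map(partial_sum_of_squares, chunks))

serial_total = float(np.dot(x, x))
print(parallel_total, serial_total)
```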

@@ -447,22 +433,18 @@ scientific computing and discuss their pros and cons.
 
 #### Multiprocessing
 
-Multiprocessing means concurrent execution of multiple processes using more than one processor.
-
-In this context, a **process** is a chain of instructions (i.e., a program).
+Multiprocessing means concurrent execution of multiple threads of logic using more than one processor.
 
 Multiprocessing can be carried out on one machine with multiple CPUs or on a
-collection of machines connected by a network.
+cluster of machines connected by a network.
 
-In the latter case, the collection of machines is usually called a
-**cluster**.
+With multiprocessing, *each process has its own memory space*, although the physical memory chip might be shared.
 
-With multiprocessing, each process has its own memory space, although the
-physical memory chip might be shared.
 
 #### Multithreading
 
-Multithreading is similar to multiprocessing, except that, during execution, the threads all share the same memory space.
+Multithreading is similar to multiprocessing, except that, during execution, the
+threads all *share the same memory space*.
 
 Native Python struggles to implement multithreading due to some [legacy design
 features](https://wiki.python.org/moin/GlobalInterpreterLock).
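This minimal standard-library sketch shows the difference between the two memory models. It assumes a POSIX system where the `fork` start method is available (it is not on Windows), and the function and variable names are illustrative. A thread modifies the parent's list directly, whereas a child process modifies only its own copy.

```python
import multiprocessing as mp
import threading

# Module-level data that each worker will try to modify.
shared_data = [0]

def increment():
    """Add 1 to the first element of the module-level list."""
    shared_data[0] += 1

def run_in_thread():
    worker = threading.Thread(target=increment)
    worker.start()
    worker.join()
    return shared_data[0]

def run_in_process():
    # Assumption: "fork" is available (Linux, macOS); Windows supports only "spawn".
    ctx = mp.get_context("fork")
    worker = ctx.Process(target=increment)
    worker.start()
    worker.join()
    return shared_data[0], worker.exitcode

after_thread = run_in_thread()                # the thread changed the parent's list
after_process, exit_code = run_in_process()   # the child changed only its own copy
print(after_thread, after_process, exit_code)  # 1 1 0
```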
@@ -472,6 +454,7 @@ But this is not a restriction for scientific libraries like NumPy and Numba.
 Functions imported from these libraries and JIT-compiled code run in low level
 execution environments where Python's legacy restrictions don't apply.
 
+
 #### Advantages and Disadvantages
 
 Multithreading is more lightweight because most system and memory resources
