
Commit 4a6aae0

apartsinclaude committed
WAVE 6 partial: review/audit fixes before rate limit
5 parallel agents applied agents #01, #04, #05, #10, #11, #20, #28:

- 105 files modified with review/audit fixes
- Learning objectives, factual checks, misconception callouts
- Rate limit hit; WAVE 6 needs re-run for remaining files
- 319 insertions, 249 deletions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4465e22 commit 4a6aae0

105 files changed

Lines changed: 319 additions & 249 deletions


front-matter/about-book.html

Lines changed: 2 additions & 2 deletions
@@ -27,14 +27,14 @@ <h2>At a Glance</h2>
 
 <p>Whether you want to build your first RAG pipeline, ship an AI agent to production, or make strategic decisions about LLM adoption at your organization, this book meets you where you are. It is for software engineers, ML practitioners, researchers, product leaders, domain specialists, and educators who want to understand, build, and deploy systems powered by large language models. It assumes familiarity with Python and basic linear algebra; appendices cover the remaining prerequisites.</p>
 
-<p>The book spans <strong>38 chapters</strong> in 11 parts, plus <strong>22 appendices</strong> (A through V) with framework tutorials, and a <a href="../capstone/index.html">capstone project</a>. For the full chapter map, dependency diagram, audience details, and background requirements, see <a href="section-fm.1a.html">FM.1: What This Book Covers</a>. Twenty tailored <a href="pathways/index.html">reading pathways</a> help you find the most relevant chapters for your goals.</p>
+<p>The book spans <strong>39 chapters</strong> (numbered 0 through 38) in 11 parts, plus <strong>22 appendices</strong> (A through V) with framework tutorials, and a <a href="../capstone/index.html">capstone project</a>. For the full chapter map, dependency diagram, audience details, and background requirements, see <a href="section-fm.1a.html">FM.1: What This Book Covers</a>. Twenty tailored <a href="pathways/index.html">reading pathways</a> help you find the most relevant chapters for your goals.</p>
 
 <!-- ============================================================ -->
 <!-- HOW THIS BOOK WAS CREATED -->
 <!-- ============================================================ -->
 <h2>How This Book Was Created</h2>
 
-<p>This book was produced through a collaborative process between its human authors and a team of 42 specialized AI writing agents. The authors curated every chapter, validated all technical content, and made all editorial decisions; AI agents proposed initial drafts, generated code examples, created illustrations, and checked cross-references across the 38-chapter structure.</p>
+<p>This book was produced through a collaborative process between its human authors and a team of 42 specialized AI writing agents. The authors curated every chapter, validated all technical content, and made all editorial decisions; AI agents proposed initial drafts, generated code examples, created illustrations, and checked cross-references across the 39-chapter structure.</p>
 
 <p>Many of the book's illustrations were produced using Google Gemini's image generation capabilities, with prompts crafted by the authors and refined through iterative feedback. All diagrams and SVG figures were either hand-coded or generated and reviewed for technical accuracy.</p>

front-matter/index.html

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ <h1>Introduction, Pathways &amp; How to Use This Book</h1>
 <div class="overview">
 <h2>Overview of the Front Matter</h2>
 <p>
-Before you build anything, you need a map. This front matter orients you before you dive into the technical chapters. It answers four questions every reader has at the start: What does this book cover, and who is it for? How should I navigate 38 chapters and 11 parts given my background and goals? How can an instructor build a university course from this material? And what conventions, callout types, and recurring elements will I encounter on every page?
+Before you build anything, you need a map. This front matter orients you before you dive into the technical chapters. It answers four questions every reader has at the start: What does this book cover, and who is it for? How should I navigate 39 chapters and 11 parts given my background and goals? How can an instructor build a university course from this material? And what conventions, callout types, and recurring elements will I encounter on every page?
 </p>
 <p>
 Whether you plan to read cover to cover or jump straight to the chapters that match your role, spending 15 minutes here will save you hours of backtracking later. Each section below links to a dedicated page with full detail.

front-matter/section-fm.1a.html

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ <h1>What This Book Covers</h1>
 </blockquote>
 
 <p>
-Six months from now, you will be building AI systems that did not exist when you started reading. This book is a comprehensive, practitioner-oriented guide to the entire Large Language Model stack. It begins with the mathematical and conceptual foundations of machine learning, moves through the architecture and training of transformers, and culminates in the design, deployment, and governance of production AI agent systems. The journey spans 38 chapters (numbered 0 through 38) organized into eleven parts, plus 22 appendices covering frameworks, tools, and reference material.
+Six months from now, you will be building AI systems that did not exist when you started reading. This book is a comprehensive, practitioner-oriented guide to the entire Large Language Model stack. It begins with the mathematical and conceptual foundations of machine learning, moves through the architecture and training of transformers, and culminates in the design, deployment, and governance of production AI agent systems. The journey spans 39 chapters (numbered 0 through 38) organized into eleven parts, plus 22 appendices covering frameworks, tools, and reference material.
 </p>
 
 <p>

front-matter/section-fm.5.html

Lines changed: 2 additions & 2 deletions
@@ -47,7 +47,7 @@ <h2>The Production Philosophy</h2>
 </p>
 <p>
 This is not a gimmick. It is a deliberate architectural decision that enables: rapid iteration
-(a chapter can be produced and revised in hours, not months), consistent quality across all 38
+(a chapter can be produced and revised in hours, not months), consistent quality across all 39
 chapters (every chapter passes through the same 22 quality stages), and deep cross-referencing
 (agents can read and reference the entire book while writing any single section).
 </p>
@@ -79,7 +79,7 @@ <h2>The Human Role</h2>
 While the AI agents produce the content, human oversight plays a critical role at several points:
 </p>
 <ul>
-<li><strong>Book architecture:</strong> The overall structure (11 parts, 38 chapters, section breakdown) was designed by a human author with input from the Curriculum Architect agent.</li>
+<li><strong>Book architecture:</strong> The overall structure (11 parts, 39 chapters, section breakdown) was designed by a human author with input from the Curriculum Architect agent.</li>
 <li><strong>Quality standards:</strong> The conformance checklist, callout types, page layout standards, and CSS design system were human-defined, then enforced by agents.</li>
 <li><strong>Editorial judgment:</strong> Major decisions about scope (what to include/exclude), tone (technical but accessible), and audience (engineers, researchers, students) were human decisions.</li>
 <li><strong>Review and iteration:</strong> Every chapter is reviewed by a human who can request revisions, flag inaccuracies, or redirect emphasis.</li>

front-matter/syllabi/index.html

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ <h3>How to Use This Section</h3>
 <h2>University Course Syllabi</h2>
 
 <p>
-With 38 chapters, no single semester can cover everything. The following four syllabi are designed for instructors adopting this book for a single-semester (14-week) university course. Click any card for the complete week-by-week syllabus with hyperlinked chapter references.
+With 39 chapters, no single semester can cover everything. The following four syllabi are designed for instructors adopting this book for a single-semester (14-week) university course. Click any card for the complete week-by-week syllabus with hyperlinked chapter references.
 </p>
 
 <div class="course-grid">

part-1-foundations/module-00-ml-pytorch-foundations/section-0.1.html

Lines changed: 5 additions & 5 deletions
@@ -117,7 +117,7 @@ <h2>2. Supervised Learning: Classification and Regression <span class="level-bad
 
 <h3>Regression: Predicting Numbers</h3>
 
-<p>In <strong>regression</strong>, the output is a continuous number. Predicting house prices, stock returns, <a class="cross-ref" href="../module-05-decoding-text-generation/section-05.2.html">temperature</a>, or the probability that a user clicks an ad: all regression tasks. The model produces a numeric prediction, and we measure how far off it is from the true value.</p>
+<p>In <strong>regression</strong>, the output is a continuous number. Predicting house prices, stock returns, <a class="cross-ref" href="../module-05-decoding-text-generation/section-5.2.html">temperature</a>, or the probability that a user clicks an ad: all regression tasks. The model produces a numeric prediction, and we measure how far off it is from the true value.</p>
 
 <h3>Classification: Predicting Categories</h3>
 
@@ -143,7 +143,7 @@ <h3>The Supervised Learning Recipe</h3>
 
 <div class="callout note">
 <div class="callout-title">Three Learning Paradigms</div>
-<p><strong>Supervised learning</strong> requires human labels (input-output pairs). <strong>Unsupervised learning</strong> finds patterns in data without labels (clustering, dimensionality reduction). <strong>Self-supervised learning</strong> creates its own labels from the data: mask a word and predict it (<a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-06.1.html">BERT</a>), or predict the next word from all previous words (GPT). This is how every large language model is pre-trained. It is the reason LLMs can learn from the entire internet without human annotation.</p>
+<p><strong>Supervised learning</strong> requires human labels (input-output pairs). <strong>Unsupervised learning</strong> finds patterns in data without labels (clustering, dimensionality reduction). <strong>Self-supervised learning</strong> creates its own labels from the data: mask a word and predict it (<a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-6.1.html">BERT</a>), or predict the next word from all previous words (GPT). This is how every large language model is pre-trained. It is the reason LLMs can learn from the entire internet without human annotation.</p>
 </div>
 
 <h2>3. Loss Functions and Optimization <span class="level-badge intermediate" title="Intermediate">INTERMEDIATE</span></h2>
@@ -161,7 +161,7 @@ <h3>Loss Functions: Defining "Wrong"</h3>
 
 <p>Squaring the errors does two things: it makes all errors positive (so they do not cancel out), and it penalizes large errors more severely than small ones. A prediction that is off by 10 contributes 100 to the loss, while one that is off by 1 contributes just 1.</p>
 
-<p><strong>For classification</strong>, the standard is <strong><a class="cross-ref" href="../module-04-transformer-architecture/section-04.1.html">Cross-Entropy</a> Loss</strong>:</p>
+<p><strong>For classification</strong>, the standard is <strong><a class="cross-ref" href="../module-04-transformer-architecture/section-4.1.html">Cross-Entropy</a> Loss</strong>:</p>
 
 <div class="math-block">
 $$\mathcal{L} = -\frac{1}{n} \sum y_{i} \log(p_{i})$$
@@ -373,7 +373,7 @@ <h3>L2 Regularization (Ridge / Weight Decay)</h3>
 
 </div>
 
-<p>The hyperparameter <span class="math">$\lambda$</span> controls the strength of the penalty. Large weights are penalized quadratically, which pushes all weights toward smaller values without forcing them to zero. This is the most common regularization in deep learning, where it is called <strong><a class="cross-ref" href="section-0.2.html">weight decay</a></strong>. You will see weight decay appear again as a critical hyperparameter when <a class="cross-ref" href="../../part-4-training-adapting/module-14-fine-tuning-fundamentals/section-14.3.html">tuning fine-tuning hyperparameters in Chapter 13</a>.</p>
+<p>The hyperparameter <span class="math">$\lambda$</span> controls the strength of the penalty. Large weights are penalized quadratically, which pushes all weights toward smaller values without forcing them to zero. This is the most common regularization in deep learning, where it is called <strong><a class="cross-ref" href="section-0.2.html">weight decay</a></strong>. You will see weight decay appear again as a critical hyperparameter when <a class="cross-ref" href="../../part-4-training-adapting/module-14-fine-tuning-fundamentals/section-14.3.html">tuning fine-tuning hyperparameters in Chapter 14</a>.</p>
 
 <h3>L1 Regularization (Lasso)</h3>
 
@@ -502,7 +502,7 @@ <h3>Decomposing Prediction Error</h3>
 
 <div class="callout note">
 <div class="callout-title">Note</div>
-<p>Modern deep learning complicates the classical bias-variance tradeoff. Very large neural networks (including LLMs) are so overparameterized that they can memorize the training set perfectly, yet they still generalize well. This phenomenon, sometimes called "benign overfitting" or the "double descent" curve, is an active area of research that connects directly to <a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-06.2.html">scaling laws and the Chinchilla findings in Chapter 6</a>. The classical framework remains a valuable mental model, but reality is richer than the simple U-shaped curve suggests.</p>
+<p>Modern deep learning complicates the classical bias-variance tradeoff. Very large neural networks (including LLMs) are so overparameterized that they can memorize the training set perfectly, yet they still generalize well. This phenomenon, sometimes called "benign overfitting" or the "double descent" curve, is an active area of research that connects directly to <a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-6.2.html">scaling laws and the Chinchilla findings in Chapter 6</a>. The classical framework remains a valuable mental model, but reality is richer than the simple U-shaped curve suggests.</p>
 </div>
 
 <h2>6. Cross-Validation and Model Selection <span class="level-badge intermediate" title="Intermediate">INTERMEDIATE</span></h2>

0 commit comments
