
Commit 4a6aae0

apartsinclaude committed
WAVE 6 partial: review/audit fixes before rate limit
5 parallel agents applied agents #01, #04, #05, #10, #11, #20, #28:

- 105 files modified with review/audit fixes
- Learning objectives, factual checks, misconception callouts
- Rate limit hit; WAVE 6 needs re-run for remaining files
- 319 insertions, 249 deletions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4465e22 commit 4a6aae0

105 files changed

Lines changed: 319 additions & 249 deletions


front-matter/about-book.html

Lines changed: 2 additions & 2 deletions
@@ -27,14 +27,14 @@ <h2>At a Glance</h2>
 
 <p>Whether you want to build your first RAG pipeline, ship an AI agent to production, or make strategic decisions about LLM adoption at your organization, this book meets you where you are. It is for software engineers, ML practitioners, researchers, product leaders, domain specialists, and educators who want to understand, build, and deploy systems powered by large language models. It assumes familiarity with Python and basic linear algebra; appendices cover the remaining prerequisites.</p>
 
-<p>The book spans <strong>38 chapters</strong> in 11 parts, plus <strong>22 appendices</strong> (A through V) with framework tutorials, and a <a href="../capstone/index.html">capstone project</a>. For the full chapter map, dependency diagram, audience details, and background requirements, see <a href="section-fm.1a.html">FM.1: What This Book Covers</a>. Twenty tailored <a href="pathways/index.html">reading pathways</a> help you find the most relevant chapters for your goals.</p>
+<p>The book spans <strong>39 chapters</strong> (numbered 0 through 38) in 11 parts, plus <strong>22 appendices</strong> (A through V) with framework tutorials, and a <a href="../capstone/index.html">capstone project</a>. For the full chapter map, dependency diagram, audience details, and background requirements, see <a href="section-fm.1a.html">FM.1: What This Book Covers</a>. Twenty tailored <a href="pathways/index.html">reading pathways</a> help you find the most relevant chapters for your goals.</p>
 
 <!-- ============================================================ -->
 <!-- HOW THIS BOOK WAS CREATED -->
 <!-- ============================================================ -->
 <h2>How This Book Was Created</h2>
 
-<p>This book was produced through a collaborative process between its human authors and a team of 42 specialized AI writing agents. The authors curated every chapter, validated all technical content, and made all editorial decisions; AI agents proposed initial drafts, generated code examples, created illustrations, and checked cross-references across the 38-chapter structure.</p>
+<p>This book was produced through a collaborative process between its human authors and a team of 42 specialized AI writing agents. The authors curated every chapter, validated all technical content, and made all editorial decisions; AI agents proposed initial drafts, generated code examples, created illustrations, and checked cross-references across the 39-chapter structure.</p>
 
 <p>Many of the book's illustrations were produced using Google Gemini's image generation capabilities, with prompts crafted by the authors and refined through iterative feedback. All diagrams and SVG figures were either hand-coded or generated and reviewed for technical accuracy.</p>

front-matter/index.html

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ <h1>Introduction, Pathways &amp; How to Use This Book</h1>
 <div class="overview">
 <h2>Overview of the Front Matter</h2>
 <p>
-Before you build anything, you need a map. This front matter orients you before you dive into the technical chapters. It answers four questions every reader has at the start: What does this book cover, and who is it for? How should I navigate 38 chapters and 11 parts given my background and goals? How can an instructor build a university course from this material? And what conventions, callout types, and recurring elements will I encounter on every page?
+Before you build anything, you need a map. This front matter orients you before you dive into the technical chapters. It answers four questions every reader has at the start: What does this book cover, and who is it for? How should I navigate 39 chapters and 11 parts given my background and goals? How can an instructor build a university course from this material? And what conventions, callout types, and recurring elements will I encounter on every page?
 </p>
 <p>
 Whether you plan to read cover to cover or jump straight to the chapters that match your role, spending 15 minutes here will save you hours of backtracking later. Each section below links to a dedicated page with full detail.

front-matter/section-fm.1a.html

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ <h1>What This Book Covers</h1>
 </blockquote>
 
 <p>
-Six months from now, you will be building AI systems that did not exist when you started reading. This book is a comprehensive, practitioner-oriented guide to the entire Large Language Model stack. It begins with the mathematical and conceptual foundations of machine learning, moves through the architecture and training of transformers, and culminates in the design, deployment, and governance of production AI agent systems. The journey spans 38 chapters (numbered 0 through 38) organized into eleven parts, plus 22 appendices covering frameworks, tools, and reference material.
+Six months from now, you will be building AI systems that did not exist when you started reading. This book is a comprehensive, practitioner-oriented guide to the entire Large Language Model stack. It begins with the mathematical and conceptual foundations of machine learning, moves through the architecture and training of transformers, and culminates in the design, deployment, and governance of production AI agent systems. The journey spans 39 chapters (numbered 0 through 38) organized into eleven parts, plus 22 appendices covering frameworks, tools, and reference material.
 </p>
 
 <p>

front-matter/section-fm.5.html

Lines changed: 2 additions & 2 deletions
@@ -47,7 +47,7 @@ <h2>The Production Philosophy</h2>
 </p>
 <p>
 This is not a gimmick. It is a deliberate architectural decision that enables: rapid iteration
-(a chapter can be produced and revised in hours, not months), consistent quality across all 38
+(a chapter can be produced and revised in hours, not months), consistent quality across all 39
 chapters (every chapter passes through the same 22 quality stages), and deep cross-referencing
 (agents can read and reference the entire book while writing any single section).
 </p>
@@ -79,7 +79,7 @@ <h2>The Human Role</h2>
 While the AI agents produce the content, human oversight plays a critical role at several points:
 </p>
 <ul>
-<li><strong>Book architecture:</strong> The overall structure (11 parts, 38 chapters, section breakdown) was designed by a human author with input from the Curriculum Architect agent.</li>
+<li><strong>Book architecture:</strong> The overall structure (11 parts, 39 chapters, section breakdown) was designed by a human author with input from the Curriculum Architect agent.</li>
 <li><strong>Quality standards:</strong> The conformance checklist, callout types, page layout standards, and CSS design system were human-defined, then enforced by agents.</li>
 <li><strong>Editorial judgment:</strong> Major decisions about scope (what to include/exclude), tone (technical but accessible), and audience (engineers, researchers, students) were human decisions.</li>
 <li><strong>Review and iteration:</strong> Every chapter is reviewed by a human who can request revisions, flag inaccuracies, or redirect emphasis.</li>

front-matter/syllabi/index.html

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ <h3>How to Use This Section</h3>
 <h2>University Course Syllabi</h2>
 
 <p>
-With 38 chapters, no single semester can cover everything. The following four syllabi are designed for instructors adopting this book for a single-semester (14-week) university course. Click any card for the complete week-by-week syllabus with hyperlinked chapter references.
+With 39 chapters, no single semester can cover everything. The following four syllabi are designed for instructors adopting this book for a single-semester (14-week) university course. Click any card for the complete week-by-week syllabus with hyperlinked chapter references.
 </p>
 
 <div class="course-grid">

part-1-foundations/module-00-ml-pytorch-foundations/section-0.1.html

Lines changed: 5 additions & 5 deletions
@@ -117,7 +117,7 @@ <h2>2. Supervised Learning: Classification and Regression <span class="level-bad
 
 <h3>Regression: Predicting Numbers</h3>
 
-<p>In <strong>regression</strong>, the output is a continuous number. Predicting house prices, stock returns, <a class="cross-ref" href="../module-05-decoding-text-generation/section-05.2.html">temperature</a>, or the probability that a user clicks an ad: all regression tasks. The model produces a numeric prediction, and we measure how far off it is from the true value.</p>
+<p>In <strong>regression</strong>, the output is a continuous number. Predicting house prices, stock returns, <a class="cross-ref" href="../module-05-decoding-text-generation/section-5.2.html">temperature</a>, or the probability that a user clicks an ad: all regression tasks. The model produces a numeric prediction, and we measure how far off it is from the true value.</p>
 
 <h3>Classification: Predicting Categories</h3>
 
@@ -143,7 +143,7 @@ <h3>The Supervised Learning Recipe</h3>
 
 <div class="callout note">
 <div class="callout-title">Three Learning Paradigms</div>
-<p><strong>Supervised learning</strong> requires human labels (input-output pairs). <strong>Unsupervised learning</strong> finds patterns in data without labels (clustering, dimensionality reduction). <strong>Self-supervised learning</strong> creates its own labels from the data: mask a word and predict it (<a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-06.1.html">BERT</a>), or predict the next word from all previous words (GPT). This is how every large language model is pre-trained. It is the reason LLMs can learn from the entire internet without human annotation.</p>
+<p><strong>Supervised learning</strong> requires human labels (input-output pairs). <strong>Unsupervised learning</strong> finds patterns in data without labels (clustering, dimensionality reduction). <strong>Self-supervised learning</strong> creates its own labels from the data: mask a word and predict it (<a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-6.1.html">BERT</a>), or predict the next word from all previous words (GPT). This is how every large language model is pre-trained. It is the reason LLMs can learn from the entire internet without human annotation.</p>
 </div>
 
 <h2>3. Loss Functions and Optimization <span class="level-badge intermediate" title="Intermediate">INTERMEDIATE</span></h2>
@@ -161,7 +161,7 @@ <h3>Loss Functions: Defining "Wrong"</h3>
 
 <p>Squaring the errors does two things: it makes all errors positive (so they do not cancel out), and it penalizes large errors more severely than small ones. A prediction that is off by 10 contributes 100 to the loss, while one that is off by 1 contributes just 1.</p>
 
-<p><strong>For classification</strong>, the standard is <strong><a class="cross-ref" href="../module-04-transformer-architecture/section-04.1.html">Cross-Entropy</a> Loss</strong>:</p>
+<p><strong>For classification</strong>, the standard is <strong><a class="cross-ref" href="../module-04-transformer-architecture/section-4.1.html">Cross-Entropy</a> Loss</strong>:</p>
 
 <div class="math-block">
 $$\mathcal{L} = -\frac{1}{n} \sum y_{i} \log(p_{i})$$
@@ -373,7 +373,7 @@ <h3>L2 Regularization (Ridge / Weight Decay)</h3>
 
 </div>
 
-<p>The hyperparameter <span class="math">$\lambda$</span> controls the strength of the penalty. Large weights are penalized quadratically, which pushes all weights toward smaller values without forcing them to zero. This is the most common regularization in deep learning, where it is called <strong><a class="cross-ref" href="section-0.2.html">weight decay</a></strong>. You will see weight decay appear again as a critical hyperparameter when <a class="cross-ref" href="../../part-4-training-adapting/module-14-fine-tuning-fundamentals/section-14.3.html">tuning fine-tuning hyperparameters in Chapter 13</a>.</p>
+<p>The hyperparameter <span class="math">$\lambda$</span> controls the strength of the penalty. Large weights are penalized quadratically, which pushes all weights toward smaller values without forcing them to zero. This is the most common regularization in deep learning, where it is called <strong><a class="cross-ref" href="section-0.2.html">weight decay</a></strong>. You will see weight decay appear again as a critical hyperparameter when <a class="cross-ref" href="../../part-4-training-adapting/module-14-fine-tuning-fundamentals/section-14.3.html">tuning fine-tuning hyperparameters in Chapter 14</a>.</p>
 
 <h3>L1 Regularization (Lasso)</h3>
 
@@ -502,7 +502,7 @@ <h3>Decomposing Prediction Error</h3>
 
 <div class="callout note">
 <div class="callout-title">Note</div>
-<p>Modern deep learning complicates the classical bias-variance tradeoff. Very large neural networks (including LLMs) are so overparameterized that they can memorize the training set perfectly, yet they still generalize well. This phenomenon, sometimes called "benign overfitting" or the "double descent" curve, is an active area of research that connects directly to <a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-06.2.html">scaling laws and the Chinchilla findings in Chapter 6</a>. The classical framework remains a valuable mental model, but reality is richer than the simple U-shaped curve suggests.</p>
+<p>Modern deep learning complicates the classical bias-variance tradeoff. Very large neural networks (including LLMs) are so overparameterized that they can memorize the training set perfectly, yet they still generalize well. This phenomenon, sometimes called "benign overfitting" or the "double descent" curve, is an active area of research that connects directly to <a class="cross-ref" href="../../part-2-understanding-llms/module-06-pretraining-scaling-laws/section-6.2.html">scaling laws and the Chinchilla findings in Chapter 6</a>. The classical framework remains a valuable mental model, but reality is richer than the simple U-shaped curve suggests.</p>
 </div>
 
 <h2>6. Cross-Validation and Model Selection <span class="level-badge intermediate" title="Intermediate">INTERMEDIATE</span></h2>

0 commit comments
