<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<title>High-Level TOC - Building Vision and Audio AI with Deep Generative Models</title>
<style>
:root { --bg:#f5efe5; --paper:#fffdfa; --ink:#1d1f1b; --muted:#5d655b; --line:#d9d0c4; --accent:#0f5a48; --accent-2:#8e5622; --soft:#edf5f0; --soft-2:#f6efe4; --shadow:0 18px 54px rgba(29,31,27,.08); --max:1240px; } * { box-sizing:border-box; } body { margin:0; font-family:Georgia,"Times New Roman",serif; color:var(--ink); line-height:1.62; background:radial-gradient(circle at top right, rgba(15,90,72,.12), transparent 28%), linear-gradient(180deg, #f8f3eb 0%, #f2ecdf 100%); } .page { max-width:var(--max); margin:0 auto; padding:28px 20px 64px; } .hero { background:linear-gradient(135deg,#153a32 0%,#1d6955 100%); color:#f8f7f2; border-radius:30px; padding:40px 36px; box-shadow:var(--shadow); margin-bottom:22px; } .eyebrow { margin:0 0 10px; text-transform:uppercase; letter-spacing:.14em; font-size:.78rem; opacity:.86; } h1 { margin:0 0 12px; font-size:clamp(2.2rem,4vw,4.1rem); line-height:1.04; max-width:12ch; } .hero p { margin:0; max-width:80ch; color:rgba(248,247,242,.93); font-size:1.03rem; } .hero-note { margin-top:16px; display:inline-block; padding:8px 14px; border-radius:999px; background:rgba(255,255,255,.12); border:1px solid rgba(255,255,255,.18); font-size:.92rem; } .grid { display:grid; grid-template-columns:320px minmax(0,1fr); gap:22px; align-items:start; } .sidebar { position:sticky; top:16px; display:grid; gap:16px; } .panel, .part, .closing { background:var(--paper); border:1px solid var(--line); border-radius:24px; box-shadow:var(--shadow); } .panel { padding:18px; } .panel h2, .closing h2 { margin:0 0 10px; color:var(--accent); text-transform:uppercase; letter-spacing:.08em; font-size:.96rem; } .summary { list-style:none; margin:0; padding:0; display:grid; gap:10px; } .summary li { background:var(--soft-2); border-radius:14px; padding:10px 12px; font-size:.94rem; } .part-link { padding:12px 0; border-top:1px solid #e7ddd0; } .part-link:first-child { border-top:0; padding-top:0; } .part-link a { text-decoration:none; color:var(--accent); 
font-weight:700; } .part-link p { margin:6px 0 0; color:var(--muted); font-size:.92rem; } .part { padding:26px 24px; margin-bottom:22px; } .part-label { margin-bottom:8px; color:var(--accent-2); text-transform:uppercase; letter-spacing:.16em; font-size:.82rem; } .part h2 { margin:0 0 8px; font-size:2rem; line-height:1.1; } .part-intro { margin:0 0 18px; color:var(--muted); } .chapter { display:grid; grid-template-columns:76px minmax(0,1fr); gap:18px; padding:18px 0; border-top:1px solid #e7ddd0; } .chapter:first-of-type { border-top:0; padding-top:0; } .chapter-no { width:58px; height:58px; border-radius:18px; background:var(--soft); color:var(--accent); display:flex; align-items:center; justify-content:center; font-size:1.02rem; font-weight:700; } .chapter h3 { margin:0 0 6px; font-size:1.38rem; line-height:1.2; } .chapter-desc { margin:0 0 10px; color:var(--muted); } .meta { display:flex; flex-wrap:wrap; gap:8px; margin:10px 0 12px; } .tag { background:var(--soft); color:var(--accent); border-radius:999px; padding:5px 10px; font-size:.84rem; } .label { margin:14px 0 6px; color:var(--accent); font-weight:700; text-transform:uppercase; letter-spacing:.06em; font-size:.78rem; } ol, ul { margin:0 0 0 20px; padding:0; } li { margin-bottom:6px; } .pedagogy { display:grid; grid-template-columns:repeat(4,minmax(0,1fr)); gap:10px; margin-top:12px; } .pedagogy div { background:linear-gradient(180deg,#fcfaf5 0%,#f6efe4 100%); border:1px solid #e7dece; border-radius:16px; padding:10px 12px; font-size:.9rem; } .track-grid { display:grid; grid-template-columns:repeat(2,minmax(0,1fr)); gap:12px; margin-top:12px; } .track-box { background:linear-gradient(180deg,#f7fbf8 0%,#eef5f0 100%); border:1px solid #dce9e0; border-radius:16px; padding:12px 14px; } .track-box h4 { margin:0 0 8px; color:var(--accent); font-size:1rem; } .closing { padding:24px; } @media (max-width:980px) { .grid { grid-template-columns:1fr; } .sidebar { position:static; } .pedagogy { grid-template-columns:1fr 
1fr; } .track-grid { grid-template-columns:1fr; } }
.topbar { display:flex; flex-wrap:wrap; gap:10px; align-items:center; justify-content:space-between; margin-bottom:18px; }
.crumbs { display:flex; flex-wrap:wrap; gap:10px; }
.crumbs a, .inline-link { text-decoration:none; color:var(--accent); font-weight:700; }
.mini-nav { display:grid; gap:8px; }
.mini-nav a { text-decoration:none; color:var(--accent); }
.section-list { display:grid; gap:16px; }
.section-card { background:var(--paper); border:1px solid var(--line); border-radius:24px; box-shadow:var(--shadow); padding:24px; }
.section-card h2 { margin:0 0 8px; font-size:1.8rem; }
.section-card p { margin:0 0 12px; color:var(--muted); }
.chapter-list { margin:0; padding-left:20px; }
.chapter-list li { margin-bottom:6px; }
.jump { display:inline-block; margin-top:10px; text-decoration:none; color:var(--accent); font-weight:700; }
.hero-grid { display:grid; grid-template-columns:repeat(3,minmax(0,1fr)); gap:14px; margin-top:18px; }
.hero-card { background:rgba(255,255,255,.12); border:1px solid rgba(255,255,255,.18); border-radius:18px; padding:14px 16px; }
.hero-card strong { display:block; margin-bottom:6px; text-transform:uppercase; letter-spacing:.08em; font-size:.8rem; }
.hero-card span { color:rgba(248,247,242,.93); font-size:.95rem; }
@media (max-width:980px) { .hero-grid { grid-template-columns:1fr; } }
</style>
</head>
<body>
<div class="page">
<header class="hero">
<p class="eyebrow">Book Plan</p> <h1>Building Vision and Audio AI with Deep Generative Models</h1> <p><strong>Subtitle:</strong> Synthetic Data Engines for Training, Adapting, and Evaluating Modern Vision and Audio Systems. This book focuses on the modern models, pipelines, and workflows that matter when generative models are used to expand data, create hard cases, support adaptation, and improve downstream systems.</p> <div class="hero-note">Core question: how do generative models help build better vision and audio systems through synthetic data?</div>
<div class="hero-grid">
<div class="hero-card"><strong>Thesis</strong><span>Generative models are not only content generators. They are infrastructure for dataset expansion, rare-case creation, domain adaptation, and robust system training.</span></div>
<div class="hero-card"><strong>Reader Value</strong><span>Learn how to choose a synthetic-data strategy, curate outputs, train downstream models, evaluate honestly, and debug where the pipeline breaks.</span></div>
<div class="hero-card"><strong>What Makes It Distinct</strong><span>The book is organized around synthetic data use in practice, teaching, and research rather than around media generation alone.</span></div>
</div>
</header>
<div class="topbar">
<div class="crumbs">
<span class="tag">High-Level TOC</span>
<span class="tag">Section Folders</span>
<span class="tag">Cross-linked Navigation</span>
</div>
<div class="mini-nav">
<a class="inline-link" href="training-finetuning-toolkit-matrix.html">Training and Fine-Tuning Toolkit Matrix</a>
</div>
</div>
<div class="grid">
<aside class="sidebar">
<section class="panel">
<h2>Sections</h2>
<div class="part-link"><a href="front-matter/index.html">Front Matter</a><p>How This Book Is Organized</p></div>
<div class="part-link"><a href="part-1-orientation-and-fast-start/index.html">Part I</a><p>Problem Framing and Workflow Overview</p></div>
<div class="part-link"><a href="part-2-data-and-experimental-foundations/index.html">Part II</a><p>Data, Labels, and Evaluation Foundations</p></div>
<div class="part-link"><a href="part-3-vision-audio-and-generative-model-internals/index.html">Part III</a><p>Vision, Audio, and Generative Model Internals</p></div>
<div class="part-link"><a href="part-4-synthetic-data-design-and-operations/index.html">Part IV</a><p>Synthetic Data Pipelines</p></div>
<div class="part-link"><a href="part-5-training-multimodal-systems-and-operations/index.html">Part V</a><p>Training, Fine-Tuning, and Systems</p></div>
<div class="part-link"><a href="part-6-applications-frontiers-and-capstones/index.html">Part VI</a><p>Applications, Limits, and Capstones</p></div>
<div class="part-link"><a href="appendices/index.html">Appendices</a><p>Foundations and Reference</p></div>
</section>
<section class="panel">
<h2>How To Use</h2>
<ul class="summary">
<li><strong>High-Level TOC:</strong><br/>Use this page to understand the whole book structure.</li>
<li><strong>Section TOCs:</strong><br/>Open each section page for chapter-by-chapter detail.</li>
<li><strong>Cross-links:</strong><br/>Every section page links back here for fast navigation.</li>
</ul>
</section>
<section class="panel">
<h2>Core Question</h2>
<ul class="summary">
<li><strong>Main Thesis:</strong><br/>Use generative models to build better downstream systems through principled synthetic data generation and curation.</li>
<li><strong>Not The Goal:</strong><br/>This is neither a general survey of all vision and audio AI nor a book about media generation for its own sake.</li>
<li><strong>Main Workflow:</strong><br/>Find the data bottleneck, pick a synthetic-data strategy, generate and filter data, train the downstream model, evaluate honestly, and debug failures.</li>
</ul>
</section>
<section class="panel">
<h2>Presentation Plan</h2>
<ul class="summary">
<li><strong>1. Orient:</strong><br/>show the full workflow and define the problem space.</li>
<li><strong>2. Discipline:</strong><br/>establish data, annotation, and evaluation rigor.</li>
<li><strong>3. Explain Models:</strong><br/>teach representations, architectures, and model internals.</li>
<li><strong>4. Build Data:</strong><br/>design, generate, repurpose, and validate synthetic data.</li>
<li><strong>5. Train Systems:</strong><br/>fine-tune models and assemble retrieval or multimodal systems.</li>
<li><strong>6. Specialize:</strong><br/>map the toolkit into application blueprints and capstones.</li>
<li><strong>7. Extend:</strong><br/>use appendices for onboarding, self-study support, and research scaffolding.</li>
</ul>
</section>
<section class="panel">
<h2>Reader Paths</h2>
<ul class="summary">
<li><strong>Self-Study:</strong><br/>Follow Parts I-V, use the appendices on demand, and pick one recurring case study.</li>
<li><strong>Engineering:</strong><br/>Prioritize Parts I, II, IV, V, and the deployment/application chapters.</li>
<li><strong>Research:</strong><br/>Prioritize Parts II, III, V, VI, plus the course and capstone materials.</li>
</ul>
</section>
<section class="panel">
<h2>TOC Legend</h2>
<ul class="summary">
<li><strong>Core / Advanced:</strong><br/>Signals whether a chapter is required for most readers or intended as a deeper specialization.</li>
<li><strong>Undergraduate / Graduate:</strong><br/>Signals where the chapter fits most naturally in the suggested course pathways.</li>
<li><strong>Engineering / Research:</strong><br/>Signals whether the chapter primarily supports system building, research framing, or both.</li>
<li><strong>Companion Repo:</strong><br/>Signals that fast-moving tooling and implementation assets live primarily in notebooks and the repo.</li>
</ul>
</section>
<section class="panel">
<h2>Recurring Case Studies</h2>
<ul class="summary">
<li><strong>Industrial Inspection:</strong><br/>rare defects, synthetic vision data, line-side deployment, and failure slicing.</li>
<li><strong>Multilingual Speech Analytics:</strong><br/>ASR, diarization, synthetic speech augmentation, streaming inference, and governance.</li>
<li><strong>Multimodal Incident Review:</strong><br/>video, audio, retrieval, grounding, structured outputs, and human-in-the-loop triage.</li>
</ul>
</section>
<section class="panel">
<h2>Adoption Paths</h2>
<ul class="summary">
<li><strong>12-Week Course Maps:</strong><br/><a class="inline-link" href="course-pathways.html">Open undergraduate and graduate course pathways</a></li>
<li><strong>Dependency Graph:</strong><br/><a class="inline-link" href="dependency-graph.html">Open chapter dependencies and prerequisite routes</a></li>
<li><strong>Toolkit Matrix:</strong><br/><a class="inline-link" href="training-finetuning-toolkit-matrix.html">Open the training and fine-tuning matrix</a></li>
</ul>
</section>
</aside>
<main>
<section class="part">
<div class="part-label">Book Map</div>
<h2>High-Level Table of Contents</h2>
<p class="part-intro">This page is the overview map of the book. Each major section lives in its own folder with a dedicated detailed TOC page.</p>
<div class="section-list">
<section class="section-card" id="front">
<div class="part-label">Front Matter</div>
<h2>How This Book Is Organized</h2>
<p>Defines the thesis, audience, reading plan, course pathways, case-study spine, and companion assets so the rest of the manuscript reads like one coherent argument rather than a loose topic collection.</p>
<div class="label">Presentation Role</div>
<p>Defines the reading logic before the technical material starts.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>FM0</strong> Preface: Why This Book, Why Now - the thesis and reader payoff.</li>
<li><strong>FM1</strong> Who This Book Is For - preparation level, entry paths, and adoption fit.</li>
<li><strong>FM2</strong> How This Book Is Organized, Read, and Taught - the main reading arc and chapter template.</li>
<li><strong>FM3</strong> Suggested Course Syllabuses - four 12-week teaching routes.</li>
<li><strong>FM4</strong> Recurring Case Studies and Project Spine - the end-to-end systems used throughout the book.</li>
<li><strong>FM5</strong> Companion Assets, Notebooks, and Instructor Kit - what lives in print versus repo.</li>
</ol>
<div class="label">Practical Assets</div>
<ul class="chapter-list">
<li>Reader-path guide, dependency logic, and course-adoption framing</li>
<li>Case-study spine for industrial inspection, multilingual speech analytics, and multimodal incident review</li>
<li><a class="inline-link" href="course-pathways.html">Four 12-week suggested course syllabuses</a></li>
<li><a class="inline-link" href="dependency-graph.html">Dependency graph for self-study and course planning</a></li>
</ul>
<a class="jump" href="front-matter/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part1">
<div class="part-label">Part I</div>
<h2>Problem Framing and Workflow Overview</h2>
<p>Introduces the central workflow and gives readers a complete small-scale loop before the book moves into method depth.</p>
<div class="label">Presentation Role</div>
<p>Orient the reader by showing the whole workflow before introducing complexity.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>01</strong> What This Book Will Let You Build - scope, thesis, and downstream system targets. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>02</strong> A Complete Synthetic-Data Pipeline in Miniature - one full generate-train-evaluate loop. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>03</strong> Task Taxonomy and Success Criteria - what is being built and how success is measured. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>04</strong> The Modern Toolbox Without the Noise - the core stack for data, models, evaluation, and deployment. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>01_problem-framing-studio.ipynb</code></li>
<li><code>02_end-to-end-mini-pipeline.ipynb</code></li>
<li><code>04_toolchain-smoke-test.ipynb</code></li>
</ul>
<div class="label">Case Study Thread</div>
<p>Introduces the industrial inspection, multilingual speech, and multimodal incident-review projects that recur across the rest of the book.</p>
<a class="jump" href="part-1-orientation-and-fast-start/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part2">
<div class="part-label">Part II</div>
<h2>Data, Labels, and Evaluation Foundations</h2>
<p>Builds the discipline needed to make synthetic-data workflows credible rather than anecdotal.</p>
<div class="label">Presentation Role</div>
<p>Discipline the workflow so later experiments, training runs, and claims are credible.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>05</strong> Data Rights, Provenance, and Responsible Acquisition - what data can be used and how. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>06</strong> Dataset Design, Schemas, and Versioning - how examples and labels are structured. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>07</strong> Annotation Systems, Weak Supervision, and Auto-Labeling - how labels are created or repaired. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>08</strong> Baselines, Ablations, and Credible Claims - how to measure effect honestly. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>06_schema-designer.ipynb</code></li>
<li><code>07_auto-label-and-review-loop.ipynb</code></li>
<li><code>08_ablation-planner.ipynb</code></li>
</ul>
<div class="label">Boundary Notes</div>
<p>Chapter 7 covers label creation and repair; acceptance of synthetic assets lives in Chapter 19; live-system telemetry and drift live in Chapter 25.</p>
<a class="jump" href="part-2-data-and-experimental-foundations/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part3">
<div class="part-label">Part III</div>
<h2>Vision, Audio, and Generative Model Internals</h2>
<p>Separates the model story into four clear tracks: vision models, audio models, generative models for vision, and generative models for audio, then closes with debugging and model selection.</p>
<div class="label">Presentation Role</div>
<p>Explain the internal mechanics of the core model families by modality and by generative role before later chapters turn them into synthetic-data and downstream training workflows.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>09</strong> Vision Models - encoders, task heads, and foundation-model internals for images, video, documents, and grounding. <span class="tag">Advanced</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>10</strong> Audio Models - encoders, speech systems, diarization stacks, and audio-language model internals. <span class="tag">Advanced</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>11</strong> Generative Models for Vision - diffusion, editing, video generation, simulation, and controllable synthesis. <span class="tag">Advanced</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>12</strong> Generative Models for Audio - speech, sound, codec-token, TTS, and voice-generation pipelines. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>13</strong> Pre-Deployment Debugging for Modern Multimodal Models - how to locate failure causes before launch. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>09_vision-models-lab.ipynb</code></li>
<li><code>10_audio-models-lab.ipynb</code></li>
<li><code>11_vision-generators-lab.ipynb</code></li>
<li><code>12_audio-generators-lab.ipynb</code></li>
</ul>
<div class="label">Deep-Dive Promise</div>
<p>Every technical chapter in this part calls out inner workings, an algorithm sketch, intuition, tradeoffs, common failure modes, figures, and a worked example.</p>
<a class="jump" href="part-3-vision-audio-and-generative-model-internals/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part4">
<div class="part-label">Part IV</div>
<h2>Synthetic Data Pipelines</h2>
<p>Shows how to choose, build, filter, and validate synthetic-data workflows for image, video, and audio tasks.</p>
<div class="label">Presentation Role</div>
<p>Build the synthetic-data engine that will feed later training, evaluation, and iteration.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>14</strong> Designing the Synthetic Data Engine - choose augmentation, synthesis, simulation, or adaptation. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>15</strong> Image Data: Generation, Editing, and Repurposing - image-side synthetic data workflows. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>16</strong> Video Data: Temporal Labels, Tracks, and Synthetic Clips - temporal structure and video QA. <span class="tag">Advanced</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>17</strong> Audio Data: Speech, Events, TTS, Diarization, and Voice Transformation - synthetic audio workflows for downstream tasks. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>18</strong> Simulation, Procedural Data, and Real-to-Sim-to-Real Pipelines - when simulation beats prompting. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>19</strong> Synthetic Data Operations: QA, Filtering, Reward Models, and Judge Loops - deciding what enters the dataset. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>20</strong> Synthetic Data for Training, Evaluation, Stress Testing, and Challenge Sets - separating the three main uses of synthetic data. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>15_diffusion-data-recipes.ipynb</code></li>
<li><code>17_audio-synthetic-data-lab.ipynb</code></li>
<li><code>19_generate-and-judge-ops.ipynb</code></li>
</ul>
<div class="label">Case Study Thread</div>
<p>The same three recurring case studies are revisited here through image generation, video labeling, speech augmentation, and simulation-heavy data engines.</p>
<a class="jump" href="part-4-synthetic-data-design-and-operations/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part5">
<div class="part-label">Part V</div>
<h2>Training, Fine-Tuning, and Systems</h2>
<p>Turns curated synthetic data into trained or adapted downstream systems, then shows how to deploy and monitor them responsibly.</p>
<div class="label">Presentation Role</div>
<p>Train, fine-tune, assemble, deploy, and monitor the systems enabled by the earlier parts.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>21</strong> Fine-Tuning Vision and Video Models - adapting downstream visual task models. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>22</strong> Fine-Tuning Audio Models and Streaming Speech Systems - adapting speech and audio task models. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>23</strong> Building Multimodal Retrieval, RAG, and Agent Systems - combining models into usable products. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>24</strong> Inference Engineering, Deployment, and Cost Control - making systems fast, stable, and affordable. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>25</strong> Monitoring, Drift, and Failure Analysis in the Wild - closing the loop after deployment. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>21_vision-video-finetuning-lab.ipynb</code></li>
<li><code>23_multimodal-system-builder.ipynb</code></li>
<li><code>24_inference-benchmark-lab.ipynb</code></li>
</ul>
<div class="label">Boundary Notes</div>
<p>Chapter 19 decides whether synthetic assets enter the dataset; Chapter 25 applies only once a system is live and generating production traces, drift signals, and incidents.</p>
<a class="jump" href="part-5-training-multimodal-systems-and-operations/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="part6">
<div class="part-label">Part VI</div>
<h2>Applications, Limits, and Capstones</h2>
<p>Shows how the workflow plays out in concrete domains, where it fails, and how it can become a capstone or research contribution.</p>
<div class="label">Presentation Role</div>
<p>Specialize the workflow into domains, then turn it into capstones, teaching assets, and research directions.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>26</strong> Application Blueprint: Industrial Inspection and Manufacturing - vision systems under rare-case pressure. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>27</strong> Application Blueprint: Speech, Audio Analytics, and Voice Systems - audio systems under domain and governance pressure. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>28</strong> Application Blueprint: Robotics, Accessibility, and Multimodal Incident Systems - multimodal and simulation-heavy deployments. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>29</strong> Research Frontiers and Open Questions - what is still unresolved. <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Research</span></li>
<li><strong>30</strong> Anti-Patterns, Failed Designs, and When Not to Use Synthetic Data - where the thesis breaks. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>31</strong> Capstones, Replication Studies, and Paper-Style Contributions - turning the workflow into a deliverable. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>32</strong> Teaching Studios, Assessment Design, and Course Reuse - reusing the book in classrooms. <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Labs And Notebooks</div>
<ul class="chapter-list">
<li><code>28_multimodal-evidence-demo</code></li>
<li><code>31_capstone-starter-pack.ipynb</code></li>
<li><code>32_course-reuse-kit.ipynb</code></li>
</ul>
<div class="label">Case Study Thread</div>
<p>Closes the loop by turning the recurring case studies into domain blueprints, research questions, capstones, and teaching modules.</p>
<a class="jump" href="part-6-applications-frontiers-and-capstones/index.html">Open Detailed Section TOC</a>
</section>
<section class="section-card" id="appendix">
<div class="part-label">Appendices</div>
<h2>Foundations and Reference</h2>
<p>The appendices keep the main text focused while still supporting readers who need review material, environment guidance, or cloud provisioning help. Together they make the book self-contained for readers with basic engineering knowledge by covering Python, data handling, deep learning foundations, PyTorch, media formats, setup, and research workflow support.</p>
<div class="label">Presentation Role</div>
<p>Extend the main arc with onboarding, reference material, and optional support for independent learners.</p>
<div class="label">Chapter Map</div>
<ol class="chapter-list">
<li><strong>A</strong> Python, Data Handling, and Scientific Computing Primer <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>B</strong> Basic Mathematics Refresher <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>C</strong> Deep Learning Foundations and Training Patterns <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>D</strong> PyTorch Tutorial and Workflow Primer <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>E</strong> Media, Dataset Formats, and Annotation File Primer <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>F</strong> Installation Instructions and Local Environment Setup <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span></li>
<li><strong>G</strong> Provisioning Cloud Models and Hosted Infrastructure <span class="tag">Advanced</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>H</strong> Audio Companion and Modular Learning Guide <span class="tag">Core</span> <span class="tag">Undergrad</span> <span class="tag">Graduate</span> <span class="tag">Eng</span> <span class="tag">Research</span></li>
<li><strong>I</strong> Research Methods Companion Appendix <span class="tag">Core</span> <span class="tag">Graduate</span> <span class="tag">Research</span></li>
</ol>
<div class="label">Practical Assets</div>
<ul class="chapter-list">
<li>Math refresher, PyTorch primer, setup guide, and cloud provisioning notes</li>
</ul>
<a class="jump" href="appendices/index.html">Open Detailed Section TOC</a>
</section>
</div>
</section>
</main>
</div>
</div>
</body>
</html>