Add 'Workflows in the AI era' side quest #679
Open
pinin4fjords wants to merge 12 commits into master
Conversation
✅ Deploy Preview for nextflow-training ready!
Contributor
Nextflow linting complete! ❌ 4 files had 21 errors
adamrtalbot (Collaborator) reviewed Jan 13, 2026 and left a comment:
Great start! I have some comments:
- reduce the gushing over workflows a bit; show, don't tell
- reduce the rate of hyphens
- we shouldn't over-explain how a Nextflow process works. Just highlight the features (inputs, outputs, script, software packaging). If a user wants to learn how to use Nextflow, they should move to hello-nextflow afterwards.
- **Efficient parallelization** - Independent tasks run simultaneously, so analysis completes in hours, not days.
- **Resource awareness** - Respects memory and CPU limits. No crashed jobs or killed processes.
- **Failure recovery** - Can resume from where it stopped. A single failure doesn't waste hours of completed work.
- **Portability** - Runs on laptop, cluster, or cloud with the same code.

#### 1.3.4. Add Salmon Index Download

Salmon needs a pre-built index of the reference transcriptome. We'll download a pre-built index (to save time) only if it doesn't already exist. This avoids re-downloading for every sample.
Collaborator
There's possibly a lesson on conditional logic here, which is easier in a workflow language.
Collaborator
Well, ish. It may be trickier conceptually in Nextflow, but it's more robust (detecting the directory vs the object existing).
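For readers following along, a minimal Nextflow sketch of the conditional under discussion, reusing an existing index directory and downloading otherwise; `params.index_dir` and `DOWNLOAD_SALMON_INDEX` are illustrative names, not the tutorial's actual code:

```groovy
// Sketch only: test for the directory on disk, as discussed above.
// params.index_dir and DOWNLOAD_SALMON_INDEX are hypothetical names.
workflow {
    index_ch = file(params.index_dir).exists()
        ? channel.fromPath(params.index_dir)   // directory already present
        : DOWNLOAD_SALMON_INDEX()              // process emits the index dir
}
```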
Collaborator
Author
Changes addressing review feedback

Adam's specific comments addressed:
Additional changes per feedback:
Unclear comments (left as-is for clarification):
Resets the PR branch onto current master and places the side quest content under the new docs/en/ subtree at workflows_in_the_ai_era/ to match the dir-per-quest layout used by the rest of the side quests, and to make the AI-era framing visible in the file structure. Filename + nav layout only; content rewrite, publishDir migration, meta-map removal, and lint fixes follow in subsequent commits. [skip ci] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop publishDir directives from all three modules
- Drop params.outdir; replace with workflow-level output { } block
in main.nf using publish: declarations (sketched after this commit
message)
- Replace tuple val(meta), path(reads) with tuple val(id),
path(reads); meta maps are the metadata side quest's territory
- Switch deprecated `Channel` to lowercase `channel`
- Trim explanatory comments from modules; the markdown does the
teaching
- Tidy nextflow.config: trace/timeline/report write under
pipeline_info/, profiles cleaned up
Solution copy passes nextflow lint with zero errors. Starter copy
keeps ??? placeholders (CI lints only solutions/, per
.github/workflows/nextflow-lint.yml).
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
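For context, a minimal sketch of the publish/output pattern this commit migrates to, assuming a single hypothetical FASTQC module; the PR's actual main.nf wires three modules and a different container:

```groovy
// Sketch, not the PR's code. The module takes the tuple val(id),
// path(reads) shape this commit standardises on, and has no publishDir.
process FASTQC {
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0' // tag illustrative

    input:
    tuple val(id), path(reads)

    output:
    path "*_fastqc*", emit: reports

    script:
    """
    fastqc ${reads}
    """
}

workflow {
    main:
    reads_ch = channel.fromFilePairs('data/*_{1,2}.fastq.gz')
    FASTQC(reads_ch)

    publish:
    fastqc_reports = FASTQC.out.reports
}

output {
    fastqc_reports {
        path 'fastqc'   // subdirectory under whatever -output-dir names
    }
}
```

Run as `nextflow run main.nf -output-dir results`; once a `publish:` section exists, the explicit `main:` label is required.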
- New title and opening anchor: lead with the question newcomers
actually arrive with ('why a workflow tool when an agent can run my
analysis or generate the pipeline for me?'), and answer with the
durable-artefact case in three short paragraphs
- Reframe the 'good analysis pipeline' list as the properties an AI
assistant doesn't give you for free
- Apply repo heading numbering (## 1., ### 1.1., #### 1.1.1.) across
the page; passes check_headings.py
- Replace each Part 1 'Reflection' / advocacy paragraph with a short
'### Takeaway' that names what the agent's ad-hoc execution didn't
give you (provenance, throttling, version pinning, resume)
- Update Part 2 'Contrast with scripts' tips to 'Contrast with the
agent's script' and add the agent-on-its-own clause to each
- Migrate Part 2 process examples to drop publishDir; show the
workflow-level publish: + output {} block in main.nf and recommend
-output-dir on the CLI
- Update path references: side-quests/workflow_management_fundamentals
-> side-quests/workflows_in_the_ai_era; ../hello_nextflow ->
../../hello_nextflow (matches the dir-per-quest layout)
- Add 'Learning goals' and 'Prerequisites' subsections near the top to
match other side quests
- Add new section 5 'But what if the AI writes the Nextflow?' that
closes the loop on the second half of the user's question, with the
punchline 'because Claude needs them too'
- Restructure Summary as 'Key patterns' (pinned container, publish/
output block, -output-dir, -resume, profiles) plus 'Additional
resources' and a 'What's next?' link back to side_quests/index.md
- Strip em-dashes (CLAUDE.md style); replace with periods, colons, or
semicolons
- Drop tag '$meta.id' references in markdown to match the modules
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add --platform=linux/amd64 to docker.runOptions, mirroring
hello-nf-core. Required for FastQC's bioconda image to run on
Apple Silicon (the JVM SIGSEGVs under qemu without it).
- Drop the timeline/report/trace blocks. They were writing into
pipeline_info/ relative to the launch dir, polluting the source
tree and tripping 'file already exists' on -resume. They aren't
load-bearing for this side quest's pedagogy; if a learner wants
them they can pass -with-report -with-timeline -with-trace.
Verified end-to-end: solution pipeline completes 10 tasks in ~1m on
docker; -resume is a full cache hit; outputs land at
results/{fastqc,fastp,salmon}/.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
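A sketch of what the resulting docker block plausibly looks like in nextflow.config (the exact file isn't reproduced in this thread):

```groovy
// Force the amd64 image variant under Docker, mirroring hello-nf-core.
// Without it, FastQC's JVM segfaults under qemu on Apple Silicon.
docker {
    enabled    = true
    runOptions = '--platform=linux/amd64'
}
```

Learners who do want execution reports can still opt in per run with `-with-report`, `-with-timeline`, and `-with-trace`.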
- Correct hl_lines on three Nextflow code blocks: avoid highlighting
structural keywords (input:/output:), highlight the meaningful
content lines, and don't run past the end of the snippet.
  - fastp.nf: "5 8-10 13-21" -> "6 9-11 15-21"
  - salmon.nf: "8-9 12 16-22" -> "9-10 13 17-23"
  - main.nf: "1 4-10 13-30" -> "1 3-9 12-18"
- Rename two subsection labels back from 'Takeaway' to 'The problem:
sequential execution' and 'The hidden problem' (they discuss new
problems, not summarise what was learned).
- Update side-quests/workflows_in_the_ai_era/README.md to match the
new title, the new solution path, and the AI-era framing.

[skip ci]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
force-pushed a41659b to b67f0fe
Two findings from walking the tutorial as a learner inside the
training container:
1. Section 3.1.3 documented "ls -la bash/" but showed tree-style
output. ls -la wouldn't produce that. Switched to "ls bash/" with
the actual one-line output, then added a short bullet list giving
each script's role (which was the value of the original tree
listing).
2. The mamba install is ~3 minutes. The previous structure had it as
a hard wall in 3.2 before any teaching happened. Restructured:
- 3.2 is now a short "Kick off the tool install" section: one
paragraph + the install command, with a note that the activation
and version check come later.
- The activate + version check + "remember to re-activate" warning
moved to a new 3.3.7 "Activate the env and verify it's ready",
placed right before the learner actually needs to run anything.
- The install-pain takeaway folded into 3.3.9 (Takeaway), which
now mentions provenance for the conda env solve.
Net: the learner kicks off the install, reads through 3.3.1-3.3.6
building the script (which takes longer than 3 minutes), and by
the time they need to run, the env is ready. No idle waiting.
Verified end-to-end: mamba install (~3 min), activate, fastp + salmon
ran on a real sample inside the training container. FastQC's JVM
crashes under x86 emulation on Apple Silicon Docker without Rosetta;
that's a Mac-host issue, not a tutorial bug, since Codespaces is
linux/amd64 native.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous framing implied 'agent produces bash, you produce
Nextflow'. In reality the same agent writes either form just as
readily. The lesson is not 'AI vs no-AI' but 'what abstraction is
worth pointing AI at'.

- Opening: leads with 'agent will produce either bash or Nextflow;
question is which artefact is worth keeping'. Both forms are
AI-authored options; the engineering virtues come from the
abstraction, not the author.
- Section 1 properties intro: 'properties any agent producing bash
has to remember to write, every time, and the properties a workflow
tool supplies structurally'.
- 3.3.9 takeaway: bash script is what the agent might hand you for a
quick analysis; everything past 'ran once on my laptop' is the
agent's responsibility separately. Asking the same agent for a
workflow is what fixes it.
- 3.5.3 hidden problem: an agent producing bash needs to write 20-30
lines of throttling. The same agent producing Nextflow doesn't
(see the sketch below).
- 4.4.3 (was 'agent on its own would have hit the same wall'): the
same agent producing this Nextflow process gets parallelisation for
free; producing bash, it has to write throttling itself and won't.
- Section 5 reframed: 'why this matters more, not less, when AI is
writing the code'. The case for the workflow tool gets stronger with
more AI authoring, because the agent's mistakes have somewhere safe
to land.
- Summary: the pipeline is the durable artefact; the agent writes
either form; the form that makes the agent's output trustworthy is
the one with a workflow boundary baked in.

[skip ci]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
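To make the throttling contrast concrete, a hedged sketch of what 'parallelisation for free' means in process terms; the directive values and the module itself are illustrative, not lifted from the PR:

```groovy
// Concurrency and resources are declarations here; an agent writing
// bash would have to hand-code the equivalent job counting and waits.
process SALMON_QUANT {
    maxForks 4        // at most 4 simultaneous tasks of this process
    cpus 2            // per-task CPU claim, exposed as ${task.cpus}
    memory '4 GB'     // per-task cap the scheduler respects

    input:
    tuple val(id), path(reads)
    path index

    output:
    path "${id}_quant"

    script:
    """
    salmon quant -i ${index} -l A -1 ${reads[0]} -2 ${reads[1]} \\
        -p ${task.cpus} -o ${id}_quant
    """
}
```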
…ion'

The previous framing implied 'agent writes bash, you write Nextflow'
or 'agent's bash is worse than agent's Nextflow'. Both miss the
deeper point: the agent will write either form on demand, and the
question is which artefact is worth keeping after the conversation
ends.

Structural changes:

- New section 2, 'The artefact those properties live in', explicitly
introduces the artefact as a thing that lives in version control,
gets vetted at code review, tested, updated when tools change, read
by colleagues. Without this concept named before Part 1, the
takeaways throughout don't have a referent.
- Sections 2-5 renumbered to 3-6 to make room.
- Section 6 (was 5) reshaped as the closing argument that walks
through the four eras (commands -> scripts -> workflows -> AI
authoring), ending on 'the artefact has to outlive the conversation'.
- Opening anchor rewritten to lead with the four-era arc and the
artefact framing, replacing the previous 'durable artefact'
paragraph.
- Summary punchline replaces 'Claude needs them too' (cute but
oblique) with 'the artefact has to outlive the conversation that
made it' (concrete, vendor-neutral, ages well).

Vendor neutrality:

- Removed the 'Claude' punchline.
- Removed inline links to 'nf-core Claude Code skills' (specific
product naming).
- Kept Seqera AI in Additional resources as a useful pointer per user
request, but vendor-neutralised the surrounding language.

[skip ci]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous intro (~330 words) was an outlier against the rest of
the side-quests intro range (median ~50-180 words). Worse, it
litigated the script-vs-workflow thesis in paragraph 4 before the
reader had any reason to care, duplicating what section 2 ('The
artefact those properties live in') already does properly.
Cut paragraph 4 entirely. Trim the laundry-list tail of paragraph 3.
Result: 4 short paragraphs that pose the question, sketch the four
eras, and signal the worked-example arc. The thesis lives in
section 2 where it has room to breathe.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the AI angle was load-bearing in the intro, section 3,
and Summary, but in Parts 1 and 2 it was decorative phrases tacked
onto timeless explanations ('the agent didn't pin versions',
'whether you wrote it or asked an agent to'). The lesson read as
the timeless script-vs-workflow story with AI sprinkles.
Reframed Parts 1 and 2's premises so the AI thread is the framing,
not the commentary:
- Section 1 intro: 'this is the kind of script an agent would
produce from a paragraph of intent: it works on three samples on
the laptop where it ran. Each property is something the author,
agent or human, has to remember to write into the script.'
- Section 2 intro: 'the same paragraph of intent, given to the same
agent, would produce something close to what you write here. The
difference is not authorship or speed; it is the artefact's shape.'
- 'Contrast with the agent's script' admonitions retitled to
'Contrast with the script form'. Bodies rewritten to contrast
the artefact forms (where the script put X on the author, the
workflow puts X in the structure), not the authors.
- Section 1.8 takeaway closes with an artefact-form judgement: each
warning is more bash someone writes and a maintainer reads; the
form is the problem, not who wrote it.
- Section 2.8 takeaway closes mirroring: every property is supplied
by the workflow boundary itself, not by the author; a maintainer
can see at a glance what ran where; the form did the work the
script left to the author.
- Section 3 tightened to two paragraphs. The four-era recap moved
out (the intro already names them); the bottleneck shift and the
'case gets stronger with AI' argument stay, plus the punchline.
Page is now 1137 lines (was 1142) but reads more cohesively: the
AI angle frames each part rather than decorating it.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three real fixes from the review pass:
1. Section 2.5 prescribed MULTIQC code that the solution didn't
contain. Added side-quests/solutions/workflows_in_the_ai_era/
nextflow/modules/multiqc.nf and wired it into main.nf with the
.map/.mix/.collect pattern (sketched below), plus multiqc_report in publish: and
output {}. Solution now produces results/multiqc/multiqc_report
.html on a successful run.
2. fastqc.nf and fastp.nf in the solution had cpus/memory directives
and ${task.cpus} threading, but the doc walks the learner through
building these modules without resource directives (the
${task.cpus} pattern is introduced for SALMON_QUANT in section
2.4, deliberately). Stripped cpus/memory from fastqc.nf and
fastp.nf in the solution; threads stay hardcoded at 2 and 4
respectively to match what the doc shows. Pedagogical beat
preserved.
3. Fixed multiqc.nf output mismatch: process declared
path "multiqc_data" but the script ran with --filename
multiqc_report, which makes MultiQC produce multiqc_report_data
instead. Dropped --filename so MultiQC uses default names that
match the declared outputs (multiqc_report.html, multiqc_data).
Plus the hl_lines nit:
- fastp.nf 'After' tab had hl_lines="6 9-11 15-21" but missed line
22 (--thread 4). Now hl_lines="6 9-11 15-22".
Plus a lint cleanup on three .map { id, files -> files } closures
in main.nf where id was unused; renamed to _id to suppress the
'parameter not used' warning. Mirrored in the markdown.
Plus the duplicated thesis: dropped the standalone 'The artefact
has to outlive the conversation that made it' line from the Summary
since the line above already lands the same point.
Verified end-to-end on Apple Silicon (Docker / linux/amd64
emulation): all 11 tasks complete via -resume, multiqc_report.html
lands at results/multiqc/. Solution lints clean (0 errors, 0
warnings). check_headings.py clean.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
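A hedged sketch of the .map/.mix/.collect wiring that item 1 and the _id rename describe; names follow the commit text, but the solution's exact code may differ:

```groovy
// Gather every per-sample report into a single MultiQC invocation.
// The closure parameter is _id: the sample id is deliberately unused.
multiqc_in = FASTQC.out
    .map { _id, files -> files }                      // keep files, drop ids
    .mix(FASTP.out.json.map { _id, files -> files })  // fold in fastp JSONs
    .collect()                                        // one list -> one task

MULTIQC(multiqc_in)
```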
Fix items:
- Block 12 hl_lines: '1 3-9 12-18' -> '1 4-9 13-18'. The original
highlighted 'publish:' (line 3) and 'output {' (line 12) which
already exist in the Before as structural placeholders, contrary
to CLAUDE.md's 'don't highlight structural keywords' rule. Now
highlights only the genuinely new lines (channel assignments and
path entries).
- Side-quest table description: 22-word phrase that wraps badly on
narrow screens replaced with 'When to use a workflow tool in an
AI-driven world' (~10 words).
Plus my own check-highlights pass found two consistency issues that
the earlier manual check missed:
- Block 2 (FastQC) hl_lines: '3-5' -> '2-5' to include the echo
line, matching block 1 (download) and block 5 (salmon quant).
- Block 3 (fastp) hl_lines: '3-10' -> '2-10' for the same reason.
Polish items:
- Added a 'Why main:?' admonition after the publish/output wire-up
in 2.4.2, explaining when the main: label is required.
- Reframed section 2.5 (MultiQC) so it reads as 'walking through
code that's already been written for you' rather than asking the
learner to add it; matches the section's actual structure.
- Added an inline comment + a parenthetical explaining the _id
underscore convention at first use in the MULTIQC wiring.
- Section 3 title 'Why workflows still matter when AI does the
writing' -> 'Workflows when AI writes the code'. Matches the
noun-phrase pattern of other section titles.
Minor items:
- Resume output block in 2.6: added a MULTIQC line with the
'<- And anything downstream' annotation, since modifying the
pipeline would re-run MULTIQC too. Was previously misleading.
- Closed the '50 samples next week' narrative loop in the Part 2
takeaway: 'That 50-sample run next week your PI mentioned in
section 1.1.1? Same nextflow run command, no changes to main.nf,
swap -profile docker for -profile slurm if you want it on the
cluster.' (see the profiles sketch below)
The reviewer's note that I had claimed to use /check-highlights
earlier, when I'd only manually counted lines, was correct. This
pass invokes the skill properly.
[skip ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
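The docker-to-slurm swap in that last item rests on config along these lines (a sketch; the PR's actual profiles aren't shown in this thread):

```groovy
// nextflow.config (sketch): the same main.nf targets local Docker or
// a Slurm cluster purely by switching -profile on the command line.
profiles {
    docker {
        docker.enabled    = true
        docker.runOptions = '--platform=linux/amd64'
    }
    slurm {
        process.executor = 'slurm'
    }
}
```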
Summary
A new side quest that answers, hands-on, the question newcomers actually arrive with in 2026: "why should I use a workflow tool when an AI agent can run my analysis or generate the whole pipeline for me?"
The reader builds a real RNA-seq analysis (FastQC, fastp, Salmon) twice. Once as a bash pipeline of the kind an agent would produce on demand, hitting each engineering limit in turn. Once as a Nextflow workflow, where reproducibility, software tracking, scalability, parallelisation, resource awareness, failure recovery, and portability come from the workflow boundary itself. A short closing section addresses the second half of the question (what if the AI writes the Nextflow?) with the punchline "Claude needs them too".
Page

- `docs/en/docs/side_quests/workflows_in_the_ai_era/index.md` (~1170 lines, ~75 minutes)
- `docs/en/docs/side_quests/index.md` table and `docs/en/mkdocs.yml` nav, slotted between Development Environment and Essential Scripting Patterns.

Runnable assets

- `side-quests/workflows_in_the_ai_era/` (starter, with TODOs) and `side-quests/solutions/workflows_in_the_ai_era/` (full solutions). Real RNA-seq test data from `nf-core/test-datasets`.

Compared to the original PR

This PR was opened in October 2025 as Workflow Management Fundamentals and has been substantially restructured:

- New title and page path (`docs/en/docs/side_quests/workflows_in_the_ai_era/index.md`) and the runnable-asset directories. The mechanism (build in bash, rebuild in Nextflow, contrast at each step) is preserved.
- `mamba activate` + version check is a new 3.3.7 placed right before the learner actually needs to run anything. The 3-minute install finishes invisibly while the learner reads and edits.

Repo standards (brought up to current)

- `docs/en/` subtree and the dir-per-side-quest layout.
- `publishDir` -> workflow output system (`publish:` block + top-level `output {}` + `-output-dir` on the CLI). No `params.outdir` left.
- Dropped `meta` maps in favour of `tuple val(id), path(reads)`. The `metadata` side quest is the right home for maps.
- Repo heading numbering (`## 1.` / `### 1.1.` with trailing periods); passes `check_headings.py`.
- Formatted with `prettier`.
- Review comments (`val(meta)` -> `val(id)`) addressed; some were already fixed in earlier commits and are now consolidated into the rewrite.

Verified

- `nextflow lint side-quests/solutions/workflows_in_the_ai_era/` returns zero errors. (Starter copy keeps `???` placeholders; CI lints `**/solutions/*` only per `.github/workflows/nextflow-lint.yml`.)
- `uv run .github/check_headings.py` clean.
- `prettier --check` clean.
- `-resume` is a full cache hit; outputs land at `results/{fastqc,fastp,salmon}/`.
- `/run-tutorial` skill inside the `ghcr.io/nextflow-io/training:latest` container: Part 1 mamba install completes in ~3 min, fastp + salmon run on real test data; Part 2 already covered above.

Test plan

- `mamba activate` + version check at 3.3.7 lands when the install would have finished naturally.
- The `nextflow run main.nf -output-dir results -profile docker` flow.