feat: MSFragger extended AA support #2730

Merged
fcyu merged 16 commits into develop from msfragger-ext-AAs
Apr 15, 2026
@dpolasky
Member

@dpolasky dpolasky commented Apr 3, 2026

This PR adds GUI and pipeline support for MSFragger's extended_aas mode, which allows users to define custom non-canonical amino acids with user-specified masses.

New features
GUI (TabMsfragger)

  • New "Use Extended AA Definitions" checkbox and editable table in the MSFragger tab for defining extended amino acids (name, mass, and allowed sites).
  • Extended AA masses are included in the mod mass set passed to IonQuant and other downstream tools.

FASTA preprocessing for downstream tools

  • New utility class ExtendedAAFastaEdit (runnable as a standalone main class) reads a FASTA file and replaces every extended AA pattern of the form (name) in sequence lines with X, writing a new file with _toX appended to the filename. This is needed because MSFragger reports extended AAs as X in its output peptide sequences, so those sequences would not match the original FASTA.
  • New CmdExtendedAAFastaEdit wires this into the pipeline — when extended AAs are enabled, it runs before MSFragger and produces the _toX.fasta that all downstream tools (PeptideProphet, Percolator, ProteinProphet, PTM-Shepherd, SpecLibGen, DIA-NN, transfer learning, etc.) use instead of the original FASTA. MSFragger itself always uses the original FASTA.
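
The replacement described above can be sketched roughly as follows. This is a minimal illustration, not the code in ExtendedAAFastaEdit: the class name, the method names, and the regex are assumptions (the PR text only says the pattern has the form (name)), and placing _toX before the file extension is inferred from the _toX.fasta name mentioned above.

```java
import java.util.regex.Pattern;

final class ExtendedAaToXSketch {
  // Assumed pattern: a parenthesized, non-empty name without nested parentheses.
  private static final Pattern EXTENDED_AA = Pattern.compile("\\([^()]+\\)");

  /** Header lines (starting with '>') pass through; in sequence lines each (name) becomes X. */
  static String rewriteLine(String line) {
    if (line.startsWith(">")) {
      return line;
    }
    return EXTENDED_AA.matcher(line).replaceAll("X");
  }

  /** Inserts "_toX" before the last extension (assumed naming), or appends it if there is none. */
  static String toXFileName(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot > 0
        ? fileName.substring(0, dot) + "_toX" + fileName.substring(dot)
        : fileName + "_toX";
  }
}
```

Under these assumptions, a sequence line such as PEPT(pSer)IDEK would be written as PEPTXIDEK, and headers are left untouched even if they contain parentheses.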

Pipeline (FragpipeRun)

  • downstreamFastaFile variable routes the correct FASTA (original or _toX) to each tool.
  • cmdExtendedAAFastaEdit is inserted into the execution graph between centroid check and all search steps.
  • PeptideProphet and Percolator receive the downstream FASTA path for pepXML rewriting when extended AAs are active.
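
As a rough illustration of the routing idea, the sketch below picks one downstream FASTA path and hands it to every downstream tool, while MSFragger keeps the original. The class, method, and map layout are hypothetical and only mirror the description above; the real FragpipeRun wiring is not shown in this conversation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

final class DownstreamFastaRoutingSketch {
  /** Returns the FASTA each tool should read; tool names are illustrative labels only. */
  static Map<String, Path> route(Path originalFasta, Path toXFasta, boolean extendedAasEnabled) {
    // Single decision point, analogous in spirit to the downstreamFastaFile variable.
    Path downstreamFastaFile = extendedAasEnabled ? toXFasta : originalFasta;

    Map<String, Path> fastaByTool = new LinkedHashMap<>();
    // MSFragger always searches against the original FASTA.
    fastaByTool.put("MSFragger", originalFasta);
    for (String tool : new String[] {"PeptideProphet", "Percolator", "ProteinProphet", "PTM-Shepherd"}) {
      fastaByTool.put(tool, downstreamFastaFile);
    }
    return fastaByTool;
  }
}
```

Keeping the choice in one variable means a tool cannot accidentally mix the two files: it either receives the original path or the _toX path, depending on a single flag.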

Offset parsing update (MassOffsetUtils)

  • Updated sitesPattern regex to support the new extended AA site syntax (aa=name_...) used in detailed mass offset strings.

@dpolasky dpolasky requested review from Copilot and fcyu April 3, 2026 13:55
@CLAassistant

CLAassistant commented Apr 3, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

Copilot AI left a comment


Pull request overview

Adds extended amino acid (MSFragger extended_amino_acids) support across the GUI and execution pipeline, including FASTA preprocessing for downstream tools and updated parsing/rewriting for detailed mass offsets and pepXML database paths.

Changes:

  • Add msfragger.extended_amino_acids to workflow templates and parameter files; update some detailed offset encodings (e.g., 0.00000 formatting).
  • Introduce FASTA rewriting utility/command to produce a _toX.fasta for downstream tools and route that FASTA through the pipeline when enabled.
  • Update pepXML rewriting and detailed offset parsing to support updated database paths and extended AA site syntax.

Reviewed changes

Copilot reviewed 96 out of 96 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
workflows/XRNAX-MassOffset.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/WWA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT35.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT18-Astral.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-ubiquitination-K_tmt_plus_ubiq.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-ubiquitination-K_tmt_or_ubiq.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-MS3.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-acetyl.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT16-acetyl-noloc.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-ubiquitination-K_tmt_plus_ubiq.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-ubiquitination-K_tmt_or_ubiq.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-ubiquitin.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-phospho-bridge.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-Open.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-MS3.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-MS3-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-bridge.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-acetyl.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/TMT10-acetyl-noloc.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Stellar-GPFDIA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Stellar-DDA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/SILAC3.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/SILAC3-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Open.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Open-quickscan.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-peptidome.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-TMT10.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-glyco.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-diaPASEF.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-DIA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-DIA-Astral.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-customDB-groupFDR.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Nonspecific-HLA-C57.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Mass-Offset-CommonPTMs.workflow Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding).
workflows/LFQ-ubiquitin.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/LFQ-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/LFQ-MBR.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Labile_phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Labile_ADP-ribosylation.workflow Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding).
workflows/iTRAQ4.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/iTRAQ4-phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-Pair.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-open-Hybrid.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-open-HCD.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-Hybrid.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-HCD.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-DIA-OPair.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-O-DIA-HCD.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-TMT.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-open-Hybrid.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-open-HCD.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-LFQ.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-Hybrid.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-HCD.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/glyco-N-DIA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/FPOP.workflow Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding).
workflows/Diagnostic-ion-mining.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_SpecLib_Quant.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_SpecLib_Quant_Ubiq.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_SpecLib_Quant_Phospho.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_SpecLib_Quant_Phospho_diaPASEF.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_SpecLib_Quant_diaPASEF.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/DIA_DIA-Umpire_SpecLib_Quant.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/citrullination.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-PAL.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-isoTOP.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-isoDTB.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-ipIAA.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-IADTB-TMT16.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-IADTB-diaPASEF.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/chemprot-ABPP-diaTOP.workflow Adds msfragger.extended_amino_acids workflow key.
workflows/Basic-Search.workflow Adds msfragger.extended_amino_acids workflow key.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/RewritePepxml.java Adds optional --fasta= handling and rewrites pepXML database paths in-place.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/MassOffsetUtils.java Updates detailed-offset parsing/formatting and allowed-site handling to support extended syntax.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/ExtendedAAFastaEdit.java New CLI utility to rewrite FASTA sequences by replacing (name) patterns with X.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/percolator/PercolatorOutputToPepXML.java Adds optional FASTA-path rewrite of <search_database>/database_name in output pepXML.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/MsfraggerParams.java Adds extended_amino_acids parameter constant and comment metadata.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_open.params Adds extended_amino_acids default key to params template.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_offset.params Adds extended_amino_acids default key to params template.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_nonspecific.params Adds extended_amino_acids default key to params template.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_closed.params Adds extended_amino_acids default key to params template.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Adds GUI controls for enabling extended AAs and loading/saving definitions; exposes masses to downstream mod-mass aggregation.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/FragpipeRun.java Routes downstream FASTA path when extended AAs are enabled; introduces and wires CmdExtendedAAFastaEdit; includes extended AA masses in mod-mass set.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/dialogs/DetailedOffsetEditDialog.java Updates construction of zero offset to match MassOffset signature change.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdPercolator.java Plumbs optional updated FASTA path to Percolator pepXML conversion step.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdPeptideProphet.java Plumbs optional updated FASTA path to RewritePepxml step via --fasta=.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdMsfraggerDigest.java Allows extended-AA site syntax in variable-mod site validation.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdMsfragger.java Allows extended-AA site syntax in variable-mod site validation (via new regex).
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdExtendedAAFastaEdit.java New pipeline command wrapper to run ExtendedAAFastaEdit before downstream tools.
FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/api/OffsetsTableModel.java Updates detailed-offset model to preserve allowed-sites as a string (matching MassOffset changes).


Comment on lines 2458 to 2463
addToGraph(graphOrder, cmdStart, DIRECTION.IN);
addToGraph(graphOrder, cmdCheckCentroid, DIRECTION.IN, cmdStart);
addToGraph(graphOrder, cmdExtendedAAFastaEdit, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdUmpire, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdDiaTracer, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdMsfragger, DIRECTION.IN, cmdCheckCentroid, cmdUmpire);

Copilot AI Apr 3, 2026


cmdExtendedAAFastaEdit is added to the task graph but nothing depends on it. When extended AAs are enabled, multiple downstream commands are configured to use downstreamFastaFile, so they must not start until the edited FASTA is guaranteed to exist. Add an explicit dependency edge (e.g., make cmdMsfragger and/or all FASTA-consuming downstream commands depend on cmdExtendedAAFastaEdit when enabled) to avoid a race where tools run before the _toX.fasta is created.

Member Author


I think FragPipe will not start the next command before the previous one finishes, so adding an extra edge here should not be necessary. @fcyu does that sound right to you?
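
The question in this thread comes down to what the graph itself guarantees, independent of how the scheduler happens to run commands. The sketch below is a hypothetical stand-in for the task graph (it is not FragPipe's addToGraph API) that checks whether one command is a transitive prerequisite of another; cmdPeptideProphet is used as an example consumer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class GraphDependencySketch {
  // node -> direct prerequisites (incoming edges)
  private final Map<String, Set<String>> prerequisites = new HashMap<>();

  /** Declares that node must run after each of the given prerequisites. */
  void addEdges(String node, String... deps) {
    Set<String> direct = prerequisites.computeIfAbsent(node, k -> new HashSet<>());
    for (String dep : deps) {
      direct.add(dep);
      prerequisites.computeIfAbsent(dep, k -> new HashSet<>());
    }
  }

  /** True if the graph forces node to wait for candidate, directly or transitively. */
  boolean mustRunAfter(String node, String candidate) {
    Deque<String> pending = new ArrayDeque<>(prerequisites.getOrDefault(node, new HashSet<>()));
    Set<String> seen = new HashSet<>();
    while (!pending.isEmpty()) {
      String current = pending.pop();
      if (current.equals(candidate)) {
        return true;
      }
      if (seen.add(current)) {
        pending.addAll(prerequisites.getOrDefault(current, new HashSet<>()));
      }
    }
    return false;
  }
}
```

With edges only from cmdCheckCentroid to both cmdExtendedAAFastaEdit and a consumer, mustRunAfter(consumer, "cmdExtendedAAFastaEdit") is false, so the ordering would rest on scheduler behavior rather than on the graph. Adding an explicit edge makes it true.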

Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Outdated
Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Outdated
Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Outdated
Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdExtendedAAFastaEdit.java Outdated
Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Outdated
Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java Outdated
fcyu and others added 3 commits April 3, 2026 10:06
…gger.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…gger.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…gger.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@fcyu
Copy link
Copy Markdown
Member

fcyu commented Apr 3, 2026

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e45dd5bcab


@fcyu
Copy link
Copy Markdown
Member

fcyu commented Apr 3, 2026

@codex review all changes in this pull request. Before reviewing, read the whole repository to have a good understanding of the code. Then, check each of the changes in this pull request carefully to find all bugs.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b9ecd25f2


Comment on lines +93 to +94
// Sequence line — replace (name) patterns with X
writer.write(EXTENDED_AA_PATTERN.matcher(line).replaceAll("X"));

P1: Replace extended AA tokens across wrapped FASTA lines

The replacement is done line-by-line, so an extended AA token split by FASTA wrapping (e.g., (... at end of one sequence line and ...) at the start of the next) will never match EXTENDED_AA_PATTERN and will be left unchanged. In that case the downstream _toX FASTA no longer matches MSFragger peptide sequences with X, which can break downstream peptide/protein mapping.


Member Author


I think this can be ignored. It is unlikely to trigger, as FASTA files should have each sequence on a single line.
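
For reference, one way to make the replacement robust to wrapped sequences would be to buffer all sequence lines of a record, join them, and only then apply the pattern. The sketch below illustrates that idea under stated assumptions: the class name and regex are hypothetical, and each rewritten sequence is emitted unwrapped on a single line, which the actual tool may or may not want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class WrappedFastaToXSketch {
  // Assumed pattern: a parenthesized, non-empty name without nested parentheses.
  private static final Pattern EXTENDED_AA = Pattern.compile("\\([^()]+\\)");

  /** Rewrites FASTA lines record by record so tokens split across wrapped lines still match. */
  static List<String> rewrite(List<String> lines) {
    List<String> out = new ArrayList<>();
    StringBuilder sequence = new StringBuilder();
    for (String line : lines) {
      if (line.startsWith(">")) {
        flush(sequence, out); // finish the previous record before starting a new one
        out.add(line);
      } else {
        sequence.append(line.trim());
      }
    }
    flush(sequence, out);
    return out;
  }

  /** Emits the buffered sequence, if any, as one line with each (name) replaced by X. */
  private static void flush(StringBuilder sequence, List<String> out) {
    if (sequence.length() > 0) {
      out.add(EXTENDED_AA.matcher(sequence).replaceAll("X"));
      sequence.setLength(0);
    }
  }
}
```

The trade-off is memory: a whole protein sequence is held in a buffer instead of streaming line by line, which is usually small compared to the file.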

Comment thread FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/FragpipeRun.java
dpolasky and others added 5 commits April 3, 2026 14:53
unrestricted nonlabile offsets (no rules, just blank mass) were not captured due to no '(' after mass

…mers

Previously cmdExtendedAAFastaEdit only depended on cmdCheckCentroid, leaving
downstream consumers (Crystalc, PeptideProphet, Percolator, PhilosopherDbAnnotate,
PtmShepherd, SpecLibGen, TransferLearning, DiaNN) free to run before the _toX
fasta was written. Add explicit graph edges so each consumer waits on the edit
command.

- TabMsfragger.actionButtonSaveExtendedAAs: wrap PrintWriter in try-with-resources,
  surface checkError() failures, trim name/mass parts, and pass 'this' as the
  parent component for the save dialog and error dialog.
- TabMsfragger.actionBtnLoadExtendedAAFile: drop unread Fragpipe.propsVarSet call
  and fix 'defintions' typo.
- TabMsfragger: fix stray indent on adjacent javadoc.
- MassOffsetUtils.parseFloats: log the offending entry (splits[i]) instead of
  always splits[0].
- CmdExtendedAAFastaEdit: drop unused Paths import.

Remove the 'Use Extended AA Definitions' checkbox; the feature is now
enabled whenever the 'Extended amino acids definition' textbox is
non-blank. Drops PROP_misc_fragger_use_extended_aas and the related
enablement updater, and updates isUseExtendedAAs(),
getExtendedAAMassSet(), and paramsFromMap() accordingly.
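
The PrintWriter change listed in the commit notes above (try-with-resources plus checkError()) follows a common Java pattern, sketched below. The class name, method, and row format are placeholders for illustration and do not reflect the actual TabMsfragger code or the extended AA file layout.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class SaveDefinitionsSketch {
  /** Writes one trimmed row per line and fails loudly if the underlying writer hit an error. */
  static void save(Path target, List<String> rows) throws IOException {
    // try-with-resources closes the writer even if writing throws.
    try (PrintWriter writer =
        new PrintWriter(Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
      for (String row : rows) {
        writer.println(row.trim());
      }
      // PrintWriter swallows IOExceptions; checkError() flushes and reports whether one occurred.
      if (writer.checkError()) {
        throw new IOException("Failed to write " + target);
      }
    }
  }
}
```

Without the checkError() call, a full disk or a revoked file handle would pass silently, and the user would be left with a truncated definitions file and no error dialog.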
@fcyu fcyu merged commit d35c9f3 into develop Apr 15, 2026
1 check passed
@fcyu fcyu deleted the msfragger-ext-AAs branch April 15, 2026 14:51
4 participants