…stream tools writes a secondary FASTA file with extended AA names replaced with X to match the peptides reported by MSFragger
Pull request overview
Adds extended amino acid (MSFragger extended_amino_acids) support across the GUI and execution pipeline, including FASTA preprocessing for downstream tools and updated parsing/rewriting for detailed mass offsets and pepXML database paths.
Changes:
- Add msfragger.extended_amino_acids to workflow templates and parameter files; update some detailed-offset encodings (e.g., 0.00000 formatting).
- Introduce a FASTA rewriting utility/command that produces a _toX.fasta for downstream tools, and route that FASTA through the pipeline when enabled.
- Update pepXML rewriting and detailed-offset parsing to support updated database paths and extended AA site syntax.
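As a rough illustration of the routing, the sketch below picks the FASTA path that downstream tools would read. It assumes the edited file is written next to the original as `<stem>_toX.fasta`; `DownstreamFasta` and `resolve` are hypothetical names, not FragPipe code.

```java
import java.nio.file.Path;

// Hypothetical helper (not FragPipe API): chooses the FASTA path for downstream tools.
final class DownstreamFasta {
  private DownstreamFasta() {}

  // Assumption: the edited copy sits beside the original, named <stem>_toX.fasta.
  static Path resolve(Path originalFasta, boolean useExtendedAAs) {
    if (!useExtendedAAs) {
      return originalFasta; // nothing was rewritten, so tools read the original file
    }
    String name = originalFasta.getFileName().toString();
    int dot = name.lastIndexOf('.');
    String stem = dot > 0 ? name.substring(0, dot) : name; // drop the extension, if any
    return originalFasta.resolveSibling(stem + "_toX.fasta");
  }
}
```

Centralizing the decision in one helper means every FASTA consumer receives the same path whether or not extended AAs are enabled.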
Reviewed changes
Copilot reviewed 96 out of 96 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| workflows/XRNAX-MassOffset.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/WWA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT35.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT18-Astral.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-ubiquitination-K_tmt_plus_ubiq.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-ubiquitination-K_tmt_or_ubiq.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-MS3.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-acetyl.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT16-acetyl-noloc.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-ubiquitination-K_tmt_plus_ubiq.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-ubiquitination-K_tmt_or_ubiq.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-ubiquitin.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-phospho-bridge.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-Open.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-MS3.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-MS3-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-bridge.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-acetyl.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/TMT10-acetyl-noloc.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Stellar-GPFDIA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Stellar-DDA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/SILAC3.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/SILAC3-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Open.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Open-quickscan.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-peptidome.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-TMT10.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-glyco.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-diaPASEF.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-DIA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-DIA-Astral.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-customDB-groupFDR.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Nonspecific-HLA-C57.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Mass-Offset-CommonPTMs.workflow | Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding). |
| workflows/LFQ-ubiquitin.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/LFQ-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/LFQ-MBR.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Labile_phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Labile_ADP-ribosylation.workflow | Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding). |
| workflows/iTRAQ4.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/iTRAQ4-phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-Pair.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-open-Hybrid.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-open-HCD.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-Hybrid.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-HCD.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-DIA-OPair.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-O-DIA-HCD.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-TMT.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-open-Hybrid.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-open-HCD.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-LFQ.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-Hybrid.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-HCD.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/glyco-N-DIA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/FPOP.workflow | Adds msfragger.extended_amino_acids; updates detailed-offset string formatting (e.g., 0.00000 encoding). |
| workflows/Diagnostic-ion-mining.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_SpecLib_Quant.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_SpecLib_Quant_Ubiq.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_SpecLib_Quant_Phospho.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_SpecLib_Quant_Phospho_diaPASEF.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_SpecLib_Quant_diaPASEF.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/DIA_DIA-Umpire_SpecLib_Quant.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/citrullination.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-PAL.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-isoTOP.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-isoDTB.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-ipIAA.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-IADTB-TMT16.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-IADTB-diaPASEF.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/chemprot-ABPP-diaTOP.workflow | Adds msfragger.extended_amino_acids workflow key. |
| workflows/Basic-Search.workflow | Adds msfragger.extended_amino_acids workflow key. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/RewritePepxml.java | Adds optional --fasta= handling and rewrites pepXML database paths in-place. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/MassOffsetUtils.java | Updates detailed-offset parsing/formatting and allowed-site handling to support extended syntax. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/util/ExtendedAAFastaEdit.java | New CLI utility to rewrite FASTA sequences by replacing (name) patterns with X. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/percolator/PercolatorOutputToPepXML.java | Adds optional FASTA-path rewrite of <search_database>/database_name in output pepXML. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/MsfraggerParams.java | Adds extended_amino_acids parameter constant and comment metadata. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_open.params | Adds extended_amino_acids default key to params template. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_offset.params | Adds extended_amino_acids default key to params template. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_nonspecific.params | Adds extended_amino_acids default key to params template. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tools/fragger/fragger_closed.params | Adds extended_amino_acids default key to params template. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/tabs/TabMsfragger.java | Adds GUI controls for enabling extended AAs and loading/saving definitions; exposes masses to downstream mod-mass aggregation. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/FragpipeRun.java | Routes downstream FASTA path when extended AAs are enabled; introduces and wires CmdExtendedAAFastaEdit; includes extended AA masses in mod-mass set. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/dialogs/DetailedOffsetEditDialog.java | Updates construction of zero offset to match MassOffset signature change. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdPercolator.java | Plumbs optional updated FASTA path to Percolator pepXML conversion step. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdPeptideProphet.java | Plumbs optional updated FASTA path to RewritePepxml step via --fasta=. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdMsfraggerDigest.java | Allows extended-AA site syntax in variable-mod site validation. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdMsfragger.java | Allows extended-AA site syntax in variable-mod site validation (via new regex). |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/cmd/CmdExtendedAAFastaEdit.java | New pipeline command wrapper to run ExtendedAAFastaEdit before downstream tools. |
| FragPipe-GUI/src/main/java/org/nesvilab/fragpipe/api/OffsetsTableModel.java | Updates detailed-offset model to preserve allowed-sites as a string (matching MassOffset changes). |
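For the pepXML side (RewritePepxml's --fasta= option and the PercolatorOutputToPepXML rewrite), the core operation is repointing `<search_database>` at the edited FASTA. This is a string-based sketch under stated assumptions: `local_path` and `database_name` are the attributes to change (both are attributes of `search_database` in the pepXML schema), the file fits in memory, and the new path needs no XML escaping. `PepxmlDbRewrite`, `fastaArg`, and `rewrite` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the pepXML database-path rewrite (not FragPipe code).
final class PepxmlDbRewrite {
  private PepxmlDbRewrite() {}

  // A <search_database ...> start tag, and the two attributes repointed inside it.
  private static final Pattern SEARCH_DB_TAG = Pattern.compile("<search_database\\b[^>]*>");
  private static final Pattern DB_ATTR =
      Pattern.compile("\\b(local_path|database_name)=\"[^\"]*\"");

  // Returns the value of an optional --fasta=<path> argument, if present.
  static Optional<String> fastaArg(String[] args) {
    for (String a : args) {
      if (a.startsWith("--fasta=")) {
        return Optional.of(a.substring("--fasta=".length()));
      }
    }
    return Optional.empty();
  }

  // Replaces local_path/database_name in every <search_database> tag with newFasta.
  // Assumes newFasta contains no characters that need XML escaping.
  static String rewrite(String pepxml, String newFasta) {
    Matcher tag = SEARCH_DB_TAG.matcher(pepxml);
    StringBuilder out = new StringBuilder();
    while (tag.find()) {
      // quoteReplacement keeps backslashes (e.g. Windows paths) and '$' literal.
      String updated = DB_ATTR.matcher(tag.group())
          .replaceAll(m -> Matcher.quoteReplacement(m.group(1) + "=\"" + newFasta + "\""));
      tag.appendReplacement(out, Matcher.quoteReplacement(updated));
    }
    tag.appendTail(out);
    return out.toString();
  }
}
```

A streaming, line-oriented version would avoid loading large pepXML files into memory; the regex logic per tag stays the same.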
```java
addToGraph(graphOrder, cmdStart, DIRECTION.IN);
addToGraph(graphOrder, cmdCheckCentroid, DIRECTION.IN, cmdStart);
addToGraph(graphOrder, cmdExtendedAAFastaEdit, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdUmpire, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdDiaTracer, DIRECTION.IN, cmdCheckCentroid);
addToGraph(graphOrder, cmdMsfragger, DIRECTION.IN, cmdCheckCentroid, cmdUmpire);
```
cmdExtendedAAFastaEdit is added to the task graph but nothing depends on it. When extended AAs are enabled, multiple downstream commands are configured to use downstreamFastaFile, so they must not start until the edited FASTA is guaranteed to exist. Add an explicit dependency edge (e.g., make cmdMsfragger and/or all FASTA-consuming downstream commands depend on cmdExtendedAAFastaEdit when enabled) to avoid a race where tools run before the _toX.fasta is created.
I think FragPipe will not start the next command before the previous one finishes, so adding an extra edge here is not necessary. @fcyu, does that sound right to you?
…gger.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e45dd5bcab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review all changes in this pull request. Before reviewing, read the whole repository to have a good understanding of the code. Then, check each of the changes in this pull request carefully to find all bugs.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b9ecd25f2
```java
// Sequence line — replace (name) patterns with X
writer.write(EXTENDED_AA_PATTERN.matcher(line).replaceAll("X"));
```
Replace extended AA tokens across wrapped FASTA lines
The replacement is done line by line, so an extended AA token split by FASTA wrapping (e.g., an opening "(..." at the end of one sequence line and the closing "...)" at the start of the next) will never match EXTENDED_AA_PATTERN and will be left unchanged. In that case the downstream _toX FASTA no longer matches the MSFragger peptide sequences containing X, which can break downstream peptide/protein mapping.
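If wrapped input ever does need handling, one option is to buffer each record's sequence lines and apply the pattern once per record. The sketch assumes an extended AA token is `(name)` with no nested parentheses; `EXTENDED_AA` here is a stand-in, not the PR's actual `EXTENDED_AA_PATTERN`, and `WrappedFastaToX` is a hypothetical class. Note that it re-emits each sequence on a single line, so the output's line wrapping differs from the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical variant of the FASTA edit that tolerates wrapped sequence lines.
final class WrappedFastaToX {
  private WrappedFastaToX() {}

  // Assumption: an extended AA token is "(" + name + ")" with no nested parentheses.
  private static final Pattern EXTENDED_AA = Pattern.compile("\\([^()]*\\)");

  static void convert(BufferedReader reader, Writer writer) throws IOException {
    StringBuilder seq = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.startsWith(">")) {
        flush(seq, writer);      // finish the previous record's sequence
        writer.write(line);      // header lines are copied unchanged
        writer.write('\n');
      } else {
        seq.append(line.trim()); // join the wrapped sequence lines of one record
      }
    }
    flush(seq, writer);          // emit the last record
    writer.flush();
  }

  // Applies the replacement to the whole joined sequence, then resets the buffer.
  private static void flush(StringBuilder seq, Writer writer) throws IOException {
    if (seq.length() > 0) {
      writer.write(EXTENDED_AA.matcher(seq).replaceAll("X"));
      writer.write('\n');
      seq.setLength(0);
    }
  }
}
```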
I think this can be ignored: it is unlikely to trigger, as FASTA files should have each sequence on a single line.
Unrestricted non-labile offsets (no rules, just a bare mass) were not captured because no '(' follows the mass.
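To illustrate this class of bug, here is a tolerant entry parser in which the parenthesised rule group is optional, so a bare mass still matches. The entry shape `<mass>` or `<mass>(<rules>)` is a hypothetical simplification of the detailed-offset syntax, and `OffsetEntryParser` is not a MassOffsetUtils API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not MassOffsetUtils: entry = <mass> or <mass>(<rules>).
final class OffsetEntryParser {
  private OffsetEntryParser() {}

  // The trailing "?" makes the "(...)" group optional, so "0.00000" alone matches.
  private static final Pattern ENTRY =
      Pattern.compile("\\s*(-?\\d+(?:\\.\\d+)?)\\s*(?:\\(([^)]*)\\))?\\s*");

  static final class Entry {
    final double mass;
    final String rules; // empty string = unrestricted (no rules given)

    Entry(double mass, String rules) {
      this.mass = mass;
      this.rules = rules;
    }
  }

  static Entry parse(String text) {
    Matcher m = ENTRY.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unparseable offset entry: " + text);
    }
    String rules = m.group(2) == null ? "" : m.group(2); // group absent for a bare mass
    return new Entry(Double.parseDouble(m.group(1)), rules);
  }
}
```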
…mers

Previously cmdExtendedAAFastaEdit only depended on cmdCheckCentroid, leaving downstream consumers (Crystalc, PeptideProphet, Percolator, PhilosopherDbAnnotate, PtmShepherd, SpecLibGen, TransferLearning, DiaNN) free to run before the _toX fasta was written. Add explicit graph edges so each consumer waits on the edit command.
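The added edges can be reasoned about with a plain topological sort. The sketch uses a tiny hypothetical `TaskGraph` (not FragPipe's `addToGraph`/`DIRECTION` API) ordered with Kahn's algorithm: once each consumer has an edge to the FASTA edit task, no valid order places a consumer ahead of it, so any runner that honors the edges cannot start a consumer before the edit finishes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical task graph: dependsOn(a, b) means "a may start only after b finishes".
final class TaskGraph {
  private final Map<String, List<String>> dependents = new LinkedHashMap<>();
  private final Map<String, Integer> inDegree = new LinkedHashMap<>();

  void add(String task) {
    dependents.putIfAbsent(task, new ArrayList<>());
    inDegree.putIfAbsent(task, 0);
  }

  void dependsOn(String task, String prerequisite) {
    add(task);
    add(prerequisite);
    dependents.get(prerequisite).add(task);
    inDegree.merge(task, 1, Integer::sum);
  }

  // Kahn's algorithm: repeatedly emit tasks whose prerequisites have all been emitted.
  List<String> order() {
    Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
    Deque<String> ready = new ArrayDeque<>();
    remaining.forEach((t, d) -> { if (d == 0) ready.add(t); });
    List<String> result = new ArrayList<>();
    while (!ready.isEmpty()) {
      String t = ready.poll();
      result.add(t);
      for (String next : dependents.get(t)) {
        if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
      }
    }
    if (result.size() != remaining.size()) throw new IllegalStateException("cycle in task graph");
    return result;
  }
}
```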
- TabMsfragger.actionButtonSaveExtendedAAs: wrap PrintWriter in try-with-resources, surface checkError() failures, trim name/mass parts, and pass 'this' as the parent component for the save dialog and error dialog.
- TabMsfragger.actionBtnLoadExtendedAAFile: drop unread Fragpipe.propsVarSet call and fix 'defintions' typo.
- TabMsfragger: fix stray indent on adjacent javadoc.
- MassOffsetUtils.parseFloats: log the offending entry (splits[i]) instead of always splits[0].
- CmdExtendedAAFastaEdit: drop unused Paths import.
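The save fix relies on a `PrintWriter` quirk: it never throws `IOException` and instead records failures internally, which `checkError()` reports (after flushing). A sketch of that pattern, writing to a generic `Writer` so it can be exercised without a file dialog; the tab-separated `name<TAB>mass` line format and the `ExtendedAADefs.save` name are assumptions, not the PR's code.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical helper showing try-with-resources plus checkError() for PrintWriter.
final class ExtendedAADefs {
  private ExtendedAADefs() {}

  // Writes one "name<TAB>mass" line per {name, mass} pair (assumed format).
  // Returns false if any write failed, since PrintWriter swallows IOExceptions.
  static boolean save(List<String[]> definitions, Writer target) {
    try (PrintWriter pw = new PrintWriter(target)) {
      for (String[] def : definitions) {
        String name = def[0].trim(); // trim stray whitespace around both parts
        String mass = def[1].trim();
        pw.print(name + '\t' + mass + '\n');
      }
      return !pw.checkError(); // flushes, then reports any error seen so far
    }
  }
}
```

In the GUI, a `false` result is the point at which an error dialog (parented to the tab) would be shown.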
Remove the 'Use Extended AA Definitions' checkbox; the feature is now enabled whenever the 'Extended amino acids definition' textbox is non-blank. Drops PROP_misc_fragger_use_extended_aas and the related enablement updater, and updates isUseExtendedAAs(), getExtendedAAMassSet(), and paramsFromMap() accordingly.
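With the checkbox removed, enablement reduces to a blank check on the definitions text. A sketch of the two accessors as static helpers over that text, assuming one whitespace-separated `name mass` pair per line and float masses; `ExtendedAAState` is hypothetical, and the real TabMsfragger methods read the Swing component and may parse a different format.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins for isUseExtendedAAs() / getExtendedAAMassSet().
final class ExtendedAAState {
  private ExtendedAAState() {}

  // The feature is on whenever the definitions text contains anything but whitespace.
  static boolean isUseExtendedAAs(String definitionsText) {
    return definitionsText != null && !definitionsText.isBlank();
  }

  // Collects each line's mass, assumed to be its last whitespace-separated token.
  static Set<Float> getExtendedAAMassSet(String definitionsText) {
    Set<Float> masses = new TreeSet<>();
    if (!isUseExtendedAAs(definitionsText)) {
      return masses; // disabled: contribute nothing to the downstream mod-mass set
    }
    for (String line : definitionsText.split("\\R")) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length >= 2) {
        masses.add(Float.parseFloat(parts[parts.length - 1]));
      }
    }
    return masses;
  }
}
```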
This PR adds GUI and pipeline support for MSFragger's extended_aas mode, which allows users to define custom non-canonical amino acids with user-specified masses.
New features
GUI (TabMsfragger)
FASTA preprocessing for downstream tools
Pipeline (FragpipeRun)
Offset parsing update (MassOffsetUtils)