Releases: jotech/gapseq
Sievers Apple (v2.0.1)
What's Changed
- make gapseq test work on macOS, and add support for mawk by @jonasoh in #283
- bug fix related to md5sum calculation of reaction names under macOS (#286)
Full Changelog: v2.0.0...v2.0.1
Vanilla Orange (v2.0.0)
What's Changed
As a major change, this version includes a re-implementation of parts in gapseq find and gapseq find-transport:
- Run time is greatly improved by performing only one large multiple sequence alignment rather than many smaller ones.
- Users can now choose between three sequence alignment algorithms: blast, diamond, and mmseqs2. The algorithm is selected with the option -A <algorithm> in gapseq find/gapseq find-transport.
- A number of bug fixes (see PR #258)
- The output table <query>-Pathways.tbl now includes additional columns that fully document how the completion percent was calculated and why the pathways were predicted to be present or absent. Also, an FAQ entry on completeness calculations was added to the documentation.
- When a genomic nucleotide FASTA file is used as input, it is first translated into amino acid sequences of open reading frames (ORFs). For this step, the optional dependency pyrodigal is required.
gapseq automatically selects the appropriate codon translation table by running pyrodigal with three options:
- Table 4: "Mycoplasma/Spiroplasma (Mollicutes)"
- Table 11: "Bacterial, Archaeal, and Plant Plastid Code" (default for most prokaryotic tools)
- Table 25: "Candidate Division SR1 and Gracilibacteria"
The choice between Table 11 and Tables 4/25 depends on genome coverage. If using Table 4 or 25 gives at least 5% higher coverage than Table 11, then 4 or 25 is used. Choosing between Table 4 and 25 is more nuanced since both yield the same coverage. The key difference is how the codon UGA is interpreted:
- In Table 11, UGA is a stop codon.
- In Table 4, UGA codes for Tryptophan.
- In Table 25, UGA codes for Glycine.
Since the Tryptophan content in proteins is typically around 1%, the table that produces a Tryptophan usage closest to this value is selected.
Admittedly, this approach relies on an arbitrary threshold, but it works well in practice. If users already know the correct codon table for their genome, they can provide a protein FASTA file directly to avoid translation by gapseq.
- There are fewer dependencies on other software libraries. Specifically, the dependencies on 'exonerate', 'barrnap', 'bedtools', 'perl', and 'parallel' were dropped.
- Users can now specify a custom directory for the reference sequence database, and which version to use (not only the latest). This option is especially relevant in cases where gapseq is installed in a location where the user does not have write permissions. See documentation for details.
- For protein complexes, gapseq infers which subunit a reference sequence belongs to from the FASTA headers. However, subunit naming is often inconsistent. Example: EC 1.2.7.1 (Pyruvate synthase): Some proteins have the subunits stated as "subunit alpha/beta/gamma/delta"; others have "subunit PorA/PorB/PorC/PorD". For enzymes where this is often an issue, we now have a subunit ID dictionary in dat/complex_subunit_dict.tsv. This dictionary links synonyms to common IDs. Currently, the dictionary needs to be curated manually, but this could probably be automated in the future.
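The codon-table selection described in the list above (Table 11 vs. Tables 4/25, then the tryptophan check) can be sketched as follows. This is only an illustration, not gapseq's actual code: the function name, its arguments, and the reading of "5% higher coverage" as a difference of 5 percentage points are assumptions made for this example.

```python
def choose_codon_table(coverage, trp_fraction, min_gain=0.05, expected_trp=0.01):
    """Pick a codon translation table (4, 11, or 25).

    coverage:     dict mapping table number -> coding coverage (fraction, 0..1)
    trp_fraction: dict mapping tables 4 and 25 -> tryptophan share of all residues
    All names and the input format are hypothetical.
    """
    # Tables 4 and 25 both read through UGA, so they yield the same coverage.
    alt_coverage = max(coverage[4], coverage[25])

    # Assumption: "5% higher" is interpreted as 5 percentage points.
    if alt_coverage - coverage[11] < min_gain:
        return 11

    # UGA = Trp (Table 4) or Gly (Table 25): keep the table whose
    # tryptophan usage is closest to the typical value of about 1%.
    return min((4, 25), key=lambda t: abs(trp_fraction[t] - expected_trp))


# Example call with made-up numbers for a genome that recodes UGA.
table = choose_codon_table(
    coverage={11: 0.60, 4: 0.90, 25: 0.90},
    trp_fraction={4: 0.011, 25: 0.003},
)
print(table)
```

Users who already know the correct table can skip this step entirely by supplying a protein FASTA file, as noted above.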
Other small changes in the new gapseq version
Complex detection
In both the old and the new gapseq version, complexes are detected by analysing the FASTA sequence headers for key terms such as "chain" or "subunit". In rare cases where there were several reference sequences but only very few indicated a subunit association, the old version always required hits to those sequences in order to predict the complex as present, even though in most organisms the enzyme might not be a complex/heteromer.
New approach: If 20% or less of the sequences are predicted to be a specific subunit, the reaction is not tested as a complex; i.e., no subunit hits are required for the reaction prediction to be TRUE. This is implemented in src/complex_prediction.R.
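A minimal sketch of the 20% rule, written in Python purely for illustration (the actual implementation lives in src/complex_prediction.R and may differ). The regular expression, the function name, and the example headers are assumptions; only the key terms "chain"/"subunit" and the 20% threshold come from the text above.

```python
import re

# Key terms that hint at a subunit annotation in a FASTA header.
SUBUNIT_PATTERN = re.compile(r"\b(subunit|chain)\b", re.IGNORECASE)


def treat_as_complex(headers, threshold=0.2):
    """Return True if the reaction should be tested as a protein complex.

    headers: list of FASTA header strings for one reaction's reference sequences.
    The reaction is treated as a complex only if MORE than `threshold`
    of the headers carry a subunit annotation.
    """
    if not headers:
        return False
    n_subunit = sum(1 for h in headers if SUBUNIT_PATTERN.search(h))
    return n_subunit / len(headers) > threshold


# Hypothetical headers: 2 of 10 mention a subunit, i.e. exactly 20%.
headers = ["Pyruvate synthase subunit PorA", "Pyruvate synthase subunit PorB"]
headers += ["Pyruvate synthase"] * 8
print(treat_as_complex(headers))
```

With exactly 20% annotated headers the reaction is not tested as a complex, matching the "20% or less" wording.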
Gram prediction
Gram prediction is used to determine which biomass reaction to add to a bacterial metabolic model. In the previous version, the prediction was made within gapseq draft, where the biomass reaction was also added to the model. Now, the Gram-staining prediction has been moved to gapseq find. The rationale is that gapseq find already takes the genome sequence as input, and the HMM-based Gram prediction also requires the genome. The predicted Gram staining is added as information to the headers of the output tables "...-Reactions.tbl" and "...-Pathways.tbl".
Updating reference sequence databases
gapseq now has a new module to update the reference sequence database. Two examples:
gapseq update-sequences -t Bacteria # Update reference sequences for Bacteria
gapseq update-sequences -t Bacteria -D ~/gapseqDB/ # Update reference sequences for Bacteria and save the database in a user-defined directory
New Contributors
Full Changelog: v1.4.0...v2.0.0
Berkeley Pit (v1.4.0)
New major features and updates
- All dependencies on R-packages of the sybil family (sybil, sybilSBML, glpkAPI, cplexAPI) are now removed. Instead, the R-package 'cobrar' now serves as a toolbox for constraint-based metabolic modeling and interface to the LP solver 'glpk'. a771d5d
- Optimised protein complex detection and prediction. fd523d8 a0b1380
- More gapseq reactions are mapped to pathways. 1d0e874
New minor features and updates
- Optimised download of reference protein sequences from UniProt. 5365dce
- Log reaction in skip blast mode. 12a8e30
- Transport reactions are now added to subsystems. 690bece
- Use only reviewed sequences in case of multiple candidates. 4732ccb
- Enable search for reactions in pathway database. 552f28c
- Improved handling of undefined subunits in protein complexes. a0b1380
Microbial physiology
- Transfer EC 2.4.1.129 to EC 2.4.99.28. 4e6663d
- L-leucine degradation via reductive Stickland reaction. bf5d31b
- New pathway for Arginine biosynthesis. 792194b
Bug fixes
- Gap-filling (minimum required growth rate is now enforced). fb7fbcb
- Gap-filling (non-feasible solutions, which are rare, are now handled and logged). 568dfa8
- Added Nickel ions to the "gut.csv" gap-fill medium. 9b5d3ac
The release name is a reference to Radiolab's podcast episode "Even the Worst Laid Plans?" (August 19, 2010), which covered a short story about microbes in the Berkeley Pit, an open pit of a former copper mine in Butte, Montana, US.
Microbial circle (v1.3.1)
- same release as in v1.3, only the version number was corrected
Microbial circle

Kandinsky 1923: Circles in a Circle [1]
new major features
- gapseq adapt: turn on or off growth for a list of carbon sources a647ef5 and fix growth rate 2742adb
- improved speed by protein fasta input 4259a6b 642549c ab34887 0d0123e
- pan-draft: species-level models from metagenomes 2e808c54 documentation
- gapseq test-long: fast, comprehensive test run that reconstructs ecoli core model 46f973a
new minor features
- screen for carbon sources and fermentation products 7ec79b3
- updated medium prediction f89fdbd dfaccb9 a6f2aa3 a9510ce 9047eeb
- handling of temporary files 2e34daa
- improved protein complex prediction 780e528 653be24
- integrated new uniprot API access 82ff4ad 3c936ab
- better macos/FreeBSD support b4b2c94 #145
- user-defined biomass 52dd5c8
- reference sequence download from zenodo e66a9c7
- updated rxn weight calculation 503af0e
- added M9 medium 3d31919 and updated TSB medium 73abaab
- updated pathway database metacyc v27 024476a
microbial physiology
- methanogenesis 5ba14b1 9da5f10
- rnf complex 0bea241
- bile acid pathways f715432 0293ab7
- arginine degradation fcf6deb
- aromatics degradation 140d619
- stickland fermentation 82222bd b3a904d 380d52c
- arginine degradation 4b6b6f3
- pyridoxal biosynthesis 7307765
- malolactic enzyme 02deb83
- lysine degradation 098aee6
- xylulose degradation 0ef6d9d
- inulin degradation 72ff3e0
- dehalogenation 0a77c68 deb77f2
- oxalate/formate antiporter 9e85ab1 0f03bf3
- chlorobenzene degradation 2d739a5
- glycan degradation fb3631d
bug fixes
- gpr rules e2f3209
- test script 6225829
- sbml 6fefcb5 8777e8c 8767b04 139b3b4 #153
- network coverage 0bc1278
- offline mode b27d588
- key enzyme handling 43b7241
- sequence download 55368fc 91e64d2
- gapseq find 8626e07
- gapseq find-transport 3e2d939
- output files d3a1bf2
contributors
The gapseq core team (@Waschina, @jotech) would like to thank the following people for their support: @nicola-debernardini, @jonasoh, Anna Burrichter, @ArnaudBelcour, @jchmiel4
We appreciate all your help!
Xlthlx's moon (1.2)
- Improvement of reaction and metabolite database for archaeal metabolism (incl. methanogenesis, mevalonate pathways, chorismate biosynthesis)
- Anaerobic Degradation pathways for secondary plant metabolites (incl. daidzin, daidzein, quercetin, genistein, sulfoquinovose)
- New module for automated prediction of gapfill-/growth- medium
- Improved performance of SBML export
- gapseq version tags in main output files
- Improved prediction of reactions with multiple associated EC-numbers
- Revised reaction and metabolite database for C1-metabolism (i.e. Wood-Ljungdahl pathway)
- improved representation of nitrogen metabolism (e.g. ammonia oxidation)
- new bile acid pathways (deamination, 7-dehydroxylation, epimerization)
- easier installation via conda libsbml package
- support to adjust for environmental conditions (low/high h2)
- enabled support for photosynthesis
- full model construction on the fly
- xylan degradation
- medium prediction
- improved threonine biosynthesis prediction
- updated reaction sequences (uniprot) and pathway databases (metacyc)
- revised transporter prediction
- extended nucleotide metabolism
- updated archaeal pathways
Through the guts of a beggar
A man may fish with the worm that hath eat of a king, and eat of the fish that hath fed of that worm ... a king may go a progress through the guts of a beggar (Hamlet: act 4, scene 3)
- archaea support (especially methanogens)
- new documentation
- improved fiber degradation
- extended electron bifurcation reactions
- improved anaerobic vitamin biosynthesis
- more cases of extracellular degradation
- added 'gapseq adapt' to manually improve models
- enabled photosynthesis
- many smaller bugfixes




