Releases: jotech/gapseq

Sievers Apple (v2.0.1)

09 Feb 15:48

What's Changed

  • Make gapseq test work on macOS, and add support for mawk, by @jonasoh in #283
  • Bug fix related to the md5sum calculation of reaction names under macOS (#286)

Full Changelog: v2.0.0...v2.0.1

Vanilla Orange (v2.0.0)

14 Jan 09:42

What's Changed

As a major change, this version re-implements parts of gapseq find and gapseq find-transport:

  • Run time is greatly improved by performing one large sequence alignment search rather than many smaller ones.

  • Users can now choose between three sequence alignment tools (blast, diamond, mmseqs2) using the option -A <algorithm> in gapseq find and gapseq find-transport.

  • A number of bug fixes (see PR #258)

  • The output table <query>-Pathways.tbl now includes additional columns that fully document how the completeness percentage was calculated and why each pathway was predicted to be present or absent. An FAQ entry on completeness calculations was also added to the documentation.

  • When a genomic nucleotide FASTA file is used as input, it is first translated into the amino acid sequences of its open reading frames (ORFs). This step requires the optional dependency pyrodigal.
    gapseq automatically selects the appropriate codon translation table by running pyrodigal with three translation tables:

    • Table 4: "Mycoplasma/Spiroplasma (Mollicutes)"
    • Table 11: "Bacterial, Archaeal, and Plant Plastid Code" (default for most prokaryotic tools)
    • Table 25: "Candidate Division SR1 and Gracilibacteria"

    The choice between Table 11 and Tables 4/25 depends on genome coverage. If Table 4 or 25 yields at least 5% higher coverage than Table 11, that table is used. Choosing between Tables 4 and 25 is more nuanced, since both yield the same coverage. The key difference is how the codon UGA is interpreted:

    • In Table 11, UGA is a stop codon.
    • In Table 4, UGA codes for tryptophan.
    • In Table 25, UGA codes for glycine.

    Since tryptophan typically accounts for around 1% of the amino acids in proteins, the table whose translation yields a tryptophan content closest to this value is selected.

    Admittedly, this approach relies on an arbitrary threshold, but it works well in practice. If users already know the correct codon table for their genome, they can provide a protein FASTA file directly to avoid translation by gapseq.

  • gapseq now depends on fewer external tools: the dependencies on 'exonerate', 'barrnap', 'bedtools', 'perl', and 'parallel' were dropped.

  • Users can now specify a custom directory for the reference sequence database and choose which database version to use (not only the latest). This is especially relevant when gapseq is installed in a location where the user lacks write permissions. See the documentation for details.

  • For protein complexes, gapseq infers from the FASTA headers which subunit a reference sequence belongs to. However, subunit naming is often inconsistent. For example, for EC 1.2.7.1 (pyruvate synthase), some proteins state their subunits as "subunit alpha/beta/gamma/delta", while others use "subunit PorA/PorB/PorC/PorD". For enzymes where this is a frequent issue, a subunit ID dictionary is now provided in dat/complex_subunit_dict.tsv, which links synonyms to common IDs. The dictionary is currently curated manually, though its curation could probably be automated.
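
The codon-table heuristic described above can be sketched as a small shell function. This is a minimal sketch, not gapseq's implementation: the function name select_codon_table is hypothetical, it assumes the per-table genome coverage and tryptophan content (both in percent) have already been computed elsewhere (for example from the three pyrodigal runs), and it reads "at least 5% higher coverage" as five percentage points.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the codon-table heuristic (not gapseq code).
# Arguments, all in percent and assumed to be precomputed elsewhere:
#   $1  genome coverage of ORFs predicted with Table 11
#   $2  genome coverage with Table 4
#   $3  genome coverage with Table 25
#   $4  tryptophan content of the proteins translated with Table 4
#   $5  tryptophan content of the proteins translated with Table 25
# Prints the selected translation table: 11, 4 or 25.
select_codon_table() {
  awk -v c11="$1" -v c4="$2" -v c25="$3" -v w4="$4" -v w25="$5" 'BEGIN {
    # Force numeric interpretation of the command-line values.
    c11 += 0; c4 += 0; c25 += 0; w4 += 0; w25 += 0

    # Assumption: "at least 5% higher" means 5 percentage points.
    alt = (c4 > c25) ? c4 : c25
    if (alt - c11 < 5) { print 11; exit }

    # Tables 4 and 25 differ only in reading UGA as Trp or Gly, so keep
    # the table whose Trp content is closest to the typical 1%.
    d4 = w4 - 1;   if (d4 < 0) d4 = -d4
    d25 = w25 - 1; if (d25 < 0) d25 = -d25
    if (d4 <= d25) print 4; else print 25
  }'
}
```

For example, select_codon_table 80 92 92 1.1 0.4 prints 4: Tables 4 and 25 gain 12 percentage points of coverage over Table 11, and the tryptophan content under Table 4 (1.1%) is closer to 1% than under Table 25 (0.4%).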

Other small changes in the new gapseq version

Complex detection

In both the old and the new gapseq versions, complexes are detected by analysing the FASTA sequence headers for key terms such as "chain" or "subunit". In rare cases, a reaction had several reference sequences of which only very few indicated a subunit association, yet gapseq always required hits to those subunit sequences to predict the complex as present. However, in most organisms, such an enzyme may not be a complex/heteromer at all.

New approach: if 20% or less of the sequences are assigned to a specific subunit, the reaction is not tested as a complex, i.e., no subunit hits are required for the reaction prediction to be TRUE. This is implemented in src/complex_prediction.R.
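
The 20% rule can be illustrated with a short shell function. This is a simplified sketch under stated assumptions, not the logic in src/complex_prediction.R: the function name complex_mode is hypothetical, a header counts as subunit-annotated whenever it contains "subunit" or "chain" (case-insensitive), and the fraction is taken over all subunit annotations rather than per specific subunit.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the 20% rule (simplified, not gapseq code).
# $1: FASTA file with the reference sequences of one reaction.
# A header counts as subunit-annotated if it contains "subunit" or "chain"
# (case-insensitive). Prints "complex" if more than 20% of the sequences are
# subunit-annotated, otherwise "single" (no subunit hits required).
complex_mode() {
  awk '
    /^>/ {
      n++                                   # count sequences via headers
      if (tolower($0) ~ /subunit|chain/) s++
    }
    END {
      # n > 0 guards against empty files; && short-circuits the division.
      if (n > 0 && s / n > 0.2) print "complex"; else print "single"
    }' "$1"
}
```

With five reference sequences of which exactly one mentions a subunit (20%), the function prints "single"; with two such headers (40%), it prints "complex".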

Gram prediction

Gram prediction is used to determine which biomass reaction to add to a bacterial metabolic model. In the previous version, the prediction was made within gapseq draft, which also adds the biomass reaction to the model. Now, the Gram-staining prediction has moved to gapseq find. The rationale is that the HMM-based Gram prediction requires the genome sequence, which gapseq find already takes as input. The predicted Gram staining is recorded in the headers of the output tables "...-Reactions.tbl" and "...-Pathways.tbl".

Updating reference sequence databases

gapseq now has a new module to update the reference sequence database. Two examples:

gapseq update-sequences -t Bacteria # Update reference sequences for Bacteria
gapseq update-sequences -t Archaea -D ~/gapseqDB/ # Update reference sequences for Archaea and save the database in a user-defined directory

New Contributors

Full Changelog: v1.4.0...v2.0.0

Berkeley Pit (v1.4.0)

10 Feb 07:58

New major features and updates

  • All dependencies on R packages of the sybil family (sybil, sybilSBML, glpkAPI, cplexAPI) have been removed. Instead, the R package 'cobrar' now serves as the toolbox for constraint-based metabolic modeling and as the interface to the LP solver 'glpk'. a771d5d
  • Optimised protein complex detection and prediction. fd523d8 a0b1380
  • More gapseq reactions are mapped to pathways. 1d0e874

New minor features and updates

  • Optimised download of reference protein sequences from UniProt. 5365dce
  • Log reactions in skip-blast mode. 12a8e30
  • Transport reactions are now added to subsystems. 690bece
  • Use only reviewed sequences in case of multiple candidates. 4732ccb
  • Enable searching for reactions in the pathway database. 552f28c
  • Improved handling of undefined subunits in protein complexes. a0b1380

Microbial physiology

  • Transfer EC 2.4.1.129 to EC 2.4.99.28. 4e6663d
  • L-leucine degradation via reductive Stickland reaction. bf5d31b
  • New pathway for Arginine biosynthesis. 792194b

Bug fixes

  • Gap-filling (minimum required growth rate is now enforced). fb7fbcb
  • Gap-filling (non-feasible solutions, which are rare, are now handled and logged). 568dfa8
  • Added nickel ions to the "gut.csv" gap-fill medium. 9b5d3ac

The release name refers to the Radiolab podcast episode "Even the Worst Laid Plans?" (August 19, 2010), which included a short story about microbes in the Berkeley Pit, a former open-pit copper mine in Butte, Montana, US.

Microbial circle (v1.3.1)

26 Aug 09:15

  • Same release as v1.3; only the version number was corrected.

Microbial circle

23 Aug 12:58

microbial cycle
Kandinsky 1923: Circles in a Circle [1]

new major features

new minor features

microbial physiology

bug fixes

contributors

The gapseq core team (@Waschina, @jotech) would like to thank the following people for their support: @nicola-debernardini, @jonasoh, Anna Burrichter, @ArnaudBelcour, @jchmiel4
We appreciate all your help!

Xlthlx's moon (1.2)

10 Feb 10:41

  • Improvement of reaction and metabolite database for archaeal metabolism (incl. methanogenesis, mevalonate pathways, chorismate biosynthesis)
  • Anaerobic degradation pathways for secondary plant metabolites (incl. daidzin, daidzein, quercetin, genistein, sulfoquinovose)
  • New module for automated prediction of gapfill/growth media
  • Improved performance of SBML export
  • gapseq version tags in main output files
  • Improved prediction of reactions with multiple associated EC numbers
  • Revised reaction and metabolite database for C1 metabolism (e.g. Wood-Ljungdahl pathway)
  • improved representation of nitrogen metabolism (e.g. ammonia oxidation)
  • new bile acid pathways (deamination, 7-dehydroxylation, epimerization)
  • easier installation via the conda libsbml package
  • support for adjusting to environmental conditions (low/high H2)
  • enabled support for photosynthesis
  • full model construction on the fly
  • xylan degradation
  • medium prediction
  • improved threonine biosynthesis prediction
  • updated reaction sequences (uniprot) and pathway databases (metacyc)
  • revised transporter prediction
  • extended nucleotide metabolism
  • updated archaeal pathways

Through the guts of a beggar

25 Sep 12:04

A man may fish with the worm that hath eat of a king, and eat of the fish that hath fed of that worm ... a king may go a progress through the guts of a beggar (Hamlet: act 4, scene 3)

  • archaea support (especially methanogens)
  • new documentation
  • improved fiber degradation
  • extended electron bifurcation reactions
  • improved anaerobic vitamin biosynthesis
  • more cases of extracellular degradation
  • added 'gapseq adapt' to manually improve models
  • enabled photosynthesis
  • many smaller bugfixes

mumbo jumbo

20 Mar 18:05

  • official production release
  • used for results in the gapseq manuscript

Darwinian turtle

11 Feb 09:07

  • draft model generation
  • enzyme complex detection
  • gapfilling by bitscore weight

Baby tiger

09 Aug 15:51

Preliminary release used in first studies

  • pathway analysis for metacyc, kegg, seed, and custom pathways
  • gapfilling of metabolic models
  • draft model creation
  • many corrections in the seed reaction database
  • transporter prediction