Add viral detection from bulk metagenomes #1222

Open
SantaMcCloud wants to merge 3 commits into galaxyproject:main from SantaMcCloud:add_viral_dete_wf

@SantaMcCloud

Contributor

FOR CONTRIBUTOR:

  • I have read the Adding workflows guidelines
  • License permits unrestricted use (educational + commercial)
  • Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

  • .dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
  • Workflow is sufficiently generic to be used with lab data and does not hardcode sample names, reference data and can be run without reading an accompanying tutorial.
  • In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
  • In workflow: workflow inputs and outputs have human readable names (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless it is generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
  • In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
  • Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
  • Readme explains what workflow does, what are valid inputs and what outputs users can expect. If a tutorial or other resources exist they can be linked. If a similar workflow exists in IWC readme should explain differences with existing workflow and when one might prefer one workflow over another
  • Changelog contains appropriate entries
  • Large files (> 100 KB) are uploaded to zenodo and location urls are used in test file

@github-actions


Test Results (powered by Planemo)

Test Summary

Test State   Count
Total        1
Passed       0
Error        1
Failure      0
Skipped      0

Errored Tests
  • ❌ viral-detection-from-bulk-metagenomes.ga_0

    Execution Problem:

    • Failed to run workflow, invocation ended in [failed] state.
      

    Workflow invocation details

    • Invocation Messages

      • Invocation scheduling failed because step 20 depends on output 'quality summary' of step 18, but this step did not produce an output of that name.

      • Defined workflow output 'quality summary' was not found in step 18.

      • Defined workflow output 'complete genomes' was not found in step 18.

      • Defined workflow output 'output cluster' was not found in step 16.
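
The invocation messages above all describe the same kind of mismatch: a connection or a workflow output refers to an output name that the source step does not declare. A minimal sketch of a pre-flight check for this is shown below. It assumes the usual exported .ga JSON layout, in which each entry of "steps" carries "type", "outputs" (a list of objects with a "name"), "input_connections" (objects with "id" and "output_name", or lists of them) and "workflow_outputs" (objects with "output_name"). The function names find_missing_outputs and load_workflow are hypothetical helpers, not part of Planemo or Galaxy, and step ids in the file may be offset from the step numbers printed in the report.

```python
import json
from typing import Any, Dict, List


def load_workflow(path: str) -> Dict[str, Any]:
    """Read a .ga workflow file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def _as_list(value: Any) -> List[Any]:
    """Connections may be a single object or a list of objects."""
    return value if isinstance(value, list) else [value]


def find_missing_outputs(workflow: Dict[str, Any]) -> List[str]:
    """Report references to output names that a tool step does not declare.

    Only steps of type "tool" are checked as sources, because input and
    subworkflow steps are assumed not to list their outputs explicitly.
    """
    steps = workflow.get("steps", {})
    # Map each step id (as a string) to the set of output names it declares.
    declared = {
        str(step_id): {out["name"] for out in step.get("outputs", [])}
        for step_id, step in steps.items()
    }
    problems: List[str] = []
    for step_id, step in steps.items():
        # Workflow outputs must point at an output the same step declares.
        if step.get("type") == "tool":
            for wf_out in step.get("workflow_outputs", []):
                name = wf_out.get("output_name")
                if name not in declared[str(step_id)]:
                    problems.append(
                        f"workflow output '{name}' not declared by step {step_id}"
                    )
        # Each incoming connection must point at a declared upstream output.
        for input_name, conns in step.get("input_connections", {}).items():
            for conn in _as_list(conns):
                source_id = str(conn["id"])
                source = steps.get(source_id)
                if source is None or source.get("type") != "tool":
                    continue
                if conn["output_name"] not in declared[source_id]:
                    problems.append(
                        f"step {step_id} input '{input_name}' expects output "
                        f"'{conn['output_name']}' of step {source_id}, "
                        "which is not declared"
                    )
    return problems
```

Running find_missing_outputs(load_workflow(path)) on the workflow file before pushing would list such references without waiting for a full test invocation. An empty list only means the names are consistent within the file, not that the installed tool version actually produces them.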

    • Steps
      • Step 1: Assembled contigs:

        • step_state: scheduled
      • Step 2: Trimmed reads:

        • step_state: scheduled
      • Step 3: Unlabelled step (toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.5+galaxy3):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.3.1

            Command Line:

            • env -i $(which awk) --sandbox -v FS='	' -v OFS='	' --re-interval -f '/tmp/tmprv2i5ylf/job_working_directory/000/7/configs/tmpuseao8c6' '/tmp/tmprv2i5ylf/files/d/1/2/dataset_d12e3f93-49db-4245-b361-22e2adcd9f12.dat' > '/tmp/tmprv2i5ylf/job_working_directory/000/7/outputs/dataset_c7ac4067-4b03-483c-acd9-36130e432d5b.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              code "/^>/ {print \">\" FILENAME \"_\" ++i} !/^>/ {print}"
              dbkey "?"
              variables []
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.3.1

            Command Line:

            • env -i $(which awk) --sandbox -v FS='	' -v OFS='	' --re-interval -f '/tmp/tmprv2i5ylf/job_working_directory/000/8/configs/tmp0utynb6_' '/tmp/tmprv2i5ylf/files/1/9/3/dataset_1932c856-c1c9-43d3-8b24-f26b2d0d624d.dat' > '/tmp/tmprv2i5ylf/job_working_directory/000/8/outputs/dataset_9fa65793-d088-4dc3-9c0e-537982ad48af.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              code "/^>/ {print \">\" FILENAME \"_\" ++i} !/^>/ {print}"
              dbkey "?"
              variables []
      • Step 4: Unlabelled step (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmprv2i5ylf/files/c/7/a/dataset_c7ac4067-4b03-483c-acd9-36130e432d5b.dat' --ftype 'fasta' --chunksize 50000 --file_names 'chunk_' --file_ext 'fasta'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fasta"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 3, "input": {"values": [{"id": 9, "src": "dce"}]}, "newfilenames": "chunk_", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "fasta", "select_mode": {"__current_case__": 0, "chunksize": "50000", "mode": "chunk"}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmprv2i5ylf/files/9/f/a/dataset_9fa65793-d088-4dc3-9c0e-537982ad48af.dat' --ftype 'fasta' --chunksize 50000 --file_names 'chunk_' --file_ext 'fasta'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fasta"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 3, "input": {"values": [{"id": 10, "src": "dce"}]}, "newfilenames": "chunk_", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "fasta", "select_mode": {"__current_case__": 0, "chunksize": "50000", "mode": "chunk"}}
      • Step 5: Unlabelled step (toolshed.g2.bx.psu.edu/repos/ufz/genomad_end_to_end/genomad_end_to_end/1.11.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/genomad:1.11.1--pyhdfd78af_0

            Command Line:

            • ln -s '/tmp/tmprv2i5ylf/files/0/7/7/dataset_0773651a-b599-46b9-846d-527ff87ad285.dat' sequence.fa && mkdir output/ && genomad end-to-end --conservative --threads ${GALAXY_SLOTS:-4}    --lenient-taxonomy  --sensitivity 4.2 --splits 8   --composition auto  sequence.fa output/ '/cvmfs/data.galaxyproject.org/byhand/genomad_databases/1.9'

            Exit Code:

            • 0

            Standard Output:

            • ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad annotate (v1.11.1). This will perform gene calling in     │
              │  the input sequences and annotate the predicted proteins with geNomad's      │
              │  markers.                                                                    │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_annotate                                                  │
              │    ├── sequence_annotate.json (execution parameters)                         │
              │    ├── sequence_genes.tsv (gene annotation data)                             │
              │    ├── sequence_taxonomy.tsv (taxonomic assignment)                          │
              │    ├── sequence_mmseqs2.tsv (MMseqs2 output file)                            │
              │    └── sequence_proteins.faa (protein FASTA file)                            │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [21:51:49] Executing genomad annotate.                                          
              [21:51:49] Creating the output/sequence_annotate directory.                     
              [21:58:18] Proteins predicted with pyrodigal-gv were written to                 
                         sequence_proteins.faa.                                               
              [22:13:20] Proteins annotated with MMseqs2 and geNomad database (v1.9) were     
                         written to sequence_mmseqs2.tsv.                                     
              [22:13:22] Gene data was written to sequence_genes.tsv.                         
              [22:13:22] Taxonomic assignment data was written to sequence_taxonomy.tsv.      
              [22:13:22] geNomad annotate finished!                                           
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad find-proviruses (v1.11.1). This will find putative        │
              │  proviral regions within the input sequences.                                │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_find_proviruses                                           │
              │    ├── sequence_find_proviruses.json (execution parameters)                  │
              │    ├── sequence_provirus.tsv (provirus data)                                 │
              │    ├── sequence_provirus.fna (provirus nucleotide sequences)                 │
              │    ├── sequence_provirus_proteins.faa (provirus protein sequences)           │
              │    ├── sequence_provirus_genes.tsv (provirus gene annotation data)           │
              │    ├── sequence_provirus_taxonomy.tsv (provirus taxonomic assignment)        │
              │    ├── sequence_provirus_mmseqs2.tsv (MMseqs2 output file)                   │
              │    └── sequence_provirus_aragorn.tsv (Aragorn output file)                   │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [22:13:23] Executing genomad find-proviruses.                                   
              [22:13:23] Creating the output/sequence_find_proviruses directory.              
              [22:13:26] Integrases identified with MMseqs2 and geNomad database (v1.9) were  
                         written to sequence_provirus_mmseqs2.tsv.                            
              [22:13:27] tRNAs identified with Aragorn were written to                        
                         sequence_provirus_aragorn.tsv.                                       
              [22:13:28] Provirus regions identified.                                         
              [22:13:28] Provirus data was written to sequence_provirus.tsv.                  
              [22:13:28] Provirus nucleotide sequences were written to sequence_provirus.fna. 
              [22:13:29] Provirus protein sequences were written to                           
                         sequence_provirus_proteins.faa.                                      
              [22:13:29] Provirus gene data was written to sequence_provirus_genes.tsv.       
              [22:13:29] Taxonomic assignment data was written to                             
                         sequence_provirus_taxonomy.tsv.                                      
              [22:13:29] geNomad find-proviruses finished!                                    
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad marker-classification (v1.11.1). This will classify the   │
              │  input sequences into chromosome, plasmid, or virus based on the presence    │
              │  of geNomad markers and other gene-related features.                         │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_marker_classification                                     │
              │    ├── sequence_marker_classification.json (execution parameters)            │
              │    ├── sequence_features.tsv (sequence feature data: tabular format)         │
              │    ├── sequence_features.npz (sequence feature data: binary format)          │
              │    ├── sequence_marker_classification.tsv (sequence classification: tabular  │
              │    │   format)                                                               │
              │    ├── sequence_marker_classification.npz (sequence classification: binary   │
              │    │   format)                                                               │
              │    ├── sequence_provirus_features.tsv (provirus feature data: tabular        │
              │    │   format)                                                               │
              │    ├── sequence_provirus_features.npz (provirus feature data: binary         │
              │    │   format)                                                               │
              │    ├── sequence_provirus_marker_classification.tsv (provirus                 │
              │    │   classification: tabular format)                                       │
              │    └── sequence_provirus_marker_classification.npz (provirus                 │
              │        classification: binary format)                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [22:13:29] Executing genomad marker-classification.                             
              [22:13:29] Creating the output/sequence_marker_classification directory.        
              [22:13:33] Sequence features computed.                                          
              [22:13:33] Sequence features in binary format written to sequence_features.npz. 
              [22:13:34] Sequence features in tabular format written to sequence_features.tsv.
              [22:13:34] Provirus features computed.                                          
              [22:13:34] Provirus features in binary format written to                        
                         sequence_provirus_features.npz.                                      
              [22:13:34] Provirus features in tabular format written to                       
                         sequence_provirus_features.tsv.                                      
              [22:13:35] Sequences classified.                                                
              [22:13:35] Sequence classification in binary format written to                  
                         sequence_marker_classification.npz.                                  
              [22:13:35] Sequence classification in tabular format written to                 
                         sequence_marker_classification.tsv.                                  
              [22:13:35] Proviruses classified.                                               
              [22:13:35] Provirus classification in binary format written to                  
                         sequence_provirus_marker_classification.npz.                         
              [22:13:35] Provirus classification in tabular format written to                 
                         sequence_provirus_marker_classification.tsv.                         
              [22:13:35] geNomad marker-classification finished!                              
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad nn-classification (v1.11.1). This will classify the       │
              │  input sequences into chromosome, plasmid, or virus based on the nucleotide  │
              │  sequence.                                                                   │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_nn_classification                                         │
              │    ├── sequence_nn_classification.json (execution parameters)                │
              │    ├── sequence_encoded_sequences (directory containing encoded sequence     │
              │    │   data)                                                                 │
              │    ├── sequence_nn_classification.tsv (contig classification: tabular        │
              │    │   format)                                                               │
              │    ├── sequence_nn_classification.npz (contig classification: binary         │
              │    │   format)                                                               │
              │    ├── sequence_encoded_proviruses (directory containing encoded sequence    │
              │    │   data)                                                                 │
              │    ├── sequence_provirus_nn_classification.tsv (provirus classification:     │
              │    │   tabular format)                                                       │
              │    └── sequence_provirus_nn_classification.npz (provirus classification:     │
              │        binary format)                                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [22:13:38] Executing genomad nn-classification.                                 
              [22:13:38] Creating the output/sequence_nn_classification directory.            
              [22:13:38] Creating the                                                         
                         output/sequence_nn_classification/sequence_encoded_sequences         
                         directory.                                                           
              [22:14:25] Encoded sequence data written to sequence_encoded_sequences.         
              [22:14:25] Creating the                                                         
                         output/sequence_nn_classification/sequence_encoded_proviruses        
                         directory.                                                           
              [22:14:25] Encoded provirus data written to sequence_encoded_proviruses.        
              
              [23:16:16] Sequences classified.                                                
              [23:16:16] Sequence classification in binary format written to                  
                         sequence_nn_classification.npz.                                      
              [23:16:49] Sequence classification in tabular format written to                 
                         sequence_nn_classification.tsv.                                      
              
              [23:16:52] Proviruses classified.                                               
              [23:16:52] Provirus classification in binary format written to                  
                         sequence_provirus_nn_classification.npz.                             
              [23:16:52] Provirus classification in tabular format written to                 
                         sequence_provirus_nn_classification.tsv.                             
              [23:16:52] geNomad nn-classification finished!                                  
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad aggregated-classification (v1.11.1). This will aggregate  │
              │  the results of the marker-classification and nn-classification modules to   │
              │  classify the input sequences into chromosome, plasmid, or virus.            │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_aggregated_classification                                 │
              │    ├── sequence_aggregated_classification.json (execution parameters)        │
              │    ├── sequence_aggregated_classification.tsv (sequence classification:      │
              │    │   tabular format)                                                       │
              │    ├── sequence_aggregated_classification.npz (sequence classification:      │
              │    │   binary format)                                                        │
              │    ├── sequence_provirus_aggregated_classification.tsv (provirus             │
              │    │   classification: tabular format)                                       │
              │    └── sequence_provirus_aggregated_classification.npz (provirus             │
              │        classification: binary format)                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:16:52] Executing genomad aggregated-classification.                         
              [23:16:52] Creating the output/sequence_aggregated_classification directory.    
              [23:16:52] The total marker frequencies of the input sequences were computed.   
              [23:16:52] Sequences classified.                                                
              [23:16:52] Sequence classification in binary format written to                  
                         sequence_aggregated_classification.npz.                              
              [23:16:52] Sequence classification in tabular format written to                 
                         sequence_aggregated_classification.tsv.                              
              [23:16:52] Proviruses classified.                                               
              [23:16:52] Provirus classification in binary format written to                  
                         sequence_provirus_aggregated_classification.npz.                     
              [23:16:52] Provirus classification in tabular format written to                 
                         sequence_provirus_aggregated_classification.tsv.                     
              [23:16:52] geNomad aggregated-classification finished!                          
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad summary (v1.11.1). This will summarize the results        │
              │  across modules into a classification report.                                │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_summary                                                   │
              │    ├── sequence_summary.json (execution parameters)                          │
              │    ├── sequence_virus_summary.tsv (virus classification summary)             │
              │    ├── sequence_plasmid_summary.tsv (plasmid classification summary)         │
              │    ├── sequence_virus.fna (virus nucleotide FASTA file)                      │
              │    ├── sequence_plasmid.fna (plasmid nucleotide FASTA file)                  │
              │    ├── sequence_virus_proteins.faa (virus protein FASTA file)                │
              │    ├── sequence_plasmid_proteins.faa (plasmid protein FASTA file)            │
              │    ├── sequence_virus_genes.tsv (virus gene annotation data)                 │
              │    └── sequence_plasmid_genes.tsv (plasmid gene annotation data)             │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:16:52] Executing genomad summary.                                           
              [23:16:52] Creating the output/sequence_summary directory.                      
              [23:16:53] Using scores from aggregated-classification.                         
              [23:16:53] 26 plasmid(s) and 163 virus(es) were identified.                     
              [23:16:53] Nucleotide sequences were written to sequence_plasmid.fna and        
                         sequence_virus.fna.                                                  
              [23:16:53] Protein sequences were written to sequence_plasmid_proteins.faa and  
                         sequence_virus_proteins.faa.                                         
              [23:16:53] Gene annotation data was written to sequence_plasmid_genes.tsv and   
                         sequence_virus_genes.tsv.                                            
              [23:16:53] Summary files were written to sequence_plasmid_summary.tsv and       
                         sequence_virus_summary.tsv.                                          
              [23:16:53] geNomad summary finished!                                            
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              DATABASE "1.9"
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              annotation {"full_ictv_lineage": false, "lenient_taxonomy": true, "sensitivity": "4.2", "splits": "8"}
              basic {"disable_find_proviruses": true, "disable_nn_classification": true, "enable_score_calibration": false}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_cond {"__current_case__": 0, "filtering_preset": "--conservative"}
              license true
              provirus {"skip_integrase_identification": false, "skip_trna_identification": false}
              score {"composition": "auto", "force_auto": false}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/genomad:1.11.1--pyhdfd78af_0

            Command Line:

            • ln -s '/tmp/tmprv2i5ylf/files/7/d/a/dataset_7da41ef1-98ad-4553-82df-a44d9e302aad.dat' sequence.fa && mkdir output/ && genomad end-to-end --conservative --threads ${GALAXY_SLOTS:-4}    --lenient-taxonomy  --sensitivity 4.2 --splits 8   --composition auto  sequence.fa output/ '/cvmfs/data.galaxyproject.org/byhand/genomad_databases/1.9'

            Exit Code:

            • 0

            Standard Output:

            • ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad annotate (v1.11.1). This will perform gene calling in     │
              │  the input sequences and annotate the predicted proteins with geNomad's      │
              │  markers.                                                                    │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_annotate                                                  │
              │    ├── sequence_annotate.json (execution parameters)                         │
              │    ├── sequence_genes.tsv (gene annotation data)                             │
              │    ├── sequence_taxonomy.tsv (taxonomic assignment)                          │
              │    ├── sequence_mmseqs2.tsv (MMseqs2 output file)                            │
              │    └── sequence_proteins.faa (protein FASTA file)                            │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:17:06] Executing genomad annotate.                                          
              [23:17:06] Creating the output/sequence_annotate directory.                     
              [23:23:16] Proteins predicted with pyrodigal-gv were written to                 
                         sequence_proteins.faa.                                               
              [23:35:33] Proteins annotated with MMseqs2 and geNomad database (v1.9) were     
                         written to sequence_mmseqs2.tsv.                                     
              [23:35:34] Gene data was written to sequence_genes.tsv.                         
              [23:35:34] Taxonomic assignment data was written to sequence_taxonomy.tsv.      
              [23:35:34] geNomad annotate finished!                                           
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad find-proviruses (v1.11.1). This will find putative        │
              │  proviral regions within the input sequences.                                │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_find_proviruses                                           │
              │    ├── sequence_find_proviruses.json (execution parameters)                  │
              │    ├── sequence_provirus.tsv (provirus data)                                 │
              │    ├── sequence_provirus.fna (provirus nucleotide sequences)                 │
              │    ├── sequence_provirus_proteins.faa (provirus protein sequences)           │
              │    ├── sequence_provirus_genes.tsv (provirus gene annotation data)           │
              │    ├── sequence_provirus_taxonomy.tsv (provirus taxonomic assignment)        │
              │    ├── sequence_provirus_mmseqs2.tsv (MMseqs2 output file)                   │
              │    └── sequence_provirus_aragorn.tsv (Aragorn output file)                   │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:35:34] Executing genomad find-proviruses.                                   
              [23:35:34] Creating the output/sequence_find_proviruses directory.              
              [23:35:38] Integrases identified with MMseqs2 and geNomad database (v1.9) were  
                         written to sequence_provirus_mmseqs2.tsv.                            
              [23:35:39] tRNAs identified with Aragorn were written to                        
                         sequence_provirus_aragorn.tsv.                                       
              [23:35:40] Provirus regions identified.                                         
              [23:35:40] Provirus data was written to sequence_provirus.tsv.                  
              [23:35:40] Provirus nucleotide sequences were written to sequence_provirus.fna. 
              [23:35:40] Provirus protein sequences were written to                           
                         sequence_provirus_proteins.faa.                                      
              [23:35:40] Provirus gene data was written to sequence_provirus_genes.tsv.       
              [23:35:40] Taxonomic assignment data was written to                             
                         sequence_provirus_taxonomy.tsv.                                      
              [23:35:40] geNomad find-proviruses finished!                                    
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad marker-classification (v1.11.1). This will classify the   │
              │  input sequences into chromosome, plasmid, or virus based on the presence    │
              │  of geNomad markers and other gene-related features.                         │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_marker_classification                                     │
              │    ├── sequence_marker_classification.json (execution parameters)            │
              │    ├── sequence_features.tsv (sequence feature data: tabular format)         │
              │    ├── sequence_features.npz (sequence feature data: binary format)          │
              │    ├── sequence_marker_classification.tsv (sequence classification: tabular  │
              │    │   format)                                                               │
              │    ├── sequence_marker_classification.npz (sequence classification: binary   │
              │    │   format)                                                               │
              │    ├── sequence_provirus_features.tsv (provirus feature data: tabular        │
              │    │   format)                                                               │
              │    ├── sequence_provirus_features.npz (provirus feature data: binary         │
              │    │   format)                                                               │
              │    ├── sequence_provirus_marker_classification.tsv (provirus                 │
              │    │   classification: tabular format)                                       │
              │    └── sequence_provirus_marker_classification.npz (provirus                 │
              │        classification: binary format)                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:35:40] Executing genomad marker-classification.                             
              [23:35:40] Creating the output/sequence_marker_classification directory.        
              [23:35:44] Sequence features computed.                                          
              [23:35:44] Sequence features in binary format written to sequence_features.npz. 
              [23:35:45] Sequence features in tabular format written to sequence_features.tsv.
              [23:35:45] Provirus features computed.                                          
              [23:35:45] Provirus features in binary format written to                        
                         sequence_provirus_features.npz.                                      
              [23:35:45] Provirus features in tabular format written to                       
                         sequence_provirus_features.tsv.                                      
              [23:35:46] Sequences classified.                                                
              [23:35:46] Sequence classification in binary format written to                  
                         sequence_marker_classification.npz.                                  
              [23:35:46] Sequence classification in tabular format written to                 
                         sequence_marker_classification.tsv.                                  
              [23:35:46] Proviruses classified.                                               
              [23:35:46] Provirus classification in binary format written to                  
                         sequence_provirus_marker_classification.npz.                         
              [23:35:46] Provirus classification in tabular format written to                 
                         sequence_provirus_marker_classification.tsv.                         
              [23:35:46] geNomad marker-classification finished!                              
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad nn-classification (v1.11.1). This will classify the       │
              │  input sequences into chromosome, plasmid, or virus based on the nucleotide  │
              │  sequence.                                                                   │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_nn_classification                                         │
              │    ├── sequence_nn_classification.json (execution parameters)                │
              │    ├── sequence_encoded_sequences (directory containing encoded sequence     │
              │    │   data)                                                                 │
              │    ├── sequence_nn_classification.tsv (contig classification: tabular        │
              │    │   format)                                                               │
              │    ├── sequence_nn_classification.npz (contig classification: binary         │
              │    │   format)                                                               │
              │    ├── sequence_encoded_proviruses (directory containing encoded sequence    │
              │    │   data)                                                                 │
              │    ├── sequence_provirus_nn_classification.tsv (provirus classification:     │
              │    │   tabular format)                                                       │
              │    └── sequence_provirus_nn_classification.npz (provirus classification:     │
              │        binary format)                                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [23:35:50] Executing genomad nn-classification.                                 
              [23:35:50] Creating the output/sequence_nn_classification directory.            
              [23:35:50] Creating the                                                         
                         output/sequence_nn_classification/sequence_encoded_sequences         
                         directory.                                                           
              [23:36:36] Encoded sequence data written to sequence_encoded_sequences.         
              [23:36:36] Creating the                                                         
                         output/sequence_nn_classification/sequence_encoded_proviruses        
                         directory.                                                           
              [23:36:36] Encoded provirus data written to sequence_encoded_proviruses.        
              
              [00:38:58] Sequences classified.                                                
              [00:38:58] Sequence classification in binary format written to                  
                         sequence_nn_classification.npz.                                      
              [00:39:32] Sequence classification in tabular format written to                 
                         sequence_nn_classification.tsv.                                      
              
              [00:39:36] Proviruses classified.                                               
              [00:39:36] Provirus classification in binary format written to                  
                         sequence_provirus_nn_classification.npz.                             
              [00:39:36] Provirus classification in tabular format written to                 
                         sequence_provirus_nn_classification.tsv.                             
              [00:39:36] geNomad nn-classification finished!                                  
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad aggregated-classification (v1.11.1). This will aggregate  │
              │  the results of the marker-classification and nn-classification modules to   │
              │  classify the input sequences into chromosome, plasmid, or virus.            │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_aggregated_classification                                 │
              │    ├── sequence_aggregated_classification.json (execution parameters)        │
              │    ├── sequence_aggregated_classification.tsv (sequence classification:      │
              │    │   tabular format)                                                       │
              │    ├── sequence_aggregated_classification.npz (sequence classification:      │
              │    │   binary format)                                                        │
              │    ├── sequence_provirus_aggregated_classification.tsv (provirus             │
              │    │   classification: tabular format)                                       │
              │    └── sequence_provirus_aggregated_classification.npz (provirus             │
              │        classification: binary format)                                        │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [00:39:36] Executing genomad aggregated-classification.                         
              [00:39:36] Creating the output/sequence_aggregated_classification directory.    
              [00:39:36] The total marker frequencies of the input sequences were computed.   
              [00:39:36] Sequences classified.                                                
              [00:39:36] Sequence classification in binary format written to                  
                         sequence_aggregated_classification.npz.                              
              [00:39:37] Sequence classification in tabular format written to                 
                         sequence_aggregated_classification.tsv.                              
              [00:39:37] Proviruses classified.                                               
              [00:39:37] Provirus classification in binary format written to                  
                         sequence_provirus_aggregated_classification.npz.                     
              [00:39:37] Provirus classification in tabular format written to                 
                         sequence_provirus_aggregated_classification.tsv.                     
              [00:39:37] geNomad aggregated-classification finished!                          
              ╭──────────────────────────────────────────────────────────────────────────────╮
              │  Executing geNomad summary (v1.11.1). This will summarize the results        │
              │  across modules into a classification report.                                │
              │  ──────────────────────────────────────────────────────────────────────────  │
              │  Outputs:                                                                    │
              │    output/sequence_summary                                                   │
              │    ├── sequence_summary.json (execution parameters)                          │
              │    ├── sequence_virus_summary.tsv (virus classification summary)             │
              │    ├── sequence_plasmid_summary.tsv (plasmid classification summary)         │
              │    ├── sequence_virus.fna (virus nucleotide FASTA file)                      │
              │    ├── sequence_plasmid.fna (plasmid nucleotide FASTA file)                  │
              │    ├── sequence_virus_proteins.faa (virus protein FASTA file)                │
              │    ├── sequence_plasmid_proteins.faa (plasmid protein FASTA file)            │
              │    ├── sequence_virus_genes.tsv (virus gene annotation data)                 │
              │    └── sequence_plasmid_genes.tsv (plasmid gene annotation data)             │
              ╰──────────────────────────────────────────────────────────────────────────────╯
              [00:39:37] Executing genomad summary.                                           
              [00:39:37] Creating the output/sequence_summary directory.                      
              [00:39:37] Using scores from aggregated-classification.                         
              [00:39:37] 25 plasmid(s) and 152 virus(es) were identified.                     
              [00:39:37] Nucleotide sequences were written to sequence_plasmid.fna and        
                         sequence_virus.fna.                                                  
              [00:39:37] Protein sequences were written to sequence_plasmid_proteins.faa and  
                         sequence_virus_proteins.faa.                                         
              [00:39:37] Gene annotation data was written to sequence_plasmid_genes.tsv and   
                         sequence_virus_genes.tsv.                                            
              [00:39:38] Summary files were written to sequence_plasmid_summary.tsv and       
                         sequence_virus_summary.tsv.                                          
              [00:39:38] geNomad summary finished!                                            
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              DATABASE "1.9"
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              annotation {"full_ictv_lineage": false, "lenient_taxonomy": true, "sensitivity": "4.2", "splits": "8"}
              basic {"disable_find_proviruses": true, "disable_nn_classification": true, "enable_score_calibration": false}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_cond {"__current_case__": 0, "filtering_preset": "--conservative"}
              license true
              provirus {"skip_integrase_identification": false, "skip_trna_identification": false}
              score {"composition": "auto", "force_auto": false}
      • Step 6: Unlabelled step (toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.7--1

            Command Line:

            • cat '/tmp/tmprv2i5ylf/job_working_directory/000/13/configs/tmpu1plns84' && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/cf4397560712/query_tabular/query_tabular.py' -d -s 'workdb.sqlite' -j '/tmp/tmprv2i5ylf/job_working_directory/000/13/configs/tmp5rp5z2qf' -Q '/tmp/tmprv2i5ylf/job_working_directory/000/13/configs/tmpu1plns84'     -o '/tmp/tmprv2i5ylf/job_working_directory/000/13/outputs/dataset_85b71cb8-f01e-4284-93e3-b858e136463a.dat'

            Exit Code:

            • 0

            Standard Error:

            • JSON: {'tables': [{'file_path': '/tmp/tmprv2i5ylf/files/1/1/1/dataset_1113ad11-373b-4d14-8320-c4356dbf47a8.dat', 'table_name': 't1', 'firstlinenames': True, 'column_names': ''}]}
              
              SQL: 
              SELECT * from t1;
                        
              rowcount: None
              

            Standard Output:

            • SELECT * from t1;
                      

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              add_to_database {"withdb": null}
              addqueries {"queries": []}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              modify_database {"sql_stmts": []}
              query_result {"__current_case__": 0, "header": "yes", "header_prefix": ""}
              save_db false
              sqlquery "SELECT * from t1;"
              tables [{"__index__": 0, "input_opts": {"linefilters": []}, "table": {"values": [{"id": 44, "src": "dce"}]}, "tbl_opts": {"col_names": "", "column_names_from_first_line": true, "indexes": [], "load_named_columns": false, "pkey_autoincr": "", "table_name": ""}}]
              workdb "workdb.sqlite"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.7--1

            Command Line:

            • cat '/tmp/tmprv2i5ylf/job_working_directory/000/14/configs/tmp1ef73t4n' && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/cf4397560712/query_tabular/query_tabular.py' -d -s 'workdb.sqlite' -j '/tmp/tmprv2i5ylf/job_working_directory/000/14/configs/tmpciis6fx1' -Q '/tmp/tmprv2i5ylf/job_working_directory/000/14/configs/tmp1ef73t4n'     -o '/tmp/tmprv2i5ylf/job_working_directory/000/14/outputs/dataset_9947f413-942a-4d2b-be48-f239dff5339b.dat'

            Exit Code:

            • 0

            Standard Error:

            • JSON: {'tables': [{'file_path': '/tmp/tmprv2i5ylf/files/b/b/1/dataset_bb1b3270-e72e-45f6-87c0-c483523a0a58.dat', 'table_name': 't1', 'firstlinenames': True, 'column_names': ''}]}
              
              SQL: 
              SELECT * from t1;
                        
              rowcount: None
              

            Standard Output:

            • SELECT * from t1;
                      

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              add_to_database {"withdb": null}
              addqueries {"queries": []}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              modify_database {"sql_stmts": []}
              query_result {"__current_case__": 0, "header": "yes", "header_prefix": ""}
              save_db false
              sqlquery "SELECT * from t1;"
              tables [{"__index__": 0, "input_opts": {"linefilters": []}, "table": {"values": [{"id": 46, "src": "dce"}]}, "tbl_opts": {"col_names": "", "column_names_from_first_line": true, "indexes": [], "load_named_columns": false, "pkey_autoincr": "", "table_name": ""}}]
              workdb "workdb.sqlite"
      • Step 7: Unlabelled step (toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • ( awk '{if (NR==1) {print}}' "/tmp/tmprv2i5ylf/files/3/f/b/dataset_3fbe12b3-732a-472d-894c-5f7d4c860a6c.dat";   awk '{if (NR!=1) {print}}' "/tmp/tmprv2i5ylf/files/3/f/b/dataset_3fbe12b3-732a-472d-894c-5f7d4c860a6c.dat";   ) > /tmp/tmprv2i5ylf/job_working_directory/000/15/outputs/dataset_aee5fcb8-5e35-41d5-a3e7-ea9c90545d6d.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header true
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • ( awk '{if (NR==1) {print}}' "/tmp/tmprv2i5ylf/files/5/9/6/dataset_59638d4c-6fb0-49ed-b2a2-f4c87f2350b3.dat";   awk '{if (NR!=1) {print}}' "/tmp/tmprv2i5ylf/files/5/9/6/dataset_59638d4c-6fb0-49ed-b2a2-f4c87f2350b3.dat";   ) > /tmp/tmprv2i5ylf/job_working_directory/000/16/outputs/dataset_da5b093c-e355-4ee0-98fe-a1bac6a068a7.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header true
      • Step 8: Unlabelled step (toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • ( awk '{if (NR==1) {print}}' "/tmp/tmprv2i5ylf/files/1/1/1/dataset_1113ad11-373b-4d14-8320-c4356dbf47a8.dat";   awk '{if (NR!=1) {print}}' "/tmp/tmprv2i5ylf/files/1/1/1/dataset_1113ad11-373b-4d14-8320-c4356dbf47a8.dat";   ) > /tmp/tmprv2i5ylf/job_working_directory/000/17/outputs/dataset_9c6aeebc-98ab-478f-bacc-86648409f0aa.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header true
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • ( awk '{if (NR==1) {print}}' "/tmp/tmprv2i5ylf/files/b/b/1/dataset_bb1b3270-e72e-45f6-87c0-c483523a0a58.dat";   awk '{if (NR!=1) {print}}' "/tmp/tmprv2i5ylf/files/b/b/1/dataset_bb1b3270-e72e-45f6-87c0-c483523a0a58.dat";   ) > /tmp/tmprv2i5ylf/job_working_directory/000/18/outputs/dataset_da8b9c2f-5a64-45c5-8db0-ebc6cee4118a.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header true
      • Step 9: Unlabelled step (toolshed.g2.bx.psu.edu/repos/iuc/qiime_filter_fasta/qiime_filter_fasta/1.9.1.0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/qiime:1.9.1--py_3

            Command Line:

            • export MPLBACKEND=Agg && echo "backend:agg" > matplotlibrc && filter_fasta.py --input_fasta_fp '/tmp/tmprv2i5ylf/files/2/9/7/dataset_29729de7-c50c-4c3d-bfde-7efb8528f3c7.dat' --output_fasta_fp '/tmp/tmprv2i5ylf/job_working_directory/000/19/outputs/dataset_5909f3e5-25fe-4676-a9f9-a5d7d8eacb64.dat' --seq_id_fp '/tmp/tmprv2i5ylf/files/8/5/b/dataset_85b71cb8-f01e-4284-93e3-b858e136463a.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              negate false
              selection {"__current_case__": 1, "seq_id_fp": {"values": [{"id": 48, "src": "dce"}]}, "type": "seq_list"}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/qiime:1.9.1--py_3

            Command Line:

            • export MPLBACKEND=Agg && echo "backend:agg" > matplotlibrc && filter_fasta.py --input_fasta_fp '/tmp/tmprv2i5ylf/files/8/2/e/dataset_82e1637b-954f-4647-a702-7f598619ce73.dat' --output_fasta_fp '/tmp/tmprv2i5ylf/job_working_directory/000/20/outputs/dataset_0e45df56-7708-4984-baaf-0b14b525694f.dat' --seq_id_fp '/tmp/tmprv2i5ylf/files/9/9/4/dataset_9947f413-942a-4d2b-be48-f239dff5339b.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              negate false
              selection {"__current_case__": 1, "seq_id_fp": {"values": [{"id": 50, "src": "dce"}]}, "type": "seq_list"}
      • Step 10: Unlabelled step (toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • (   cat "/tmp/tmprv2i5ylf/files/5/9/0/dataset_5909f3e5-25fe-4676-a9f9-a5d7d8eacb64.dat" ;   ) > /tmp/tmprv2i5ylf/job_working_directory/000/21/outputs/dataset_8751a163-fbb7-48dc-85b2-7c7c10aa866d.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header false
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/gawk:5.1.0--2

            Command Line:

            • (   cat "/tmp/tmprv2i5ylf/files/0/e/4/dataset_0e45df56-7708-4984-baaf-0b14b525694f.dat" ;   ) > /tmp/tmprv2i5ylf/job_working_directory/000/22/outputs/dataset_73bbf73b-2da0-41bf-b209-4ab3e018024c.dat

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filename {"__current_case__": 1, "add_name": false}
              one_header false
      • Step 11: CheckV: Separate proviruses (toolshed.g2.bx.psu.edu/repos/ufz/checkv_end_to_end/checkv_end_to_end/1.0.3+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/checkv:1.0.3--pyhdfd78af_0

            Command Line:

            • checkv end_to_end '/tmp/tmprv2i5ylf/files/8/7/5/dataset_8751a163-fbb7-48dc-85b2-7c7c10aa866d.dat' output -d '/cvmfs/data.galaxyproject.org/byhand/checkv/checkv-db' --remove_tmp -t "${GALAXY_SLOTS:-1}"

            Exit Code:

            • 0

            Standard Error:

            • CheckV v1.0.3: contamination
              [1/8] Reading database info...
              [2/8] Reading genome info...
              [3/8] Calling genes with prodigal-gv...
              [4/8] Reading gene info...
              [5/8] Running hmmsearch...
              [6/8] Annotating genes...
              [7/8] Identifying host regions...
              [8/8] Writing results...
              Run time: 166.54 seconds
              Peak mem: 0.15 GB
              
              CheckV v1.0.3: completeness
              [1/8] Skipping gene calling...
              [2/8] Initializing queries and database...
              [3/8] Running DIAMOND blastp search...
              [4/8] Computing AAI...
              [5/8] Running AAI based completeness estimation...
              [6/8] Running HMM based completeness estimation...
              [7/8] Determining genome copy number...
              [8/8] Writing results...
              Run time: 106.48 seconds
              Peak mem: 1.99 GB
              
              CheckV v1.0.3: complete_genomes
              [1/7] Reading input sequences...
              [2/7] Finding complete proviruses...
              [3/7] Finding direct/inverted terminal repeats...
              [4/7] Filtering terminal repeats...
              [5/7] Checking genome for completeness...
              [6/7] Checking genome for large duplications...
              [7/7] Writing results...
              Run time: 0.02 seconds
              Peak mem: 1.99 GB
              
              CheckV v1.0.3: quality_summary
              [1/6] Reading input sequences...
              [2/6] Reading results from contamination module...
              [3/6] Reading results from completeness module...
              [4/6] Reading results from complete genomes module...
              [5/6] Classifying contigs into quality tiers...
              [6/6] Writing results...
              Run time: 0.01 seconds
              Peak mem: 1.99 GB
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              optional_outputs None
              reference "1.5"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/checkv:1.0.3--pyhdfd78af_0

            Command Line:

            • checkv end_to_end '/tmp/tmprv2i5ylf/files/7/3/b/dataset_73bbf73b-2da0-41bf-b209-4ab3e018024c.dat' output -d '/cvmfs/data.galaxyproject.org/byhand/checkv/checkv-db' --remove_tmp -t "${GALAXY_SLOTS:-1}"

            Exit Code:

            • 0

            Standard Error:

            • CheckV v1.0.3: contamination
              [1/8] Reading database info...
              [2/8] Reading genome info...
              [3/8] Calling genes with prodigal-gv...
              [4/8] Reading gene info...
              [5/8] Running hmmsearch...
              [6/8] Annotating genes...
              [7/8] Identifying host regions...
              [8/8] Writing results...
              Run time: 100.36 seconds
              Peak mem: 0.23 GB
              
              CheckV v1.0.3: completeness
              [1/8] Skipping gene calling...
              [2/8] Initializing queries and database...
              [3/8] Running DIAMOND blastp search...
              [4/8] Computing AAI...
              [5/8] Running AAI based completeness estimation...
              [6/8] Running HMM based completeness estimation...
              [7/8] Determining genome copy number...
              [8/8] Writing results...
              Run time: 21.01 seconds
              Peak mem: 1.38 GB
              
              CheckV v1.0.3: complete_genomes
              [1/7] Reading input sequences...
              [2/7] Finding complete proviruses...
              [3/7] Finding direct/inverted terminal repeats...
              [4/7] Filtering terminal repeats...
              [5/7] Checking genome for completeness...
              [6/7] Checking genome for large duplications...
              [7/7] Writing results...
              Run time: 0.02 seconds
              Peak mem: 1.38 GB
              
              CheckV v1.0.3: quality_summary
              [1/6] Reading input sequences...
              [2/6] Reading results from contamination module...
              [3/6] Reading results from completeness module...
              [4/6] Reading results from complete genomes module...
              [5/6] Classifying contigs into quality tiers...
              [6/6] Writing results...
              Run time: 0.02 seconds
              Peak mem: 1.38 GB
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              optional_outputs None
              reference "1.5"
      • Step 12: Unlabelled step (__FILTER_EMPTY_DATASETS__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              input {"values": [{"id": 20, "src": "hdca"}]}
              replacement {"__class__": "RuntimeValue"}
      • Step 13: Unlabelled step (toolshed.g2.bx.psu.edu/repos/ufz/genomad_end_to_end/genomad_end_to_end/1.11.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is running

            Command Line:

            • ln -s '/tmp/tmprv2i5ylf/files/c/7/1/dataset_c71c6363-8202-4979-943b-85719ebd6e44.dat' sequence.fa && mkdir output/ && genomad end-to-end --conservative --threads ${GALAXY_SLOTS:-4}    --lenient-taxonomy --full-ictv-lineage --sensitivity 4.2 --splits 4   --composition auto  sequence.fa output/ '/cvmfs/data.galaxyproject.org/byhand/genomad_databases/1.9'

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              DATABASE "1.9"
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              annotation {"full_ictv_lineage": true, "lenient_taxonomy": true, "sensitivity": "4.2", "splits": "4"}
              basic {"disable_find_proviruses": true, "disable_nn_classification": true, "enable_score_calibration": false}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_cond {"__current_case__": 0, "filtering_preset": "--conservative"}
              license true
              provirus {"skip_integrase_identification": false, "skip_trna_identification": false}
              score {"composition": "auto", "force_auto": false}
          • Job 2:

            • Job state is queued

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              DATABASE "1.9"
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              annotation {"full_ictv_lineage": true, "lenient_taxonomy": true, "sensitivity": "4.2", "splits": "4"}
              basic {"disable_find_proviruses": true, "disable_nn_classification": true, "enable_score_calibration": false}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              filter_cond {"__current_case__": 0, "filtering_preset": "--conservative"}
              license true
              provirus {"skip_integrase_identification": false, "skip_trna_identification": false}
              score {"composition": "auto", "force_auto": false}
      • Step 14: Unlabelled step (toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cat/9.5+galaxy3):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              queries [{"__index__": 0, "inputs2": {"values": [{"id": 21, "src": "hdca"}]}}]
      • Step 15: Unlabelled step (toolshed.g2.bx.psu.edu/repos/yhoogstrate/segmentation_fold/smf_utils_fix-fasta-headers/smf-v1.7-0_utils-v2.1.1-1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fasta"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
      • Step 16: Unlabelled step (toolshed.g2.bx.psu.edu/repos/iuc/mmseqs2_easy_linclust_clustering/mmseqs2_easy_linclust_clustering/17-b804f+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              align {"alignment_mode": "0", "alignment_output_mode": "0", "alt_ali": "0", "convertalis": false, "corr_score_weight": "0.0", "evalue": "0.001", "max_accept": "2147483647", "max_rejected": "2147483647", "min_aln_len": "0", "realign": false, "realign_max_seqs": "2147483647", "realign_score_bias": "-0.2", "score_bias": "0.0", "seq_id_mode": "0", "wrapped_scoring": false}
              alph_type {"__current_case__": 0, "dbtype": "0"}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              cluster {"cluster_mode": "0", "max_iterations": "1000", "similarity_type": "2"}
              common {"max_seq_len": "65535"}
              cov "0.85"
              cov_mode "0"
              dbkey "?"
              expert {"filter_hits": false, "sort_results": "0"}
              kmermatcher {"cluster_weight_threshold": "0.9", "hash_shift": "67", "ignore_multi_kmer": false, "include_only_extendable": false, "kmer_per_seq": "21"}
              min_seq_id "0.95"
              misc {"id_offset": "0", "rescore_mode": "0", "shuffle": true}
              output_files {"output_selection": ["file_rep_seq", "file_cluster_tsv"]}
              prefilter {"add_self_matches": false, "kmer_length": "0", "mask": "1", "mask_lower_case": "0", "mask_n_repeat": "0", "mask_prob": "0.9", "spaced_kmer_mode": "0"}
      • Step 17: Unlabelled step (toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.3):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              checks [{"__index__": 0, "pattern": " +\\n", "replacement": "\\n"}]
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
      • Step 18: Unlabelled step (toolshed.g2.bx.psu.edu/repos/ufz/checkv_end_to_end/checkv_end_to_end/1.0.3+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              optional_outputs None
              reference "1.5"
      • Step 19: Unlabelled step (toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              analysis_type {"__current_case__": 0, "analysis_type_selector": "simple", "presets": "no_presets"}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              library {"__current_case__": 1, "aligned_file": false, "input_1": {"values": [{"id": 3, "src": "dce"}]}, "paired_options": {"I": "0", "X": "500", "__current_case__": 0, "dovetail": false, "fr_rf_ff": "--fr", "no_contain": false, "no_discordant": true, "no_mixed": false, "no_overlap": false, "paired_options_selector": "yes"}, "type": "paired_collection", "unaligned_file": false}
              reference_genome {"__current_case__": 1, "own_file": {"values": [{"id": 67, "src": "hda"}]}, "source": "history"}
              rg {"__current_case__": 2, "rg_selector": "set_id_auto"}
              sam_options {"__current_case__": 0, "no_unal": true, "omit_sec_seq": false, "reorder": false, "sam_no_qname_trunc": false, "sam_opt": false, "sam_options_selector": "yes", "soft_clipped_unmapped_tlen": false, "xeq": false}
              save_mapping_stats false
          • Job 2:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "15ac24fc428311f1a4187c1e52ed9d6b"
              analysis_type {"__current_case__": 0, "analysis_type_selector": "simple", "presets": "no_presets"}
              chromInfo "/tmp/tmprv2i5ylf/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              library {"__current_case__": 1, "aligned_file": false, "input_1": {"values": [{"id": 6, "src": "dce"}]}, "paired_options": {"I": "0", "X": "500", "__current_case__": 0, "dovetail": false, "fr_rf_ff": "--fr", "no_contain": false, "no_discordant": true, "no_mixed": false, "no_overlap": false, "paired_options_selector": "yes"}, "type": "paired_collection", "unaligned_file": false}
              reference_genome {"__current_case__": 1, "own_file": {"values": [{"id": 67, "src": "hda"}]}, "source": "history"}
              rg {"__current_case__": 2, "rg_selector": "set_id_auto"}
              sam_options {"__current_case__": 0, "no_unal": true, "omit_sec_seq": false, "reorder": false, "sam_no_qname_trunc": false, "sam_opt": false, "sam_options_selector": "yes", "soft_clipped_unmapped_tlen": false, "xeq": false}
              save_mapping_stats false
      • Step 20: Unlabelled step:

        • step_state: new
      • Step 21: Unlabelled step:

        • step_state: new
      • Step 22: Unlabelled step:

        • step_state: new
      • Step 23: Unlabelled step:

        • step_state: new
      • Step 24: Unlabelled step:

        • step_state: new
      • Step 25: Unlabelled step:

        • step_state: new
      • Step 26: Unlabelled step:

        • step_state: new
    • Other invocation details
      • error_message

        • Failed to run workflow, invocation ended in [failed] state.
      • history_id

        • 2c06a24ce5f708d4
      • history_state

        • new
      • invocation_id

        • 2c06a24ce5f708d4
      • invocation_state

        • failed
      • messages

        • [{'dependent_workflow_step_id': 17, 'output_name': 'quality summary', 'reason': 'output_not_found', 'workflow_step_id': 19}, {'output_name': 'quality summary', 'reason': 'workflow_output_not_found', 'workflow_step_id': 17}, {'output_name': 'complete genomes', 'reason': 'workflow_output_not_found', 'workflow_step_id': 17}, {'output_name': 'output cluster', 'reason': 'workflow_output_not_found', 'workflow_step_id': 15}]
      • workflow_id

        • 2c06a24ce5f708d4

@SantaMcCloud

SantaMcCloud commented Apr 28, 2026

Contributor Author

I'm not sure why this test ran out of runtime again, but from the log it looks like a deadlock: early steps are marked as depending on later steps, which isn't the case in the workflow. Because an early step depends on a later step, that step is delayed, and all the others with it.

galaxy.workflow.run DEBUG 2026-04-28 01:28:49,979 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 21 outputs of invocation 1 delayed (tool [__FILTER_EMPTY_DATASETS__] inputs are not ready, this special tool requires inputs to be ready)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,980 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 22 outputs of invocation 1 delayed (dependent step [21] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,980 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 6 outputs of invocation 1 delayed (dependent step [22] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,980 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 23 outputs of invocation 1 delayed (dependent step [6] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,981 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 7 outputs of invocation 1 delayed (dependent step [23] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,981 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 8 outputs of invocation 1 delayed (dependent step [7] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,981 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 9 outputs of invocation 1 delayed (dependent step [8] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,984 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 10 outputs of invocation 1 delayed (dependent step [8] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,985 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 11 outputs of invocation 1 delayed (dependent step [9] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,985 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 12 outputs of invocation 1 delayed (dependent step [8] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,985 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 25 outputs of invocation 1 delayed (dependent step [9] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,985 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 13 outputs of invocation 1 delayed (dependent step [25] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,986 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 14 outputs of invocation 1 delayed (dependent step [25] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,986 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 26 outputs of invocation 1 delayed (dependent step [25] delayed, so this step must be delayed)
galaxy.workflow.run DEBUG 2026-04-28 01:28:49,987 [pN:main.1,p:19172,tN:WorkflowRequestMonitor.monitor_thread] Marking step 15 outputs of invocation 1 delayed (dependent step [14] delayed, so this step must be delayed)
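
The delay chain in the log above can be extracted with a short script. This is only a sketch: the regular expression is written against the message format shown in these lines, the function names are hypothetical, and it is an assumption that the step numbers can be compared as workflow order (the thread does not establish whether they are order indices or internal step identifiers).

```python
import re

# Matches the "dependent step [N] delayed" lines from the scheduler log above.
DELAY_RE = re.compile(
    r"Marking step (\d+) outputs of invocation \d+ delayed "
    r"\(dependent step \[(\d+)\] delayed"
)


def delay_edges(log_text):
    """Return (delayed_step, blocking_step) pairs found in the log text."""
    return [(int(step), int(dep)) for step, dep in DELAY_RE.findall(log_text)]


def backward_edges(edges):
    """Keep pairs where a lower-numbered step waits on a higher-numbered one."""
    return [(step, dep) for step, dep in edges if step < dep]


# Three lines copied from the log, with the logger prefix removed.
sample = """\
Marking step 22 outputs of invocation 1 delayed (dependent step [21] delayed, so this step must be delayed)
Marking step 6 outputs of invocation 1 delayed (dependent step [22] delayed, so this step must be delayed)
Marking step 13 outputs of invocation 1 delayed (dependent step [25] delayed, so this step must be delayed)
"""

edges = delay_edges(sample)
suspicious = backward_edges(edges)
print(suspicious)
```

Pairs reported by `backward_edges` are the ones worth comparing against the connections in the `.ga` file.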

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a new IWC microbiome workflow for detecting viral contigs from bulk metagenomic assemblies and running downstream quality, host prediction, annotation, coverage, and vMAG binning analyses.

Changes:

  • Adds a Galaxy workflow using geNomad, CheckV, MMseqs2, iPHoP, Pharokka, Bowtie2, CoverM, and vRhyme.
  • Adds Dockstore metadata and a Planemo test configuration with Zenodo-hosted inputs.
  • Adds README and changelog documentation for the new workflow.

Checklist issues were identified around annotation wording, test file naming, changelog date validity, exposed outputs, and workflow output spelling.
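
On the changelog date: the heading quoted in this review reads `2026-26-04`, which does not parse as a `YYYY-MM-DD` calendar date because 26 is not a valid month. A minimal check, assuming the changelog is meant to use ISO 8601 dates (the helper name `is_iso_date` is hypothetical, and `2026-04-26` is only a guess at the intended value):

```python
from datetime import date


def is_iso_date(text):
    """Return True if text parses as a YYYY-MM-DD calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


as_written = is_iso_date("2026-26-04")   # heading as it appears in the PR
day_month_swapped = is_iso_date("2026-04-26")  # assumed intended value
print(as_written, day_month_swapped)
```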

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| workflows/microbiome/viral-detection-from-bulk-metagenomes/viral-detection-from-bulk-metagenomes.ga | Defines the new Galaxy viral detection workflow and workflow outputs. |
| workflows/microbiome/viral-detection-from-bulk-metagenomes/viral-detection-from-bulk-metagenomes-test.yml | Adds Planemo tests and expected output assertions. |
| workflows/microbiome/viral-detection-from-bulk-metagenomes/README.md | Documents workflow purpose, inputs, steps, outputs, and license. |
| workflows/microbiome/viral-detection-from-bulk-metagenomes/CHANGELOG.md | Adds initial release changelog entry. |
| workflows/microbiome/viral-detection-from-bulk-metagenomes/.dockstore.yml | Adds Dockstore workflow metadata and test file reference. |

publish: true
primaryDescriptorPath: /viral-detection-from-bulk-metagenomes.ga
testParameterFiles:
- /viral-detection-from-bulk-metagenomes-test.yml
@@ -0,0 +1,5 @@
# Changelog

## [1.0] - 2026-26-04
]
},
"19": {
"annotation": "Additional filtering of VirSorter contigs using complex custom rules. If all conditions are AND-conditions you may be able to set them on the tool directly.\nBy default this step imposes a stricter score threshold of 0.9 on sequences shorter than 3000 bp, instead of the standard threshold of 0.5.",
"post_job_actions": {
"RenameDatasetActionoutput": {
"action_arguments": {
"newname": "VirSorter filtered scores"
Comment on lines +1296 to +1297
"label": "best bins summery",
"output_name": "best_bins_summery",
asserts:
- that: has_n_lines
n: 0
best bins summery:
"type": "tool",
"uuid": "08681d8f-a9a8-4bfc-b1e6-3c6694907756",
"when": null,
"workflow_outputs": []
@@ -0,0 +1,1618 @@
{
"a_galaxy_workflow": "true",
"annotation": "This workflow identifies viral contigs from metagenomic assemblies using geNomad and supports taxonomy, functional annotation, binning, and host prediction.",

@bernt-matthias bernt-matthias left a comment

Contributor

Why do you run Bowtie2 on different FASTA files (filtered / unfiltered)? In other words: why does vRhyme get a different input than CoverM?

Viral contigs from both geNomad runs are combined.

### 4. Clustering and dereplication
Redundant sequences are removed using **MMseqs2** clustering.

Contributor

Any specific reason not to use dRep?

Contributor

1- Bowtie2 is currently run twice: once against the broader viral-contig set used for vRhyme binning, and once against the stricter CheckV-filtered set used for CoverM, which includes only medium-quality, high-quality, and complete viral contigs. I will check whether the workflow can be simplified by mapping once against the broader dereplicated viral-contig set and then filtering the resulting coverage/abundance table to the final CheckV quality set, instead of running Bowtie2 twice. If this works cleanly in Galaxy, I will update the workflow accordingly.
2- We used MMseqs2 here because at this stage we are still working with viral candidate contigs before vRhyme binning, not final vMAGs. dRep compares and dereplicates genome FASTA files, so it could be more suitable if we want to dereplicate the final vMAGs later. For this contig-level step, MMseqs2 seemed more appropriate.
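
The "map once, then filter the coverage table" idea from point 1 could look roughly like the sketch below. It is an illustration only, not the workflow's implementation: the function names are hypothetical, the coverage header (`Contig`, `sample1 Mean`) is an assumed layout with the contig ID in the first column, and the tiny tables are made-up test data. The `contig_id` and `checkv_quality` columns follow CheckV's `quality_summary.tsv`.

```python
import csv
import io

# Quality tiers kept for the final set, as described in the reply above.
KEEP = {"Medium-quality", "High-quality", "Complete"}


def keep_ids(quality_tsv):
    """Return contig IDs whose CheckV quality tier is in KEEP."""
    reader = csv.DictReader(io.StringIO(quality_tsv), delimiter="\t")
    return {row["contig_id"] for row in reader if row["checkv_quality"] in KEEP}


def filter_coverage(coverage_tsv, ids):
    """Keep the header plus rows whose first column is a retained contig ID."""
    rows = list(csv.reader(io.StringIO(coverage_tsv), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return [header] + [row for row in body if row and row[0] in ids]


# Made-up example tables (assumption: contig ID is the first coverage column).
quality = (
    "contig_id\tcheckv_quality\n"
    "c1\tHigh-quality\n"
    "c2\tLow-quality\n"
    "c3\tComplete\n"
)
coverage = "Contig\tsample1 Mean\nc1\t12.5\nc2\t3.0\nc3\t40.1\n"

filtered = filter_coverage(coverage, keep_ids(quality))
for row in filtered:
    print("\t".join(row))
```

With this approach the read mapping is done once against the broader set, and the quality filter becomes a cheap table operation.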

"workflow_outputs": []
},
"5": {
"annotation": "Additional filtering of geNomad contigs using complex custom rules. If all conditions are AND-conditions you may be able to set them on the tool directly.\nBy default this step does nothing beyond the conservative preset applied by geNomad.",

Contributor

By default this step does nothing beyond the conservative preset applied by geNomad.
->
By default this step does nothing, i.e. it copies the results from geNomad.

"owner": "bgruening",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"code\": \"/^>/ {print \\\">\\\" FILENAME \\\"_\\\" ++i} !/^>/ {print}\", \"infile\": {\"__class__\": \"ConnectedValue\"}, \"variables\": [], \"__page__\": 0, \"__rerun_remap_job_id__\": null}",

Contributor

I do not understand what this is doing. Are you replacing fasta headers using filenames (would be Galaxy's internal filenames)? This seems wrong to me.

Contributor

Thank you for catching this. I think the intention was to make the FASTA identifiers unique, but I agree the approach needs to be fixed.
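
For context, the awk program in the tool state replaces every header with `>` + `FILENAME` + `_` + a running counter, so the original contig names are lost and the prefix is whatever path the file has at run time. One possible alternative, sketched here under assumptions, is to prefix each header with an explicit label and keep the original identifier. The function name `relabel_fasta` and the label `sampleA` are hypothetical; how the label would be supplied in Galaxy is not decided in this thread.

```python
def relabel_fasta(lines, label):
    """Prefix each FASTA header with `label`, keeping the original identifier.

    `lines` are FASTA lines without trailing newlines. Raises ValueError if
    two records would end up with the same identifier.
    """
    seen = set()
    out = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            original = tokens[0] if tokens else "seq"
            new_id = f"{label}_{original}"
            if new_id in seen:
                raise ValueError(f"duplicate identifier: {new_id}")
            seen.add(new_id)
            out.append(">" + new_id)
        else:
            out.append(line)
    return out


example = [">contig_1 len=100", "ACGT", ">contig_2", "GGCC"]
relabeled = relabel_fasta(example, "sampleA")
print("\n".join(relabeled))
```

Keeping the original identifier in the new header makes it possible to trace each sequence back to the assembly.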

},
"3": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2",

Contributor

I'm wondering about the workflow's data flow: will this construct a list:list? Is it fine to give partial input to geNomad? Why split at all?

Contributor

If I understood correctly, this step was added because running geNomad on the full assembly caused runtime/resource issues, so the idea was to process the FASTA in smaller chunks. But I agree that the collection structure needs to be checked, and I will review this part and revise it if needed.
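
To make the chunking idea concrete, here is a minimal sketch of splitting a FASTA file into a fixed number of chunks while keeping every record whole. It does not reproduce how the `split_file_to_collection` step is configured in the workflow; the function names and the round-robin assignment are assumptions for illustration. Whether per-chunk geNomad results match a run on the full assembly is not established in this thread.

```python
def read_records(lines):
    """Group FASTA lines into records: a header line plus its sequence lines."""
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line])
        elif records:
            records[-1].append(line)
    return records


def split_records(records, n_chunks):
    """Distribute whole records round-robin across n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return chunks


fasta_lines = [">a", "AC", "GT", ">b", "GG", ">c", "TT"]
records = read_records(fasta_lines)
chunks = split_records(records, 2)
for number, chunk in enumerate(chunks):
    print(number, [record[0] for record in chunk])
```

Because the split happens at record boundaries, no contig is cut in the middle of its sequence.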

"top": 50
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/ufz/genomad_end_to_end/genomad_end_to_end/1.11.1+galaxy0",

Contributor

License agreement should be a workflow parameter

Contributor

What is the idea of the 2nd genomad round?

I think the README needs extending.

Contributor

This follows the MVP workflow logic. The second geNomad run is used after CheckV separates proviral sequences, and the results are then combined with the other viral predictions. I will revise the README.

},
"8": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/iuc/qiime_filter_fasta/qiime_filter_fasta/1.9.1.0",

Contributor

Not sure the QIIME (v1) tools are a good choice, since they are EOL. Is there a replacement?

Contributor

This is only used for FASTA filtering by sequence ID, not for QIIME-specific analysis. I will replace it.
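
The operation in question is small enough to sketch directly. The snippet below keeps only the records whose identifier (the first whitespace-delimited token of the header) is in a given set. It is an illustration of the filtering step, not a proposal for a specific replacement tool; `filter_fasta` is a hypothetical name.

```python
def filter_fasta(lines, wanted_ids):
    """Yield lines of records whose identifier is in wanted_ids.

    The identifier is taken as the first whitespace-delimited token after '>'.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted_ids
        if keep:
            yield line


fasta_lines = [">c1 some description", "AC", ">c2", "GG", ">c3", "TT"]
kept = list(filter_fasta(fasta_lines, {"c1", "c3"}))
print("\n".join(kept))
```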

@SantaMcCloud

Contributor Author

I didn't create the workflow; I was only asked to add it to IWC. I messaged the creator, and she will answer the questions about the tool choices.
