BUG when using the custom MAGs #64
Description
Dear Developer,
Thank you for your work. ViWrap is a very nice pipeline for identifying the virome!
When I run ViWrap with custom MAGs, it works well for almost all samples. For some samples, however, the assembled MAGs do not include any archaeal genomes. As a result, in step 7 (07_iPHoP_outdir), GTDB-Tk processes these MAGs normally and generates the bac120 files, but it does not generate the ar53 files.
As shown in 07_iPHoP_outdir/custom_MAGs_GTDB-tk_results/gtdbtk.log:
[2025-08-13 17:00:12] INFO: GTDB-Tk v2.3.2
[2025-08-13 17:00:12] INFO: gtdbtk de_novo_wf --genome_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_filtered_dir --bacteria --outgroup_taxon p__Patescibacteria --out_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results --cpus 16 --force --extension fasta
[2025-08-13 17:00:12] INFO: Using GTDB-Tk reference data version r214: /data/home/xinyuguo/database/ViWrap_db/GTDB_db/GTDB_db
[2025-08-13 17:00:12] INFO: Identifying markers in 576 genomes with 16 threads.
[2025-08-13 17:00:12] TASK: Running Prodigal V2.6.3 to identify genes.
[2025-08-13 17:04:25] INFO: Completed 576 genomes in 4.21 minutes (136.71 genomes/minute).
[2025-08-13 17:04:25] TASK: Identifying TIGRFAM protein families.
[2025-08-13 17:08:06] INFO: Completed 576 genomes in 3.68 minutes (156.39 genomes/minute).
[2025-08-13 17:08:06] TASK: Identifying Pfam protein families.
[2025-08-13 17:08:21] INFO: Completed 576 genomes in 14.56 seconds (39.57 genomes/second).
[2025-08-13 17:08:21] INFO: Annotations done using HMMER 3.4 (Aug 2023).
[2025-08-13 17:08:21] TASK: Summarising identified marker genes.
[2025-08-13 17:08:29] INFO: Completed 576 genomes in 7.73 seconds (74.52 genomes/second).
[2025-08-13 17:08:29] INFO: Done.
[2025-08-13 17:08:29] INFO: Aligning markers in 576 genomes with 16 CPUs.
[2025-08-13 17:08:29] INFO: Processing 576 genomes identified as bacterial.
[2025-08-13 17:08:35] INFO: Read concatenated alignment for 80,789 GTDB genomes.
[2025-08-13 17:08:35] TASK: Generating concatenated alignment for each marker.
[2025-08-13 17:08:38] INFO: Completed 576 genomes in 0.58 seconds (998.72 genomes/second).
[2025-08-13 17:08:38] TASK: Aligning 120 identified markers using hmmalign 3.4 (Aug 2023).
[2025-08-13 17:09:13] INFO: Completed 120 markers in 32.50 seconds (3.69 markers/second).
[2025-08-13 17:09:14] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2025-08-13 17:10:57] INFO: Completed 81,365 sequences in 1.72 minutes (47,231.97 sequences/minute).
[2025-08-13 17:10:57] INFO: Masked bacterial alignment from 41,084 to 5,035 AAs.
[2025-08-13 17:10:57] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2025-08-13 17:10:57] INFO: Creating concatenated alignment for 81,365 bacterial GTDB and user genomes.
[2025-08-13 17:11:23] INFO: Creating concatenated alignment for 576 bacterial user genomes.
[2025-08-13 17:11:24] INFO: Done.
[2025-08-13 17:11:24] INFO: Inferring FastTree (WAG, SH support values) using a maximum of 16 CPUs.
[2025-08-14 05:18:23] INFO: FastTree version: precision
[2025-08-14 05:18:23] INFO: Done.
[2025-08-14 05:18:23] INFO: Reading GTDB taxonomy for representative genomes.
[2025-08-14 05:18:23] INFO: Read taxonomy for 85,205 genomes.
[2025-08-14 05:18:23] INFO: Identifying genomes from the specified outgroup: p__Patescibacteria
[2025-08-14 05:18:30] INFO: Identified 3,374 outgroup taxa in the tree.
[2025-08-14 05:18:30] INFO: Identified 77,991 ingroup taxa in the tree.
[2025-08-14 05:18:36] INFO: Outgroup is monophyletic.
[2025-08-14 05:18:36] INFO: Rerooting tree.
[2025-08-14 05:18:37] INFO: Rerooted tree written to: /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results/infer/intermediate_results/gtdbtk.bac120.rooted.tree
[2025-08-14 05:18:37] INFO: Done.
[2025-08-14 05:18:37] INFO: Reading GTDB taxonomy for representative genomes.
[2025-08-14 05:18:37] INFO: Read taxonomy for 85,205 genomes.
[2025-08-14 05:18:37] INFO: Reading tree.
[2025-08-14 05:18:45] INFO: Removing any previous internal node labels.
[2025-08-14 05:18:45] INFO: Calculating F-measure statistic for each taxa.
[2025-08-14 05:18:47] INFO: Calculating taxa within each lineage.
[2025-08-14 05:19:04] INFO: Processing 1 taxa at Domain rank.
[2025-08-14 05:19:04] INFO: Processing 181 taxa at Phylum rank.
[2025-08-14 05:23:56] INFO: Processing 490 taxa at Class rank.
[2025-08-14 05:24:30] INFO: Processing 1,653 taxa at Order rank.
[2025-08-14 05:26:03] INFO: Processing 4,305 taxa at Family rank.
[2025-08-14 05:26:27] INFO: Processing 19,153 taxa at Genus rank.
[2025-08-14 05:28:09] INFO: Processing 80,789 taxa at Species rank.
[2025-08-14 05:30:07] WARNING: There are 236 taxa with multiple placements of equal quality.
[2025-08-14 05:30:07] WARNING: These were resolved by placing the label at the most terminal position.
[2025-08-14 05:30:07] WARNING: Ideally, taxonomic assignment of all genomes should be established before tree decoration.
[2025-08-14 05:30:07] INFO: Placing labels on tree.
[2025-08-14 05:30:07] INFO: Writing out statistics for taxa.
[2025-08-14 05:30:08] INFO: Writing out inferred taxonomy for each genome.
[2025-08-14 05:30:11] INFO: Writing out decorated tree.
[2025-08-14 05:30:12] INFO: Done.
[2025-08-14 05:30:12] INFO: Done.
[2025-08-14 05:30:12] INFO: Removing intermediate files.
[2025-08-14 05:30:12] INFO: Intermediate files removed.
[2025-08-14 05:30:12] INFO: Done.
[2025-08-14 05:30:17] INFO: GTDB-Tk v2.3.2
[2025-08-14 05:30:17] INFO: gtdbtk de_novo_wf --genome_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_filtered_dir --archaea --outgroup_taxon p__Altiarchaeota --out_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results --cpus 16 --force --extension fasta
[2025-08-14 05:30:17] INFO: Using GTDB-Tk reference data version r214: /data/home/xinyuguo/database/ViWrap_db/GTDB_db/GTDB_db
[2025-08-14 05:30:17] INFO: Identifying markers in 576 genomes with 16 threads.
[2025-08-14 05:30:17] TASK: Running Prodigal V2.6.3 to identify genes.
[2025-08-14 05:34:41] INFO: Completed 576 genomes in 4.40 minutes (130.80 genomes/minute).
[2025-08-14 05:34:41] TASK: Identifying TIGRFAM protein families.
[2025-08-14 05:38:30] INFO: Completed 576 genomes in 3.81 minutes (151.37 genomes/minute).
[2025-08-14 05:38:30] TASK: Identifying Pfam protein families.
[2025-08-14 05:38:44] INFO: Completed 576 genomes in 14.44 seconds (39.88 genomes/second).
[2025-08-14 05:38:44] INFO: Annotations done using HMMER 3.4 (Aug 2023).
[2025-08-14 05:38:44] TASK: Summarising identified marker genes.
[2025-08-14 05:38:52] INFO: Completed 576 genomes in 7.57 seconds (76.06 genomes/second).
[2025-08-14 05:38:52] INFO: Done.
[2025-08-14 05:38:52] INFO: Aligning markers in 576 genomes with 16 CPUs.
[2025-08-14 05:38:52] INFO: Processing 576 genomes identified as bacterial.
[2025-08-14 05:38:58] INFO: Read concatenated alignment for 80,789 GTDB genomes.
[2025-08-14 05:38:58] TASK: Generating concatenated alignment for each marker.
[2025-08-14 05:39:01] INFO: Completed 576 genomes in 0.57 seconds (1,005.07 genomes/second).
[2025-08-14 05:39:02] TASK: Aligning 120 identified markers using hmmalign 3.4 (Aug 2023).
[2025-08-14 05:39:35] INFO: Completed 120 markers in 28.76 seconds (4.17 markers/second).
[2025-08-14 05:39:35] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2025-08-14 05:41:15] INFO: Completed 81,365 sequences in 1.66 minutes (49,093.43 sequences/minute).
[2025-08-14 05:41:15] INFO: Masked bacterial alignment from 41,084 to 5,035 AAs.
[2025-08-14 05:41:15] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2025-08-14 05:41:15] INFO: Creating concatenated alignment for 81,365 bacterial GTDB and user genomes.
[2025-08-14 05:41:40] INFO: Creating concatenated alignment for 576 bacterial user genomes.
[2025-08-14 05:41:41] INFO: Done.
[2025-08-14 05:41:41] ERROR: Input file does not exist: /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results/align/gtdbtk.ar53.msa.fasta.gz
[2025-08-14 05:41:41] ERROR: Controlled exit resulting from an unrecoverable error or warning.
================================================================================
EXCEPTION: BioLibFileNotFound
MESSAGE: Input file does not exist: /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results/align/gtdbtk.ar53.msa.fasta.gz
Traceback (most recent call last):
File "/data/home/xinyuguo/software/ViWrap_env/ViWrap-GTDBTk/lib/python3.8/site-packages/gtdbtk/main.py", line 102, in main
gt_parser.parse_options(args)
File "/data/home/xinyuguo/software/ViWrap_env/ViWrap-GTDBTk/lib/python3.8/site-packages/gtdbtk/main.py", line 1052, in parse_options
self.infer(options)
File "/data/home/xinyuguo/software/ViWrap_env/ViWrap-GTDBTk/lib/python3.8/site-packages/gtdbtk/main.py", line 413, in infer
check_file_exists(options.msa_file)
File "/data/home/xinyuguo/software/ViWrap_env/ViWrap-GTDBTk/lib/python3.8/site-packages/gtdbtk/biolib_lite/common.py", line 96, in check_file_exists
raise BioLibFileNotFound('Input file does not exist: ' + input_file)
gtdbtk.biolib_lite.exceptions.BioLibFileNotFound: Input file does not exist: /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/custom_MAGs_GTDB-tk_results/align/gtdbtk.ar53.msa.fasta.gz
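To find out which samples are affected, I use a small check like the one below. It is only a sketch: the helper name `domains_with_alignment` is my own, and I am assuming the bacterial alignment follows the same naming pattern as the ar53 file in the error message (`align/gtdbtk.<marker set>.msa.fasta.gz`).

```python
from pathlib import Path


def domains_with_alignment(gtdbtk_out_dir):
    """Return the marker sets ('bac120', 'ar53') that have an MSA file.

    Assumption: both alignments are named like the ar53 file in the error
    message above, i.e. <out_dir>/align/gtdbtk.<marker set>.msa.fasta.gz.
    """
    align_dir = Path(gtdbtk_out_dir) / "align"
    found = []
    for marker_set in ("bac120", "ar53"):
        if (align_dir / f"gtdbtk.{marker_set}.msa.fasta.gz").is_file():
            found.append(marker_set)
    return found
```

For a sample without archaeal MAGs I would expect this to return only `['bac120']` when pointed at `07_iPHoP_outdir/custom_MAGs_GTDB-tk_results`.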
The main ViWrap log then shows:
[BUILD START] RW_01_Day_0_P1 Wed Aug 13 10:32:31 CST 2025
Welcome to ViWrap
The issued command is:
/data/home/xinyuguo/software/ViWrap-1.3.1/ViWrap run --input_metagenome /data/home/xinyuguo/data/human_metagenome_diet/s2_assembly/summary/RW_01_Day_0_P1.contigs.fasta --input_reads /data/home/xinyuguo/data/human_metagenome_diet/s1_qc/clean_seq/RW_01_Day_0_P1_clean_1.fastq.gz,/data/home/xinyuguo/data/human_metagenome_diet/s1_qc/clean_seq/RW_01_Day_0_P1_clean_2.fastq.gz --out_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap --db_dir /data/home/xinyuguo/database/ViWrap_db --identify_method vb-vs --conda_env_dir /data/home/xinyuguo/software/ViWrap_env --threads 16 --input_length_limit 5000 --custom_MAGs_dir /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/mags --iPHoP_db_custom /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/iPHoP_db_custom
[2025-08-13 10:32:32] | Pre-check inputings. In processing...
[2025-08-13 10:32:33] | Looks like the input metagenome and reads, database, and custom MAGs dir (if option used) are now set up well, start up to run ViWrap pipeline
[2025-08-13 10:32:33] | Run VIBRANT-VirSorter2 method. Run VIBRANT to identify and annotate viruses from input metagenome. In processing...
[2025-08-13 10:59:18] | Run VIBRANT-VirSorter2 method. Run VIBRANT to identify and annotate viruses from input metagenome. Finished
[2025-08-13 10:59:18] | Run VIBRANT-VirSorter2 method. Run VirSorter2 to identify viruses from input metagenome. Also plus CheckV to QC and trim, and KEGG, Pfam, and VOG HMMs to annotate viruses. In processing...
[2025-08-13 13:06:03] | Run VIBRANT-VirSorter2 method. Run VirSorter2 to identify viruses from input metagenome. Finished
[2025-08-13 13:07:38] | Run VIBRANT-VirSorter2 method. Run CheckV to QC and trim viruses identified from VirSorter2. Finished
[2025-08-13 13:08:35] | Run VIBRANT-VirSorter2 method. Run VIBRANT to check "keep2" and "manual_check" groups and get the final VirSorter2 virus sequences. Finished
[2025-08-13 13:08:35] | Map reads to metagenome. In processing...
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.3.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.3.bt2
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.4.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.4.bt2
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.1.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.1.bt2
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.2.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.2.bt2
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.rev.1.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.rev.1.bt2
Renaming /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.rev.2.bt2.tmp to /data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/01_Mapping_result_outdir/RW_01_Day_0_P1.contigs.bowtie2_idx.rev.2.bt2
[2025-08-13T05:36:59Z INFO coverm] CoverM version 0.6.1
[2025-08-13T05:36:59Z INFO coverm] Using min-read-percent-identity 97%
[2025-08-13T05:37:58Z INFO coverm] CoverM version 0.6.1
[2025-08-13T05:37:58Z INFO coverm] Setting single read percent identity threshold at 0.97 for MetaBAT adjusted coverage, and not filtering out supplementary, secondary and improper pair alignments
[2025-08-13T05:37:58Z INFO coverm] Using min-covered-fraction 0%
[2025-08-13T05:38:14Z INFO coverm::contig] In sample 'RW_01_Day_0_P1_clean.filtered', found 52949609 reads mapped out of 52950474 total (100.00%)
[2025-08-13 13:38:15] | Map reads to metagenome. Finished
[2025-08-13 13:38:15] | Run vRhyme to bin viral scaffolds. In processing...
[2025-08-13 13:41:39] | Run vRhyme to bin viral scaffolds. Finished
[2025-08-13 13:41:39] | Run vContact2 to cluster viral genomes. In processing...
[2025-08-13 14:23:10] | Run vContact2 to cluster viral genomes. Finished
[2025-08-13 14:23:10] | Run CheckV to evaluate virus genome quality. In processing...
[2025-08-13 14:39:24] | Run CheckV to evaluate virus genome quality. Finished
[2025-08-13 14:39:24] | Run dRep to cluster virus species. In processing...
[2025-08-13 14:39:35] | Run dRep to cluster virus species. Finished
[2025-08-13 14:39:35] | Conduct taxonomic charaterization. In processing...
[2025-08-13 14:49:24] | Conduct taxonomic charaterization. Finished
[2025-08-13 14:49:24] | Conduct Host prediction by iPHoP. In processing...
[2025-08-13 15:55:27] | Conduct Host prediction by iPHoP. Finished
[2025-08-13 15:55:27] | Conduct Host prediction by iPHoP using custom MAGs. In processing...
[2025-08-14 05:41:51] | Conduct Host prediction by iPHoP using custom MAGs. Finished
[2025-08-14 05:41:52] | Get virus genome abundance. Finished
Traceback (most recent call last):
File "/data/home/xinyuguo/software/ViWrap-1.3.1/ViWrap", line 173, in <module>
output = cli()
File "/data/home/xinyuguo/software/ViWrap-1.3.1/ViWrap", line 167, in cli
args["func"](args)
File "/data/home/xinyuguo/software/ViWrap-1.3.1/scripts/master_run.py", line 635, in main
scripts.module.combine_iphop_results(args, combined_host_pred_to_genome_result, combined_host_pred_to_genus_result)
File "/data/home/xinyuguo/software/ViWrap-1.3.1/scripts/module.py", line 1882, in combine_iphop_results
with open(host_pred_to_genome_m90_custom, 'r') as lines:
FileNotFoundError: [Errno 2] No such file or directory: '/data/home/xinyuguo/data/human_metagenome_diet/s4_virome/PBW_Day_0_P1/RW_01_Day_0_P1_ViWrap/07_iPHoP_outdir/iPHoP_outdir_custom_MAGs/Host_prediction_to_genome_m90.csv'
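As a temporary workaround on my side, I was thinking of guarding the file read so that a missing custom-MAG result is treated as "no predictions" instead of crashing. The sketch below is not ViWrap's actual code; `read_host_predictions` is a hypothetical helper, and it only avoids the crash without producing any custom-MAG host predictions.

```python
import csv
import os


def read_host_predictions(csv_path):
    """Return the rows of an iPHoP result CSV, or [] if the file is missing.

    Hypothetical helper, not part of ViWrap: a missing file is treated as
    "no host predictions from custom MAGs" rather than an error.
    """
    if not os.path.isfile(csv_path):
        return []
    with open(csv_path, newline="") as handle:
        return list(csv.reader(handle))
```

I am not sure whether skipping the file is the right behavior, or whether the archaea step should be skipped earlier when no archaeal MAGs are present.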
Could you please tell me how to solve this problem?
Thank you!