Description
I can use the image to create the relative directories and download the example database, but I cannot download the query sequence.
$ docker run --rm ncbi/blast efetch -db protein -format fasta \
    -id P01349 > queries/P01349.fsa
bash: queries/P01349.fsa: Permission denied
If the problem is related to not having a cloud account, then my question is: why must I have a paid account in order to use the ncbi/blast Docker image?
Thanks
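A note on the error message above: it starts with `bash:`, which suggests the redirection `> queries/P01349.fsa` was handled by the shell on the host, not by anything inside the container. The sketch below illustrates that general shell behaviour without Docker. A plain child `sh -c` stands in for `docker run`, and the directory names `host` and `child` and the file names `outer.fsa` and `inner.fsa` are made up for the demonstration.

```shell
#!/bin/sh
# A child shell stands in for `docker run`; no Docker is needed here.
set -eu
demo=$(mktemp -d)
mkdir "$demo/host" "$demo/child"
cd "$demo/host"
mkdir queries

# Redirection outside the quotes: the calling (host) shell opens the file,
# relative to the host's current directory, before the child even starts.
sh -c 'cd ../child && echo ">demo sequence"' > queries/outer.fsa

# Redirection inside the quotes: the child shell opens the file,
# relative to its own working directory.
mkdir "$demo/child/queries"
sh -c 'cd ../child && echo ">demo sequence" > queries/inner.fsa'

ls "$demo/host/queries"    # lists outer.fsa
ls "$demo/child/queries"   # lists inner.fsa
```

If that is what happened here, the "Permission denied" would mean the host user could not write to the host's `queries` directory, so checking its owner with `ls -ld queries` would be a reasonable first step. This is a guess from the error text, not something the transcript confirms.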
However, the following works the way I like!
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6437c2fc9c4f 4afb1f585a96 "bash" 4 minutes ago Up 4 minutes friendly_swartz
docker exec -it friendly_swartz bash
root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom lib
root@6437c2fc9c4f:/blast# update_blastdb.pl --showall pretty --source gcp
Connected to GCP
BLASTDB DESCRIPTION SIZE (GB) LAST_UPDATED
swissprot Non-redundant UniProtKB/SwissProt sequences 0.3573 2023-04-29
nr All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 364.0284 2023-04-27
refseq_protein NCBI Protein Reference Sequences 144.4754 2023-05-05
landmark Landmark database for SmartBLAST 0.3817 2023-04-25
pdbaa PDB protein database 0.1951 2023-04-29
nt Nucleotide collection (nt) 303.7546 2023-04-30
pdbnt PDB nucleotide database 0.0143 2023-04-23
patnt Nucleotide sequences derived from the Patent division of GenBank 15.7333 2023-04-28
refseq_rna NCBI Transcript Reference Sequences 46.6038 2023-05-01
ref_prok_rep_genomes Refseq prokaryote representative genomes (contains refseq assembly) 19.6809 2023-04-29
ref_viruses_rep_genomes Refseq viruses representative genomes 0.1320 2023-04-29
ref_viroids_rep_genomes Refseq viroids representative genomes 0.0001 2022-06-25
ref_euk_rep_genomes RefSeq Eukaryotic Representative Genome Database 350.4509 2023-04-13
split-cdd CDD split into 32 volumes 4.5709 2022-12-18
cdd CDD.v3.20 3.7088 2022-09-21
GCF_000001405.39_top_level Homo sapiens GRCh38.p13 [GCF_000001405.39] chromosomes plus unplaced and unlocalized scaffolds 1.1572 2021-06-02
GCF_000001635.27_top_level Mus musculus GRCm39 [GCF_000001635.27] chromosomes plus unplaced and unlocalized scaffolds 3.6543 2021-06-02
16S_ribosomal_RNA 16S ribosomal RNA (Bacteria and Archaea type strains) 0.0179 2023-04-15
18S_fungal_sequences 18S ribosomal RNA sequences (SSU) from Fungi type and reference material 0.0023 2023-05-04
28S_fungal_sequences 28S ribosomal RNA sequences (LSU) from Fungi type and reference material 0.0053 2023-05-04
ITS_RefSeq_Fungi Internal transcribed spacer region (ITS) from Fungi type and reference material 0.0067 2022-10-28
ITS_eukaryote_sequences ITS eukaryote BLAST 0.0331 2023-05-01
env_nt environmental samples 48.8039 2023-04-05
Betacoronavirus Betacoronavirus 54.0961 2023-05-06
pataa Protein sequences derived from the Patent division of GenBank 1.8011 2023-04-30
refseq_select_prot RefSeq Select proteins 34.3461 2023-04-30
refseq_select_rna RefSeq Select RNA sequences 0.0656 2023-04-30
env_nr Proteins from WGS metagenomic projects (env_nr). 3.9459 2023-04-30
LSU_eukaryote_rRNA Large subunit ribosomal nucleic acid for Eukaryotes 0.0053 2022-12-05
LSU_prokaryote_rRNA Large subunit ribosomal nucleic acid for Prokaryotes 0.0041 2022-12-05
SSU_eukaryote_rRNA Small subunit ribosomal nucleic acid for Eukaryotes 0.0063 2022-12-05
mito NCBI Genomic Mitochondrial Reference Sequences 0.1252 2023-04-20
tsa_nr Transcriptome Shotgun Assembly (TSA) sequences 5.1253 2023-04-30
tsa_nt Transcriptome Shotgun Assembly (TSA) sequences 6.3491 2023-04-27
nt_euk Eukaryota nt 197.6799 2023-04-26
nt_prok Prokaryota (bacteria and archaea) nt 51.1781 2023-05-01
nt_viruses Viruses nt 51.2685 2023-05-01
nt_others Artificial and other seqs nt 0.7473 2023-05-01
taxdb Taxonomy database 0.1670 2021-06-07
root@6437c2fc9c4f:/blast# efetch -db protein -format fasta \
    -id P01349 > queries/P01349.fsa
root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom fasta lib queries results
root@6437c2fc9c4f:/blast# cd queries
root@6437c2fc9c4f:/blast/queries# ls
P01349.fsa
root@6437c2fc9c4f:/blast/queries# cd ..
root@6437c2fc9c4f:/blast# efetch -db protein -format fasta \
    -id Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950 \
    > fasta/nurse-shark-proteins.fsa
root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom fasta lib queries results
root@6437c2fc9c4f:/blast# makeblastdb -in /blast/fasta/nurse-shark-proteins.fsa -dbtype prot \
    -parse_seqids -out nurse-shark-proteins -title "Nurse shark proteins" \
    -taxid 7801 -blastdb_version 5
Building a new DB, current time: 05/07/2023 05:51:28
New DB name: /blast/nurse-shark-proteins
New DB title: Nurse shark proteins
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 7 sequences in 0.000834227 seconds.
root@6437c2fc9c4f:/blast# blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %T"
Q90523.1 106 7801
P80049.1 132 7801
P83981.1 53 7801
P83977.1 95 7801
P83984.1 190 7801
P83985.1 195 7801
P27950.1 151 7801
root@6437c2fc9c4f:/blast# blastdbcmd -list /blast/blastdb -remove_redundant_dbs ##### this does not work
root@6437c2fc9c4f:/blast# blastp -query /blast/queries/P01349.fsa -db nurse-shark-proteins \
    -out /blast/results/blastp.out
root@6437c2fc9c4f:/blast# cd results
root@6437c2fc9c4f:/blast/results# ls
blastp.out
root@6437c2fc9c4f:/blast/results# cat blastp.out
BLASTP 2.14.0+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.
Database: Nurse shark proteins
7 sequences; 922 total letters
Query= sp|P01349.2|RELX_CARTA RecName: Full=Relaxin; Contains: RecName:
Full=Relaxin B chain; Contains: RecName: Full=Relaxin A chain
Length=44
Score E
Sequences producing significant alignments: (Bits) Value
P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName... 14.2 0.96
P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName: Full=Liver-type
fatty acid-binding protein; Short=L-FABP
Length=132
Score = 14.2 bits (25), Expect = 0.96, Method: Compositional matrix adjust.
Identities = 3/9 (33%), Positives = 6/9 (67%), Gaps = 0/9 (0%)
Query 2 LCGRGFIRA 10
+C R ++R
Sbjct 123 VCTREYVRE 131
Lambda K H a alpha
0.334 0.143 0.520 0.792 4.96
Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6
Effective search space used: 22680
Database: Nurse shark proteins
Posted date: May 7, 2023 5:51 AM
Number of letters in database: 922
Number of sequences in database: 7
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40
root@6437c2fc9c4f:/blast/results#
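One detail in the transcript above: nine accessions were passed to `efetch`, but `makeblastdb` reports 7 sequences and `blastdbcmd` lists seven. A small sketch to see which ones are absent, using only the values copied from the commands and output above (the variable names are just for illustration):

```shell
#!/bin/sh
# Accessions requested from efetch, as typed in the transcript.
requested="Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950"
# Accessions printed by blastdbcmd, including the ".1" version suffix.
present="Q90523.1 P80049.1 P83981.1 P83977.1 P83984.1 P83985.1 P27950.1"

missing=""
for id in $(printf '%s' "$requested" | tr ',' ' '); do
    case " $present " in
        *" $id."*) ;;                    # found, with some version suffix
        *) missing="$missing $id" ;;     # not in the database listing
    esac
done
missing=${missing# }                     # drop the leading space
echo "missing: $missing"
```

The transcript does not show why those entries were not added, so this only identifies them.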