
Using the docker image locally without a cloud account. #29

@AlohaPropolis

I can use the image to create the working directories and download the example database, but I cannot download the query sequence.
$ docker run --rm ncbi/blast efetch -db protein -format fasta \
    -id P01349 > queries/P01349.fsa
bash: queries/P01349.fsa: Permission denied
If the problem is related to not having a cloud account, then my question is why must I have a paid account in order to use the ncbi/blast docker image?
Thanks
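
A note on the error itself: in the `docker run ... > queries/P01349.fsa` form, the `>` redirect is carried out by the host shell rather than inside the container, which is why the message is prefixed with `bash:`. That points at write permission on the host `queries` directory rather than at a cloud account. The sketch below is one way to check this before running the command. `check_writable` is a hypothetical helper name, not something shipped with the image, and the guess that the directory belongs to another user (for example root) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: confirm the host shell can write into the output directory
# before redirecting container output into it.

# check_writable is a hypothetical helper, not part of the ncbi/blast image.
check_writable() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        ls -ld "$dir"   # shows the owner and mode of the directory
        return 1
    fi
    echo "writable: $dir"
}

# Example use on the host (commented out: it needs Docker and network access):
#   check_writable queries && \
#     docker run --rm ncbi/blast efetch -db protein -format fasta \
#       -id P01349 > queries/P01349.fsa
```

If the check reports "not writable", the `ls -ld` line shows who owns the directory, which is the first thing to look at.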

However, the following works the way I like!

docker ps
CONTAINER ID   IMAGE          COMMAND   CREATED         STATUS         PORTS     NAMES
6437c2fc9c4f   4afb1f585a96   "bash"    4 minutes ago   Up 4 minutes             friendly_swartz

docker exec -it friendly_swartz bash
root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom lib
root@6437c2fc9c4f:/blast# update_blastdb.pl --showall pretty --source gcp

Connected to GCP
BLASTDB DESCRIPTION SIZE (GB) LAST_UPDATED
swissprot Non-redundant UniProtKB/SwissProt sequences 0.3573 2023-04-29
nr All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 364.0284 2023-04-27
refseq_protein NCBI Protein Reference Sequences 144.4754 2023-05-05
landmark Landmark database for SmartBLAST 0.3817 2023-04-25
pdbaa PDB protein database 0.1951 2023-04-29
nt Nucleotide collection (nt) 303.7546 2023-04-30
pdbnt PDB nucleotide database 0.0143 2023-04-23
patnt Nucleotide sequences derived from the Patent division of GenBank 15.7333 2023-04-28
refseq_rna NCBI Transcript Reference Sequences 46.6038 2023-05-01
ref_prok_rep_genomes Refseq prokaryote representative genomes (contains refseq assembly) 19.6809 2023-04-29
ref_viruses_rep_genomes Refseq viruses representative genomes 0.1320 2023-04-29
ref_viroids_rep_genomes Refseq viroids representative genomes 0.0001 2022-06-25
ref_euk_rep_genomes RefSeq Eukaryotic Representative Genome Database 350.4509 2023-04-13
split-cdd CDD split into 32 volumes 4.5709 2022-12-18
cdd CDD.v3.20 3.7088 2022-09-21
GCF_000001405.39_top_level Homo sapiens GRCh38.p13 [GCF_000001405.39] chromosomes plus unplaced and unlocalized scaffolds 1.1572 2021-06-02
GCF_000001635.27_top_level Mus musculus GRCm39 [GCF_000001635.27] chromosomes plus unplaced and unlocalized scaffolds 3.6543 2021-06-02
16S_ribosomal_RNA 16S ribosomal RNA (Bacteria and Archaea type strains) 0.0179 2023-04-15
18S_fungal_sequences 18S ribosomal RNA sequences (SSU) from Fungi type and reference material 0.0023 2023-05-04
28S_fungal_sequences 28S ribosomal RNA sequences (LSU) from Fungi type and reference material 0.0053 2023-05-04
ITS_RefSeq_Fungi Internal transcribed spacer region (ITS) from Fungi type and reference material 0.0067 2022-10-28
ITS_eukaryote_sequences ITS eukaryote BLAST 0.0331 2023-05-01
env_nt environmental samples 48.8039 2023-04-05
Betacoronavirus Betacoronavirus 54.0961 2023-05-06
pataa Protein sequences derived from the Patent division of GenBank 1.8011 2023-04-30
refseq_select_prot RefSeq Select proteins 34.3461 2023-04-30
refseq_select_rna RefSeq Select RNA sequences 0.0656 2023-04-30
env_nr Proteins from WGS metagenomic projects (env_nr). 3.9459 2023-04-30
LSU_eukaryote_rRNA Large subunit ribosomal nucleic acid for Eukaryotes 0.0053 2022-12-05
LSU_prokaryote_rRNA Large subunit ribosomal nucleic acid for Prokaryotes 0.0041 2022-12-05
SSU_eukaryote_rRNA Small subunit ribosomal nucleic acid for Eukaryotes 0.0063 2022-12-05
mito NCBI Genomic Mitochondrial Reference Sequences 0.1252 2023-04-20
tsa_nr Transcriptome Shotgun Assembly (TSA) sequences 5.1253 2023-04-30
tsa_nt Transcriptome Shotgun Assembly (TSA) sequences 6.3491 2023-04-27
nt_euk Eukaryota nt 197.6799 2023-04-26
nt_prok Prokaryota (bacteria and archaea) nt 51.1781 2023-05-01
nt_viruses Viruses nt 51.2685 2023-05-01
nt_others Artificial and other seqs nt 0.7473 2023-05-01
taxdb Taxonomy database 0.1670 2021-06-07

root@6437c2fc9c4f:/blast# efetch -db protein -format fasta \
    -id P01349 > queries/P01349.fsa

root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom fasta lib queries results

root@6437c2fc9c4f:/blast# cd queries

root@6437c2fc9c4f:/blast/queries# ls
P01349.fsa

root@6437c2fc9c4f:/blast/queries# cd ..

root@6437c2fc9c4f:/blast# efetch -db protein -format fasta \
    -id Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950 \
    > fasta/nurse-shark-proteins.fsa

root@6437c2fc9c4f:/blast# ls
bin blastdb blastdb_custom fasta lib queries results

root@6437c2fc9c4f:/blast# makeblastdb -in /blast/fasta/nurse-shark-proteins.fsa -dbtype prot \
    -parse_seqids -out nurse-shark-proteins -title "Nurse shark proteins" \
    -taxid 7801 -blastdb_version 5

Building a new DB, current time: 05/07/2023 05:51:28
New DB name: /blast/nurse-shark-proteins
New DB title: Nurse shark proteins
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 7 sequences in 0.000834227 seconds.

root@6437c2fc9c4f:/blast# blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %T"
Q90523.1 106 7801
P80049.1 132 7801
P83981.1 53 7801
P83977.1 95 7801
P83984.1 190 7801
P83985.1 195 7801
P27950.1 151 7801
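
Note that nine accessions were passed to `efetch -id`, but the database reports seven sequences. A small filter like the one below can list which requested accessions did not end up in the database. This is only a sketch: `missing_ids` is a hypothetical helper name, and it assumes the `"%a %l %T"` output format used above, with the accession in the first column.

```shell
#!/usr/bin/env bash
# Sketch: list requested accessions that are absent from the built database.

# missing_ids is a hypothetical helper.
#   $1    comma-separated accessions, as passed to efetch -id
#   stdin blastdbcmd rows of the form "accession.version length taxid"
missing_ids() {
    local requested="$1"
    local present
    # Keep the first column, strip the ".N" version suffix, and pad each
    # accession with spaces so whole-word matching is possible below.
    present=$(awk '{ sub(/\.[0-9]+$/, "", $1); printf " %s ", $1 }')
    echo "$requested" | tr ',' '\n' | while read -r id; do
        case "$present" in
            *" $id "*) ;;          # accession is in the database
            *) echo "$id" ;;       # accession was requested but not found
        esac
    done
}

# Rows copied from the blastdbcmd output in this session.
missing_ids "Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950" <<'EOF'
Q90523.1 106 7801
P80049.1 132 7801
P83981.1 53 7801
P83977.1 95 7801
P83984.1 190 7801
P83985.1 195 7801
P27950.1 151 7801
EOF
# prints:
# P83982
# P83983
```

Inside the container, the same function could presumably read straight from `blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %T"` through a pipe instead of the here-document.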

root@6437c2fc9c4f:/blast# blastdbcmd -list /blast/blastdb -remove_redundant_dbs ##### this does not work

root@6437c2fc9c4f:/blast# blastp -query /blast/queries/P01349.fsa -db nurse-shark-proteins \
    -out /blast/results/blastp.out

root@6437c2fc9c4f:/blast# cd results

root@6437c2fc9c4f:/blast/results# ls
blastp.out

root@6437c2fc9c4f:/blast/results# cat blastp.out
BLASTP 2.14.0+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.

Database: Nurse shark proteins
7 sequences; 922 total letters

Query= sp|P01349.2|RELX_CARTA RecName: Full=Relaxin; Contains: RecName:
Full=Relaxin B chain; Contains: RecName: Full=Relaxin A chain

Length=44
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName...  14.2    0.96

>P80049.1 RecName: Full=Fatty acid-binding protein, liver; AltName: Full=Liver-type
fatty acid-binding protein; Short=L-FABP
Length=132

Score = 14.2 bits (25), Expect = 0.96, Method: Compositional matrix adjust.
Identities = 3/9 (33%), Positives = 6/9 (67%), Gaps = 0/9 (0%)

Query  2    LCGRGFIRA  10
            +C R ++R
Sbjct  123  VCTREYVRE  131

Lambda      K        H        a         alpha
   0.334    0.143    0.520    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 22680

Database: Nurse shark proteins
Posted date: May 7, 2023 5:51 AM
Number of letters in database: 922
Number of sequences in database: 7

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40

root@6437c2fc9c4f:/blast/results#
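
For quick checks it can be handy to pull just the bit score and E-value out of a pairwise report like the one above. The following is a minimal sketch, assuming the report keeps the `Score = ... bits (...), Expect = ...` line layout shown here; `score_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract bit score and E-value from a default pairwise BLAST report.

# score_lines is a hypothetical helper; it reads a report on stdin.
score_lines() {
    # Match lines such as:
    #   Score = 14.2 bits (25), Expect = 0.96, Method: ...
    # Removing the commas makes awk re-split the line, so field 3 is the
    # bit score and field 8 is the E-value.
    awk '/^ *Score = / { gsub(/,/, ""); print "bits=" $3, "evalue=" $8 }'
}

# Line copied from the report in this session.
echo "Score = 14.2 bits (25), Expect = 0.96, Method: Compositional matrix adjust." | score_lines
# prints: bits=14.2 evalue=0.96
```

Inside the container this could be run as `score_lines < /blast/results/blastp.out`.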
