Memory error: DIAMOND_analysis_counter.py #87

@AlexGaithuma

I came across this error while handling an updated Refseq.fa file.

356M lines processed so far in 2344.0744104385376 seconds.
357M lines processed so far in 2350.2084290981293 seconds.
Traceback (most recent call last):
  File "/data2/Alex_MulengaLab/tools/samsa2/python_scripts/DIAMOND_analysis_counter.py", line 185, in <module>
    db_org_dictionary[db_id] = db_org
    ~~~~~~~~~~~~~~~~~^^^^^^^
MemoryError

I made some changes to the Python script to get around this.
Key fixes:

  1. Primary key constraint: The org column in the RefSeqHits table is now defined as a primary key. The ON CONFLICT(org) clause needs a PRIMARY KEY or UNIQUE constraint on org to work.
  2. Line number tracking: Added a line counter so the progress messages report the correct number of lines.
  3. Summarization and output corrections: The top-entries output and the percentage calculation now use the correct line counter.
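
For anyone who wants to see the idea behind fix 1 without opening the attachment, here is a minimal sketch of counting hits with an SQLite upsert. It is not the attached script. Only the `RefSeqHits` table, the `org` column, and `ON CONFLICT(org)` come from the description above. The `hits` column and the function names `count_hits` and `top_hits` are placeholders I made up for illustration, and the sketch assumes the bundled SQLite is version 3.24 or newer (required for `ON CONFLICT ... DO UPDATE`).

```python
import sqlite3


def count_hits(orgs, db_path=":memory:"):
    """Count occurrences of each org in an SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    # org must be PRIMARY KEY (or UNIQUE) so it can be an ON CONFLICT target.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS RefSeqHits ("
        "org TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    # Insert a new org with a count of 1, or bump the existing row's count.
    conn.executemany(
        "INSERT INTO RefSeqHits (org, hits) VALUES (?, 1) "
        "ON CONFLICT(org) DO UPDATE SET hits = hits + 1",
        ((org,) for org in orgs),
    )
    conn.commit()
    return conn


def top_hits(conn, n=10):
    """Return the n most frequent orgs as (org, hits) tuples."""
    return conn.execute(
        "SELECT org, hits FROM RefSeqHits ORDER BY hits DESC, org LIMIT ?",
        (n,),
    ).fetchall()


if __name__ == "__main__":
    conn = count_hits(["a", "b", "a", "c", "a", "b"])
    for org, hits in top_hits(conn, 2):
        print(hits, org, sep="\t")
```

Passing a file path as `db_path` instead of `":memory:"` keeps the counts on disk, which is the point when an in-memory dictionary no longer fits in RAM.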

The modified script is attached as a text file; just rename it to 'DIAMOND_analysis_counter.py'.

modified_DIAMOND_analysis_counter.txt

Output from a run with the modified script:

377M lines processed so far in 3064.65 seconds.
Success!
Time elapsed: 3072.18 seconds.
Number of lines: 377783847
Dictionary database assembled.
Top 10 matches:
8       XP_068242559.1
5       WP_218412044.1
4       WP_337427971.1
4       WP_337429722.1
4       WP_337566774.1
4       WP_337580821.1
3       WP_006849145.1
3       WP_022044852.1
3       WP_022252402.1
3       WP_089543758.1
Annotations saved to file: 
'/data2/Alex_MulengaLab/tools/samsa2/output_test/step_4_output/control_1_TINY.merged.RefSeq_annot_organism.tsv'.
