Open
Labels
Python (Bug or fix related to the Python scripts), bug, enhancement, help wanted
Description
I came across this error while handling an updated Refseq.fa file.
356M lines processed so far in 2344.0744104385376 seconds.
357M lines processed so far in 2350.2084290981293 seconds.
Traceback (most recent call last):
  File "/data2/Alex_MulengaLab/tools/samsa2/python_scripts/DIAMOND_analysis_counter.py", line 185, in <module>
    db_org_dictionary[db_id] = db_org
    ~~~~~~~~~~~~~~~~~^^^^^^^
MemoryError
I made some changes to the Python script to get around this.
Key Fixes Made:
- Primary Key Constraint: The org column in the RefSeqHits table is now defined as a primary key. The ON CONFLICT(org) clause needs a primary key or unique constraint on its target column, so it now works correctly.
- Track Line Number: Added a line number counter so the progress messages report the correct count.
- Summarization and Output Corrections: The top-entries output and the percentage calculation now use the corrected line counter.
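
For anyone who wants to see the idea without opening the attachment, here is a minimal sketch of the three fixes above. It is not the code from the attached script: it assumes the database is SQLite accessed through Python's `sqlite3` module (with SQLite 3.24 or newer for upsert support), and the `hits` column, the function names `count_hits` and `top_entries`, and the `report_every` parameter are all hypothetical names chosen for illustration.

```python
import sqlite3
import time


def count_hits(orgs, db_path=":memory:", report_every=1_000_000):
    """Tally entries in SQLite rather than in an in-memory dict.

    `orgs` is any iterable of strings. Returns the open connection and
    the number of items processed.
    """
    conn = sqlite3.connect(db_path)
    # ON CONFLICT(org) only works if org has a PRIMARY KEY or UNIQUE constraint.
    # The `hits` column is a hypothetical name, not taken from the real script.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS RefSeqHits ("
        "org TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    start = time.time()
    line_number = 0  # explicit counter used for progress and percentages
    for org in orgs:
        line_number += 1
        # Upsert: insert a new row, or bump the count of the existing one.
        conn.execute(
            "INSERT INTO RefSeqHits (org, hits) VALUES (?, 1) "
            "ON CONFLICT(org) DO UPDATE SET hits = hits + 1",
            (org,),
        )
        if line_number % report_every == 0:
            elapsed = time.time() - start
            print(f"{line_number // 1_000_000}M lines processed "
                  f"so far in {elapsed:.2f} seconds.")
    conn.commit()
    return conn, line_number


def top_entries(conn, total, limit=10):
    """Return (org, hits, percent of total) for the most frequent entries."""
    if total == 0:
        return []
    rows = conn.execute(
        "SELECT org, hits FROM RefSeqHits "
        "ORDER BY hits DESC, org ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(org, hits, 100.0 * hits / total) for org, hits in rows]
```

Passing a file path as `db_path` instead of `":memory:"` keeps the table on disk, which is the point when the in-memory dictionary is what runs out of RAM. Wrapping the loop in larger batches or using `executemany` would probably be faster on hundreds of millions of lines, but that is left out here to keep the sketch short.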
The modified script is attached as a text file. Just rename it to 'DIAMOND_analysis_counter.py'.
modified_DIAMOND_analysis_counter.txt
377M lines processed so far in 3064.65 seconds.
Success!
Time elapsed: 3072.18 seconds.
Number of lines: 377783847
Dictionary database assembled.
Top 10 matches:
8 XP_068242559.1
5 WP_218412044.1
4 WP_337427971.1
4 WP_337429722.1
4 WP_337566774.1
4 WP_337580821.1
3 WP_006849145.1
3 WP_022044852.1
3 WP_022252402.1
3 WP_089543758.1
Annotations saved to file:
'/data2/Alex_MulengaLab/tools/samsa2/output_test/step_4_output/control_1_TINY.merged.RefSeq_annot_organism.tsv'.