Migrate Python worker containers to Python 3.11 #59

Open
alptaciroglu wants to merge 3 commits into vufuma:master from alptaciroglu:python3-migration

Summary

  • All 14 Python worker containers now use FROM python:3.11 (8 migrated from Python 2, 6 pinned from python:3)
  • Full compatibility fixes for pandas 2.0+ (delim_whitespace → sep=r'\s+', DataFrame.append → pd.concat)
  • Full compatibility fixes for NumPy 2.0 (np.in1d → np.isin across 10 files, 14 occurrences)
  • Fix for htslib 1.21 stricter validation (tabix -p vcf → tabix -s 1 -b 2 -e 2 in allSNPs.py)
  • Fix for Python 3 subprocess returning bytes (subprocess.check_output().decode() in getLD.py)
  • Fix pandas dtype strictness (astype(int) → astype(int).astype(str) in allSNPs.py)
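
For reviewers unfamiliar with the subprocess change, here is a minimal sketch of the bytes-versus-str behaviour. It is not the code in getLD.py: the helper name `run_text` and the child command are hypothetical and chosen only so the snippet runs anywhere.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str.

    In Python 3, check_output() returns bytes, so string operations
    that worked in Python 2 need an explicit decode first.
    """
    raw = subprocess.check_output(cmd)  # bytes in Python 3
    return raw.decode()                 # str, UTF-8 by default


# Hypothetical child process standing in for an external tool.
out = run_text([sys.executable, "-c", "print('rs123 0.8')"])
fields = out.split()
```

Passing `text=True` to `check_output` is an alternative on Python 3.7+, but calling `.decode()` matches the change listed above.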

Files changed (40 files, all in scripts/)

Dockerfiles (14): All set to FROM python:3.11

  • 8 migrated from Python 2: getld, geteqtl, getci, snpannot, genemap, qqsnps_filt, locusPlot, celltype_plot_data
  • 6 pinned from python:3: allsnps, gwas_file, getgwascatalog, annotPlot, magma, create_circos_plot

Python scripts: pandas, numpy, subprocess, and tabix compatibility fixes
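
The pandas and NumPy replacements follow a common pattern, sketched below on made-up data. The column names and values are illustrative assumptions and are not taken from the FUMA scripts.

```python
import io

import numpy as np
import pandas as pd

# Whitespace-delimited input, as many GWAS summary files are.
text = "chr pos  p\n1 100 0.05\n2   200 0.01\n"

# Replacement for delim_whitespace=True, which newer pandas deprecates.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
extra = pd.DataFrame({"chr": [3], "pos": [300], "p": [0.5]})
combined = pd.concat([df, extra], ignore_index=True)

# np.in1d is deprecated in NumPy 2.0; np.isin is the replacement.
mask = np.isin(combined["chr"].to_numpy(), [1, 3])

# Explicit int -> str conversion, as in the dtype fix listed above.
pos_as_str = combined["pos"].astype(int).astype(str)
```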

Test plan

  • Tested end-to-end with CD.gwas example data on local FUMA installation
  • SNP2GENE pipeline completes all steps (gwas_file → allSNPs → magma → manhattan → QQ → getLD → SNPannot → getGWAScatalog → geteQTL → geneMap → createCircosPlot)
  • Tested with additional annotations (3D chromatin interactions, eQTL mapping)
  • Two successful jobs (18, 19) with no errors
  • Test on Linux server environment
  • Test GENE2FUNC and CELLTYPE modules

🤖 Generated with Claude Code

alptaciroglu and others added 3 commits April 11, 2026 22:05
…ry support

All 14 Python worker containers now use FROM python:3.11 (8 migrated from
Python 2, 6 pinned from python:3). Includes full compatibility fixes for
pandas 2.0+, numpy 2.0, and htslib 1.21. Tested end-to-end with CD.gwas
example data on a local FUMA installation (SNP2GENE pipeline, all steps pass).

Changes:
- Dockerfiles: pin all Python containers to python:3.11
- Python 2→3 syntax: print statements, configparser, .to_numpy()
- Pandas 2.0+: delim_whitespace→sep, DataFrame.append→pd.concat, dtype fix
- NumPy 2.0: np.in1d→np.isin (10 files)
- subprocess.check_output bytes decode (getLD.py)
- tabix -p vcf → -s 1 -b 2 -e 2 (allSNPs.py, htslib 1.21 compat)
- R Dockerfiles: add CMD ["R", "--vanilla"] (5 files)
- MAGMA Dockerfiles: curl -L for SURFsara URL redirect (2 files)
- requirements.txt: remove Python 2 packages, unpin versions
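
As a rough illustration of the Python 2→3 syntax items, the sketch below shows the renamed `configparser` module and the `print()` function. The section name, key, and path are hypothetical and do not reflect the actual FUMA config.

```python
import configparser  # was "ConfigParser" in Python 2

cfg = configparser.ConfigParser()
# Hypothetical config content; the real scripts read a config file.
cfg.read_string("[data]\nrefpanel = /hypothetical/path\n")

refdir = cfg.get("data", "refpanel")

# print is a function in Python 3 (was a statement in Python 2).
print("refpanel:", refdir)
```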

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

np.array() on lists of different lengths raises ValueError in modern
NumPy. Adding dtype=object preserves the ragged structure.
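
A minimal sketch of the ragged-array behaviour, using made-up lists rather than the data from the affected script:

```python
import numpy as np

# Rows of different lengths (hypothetical data).
rows = [[1, 2, 3], [4, 5]]

# With dtype=object, NumPy builds a 1-D array whose elements are the
# original lists, instead of trying to form a rectangular array.
ragged = np.array(rows, dtype=object)
```

The result has shape `(2,)`, and each element is still a plain Python list.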

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

numpy int64 values in gene_order/label_order arrays are not
JSON-serializable in Python 3. Use .tolist() which recursively
converts to native Python types.
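
A small sketch of the serialization issue, assuming a `gene_order` array like the one named above. The values are made up.

```python
import json

import numpy as np

gene_order = np.array([2, 0, 1], dtype=np.int64)

# list() keeps the elements as numpy.int64, which json cannot encode.
error = None
try:
    json.dumps({"gene_order": list(gene_order)})
except TypeError as exc:
    error = str(exc)

# .tolist() converts the elements to native Python ints.
payload = json.dumps({"gene_order": gene_order.tolist()})
```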

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>