fix(import_phoenix_ast): handle wide-format Phoenix exports #103

Open
efosternyarko wants to merge 1 commit into AMRverse:main from efosternyarko:fix/import-phoenix-wide-format
@efosternyarko
Collaborator

Problem

import_phoenix_ast() failed silently or produced incorrect results when given Phoenix export files in wide format (one row per sample, with drug results stored as column triplets: XX (MIC), XX (Interp), XX (Expert)).

The function was written for long-format Phoenix exports (one row per drug) and had no path for the wide format common in Phoenix XLSX batch exports.
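
To make the layout concrete, here is a minimal base-R sketch of a wide-format table and a column-name check of the kind described in this PR. The helper name `is_wide_phoenix()`, the drug codes, and all values are invented for illustration, and the exact regular expression is an assumption rather than the code in R/import_pheno.R.

```r
# Toy wide-format export: one row per sample, one column triplet per drug.
# Drug codes (AMK, CIP) and all values are made up for illustration.
wide <- data.frame(
  Accession = c("S001", "S002"),
  Isolate = c(1L, 1L),
  `AMK (MIC)` = c("<=4", "16"),
  `AMK (Interp)` = c("S", "I"),
  `AMK (Expert)` = c(NA, "R"),
  `CIP (MIC)` = c("0.5", NA),
  `CIP (Interp)` = c("S", NA),
  `CIP (Expert)` = c(NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Toy long-format export: one row per sample/drug pair.
long <- data.frame(
  Accession = "S001", Drug = "AMK", MIC = "<=4", Interp = "S",
  stringsAsFactors = FALSE
)

# Hypothetical helper: call a table "wide" when more than one column name
# ends in "(MIC)" or "(MOC)".
is_wide_phoenix <- function(df) {
  sum(grepl("\\((MIC|MOC)\\)[[:space:]]*$", names(df))) > 1
}

is_wide_phoenix(wide)
is_wide_phoenix(long)
```

The check looks only at column names, so it can run immediately after the file is read and before any column resolution.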

Changes

R/import_pheno.R

  • Wide-format detection and pivot — after reading the file, detect wide format by checking for multiple (MIC) / (MOC) columns. When detected, pivot to long format: metadata columns are preserved per row, and each drug triplet becomes one row with standardised drug, mic, and Interp columns.
  • use_expertized logic during pivot — expert interpretation is preferred over raw Interp per row when use_expertized = TRUE (default), with per-row fallback to Interp where expert is NA.
  • Drop untested rows — rows where both MIC and SIR are absent (drug not run for that sample) are removed after pivoting.
  • Sample column auto-detection — added "accession" to the sample ID detection patterns, prioritised before "^isolate$". Phoenix wide exports commonly use "Accession" as the sample identifier; previously "Isolate" (the within-batch isolate number) was matched first, collapsing hundreds of samples to a handful of unique IDs.
  • Output cleanup — switched final relocate() to select() so pivot intermediate columns (drug, Interp) are not leaked into the returned data frame.
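
The pivot, expert fallback, untested-row drop, and sample-column priority described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code in this PR: `pick_sample_col()` and `pivot_phoenix_wide()` are hypothetical names, the toy data is invented, triplet columns are assumed to be named exactly "XX (MIC)", "XX (Interp)", and "XX (Expert)", and missing values are assumed to arrive as NA.

```r
# Toy wide-format input; values and drug codes are invented for illustration.
wide <- data.frame(
  Isolate = c(1L, 1L),
  Accession = c("S001", "S002"),
  `AMK (MIC)` = c("<=4", "16"),
  `AMK (Interp)` = c("S", "I"),
  `AMK (Expert)` = c(NA, "R"),
  `CIP (MIC)` = c("0.5", NA),
  `CIP (Interp)` = c("S", NA),
  `CIP (Expert)` = c(NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: try patterns in priority order, so "Accession" wins
# over "Isolate" when both columns exist. The real pattern list is longer.
pick_sample_col <- function(col_names, patterns = c("accession", "^isolate$")) {
  for (p in patterns) {
    hit <- grep(p, col_names, ignore.case = TRUE, value = TRUE)
    if (length(hit) > 0) return(hit[1])
  }
  NA_character_
}

# Hypothetical helper: pivot drug triplets to one row per sample/drug pair.
pivot_phoenix_wide <- function(df, use_expertized = TRUE) {
  mic_cols <- grep("\\(MIC\\)[[:space:]]*$", names(df), value = TRUE)
  drugs <- trimws(sub("\\(MIC\\)[[:space:]]*$", "", mic_cols))
  triplet_cols <- paste0(rep(drugs, each = 3),
                         c(" (MIC)", " (Interp)", " (Expert)"))
  # Everything that is not part of a triplet is kept as metadata.
  meta <- df[, setdiff(names(df), triplet_cols), drop = FALSE]

  # Return a column as character, or all NA if the export lacks it.
  get_col <- function(name) {
    if (name %in% names(df)) as.character(df[[name]])
    else rep(NA_character_, nrow(df))
  }

  pieces <- lapply(drugs, function(d) {
    mic <- get_col(paste0(d, " (MIC)"))
    interp <- get_col(paste0(d, " (Interp)"))
    expert <- get_col(paste0(d, " (Expert)"))
    # Prefer the expert call, falling back to the raw call row by row.
    sir <- if (use_expertized) ifelse(is.na(expert), interp, expert) else interp
    cbind(meta, drug = d, mic = mic, Interp = sir, stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, pieces)

  # Drop sample/drug pairs where both MIC and interpretation are absent.
  long <- long[!(is.na(long$mic) & is.na(long$Interp)), , drop = FALSE]
  rownames(long) <- NULL
  long
}

id_col <- pick_sample_col(names(wide))
long <- pivot_phoenix_wide(wide)
```

In this toy input, the CIP result for S002 has no MIC and no interpretation, so it is dropped, and the AMK result for S002 takes the expert call because `use_expertized` defaults to TRUE.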

Testing

Tested on three real Phoenix export files:

| File | Format | Rows | Samples | Drugs |
|---|---|---|---|---|
| NMIC-422 MIC Results 20026-Feb.xlsx | wide (headed) | 18,420 | 677 | 25 |
| PMIC-84 MIC Results 2026 -Feb .xlsx | wide (headed) | 32,818 | 1,228 | 18 |
| Phoenix-Antibiogramm-Daten.xls | headerless positional | 465 | 20 | 37 |

All three return the standardised 8-column output (id, drug_agent, mic, disk, method, platform, pheno_provided, spp_pheno). The existing headerless positional path is unaffected.
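
A quick way to confirm the output shape on a new file is to compare column names against the eight listed above. The sketch below assumes the columns come back in that order; `has_standard_cols()` is a hypothetical helper, and `result` is a hand-built stand-in with plain values, not a real import (the real columns may use dedicated classes).

```r
# The eight standardised output columns named in this PR.
expected_cols <- c("id", "drug_agent", "mic", "disk", "method",
                   "platform", "pheno_provided", "spp_pheno")

# Hypothetical helper: TRUE only when names and order match exactly.
has_standard_cols <- function(df) identical(names(df), expected_cols)

# Stand-in for an import result; values are invented for illustration.
result <- data.frame(
  id = "S001", drug_agent = "AMK", mic = "<=4", disk = NA,
  method = "MIC", platform = "Phoenix", pheno_provided = "S",
  spp_pheno = NA, stringsAsFactors = FALSE
)

has_standard_cols(result)
```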

…per sample, drug triplet columns)

- Detect wide format via multiple '(MIC)'/'(MOC)' columns and pivot to long
  format before column resolution
- Handle use_expertized logic during pivot (expert falls back to interp per row)
- Drop untested drug rows (both MIC and SIR absent)
- Add 'accession' to sample column auto-detect patterns, prioritised before 'isolate'
- Switch final select from relocate() to select() to drop pivot intermediate columns

Tested on three real Phoenix files: NMIC-422 (677 samples, 25 drugs),
PMIC-84 (1228 samples, 18 drugs), Phoenix-Antibiogramm-Daten.xls (headerless)
@efosternyarko
Collaborator Author

CI failure note

The two failing jobs (ubuntu-latest (R-devel) and macOS-latest (R-release)) both fail during dependency setup (r-lib/actions/setup-r-dependencies@v2), before the package check runs — so neither failure is caused by the code changes in this PR.

  • ubuntu-latest (R-devel): package or namespace load failed for 'rlang'. This is consistent with a binary compiled against an older R-devel ABI, which breaks when the daily R-devel build changes.
  • macOS-latest (R-release): dependency setup exits with code 1 (likely a source compilation failure of a base dependency)

The most recent push to main (2026-03-23) passed all five jobs, including both of these, so this looks like a transient infrastructure issue rather than a regression. Could a maintainer re-run the failed jobs when convenient?
