Feat@use daf #193

Open
aviezerl wants to merge 39 commits into master from feat@use-daf

@aviezerl
Contributor

No description provided.

aviezerl added 30 commits July 28, 2025 10:28
…ment for MCView. Replace direct variable access with mcv_get and mcv_set functions across the codebase to improve maintainability and testing. Update related functions and documentation accordingly.
Replace ~20 instances of loading full gene×metacell matrices (28K×2.4K)
just for metadata (counts, names, existence checks) with O(1) DAF axis
queries:

- get_mc_sum(): use axis_entries instead of loading mc_mat for names
- has_corrected/projection/network/cell_metadata/samples: use DAF
  has_matrix/has_vector/has_axis instead of loading full data
- qc_value_box, projection_qc: use axis_length instead of ncol(mc_mat)
- common_genes box: use axis_entries for both datasets
- metacell_selector, metacell_names_reactive: use axis_entries
- initial_proj_point_size, initial_scatters_point_size: use axis_length
- app_server tab checks: use has_matrix for inner_fold/stdev
- calc_top_cors: push metacell filter down to DAF query
- calc_obs_exp_mc_df: query single-metacell fractions via DAF instead
  of loading two full corrected/projected matrices
- Fix S4 dgeMatrix incompatibility with tgs_cor in calc_top_cors
- Replace c() growth in loop with list accumulation + unlist in
  daf_query_mc_mat sparse matrix construction (daf_data.R:276-286)
- Add comprehensive vignette documenting exactly what the input DAF
  needs: required axes/vectors/matrices, optional components per tab,
  configuration scalars, precomputable data recommendations
- Convert mc_egc to dense once in calc_marker_genes instead of calling
  as.matrix() twice (for rowMaxs and rowMedians)
- Combine sparse row filter + marker intersection into single subset
  before as.matrix() conversion in get_marker_matrix (4 mode branches)
…licate and improve quality

7 refactoring batches across 51 files (917 insertions, 1116 deletions):

Batch 1 - Bug fixes (14): parenthesis bug in mod_mc_mc groupB selection,
cache key typo in mod_samples, undefined variable refs in plot_mc2d_proj
and plot_metadata, parameter shadowing in mod_atlas, strict passthrough
in daf_core, Manhattan-to-Euclidean distance fix in utils_mc_mc, typo
fixes, epsilon guards on log2 calls, split monolithic observe in app_server.

Batch 2 - Dead code removal (~350 lines): 7 legacy cache functions from
daf_cache, if(0) block in utils_network, dead functions in utils_gene_modules
and utils_dt, unreachable return in utils_markers, commented code in mod_flow.

Batch 3 - DAF layer deduplication: shared helpers (daf_query_named_vector,
daf_query_gene_agg, convert_daf_fraction_to_umi, coerce_vec_for_daf,
set_daf_vectors_from_df), MCVIEW_TAB_NAMES constant replacing 3 copies.

Batch 4 - Module deduplication: move_cell_type helper in mod_annotate
(~100 lines dedup), observe_group_selection in mod_query, purrr::walk
for observers, removed dead variables, simplified tagList patterns,
fixed duplicate HTML id.

Batch 5 - Utility consolidation: egc_to_fp helper, filter_metadata_field_names
helper, .env$ pronoun cleanup, is_gene_color_mode helper, removed redundant
assignments and fetches.

Batch 6 - Plot deduplication: mc2d_add_graph_edges helper (3x dedup),
O(n²) append elimination in plot_vein, config access standardization,
removed wasteful metadata_colors fetches.

Batch 7 - Quality: vertical_gridlines typo fix, deprecated dplyr replacements
(mutate_at/summarise_at/summarise_all -> across), removed self-assignments
and duplicate sanitize_plotly_download call.
- Scatter plots: extract 4 shared helpers (apply_gene_axis_scale,
  apply_scatter_color_layer, resolve_gene_color, resolve_numeric_md_color)
  from 3 near-identical functions in plot_metadata.R
- render_2d_plotly: decompose 280-line monolith into 45-line orchestrator
  + 16 focused handler functions with clean switch() dispatch
- utils_heatmap: split into 3 files (server logic, UI builders, helpers);
  extract heatmap_tooltip_handler and heatmap_download_handlers
- Group management: unify duplicated DT/observer code from mod_mc_mc.R
  and mod_query.R into shared utils_group_box.R (4 functions)

849 insertions, 1109 deletions across 6 files; 2 new files (313 lines).
All tests pass, NAMESPACE unchanged.

Prior DAF fork maintenance: update NAMESPACE exports, regenerate
documentation, add smoke and integration test scaffolding, and
clean up miscellaneous module imports.
…, metadata NA

- Fix duplicate rows in annotate module by adding distinct() before joins (#5)
- Fix markers ordering for subset cell types: convert names to indices (#6)
- Add all-NA / non-finite metadata color-break fallback (#7)
- Add "Update Heatmap" button with applied_params reactive pattern to
  decouple sidebar changes from heatmap recalculation (#1)
- Add categorical metadata filtering with checkbox toggle, dynamic
  category selector, and cached utility function (#2)
… options

- Add plot type selector (boxplot/violin/sina) with ggforce dependency
- Add facet-by metadata variable support
- Add categorical x-axis variable with category selection
- Add coord_flip and log_scale toggles in boxSidebar (#4)
- Add dynamic renderUI for gene, metadata, and gene module selectors
- Add shinyjs::toggle observer to show/hide selectors based on color_proj
- Fix gene_modules selector clobbering gene selector in master
- Fix undefined project reference in atlas tab (#8)
- Add mod_gene_correlation.R (771 lines): gene list input, correlation
  modes (individual/module/gene-gene), heatmap+barplot+table output
- Add 5 helper functions to utils_gene_mc.R: calc_individual_correlations,
  calc_module_correlations, calc_gene_gene_correlations,
  plot_correlation_heatmap, plot_correlation_barplot
- Wire into app_config.R, daf_contracts.R, daf_core.R
- Replace rclipboard-based clipboard copy with downloadHandler for gene lists
- Wrap sparse DAF matrices with as.matrix() for compatibility (#3)
…s_cor

- Port clipboard_copy_button_ui/server from master to utils_clipboard.R
- Add rclipboard to DESCRIPTION Imports
- Add "Copy Genes" clipboard button alongside download in gene correlation
- Shorten radio button labels ("Find correlated" / "Gene-gene cor.") to
  prevent text overflow in justified radioGroupButtons
- Replace cor() with BLAS-accelerated tgs_cor in calc_gene_gene_correlations
  and plot_correlation_heatmap for consistent performance

Port the "Copy genes to clipboard" button from master to the markers
heatmap sidebar. Adds rclipboard::rclipboardSetup() to the heatmap box
UI and wires clipboard_copy_button_ui/server for the markers gene list.
- Pass ns to heatmap_download_handlers to fix "object 'ns' not found"
  error when rendering the copy genes clipboard button
- Add mod_gene_correlation_server smoke tests to test-module-smoke.R
- Create test-clipboard.R with unit and integration tests for clipboard
  copy button (UI function, server observer, heatmap integration,
  gene correlation integration)

Replace anonymous testServer call with a function signature check,
since testServer with inline functions doesn't properly initialize
the module input context. All 9 non-DAF tests pass.
…clean imports

- Refactor DAF contracts for clarity and add missing contract coverage
- Add app_config helpers, update daf_cache and daf_data
- Clean up imports and add gene_mc utility functions
- Consolidate duplicate skip_if_no_daf() from test files into helper-daf.R
- Add roxygen docs for gene correlation module and clipboard exports
…ment

Julia offloading infrastructure:
- Add julia_helpers.R with R wrappers for Julia-accelerated computations
- Add mcview_helpers.jl with EGC cache, correlation, top-gene, and marker
  gene functions using BLAS and partialsortperm for top-k extraction
- Add calc_marker_genes Julia path (daf_obj param, ~6.7s R bottleneck)
- Add calc_gg_mc_top_cor Julia path (572s R → 21s Julia, 27x speedup)

Test Julia environment fix:
- Add tests/run_tests.sh to activate conda env and set Julia env vars
- Set dafr.JULIA_HOME from CONDA_PREFIX in helper-daf.R before setup_daf()
- Call init_julia_helpers() after successful DAF setup in test helper
- Add julia_helpers_ready() test (skips gracefully without conda env)

Benchmarking:
- Add benchmark suite (benchmark_daf.R, run_benchmarks.sh, compare tool)
- Add OPTIMIZATION_REPORT.md documenting profiling results and decisions
- Fix EGC normalization to use fractions (t(t(mc_mat)/mc_sum)) instead of median-scaled values
- Fix convert_daf_gene_modules to return tibble(gene, module) with NA filtering
- Add session-level EGC matrix caching in get_mc_egc for repeated access
- Add 50-gene threshold for per-gene DAF queries in daf_query_mc_mat
- Batch metadata loading via get_frame in convert_daf_metadata
- Remove unused cache parameter from 6 function signatures and call sites
- Remove redundant axis_entries call in daf_query_named_vector
- Regenerate stale Rd documentation
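
The fraction-based EGC normalization mentioned in this commit divides each metacell column by its total, so every column sums to 1. The package does this in R as `t(t(mc_mat)/mc_sum)`; the following is only a language-neutral Python sketch of the same arithmetic. The function name and toy data are hypothetical, and it assumes `mc_sum` is the plain column total of the matrix.

```python
def column_fractions(mat):
    """Divide every column of a genes x metacells matrix by its column sum.

    `mat` is a list of rows (one row per gene). Columns whose total is 0
    are returned as all zeros to avoid division by zero.
    """
    n_rows = len(mat)
    n_cols = len(mat[0]) if n_rows else 0
    # Total UMIs per metacell (assumed here to play the role of mc_sum).
    col_sums = [sum(mat[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [mat[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: 3 genes x 2 metacells.
umis = [[2, 0], [6, 5], [2, 5]]
egc = column_fractions(umis)
```

Each column of `egc` sums to 1, which is what distinguishes fractions from median-scaled values.
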
Introduce two-DAF architecture that chains metacells and cells DAFs via
dafr::chain_reader(), enabling direct cell-level pseudobulk analysis.

- Add R/daf_cells.R with 12 functions: composition, DE, QC, pseudobulk
- Update Samples tab with interactive grouping field selector
- Add Group Comparison UI (multi-select Group A/B with aggregate DE)
- Add QC Metrics panel (cells/group, UMIs distribution, Wilson CIs)
- Auto-detect cells DAF at startup (sibling directory pattern)
- Wire cleanup for cells_daf references in mcview_env
- All 333 tests pass, backward compatible via NULL defaults
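
For readers unfamiliar with the Wilson CIs named in the QC Metrics panel, here is a minimal Python sketch of the standard Wilson score interval for a binomial proportion. The commit message does not say which proportion the panel bounds or which confidence level it uses, so the function name, the default `z = 1.96` (about 95%), and the example counts are assumptions for illustration, not the package's R code.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Returns (lower, upper). With no observations the interval is the
    uninformative (0.0, 1.0).
    """
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return (max(0.0, center - half), min(1.0, center + half))


# Example: 5 of 10 cells fall in a group.
lower, upper = wilson_interval(5, 10)
```

Unlike the simple normal approximation, this interval stays inside [0, 1] and remains usable for small groups, which is the usual reason to prefer it for per-group cell counts.
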
Replace 28K per-gene DAF round-trips with a single Julia call that does
sparse matrix-vector multiplication (mask' * UMIs) for each group.
Falls back to per-gene queries when Julia helpers are unavailable.

- Add mcview_compute_pseudobulk() Julia function
- Add julia_compute_pseudobulk() R wrapper
- Update get_group_pseudobulk_mat() and calc_group_diff_expr() to use Julia path

DAF already has `@ group_field %> Sum` for matrix GroupBy+Reduce,
making our custom mcview_compute_pseudobulk() redundant. The single
DAF query computes grouped sums entirely in Julia with no custom code.

Removes 150 lines across 3 files, replaces with 35 lines using DAF's
built-in API. Per-gene R fallback retained only for the cell_types
filtered case.

Replace manual R aggregation loops and masks with DAF's built-in
GroupBy+Reduce queries and viewer() for cell type filtering:

- get_group_pseudobulk_mat: single DAF query replaces per-gene loop
  for both filtered and unfiltered cases via make_cell_type_view()
- calc_group_diff_expr: same viewer+query pattern, no per-gene fallback
- get_group_gene_expression: DAF GroupBy query for both paths
- get_group_qc_stats: 3 DAF queries (Count, Sum, Median) replace
  manual lapply/tapply aggregation

Net reduction: 118 lines (-220/+102). All computation stays in Julia
via DAF's query engine.
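
To make the GroupBy+Reduce idea concrete, the sketch below shows in plain Python what a grouped Sum computes for pseudobulk: per-gene UMI counts are summed over the cells that share a group label. It is not DAF's query engine or its API; the function name, the dictionary-based inputs, and the toy data are all hypothetical.

```python
def pseudobulk_sum(umis_by_cell, group_of_cell):
    """Sum per-gene UMI counts over the cells of each group.

    umis_by_cell:  dict mapping cell name -> list of per-gene counts
    group_of_cell: dict mapping cell name -> group label
    Returns a dict mapping group label -> list of summed per-gene counts.
    """
    sums = {}
    for cell, counts in umis_by_cell.items():
        group = group_of_cell[cell]
        acc = sums.setdefault(group, [0] * len(counts))
        for gene_index, count in enumerate(counts):
            acc[gene_index] += count
    return sums


# Toy example: 3 cells, 2 genes, 2 groups.
umis_by_cell = {"c1": [1, 0], "c2": [2, 3], "c3": [0, 4]}
group_of_cell = {"c1": "A", "c2": "A", "c3": "B"}
pseudobulk = pseudobulk_sum(umis_by_cell, group_of_cell)
```

Running the whole reduction as a single grouped query, rather than one query per gene, is the design change these commits describe.
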
End-to-end tests using chromote that launch the app in a background
process, connect headless Chrome, navigate every tab, and capture
screenshots. Covers 21 test blocks with ~90 assertions including
DOM element presence, plotly rendering, and interactive controls.

New test file with 16 test blocks exercising UI interactions:
Markers heatmap (force_cell_type, lateral/noisy, legends),
Diff Expression (MCs/Types mode, hide genes, table toggle),
Genes (axis type switch, correlation toggle), QC (ECDF/Density,
table toggle), Cell Types (boxplot/violin/sina, coord flip,
select/clear all). Also extends helper-browser.R with 9 new
interaction utilities (click_radio_button, click_checkbox, etc.).

- Replace N² matrix queries in detect_available_tabs() with 2 targeted
  checks, cutting tab detection from ~5s to <1ms
- Defer future::plan(multisession) to first use, saving 5.7s at startup
- Add Julia sysimage auto-detection for ~5.5s faster Julia init
- Add R-level memoization for 22 static DAF data types in get_mc_data()
- Eliminate double scatter layer in all 2D projection plots, halving
  plotly JSON payload
- Remove incorrect bindCache from Annotate tab (incompatible with
  annotation workflow)
- Add tab guards to defer QC, Genes, Annotate, Projection QC computation
- Add defensive req() guards for missing max_expr column
- Remove redundant rm_plotly_grid() and scalars_set() calls

Migrate all DAF query strings to v0.2.0 syntax (@ axis, :: matrix,
[ mask ], >> reduce, -/ group >- reduce). Replace deprecated
dafr::And() with BeginMask()/EndMask(). Update dafr dependency
to >= 0.1.0. Add module-level metadata caching to reduce redundant
DAF fetches.

Guard post-hoc rownames/colnames/names assignments with is.null()
checks so dafr's atomic name attachment is preserved. Replace
double-transpose EGC computation with sweep(). Saves ~260MB per
get_matrix() call (99.9% memory reduction).

jlview integration:
- Use jlview_sweep for EGC normalization (zero-copy division)
- Use jlview_log2p for log2+epsilon transforms across 4 modules
- Use jlview_t for matrix transpose in cache precomputation
- Add jlview to DESCRIPTION Imports

General optimizations:
- get_cell_grouping_fields: read FilesDaf metadata JSON directly
  instead of loading every cell vector (7.5s → 0.15s, about 50x by these timings)
- convert_daf_metadata: use individual daf_vec() calls instead of
  get_frame() for zero-copy columns
- Precompute metacell top genes during init to avoid 3.8s cold
  computation on first access
- Vectorize chi-squared test in calc_diff_expr (120ms → 5ms)
- Add fast-path cache lookup in get_gene_egc for single genes

Remove 11 as.matrix() calls on jlview ALTREP objects — C code
(tgs_cor, matrixStats, TGL_kmeans) reads via REAL() which works
directly with ALTREP. Use jlview_fp() for egc_to_fp to avoid
521MB materialization for rowMedians. Saves ~1.5GB peak memory.

Single-gene EGC: use session-cached mc_mat instead of uncached
Julia round-trip (23ms → 1ms per gene selection).

Metadata: replace left_join with direct column assignment to
preserve jlview ALTREP views (4/8 columns stay zero-copy).

Remove 3 unnecessary as.numeric() in QC metadata aggregations.

Replace mutate-to-find-top2 pattern with jlview_top2_per_col/row
across 3 sites, avoiding as.matrix + transpose + mutation.

Use cached mc_mat for 2-metacell DE queries instead of DAF query
+ sparse→dense conversion.

Pass epsilon directly to jlview_fp in egc_to_fp, avoiding
intermediate x+eps materialization. Remove 6 static_vars cache
entries that duplicate dafr's new built-in version-counter cache.

Samples:
- Add tab guard to defer computation until first visit
- Share group_composition reactive with bindCache
- Batch 3 QC DAF queries into single Julia call

Markers:
- Cache per-cell-type clustering results for force_cell_type toggle
- Disable slow Julia marker path (45s) in favor of cached R path (5s)
- Optimize mat() and heatmap bindCache keys to avoid unnecessary
  recomputation on legend/metadata-only changes
- Fix deterministic tiebreaker for stable Shiny cache

Annotate:
- Add bindCache to render_2d_plotly with smart Selected-mode key
- Replace full DataTable re-render with DT proxy replaceData

- inst/scripts/convert_mcview_app.R: standalone CLI to convert old-format
  MCView apps to DAF (detects layout, transforms config, generates app.R)
- R/convert_project_to_daf.R: fix inner_fold/stdev matrix conversion when
  gene subset differs from mc_mat (reindex_matrix helper fills missing with 0)
- R/julia_helpers.R: add prewarm_dafr_dispatch() that warms JuliaCall bridge
  JIT at startup via a tiny FilesDaf, and prewarm_julia_cors() for correlations
- R/app_config.R: integrate both warmups into init_defs() startup path
- dev/create-sysimage.jl: add RCall+Suppressor to sysimage packages and
  exercise sexp/rcopy for DAF return types (NamedVector, NamedMatrix, etc.)
- R/plot_metadata.R: fix zero-variance correlation crash
- R/utils_selectors.R: remove redundant req() checks
- R/daf_contracts.R: add optional scalars to core contract (mcview_title,
  mcview_tabs, mcview_excluded_tabs, mcview_light_version,
  mcview_about_markdown, mcview_cache_in_daf, mcview_cache_daf_root,
  mcview_available_tabs). These were implicitly used but not documented.
- inst/scripts/convert_mcview_app.R: call store_available_tabs() after
  conversion so detect_available_tabs() is instant at runtime (0.00s
  vs 1.3s cold start without pre-stored scalar)
- Fix markers-only checkbox in correlation panel: preserve checkbox state
  across renderUI re-renders instead of resetting to TRUE (was preventing
  the toggle from working on Genes, Atlas tabs)
- Fix Samples tab not appearing in sidebar: include optimistically in
  detect_available_tabs when cell axis exists, and don't override when
  config$tabs already includes it
- Fix get_cell_grouping_fields: use raw cells DAF instead of chained DAF
  for vector listing and Julia cardinality checks (returns all 34 fields
  including embryo instead of just batch_set_id)
- Update contracts and pre-computation stubs for DAF-native caching
- Add per-type marker genes and marker correlation support

Strip all fold diagnostic features except Projected-fold: delete
mod_inner_fold.R and mod_stdev_fold.R modules, remove tab definitions,
contracts, data conversion functions, QC plots/tables, heatmap modes
(Inner/Stdev/Outliers), marker gene selection modes, cache keys, and
related tests. Keeps Projected-fold tab and its full infrastructure.