Releases · tanaylab/misha

29 Apr 07:47

aviezerl

v5.6.23

da59284

5.6.23 Latest

Input-format ergonomics for BED/GFF/VCF

Improved error messages: "start exceeds or equals to end" now mentions misha's 0-based half-open convention and the GFF/VCF 1-based hint; "chromosome does not exist" lists known chromosomes and points to CHROM_ALIAS.
The C++ converter now emits an R warning ("N intervals had start == end and were extended by 1bp") when zero-length intervals from a loaded file are auto-bumped; previously this happened silently.
Added gintervals.import_bed(), gintervals.import_gff(), gintervals.import_vcf() for direct import from common interval file formats. All three normalize chromosome names via the existing CHROM_ALIAS mechanism (so chr1 ↔ 1 works), apply misha's 0-based half-open convention (subtracting 1 from start for the 1-based GFF/GTF/VCF inputs), and preserve common metadata columns (name/score/strand for BED; type/source/score/attrs for GFF; id/ref/alt/qual/filter/info for VCF).
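The coordinate and chromosome-name handling described above can be illustrated with a small standalone sketch. This is not misha's implementation: the alias table, the set of known chromosomes, and the function names are hypothetical, and only the rule itself (map aliases to canonical names, subtract 1 from a 1-based inclusive start) comes from the notes.

```python
# Hypothetical alias table and chromosome set, for illustration only;
# the real CHROM_ALIAS contents are not shown in the release notes.
CHROM_ALIAS = {"1": "chr1", "2": "chr2", "X": "chrX"}
KNOWN_CHROMS = {"chr1", "chr2", "chrX"}


def normalize_chrom(name):
    """Return the canonical chromosome name, resolving aliases such as 1 -> chr1."""
    if name in KNOWN_CHROMS:
        return name
    if name in CHROM_ALIAS:
        return CHROM_ALIAS[name]
    raise ValueError(
        f"chromosome {name!r} does not exist; known: {sorted(KNOWN_CHROMS)}"
    )


def one_based_to_half_open(chrom, start, end):
    """Convert a 1-based inclusive interval (GFF-style) to 0-based half-open.

    Only the start moves: [start, end] becomes [start - 1, end).
    The interval length (end - start + 1) is unchanged.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return normalize_chrom(chrom), start - 1, end


chrom, start0, end0 = one_based_to_half_open("1", 100, 200)
```

Under these assumptions, a GFF feature at 1:100-200 becomes chr1 with start 99 and end 200, and its length of 101 bases is preserved.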

Character strand input (bundled into this release; originally 5.6.22)

Intervals' strand column now accepts character ("+", "-", ".", "*", "") or factor input in addition to numeric 1/-1/0. Strings are normalized to the numeric convention at the R→C++ boundary; output stays numeric.
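A minimal sketch of this kind of normalization follows. It assumes that ".", "*", and "" all mean "unstranded" and map to 0; the notes list the accepted strings but do not spell out the mapping, so treat the table and the function name as illustrative rather than as misha's code.

```python
# Assumed mapping: "+" -> 1, "-" -> -1, and the unstranded markers -> 0.
STRAND_CODES = {"+": 1, "-": -1, ".": 0, "*": 0, "": 0}


def normalize_strand(value):
    """Map a character or numeric strand value to the numeric 1 / -1 / 0 convention."""
    if isinstance(value, bool):
        raise TypeError("strand must be a string or a number, not bool")
    if isinstance(value, (int, float)):
        if value in (1, -1, 0):
            return int(value)
        raise ValueError(f"invalid numeric strand: {value!r}")
    if isinstance(value, str):
        if value in STRAND_CODES:
            return STRAND_CODES[value]
        raise ValueError(f"invalid strand string: {value!r}")
    raise TypeError(f"unsupported strand type: {type(value).__name__}")


normalized = [normalize_strand(s) for s in ["+", "-", ".", "*", "", 1, -1, 0]]
```

Normalizing once at the boundary keeps everything downstream numeric, which matches the note that output stays numeric.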

28 Apr 09:58

aviezerl

v5.6.19

ba88e19

5.6.19

Fixed gintervals.load failing with "invalid columns definition" after gintervals.save of a bigset whose input had character chrom (e.g. a tibble from dplyr). On-disk per-chromosome files and the .meta zeroline now both store chrom/chrom1/chrom2 as factor with full ALLGENOME levels, and the on-disk frame is normalized to plain data.frame. (#102)

28 Apr 05:59

aviezerl

v5.6.18

6402aac

5.6.18

Added the gmultitasking.strategy option for gextract (default "auto"), read via getOption("gmultitasking.strategy"). When the workload is large and spans many tracks, "auto" routes to a track-parallel mode (each parallel::mclapply worker handles a subset of tracks across all tiles) instead of the legacy tile-parallel mode (each forked child handles a tile range across all tracks). On a realistic workload of 3,110 motif tracks × 2.19M tiled_peaks, track-parallel mode took 57.6 min versus a projected ~3.4 h for tile-parallel, a 3.5× per-track speedup. Override per-call via options(gmultitasking.strategy = "tracks" | "tiles" | "auto"). The heuristic stays on "tiles" for streaming iterators (numeric / NULL / 2D rect / track-name), for single-track calls or calls with fewer than 8 tracks, for file or intervals.set.out output, and for 2D band, so nothing else regresses (validated by a 36-cell benchmark matrix across iterator types × track counts × cache states).
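The routing rule can be sketched as a small decision function. The conditions that keep the strategy on "tiles" and the 8-track cutoff come from the note above; the function name, its parameters, and the boolean large_workload flag are hypothetical, since the notes do not state the size threshold the real heuristic uses.

```python
MIN_TRACKS_FOR_TRACK_PARALLEL = 8  # from the notes: fewer than 8 tracks stays on "tiles"


def choose_strategy(n_tracks, *, large_workload=True, streaming_iterator=False,
                    file_or_set_output=False, band_2d=False, option="auto"):
    """Pick "tracks" or "tiles" for a gextract-like call (illustrative only)."""
    if option in ("tracks", "tiles"):
        return option  # an explicit override always wins
    if option != "auto":
        raise ValueError(f"unknown strategy option: {option!r}")
    stays_on_tiles = (
        streaming_iterator                            # numeric / NULL / 2D rect / track-name
        or n_tracks < MIN_TRACKS_FOR_TRACK_PARALLEL   # single-track or few tracks
        or file_or_set_output                         # file / intervals.set.out output
        or band_2d                                    # 2D band
        or not large_workload                         # assumed stand-in for the size check
    )
    return "tiles" if stays_on_tiles else "tracks"


many_tracks = choose_strategy(3110)
few_tracks = choose_strategy(7)
```

Keeping "tiles" as the fallback for every excluded case mirrors the stated goal that existing workloads do not regress.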

26 Apr 16:51

aviezerl

v5.6.17

eb30be9

5.6.17

Performance regression fix (vs v5.6.11–v5.6.16)

gextract calls touching many dense tracks (e.g. ~50 motif tracks) became 10–20× slower starting in v5.6.11. Two compounding causes:

MmapFile used MAP_POPULATE, which eagerly paged in every mapped track at every chromosome transition (sequential readahead is already covered by MADV_SEQUENTIAL).
The two track-validation loops in create_expr_iterator and TrackExpressionVars::init were calling GenomeTrackFixedBin::init_read() once per chromosome per track on every gextract call, paying open + mmap + madvise + close + munmap each time even though they only needed bin size and file size. Replaced with a metadata-only path that stat()s for size and reads bin_size only once per track.

Net effect on a realistic workload (51 LSE motif vtracks × 7000 tiles × 5 chroms): 22s → 0.4s (~55× speedup, also faster than pre-audit baseline).

Added an opt-in performance regression test (MISHA_PERF_TESTS=true R -e "devtools::test(filter='perf-regression')") gated out of the parallel test suite.

19 Apr 11:23

aviezerl

v5.6.15

4628d05

5.6.15

Bug fixes

Fixed gsynth.train(), gsynth.sample(), and gsynth.random_seqs() silently reading sequences from the wrong chromosome when the intervals argument covered a subset of the genome that omitted one or more earlier chromosomes in the chromkey. For every chromosome in the input that came after a missing one, the C++ side opened the wrong chromosome's sequence (shifted by the number of earlier missing chromosomes), producing invalid models and corrupted sampled genomes without any error. Calls that passed intervals = gintervals.all() or left intervals at its default (which is gintervals.all()) were not affected. Users who ran these functions on custom interval subsets should re-run them with this version.
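The failure mode can be reproduced in miniature. The sketch below is not the misha C++ code; it assumes, for illustration, that the bug amounted to using a chromosome's position within the interval subset as its index into the full chromkey. The toy chromkey, sequences, and function names are all hypothetical.

```python
# Toy genome: chromkey order plus one short sequence per chromosome.
CHROMKEY = ["chr1", "chr2", "chr3", "chr4"]
SEQUENCES = {"chr1": "AAAA", "chr2": "CCCC", "chr3": "GGGG", "chr4": "TTTT"}


def read_by_position(interval_chroms):
    """Buggy lookup: the position in the subset is used as the chromkey index."""
    return {chrom: SEQUENCES[CHROMKEY[i]] for i, chrom in enumerate(interval_chroms)}


def read_by_name(interval_chroms):
    """Correct lookup: each chromosome is resolved to its own chromkey index."""
    index_of = {chrom: i for i, chrom in enumerate(CHROMKEY)}
    return {chrom: SEQUENCES[CHROMKEY[index_of[chrom]]] for chrom in interval_chroms}


subset = ["chr1", "chr3", "chr4"]  # chr2 is omitted
buggy = read_by_position(subset)
fixed = read_by_name(subset)
```

With chr2 missing, the positional lookup returns chr2's sequence for chr3 and chr3's sequence for chr4, each shifted by the one earlier missing chromosome, and raises no error. When the subset is the whole genome the two lookups agree, which is consistent with calls using gintervals.all() being unaffected.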

15 Apr 12:19

aviezerl

v5.6.11

e35b6da

5.6.11

What's New

Added ggenome.implant() for replacing intervals in a reference genome with donor sequences and writing a new FASTA. Supports literal donor sequences or extraction from a misha database, with optional trackdb creation.
Added ggenome.transplant() as a convenience wrapper for cross-genome sequence swaps: it extracts from a source genome and implants into a target genome in a single call.

25 Mar 11:42

aviezerl

v5.6.7

91436cb

5.6.7

Added PWM edit distance virtual track functions (pwm.edit_distance, pwm.edit_distance.pos, pwm.max.edit_distance, pwm.edit_distance.lse, pwm.edit_distance.lse.pos).
Added gseq.pwm_edits() for detailed per-edit information.
Added a pigeonhole pre-filter for PWM edit distance genome-wide scans.
Added sub-chromosome range splitting for gscreen, gextract, gsummary, gdist, and gcor (Pearson).
Fixed gscreen returning split intervals at sub-chromosome parallel boundaries

19 Mar 22:32

aviezerl

v5.6.6

08c4602

5.6.6

Replaced non-API C entry point Rf_findVar with R_getVar/R_getVarEx for R 4.6.0 compatibility.
Fixed CRAN check NOTE about non-standard top-level files.
Fixed CRAN check WARNING about pipe.Rd documentation mismatch.

13 Mar 10:33

aviezerl

v5.6.1

25c2567

5.6.1

Changes

gsynth.save() and gsynth.load() now use the cross-platform .gsm format (YAML metadata + binary arrays) instead of R-specific RDS. Models saved with pymisha can now be loaded in R and vice versa. Legacy RDS files are still supported for backward compatibility.
Added compress parameter to gsynth.save() to optionally save as a ZIP archive.
Added gsynth.convert() to convert legacy RDS model files to the new .gsm format.
Fixed gdb.create_genome() example to use \dontrun instead of \donttest to prevent R CMD check failures when S3 download times out.

09 Mar 15:27

aviezerl

v5.6.0

53fd5ca

5.6.0

Added gintervals.attr.get(), gintervals.attr.set(), gintervals.attr.export(), and gintervals.attr.import() for managing interval set attributes. Attributes are stored as .iattr binary files (null-separated key/value pairs) next to .interv files for small interval sets, or inside the directory for big interval sets.
gintervals.rm() now cleans up companion .iattr attribute files when deleting interval sets.
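To make "null-separated key/value pairs" concrete, here is one possible encoding. The exact .iattr byte layout is not given in the notes, so this sketch assumes a flat key, NUL, value, NUL sequence in UTF-8 with no header; the function names are hypothetical and the real format may differ.

```python
def encode_attrs(attrs):
    """Serialize a dict of string attributes as key NUL value NUL ... (assumed layout)."""
    out = bytearray()
    for key, value in attrs.items():
        if "\x00" in key or "\x00" in value:
            raise ValueError("keys and values must not contain NUL bytes")
        out += key.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    return bytes(out)


def decode_attrs(blob):
    """Parse the layout written by encode_attrs back into a dict."""
    if not blob:
        return {}
    if not blob.endswith(b"\x00"):
        raise ValueError("truncated attribute data: missing final NUL")
    fields = blob[:-1].split(b"\x00")  # drop the final terminator, then split
    if len(fields) % 2 != 0:
        raise ValueError("attribute data has a key without a value")
    return {
        fields[i].decode("utf-8"): fields[i + 1].decode("utf-8")
        for i in range(0, len(fields), 2)
    }


attrs = {"description": "ChIP peaks", "created.by": "example", "notes": ""}
blob = encode_attrs(attrs)
roundtrip = decode_attrs(blob)
```

Terminating every field, including the last one, is what lets an empty value survive the round trip in this layout.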
