Merged
25 commits
fce1271
Fix metadata path
imallona Jan 14, 2026
cfcb6a5
Update CLI documentation and default names; shorthands are now a bit …
imallona Jan 15, 2026
bf84782
Rollback default for skip-header (separate and joint)
imallona Jan 15, 2026
fe799e0
Update getter
imallona Jan 15, 2026
7fed06a
Download sines/lines and add them to the crc featureset
imallona Jan 30, 2026
cfec1e7
Restore old CLI as in master
imallona Feb 2, 2026
007bf10
Switch to normalized (sample?) entropies and hopefully also Shannons,…
imallona Feb 2, 2026
38914cf
Fix output
imallona Feb 2, 2026
80275c4
Add normalized data, and Shannon entropies, to the per-feature plots
imallona Feb 2, 2026
83e1c24
Switch to CLI11, add a cell metadata runmode, remove R package (#68)
atchox Feb 18, 2026
f2d31b0
Harmonize levels, assemble integrated H vs S plots (ongoing)
imallona Feb 18, 2026
803f842
Prettify reports
imallona Feb 19, 2026
dd54842
Plots for adj vs raw S and H, using Emanuel's simuls
imallona Feb 23, 2026
2f6f0fa
Alternative adjS normalization
imallona Feb 24, 2026
29c3422
Attempt adjS update
imallona Mar 12, 2026
e8a7574
Draft adjS normalization testing
imallona Mar 12, 2026
92c3268
Add sparsity to test data
imallona Mar 12, 2026
6ae5f51
Fix sign in adjS / S calculation
imallona Mar 12, 2026
6d354ad
On the flipped sign again
imallona Mar 12, 2026
b684d91
Skip -1 in flat tests, update docs and test results for new adjS / adjH
imallona Mar 12, 2026
1d41859
Fix lines/sines retrievals
imallona Mar 16, 2026
dae89fa
big labels
imallona Mar 16, 2026
522488f
On atchox' simulations coloring, and sizes
imallona Mar 16, 2026
2c4c7ca
Add adjS / adjH plots, combined
imallona Mar 19, 2026
6fba258
Draft Ecker, again, on CpG only (#69)
imallona Mar 26, 2026
16 changes: 10 additions & 6 deletions .github/workflows/debug.yml
@@ -21,7 +21,9 @@ jobs:

- name: Installs
run: |
sudo apt update && sudo apt install -y libboost-program-options-dev libboost-iostreams-dev
sudo apt update && sudo apt install -y libboost-iostreams-dev
wget https://launchpad.net/ubuntu/+archive/primary/+files/libcli11-dev_2.6.1+ds-1_all.deb
sudo dpkg -i libcli11-dev_2.6.1+ds-1_all.deb
Comment on lines +24 to +26
Copilot AI Mar 12, 2026

This workflow installs CLI11 via a hard-coded Launchpad .deb + dpkg -i, which is brittle and can break when the URL/version changes or dependencies are missing. Prefer sudo apt install -y libcli11-dev (as in tar.yml) or add the appropriate repository instead of downloading a .deb directly.


- name: Build and test_0
env:
@@ -33,10 +35,10 @@ jobs:

cd ../test
../method/build/yamet \
test_0/in/cell1.bed.gz \
-c test_0/in/cell1.bed.gz \
-r test_0/in/ref.bed.gz \
-i test_0/in/regions.bed \
--det-out current_test_0.out
--det-out current_test_0.out

echo 'Diff expected vs produced'

@@ -56,7 +58,9 @@ jobs:

- name: Installs
run: |
sudo apt update && sudo apt install -y libboost-program-options-dev libboost-iostreams-dev
sudo apt update && sudo apt install -y libboost-iostreams-dev
wget https://launchpad.net/ubuntu/+archive/primary/+files/libcli11-dev_2.6.1+ds-1_all.deb
sudo dpkg -i libcli11-dev_2.6.1+ds-1_all.deb

- name: Build and test_0
env:
@@ -68,10 +72,10 @@ jobs:

cd ../test
../method/build/yamet \
test_0/in/cell1.bed.gz \
-c test_0/in/cell1.bed.gz \
-r test_0/in/ref.bed.gz \
-i test_0/in/regions.bed \
--det-out current_test_0.out
--det-out current_test_0.out

echo 'Diff expected vs produced'

8 changes: 5 additions & 3 deletions .github/workflows/memleaks.yml
@@ -20,8 +20,9 @@ jobs:

- name: Installs
run: |
sudo apt update && sudo apt install -y \
libboost-program-options-dev libboost-iostreams-dev valgrind
sudo apt update && sudo apt install -y libboost-iostreams-dev valgrind
wget https://launchpad.net/ubuntu/+archive/primary/+files/libcli11-dev_2.6.1+ds-1_all.deb
sudo dpkg -i libcli11-dev_2.6.1+ds-1_all.deb
Comment thread
imallona marked this conversation as resolved.

- name: Build and valgrind
run: |
@@ -45,7 +46,8 @@ jobs:
--show-leak-kinds=all \
--track-origins=yes \
--verbose \
./build/yamet $GITHUB_WORKSPACE/test/test_0/in/cell1.bed.gz \
./build/yamet \
-c $GITHUB_WORKSPACE/test/test_0/in/cell1.bed.gz \
-r $GITHUB_WORKSPACE/test/test_0/in/ref.bed.gz \
-i $GITHUB_WORKSPACE/test/test_0/in/regions.bed \
--det-out valgrind_test_0.out
4 changes: 2 additions & 2 deletions .github/workflows/tar.yml
@@ -13,7 +13,7 @@ jobs:
uses: actions/checkout@v4

- name: Install dependencies
run: brew update && brew install boost
run: brew update && brew install boost && brew install cli11

- name: Build
run: |
@@ -43,7 +43,7 @@ jobs:
uses: actions/checkout@v4

- name: Install dependencies
run: sudo apt update && sudo apt install -y libboost-program-options-dev libboost-iostreams-dev
run: sudo apt update && sudo apt install -y libcli11-dev libboost-iostreams-dev

- name: Build
run: |
105 changes: 69 additions & 36 deletions .github/workflows/test.yml
@@ -14,7 +14,9 @@ jobs:
uses: actions/checkout@v4
- name: Installs
run: |
sudo apt update && sudo apt install -y libboost-program-options-dev libboost-iostreams-dev
sudo apt update && sudo apt install -y libboost-iostreams-dev
wget https://launchpad.net/ubuntu/+archive/primary/+files/libcli11-dev_2.6.1+ds-1_all.deb
sudo dpkg -i libcli11-dev_2.6.1+ds-1_all.deb
Comment on lines +17 to +19
Copilot AI Mar 12, 2026

CI installs CLI11 by downloading a specific .deb from Launchpad and running dpkg -i. This is brittle (version/URL changes) and can fail due to missing dependencies. Prefer installing libcli11-dev via apt (as already done in tar.yml) or vendoring/pinning via a proper apt repository step.

Suggested change
sudo apt update && sudo apt install -y libboost-iostreams-dev
wget https://launchpad.net/ubuntu/+archive/primary/+files/libcli11-dev_2.6.1+ds-1_all.deb
sudo dpkg -i libcli11-dev_2.6.1+ds-1_all.deb
sudo apt update && sudo apt install -y libboost-iostreams-dev libcli11-dev

- name: Build
run: |
cd method
@@ -23,11 +25,11 @@ jobs:
run: |
cd test
../method/build/yamet \
test_parse/in/simulations.tsv \
-c test_parse/in/simulations.tsv \
-r test_parse/in/reference.tsv \
-i test_parse/in/regions.bed \
--print-reference \
--print-sampens F &> current_test_parse.stdout
--no-print-sampens &> current_test_parse.stdout

echo 'Diff expected vs produced stdout'

@@ -38,14 +40,14 @@ jobs:
run: |
cd test
../method/build/yamet \
test_avg/in/simulations.*.tsv \
-c test_avg/in/simulations.*.tsv \
-r test_avg/in/reference.tsv \
-i test_avg/in/regions.bed \
--meth-out current_test_avg.out \
--cores 1 &> current_test_avg.stdout

../method/build/yamet \
test_avg/in/simulations.*.tsv \
-c test_avg/in/simulations.*.tsv \
-r test_avg/in/reference.tsv \
-i test_avg/in/regions.bed \
--meth-out current_test_avg_all_meth.out \
@@ -72,7 +74,7 @@ jobs:
run: |
cd test
../method/build/yamet \
test_avg/in/simulations.*.tsv \
-c test_avg/in/simulations.*.tsv \
-r test_avg/in/reference.tsv \
-i test_avg/in/regions.bed \
--meth-out current_test_avg.out &> current_test_avg.stdout
@@ -91,17 +93,17 @@ jobs:
run: |
cd test
../method/build/yamet \
test_0/in/cell1.bed.gz \
-c test_0/in/cell1.bed.gz \
-r test_0/in/ref.bed.gz \
-i test_0/in/regions.bed \
--print-sampens F \
--no-print-sampens \
--det-out current_test_0.out

../method/build/yamet \
test_0/in/cell1.bed.gz \
-c test_0/in/cell1.bed.gz \
-r test_0/in/ref.bed.gz \
-i test_0/in/regions.bed \
--print-sampens F \
--no-print-sampens \
--det-out current_test_0_all_meth.out \
--all-meth

@@ -117,13 +119,13 @@ jobs:
run: |
cd test
../method/build/yamet \
test_1/in/simulations.tsv \
-c test_1/in/simulations.tsv \
-r test_1/in/reference.tsv \
-i test_1/in/regions.bed \
--det-out current_test_1.out &> current_test_1.stdout

../method/build/yamet \
test_1/in/simulations.tsv \
-c test_1/in/simulations.tsv \
-r test_1/in/reference.tsv \
-i test_1/in/regions.bed \
--det-out current_test_1_all_meth.out \
@@ -144,37 +146,68 @@ jobs:

diff test_1/out/stdout \
current_test_1_all_meth.stdout
- name: Test normalization
# - name: Test normalization
# if: always()
# run: |
# cd test
# ../method/build/yamet \
# -c test_normalization/in/simulations.tsv \
# -r test_normalization/in/reference.tsv \
# -i test_normalization/in/regions.bed \
# --out current_test_normalization.out \
# --norm-det-out current_test_normalization.norm.out

# ../method/build/yamet \
# -c test_normalization/in/simulations.tsv \
# -r test_normalization/in/reference.tsv \
# -i test_normalization/in/regions.bed \
# --out current_test_normalization_all_meth.out \
# --norm-det-out current_test_normalization_all_meth.norm.out \
# --all-meth

# echo 'Diff expected vs produced normalized simple output'

# diff test_normalization/out/simple.out \
# current_test_normalization.out

# diff test_normalization/out/simple_all_meth.out \
# current_test_normalization_all_meth.out

# echo 'Diff expected vs produced normalized detailed output'

# diff test_normalization/out/detailed.out \
# current_test_normalization.norm.out

# diff test_normalization/out/detailed_all_meth.out \
# current_test_normalization_all_meth.norm.out
Comment on lines +161 to +182
Copilot AI Mar 12, 2026

The normalization test block is commented out, but the repository still contains expected normalization outputs under test/test_normalization/out/. If normalization behavior changed intentionally, it’s better to update the expected outputs and keep the test enabled so regressions are caught, rather than disabling the check entirely.

Suggested change
# -c test_normalization/in/simulations.tsv \
# -r test_normalization/in/reference.tsv \
# -i test_normalization/in/regions.bed \
# --out current_test_normalization_all_meth.out \
# --norm-det-out current_test_normalization_all_meth.norm.out \
# --all-meth
# echo 'Diff expected vs produced normalized simple output'
# diff test_normalization/out/simple.out \
# current_test_normalization.out
# diff test_normalization/out/simple_all_meth.out \
# current_test_normalization_all_meth.out
# echo 'Diff expected vs produced normalized detailed output'
# diff test_normalization/out/detailed.out \
# current_test_normalization.norm.out
# diff test_normalization/out/detailed_all_meth.out \
# current_test_normalization_all_meth.norm.out
- name: Test normalization
if: always()
run: |
cd test
../method/build/yamet \
-c test_normalization/in/simulations.tsv \
-r test_normalization/in/reference.tsv \
-i test_normalization/in/regions.bed \
--out current_test_normalization.out \
--norm-det-out current_test_normalization.norm.out
../method/build/yamet \
-c test_normalization/in/simulations.tsv \
-r test_normalization/in/reference.tsv \
-i test_normalization/in/regions.bed \
--out current_test_normalization_all_meth.out \
--norm-det-out current_test_normalization_all_meth.norm.out \
--all-meth
echo 'Diff expected vs produced normalized simple output'
diff test_normalization/out/simple.out \
current_test_normalization.out
diff test_normalization/out/simple_all_meth.out \
current_test_normalization_all_meth.out
echo 'Diff expected vs produced normalized detailed output'
diff test_normalization/out/detailed.out \
current_test_normalization.norm.out
diff test_normalization/out/detailed_all_meth.out \
current_test_normalization_all_meth.norm.out

- name: Test AdjS association to marginal entropy / flatness
if: always()
run: |
cd test
sudo apt-get install -y python3-scipy python3-numpy || echo "Skipped apt but trying pip if available"
python3 test_flatness.py
- name: Test metadata format
if: always()
run: |
cd test
../method/build/yamet \
test_normalization/in/simulations.tsv \
-r test_normalization/in/reference.tsv \
-i test_normalization/in/regions.bed \
--out current_test_normalization.out \
--norm-det-out current_test_normalization.norm.out

../method/build/yamet \
test_normalization/in/simulations.tsv \
-r test_normalization/in/reference.tsv \
-i test_normalization/in/regions.bed \
--out current_test_normalization_all_meth.out \
--norm-det-out current_test_normalization_all_meth.norm.out \
--all-meth
--metadata test_cluster/in/metadata.tsv \
-r test_cluster/in/reference.tsv \
-i test_cluster/in/annotation.tsv \
--det-out current_test_metadata.det.out \
--out current_test_metadata.out &> current_test_metadata.stdout

echo 'Diff expected vs produced normalized simple output'
echo 'Diff expected vs produced detailed output'

diff test_normalization/out/simple.out \
current_test_normalization.out
diff test_cluster/out/det.out \
current_test_metadata.det.out

diff test_normalization/out/simple_all_meth.out \
current_test_normalization_all_meth.out
echo 'Diff expected vs produced output'

echo 'Diff expected vs produced normalized detailed output'
diff test_cluster/out/out \
current_test_metadata.out

diff test_normalization/out/detailed.out \
current_test_normalization.norm.out
echo 'Diff expected vs produced stdout'

diff test_normalization/out/detailed_all_meth.out \
current_test_normalization_all_meth.norm.out
diff test_cluster/out/stdout \
current_test_metadata.stdout
108 changes: 70 additions & 38 deletions README.md
@@ -44,46 +44,78 @@ bash build.sh

## Usage

`yamet` processes (covered CpG) DNA methylation report(s) (`-cell` argument), a reference file listing all CpG positions in a genome (`--reference`), and a bedfile specifying the genomic regions to calculate scores for (`--intervals`; e.g. promoters, genes, etc). Full CLI args:
`yamet` processes (covered CpG) DNA methylation report(s), a reference file listing all CpG positions in a genome, and a bedfile specifying the genomic regions to calculate scores for (e.g. promoters, genes, etc). Full CLI args:

```text
Usage:
yamet (-c <cell>... | <cell>...) \
-r <reference> \
-i <intervals> \
[OPTIONS]

Required inputs:
-c --cell <cell>... One or more tab-separated methylation files,
OR provide them directly as positional arguments.
<cell>... (Positional alternative) Cell files (same format as above).
-r --reference <reference> Reference file, tab-separated and sorted by chromosome/position.
-i --intervals <intervals> BED file of intervals of interest.

Optional output:
-d --det-out <file> Path to detailed output file.
-m --meth-out <file> Path to average methylation output file.
-o --out <file> Path to simple output file.

Resource options:
--cores <n> Number of cores for parallel parsing
[default: 0, implying program decides].
--chunk-size <size> Buffer size per file (e.g., 64K, 128M, 2G) [default: 64K].

Optional input control:
--skip-header[=<n>] Skip <n> lines in all input files [default: 1].
--skip-header-cell[=<n>] Skip <n> lines in cell files (overrides --skip-header).
--skip-header-reference[=<n>] Skip <n> lines in reference file (overrides --skip-header).
--skip-header-intervals[=<n>] Skip <n> lines in intervals file (overrides --skip-header).

Verbose/debugging:
--print-intervals Print parsed intervals file.
--print-reference Print parsed reference file.
--print-sampens[=<true|false>] Print computed sample entropies [default: true].

Miscellaneous:
-h --help Show help message.
--version Show version information.
yamet

input:
-c [ --cytosine_report ] arg Per-cell cytosine report file(s)
(Bismark-like for covered cytosines).
Synonyms: --cytosine_report, --cell,
-c.
Tab-separated, sorted by chromosome and
position:
chr pos meth_reads total_reads
rate
-r [ --cytosine_locations ] arg Genomic locations of all cytosines
(typically CpGs). Synonyms:
--cytosine_locations, --reference, -r.
Required to reconstruct contiguous CpG
sequences.
Columns:
chr pos
-i [ --regions ] arg BED file defining genomic regions where
entropies will be computed. Synonyms:
--regions, --features, --target,
--intervals, -i.
Columns:
chr start end
--skip-header-all [=arg(=1)] Header lines to skip in all input files
(default: 0). Synonyms:
--skip-header-all, --skip-header.
--skip-header-cytosine_report [=arg(=1)]
Header lines to skip in
cytosine_report/cell files (default:
0).
--skip-header-cytosine_locations [=arg(=1)]
Header lines to skip in
cytosine_locations/reference file
(default: 0).
--skip-header-regions [=arg(=1)] Header lines to skip in
regions/features/target/intervals file
(default: 0).

output:
-d [ --det-out ] arg (optional) path to detailed output file
-n [ --norm-det-out ] arg (optional) path to detailed normalized
output file
-m [ --meth-out ] arg (optional) path to average methylation
output file
--all-meth [=arg(=true)] (=false) If true, include all CpGs in
methylation summaries, including those
not used for template construction
(default: false).
-o [ --out ] arg (optional) path to simple output file

resource utilisation:
--cores arg (=0) number of cores used for simultaneously
parsing methylation files
--chunk-size arg (=64K) size of the buffer (per file) used for
reading data. Can be specified as a
positive integer (bytes) or with a
suffix: B, K, M, G.

verbose:
--print-intervals print parsed intervals file
--print-reference print parsed reference file
--print-sampens [=arg(=true)] (=true) print computed sample entropies

misc:
-h [ --help ] print help message
--version current version information


```
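
A minimal invocation sketch, assuming the explicit `-c`/`-r`/`-i` flags and the `test_0` inputs used by the updated workflows in this PR. The `YAMET` variable and the `args` array are illustrative names, not part of the repository, and the binary is only run if it has actually been built:

```shell
#!/usr/bin/env bash
# Sketch: assemble a yamet call with the flags the updated CI tests use.
# YAMET and args are hypothetical names chosen for this example.
set -euo pipefail

YAMET="${YAMET:-../method/build/yamet}"

args=(
  -c test_0/in/cell1.bed.gz      # per-cell cytosine report (now passed via -c)
  -r test_0/in/ref.bed.gz        # reference cytosine locations
  -i test_0/in/regions.bed       # regions to compute scores for
  --no-print-sampens             # used in the workflows instead of '--print-sampens F'
  --det-out current_test_0.out   # detailed output file
)

# Print the assembled command so it can be inspected or copied.
printf '%s' "$YAMET"
printf ' %s' "${args[@]}"
printf '\n'

# Run only when the binary exists and is executable.
if [ -x "$YAMET" ]; then
  "$YAMET" "${args[@]}"
fi
```

Keeping the arguments in an array avoids quoting problems if any of the paths later contain spaces or glob characters.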

## Repository