Bidirectional Active Processing (BAP) Implementation

Reimplementation of the BAP algorithm described in Bidirectional Active Processing.md and specified in our Electronics paper (available both online and locally). Single-file implementations are provided in Python and Julia, configured via TOML.

Requirements

Python

Python 3.11
numpy, pandas, scikit-learn
tomli (only needed on Python < 3.11, which lacks the built-in tomllib module)

Use the project venv when present (dependencies are installed there):

source .venv311/bin/activate   # macOS / Linux
python bap.py -c examples/configs/config_iris.toml

Or run without activating:

.venv311/bin/python bap.py -c examples/configs/config_iris.toml

To create or refresh the venv:

python3.11 -m venv .venv311
.venv311/bin/python -m pip install -r requirements.txt

The example configs point at CSVs in the shared computing/machine learning datasets directory, not at copies inside other app repos.

Julia

Julia 1.8+
Packages: CSV, DataFrames, ScikitLearn, TOML
(ScikitLearn requires Python with scikit-learn)

julia -e 'using Pkg; Pkg.activate("./julia"); Pkg.instantiate()'

(Run from the repository root.)

Quick Start

Single CSV (train/test split)

Example: Fisher Iris dataset

# Python (after: source .venv311/bin/activate)
python bap.py -c examples/configs/config_iris.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/fisher_iris.csv" --testing split --split 0.8,0.2 --classifier dt -t 0.95 -n 10 -m 5

Separate train and test CSVs

Example: MNIST

# Python
python bap.py -c examples/configs/config_mnist.toml

# Or with flags
python bap.py --train "../../machine learning datasets/default/mnist_train.csv" --test "../../machine learning datasets/default/mnist_test.csv" --testing fixed --classifier knn --k 5 -n 5 -m 100

Julia

julia julia/bap.jl -c examples/configs/config_iris.toml
julia julia/bap.jl -c examples/configs/config_mnist.toml
julia julia/bap.jl --train "../../machine learning datasets/default/fisher_iris.csv" -c examples/configs/config_iris.toml  # override train path

Configuration

All options can be set via TOML config (preferred) or CLI flags. TOML overrides defaults; flags override TOML.
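The precedence rule (defaults, then TOML, then flags) can be sketched as a three-layer merge. This is a minimal illustration, not code from bap.py: it assumes options have already been flattened to single keys, that an unset CLI flag arrives as None, and `resolve_settings` is a hypothetical name. Only the `k = 3` default is taken from the CLI flags table.

```python
from typing import Any, Mapping

# Hypothetical built-in defaults; only k = 3 is documented (CLI flags table).
DEFAULTS: dict[str, Any] = {"k": 3}


def resolve_settings(
    defaults: Mapping[str, Any],
    toml_cfg: Mapping[str, Any],
    cli_flags: Mapping[str, Any],
) -> dict[str, Any]:
    """Layer settings so that TOML overrides defaults and flags override TOML."""
    merged = dict(defaults)
    merged.update(toml_cfg)
    # A flag the user did not pass arrives as None and must not mask a TOML value.
    merged.update({key: value for key, value in cli_flags.items() if value is not None})
    return merged


settings = resolve_settings(DEFAULTS, {"k": 5, "seed": 42}, {"k": None, "seed": 7})
print(settings)  # {'k': 5, 'seed': 7}
```

Filtering out None rather than giving flags argparse defaults is what lets an omitted flag fall through to the TOML value.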

TOML structure

Parameter	TOML	Definition
`train`	`train = "path.csv"`	Training CSV (or single dataset to split)
`test`	`test = "path.csv"`	Test CSV (for testing.fixed)
`testing`	`[testing.fixed]` or `[testing.split]`	How to obtain test set
`testing.fixed`	`[testing.fixed]` + `test = "..."`	Use separate test file
`testing.split`	`[testing.split]` + `split = [0.8, 0.2]`	Train ratio : test ratio used to split `train`
`classifier`	`classifier = "dt"`	`dt`, `knn`, `svm`, `hb_vis`, or `hb_dv`
`parameters`	`[parameters]` + `k = 5`	Classifier hyperparameters (e.g. `k` for KNN)
`distance`	`distance = "euclidean"`	Distance metric for KNN
`goal.t`	`[goal]` + `t = 0.95`	Accuracy threshold (0–1)
`direction`	`[direction.forward]` or `[direction.backward]`	Forward (additive) or backward (subtractive)
`splits`	`splits = 1`	Number of train/test splits
`n`	`n = 10`	Iterations per split
`m`	`m = 5`	Cases added/removed per iteration
`sampling`	`[sampling.stratified]` or `[sampling.random]`	Sampling method
`seed`	`seed = 42`	PRNG seed
`output_dir`	`output_dir = "results"`	Output directory
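The following assembles the table rows into one config and parses it with the standard-library TOML reader, which shows how the variant tables (`[testing.split]`, `[direction.forward]`, `[sampling.stratified]`) nest. This sketch is built from the table and is not a copy of examples/configs/config_iris.toml. It assumes that top-level keys precede the first table header and that `split` lives inside `[testing.split]`, as the table implies. The `selected_variant` helper is hypothetical.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: pip install tomli
    import tomli as tomllib

CONFIG = """
train = "../../machine learning datasets/default/fisher_iris.csv"
classifier = "knn"
distance = "euclidean"
splits = 1
n = 10
m = 5
seed = 42
output_dir = "results"

[testing.split]
split = [0.8, 0.2]

[parameters]
k = 5

[goal]
t = 0.95

[direction.forward]

[sampling.stratified]
"""


def selected_variant(cfg: dict, key: str) -> str:
    """Name of the single sub-table under `key`, e.g. 'split' for [testing.split]."""
    (name,) = cfg[key].keys()  # ValueError unless exactly one variant is present
    return name


cfg = tomllib.loads(CONFIG)
print(selected_variant(cfg, "testing"), cfg["testing"]["split"]["split"])  # split [0.8, 0.2]
```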

CLI flags (Python)

Flag	Description
`-c`, `--config`	TOML config file
`--train`	Training CSV
`--test`	Test CSV (for fixed)
`--testing`	`fixed` | `split` | `cv`
`--split`	Train,test ratio, e.g. `0.8,0.2`
`--classifier`	`dt` | `knn` | `svm` | `hb_vis` | `hb_dv`
`--k`	K for KNN (default 3)
`--distance`	Metric for KNN
`-t`, `--threshold`	Accuracy threshold
`--direction`	`forward` | `backward`
`--splits`	Number of splits
`-n`, `--iterations`	Iterations per split
`-m`	Cases per iteration
`--sampling`	`random` | `stratified`
`--seed`	Random seed
`-o`, `--output-dir`	Output directory
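Below is an argparse sketch that mirrors the flag table. It is an illustration rather than bap.py's actual parser. Every default is left as None so unset flags cannot mask TOML values; the documented `k` default of 3 would then be applied when settings are merged. The `parse_ratio` helper and its requirement that the two ratios sum to 1 are assumptions.

```python
import argparse


def parse_ratio(text: str) -> tuple[float, float]:
    """Parse a 'train,test' ratio such as '0.8,0.2'."""
    try:
        train, test = (float(part) for part in text.split(","))
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"expected TRAIN,TEST ratios, got {text!r}") from exc
    if abs(train + test - 1.0) > 1e-9:  # assumption: ratios must sum to 1
        raise argparse.ArgumentTypeError(f"ratios must sum to 1, got {text!r}")
    return train, test


def build_parser() -> argparse.ArgumentParser:
    # No defaults here: None means "not given on the command line".
    p = argparse.ArgumentParser(prog="bap.py")
    p.add_argument("-c", "--config", help="TOML config file")
    p.add_argument("--train", help="Training CSV")
    p.add_argument("--test", help="Test CSV (for fixed)")
    p.add_argument("--testing", choices=["fixed", "split", "cv"])
    p.add_argument("--split", type=parse_ratio, help="Train,test ratio, e.g. 0.8,0.2")
    p.add_argument("--classifier", choices=["dt", "knn", "svm", "hb_vis", "hb_dv"])
    p.add_argument("--k", type=int, help="K for KNN (documented default 3)")
    p.add_argument("--distance", help="Metric for KNN")
    p.add_argument("-t", "--threshold", type=float, help="Accuracy threshold")
    p.add_argument("--direction", choices=["forward", "backward"])
    p.add_argument("--splits", type=int, help="Number of splits")
    p.add_argument("-n", "--iterations", type=int, help="Iterations per split")
    p.add_argument("-m", type=int, help="Cases per iteration")
    p.add_argument("--sampling", choices=["random", "stratified"])
    p.add_argument("--seed", type=int, help="Random seed")
    p.add_argument("-o", "--output-dir", help="Output directory")
    return p


args = build_parser().parse_args(
    ["--testing", "split", "--split", "0.8,0.2", "--classifier", "dt",
     "-t", "0.95", "-n", "10", "-m", "5"]
)
print(args.split, args.threshold, args.iterations, args.m, args.k)  # (0.8, 0.2) 0.95 10 5 None
```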

Data format

CSVs must have a class column whose header is class, label, or target (case-insensitive).
All other columns are treated as features, in the order they appear in the file (e.g. fisher_iris.csv).

fisher_iris.csv: class column
mnist_train.csv, mnist_test.csv: label column
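The class-column rule can be sketched with the standard csv module. This is an assumption-laden illustration: the helper names are hypothetical, it requires exactly one matching column (the README does not say what happens with several), it strips surrounding whitespace from headers, and the sample CSV is made up.

```python
import csv
import io

CLASS_HEADERS = {"class", "label", "target"}


def find_class_column(header: list[str]) -> int:
    """Index of the class column, matched case-insensitively."""
    hits = [i for i, name in enumerate(header) if name.strip().lower() in CLASS_HEADERS]
    if len(hits) != 1:  # assumption: exactly one class column is required
        raise ValueError(f"expected one class/label/target column, found {len(hits)}")
    return hits[0]


def load_cases(text: str) -> tuple[list[str], list[list[float]], list[str]]:
    """Split CSV text into feature names, numeric feature rows, and class labels."""
    header, *body = list(csv.reader(io.StringIO(text)))
    ci = find_class_column(header)
    features = [name for i, name in enumerate(header) if i != ci]
    X = [[float(v) for i, v in enumerate(row) if i != ci] for row in body]
    y = [row[ci] for row in body]
    return features, X, y


features, X, y = load_cases("x1,x2,Class\n5.1,3.5,Setosa\n6.3,3.3,Virginica\n")
print(features, y)  # ['x1', 'x2'] ['Setosa', 'Virginica']
```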

Classifiers

Code	Classifier
`dt`	Decision Tree
`knn`	K-Nearest Neighbors
`svm`	Support Vector Machine (RBF)
`hb_vis`	Hyperblock (VisCanvas-style)
`hb_dv`	Hyperblock (DV-style, interval-based)

Output

Results are written to {output_dir}/bap_{timestamp}/ (default results/bap_YYYYMMDD_HHMMSS/).
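The directory and file names described in this section can be built as follows. This is a sketch under assumptions: the helper names are hypothetical, `split` is used exactly as given (the README does not say whether split_N starts at 0 or 1), and PurePosixPath is used only so the printed paths are predictable.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_dir(output_dir: str, now: datetime) -> PurePosixPath:
    """{output_dir}/bap_YYYYMMDD_HHMMSS"""
    return PurePosixPath(output_dir) / f"bap_{now:%Y%m%d_%H%M%S}"


def converged_csv(run: PurePosixPath, split: int, exp_id: int, seed: int,
                  hyperblocks: bool = False) -> PurePosixPath:
    """split_N/converged_exp_{id}_seed{seed}[_hyperblocks].csv inside the run directory."""
    suffix = "_hyperblocks" if hyperblocks else ""
    return run / f"split_{split}" / f"converged_exp_{exp_id}_seed{seed}{suffix}.csv"


run = run_dir("results", datetime(2024, 1, 2, 3, 4, 5))
print(converged_csv(run, 1, 3, 42))  # results/bap_20240102_030405/split_1/converged_exp_3_seed42.csv
```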

Exported case and hyperblock CSVs use the same tabular shape as the shared datasets (e.g. computing/machine learning datasets/default/fisher_iris.csv):

A single class column (header class, lowercase) holds each case's label in the data rows.
One column per attribute (same names as the training CSV). No separate *_min / *_max columns and no extra ID column.
split_N/converged_exp_{id}_seed{seed}.csv – converged training cases: class holds the dataset label (e.g. Setosa).
split_N/converged_exp_{id}_seed{seed}_hyperblocks.csv – when using hb_vis or hb_dv: two rows per hyperblock. Each row has the same attribute columns; values are the box minimum (…__bottom) and maximum (…__top) corners. The class cell encodes label, HB id, and edge, e.g. Setosa__HB0__bottom / Setosa__HB0__top. Any __ in the dataset label is replaced by _ so the suffix pattern stays parseable.
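The hyperblock class-cell convention can be sketched as an encode/decode pair plus a writer that emits two rows per hyperblock. This is illustrative, not the repository's exporter. The helper names are hypothetical, and placing class as the last column is an assumption. The label is collapsed repeatedly, because a single `replace("__", "_")` would turn `a___b` into `a__b`.

```python
import csv
import io

EDGES = ("bottom", "top")


def safe_label(label: str) -> str:
    # Collapse every "__" so the "__HB{id}__{edge}" suffix stays parseable.
    while "__" in label:
        label = label.replace("__", "_")
    return label


def encode_class(label: str, hb_id: int, edge: str) -> str:
    if edge not in EDGES:
        raise ValueError(f"edge must be one of {EDGES}, got {edge!r}")
    return f"{safe_label(label)}__HB{hb_id}__{edge}"


def decode_class(cell: str) -> tuple[str, int, str]:
    label, hb, edge = cell.rsplit("__", 2)  # split from the right: label may contain "_"
    if not hb.startswith("HB") or edge not in EDGES:
        raise ValueError(f"not a hyperblock class cell: {cell!r}")
    return label, int(hb[2:]), edge


def hyperblock_rows(label, hb_id, attrs, mins, maxs) -> list[dict]:
    """Two rows per hyperblock: the minimum (bottom) corner, then the maximum (top) corner."""
    return [
        {**dict(zip(attrs, mins)), "class": encode_class(label, hb_id, "bottom")},
        {**dict(zip(attrs, maxs)), "class": encode_class(label, hb_id, "top")},
    ]


def to_csv(rows: list[dict], attrs: list[str]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[*attrs, "class"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(hyperblock_rows("Setosa", 0, ["a", "b"], [1.0, 2.0], [3.0, 4.0]), ["a", "b"]))
```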

Other output:

config.txt – Settings used
statistics.csv – Aggregate statistics (mean/min/max cases, convergence rate, etc.)
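Below is a minimal sketch of the case-count and convergence-rate part of statistics.csv. The function name, key names, and the use of None to mark a failed iteration are assumptions, and the sureness-related measures are not modeled.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize(case_counts: Sequence[Optional[int]]) -> dict:
    """Aggregate per-iteration results; None marks an iteration that did not converge."""
    converged = [c for c in case_counts if c is not None]
    stats = {
        "iterations": len(case_counts),
        "converged": len(converged),
        "convergence_rate": len(converged) / len(case_counts) if case_counts else 0.0,
    }
    if converged:
        stats.update(mean_cases=mean(converged), min_cases=min(converged),
                     max_cases=max(converged))
    return stats


print(summarize([10, None, 20, 30]))
```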

When a hyperblock classifier (hb_vis or hb_dv) is used, every converged result CSV has a matching _hyperblocks.csv in the same directory.

Algorithm (summary)

Set PRNG seed
For each split: load data (fixed test or split)
For each iteration: start with empty set (forward) or full set (backward)
While accuracy < threshold and cases remain: add/remove m cases via sampling
Record converged subsets and compute statistics
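Below is a minimal, classifier-agnostic sketch of the summary loop for a single split. It is not the code in bap.py, and several assumptions apply. `fit_predict`, `run_iteration`, `run_bap`, and the 1-nearest-neighbour stand-in are hypothetical names. Sampling is plain random, via a shuffled order. The PRNG is seeded once, as in the summary; the expanded procedure's per-success seed increment is not modeled. Both directions follow the summary literally: keep adding (forward) or removing (backward) m cases while accuracy is below the threshold and cases remain.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical classifier interface: train on (X, y), return predictions for X_test.
FitPredict = Callable[[list, list, list], list]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def one_nn(train_X: list, train_y: list, test_X: list) -> list:
    """Stand-in classifier: 1-nearest neighbour with squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return [predict(x) for x in test_X]


def run_iteration(X, y, X_test, y_test, fit_predict: FitPredict, t: float, m: int,
                  direction: str, rng: random.Random) -> Optional[list[int]]:
    """One iteration; returns the converged training indices, or None on failure."""
    order = list(range(len(X)))
    rng.shuffle(order)  # random sampling order; stratified sampling is not shown

    def meets_goal(idx: list[int]) -> bool:
        preds = fit_predict([X[i] for i in idx], [y[i] for i in idx], X_test)
        return accuracy(y_test, preds) >= t

    if direction == "forward":  # additive: start empty, add m cases per step
        for end in range(m, len(order) + m, m):
            if meets_goal(order[:end]):
                return order[:end]
        return None  # all cases added, threshold never met
    subset = order  # backward (subtractive): start full, remove m cases per step
    while subset:
        if meets_goal(subset):
            return subset
        subset = subset[m:]
    return None  # no cases remain


def run_bap(X, y, X_test, y_test, fit_predict: FitPredict, *, t: float, n: int,
            m: int, direction: str, seed: int) -> list[Optional[list[int]]]:
    rng = random.Random(seed)  # set the PRNG seed once
    return [run_iteration(X, y, X_test, y_test, fit_predict, t, m, direction, rng)
            for _ in range(n)]


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = ["a", "a", "b", "b"]
X_test, y_test = [[0, 0.5], [10, 10.5]], ["a", "b"]
results = run_bap(X, y, X_test, y_test, one_nn, t=1.0, n=3, m=1, direction="forward", seed=42)
print([len(r) for r in results if r is not None])
```

Passing the classifier in as a callable keeps the loop independent of dt/knn/svm/hyperblock choices; any scikit-learn estimator can be wrapped as `fit_predict`.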

Bidirectional Processing Definition Notes

This repository originally exposed BAP behavior through rebuild.py; that interface has now been folded into the TOML/CLI model used by bap.py and julia/bap.jl.

Parameter mapping

Legacy --data maps to train
Legacy --test-data maps to test with testing.fixed
Legacy train/test split flags map to testing.split with split = [train_ratio, test_ratio]
Legacy --action {additive,subtractive} maps to direction {forward,backward}
Legacy --iterations maps to n
Legacy --threshold maps to goal.t
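The mapping above can be sketched as a translation from parsed legacy arguments to the new parameter names (flattened, as in the Parameter column of the TOML table). This is illustrative only. `translate_legacy` is hypothetical. The keys `data`, `test_data`, `action`, `iterations`, and `threshold` assume argparse-style names for the legacy flags. The legacy split-flag names are not given in this README, so `train_ratio` and `test_ratio` are placeholders.

```python
def translate_legacy(legacy: dict) -> dict:
    """Map parsed rebuild.py-style arguments to bap.py parameter names."""
    cfg = {}
    if "data" in legacy:
        cfg["train"] = legacy["data"]
    if "test_data" in legacy:
        cfg["test"] = legacy["test_data"]
        cfg["testing"] = "fixed"
    elif "train_ratio" in legacy and "test_ratio" in legacy:  # placeholder key names
        cfg["testing"] = "split"
        cfg["split"] = [legacy["train_ratio"], legacy["test_ratio"]]
    if "action" in legacy:
        cfg["direction"] = {"additive": "forward", "subtractive": "backward"}[legacy["action"]]
    if "iterations" in legacy:
        cfg["n"] = legacy["iterations"]
    if "threshold" in legacy:
        cfg["goal.t"] = legacy["threshold"]
    return cfg


print(translate_legacy({"data": "iris.csv", "action": "subtractive", "iterations": 10}))
```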

Formal algorithm input

Core BAP input fields are:

train, test/testing, classifier, parameters, distance
goal.t, direction, splits, n, m
sampling, seed

Formal algorithm output

Run configuration export (config.txt)
Converged case-set CSV artifacts per successful iteration
Aggregate statistics including convergence rate and sureness-related measures

Procedure (expanded)

Initialize the PRNG with seed.
Repeat for each split:
- Build train/test partitions (testing.fixed or testing.split).
- Repeat n times:
  - Start with an empty set (direction.forward) or the full train set (direction.backward).
  - Train and test, adding/removing m cases per step, until threshold goal.t is met.
  - Mark the iteration as failed if no cases remain before the threshold is met.
  - On success, persist the converged set and increment the seed.
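For testing.split, a per-split partition could look like the stratified sketch below. It is an assumption, not the repository's splitter. It takes the train ratio from `split = [train_ratio, test_ratio]`, rounds each class's train share, and reuses one PRNG across splits so that each split gets a fresh partition. `stratified_split` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split(labels: list, train_ratio: float,
                     rng: random.Random) -> tuple[list[int], list[int]]:
    """Return (train_idx, test_idx), keeping each class near train_ratio."""
    by_class: dict = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)  # per-class train count
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 5
rng = random.Random(42)
for split_no in range(2):  # e.g. splits = 2
    train_idx, test_idx = stratified_split(labels, 0.8, rng)
    print(split_no, len(train_idx), len(test_idx))
```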
