Baryon is a domain-specific language (DSL) and transpiler toolkit for bioinformatics workflow definition and execution. It allows users to write workflows in a unified syntax and transpile them into various target languages and workflow engines, such as Bash, Python, R, Galaxy, Nextflow, and Streamflow.
You can find the detailed language specification in the SPECIFICATION.md file. It describes the syntax, semantics, and structure of the Baryon language, including how to define workflows, metadata, implementation blocks, and parameters.
- Unified Workflow DSL: Write your workflow once, target multiple platforms.
- Multiple Transpilers: Generate code for Bash, Python, R, Galaxy, Nextflow, and Streamflow.
- Extensible Architecture: Easily add new target languages or workflow engines.
- Integrated Testing: Unit and integration tests for core components.
- Go 1.18+ installed on your system (https://golang.org/dl/)
Clone the repository:
git clone https://github.com/yourusername/baryon-lang.git
cd baryon-lang
go build -o baryon-lang main.go
Transpile a workflow file to a target language:
./baryon-lang -input examples/enrichment_analysis.bala -target python
Supported targets: bash, python, r, galaxy, nextflow, streamflow
- internal/ast/: Abstract syntax tree definitions
- internal/lexer/: Lexer for the Baryon DSL
- internal/parser/: Parser for the Baryon DSL
- internal/transpiler/: Transpilers for supported targets
- examples/: Example workflow files
- main.go: CLI entry point
Contributions are welcome! Please open issues or submit pull requests.
This documentation will guide you through the concepts, structure, and tooling of baryon-lang, a language for defining reproducible bioinformatics workflows. The tutorial is based on actual code and examples from the baryon-lang repository, with a focus on practical insights.
baryon-lang is a domain-specific language (DSL) designed to author, check, and transpile workflow definitions for bioinformatics, especially in contexts where reproducibility and platform-independence are critical. Its syntax is Lisp-inspired (S-expressions), and it is intended to be transpiled to languages such as R, Python, Nextflow, or bash, maximizing portability and collaboration.
You can now use the CLI to check or transpile baryon files:
./baryon-lang -input examples/enrichment_analysis.bala -lang r
./baryon-lang -input another_program.bala -lang galaxy
You can also get the latest version of baryon-lang from the GitHub releases section.
Supported values for -lang include: r, python, bash, nextflow, galaxy, and streamflow.
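To transpile one file to every supported target, the CLI calls can be generated from a small script. The sketch below only builds the argument lists, using the -input and -lang flags and the target names listed above. The helper name build_commands and the default binary path are illustrative assumptions, not part of baryon-lang.

```python
# Target names as listed in this README.
TARGETS = ["r", "python", "bash", "nextflow", "galaxy", "streamflow"]


def build_commands(input_file, binary="./baryon-lang", targets=TARGETS):
    """Return one argv list per target (hypothetical helper, not part of baryon-lang)."""
    return [[binary, "-input", input_file, "-lang", target] for target in targets]


commands = build_commands("examples/enrichment_analysis.bala")
for argv in commands:
    # Each list can be passed to subprocess.run(argv, check=True) once the binary is built.
    print(" ".join(argv))
```

Keeping each command as a list, rather than one shell string, avoids quoting problems when file names contain spaces.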
baryon-lang files are structured as S-expressions (like Lisp), where every construct is wrapped in parentheses. Here's the opening of a typical file:
(bala enrichment_analysis (
(matrix_file string (desc "Path to the CSV file."))
...
))
Key S-expression elements:
- Program node: (bala program_name (...)). The program name becomes the name of the transpiled program.
- Parameters: (param_name type (desc "...")). Each such form declares one parameter of your function.
- Implementation blocks, e.g. (run_docker ...). This block expresses the core of your function, which runs inside a Docker container, and specifies how the parameters are passed to Docker.
- Descriptions: (desc "...") provides human-readable documentation for the program or its parameters.
- Outputs: (outputs ...) specifies the expected outputs, their types, and locations.
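To make the nesting concrete, here is a minimal sketch of how an S-expression file of this shape can be read into nested lists. It is not the baryon-lang parser (which lives in internal/lexer and internal/parser and is written in Go); the function names are illustrative, and the comment handling assumes that a ; outside a string starts a comment running to the end of the line.

```python
import re

# One token is a quoted string, a comment, a parenthesis, or a bare atom.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|;[^\n]*|[()]|[^\s()";]+')


def tokenize(text):
    """Split source text into tokens, dropping comments."""
    return [tok for tok in TOKEN.findall(text) if not tok.startswith(";")]


def parse(tokens):
    """Turn a flat token list into nested lists, one list per parenthesised form."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing ')'")
    return stack[0]


source = '''
; toy program
(bala enrichment_analysis (
  (matrix_file string (desc "Path to the CSV file."))
))
'''
program = parse(tokenize(source))[0]
print(program[0], program[1])  # node keyword and program name
print(program[2][0])           # first parameter form as a nested list
```

Quoted strings keep their quotes in this sketch, which makes it easy to tell literals apart from bare names later on.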
Parameters are defined with a name, type, and optional metadata. Example:
(matrix_file string (desc "CSV file of differential expression results."))
(species (enum ("hsapiens" "mmusculus" "dmelanogaster"))
(desc "Species being analyzed."))
(separator character (desc "Separator character in the table."))
(max_terms number (desc "Max terms in the output."))
Non-obvious details:
- For enums, values are provided as a list of quoted strings.
- The type can be string, number, character, enum, etc.
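As an illustration of what these types could mean on the target side, the sketch below maps each type name to a Python check. The mapping is an assumption for illustration only: in particular, treating character as a one-character string and number as int or float is a guess, and check_value is a hypothetical helper rather than generated baryon-lang code.

```python
def check_value(type_spec, value):
    """Return True if value fits type_spec.

    type_spec is a type name, or ("enum", [allowed values]) for enums.
    """
    if isinstance(type_spec, tuple) and type_spec[0] == "enum":
        return value in type_spec[1]
    if type_spec == "string":
        return isinstance(value, str)
    if type_spec == "character":
        # Assumption: a character is a string of length one.
        return isinstance(value, str) and len(value) == 1
    if type_spec == "number":
        # bool is a subclass of int in Python, so it is excluded explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unknown type: {type_spec!r}")


print(check_value("character", ","))
print(check_value(("enum", ["hsapiens", "mmusculus"]), "drerio"))
```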
The core logic is described in implementation blocks. The most common is
run_docker, which specifies a Docker image, volume mappings, and command-line
arguments.
Example:
(run_docker
(image "repbioinfo/singlecelldownstream:latest")
(volumes (parent_folder "/scratch"))
(arguments
"Rscript /home/enrichment_analysis.r"
matrix_file
species
source
separator
max_terms
)
)
- arguments can mix literals and references to parameters (unquoted).
- The volume source can be a parameter name; the transpiler will resolve it.
Here is a minimal but complete program, as in examples/enrichment_analysis.bala:
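The resolution of literals and parameter references can be sketched as follows. This is not the code the transpilers emit; it is a minimal illustration that assumes quoted tokens are literals, bare tokens are parameter names, and the result is a plain docker run invocation with one -v mapping per volume. The helper names and the sample parameter values are hypothetical.

```python
import shlex


def is_literal(token):
    """Quoted tokens are literals; bare tokens are parameter references."""
    return len(token) >= 2 and token[0] == '"' and token[-1] == '"'


def resolve(token, params):
    """Return the literal text, or the value of the referenced parameter."""
    if is_literal(token):
        return token[1:-1]
    if token not in params:
        raise KeyError(f"unknown parameter reference: {token}")
    return str(params[token])


def docker_argv(image, volumes, arguments, params):
    """Build an argument list for `docker run` from a run_docker-like block."""
    argv = ["docker", "run", "--rm"]
    for source, target in volumes:
        argv += ["-v", f"{resolve(source, params)}:{resolve(target, params)}"]
    argv.append(resolve(image, params))
    for token in arguments:
        if is_literal(token):
            argv += shlex.split(token[1:-1])  # a literal command may hold several words
        else:
            argv.append(resolve(token, params))
    return argv


argv = docker_argv(
    image='"repbioinfo/singlecelldownstream:latest"',
    volumes=[("parent_folder", '"/scratch"')],
    arguments=['"Rscript /home/enrichment_analysis.r"', "matrix_file", "species"],
    params={"parent_folder": "/data/run1", "matrix_file": "/scratch/de.csv", "species": "hsapiens"},
)
print(argv)
```

Building a list instead of a shell string means the values are never interpreted by a shell, which is one simple way to keep Docker calls safe from injection through parameter values.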
(bala enrichment_analysis (
(matrix_file string (desc "..."))
(species (enum ("hsapiens" "mmusculus" "dmelanogaster")) (desc "..."))
(parent_folder string (desc "..."))
(separator character (desc "..."))
(max_terms number (desc "..."))
(run_docker
(image "repbioinfo/singlecelldownstream:latest")
(volumes (parent_folder "/scratch"))
(arguments ...))
(desc "Process results and perform pathway enrichment.")
(outputs (scratch directory /scratch))
))
- Comments start with ; and are ignored by the parser.
Before transpiling, always check your baryon file for syntax errors:
./baryon-lang -input myprogram.bala -check
The tool will print a summary or detailed error messages (including line/column).
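The most common syntax error in an S-expression file is an unbalanced parenthesis. The sketch below shows how such errors can be located with line and column numbers. It is an independent illustration, not the logic behind -check, and it assumes that ; starts a comment and that double quotes delimit strings with backslash escapes.

```python
def check_parens(text):
    """Return a list of (line, column, message) for unbalanced parentheses."""
    errors, open_positions = [], []
    in_string = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        col = 0
        while col < len(line):
            ch = line[col]
            if in_string:
                if ch == "\\":
                    col += 1  # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == ";":
                break  # comment runs to the end of the line
            elif ch == '"':
                in_string = True
            elif ch == "(":
                open_positions.append((line_no, col + 1))
            elif ch == ")":
                if open_positions:
                    open_positions.pop()
                else:
                    errors.append((line_no, col + 1, "unexpected ')'"))
            col += 1
    errors += [(line_no, col, "unclosed '('") for line_no, col in open_positions]
    return errors


for line_no, col, message in check_parens("(bala x (\n  (a string)\n)"):
    print(f"{line_no}:{col}: {message}")
```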
To generate code in your target language:
./baryon-lang -input enrichment_analysis.bala -lang python
This produces a .py file (or .R, .sh, .nf, etc.) with:
- Function definitions matching the baryon program and its parameters
- Docstrings or comments from desc
- Parameter validation (type checks, enum enforcement)
- Secure handling of file paths and Docker calls
The transpilers therefore generate not only code, but also validation and security checks.
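To give a feel for the kind of wrapper described above, here is a hand-written sketch of a Python function for the enrichment_analysis example. It is not actual transpiler output: the exact checks, messages, and return value are assumptions, the parent_folder description is invented for illustration, and the Docker call is left out so the sketch stays runnable without Docker.

```python
from pathlib import Path


def enrichment_analysis(matrix_file, parent_folder, separator, max_terms):
    """Process results and perform pathway enrichment.

    matrix_file: CSV file of differential expression results.
    parent_folder: host folder mounted into the container (description assumed).
    separator: Separator character in the table.
    max_terms: Max terms in the output.
    """
    # Type checks derived from the declared parameter types.
    if not isinstance(matrix_file, str):
        raise TypeError("matrix_file must be a string")
    if not isinstance(parent_folder, str):
        raise TypeError("parent_folder must be a string")
    if not (isinstance(separator, str) and len(separator) == 1):
        raise TypeError("separator must be a single character")
    if isinstance(max_terms, bool) or not isinstance(max_terms, (int, float)):
        raise TypeError("max_terms must be a number")

    # Path handling: resolve to an absolute path and require an existing folder.
    folder = Path(parent_folder).resolve()
    if not folder.is_dir():
        raise ValueError(f"parent_folder is not a directory: {folder}")

    # A real wrapper would start the container here; this sketch returns the validated values.
    return {
        "matrix_file": matrix_file,
        "parent_folder": str(folder),
        "separator": separator,
        "max_terms": max_terms,
    }
```

A call such as enrichment_analysis("de.csv", "/data/run1", ",", 20) succeeds only if /data/run1 exists; a wrong type or a missing folder fails before any container is started.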
baryon-lang validates enums at transpile-time and at runtime in the target language:
(species (enum ("hsapiens" "mmusculus")) (desc "..."))
Technical detail:
- In Python, this becomes a check like if species not in ['hsapiens', 'mmusculus']: raise ...
- In R, the function checks species %in% c("hsapiens", "mmusculus")
This prevents accidental mis-specification and improves reproducibility.
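Spelled out, the runtime side of the Python check could look like the sketch below. The helper name validate_enum, the exception type, and the message wording are assumptions; only the membership test itself comes from the description above.

```python
ALLOWED_SPECIES = ("hsapiens", "mmusculus")


def validate_enum(name, value, allowed):
    """Raise ValueError unless value is one of the allowed enum values."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {list(allowed)}, got {value!r}")
    return value


species = validate_enum("species", "hsapiens", ALLOWED_SPECIES)
print(species)
```

Failing early with the list of allowed values in the message makes a misspelled species name obvious before any container is started.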
The outputs section describes expected outputs, their types, and locations:
(outputs (scratch directory /scratch))
Note:
- This section serves as documentation and supports downstream workflow integration. The code that handles output directories is generated from these specifications. For the available output types, refer to the Galaxy output type documentation.
You can add new parameter types or implementation blocks by editing the Go source:
- New types: update internal/ast and the relevant transpilers
- New transpilers: implement the Transpiler interface from internal/transpiler/transpiler.go
- Run go test ./... to execute the comprehensive test suite (lexer, parser, transpilers).
- Use the parser’s error messages for debugging (they report line/column precisely).
Using the prompt in the file prompt.txt, users can ask an LLM to generate a draft bala file from their script. The user must check and verify the generated bala file before using it.