Merged
50 commits
ea12f58
add configuration options for namespaces
jhidding Oct 20, 2025
f956254
work namespaces into the data model
jhidding Oct 28, 2025
dd940ad
halfway implementing new markdown reader
jhidding Oct 29, 2025
ae24029
Merge branch 'main' into 73-namespaces
jhidding Oct 29, 2025
ee22c3f
Merge branch 'main' into 73-namespaces
jhidding Oct 29, 2025
a5e450c
generic function for delimited block reading
jhidding Oct 29, 2025
387610f
...
jhidding Oct 30, 2025
aca86d1
improve config code
jhidding Nov 1, 2025
486817a
make tests pass again
jhidding Nov 1, 2025
46321c3
implement markdown reader
jhidding Nov 2, 2025
affa80e
implement file processing back into new markdown reader
jhidding Nov 2, 2025
789b6ae
add test for ignore block
jhidding Nov 3, 2025
2d5bdb1
add tests
jhidding Nov 3, 2025
476ceea
implement namespaces object
jhidding Nov 3, 2025
0e00e5f
...
jhidding Nov 5, 2025
95f8c4d
new readers seem to be working somewhat; namespaces work as advertised
jhidding Nov 6, 2025
174fadc
get test coverage for readers up to 100%
jhidding Nov 6, 2025
832a6f7
create architecture.md; increase test coverage
jhidding Nov 6, 2025
206a9f6
move document to interface module; start work on code reader
jhidding Nov 7, 2025
91f844e
reimplement code reader and tests
jhidding Nov 9, 2025
d534ee5
reworking commands to use click... (in progress)
jhidding Nov 10, 2025
566a894
create new rich click main interface
jhidding Nov 11, 2025
d2ffc6d
update db upon reading files
jhidding Nov 13, 2025
c232b7e
track dependencies when stitching
jhidding Nov 13, 2025
2c1a1d8
port the new command to click
jhidding Nov 14, 2025
02be917
port reset command to click and new architecture
jhidding Nov 14, 2025
e82907a
bring rest of commands back
jhidding Nov 15, 2025
9150638
add manual
jhidding Nov 15, 2025
0d1bf9a
move architecture docs
jhidding Nov 15, 2025
56f6b71
include man page in wheel
jhidding Nov 15, 2025
6e0e51f
test config module
jhidding Nov 16, 2025
ee7430c
more tests
jhidding Nov 17, 2025
b7d6372
fix tangle and stitch commands loading config
jhidding Nov 17, 2025
a0d6a3c
fix shebang and spdx-license hooks; more small fixes; migrate some ol…
jhidding Nov 18, 2025
7724f2b
migrate some hook tests; make sure that hook classes are singletons
jhidding Nov 19, 2025
0a29ce6
add hooks field to document
jhidding Nov 19, 2025
a8f10df
add hooks field to document data class and update tests
jhidding Nov 19, 2025
a8fa7c7
introduce Context type holding state for a single session
jhidding Nov 19, 2025
39e010c
get build test working
jhidding Nov 19, 2025
29ad7e5
fix sync and daemon commands
jhidding Nov 20, 2025
43f7cd0
fix daemon, all tests passing
jhidding Nov 20, 2025
a1a2a37
track generated man-pages
jhidding Nov 20, 2025
9c1db53
add authors sections to man page
jhidding Nov 20, 2025
c5d6b78
fix mypy complaints; type check task hook attributes
jhidding Nov 20, 2025
a4d7cbb
add future import for python < 3.14
jhidding Nov 20, 2025
4a3a4ee
add future import for python < 3.14; context.py
jhidding Nov 20, 2025
b8c357d
use whatever line-ending is there in code_block.py
jhidding Nov 20, 2025
d682089
test_code_block line endings
jhidding Nov 20, 2025
e920ad1
just use unix line-endings everywhere
jhidding Nov 20, 2025
eb1bc9e
remove more os.linesep use
jhidding Nov 20, 2025
1 change: 0 additions & 1 deletion .entangled/filedb.json

This file was deleted.

Empty file removed .entangled/filedb.lock
Empty file.
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -15,7 +15,7 @@ jobs:
fail-fast: false
matrix:
os: ["windows-latest", "ubuntu-latest", "macos-latest"]
python-version: ["3.12", "3.13"]
python-version: ["3.12", "3.13", "3.14"]

runs-on: ${{matrix.os}}

1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ __pycache__
coverage.*
dist
site

26 changes: 25 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
.PHONY: test
.PHONY: test man-pages

test:
	uv run coverage run --source=entangled -m pytest
@@ -8,3 +8,27 @@ test:

docs:
	uv run mkdocs build

define test_template =
.PHONY: test-$(1)

test-$(1):
	uv run pytest test/$(1) --cov=entangled/$(1)
	uv run coverage xml
endef

modules = readers io iterators model interface config commands

$(foreach mod,$(modules),$(eval $(call test_template,$(mod))))

.PHONY: test-modules

test-modules:
	uv run pytest $(modules:%=test/%) --cov=entangled -x

man-pages: man/entangled.1

man/entangled.1: docs/man-pages/english.md
	mkdir -p $(@D)
	pandoc -s -t man $< -o $@

26 changes: 6 additions & 20 deletions docs/api.md
@@ -7,35 +7,21 @@ This is internal API documentation. This may be of use if you want to use Entang
heading_level: 3

## Document structure
::: entangled.document
::: entangled.interface
options:
heading_level: 3

## Readers
::: entangled.markdown_reader
::: entangled.readers
options:
heading_level: 3

::: entangled.code_reader
## I/O
::: entangled.io
options:
heading_level: 3

## FileDB
::: entangled.filedb
## Model
::: entangled.model
options:
heading_level: 3

## Transactions
::: entangled.transaction
options:
heading_level: 3

## Parsing
::: entangled.parsing
options:
heading_level: 3

## Properties
::: entangled.properties
options:
heading_level: 3
161 changes: 155 additions & 6 deletions docs/architecture.md
@@ -1,14 +1,163 @@
# Architecture
Entangled Architecture
======================

## Line processor
All parsing in Entangled is done on a per-line basis using a primitive line processor `mawk`. This makes Entangled both simple in design and easy to configure.
Entangled is organised into several sub-modules with clearly defined responsibilities:

## Transactions
Whenever we write a file, this is done through the `Transaction` class.
- `commands`, all sub-commands for the command line.
- `config`, all data types related to configuring Entangled.
- `hooks`, the hook subsystem.
- `interface`, connects the model and I/O to the commands.
- `io`, manages file I/O.
- `iterators`, support functions for iterators.
- `model`, the data model for Entangled, also contains the tangler.
- `readers`, reading data into the model.

Imports in Python need to be acyclic, so the modules depend on each other as follows:

``` mermaid
graph TD;
iterators --> model;
config --> hooks;
config --> interface;
config --> model;
config --> readers;
hooks --> readers;
iterators --> readers;
model --> readers;
readers --> interface;
io --> interface;
hooks --> interface;
interface --> commands;
```
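
The acyclicity requirement can be checked mechanically. The following is a minimal sketch, not part of Entangled: it copies the edges from the diagram above, reads each arrow `a --> b` as "module `b` imports module `a`" (an interpretation, consistent with `interface --> commands`), and uses the standard library's `graphlib` to compute a valid import order. The helper name `import_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Edges copied from the diagram: (imported module, importing module).
EDGES = [
    ("iterators", "model"),
    ("config", "hooks"),
    ("config", "interface"),
    ("config", "model"),
    ("config", "readers"),
    ("hooks", "readers"),
    ("iterators", "readers"),
    ("model", "readers"),
    ("readers", "interface"),
    ("io", "interface"),
    ("hooks", "interface"),
    ("interface", "commands"),
]


def import_order(edges):
    """Return the modules in an order where every module follows its imports.

    Raises graphlib.CycleError if the import graph contains a cycle.
    """
    sorter = TopologicalSorter()
    for imported, importer in edges:
        # `importer` depends on `imported`, so `imported` must come first.
        sorter.add(importer, imported)
    return list(sorter.static_order())


order = import_order(EDGES)
```

Since every other module is a direct or indirect dependency of `commands`, it always ends up last in the computed order.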

Commands
--------

We use `click` to make the command line interface, and `rich` and `rich-click` to make it pretty. Every command is encapsulated in a transaction:

```python
with transaction() as t:
    ...
```

This transaction is the front-end for all I/O based operations. Communication with the user is all handled through the `logging` system.

Note: in the past we used `argh` to parse arguments, but this package doesn't have the same level of support from the community.

Config
------

Config is read from `entangled.toml` using the `msgspec` package. The config is separated into an in-memory representation `Config`, and a loadable structure `ConfigUpdate`. We load an update from `entangled.toml` or from a YAML header at the top of a Markdown file. This `ConfigUpdate` is merged with an existing `Config` using the `|` operator. This way we can stack different layers of configuration on top of each other. We can even have different Markdown dialects between files working together.
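
To illustrate the layering, here is a minimal sketch of merging an update into a configuration with the `|` operator. It is not Entangled's implementation: it uses standard-library dataclasses instead of `msgspec` to stay dependency-free, and the field names `dialect` and `output_dir` are placeholders invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ConfigUpdate:
    """Loadable layer: every field is optional, None means 'not set'."""
    dialect: str | None = None       # placeholder field
    output_dir: str | None = None    # placeholder field


@dataclass(frozen=True)
class Config:
    """Complete in-memory configuration: every field has a value."""
    dialect: str = "default"
    output_dir: str = "."

    def __or__(self, update: ConfigUpdate) -> Config:
        # Only fields that the update actually sets override the current value.
        changes = {
            f.name: getattr(update, f.name)
            for f in fields(update)
            if getattr(update, f.name) is not None
        }
        return replace(self, **changes)


# Stack a project-wide layer and a per-file layer on top of the defaults.
base = Config()
project = ConfigUpdate(dialect="quarto")
per_file = ConfigUpdate(output_dir="build")
config = base | project | per_file
```

Because `Config` is immutable in this sketch, each `|` returns a new object, so the same base configuration can be reused for every file.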

Hooks
-----

A hook is a class derived from `HookBase`, in which you can override the following members. First, a nested `Config` class that can be loaded by `msgspec`:

```python
class Config(msgspec.Struct):
    pass
```

An `__init__` method:

```python
def __init__(self, config: Config):
    super().__init__(config)
```

The `check_prerequisites` method checks that prerequisites are met. For instance, the build hook can use this to see that GNU Make is available.

```python
def check_prerequisites(self):
    pass
```

The `on_read` method is called right after a code block has been read. Example: `quarto_attributes` uses this method to translate the YAML mini header into code block attributes.

```python
def on_read(self, code: CodeBlock):
    pass
```

The `pre_tangle` method is run after all the Markdown is read, but before any output is written. Here you can define any additional output targets or modify the reference map in place.

```python
def pre_tangle(self, refs: ReferenceMap):
    pass
```

The `on_tangle` method lets you add actions to the I/O transaction.

```python
def on_tangle(self, t: Transaction, refs: ReferenceMap):
    pass
```

Lastly, `post_tangle` lets you do clean-up after tangle is complete. I've never used this.

```python
def post_tangle(self, refs: ReferenceMap):
    pass
```

Hooks can be used to implement many things that feel to the user like features.
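
Putting the pieces together, a hook might look roughly like the sketch below. Everything here is a stand-in so that the example runs on its own: `HookBase` and `CodeBlock` are simplified substitutes for Entangled's classes, a dataclass replaces `msgspec.Struct`, and `TodoHook`, its `marker` option and the `properties` dictionary are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodeBlock:
    """Stand-in for Entangled's CodeBlock; only what this sketch needs."""
    source: str
    properties: dict[str, str] = field(default_factory=dict)


class HookBase:
    """Stand-in base class with no-op defaults for two of the methods."""

    @dataclass
    class Config:
        pass

    def __init__(self, config):
        self.config = config

    def check_prerequisites(self):
        pass

    def on_read(self, code: CodeBlock):
        pass


class TodoHook(HookBase):
    """Hypothetical hook: flags code blocks that contain a marker string."""

    @dataclass
    class Config:
        marker: str = "TODO"

    def __init__(self, config: Config):
        super().__init__(config)

    def on_read(self, code: CodeBlock):
        # Annotate the block in place, right after it has been read.
        if self.config.marker in code.source:
            code.properties["todo"] = "true"


hook = TodoHook(TodoHook.Config())
hook.check_prerequisites()
block = CodeBlock("x = 1  # TODO: tune this")
hook.on_read(block)
```

The hook only overrides the stages it cares about; the remaining methods keep their no-op defaults from the base class.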

I/O
---

The `io` module offers a virtualization layer on top of all file I/O. All I/O in Entangled is organized into transactions. When conflicts are found that could endanger your data integrity, Entangled refuses to run any part of the transaction. For instance, if you have a Markdown file called `model.md` that generates a file called `model.py`, and you have edited both of them, both `entangled tangle` and `entangled stitch` will detect this and refuse to overwrite your changes unless you run with `-f/--force`.

A file database is kept containing MD5 hashes of all input files, to check that content hasn't changed without Entangled knowing about it. All inputs (and their hashes) are cached in `entangled.virtual.FileCache`.
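
The combination of a hash database and all-or-nothing writes can be sketched as follows. This is a simplified illustration and not Entangled's code: the "file system" is a plain dictionary, and `ToyTransaction`, `ConflictError` and `md5_hex` are names invented for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 digest of a string, as stored in the toy database."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class ConflictError(Exception):
    pass


class ToyTransaction:
    """All-or-nothing writes against an in-memory 'file system'."""

    def __init__(self, files: dict[str, str], db: dict[str, str], force: bool = False):
        self.files = files    # path -> current content
        self.db = db          # path -> MD5 of the content last seen
        self.force = force
        self.pending: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self.pending[path] = content  # queued, nothing is written yet

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # an error occurred in the body: write nothing
        # A target conflicts if it exists and its hash differs from the database.
        conflicts = [
            path for path in self.pending
            if path in self.files and md5_hex(self.files[path]) != self.db.get(path)
        ]
        if conflicts and not self.force:
            raise ConflictError(f"changed outside of Entangled: {conflicts}")
        for path, content in self.pending.items():
            self.files[path] = content
            self.db[path] = md5_hex(content)
        return False


# `model.py` was edited by hand after it was last generated.
files = {"model.py": "print('edited by hand')"}
db = {"model.py": md5_hex("print('generated')")}
try:
    with ToyTransaction(files, db) as t:
        t.write("model.py", "print('regenerated')")
        t.write("new.py", "pass")
    refused = False
except ConflictError:
    refused = True
```

Note that the unrelated write to `new.py` is also dropped: a single conflict cancels the whole transaction, which is the behaviour described above.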

On the top level, all I/O is encapsulated in transactions:

```python
with transaction() as t:
    t.read(...)
    t.write(...)
    ...
```

At the end of a transaction all write actions are checked against a database of known file contents. If any conflicts are found, the entire transaction is not executed (unless run with `-f/--force`).

Iterators
---------

Internally, Entangled makes heavy use of generators to read files and process text line-by-line. Because both the `model` and `readers` modules use these operations, they need to be in a separate module. Crucially, this module contains the `Peekable` iterator, which allows us to peek into the future of an iterator by caching a single element.
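
A peekable iterator with a one-element cache can be sketched in a few lines. This is an illustration of the idea only; the method name `peek` and the internals are assumptions, not necessarily what Entangled's `Peekable` looks like.

```python
from collections.abc import Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")
_EMPTY = object()  # sentinel: nothing is cached


class Peekable(Generic[T]):
    """Iterator wrapper that can look one element ahead."""

    def __init__(self, source: Iterable[T]):
        self._it: Iterator[T] = iter(source)
        self._head: object = _EMPTY

    def __iter__(self):
        return self

    def peek(self) -> T:
        """Return the next element without consuming it.

        Raises StopIteration if the source is exhausted.
        """
        if self._head is _EMPTY:
            self._head = next(self._it)
        return self._head

    def __next__(self) -> T:
        if self._head is not _EMPTY:
            # Hand out the cached element and clear the cache.
            head, self._head = self._head, _EMPTY
            return head
        return next(self._it)


lines = Peekable(["---", "title: x", "body"])
first = lines.peek()   # look ahead, e.g. to decide which reader to use
rest = list(lines)     # the peeked element is still part of the stream
```

A reader can use `peek` to decide whether the next line opens a code block or a YAML header before committing to consume it.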

Model
-----

The `model` contains some of the core functionality of Entangled. It defines the in-memory representation of a Markdown document, as well as the graph representing the code blocks and their references.

- `ReferenceName` contains a `namespace: tuple[str, ...]` and `name: str`, representing a named code entity that may consist of multiple linked code blocks by the same name.
- `ReferenceId` is a unique identifier for every code block. This stores the reference `name`, but also its Markdown source `file` and a `ref_count` for when there are multiple code blocks of the same name.
- `Content` is either `PlainText` which is ignored by Entangled unless stitching, or a `ReferenceId`.
- `CodeBlock` contains all information on a code block including enclosing lines (i.e. the lines containing the three backticks), its attributes, indentation and the origin of the content.
- `ReferenceMap` fundamentally acts as a `Mapping[ReferenceId, CodeBlock]`, but also contains an index for searching by `ReferenceName` or target file.
- `Document` collects configuration, a dictionary of content and the reference map for ease of use.
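
The relation between names, identifiers and the reference map can be sketched with frozen dataclasses. The field names follow the list above; everything else is an assumption made for this example: the code block is reduced to its source text, `file` is modelled as a `PurePath`, `ref_count` is counted per name, and `ToyReferenceMap` with its `add` and `select_by_name` methods is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class ReferenceName:
    namespace: tuple[str, ...]
    name: str


@dataclass(frozen=True)
class ReferenceId:
    name: ReferenceName
    file: PurePath
    ref_count: int


class ToyReferenceMap:
    """Maps ReferenceId -> code text and keeps an index by ReferenceName."""

    def __init__(self) -> None:
        self._blocks: dict[ReferenceId, str] = {}
        self._by_name: defaultdict[ReferenceName, list[ReferenceId]] = defaultdict(list)

    def add(self, name: ReferenceName, file: PurePath, source: str) -> ReferenceId:
        # ref_count distinguishes several blocks that share one name.
        ref = ReferenceId(name, file, len(self._by_name[name]))
        self._blocks[ref] = source
        self._by_name[name].append(ref)
        return ref

    def __getitem__(self, ref: ReferenceId) -> str:
        return self._blocks[ref]

    def select_by_name(self, name: ReferenceName) -> list[ReferenceId]:
        return list(self._by_name.get(name, []))


refs = ToyReferenceMap()
main = ReferenceName(("project",), "main")
a = refs.add(main, PurePath("a.md"), "x = 1")
b = refs.add(main, PurePath("a.md"), "y = 2")
```

Frozen dataclasses are hashable, which is what allows both `ReferenceName` and `ReferenceId` to serve as dictionary keys here.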

Readers
-------

Readers are implemented as `Callable[[InputStream], Generator[RawContent, None, T]]`. Here, `RawContent` is a form of `Content` where we're still dealing with `CodeBlock`s directly instead of `ReferenceId`s. The third type-argument to `Generator` is kept abstract here. We can use it to pass values from one generator to the other. For instance (a simplified version):

```python
def read_yaml_header(inp: InputStream) -> Generator[RawContent, None, ConfigUpdate]:
    ...
    yield plain_text
    return config_update

def read_markdown(inp: InputStream, refs: ReferenceMap) -> Generator[RawContent, None, Config]:
    config_update = yield from read_yaml_header(inp)
    config = Config() | config_update
    yield from rest_of_markdown(config, inp, refs)
    return config
```
```

Here we have a `read_yaml_header` reader that emits `PlainText`, but also parses the YAML header into a `ConfigUpdate`. We subsequently use that configuration to determine how to further read the rest of the Markdown file. This way we can completely process a Markdown file in a single pass, buffering only a single line at a time.
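
The same pattern can be tried out with plain strings. The sketch below is a toy version, not Entangled's reader: it assumes the input always starts with a `---` delimited header of `key: value` lines, and the names `read_header`, `read_document`, `run` and the `prefix` setting are invented for this example. It also shows how a caller outside a generator can recover the return value from `StopIteration`.

```python
from collections.abc import Generator, Iterator


def read_header(lines: Iterator[str]) -> Generator[str, None, dict[str, str]]:
    """Yield header lines unchanged; return the parsed key/value pairs."""
    settings: dict[str, str] = {}
    seen_open = False
    for line in lines:
        yield line
        if line == "---":
            if seen_open:
                break          # closing delimiter: the header is complete
            seen_open = True
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def read_document(lines: Iterator[str]) -> Generator[str, None, dict[str, str]]:
    # `yield from` forwards the header lines and captures the return value.
    settings = yield from read_header(lines)
    prefix = settings.get("prefix", "")
    for line in lines:         # continues where the header reader stopped
        yield prefix + line
    return settings


def run(gen):
    """Drain a generator; return (yielded items, return value)."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value


source = iter(["---", "prefix: #", "---", "hello", "world"])
items, settings = run(read_document(source))
```

Both readers share one iterator, so the body reader picks up exactly where the header reader stopped and no line is read twice.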

Test Coverage
=============

Unit tests for each module should cover most of that module. The `Makefile` contains test targets for every module that measure only the coverage on that module.
