Skip to content

The Structural Dynamics of Flow: finally a formalized modular transport layer -> tractor.ipc 😎 #375

Merged
goodboy merged 74 commits intomainfrom
structural_dynamics_of_flow
Jul 13, 2025
Merged

The Structural Dynamics of Flow: finally a formalized modular transport layer -> tractor.ipc 😎 #375
goodboy merged 74 commits intomainfrom
structural_dynamics_of_flow

Conversation

@goodboy
Copy link
Owner

@goodboy goodboy commented Apr 11, 2025

Although most of this work transpired on an alt (and now quite
distributed) set of (sometimes-private) git hosting services, we
figured getting some lurker hype going again on the project couldn't
hurt.

This initial tractor.ipc effort was spearheaded by our very own
@guilledk surrounding their work on a new linux-only high-performance
impl of a shared-mem ring-buffer with native trio async blocking
support/semantics using eventfd 😎

The longstanding original idea was to get a more generic and
performant, soft-real-time targeting, ShmList-like interface which
does not require read/write event signalling over another IPC
transport (like TCP as was the case with the current state of things
in #338).

That ringbuf code will now land later on top of this history since
I've made a buncha of reworks and refinements to that original attempt
at modularizing our IPC transport layering.

Those original ideas along with many others are being published in
a hot new lib which
you should all checkout once it refines/matures a bit 😉

Details on what's actually landing here below!


The grand .ipc-sub-system summary

Detailed writeup coming later.. get your tickets at the box
office.

TLDR: this intros a tractor.ipc subpkg which begins our attempt at
being IPC protocol agnostic from the perspective of all our
higher-level RPC APIs, namely:

  • ActorNursery.start_actor() -> Portal:
  • Portal.open_context() -> Conext:
  • Context.open_stream() -> MsgStream:

That means including a new type-protocol MsgTransport and (for now
given our lone msgpack interchange) a top-most implementation
MsgpackTransport which contains,

  • bulk of of the (original-prior) implementation for msg-IO over
    a socket iface with, and as prior, the socket assumed to be already
    wrapped-n-delivered ( for/by trio ) in a trio.abc.Stream.

  • impls for each proto backend: MsgpackUDSStream & MsgpackTCPStream

  • a new Address type-proto which equivalents allows conversion
    to/from each proto-specific (msgpack) wire-compat address-data-type
    (normally some tuple) with concrete impls to match:

    • TCPAddress -> tuple[str, int]
    • UDSAddress -> tuple[str, str]
  • with what lands here you can toggle the global tpt proto with,

    • setting a key using on the internal
      _state._def_tpt_proto: TransportProtocolKey
    • OR more properly passing enable_transports: list to
      open_root_actor(), though right now we only support a single
      entry until more testing is written Xp

Outstanding feat requests this relates-to/resolves


Still pending bugs and/or (minor) requirements

  • enable_transports=['uds'] breaks subactor debugging XD
    • it appears this is due to 486f4a3 which breaks
      ._state._runtime_vars['_root_mailbox'] passing to subactors
    • fixed in 28e32b8 which now passes _enable_tpts down through
      subactors.
    • (also in 28e32b8) add _root_addrs to the rtvars to get the
      same semantic as the registry key, and implying that the root can
      (eventually) be contacted on multiple addrs over varying
      transports.

moved to #380

  • a (couple?) suite(s) of tests to audit .ipc._server
    primitives
  • decide on .ipc._server types naming/polish

Idealistic todos before this lands

Gosh it'd sure be nice but I'd rather get UDS in sooner then later
sin unit-tests if it comes to it.

  • fix CI to use uv (also see follow thru below)

  • hopefully tweak wtv to get at least one run green..

    • need to add back typing-extensions for test_tooling
      audit of our stackscope integration
      • 309360d to add latest version of ^
      • cherry-picked and modded c2e7dc7 (with fix for pre .devx.debug
        subpkg history) to get .devx.test_tooling suite green.
  • factor out the Actor._stream_handler() method to a mod-level
    func either in .ipc._server or similar.

  • also move over all the Actor._peers: dict[tuple, Channel]
    (and related state) tracking to the IPCServer and change all
    dependent code to match.


Coming in follow-up patch(es)

  • see Improved .__repr__() on all primitives #381 for pretty-format follow ups.

  • see IPC subsys refinements: connection-tracking, multi-proto-endpoint-mgmt.. etc. #382 for follow-up on,

    • fill out the IPCEndpoint.peer_tpt tables (implicitly) by pre-hooking into a
      new (likely) Endpoint (class) method which first registers
      every MsgTransport/Channel prior to processing and handover to
      the RPC layer.

      • ideally in a way that let's us granular-ly know which peers
        are on which transports and which .stream_handler_tn: Nurserys
        so we can eventually do fancy stuff like connection resetting,
        peer filtering, dynamic transport-proto swapping etc.
    • multiple-transports in tandem (eventually per actor) by first ensuring
      the server layer is correct before exposing upward via
      enable_transports: list[str] to runtime-booting
      entrypoints (open_nursery()/open_root_actor()).

@goodboy
Copy link
Owner Author

goodboy commented Apr 11, 2025

Lol well first thing is getting uv in CI i suppose 😂

@goodboy goodboy self-assigned this Apr 11, 2025
@goodboy goodboy added testing IPC and transport api messaging messaging patterns and protocols labels Apr 11, 2025
@goodboy
Copy link
Owner Author

goodboy commented Apr 11, 2025

The IPC server subsys is fully factored in close-to-original form as of now 🥳

I'd still like to get the per-IPCEndpoint peer-Channel tracking going along with a collections.ChainMap view into all per-actor instances for a similar read-interface to what (was Actor._peers) is IPCServer._peers: dict[tuple, Channel].

So that would be one_ring_to_rule_them_all branch found on other remotes/git-servers (not sure if it's been pushed here yet).

goodboy added a commit that referenced this pull request Apr 26, 2025
Shared memory array API and optional tight integration with `numpy`

Landing this so many downstream major feature branches depend on it namely,
- #375
- #376
- and the eventual https://pikers.dev/goodboy/tractor/pulls/10
Base automatically changed from shm_apis to main April 26, 2025 03:20
@goodboy
Copy link
Owner Author

goodboy commented Apr 26, 2025

There changed base to main now that i just landed #338.

Copy link
Collaborator

@salotz salotz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate being added and glad to see work is being done here. I'm a bit far from this in my current engagements, but I'll cheer on the side lines :)

goodboy added 10 commits July 8, 2025 18:05
As much as is possible given we currently do some graceful
cancellation join-waiting on any connected sub-actors whenever an active
`local_nursery: AcrtorNursery` in the post-rpc teardown sequence of
`handle_stream_from_peer()` is detected. In such cases we try to allow
the higher level inter-actor (task) context(s) to fully cancelled-ack
before conducting IPC machinery shutdown.

The main immediate motivation for all this is to support unit testing
the `.ipc._server` APIs but in the future may be useful for anyone
wanting to use our modular IPC transport layer sin-"actors".

Impl deats,
- drop passing an `actor: Actor` ref from as many routines in
  `.ipc._server` as possible instead opting to use
  `._state.current_actor()` where abs needed; thus the fns dropping an
  `actor` input param are:
  - `open_ipc_server()`
  - `IPCServer.listen_on()`
  - `._serve_ipc_eps()`
  - `.handle_stream_from_peer()`
- factor the above mentioned graceful remote-cancel-ack waiting into
  a new `maybe_wait_on_canced_subs()` which is called from
  `handle_stream_from_peer()` and delivers a
  maybe-`local_nursery: ActorNursery` for downstream logic; it's this
  new fn which primarily still needs to call `current_actor()`.
- in `handle_stream_from_peer()` also use `current_actor()` to check if
  a handshake is needed (or if it was called as part of some
  actor-runtime-less operation like our unit test suite!).
- also don't pass an `actor` to `._rpc.process_messages()` see how-n-why
  below..

Surrounding ipc-server client/caller adjustments,
- `._rpc.process_messages()` no longer takes an `actor` input and
  now calls `current_actor()` instead.
- `._portal.open_portal()` is adjusted to ^.
- `._runtime.async_main()` is adjusted to the `.ipc._server`'s removal
  of `actor` ref passing.

Also,
- drop some server `log.info()`s to `.runtime()`
As per the outstanding TODO just above the redic `setattr()` loop in
`Actor._from_parent()`!!

Instead of all that risk-ay monkeying, add detailed comment-sections
around each explicit assignment of each `SpawnSpec` field, including
those that were already being explicitly set.

Those and other deats,
- ONLY enable the `.devx._debug` (CHERRY-CONFLICT: later changed to
  `.debug._tty_lock`) module from `Actor.__init__()` in the root actor.
- ONLY enable the `.devx.debug._tty_lock` module from `Actor.__init__()`
  in the root actor.
- add a new `get_mod_nsps2fps()` to replace the loop in init and assign
  the initial `.enable_modules: dict[str, str]` from it.
- do `self.enable_modules.update(spawnspec.enable_modules)` instead of
  an overwrite and assert the table is by default empty in all
  subs.
Buncha either new AOTc lib whls and they added an `upload-time` field.
Actually applying the input it in the root as well as all sub-actors by
passing it down to sub-actors through runtime-vars as delivered by the
initial `SpawnSpec` msg during child runtime init.

Impl deats,
- add a new `_state._runtime_vars['_enable_tpts']: list[str]` field set
  by the input param (if provided) to `.open_root_actor()`.
- mk `current_ipc_protos()` return the runtime-var entry with instead
  the default in the `_runtime_vars: dict` set to `[_def_tpt_proto]`.
- in `.open_root_actor()`, still error on this being a >1 `list[str]`
  until we have more testing infra/suites to audit multi-protos per
  actor.
- return the new value (as 3rd element) from `Actor._from_parent()` as
  per the todo note; means `_runtime.async_main()` will allocate
  `accept_addrs` as tpt-specific `Address` entries and pass them to
  `IPCServer.listen_on()`.

Also,
- also add a new `_state._runtime_vars['_root_addrs']: list = []` field
  with the intent of fully replacing the `'_root_mailbox'` field since,
  * it will need to be a collection to support multi-tpt,
  * it's a more cohesive field name alongside `_registry_addrs`,
  * the root actor of every tree needs to have a dedicated addr set
    (separate from any host-singleton registry actor) so that all its
    subs can contact it for capabilities mgmt including debugger
    access/locking.
- in the root, populate the field in `._runtime.async_main()` and for
  now just set '_root_mailbox' to the first entry in that list in
  anticipation of future multi-homing/transport support.
Both via a post-init method to validate the original input `._host: str`
and in `.is_valid` to ensure the host-part isn't something, esoteric..
@goodboy goodboy force-pushed the structural_dynamics_of_flow branch from fbc9325 to 27e6ad1 Compare July 11, 2025 02:05
@goodboy
Copy link
Owner Author

goodboy commented Jul 11, 2025

Welp just force pushed after a bunch of rebasing to get in as much of the final impl minus testing (which will come in a follow up patchset).

goodboy added 5 commits July 13, 2025 13:23
See docs: https://docs.astral.sh/uv/guides/integration/github/

Summary,
- drop `mypy` job for now since I'd like to move to trying `ty`.
- convert sdist built to `uv build`
- just run test suite on py3.13 for now, not sure if 3.12 will break due
  to the eg stuff or not?
Oddly my env was borked bc a missing sub-dep (`typing-extensions`
apparently not added by `uv` for `stackscope`?) and then `stackscope`
was silently failing import and caused the shield-pause test to also
fail (since it couldn't match the expected `log.devx()` on console). The
import failure is not very explanatory due to the `log.warning()`;
change it to `.error()` level.

Also, explicitly import `_sync_pause_from_builtin` in
`examples/debugging/restore_builtin_breakpoint.py` to ensure the ref is
exported properly from `.devx.debug` (which it wasn't during dev of the
prior commit Bp).
Bc this history is pre `.devx.debug` subpkg creation..
@goodboy
Copy link
Owner Author

goodboy commented Jul 13, 2025

Ehyyo, we green in CI with uv 😎

@goodboy
Copy link
Owner Author

goodboy commented Jul 13, 2025

See #381 for .__repr__() formatting follow up.

@goodboy
Copy link
Owner Author

goodboy commented Jul 13, 2025

New badge links added to the readme.. though let's see if it flips green 🙏🏼

@goodboy
Copy link
Owner Author

goodboy commented Jul 13, 2025

Eyyyo***2, guess she's ready B)

@goodboy goodboy merged commit ba384ca into main Jul 13, 2025
2 checks passed
@goodboy goodboy deleted the structural_dynamics_of_flow branch July 13, 2025 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api IPC and transport messaging messaging patterns and protocols testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants