Simplify luos_engine compilation as a shared lib#474

Draft
nicolas-rabault wants to merge 58 commits into main from feat/shared_lib

@nicolas-rabault nicolas-rabault commented Feb 1, 2024

By submitting this PR, you agree to the associated license (MIT) and to our Contributor License Agreement (CLA).

Before you begin

Thank you for contributing to the Luos project!

Before you begin, please follow these steps:

  • Ensure that this PR is not a duplicate.

Feel free to read the Luos contribution guidelines and the documentation page for more insight into how to contribute to Luos.

PR Description section

Description and dependencies

Please include here a summary of the changes and the related issue. List any dependencies that are required for this change.

Changes

Please choose the relevant options:

  • New feature (non-breaking change which adds functionality)

Related issue(s)

Provide a list of the related issues that will be fixed by this PR.


WARNING: Do not edit the checklist below.


Developer section

  • [Documentation] is up to date with new feature
  • [Tests] are passed OK (non regression, new features & bug fixes)
  • [Code Quality] please check if:
    • Each function has a header (description, inputs, outputs)
    • Code is commented (particularly in hard to understand areas)
    • There are no new warnings that can be corrected
    • Commit policy is respected (consistent commits, clear commit comments)

QA section

  • [Review] tests for new features have been reviewed
  • [Changelog] is up-to-date with expected tags
    🆕 Feature: [Feature] Description...
    🆕 Added: [Feature] Description...
    🆕 Changed: [Feature] Description...
    🛠️ Fix: [Feature] Description...

@nicolas-rabault nicolas-rabault added this to the 3.1.0 milestone Feb 1, 2024
@nicolas-rabault nicolas-rabault self-assigned this Feb 1, 2024
@nicolas-rabault nicolas-rabault changed the base branch from main to rc_3.1.0 February 1, 2024 13:55
@nicolas-rabault nicolas-rabault changed the title Simplify luos_engine as a shared lib compilation Simplify luos_engine compilation as a shared lib Feb 1, 2024
@nicolas-rabault nicolas-rabault removed this from the 3.1.0 milestone Feb 15, 2024
Base automatically changed from rc_3.1.0 to main February 18, 2026 14:47
nicolas-rabault and others added 11 commits April 10, 2026 18:57
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On BCM-family PL011 (RPi), tcdrain() blocks ~7.9 ms per call regardless
of frame size. The driver polls FR.BUSY which is sticky on this hardware;
TIOCOUTQ returns 0 immediately after write(), confirming the kernel TX
buffer is drained well before tcdrain returns. Measured on a 10-byte
frame at 1 Mbps (100 µs wire time): tcdrain=7578 µs p50, write=20 µs
p50 — tcdrain accounted for 96% of the P1 RTT floor.

Replace tcdrain() with clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME)
for (size * 10 bits * ns_per_bit) + 50 µs margin. The per-bit timing
uses the existing timeout_ns_per_bit, so higher baudrates work without
further changes.
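
The wait computed above can be checked with a few lines of arithmetic. This is only a sketch of the formula quoted in the message; the function name `tx_drain_ns` is hypothetical, and reading "10 bits" as one start bit, eight data bits, and one stop bit is an assumption.

```python
def tx_drain_ns(frame_bytes: int, ns_per_bit: int, margin_ns: int = 50_000) -> int:
    """Nanoseconds to wait after write() for a frame to leave the wire.

    Hypothetical helper: 10 bits per byte (assumed 8N1 framing) plus a
    fixed 50 us margin, as described in the commit message.
    """
    bits_on_wire = frame_bytes * 10
    return bits_on_wire * ns_per_bit + margin_ns


# At 1 Mbps one bit lasts 1000 ns, so a 10-byte frame needs 100 us of
# wire time plus the 50 us margin.
NS_PER_BIT_1MBPS = 1_000_000_000 // 1_000_000
print(tx_drain_ns(10, NS_PER_BIT_1MBPS))  # 150000
```

Under these assumptions the sleep is 150 µs, against the roughly 7.6 ms that tcdrain() was measured to block.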

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After a valid frame's CRC, the Robus state machine sets
ctx.rx.callback = Recep_Drop to ignore trailing garbage until
Recep_Timeout fires DEFAULT_TIMEOUT bit-times later. This assumes
an inter-frame silence at least as long as the timeout.

On RS485 half-duplex without bus arbitration (the Linux HAL has
GetTxLockState stubbed to false), two uncoordinated nodes can TX
back-to-back with a gap far shorter than that. Observed in S1:
Tartine's bg_pub and its CMD_ECHO_REPLY arrived 14 µs apart — all
ten reply bytes were dropped by Recep_Drop, producing a 300 ms
timeout on iter 528.

When the RX thread reads a byte while the callback is Recep_Drop,
immediately call Recep_Reset so the byte is treated as the start
of a new frame. Fix is Linux-HAL-local; MCU behavior is untouched.

Verified via event trace: echo bytes matched cleanly during the
timeout (no TX collision), and reply bytes were fully received
within 700 µs but not dispatched until the next iter's TX kicked
the state machine.
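
The effect of the change can be illustrated with a toy receiver. This is not the Robus state machine: the class `ToyReceiver`, the fixed 3-byte frame length, and the `reset_on_drop` switch are all hypothetical and only model the drop state described above.

```python
class ToyReceiver:
    """Toy stand-in for the RX path; frames are a fixed length here."""

    def __init__(self, frame_len, reset_on_drop):
        self.frame_len = frame_len
        self.reset_on_drop = reset_on_drop
        self.dropping = False   # models ctx.rx.callback == Recep_Drop
        self.partial = []
        self.frames = []

    def timeout(self):
        """Models Recep_Timeout: leave the drop state after bus silence."""
        self.dropping = False
        self.partial.clear()

    def on_byte(self, byte):
        if self.dropping:
            if not self.reset_on_drop:
                return              # old behavior: discard until the timeout
            self.dropping = False   # new behavior: models the immediate reset
        self.partial.append(byte)
        if len(self.partial) == self.frame_len:
            self.frames.append(bytes(self.partial))
            self.partial.clear()
            self.dropping = True    # ignore trailing bytes after a full frame


def receive(stream, reset_on_drop):
    """Feed a byte stream with no inter-frame silence and return the frames."""
    rx = ToyReceiver(frame_len=3, reset_on_drop=reset_on_drop)
    for byte in stream:
        rx.on_byte(byte)
    return rx.frames
```

With two back-to-back frames and no timeout in between, `receive(stream, False)` keeps only the first frame, while `receive(stream, True)` keeps both.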

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the native gate_wscom project structure but runs the Luos
network over Robus (RS485 on RPI GPIO via the LINUX Robus HAL). The
Gate exposes itself to external clients over a WebSocket pipe bound
on all interfaces, so pyluos can connect from anywhere on the LAN.

Two build envs: rpi_robus (default, real RS485 hardware) and rpi_ws
(WebSocket broker for dev without hardware).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
port_manager.h conditionally declares PortMng_WatchdogCheck() when
NORT is defined, but on the LINUX HAL NORT is defined inside
robus_hal_config.h (pulled in via robus_hal.h). The previous include
order parsed port_manager.h first, so the prototype was skipped and
robus.c/port_manager.c fell back to an implicit declaration once
NORT got defined later by robus_hal.h. Reorder the includes so the
HAL header is parsed first and NORT is visible to port_manager.h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Align assignments, wrap multi-line macro, and fix whitespace to match
the CI clang-format 1.5.0 expectations on files touched by this branch
(engine/IO, robus_network/HAL/LINUX, robus_network/src).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nicolas-rabault and others added 10 commits April 27, 2026 16:29
Add a Robus support natively running on Raspberry PI
Adds a writable 128-byte buffer behind the existing s_url pointer and
exposes ws_hal_set_broker / Ws_SetBroker to override the compile-time
WS_NETWORK_BROKER_ADDR default. Must be called before Ws_Init; no effect
afterwards (Ws_Init captures the URL during mg_ws_connect).

Additive change: existing callers that rely on the compile-time default
are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidates the three WIP commits:
- Add cmult() with visibility("default") as an exported smoke-test symbol
- Switch platformio default env to native_lib
- Iterate shared_lib_build.py (network archive discovery, LINKFLAGS)
Passing libluos_engine.a directly to the linker produced a dylib/so
with zero symbols: a static archive is only searched for already-
referenced undefined symbols, and the output library starts empty so
no archive members are ever pulled in.

Use -Wl,--whole-archive / -Wl,--no-whole-archive on Linux and Windows
and -Wl,-force_load on macOS so every object in libluos_engine.a is
linked into the resulting shared library.
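
A build script could select these flags per platform roughly as follows. This is a sketch, not the contents of shared_lib_build.py; the function name `whole_archive_flags` is hypothetical. The flag spellings are the ones named above, with `-force_load` taking the archive path as its argument.

```python
import sys


def whole_archive_flags(archive, platform=sys.platform):
    """Return linker arguments that pull every object of `archive` into the output."""
    if platform == "darwin":
        # The macOS linker takes the archive path as the argument of -force_load.
        return [f"-Wl,-force_load,{archive}"]
    # GNU-style linkers bracket the archive between the two switches.
    return ["-Wl,--whole-archive", archive, "-Wl,--no-whole-archive"]


print(whole_archive_flags("libluos_engine.a", platform="linux"))
```

Closing with `--no-whole-archive` matters on GNU-style linkers: without it, every archive listed later on the command line would also be linked in full.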

Ref: https://stackoverflow.com/questions/77983254/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review follow-ups from Task 1: the package-data entry was dead
config (setuptools installs .py files automatically), and the root
.gitignore was missing a .venv*/ rule so the Python binding venv
could accidentally be staged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nicolas-rabault and others added 29 commits April 27, 2026 17:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code-review follow-up: document that the cdef intentionally exposes
only the named-field half of the engine's packed union so the next
reader doesn't think unmap[3] was accidentally dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also fix NameError in Service.__init__ (handle → self._handle).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Appends target_mode_t, entry_mode_t, luos_type_t, routing_table_t
(opaque), search_result_t with [...] array, RTFilter_* signatures,
and rtb_* C helpers to _build.py so Python can inspect routing table
entries without exposing the packed union layout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two findings during debugging:

1. The receiver callback fires for internal engine messages (e.g. END_DETECTION,
   cmd=4) during the detection phase triggered by find_peer.  Handlers that care
   only about application traffic must skip reserved commands (0–42); user-defined
   commands start at 43.  Both tests now filter msg.cmd < 43.

2. The message allocator ring has MAX_MSG_NB = 2 * MAX_LOCAL_SERVICE_NUMBER = 10
   slots.  Burst-sending 10 messages without giving the loop thread a chance to
   drain overflows the ring and hits the LUOS_ASSERT in msg_alloc.c:305.  A 1 ms
   inter-send sleep (> one loop tick of 0.5 ms) is sufficient to drain each
   message before the next is queued.
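
Both findings translate into small test helpers. The sketch below is illustrative only: `is_application_msg` and `send_paced` are hypothetical names, and `send` stands for whatever callable the binding uses to queue a message.

```python
import time

FIRST_USER_CMD = 43  # commands 0-42 are reserved, per the finding above


def is_application_msg(cmd):
    """True for user-defined commands, False for internal engine traffic."""
    return cmd >= FIRST_USER_CMD


def send_paced(send, payloads, gap_s=0.001):
    """Call send() for each payload, sleeping gap_s between consecutive sends.

    The 1 ms default is longer than one 0.5 ms loop tick, so each message
    can be drained before the next one is queued.
    """
    for index, payload in enumerate(payloads):
        if index:
            time.sleep(gap_s)
        send(payload)
```

A receiver callback would then start with `if not is_application_msg(msg.cmd): return`, which matches the `msg.cmd < 43` filter used by the tests.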

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…de, atexit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends the post-build action to walk network static archives and force-
load each into its own shared library (libws_network.dylib, etc.) on
macOS and Linux. Uses -undefined dynamic_lookup / --unresolved-symbols=
ignore-in-object-files so phy dylibs don't link against libluos_engine
at build time — engine symbols resolve at load time against whatever
RTLD_GLOBAL preloaded.

Also fixes a pre-existing bug in the networklibs gathering loop where
an inner `break` only let one archive per directory through.

Phase 1 consumers see no regression: libluos_engine.dylib is produced
exactly as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Ws_Init, Ws_Loop, Ws_SetBroker to the cdef and wires in the
ws_network include dir so the cffi build picks up the declarations.
Symbols are forward-declared in set_source to avoid the HAL transitive
include chain. _ffi.load_dylib now auto-preloads any co-located phy
dylibs (libws_network) so the .so resolves at dlopen time on macOS.
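
The preload step can be pictured with ctypes. This is a sketch under assumptions, not the `_ffi.load_dylib` implementation: the helper names are hypothetical, and discovering phy libraries by listing the engine's directory is a guess at what "co-located" means here.

```python
import ctypes
from pathlib import Path


def find_phy_libs(lib_dir, engine_stem="libluos_engine"):
    """Return shared libraries in lib_dir other than the engine, sorted by name."""
    suffixes = {".so", ".dylib"}
    return sorted(
        path for path in Path(lib_dir).iterdir()
        if path.suffix in suffixes and path.stem != engine_stem
    )


def preload(paths):
    """Load each library with RTLD_GLOBAL so later loads can resolve its symbols."""
    return [ctypes.CDLL(str(path), mode=ctypes.RTLD_GLOBAL) for path in paths]
```

`RTLD_GLOBAL` is what makes the symbols of an already loaded library visible to libraries loaded afterwards, which is the behavior the deferred symbol resolution relies on.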

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends _registry with a PHYS list and a LoadedPhy dataclass holding
(descriptor, handle, loop). clear() now also empties PHYS so stop()
drops Python-side refs. The dylib itself remains in-process by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the happy path (preloaded ws dylib returns a CDLL) and the
error path (unknown phy basename raises LuosEngineNotFoundError).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a frozen Phy dataclass and a ws_network descriptor with a
_configure_ws hook. The hook validates kwargs (rejecting typos via
TypeError) and, when broker= is given, calls Ws_SetBroker. URL length
is checked against the C-side 128-byte buffer.
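
A minimal sketch of such a descriptor and hook follows. The field names, the `set_broker` parameter (standing in for the Ws_SetBroker call), the use of ValueError for an over-long URL, and reserving one byte for a trailing NUL are all assumptions; only the frozen dataclass, the TypeError on unknown kwargs, and the 128-byte limit come from the message above.

```python
from dataclasses import dataclass

MAX_BROKER_URL = 128  # size of the C-side buffer


@dataclass(frozen=True)
class Phy:
    """Immutable description of a network phy (hypothetical fields)."""
    name: str
    init_symbol: str
    loop_symbol: str


def configure_ws(set_broker, **kwargs):
    """Validate kwargs and forward broker= to set_broker as bytes."""
    unknown = set(kwargs) - {"broker"}
    if unknown:
        # Rejects typos such as brokr=...
        raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
    broker = kwargs.get("broker")
    if broker is None:
        return
    encoded = broker.encode("utf-8")
    if len(encoded) >= MAX_BROKER_URL:  # assumed: keep one byte for the NUL
        raise ValueError(f"broker URL must be shorter than {MAX_BROKER_URL} bytes")
    set_broker(encoded)


ws_network = Phy(name="ws_network", init_symbol="Ws_Init", loop_symbol="Ws_Loop")
```

Freezing the dataclass means a descriptor cannot be mutated after import, so per-call settings such as the broker URL travel through kwargs instead.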

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds luos.load_phy(descriptor, **kwargs): runs descriptor.configure,
calls the phy's init symbol once per process, and appends its loop
callable to the registry. Rejects calls after start(), rejects unknown
kwargs, and is idempotent per descriptor name within a stop/start
cycle. A module-level _INITIALIZED_PHYS set guards against multiple
Ws_Init calls that would leak Mongoose resources.

The engine's loop thread now snapshots the phy tick list at start()
and calls each after Luos_Loop in registration order, matching the
C gate example's flow.

Re-exports luos.phy (descriptor module) and luos.load_phy from
__init__.py. Also fixes stop() to always clear _registry.PHYS even
when called without a matching start(), ensuring the conftest fixture
properly resets state between tests.
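
The bookkeeping described in this commit can be sketched as a small registry. `PhyRegistry` is a hypothetical class, not the binding's `_registry` module, and raising RuntimeError after start() is an assumption; the two sets mirror the per-cycle idempotence and the per-process init guard.

```python
class PhyRegistry:
    """Toy model of load_phy / start / stop bookkeeping."""

    def __init__(self):
        self.started = False
        self.loops = []            # tick callables, in registration order
        self._loaded = set()       # cleared on stop()
        self._initialized = set()  # never cleared: init runs once per process

    def load_phy(self, name, init, loop):
        if self.started:
            raise RuntimeError("load_phy must be called before start()")
        if name in self._loaded:
            return                 # idempotent within one stop/start cycle
        if name not in self._initialized:
            init()
            self._initialized.add(name)
        self._loaded.add(name)
        self.loops.append(loop)

    def start(self):
        self.started = True

    def stop(self):
        # Always clears, even without a matching start().
        self.started = False
        self.loops.clear()
        self._loaded.clear()
```

After stop(), loading the same phy again registers its loop callable once more but does not call its init function a second time.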

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/_broker.py binds 127.0.0.1:0 and relays every binary frame to
every other connected client. Used by the pytest broker fixture.
Adds websockets>=12 under [project.optional-dependencies.test].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spawns tests/_broker.py as a subprocess, blocks on the LISTEN stdout
line, yields the ws:// URL, and tears down with terminate/wait/kill on
session end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Module-level run(role, broker_url, q, toggle_count) function used by
multiprocessing.Process under the 'spawn' context. Handles the 'led'
and 'blinker' roles and reports observations via the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix the ws_network topology protocol so that two Python processes can
successfully detect each other and exchange IO_STATE messages:

• _broker.py: rewrite as topology-aware broker — tracks which clients
  have sent END (branch closed) and returns NOK to subsequent PINGs
  from the master once all peers are assigned, preventing infinite
  re-detection loops.  Resets topology state on START_DETECTION.

• _engine.py: call init() before phy init symbol in load_phy so that
  Phy_Init reserves phy slot 0 for luos_phy before Phy_Create assigns
  ws_network to slot 1 (fixes phy slot corruption that silenced all
  ws_network traffic).

• _service.py: make find_peer retry detect() in a loop until the peer
  appears or the timeout expires, accommodating the asynchronous WS
  connection establishment.

• _ws_worker.py: send toggle_count+5 frames at 50 ms intervals to
  absorb the occasional single-frame drop under concurrent C threads.

• test_network_blinker.py: new acceptance test – two spawn-context
  processes exchange ≥10 IO_STATE toggles over a local broker.
  31 tests pass.
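
The find_peer retry described in the _service.py item follows a common poll-until-deadline pattern, sketched here. `retry_until` is a hypothetical helper; `probe` stands for one detect-and-lookup attempt that returns None while the peer is still missing.

```python
import time


def retry_until(probe, timeout_s, interval_s=0.05,
                clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns a non-None value or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Using a monotonic clock keeps the deadline stable if the wall clock is adjusted during the test, and probing before checking the deadline guarantees at least one attempt even with a zero timeout.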

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
examples/led_ws.py prints IO_STATE events; examples/blinker_ws.py sends
toggles every second. Both read LUOS_WS_BROKER (default
ws://127.0.0.1:8000). Intended to run in two terminals after a broker
is started in a third.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a Network phy section with a three-terminal example (broker + two
Python processes) and replaces the Phase 1 placeholder Limitations
with the actual Phase 2 shape: ws only, load-once-per-process, fixed
broker at load_phy time, macOS+Linux. Calls out that the broker must
implement the Luos topology-control protocol (PING/OK/NOK/END) — a
naive relay doesn't work. Adds Phase 2 spec and plan links.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Luos WebSocket broker already exists as the canonical implementation
in the Pyluos package (pyluos.tools.ws_broker, originally at
network/ws_network/broker.py before commit 26ba43e moved it out of
this repo in 2024). Our tests/_broker.py was an inadvertent
re-implementation of the same PING/OK/NOK/END topology protocol.

Replace [test] dep websockets>=12 with pyluos>=3.1; rewrite the broker
fixture to spawn `python -m pyluos.tools.ws_broker --ip 127.0.0.1
--port <chosen>`, picking the port via socket.bind(('127.0.0.1', 0)).
Wait for the broker's "opened on" stdout line before yielding the URL.
Delete tests/_broker.py.

README updated to point users at `pip install pyluos` rather than the
deleted in-tree script.
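
The fixture's port selection and startup wait can be sketched as below. The helper names are hypothetical and this is not the actual conftest code; the module path, the `--ip` and `--port` options, and the "opened on" marker are taken from the message above.

```python
import socket
import sys


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def broker_command(port, host="127.0.0.1"):
    """Command line used to spawn the pyluos broker as a subprocess."""
    return [sys.executable, "-m", "pyluos.tools.ws_broker",
            "--ip", host, "--port", str(port)]


def wait_for_line(lines, marker="opened on"):
    """Consume lines until one contains marker; True if the marker was seen."""
    for line in lines:
        if marker in line:
            return True
    return False
```

The socket is closed before the broker starts, so another process could claim the port in between; for a local test fixture that window is usually acceptable.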

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>