A focused review of the selector / where-predicate / broadcast paths
added in PRs #28 and #29 surfaced three gaps. Each is small in
isolation; raising as one issue since they cluster around input
validation and resource bounds on the same code paths.
1. No length cap on selector strings or where expressions
device-connect-edge/device_connect_edge/selector.py and
predicate.py:compile_where accept arbitrarily long inputs. A
pathological selector (megabytes of nested filters) or a
megabyte-scale CEL expression goes through the full parse + compile
path before any complexity check. Suggested constants:
MAX_SELECTOR_LEN (~4096 chars) and MAX_WHERE_LEN (~2048 chars),
enforced at the parse / compile entry points.
2. fire_at is not checked for finite
device-connect-edge/device_connect_edge/device.py:1284-1296
(_handle_broadcast_envelope):
fire_at = envelope.get("fire_at")
on_late = envelope.get("on_late", "skip")
if fire_at is not None:
delay = float(fire_at) - time.time()
...
if delay > 0:
await asyncio.sleep(delay)
A producer that sends fire_at: NaN makes delay = NaN. The
subsequent delay > 0 is False (NaN compares false to everything),
so the sleep is skipped and the procedure fires immediately -- the
opposite of the producer's intent. fire_at: Infinity -> the sleep
becomes unbounded; fire_at: -Infinity -> delay = -inf, wrong
branch. A math.isfinite(fire_at) guard at the top of the
if fire_at is not None: block catches all three cases (suggested
behavior: reject the envelope, or treat as fire_at=None).
3. CEL where evaluation blocks the event loop, no timeout
device-connect-edge/device_connect_edge/predicate.py:
WherePredicate.evaluate is synchronous and returns bool(result)
directly. Called from the async broadcast handler
(_handle_broadcast_envelope -> _evaluate_where_with_timeout), a
slow or expensive CEL expression stalls the whole event loop for the
duration of evaluation, with no hard timeout enforcement at the call
site.
Two complementary options:
- Wrap the CEL call in
asyncio.wait_for(loop.run_in_executor(...))
with a bounded timeout so a single slow predicate cannot stall the
edge loop for arbitrary time.
- Bound the executor pool size + queue depth, otherwise a flood of
slow predicates exhausts the pool. cel-python's evaluator is
single-threaded, so the pool is the only knob.
A pure-best-effort approach (document the bound, let callers opt in)
is also defensible if hard isolation is judged too heavy for the
typical workload -- but the current state has no documented bound at
all.
Why these together
All three are unvalidated / unbounded edges on input the broadcast
handler and predicate evaluator trust implicitly. Each individually
is small; together they describe a consistent posture shift the edge
code could make: bound inputs at entry, then trust the typed values
downstream.
Happy to put up a PR if useful; raising as an issue first so the
direction can be confirmed.
A focused review of the selector / where-predicate / broadcast paths
added in PRs #28 and #29 surfaced three gaps. Each is small in
isolation; raising as one issue since they cluster around input
validation and resource bounds on the same code paths.
1. No length cap on selector strings or
whereexpressionsdevice-connect-edge/device_connect_edge/selector.pyandpredicate.py:compile_whereaccept arbitrarily long inputs. Apathological selector (megabytes of nested filters) or a
megabyte-scale CEL expression goes through the full parse + compile
path before any complexity check. Suggested constants:
MAX_SELECTOR_LEN(~4096 chars) andMAX_WHERE_LEN(~2048 chars),enforced at the parse / compile entry points.
2.
fire_atis not checked for finitedevice-connect-edge/device_connect_edge/device.py:1284-1296(
_handle_broadcast_envelope):A producer that sends
fire_at: NaNmakesdelay = NaN. Thesubsequent
delay > 0is False (NaN compares false to everything),so the sleep is skipped and the procedure fires immediately -- the
opposite of the producer's intent.
fire_at: Infinity-> the sleepbecomes unbounded;
fire_at: -Infinity->delay = -inf, wrongbranch. A
math.isfinite(fire_at)guard at the top of theif fire_at is not None:block catches all three cases (suggestedbehavior: reject the envelope, or treat as
fire_at=None).3. CEL
whereevaluation blocks the event loop, no timeoutdevice-connect-edge/device_connect_edge/predicate.py:WherePredicate.evaluateis synchronous and returnsbool(result)directly. Called from the async broadcast handler
(
_handle_broadcast_envelope->_evaluate_where_with_timeout), aslow or expensive CEL expression stalls the whole event loop for the
duration of evaluation, with no hard timeout enforcement at the call
site.
Two complementary options:
asyncio.wait_for(loop.run_in_executor(...))with a bounded timeout so a single slow predicate cannot stall the
edge loop for arbitrary time.
slow predicates exhausts the pool.
cel-python's evaluator issingle-threaded, so the pool is the only knob.
A pure-best-effort approach (document the bound, let callers opt in)
is also defensible if hard isolation is judged too heavy for the
typical workload -- but the current state has no documented bound at
all.
Why these together
All three are unvalidated / unbounded edges on input the broadcast
handler and predicate evaluator trust implicitly. Each individually
is small; together they describe a consistent posture shift the edge
code could make: bound inputs at entry, then trust the typed values
downstream.
Happy to put up a PR if useful; raising as an issue first so the
direction can be confirmed.