Skip to content

Hardening gaps in selector / where / broadcast input paths #50

Description

@soupat

A focused review of the selector / where-predicate / broadcast paths
added in PRs #28 and #29 surfaced three gaps. Each is small in
isolation; raising as one issue since they cluster around input
validation and resource bounds on the same code paths.

1. No length cap on selector strings or where expressions

device-connect-edge/device_connect_edge/selector.py and
predicate.py:compile_where accept arbitrarily long inputs. A
pathological selector (megabytes of nested filters) or a
megabyte-scale CEL expression goes through the full parse + compile
path before any complexity check. Suggested constants:
MAX_SELECTOR_LEN (~4096 chars) and MAX_WHERE_LEN (~2048 chars),
enforced at the parse / compile entry points.

2. fire_at is not checked for finite

device-connect-edge/device_connect_edge/device.py:1284-1296
(_handle_broadcast_envelope):

fire_at = envelope.get("fire_at")
on_late = envelope.get("on_late", "skip")
if fire_at is not None:
    delay = float(fire_at) - time.time()
    ...
    if delay > 0:
        await asyncio.sleep(delay)

A producer that sends fire_at: NaN makes delay = NaN. The
subsequent delay > 0 is False (NaN compares false to everything),
so the sleep is skipped and the procedure fires immediately -- the
opposite of the producer's intent. fire_at: Infinity -> the sleep
becomes unbounded; fire_at: -Infinity -> delay = -inf, wrong
branch. A math.isfinite(fire_at) guard at the top of the
if fire_at is not None: block catches all three cases (suggested
behavior: reject the envelope, or treat as fire_at=None).

3. CEL where evaluation blocks the event loop, no timeout

device-connect-edge/device_connect_edge/predicate.py:
WherePredicate.evaluate is synchronous and returns bool(result)
directly. Called from the async broadcast handler
(_handle_broadcast_envelope -> _evaluate_where_with_timeout), a
slow or expensive CEL expression stalls the whole event loop for the
duration of evaluation, with no hard timeout enforcement at the call
site.

Two complementary options:

  • Wrap the CEL call in asyncio.wait_for(loop.run_in_executor(...))
    with a bounded timeout so a single slow predicate cannot stall the
    edge loop for arbitrary time.
  • Bound the executor pool size + queue depth, otherwise a flood of
    slow predicates exhausts the pool. cel-python's evaluator is
    single-threaded, so the pool is the only knob.

A pure-best-effort approach (document the bound, let callers opt in)
is also defensible if hard isolation is judged too heavy for the
typical workload -- but the current state has no documented bound at
all.

Why these together

All three are unvalidated / unbounded edges on input the broadcast
handler and predicate evaluator trust implicitly. Each individually
is small; together they describe a consistent posture shift the edge
code could make: bound inputs at entry, then trust the typed values
downstream.

Happy to put up a PR if useful; raising as an issue first so the
direction can be confirmed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions