Hardening gaps in selector / where / broadcast input paths

A focused review of the selector / where-predicate / broadcast paths
added in PRs #28 and #29 surfaced three gaps. Each is small in
isolation; raising as one issue since they cluster around input
validation and resource bounds on the same code paths.

## 1. No length cap on selector strings or `where` expressions

`device-connect-edge/device_connect_edge/selector.py` and
`predicate.py:compile_where` accept arbitrarily long inputs. A
pathological selector (megabytes of nested filters) or a
megabyte-scale CEL expression goes through the full parse + compile
path before any complexity check. Suggested constants:
`MAX_SELECTOR_LEN` (~4096 chars) and `MAX_WHERE_LEN` (~2048 chars),
enforced at the parse / compile entry points.

## 2. `fire_at` is not checked for finite

`device-connect-edge/device_connect_edge/device.py:1284-1296`
(`_handle_broadcast_envelope`):

```python
fire_at = envelope.get("fire_at")
on_late = envelope.get("on_late", "skip")
if fire_at is not None:
    delay = float(fire_at) - time.time()
    ...
    if delay > 0:
        await asyncio.sleep(delay)
```

A producer that sends `fire_at: NaN` makes `delay = NaN`. The
subsequent `delay > 0` is False (NaN compares false to everything),
so the sleep is skipped and the procedure fires immediately -- the
opposite of the producer's intent. `fire_at: Infinity` -> the sleep
becomes unbounded; `fire_at: -Infinity` -> `delay = -inf`, wrong
branch. A `math.isfinite(fire_at)` guard at the top of the
`if fire_at is not None:` block catches all three cases (suggested
behavior: reject the envelope, or treat as `fire_at=None`).

## 3. CEL `where` evaluation blocks the event loop, no timeout

`device-connect-edge/device_connect_edge/predicate.py`:
`WherePredicate.evaluate` is synchronous and returns `bool(result)`
directly. Called from the async broadcast handler
(`_handle_broadcast_envelope` -> `_evaluate_where_with_timeout`), a
slow or expensive CEL expression stalls the whole event loop for the
duration of evaluation, with no hard timeout enforcement at the call
site.

Two complementary options:

- Wrap the CEL call in `asyncio.wait_for(loop.run_in_executor(...))`
  with a bounded timeout so a single slow predicate cannot stall the
  edge loop for arbitrary time.
- Bound the executor pool size + queue depth, otherwise a flood of
  slow predicates exhausts the pool. `cel-python`'s evaluator is
  single-threaded, so the pool is the only knob.

A pure-best-effort approach (document the bound, let callers opt in)
is also defensible if hard isolation is judged too heavy for the
typical workload -- but the current state has no documented bound at
all.

## Why these together

All three are unvalidated / unbounded edges on input the broadcast
handler and predicate evaluator trust implicitly. Each individually
is small; together they describe a consistent posture shift the edge
code could make: bound inputs at entry, then trust the typed values
downstream.

Happy to put up a PR if useful; raising as an issue first so the
direction can be confirmed.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hardening gaps in selector / where / broadcast input paths #50

1. No length cap on selector strings or `where` expressions

2. `fire_at` is not checked for finite

3. CEL `where` evaluation blocks the event loop, no timeout

Why these together

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Hardening gaps in selector / where / broadcast input paths #50

Description

1. No length cap on selector strings or where expressions

2. fire_at is not checked for finite

3. CEL where evaluation blocks the event loop, no timeout

Why these together

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. No length cap on selector strings or `where` expressions

2. `fire_at` is not checked for finite

3. CEL `where` evaluation blocks the event loop, no timeout