17 changes: 3 additions & 14 deletions v1/demand_forecasting/README.md
@@ -21,9 +21,6 @@ tags:

Retail planners forecast unit sales at fine granularity — per store, per item, per day — to drive replenishment, promotions, and labour planning. Classical demand-forecasting models score one (store, item) series in isolation; they miss the fact that store A's sales of bread move with bakery sales across the chain, and that bakery sells through similarly to dairy. This template wires those hierarchies into a **Predictive** reasoner: a regression GNN trained over a heterogeneous Sale → Store, Sale → Item, Item → ItemFamily graph, so the model propagates signal through the store and product hierarchies while predicting per-Sale unit sales.
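To make the hierarchy concrete, here is a minimal pandas sketch that derives the three edge types (Sale → Store, Sale → Item, Item → ItemFamily) from flat sales and item tables. The column names (`sale_id`, `store_nbr`, `item_nbr`, `family`), the `build_edges` helper, and the toy rows are hypothetical. The template itself defines these edges on its RelationalAI concepts rather than materializing edge lists like this.

```python
import pandas as pd

# Hypothetical flat inputs: one row per sale, one row per item.
sales = pd.DataFrame({
    "sale_id": [1, 2, 3, 4],
    "store_nbr": [10, 10, 20, 20],
    "item_nbr": [100, 101, 100, 102],
})
items = pd.DataFrame({
    "item_nbr": [100, 101, 102],
    "family": ["BREAD/BAKERY", "DAIRY", "BREAD/BAKERY"],
})


def build_edges(sales: pd.DataFrame, items: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Return one (src, dst) edge list per relation type (hypothetical helper)."""
    return {
        # Sale -> Store: each sale points at the store that recorded it.
        "sale_store": sales[["sale_id", "store_nbr"]].rename(
            columns={"sale_id": "src", "store_nbr": "dst"}),
        # Sale -> Item: each sale points at the item sold.
        "sale_item": sales[["sale_id", "item_nbr"]].rename(
            columns={"sale_id": "src", "item_nbr": "dst"}),
        # Item -> ItemFamily: one edge per item, deduplicated.
        "item_family": items[["item_nbr", "family"]].drop_duplicates().rename(
            columns={"item_nbr": "src", "family": "dst"}),
    }


edges = build_edges(sales, items)
for name, df in edges.items():
    print(name, len(df))
```

Signal from store 10's bread sales can reach store 20 through the shared item node (item 100) and through the shared `BREAD/BAKERY` family node. This is the multi-hop propagation the paragraph above describes.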

> [!IMPORTANT]
> The RelationalAI **predictive reasoner (GNN)** used in this template is in early access. The API surface (`GNN`, `PropertyTransformer`, task relationships) may still change between releases; check the `rai-predictive-modeling` and `rai-predictive-training` skills for current guidance before adapting to production data.

## Who this is for

- Retail data scientists building per-(store, item, day) demand-forecasting pipelines who want to add hierarchical signal (item family, store cluster) without manually engineering features
@@ -67,7 +64,7 @@ GRANT ALL PRIVILEGES ON SCHEMA FAVORITA_MINI.EXPERIMENTS TO APPLICATION RELATION
### Tools

- Python >= 3.10
- RelationalAI Python SDK (`relationalai`)
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`)

## Quickstart

@@ -140,7 +137,7 @@ Test-set RMSE (per (city, family, week)): 150.8997
```

> [!NOTE]
> The GNN learns base-level demand and weekday/weekend seasonality cleanly. The December holiday spike is partially captured but under-shot — that's because the SDK's `has_time_column=True` temporal indexing is currently disabled in this template (see [Customize this template](#customize-this-template) and the troubleshooting note below). The pandas-level temporal split is preserved (we still train on the past and evaluate on the future), but the GNN itself sees the date as a flat datetime feature rather than a temporal index. When the SDK's time-aware mode is stable for this dataset shape, re-enabling it should improve the December-spike capture.
> The GNN learns base-level demand and weekday/weekend seasonality cleanly. The December holiday spike is partially captured but under-shot — Sale.date is exposed as a flat datetime feature, not a temporal index, so the GNN doesn't aggregate over time windows. The pandas-level temporal split is preserved (we still train on the past and evaluate on the future). To trade simplicity for tighter spike capture, see the "Use temporal indexing" variant in [Customize this template](#customize-this-template).
Review comment (Contributor): Same as before, I think.
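The headline metric in the hunk header above (`Test-set RMSE (per (city, family, week))`) is computed on weekly aggregates, not on per-Sale rows. Below is a minimal pandas sketch of that kind of roll-up: sum actuals and predictions into weekly buckets per group, then take the RMSE across buckets. The `weekly_rmse` helper, the column names (`store`, `family`, `date`, `unit_sales`, `prediction`), the Sunday-ending week (`freq="W"`), and the toy rows are all assumptions, not the template's code.

```python
import numpy as np
import pandas as pd


def weekly_rmse(df: pd.DataFrame, keys: list[str]) -> float:
    """Sum actuals and predictions per (keys, week), then compute RMSE."""
    weekly = (
        df.groupby(keys + [pd.Grouper(key="date", freq="W")])[["unit_sales", "prediction"]]
        .sum()
    )
    err = weekly["prediction"] - weekly["unit_sales"]
    return float(np.sqrt((err ** 2).mean()))


# Toy per-Sale predictions spanning two calendar weeks (hypothetical data).
preds = pd.DataFrame({
    "store": [1, 1, 1, 1],
    "family": ["DAIRY"] * 4,
    "date": pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-09", "2017-01-10"]),
    "unit_sales": [10.0, 12.0, 8.0, 6.0],
    "prediction": [11.0, 13.0, 7.0, 5.0],
})
print(weekly_rmse(preds, ["store", "family"]))  # 2.0
```

Each week's error is the difference of sums (+2 in the first week, -2 in the second), so per-row errors of the same sign compound within a week. This is one reason a weekly RMSE is not directly comparable to a per-Sale RMSE.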


## Template structure

@@ -291,7 +288,7 @@ results_df = (

## Customize this template

- **Re-enable temporal indexing** when the SDK ships a stable fix — set `has_time_column=True`, restore `time_col=[Sale.date]` in the PropertyTransformer, restore the date arg in the Train/Val/Test relationships (`f"{Sale} at {Any:date} has {Any:value}"`), and add `temporal_strategy="last"` to the `GNN(...)` constructor. The December holiday spike should predict better.
- **Use temporal indexing instead** — for tighter holiday/seasonal spike capture, set `has_time_column=True`, restore `time_col=[Sale.date]` in the PropertyTransformer, restore the date argument in the Train/Val/Test relationships (`f"{Sale} at {Any:date} has {Any:value}"`), and add `temporal_strategy="last"` to the `GNN(...)` constructor. This trades simplicity for a GNN that aggregates over time windows.
Review comment (Contributor): One last place with the same issue.

- **Forecast different granularity** — change `TEST_DAYS` / `VAL_DAYS` at the top of the script. Default is a 60-day test window after a 60-day val window.
- **Add weather, promotions calendar, holiday flags** — extend `Sale` with extra columns and add them to `PropertyTransformer.category` or `.continuous` as appropriate. The same hierarchical-graph + GNN scaffold absorbs new features without restructuring.
- **Bring more hierarchy in** — the bundled data has Item → ItemFamily. Real Favorita data has Item → Class → Family → Department. Define a `Class` and `Department` concept the same way `ItemFamily` is defined, add `Class → Family` and `Family → Department` edges, and the GNN propagates through deeper product hierarchies.
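The `TEST_DAYS` / `VAL_DAYS` knob in the list above corresponds to a date-based split along these lines. The sketch assumes a DataFrame with a `date` column. The `temporal_split` helper and its inclusive-last-day cutoff arithmetic are assumptions rather than the script's actual code; only the two constant names and the 60/60 defaults come from this README.

```python
import pandas as pd

TEST_DAYS = 60  # README default: 60-day test window
VAL_DAYS = 60   # README default: 60-day val window before the test window


def temporal_split(df: pd.DataFrame, date_col: str = "date",
                   test_days: int = TEST_DAYS, val_days: int = VAL_DAYS):
    """Hypothetical helper: test is the last `test_days` days (inclusive of the
    final date), val is the `val_days` immediately before it, train is the rest."""
    last = df[date_col].max()
    test_start = last - pd.Timedelta(days=test_days - 1)
    val_start = test_start - pd.Timedelta(days=val_days)
    train = df[df[date_col] < val_start]
    val = df[(df[date_col] >= val_start) & (df[date_col] < test_start)]
    test = df[df[date_col] >= test_start]
    return train, val, test


days = pd.DataFrame({"date": pd.date_range("2017-01-01", periods=200, freq="D")})
train, val, test = temporal_split(days)
print(len(train), len(val), len(test))  # 80 60 60
```

This split only controls which rows carry labels in which task table. It does not stop the GNN from seeing future neighbors when building subgraphs; that requires temporal indexing.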
@@ -353,14 +350,6 @@ model = Model("demand_forecasting_local_v2") # bump on each re-run if needed
```
</details>

<details>
<summary>Train job failures with date columns at scale (<code>has_time_column=True</code>)</summary>

PyRel 1.0.x has a server-side `DateTime/VString` signature mismatch when `has_time_column=True` is paired with a date column at non-trivial dataset sizes. Symptoms include train jobs that hang at "Step 2/4: Preparing model for prediction" with no JOBS row, or fail with a SQL signature error.

Workaround (used as the default in this template): keep the date as a plain `datetime` feature in `PropertyTransformer`, but set `has_time_column=False` and drop `time_col` / `temporal_strategy`. Preserve the temporal split in pandas before building task tables. See [Customize this template](#customize-this-template) for the instructions to re-enable temporal indexing once the SDK fix lands.
</details>

## Related templates

- **`subscriber_retention`** — sibling Predictive template using a regression GNN on a homogeneous call graph (no time column); useful as a comparison for the simpler-graph case
17 changes: 7 additions & 10 deletions v1/demand_forecasting/demand_forecasting.py
@@ -13,9 +13,8 @@
graph so the GNN can propagate signal through the store and item
hierarchies.
3. Train a regression GNN predicting Sale.unit_sales. Sale.date is fed
as a plain datetime feature, not as a temporal index -- see the
`has_time_column=False` NOTE inline for the SDK workaround and the
README "Customize this template" for re-enabling temporal indexing.
as a plain datetime feature, not as a temporal index; the temporal
split is done in pandas before the task tables are built (see step 4).
4. Generate per-Sale predictions on a forward-looking 60-day test window
(temporal split done in pandas before the task tables are built) and
aggregate to weekly per-(store, family) forecasts.
@@ -131,13 +130,11 @@
continuous=[Store.cluster],
integer=[Item.item_class],
datetime=[Sale.date],
# NOTE: time_col disabled. The PyRel 1.0.x predictive backend has a known
# DateTime/VString signature mismatch when has_time_column=True is used
# with a date column at scale; the workaround is to keep the date as a
# plain datetime feature (above) and disable temporal indexing. The
# split is still temporal — see the train_mask / val_mask / test_mask
# assignments below — so we still train on the past and evaluate on
# the future, just without temporal-strategy aggregation in the GNN.
# Sale.date is exposed as a plain datetime feature above; we don't set
Review comment (Contributor): I haven't seen the whole script end to end; however, this statement seems a bit confusing to me. Even though the train/val/test split is temporal, if we do not provide a time column, the neighbor subgraphs that the GNN engine builds for each training node will include neighbors from the future. For example, in product recommendation, if the task is not temporal, the subgraph for a customer node on 1/1/2026 will include links to future transactions (after 1/1), and this means information leakage. So I think either this sentence should change, or we should use the time column in the code for this specific case.

# time_col here so the GNN treats it as a regular feature rather than a
# temporal index. The split is still temporal — see the
# train_mask / val_mask / test_mask assignments below — so we still
# train on the past and evaluate on the future.
)

# --------------------------------------------------
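The reviewer's leakage point can be illustrated with a small, framework-independent sketch. When neighbors are gathered for a seed node, a time-aware sampler keeps only edges stamped at or before the seed's timestamp; a time-agnostic sampler keeps all of them. This is plain pandas and does not show how the RelationalAI predictive engine samples subgraphs. The edge table, the column names, and the `neighbors` helper are hypothetical.

```python
import pandas as pd

# Hypothetical edge table: customer -> transaction, each stamped with a date.
edges = pd.DataFrame({
    "customer": ["c1", "c1", "c1", "c2"],
    "txn": ["t1", "t2", "t3", "t4"],
    "ts": pd.to_datetime(["2025-12-20", "2026-01-01", "2026-01-15", "2025-12-31"]),
})


def neighbors(edges: pd.DataFrame, node: str, seed_ts=None) -> list[str]:
    """Return transactions linked to `node`. If `seed_ts` is given, drop edges
    stamped after it so the sampled neighborhood only contains the past."""
    mask = edges["customer"] == node
    if seed_ts is not None:
        mask &= edges["ts"] <= pd.Timestamp(seed_ts)
    return sorted(edges.loc[mask, "txn"])


print(neighbors(edges, "c1"))                # ['t1', 't2', 't3']: t3 is from the future
print(neighbors(edges, "c1", "2026-01-01"))  # ['t1', 't2']
```

Whether an edge stamped exactly at the seed time should be visible (`<=` here, `<` otherwise) is a modeling choice that depends on when the label becomes known.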
2 changes: 1 addition & 1 deletion v1/demand_forecasting/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: demand_forecasting (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai==1.0.14",
"relationalai[gnn]==1.4.2",
"pandas",
"numpy",
]
31 changes: 4 additions & 27 deletions v1/fraud-detection/README.md
@@ -31,13 +31,6 @@ Fraud and risk teams face four interconnected problems: discovering suspicious s

**For the older rule-based-only take** (no ML), see `fraud_detection_rules.ipynb` -- a standalone notebook using Weakly Connected Components on shared-identifier edges to flag suspicious users.

> [!IMPORTANT]
> The RelationalAI **predictive reasoner (GNN)** used in this template is in
> early access. The API surface (`GNN`, `PropertyTransformer`, task
> relationships) may still change between releases; check the
> `rai-predictive-modeling` and `rai-predictive-training` skills for the
> current guidance before adapting to production data.

## Who this is for

- Data scientists building end-to-end ML-to-optimization pipelines on transaction graphs
@@ -89,7 +82,7 @@ you'll additionally need:
### Tools

- Python >= 3.10
- RelationalAI Python SDK (`relationalai`) `==1.0.14`
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`)
Review comment (Collaborator): @pkouki, is it true that you need to install with `pip install relationalai[gnn]`? Since this is in public preview now, shouldn't it just come with the standard `pip install relationalai`?

- For the rule-based notebook only: `jupyter`

## Quickstart
@@ -136,10 +129,8 @@ Snowflake dataset (accounts + transactions + train/val/test task tables):
SCHEMA = "YOUR_SCHEMA" # schema with ACCOUNTS, TRANSACTIONS, TRAIN, VAL, TEST
```
2. Adjust the `PropertyTransformer` to match your columns -- drop your PKs/FKs
explicitly, annotate categoricals and continuous fields, and -- if your
data is small enough that the GNN's datetime pipeline doesn't choke on it
-- set `time_col` on your timestamp column. (See the "has_time_column"
troubleshooting note below for the workaround at scale.)
explicitly, annotate categoricals and continuous fields, and set `time_col`
on your timestamp column.
3. If your task tables use different column names, update the `Relationship`
templates (and any `TrainTable.<column>` accesses) to match.
4. Run against a GPU-enabled RAI engine:
@@ -342,10 +333,7 @@ alongside the raw transaction fields.

Task relationships encode the `isFraud` label on train/val and omit it on
test. Both the local and Snowflake reference scripts use temporal
Relationships (`at {Any:step_ts}`) and `has_time_column=True`. At
multi-million-row scale the GNN's datetime pipeline can hit a server-side
`ValidationError` -- if you encounter that adapting to your own data, see
the troubleshooting block below for the workaround (drop temporal handling).
Relationships (`at {Any:step_ts}`) and `has_time_column=True`.
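The convention of keeping the label on train/val and omitting it on test can be sketched in pandas before any `Relationship` is defined. Only `step`, `step_ts`, and `isFraud` appear in this README. The `txn_id` column, the `build_task_tables` helper, the step cutoffs, and the toy rows are hypothetical.

```python
import pandas as pd


def build_task_tables(txns: pd.DataFrame, val_step: int, test_step: int):
    """Hypothetical helper: split by `step` cutoff (train < val_step <= val <
    test_step <= test). Train and val keep isFraud; test omits it."""
    cols = ["txn_id", "step_ts"]
    train = txns.loc[txns["step"] < val_step, cols + ["isFraud"]]
    val = txns.loc[(txns["step"] >= val_step) & (txns["step"] < test_step),
                   cols + ["isFraud"]]
    test = txns.loc[txns["step"] >= test_step, cols]
    return train, val, test


txns = pd.DataFrame({
    "txn_id": [1, 2, 3, 4, 5, 6],
    "step": [1, 2, 3, 4, 5, 6],
    "step_ts": pd.date_range("2024-01-01", periods=6, freq="D"),
    "isFraud": [0, 1, 0, 0, 1, 0],
})
train, val, test = build_task_tables(txns, val_step=3, test_step=5)
print(len(train), len(val), len(test), "isFraud" in test.columns)  # 2 2 2 False
```

Keeping `step_ts` in every table is what lets a temporal relationship of the form `at {Any:step_ts}` attach a timestamp to each task row.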

```python
Train = Relationship(f"{Transaction} at {Any:step_ts} has {Any:label}")
@@ -455,17 +443,6 @@ the tradeoff is visible.
- Degenerate (selects 0 transactions): no transactions have an alert_score. Confirm `Transaction.predictions` was populated (test split present + GNN fit succeeded).
</details>

<details>
<summary><code>has_time_column=True</code> fails validation (two known triggers)</summary>

Known limitation in the predictive reasoner — the GNN's datetime feature pipeline can fail in two distinct cases:

1. **Edge-intermediary case** (small-data trigger, documented in `rai-predictive-training`): when the concept carrying `time_col` is used only as an edge intermediary (not a node), validation fails with *"no time column defined in data tables"*.
2. **Large-data trigger** (encountered while scaling this template's full Snowflake path): with a Snowflake `VARCHAR` ISO-8601 timestamp column loaded via `Table().to_schema()`, training fails server-side with *"ValidationError: Error processing datetime column 'step_ts'"* — even when the column is a node property, format is correct, and there are no NULLs. The bundled local CSV path (which uses `model.data(df).to_schema()` after `parse_dates=...`) does not hit this.

**Workaround for both:** set `has_time_column=False` in the `GNN(...)` constructor, drop `temporal_strategy=...`, strip the `at {Any:step_ts}` clauses from your Train/Val/Test relationship templates, and comment out `datetime=` and `time_col=` from your `PropertyTransformer`. Build the train/val/test split tables by `step` cutoff in SQL (the temporal split is preserved in the data even if the GNN can't use the timestamp as a feature).
</details>

<details>
<summary>Spinner floods the log when running in CI / non-TTY</summary>

2 changes: 1 addition & 1 deletion v1/fraud-detection/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: fraud_detection (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai==1.0.14",
"relationalai[gnn]==1.4.2",
"pandas>=2.0",
"numpy",
"jupyter",
9 changes: 1 addition & 8 deletions v1/retail_planning/README.md
@@ -25,13 +25,6 @@ Retailers face interconnected decisions: which items will sell, which customers

**Then adapt the pattern to your own Snowflake data** using `retail_planning.py` as a reference. It trains three GNNs (sales regression, customer-churn classification, user-article link prediction) against the full Kaggle H&M dataset in Snowflake, aggregates all three signals into an adjusted demand estimate, and feeds that into the same two optimizers. The H&M pipeline is the worked example -- the structure (graph concepts → GNN tasks → aggregation bridge → prescriptive constraints) is what carries over to your own retail, pricing, or demand-planning data.

> [!IMPORTANT]
> The RelationalAI **predictive reasoner (GNN)** used in this template is in
> private preview. The API surface (`GNN`, `PropertyTransformer`, task
> relationships) may still change between releases; check the
> `rai-predictive-modeling` and `rai-predictive-training` skills for the
> current guidance before adapting to production data.

## Who this is for

- Data scientists building end-to-end ML-to-optimization pipelines
@@ -83,7 +76,7 @@ you'll additionally need:
### Tools

- Python >= 3.10
- RelationalAI Python SDK (`relationalai`) == 1.4.1
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`)

## Quickstart

2 changes: 1 addition & 1 deletion v1/retail_planning/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: retail_planning (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai[gnn]==1.4.1",
"relationalai[gnn]==1.4.2",
"pandas>=2.0",
]

5 changes: 1 addition & 4 deletions v1/smoker_status_prediction/README.md
@@ -18,9 +18,6 @@ tags:

Predicting health-related behaviors like smoking status from medical and demographic data is a common tabular machine learning task. In practice, though, these behaviors are also shaped by social context: friends, family, and peers often influence one another. This template demonstrates how to model both individual attributes and social relationships with a Graph Neural Network (GNN), using the RelationalAI **Predictive** reasoner to train a single end-to-end model.

> [!IMPORTANT]
> The RelationalAI **predictive reasoner (GNN)** used in this template is in early access. The API surface (`GNN`, `PropertyTransformer`, task relationships) may still change between releases; check the `rai-predictive-modeling` and `rai-predictive-training` skills for current guidance before adapting to production data.

## Who this is for

- Data scientists who want to leverage the relational structure of data stored across connected tables
@@ -57,7 +54,7 @@ Assumes familiarity with Python and basic ML concepts (binary classification, tr
### Tools

- Python >= 3.10
- RelationalAI Python SDK (`relationalai`) >= 1.4.2
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`)

## Quickstart

2 changes: 1 addition & 1 deletion v1/smoker_status_prediction/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: smoker_status_prediction (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai==1.4.2",
"relationalai[gnn]==1.4.2",
"pandas>=2.0",
]

5 changes: 1 addition & 4 deletions v1/subscriber_retention/README.md
@@ -23,9 +23,6 @@ tags:

Telco retention teams need to score every active subscriber for churn risk so they can target proactive offers at the right people before contracts roll over. Traditional churn models lean on plan attributes (rate, term, auto-renew) and demographics; they ignore the network around each subscriber. This template wires a call-graph signal into the model: who you call, who calls you, and how central you sit in the call network all become features, and the **Predictive** reasoner trains a GNN regression head over them. The graph features come from the **Graph** reasoner (PageRank on the Subscriber→Subscriber call graph); aggregate-derived `outgoing_calls` / `incoming_calls` properties round out the per-subscriber feature row.

> [!IMPORTANT]
> The RelationalAI **predictive reasoner (GNN)** used in this template is in early access. The API surface (`GNN`, `PropertyTransformer`, task relationships) may still change between releases; check the `rai-predictive-modeling` and `rai-predictive-training` skills for current guidance before adapting to production data.

## Who this is for

- Telco data scientists building churn-risk scoring pipelines that combine static plan attributes with relational/network signal
@@ -69,7 +66,7 @@ GRANT ALL PRIVILEGES ON SCHEMA TELCO_ENRICHMENT.EXPERIMENTS TO APPLICATION RELAT
### Tools

- Python >= 3.10
- RelationalAI Python SDK (`relationalai`)
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`)

## Quickstart

2 changes: 1 addition & 1 deletion v1/subscriber_retention/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: subscriber_retention (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai==1.0.14",
"relationalai[gnn]==1.4.2",
"pandas",
"numpy",
]
11 changes: 2 additions & 9 deletions v1/telco_network_recovery/README.md
@@ -47,7 +47,7 @@ Each stage writes derived properties back to the same ontology that downstream s

- **Accretive ontology enrichment** — each stage writes derived properties that downstream stages consume as first-class attributes. No glue code, no DataFrame round-trips between stages (except where the GNN's prediction shape needs a one-step pandas aggregation before binding back).
- **Heterogeneous-graph GNN** — three FK / shared-MODEL edges (`EquipmentHealth → NetworkEquipment`, `NetworkEquipment → CellTower`, `ModelAdvisory → NetworkEquipment`) so advisory severity propagates to every fleet sibling AND reaches tower-mate equipment via 2-hop paths.
- **Property-equality edges** — the GNN graph defines edges via `==` between FK columns instead of `model.Relationship` traversal. This pattern sidesteps an SDK iteration-mutation bug and is the recommended shape for any concept that participates in a GNN graph and has cross-pointing relationships.
- **Property-equality edges** — the GNN graph defines edges via `==` between FK columns instead of `model.Relationship` traversal. FK properties on `NetworkEquipment` and `EquipmentHealth` carry the join keys explicitly so heterogeneous edges read as property-level equality conditions.
- **Bridge concept** — per-equipment predictions are aggregated in pandas (`sum`) and loaded back as a `CellTower.failure_intensity` property via a small `TowerFailureScore` concept. Same pattern as in `retail_planning`.
- **Three-branch rule** — `CellTower.is_critical_restore` is defined three times (OR semantics). A tower is critical if any branch fires; the third branch lets the GNN broaden scope beyond WEST.
- **Three-factor MIP objective** — `capacity_increase × weighted_impact × failure_intensity`. Each factor comes from a different reasoner upstream.
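The bridge step and the three-factor product from the list above can be sketched in pandas. The sketch sums per-equipment predictions into a per-tower `failure_intensity`, joins it onto the towers, and multiplies the three factors. Only `tower_id_fk`, `failure_intensity`, `capacity_increase`, and `weighted_impact` come from this README. The other column names and all numbers are hypothetical. The README does not say whether `capacity_increase` is a solver decision or an input, so the sketch treats it as a fixed number per tower, as in a candidate plan. In the template, the aggregate is loaded back through the `TowerFailureScore` concept rather than used directly like this.

```python
import pandas as pd

# Hypothetical per-equipment GNN predictions, keyed by the FK to the tower.
preds = pd.DataFrame({
    "equipment_id": ["e1", "e2", "e3"],
    "tower_id_fk": ["T1", "T1", "T2"],
    "prediction": [0.25, 0.5, 0.125],
})
# Hypothetical per-tower factors (capacity_increase treated as a candidate plan).
towers = pd.DataFrame({
    "tower_id": ["T1", "T2"],
    "capacity_increase": [10.0, 4.0],
    "weighted_impact": [2.0, 3.0],
})

# Bridge: sum predictions per tower, as the README describes.
intensity = (
    preds.groupby("tower_id_fk", as_index=False)["prediction"].sum()
    .rename(columns={"tower_id_fk": "tower_id", "prediction": "failure_intensity"})
)

# Join back onto towers; towers with no equipment predictions get 0.
scored = towers.merge(intensity, on="tower_id", how="left").fillna({"failure_intensity": 0.0})
# Three-factor objective term per tower.
scored["objective_term"] = (
    scored["capacity_increase"] * scored["weighted_impact"] * scored["failure_intensity"]
)
print(scored[["tower_id", "failure_intensity", "objective_term"]])
print(scored["objective_term"].sum())  # 16.5
```

Summing (rather than averaging) means a tower with many risky pieces of equipment scores higher than a tower with one. This matches the README's choice of `sum` as the aggregation.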
@@ -90,7 +90,7 @@ Each stage writes derived properties back to the same ontology that downstream s
### Tools

- Python ≥ 3.10.
- RelationalAI Python SDK with the predictive submodule (`relationalai.semantics.reasoners.predictive`).
- RelationalAI Python SDK with the predictive extra (`relationalai[gnn] == 1.4.2`).

### One-time Snowflake setup for GNN experiment artifacts

@@ -309,13 +309,6 @@ Verify with `SHOW GRANTS ON SCHEMA <DB>.EXPERIMENTS` — you should see `OWNERSH

</details>

<details>
<summary>GNN training raises <code>RuntimeError: dictionary changed size during iteration</code></summary>

This is a known SDK issue when a concept that participates in the GNN graph also carries a `model.Relationship` (the iteration over `concept._relationships` mutates mid-loop). The template works around it by using **property-equality edges** — FK columns (`tower_id_fk`, `equipment_id_fk`) joined via `==` in edge definitions instead of relationship traversal. If you add new edges, keep this pattern.

</details>

<details>
<summary>Stage 4 returns an infeasible status</summary>

2 changes: 1 addition & 1 deletion v1/telco_network_recovery/pyproject.toml
@@ -9,7 +9,7 @@ description = "RelationalAI template: telco_network_recovery (PyRel v1)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"relationalai==1.4.2",
"relationalai[gnn]==1.4.2",
"pandas>=2.0",
]
