Add Real-Time Mode (RTM) sub-second latency streaming demo by jiteshsoni · Pull Request #78 · databricks-solutions/databricks-blogposts

jiteshsoni · 2026-03-31T18:29:33Z

Summary

This PR adds a Real-Time Mode demo that reads Ethereum-style block events from Kafka, applies a set of stateless guardrail checks, and routes each record to either an allowed or quarantine topic.

The intent is to show the full Kafka -> Databricks RTM -> Kafka loop in a way that is easy to run, inspect, and explain.
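
To make the routing step concrete, here is a minimal plain-Python sketch of the decision logic, not the notebook's actual code. The reason `HIGH_GAS_USAGE`, the `decision=ALLOW` value, and the `-allowed` / `-quarantine` topic suffixes come from this PR. The field names `gas_used` and `gas_limit`, the 0.95 ratio, the `PARSE_ERROR` reason name, and the `route` / `high_gas_usage` helpers are assumptions made for illustration.

```python
import json
from typing import Callable, Optional

# A check takes a parsed event and returns a reason string, or None if it passes.
Check = Callable[[dict], Optional[str]]


def high_gas_usage(event: dict, ratio_limit: float = 0.95) -> Optional[str]:
    """Hypothetical rule: flag blocks whose gas_used / gas_limit exceeds ratio_limit."""
    gas_used = event.get("gas_used")
    gas_limit = event.get("gas_limit")
    if (
        isinstance(gas_used, (int, float))
        and isinstance(gas_limit, (int, float))
        and gas_limit > 0
        and gas_used / gas_limit > ratio_limit
    ):
        return "HIGH_GAS_USAGE"
    return None


def route(raw: str, topic_prefix: str, checks: list[Check]) -> dict:
    """Apply stateless checks to one record and pick the target topic."""
    try:
        event = json.loads(raw)
        if not isinstance(event, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        reasons = ["PARSE_ERROR"]  # assumed name for the malformed-JSON case
    else:
        reasons = [r for r in (check(event) for check in checks) if r]

    quarantined = bool(reasons)
    return {
        "topic": f"{topic_prefix}-{'quarantine' if quarantined else 'allowed'}",
        "decision": "QUARANTINE" if quarantined else "ALLOW",
        "validation_reasons": reasons,
    }
```

Each record is evaluated on its own, with no state carried between records, which is what makes the guardrail stateless.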

Included in the PR

rtm_stateless_guardrail.py: main notebook
cluster_config.template.json: RTM-capable cluster template
test_rtm_guardrail.py: local validation-pattern tests
e2e_local_test.py: local end-to-end validation logic via Databricks Connect
produce_test_data.py: sample Kafka producer
README.md: setup and testing notes

Setup Notes

A few setup details turned out to matter in practice and are now called out in the code and docs:

use a dedicated single-user cluster
disable autoscaling
keep Photon off
use outputMode("update")
create the Kafka topics explicitly if your provider does not auto-create them
use startingOffsets = "earliest" for demos and integration tests so seeded backlog is replayed
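
The setup notes above can be sketched as a small pre-flight check. This is an illustration only: it assumes the cluster spec uses the Databricks Clusters API fields `data_security_mode`, `autoscale`, and `runtime_engine`, which may not match how `cluster_config.template.json` is actually laid out. The helper names `check_cluster_spec` and `kafka_source_options` are hypothetical.

```python
def check_cluster_spec(spec: dict) -> list[str]:
    """Return the setup notes that a cluster spec (as a dict) appears to violate."""
    problems = []
    # Assumption: dedicated/single-user access is reported under one of these values.
    if spec.get("data_security_mode") not in ("SINGLE_USER", "DATA_SECURITY_MODE_DEDICATED"):
        problems.append("use a dedicated single-user cluster")
    # An "autoscale" block means autoscaling is on; a fixed "num_workers" means it is off.
    if "autoscale" in spec:
        problems.append("disable autoscaling")
    if spec.get("runtime_engine") == "PHOTON":
        problems.append("keep Photon off")
    return problems


def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Kafka reader options for demos: replay the seeded backlog from the start."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }
```

The options dict is meant to be passed to a Kafka `readStream` via `.options(**opts)`. The `outputMode("update")` setting belongs on the write side and is not covered by this sketch.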

The notebook also now includes an explicit verification section that reads the target Kafka topics back and shows what was actually written to -allowed and -quarantine, instead of stopping at query status alone.

How I tested it

Local checks

ran python test_rtm_guardrail.py
verified the pattern checks for email, SSN, credit card, AWS key, JWT, and Ethereum private key detection
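
For reviewers who want a feel for what these pattern checks look like, here is a simplified sketch. The regexes are common approximations, not the ones in `test_rtm_guardrail.py`, and every reason name except `PII_EMAIL` is made up for illustration.

```python
import re

# Simplified, illustrative patterns; the PR's actual expressions may differ.
PATTERNS = {
    "PII_EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PII_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16 digits in groups of four, optionally separated by space or hyphen (no Luhn check).
    "PII_CREDIT_CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    # AWS access key IDs commonly start with AKIA followed by 16 uppercase letters or digits.
    "SECRET_AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Three base64url segments; a JSON header encodes to a string starting with "eyJ".
    "SECRET_JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # 64 hex characters with optional 0x prefix. Block and transaction hashes have
    # the same shape, so a real check would need field context to avoid false positives.
    "SECRET_ETH_PRIVATE_KEY": re.compile(r"\b(?:0x)?[0-9a-fA-F]{64}\b"),
}


def detect(text: str) -> list[str]:
    """Return the reason codes whose pattern matches anywhere in text."""
    return [reason for reason, pattern in PATTERNS.items() if pattern.search(text)]
```

Because each check is a pure function of a single string, it maps naturally onto a stateless per-record column expression in the streaming query.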

Workspace / integration validation

Validated on e2-dogfood using:

cluster: rtm-guardrail-cluster (0313-063110-u4ldfaiy)
Kafka: Redpanda Serverless with SASL/SCRAM over SSL
submit run: 875702710968733
notebook task run: 447049974860353

For the final validation run I created a fresh set of Kafka topics, seeded deterministic records, ran the notebook as a Databricks submit run, and then verified the output topics directly.

Confirmed routing:

4000001 -> ethereum-validated-jobrun-20260401000541-allowed with decision=ALLOW
4000002 -> ethereum-validated-jobrun-20260401000541-quarantine with validation_reasons=["HIGH_GAS_USAGE"]
4000003 -> ethereum-validated-jobrun-20260401000541-quarantine with validation_reasons=["PII_EMAIL"]

Output verification screenshot:

That verified both halves of the demo:

the RTM query runs correctly on the configured cluster
the records are actually written to the expected target topics with the expected routing decisions

This demo showcases Databricks Real-Time Mode for achieving sub-second latency in streaming pipelines. It implements a stateless guardrail pipeline that validates Ethereum blockchain events in real-time.

Features:
- RTM-enabled streaming pipeline (Kafka to Kafka)
- Sensitive data detection (PII, credentials)
- Validation rules for operational guardrails
- Dynamic topic routing (ALLOW/QUARANTINE)
- Parse error handling for malformed JSON
- End-to-end and unit tests

Requirements:
- Databricks Runtime 16.4 LTS or later
- Dedicated clusters (serverless not supported)
- outputMode("update") required for RTM

Files:
- rtm_stateless_guardrail.py - Main notebook
- cluster_config.template.json - Cluster config template
- test_rtm_guardrail.py - Unit tests
- e2e_local_test.py - End-to-end tests
- produce_test_data.py - Test data producer
- README.md - Documentation

Blog: https://canadiandataguy.com/p/unlocking-sub-second-latency-with


matthewmoorcroft · 2026-04-13T08:07:16Z

@jiteshsoni I can see JWT tokens in the code. The README is also missing, and so is your name in the CODEOWNERS file. Can you review the code to make sure there are no security issues?

jiteshsoni requested review from QuentinAmbard, alanreese-dbrx, alexott, anupkalburgi, kwulffert23, matthewmoorcroft, slcc2c and srinivasadmala as code owners March 31, 2026 18:29

jiteshsoni added 5 commits March 31, 2026 16:38

Update RTM demo setup for single-user replay testing

724a29d

Reflect the Databricks + Redpanda integration path that worked in practice by requiring single-user clusters for UC volume checkpoints, documenting explicit topic setup, and replaying backlog from earliest offsets during demos.

Clarify RTM demo test coverage in README

8967a27

Replace the old canned validation notes with the actual staging workspace and Redpanda checks used to verify the demo, including the live routing cases that were confirmed end to end.

Add notebook verification for routed Kafka output

490e2c3

Extend the RTM notebook with an explicit output-topic verification section so the demo shows what was actually written to the allowed and quarantine topics, instead of only showing query status.

Call out notebook verification sections in README

fdc507b

Point readers to the new output-topic verification and stream management sections so the README matches the notebook flow during demos and testing.

Add RTM output verification screenshot

c6a6344

Store the Kafka output verification screenshot in the PR branch so it can be embedded directly in the pull request description.

jiteshsoni wants to merge 6 commits into databricks-solutions:main from jiteshsoni:rtm-pr-review-fixes
