feat(doc-1771): add CDC cookbooks for MySQL, PostgreSQL, MongoDB, SQL Server, TigerBeetle, and GCP Spanner #392

Merged
JakeSCahill merged 48 commits into main from feat/doc-1771-cdc-cookbooks
Jun 17, 2026
@mfernest mfernest commented Mar 12, 2026

Contributor

Summary

  • Adds six new CDC cookbooks covering the full set of CDC input connectors
  • Each cookbook follows a consistent pattern: capture, filter, route to Redpanda, route to S3, route by event type, replication/streaming mode, troubleshoot
  • Includes 5 YAML example files per cookbook (30 total) and nav.adoc entries
  • Cookbooks added: mysql_cdc, postgres_cdc, mongodb_cdc, microsoft_sql_server_cdc, tigerbeetle_cdc, gcp_spanner_cdc

Resolves DOC-1771.

Test plan

  • Local build passes (no errors for new pages)
  • All six cookbook pages appear in the nav under Cookbooks
  • All included YAML example files resolve correctly
  • Netlify deploy preview renders correctly
  • CodeRabbit review addressed

🤖 Generated with Claude Code

Page previews

mfernest and others added 11 commits March 4, 2026 11:15
The broker output page lacked explanation of what a broker is and when
to use it. Adds:
- A plain-English description of broker as a meta-output
- A concrete use case (writing to Kafka and a database simultaneously)
- A note disambiguating broker from Redpanda broker (cluster node)
- A pattern-selection guide before the detailed patterns list

Fixes DOC-1880.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add mysql_cdc.adoc cookbook with patterns for capture, filter,
  route to Redpanda, route to S3, and route by event type
- Add YAML example files for each pattern
- Add nav entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Capitalize title: MySQL CDC Patterns
- Replace em dashes with definition list in replication modes section
- Standardize [source,yaml] to [,yaml] throughout
- Improve opening paragraph voice
- Add explicit link text to rpk xrefs
- Fix "respectively" phrasing
- Change "This example:" to "This pattern:" throughout
- Remove redundant callout annotation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mfernest mfernest requested a review from a team as a code owner March 12, 2026 23:00
netlify Bot commented Mar 12, 2026


Deploy Preview for redpanda-connect ready!

Name Link
🔨 Latest commit 099e0b8
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-connect/deploys/6a31774285d920000845e094
😎 Deploy Preview https://deploy-preview-392--redpanda-connect.netlify.app

coderabbitai Bot commented Mar 12, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6916cd13-2509-407f-aaa2-e2139666ed0a

Walkthrough

This PR adds an AWS CloudWatch Logs input (docs, schema, examples, and navigation), rewrites the broker output page to describe broker as a meta-output with multiple distribution patterns, and introduces comprehensive CDC cookbooks and example configs for MySQL, PostgreSQL, MongoDB, Microsoft SQL Server, GCP Spanner, and TigerBeetle (pages and many YAML examples). It also updates manifests/docs-data for the 4.81.0 release, adjusts several field descriptions (e.g., seed_brokers), adds example snippets, and tweaks test scripts to support skipping lint in certain test configs.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Source
participant Broker as "Broker (meta-output)"
participant OutputA as "Child Output A"
participant OutputB as "Child Output B"
Note over Broker: fan_out (parallel delivery)
Source->>Broker: Emit message
Broker->>OutputA: Deliver (parallel)
Broker->>OutputB: Deliver (parallel)
OutputA-->>Broker: Ack / Nack
OutputB-->>Broker: Ack / Nack
alt Any Nack / Backpressure
Broker->>Broker: Retry / Block (unless drop_on configured)
end

mermaid
sequenceDiagram
participant Source
participant Broker as "Broker (meta-output)"
participant Output1 as "Child Output 1"
participant Output2 as "Child Output 2"
Note over Broker: round_robin (sequential distribution)
Source->>Broker: Emit message M1
Broker->>Output1: Deliver M1
Output1-->>Broker: Ack
Source->>Broker: Emit message M2
Broker->>Output2: Deliver M2
Output2-->>Broker: Ack

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • micheleRP
  • supahcraig
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding six new CDC cookbooks for multiple database systems with consistent patterns. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description clearly relates to the changeset, detailing the addition of six CDC cookbooks with consistent patterns and providing specific files, test plans, and references. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 6

🧹 Nitpick comments (6)
modules/cookbooks/examples/microsoft_sql_server_cdc/to-redpanda.yaml (1)

19-20: Key extraction assumes all tables have an id field.

Same concern as the MySQL to-redpanda example: key: ${! json("id") } assumes records contain an id field. Tables like dbo.orders might use order_id as the primary key.

Consider adding a note in the cookbook documentation that users should adjust the key field to match their table's primary key column.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/examples/microsoft_sql_server_cdc/to-redpanda.yaml` around
lines 19 - 20, The example hardcodes key extraction as key: ${! json("id") }
which assumes every table has an id column; update the cookbook documentation
and this example to call out that users must replace the json key expression
with their table's primary key (e.g., order_id) or provide a configurable
variable for the key, and reference the example symbol key: ${! json("id") } so
readers know to change that expression (and optionally mention topic: ${!
meta("topic") } is independent).
modules/cookbooks/examples/mysql_cdc/to-redpanda.yaml (1)

20-22: Key extraction assumes all tables have an id field.

The key: ${! json("id") } configuration assumes all CDC records contain an id field. If mydb.customers or mydb.orders uses a different primary key (e.g., order_id, customer_id), this will produce empty keys or fail.

Consider either:

  • Documenting this assumption in the cookbook page
  • Using a more generic approach like deriving the key from a configurable field or primary key metadata
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/examples/mysql_cdc/to-redpanda.yaml` around lines 20 - 22,
The config currently hardcodes key extraction with key: ${! json("id") }, which
fails when tables use different PK names; update the template to accept a
configurable key field (e.g., use meta("key_field") or meta("primary_key")) and
fall back to common variants or PK metadata rather than assuming "id" — change
the key reference from ${! json("id") } to read the field name from
meta("key_field") (and optionally try fallback names like "id", "order_id",
"customer_id" or primary-key metadata) and document the new meta parameter in
the cookbook so consumers can set the correct key per table or source.
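
A lighter alternative to introducing a new metadata parameter is a Bloblang fallback inside the key interpolation. The fragment below is only a sketch: the broker address, the topic name, and the `order_id` column are assumptions chosen for illustration, not values taken from the example file.

```yaml
output:
  redpanda:
    seed_brokers: [ "localhost:9092" ]   # assumed broker address
    topic: mysql.orders                   # hypothetical topic name
    # Prefer order_id; fall back to id when order_id is missing or null.
    key: ${! json("order_id").or(json("id")) }
```

If neither field exists, the key would still resolve to a null value, so documenting the expected primary key column remains worthwhile.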
modules/cookbooks/pages/gcp_spanner_cdc.adoc (1)

124-129: Clarify the exact key field name in the Redpanda pattern bullets.

Line 128 says “transaction ID”, but the included example uses server_transaction_id as the key. Naming this explicitly here will reduce cross-example ambiguity.

Suggested wording update
-- Sets the message key to the transaction ID, grouping related changes together
+- Sets the message key to `server_transaction_id`, grouping related changes together
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/pages/gcp_spanner_cdc.adoc` around lines 124 - 129, Update
the Redpanda pattern bullets to name the exact key field used by the example:
replace the generic "transaction ID" wording with "server_transaction_id" so the
bullet reads something like "Sets the message key to the server_transaction_id,
grouping related changes together"; ensure the wording matches the example that
uses server_transaction_id to avoid ambiguity.
modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml (1)

9-29: Align snapshot mode with your routing filters.

With stream_snapshot: true (Line 9), read events are still dropped by both branches (Lines 21 and 28). That makes the initial snapshot pass effectively discarded. Consider setting stream_snapshot: false, or route read with create if you want snapshot bootstrap data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml` around lines 9 -
29, The snapshot mode is incompatible with the routing filters: with
stream_snapshot: true the snapshot 'read' events are being dropped by both
processor mappings (they call deleted() when meta("operation") != "create" for
mongodb.orders.inserts and when meta("operation") == "create" ||
meta("operation") == "read" they call deleted() for mongodb.orders.changes), so
the bootstrap snapshot is discarded; either set stream_snapshot: false or change
the processors' mapping expressions to route "read" the same way as "create"
(e.g., ensure the mapping for mongodb.orders.inserts includes meta("operation")
== "read" alongside "create", or remove the deleted() branch for reads) so
snapshot reads are forwarded to the desired topic.
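
One way to forward snapshot rows together with inserts is to express the routing as a `switch` output rather than two filtered branches. This is a hedged sketch, not the contents of route-by-event.yaml: the broker address is an assumption, and the topic names are borrowed from the comment above.

```yaml
output:
  switch:
    cases:
      # Snapshot rows ("read") travel with newly created documents.
      - check: meta("operation") == "create" || meta("operation") == "read"
        output:
          redpanda:
            seed_brokers: [ "localhost:9092" ]   # assumed broker address
            topic: mongodb.orders.inserts
      # A case without a check acts as the catch-all for everything else.
      - output:
          redpanda:
            seed_brokers: [ "localhost:9092" ]   # assumed broker address
            topic: mongodb.orders.changes
```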
modules/cookbooks/examples/mongodb_cdc/filter-events.yaml (1)

9-15: Snapshot setting and filter behavior are working against each other.

Line 9 enables snapshot streaming, but Line 14 drops read events. If this cookbook is intended to process only live changes, disable snapshot to avoid unnecessary startup and load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/examples/mongodb_cdc/filter-events.yaml` around lines 9 -
15, The config enables snapshot streaming via stream_snapshot but the pipeline
mapping drops events where meta("operation") == "read" (using deleted()), which
conflicts if you only want live changes; either disable snapshot by setting
stream_snapshot: false or removing the stream_snapshot key, or if you do need
snapshot semantics keep stream_snapshot true and adjust the pipeline mapping
(the mapping block that checks meta("operation") and calls deleted()) so it does
not drop "read" events during snapshot (only drop "delete" events) — update the
mapping accordingly.
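
If snapshot semantics are wanted, the second option described above could look roughly like the following fragment. It assumes the `operation` metadata key and the `delete` value mentioned in the comment, and it shows only the pipeline section.

```yaml
pipeline:
  processors:
    - mapping: |
        # Keep snapshot "read" events and live changes; drop only deletes.
        root = if meta("operation") == "delete" { deleted() }
```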
modules/cookbooks/examples/postgres_cdc/filter-events.yaml (1)

9-15: Avoid enabling snapshot when filtered logic drops snapshot events.

Line 9 enables stream_snapshot, while Line 14 drops every non-insert/update operation. If snapshot rows come through as read, this example does extra startup work only to delete those rows. Prefer stream_snapshot: false for this pattern, or explicitly keep snapshot events.

Suggested adjustment
-    stream_snapshot: true
+    stream_snapshot: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/examples/postgres_cdc/filter-events.yaml` around lines 9 -
15, The pipeline currently enables stream_snapshot while the mapping processor
drops every event whose meta("operation") is not "insert" or "update" (using
deleted()), which wastes work for snapshot "read" rows; either set
stream_snapshot: false to avoid streaming snapshot rows, or update the mapping
in the mapping block to treat meta("operation") == "read" as a keepable case
(i.e., do not call deleted() for "read") so snapshot events are preserved.
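
To keep `stream_snapshot: true` useful, the filter can use an explicit allowlist that includes `read`. The fragment below is a sketch under the assumption that snapshot rows carry `operation` set to `read`, as the comment suggests; it is not the text of filter-events.yaml.

```yaml
pipeline:
  processors:
    - mapping: |
        # Keep snapshot rows ("read") alongside inserts and updates.
        let keep = ["read", "insert", "update"]
        root = if $keep.contains(meta("operation")) { this } else { deleted() }
```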
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/cookbooks/examples/tigerbeetle_cdc/route-by-event.yaml`:
- Around line 25-28: The mapping currently keeps every event that is not
meta("event_type") == "single_phase", which can let unexpected types into the
tigerbeetle.two_phase topic; update the mapping expression that uses
meta("event_type") so it only preserves events where meta("event_type") ==
"two_phase_posted" and deletes all others (i.e., replace the current conditional
in the mapping block with one that explicitly tests for "two_phase_posted" and
calls deleted() for non-matching events), referencing the existing mapping
statement that feeds tigerbeetle.two_phase.
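
A minimal sketch of the stricter filter, assuming the `event_type` metadata key and the `two_phase_posted` value named in the comment above:

```yaml
processors:
  - mapping: |
      # Pass only posted two-phase transfers; every other event type is dropped.
      root = if meta("event_type") != "two_phase_posted" { deleted() }
```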

In `@modules/cookbooks/pages/gcp_spanner_cdc.adoc`:
- Around line 57-62: Add a short YAML example showing how to bind the exported
GCP_CREDENTIALS env var into the input.gcp_spanner_cdc config using the
credentials_json field (reference symbols: credentials_json,
input.gcp_spanner_cdc, GCP_CREDENTIALS); the snippet should demonstrate setting
credentials_json to the environment variable (e.g., credentials_json:
${GCP_CREDENTIALS}) and include a brief note that if the value is base64-encoded
(as shown earlier) it must be base64-decoded before being used or written to
disk by the consumer.
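
The binding requested above might be sketched as follows. Only the `credentials_json` line is the point of the example; any other required fields of the input are omitted here and would need to be filled in from the connector reference.

```yaml
input:
  gcp_spanner_cdc:
    # Resolved from the environment when the config is loaded;
    # GCP_CREDENTIALS is the variable exported in the prerequisites step.
    credentials_json: "${GCP_CREDENTIALS}"
```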

In `@modules/cookbooks/pages/microsoft_sql_server_cdc.adoc`:
- Line 143: The sentence that currently says "Routes `insert` events to one
Redpanda topic and `update` and `delete` events to another" is ambiguous; update
it to use exact SQL Server CDC operation names by replacing `update` with
`update_before` and `update_after` (so the routing description explicitly lists
`insert`, `update_before`, `update_after`, and `delete`) and clarify which of
those map to each Redpanda topic to match the rest of the document (see
references to `update_before` and `update_after` elsewhere).

In `@modules/cookbooks/pages/mysql_cdc.adoc`:
- Around line 114-119: The bullets claiming "Uses the table name as the Redpanda
topic", "Batches messages for efficient delivery", and "Sets the message key to
the row's primary key field" are inaccurate for the example config; either
update the text in mysql_cdc.adoc to accurately describe what
modules/cookbooks/examples/mysql_cdc/to-redpanda.yaml actually configures, or
modify to-redpanda.yaml to enable table-based topic routing, batching, and
primary-key keying. Locate the three bullets in mysql_cdc.adoc and either remove
or reword them to match the example, or add the corresponding settings (topic
routing per table, batching parameters, and primary-key-to-message-key mapping)
to to-redpanda.yaml so the docs and example are consistent.
- Around line 100-104: The prose in modules/cookbooks/pages/mysql_cdc.adoc
incorrectly states that the filter keeps "insert and update" events while the
example in modules/cookbooks/examples/mysql_cdc/filter-events.yaml uses the
expression value.before == null (which only keeps inserts); either update the
prose to say the filter keeps inserts only, referencing the value.before == null
condition, or modify filter-events.yaml to match the described behavior (e.g.,
change the filter expression to include updates as well); ensure the text and
the example (filter-events.yaml and the value.before == null expression) are
consistent.

In `@modules/cookbooks/pages/postgres_cdc.adoc`:
- Around line 174-180: The SQL shown uses invalid syntax "GRANT REPLICATION ON
DATABASE ..."; replace that step by setting the role-level REPLICATION attribute
with an ALTER ROLE statement for your_user (e.g., ALTER ROLE your_user WITH
REPLICATION) instead of granting it on the database, and keep the existing
"GRANT SELECT ON ALL TABLES IN SCHEMA public TO your_user" line; also add a
short note that ALTER ROLE requires superuser privileges.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf0ea994-0794-43e5-88cc-f16c3c0d78bd

📥 Commits

Reviewing files that changed from the base of the PR and between 1dbd3fc and d07196d.

📒 Files selected for processing (38)
  • modules/ROOT/nav.adoc
  • modules/components/pages/outputs/broker.adoc
  • modules/cookbooks/examples/gcp_spanner_cdc/basic-capture.yaml
  • modules/cookbooks/examples/gcp_spanner_cdc/filter-events.yaml
  • modules/cookbooks/examples/gcp_spanner_cdc/route-by-event.yaml
  • modules/cookbooks/examples/gcp_spanner_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/gcp_spanner_cdc/to-s3.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/basic-capture.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/filter-events.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/route-by-event.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/to-s3.yaml
  • modules/cookbooks/examples/mongodb_cdc/basic-capture.yaml
  • modules/cookbooks/examples/mongodb_cdc/filter-events.yaml
  • modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml
  • modules/cookbooks/examples/mongodb_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/mongodb_cdc/to-s3.yaml
  • modules/cookbooks/examples/mysql_cdc/basic-capture.yaml
  • modules/cookbooks/examples/mysql_cdc/filter-events.yaml
  • modules/cookbooks/examples/mysql_cdc/route-by-event.yaml
  • modules/cookbooks/examples/mysql_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/mysql_cdc/to-s3.yaml
  • modules/cookbooks/examples/postgres_cdc/basic-capture.yaml
  • modules/cookbooks/examples/postgres_cdc/filter-events.yaml
  • modules/cookbooks/examples/postgres_cdc/route-by-event.yaml
  • modules/cookbooks/examples/postgres_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/postgres_cdc/to-s3.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/basic-capture.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/filter-events.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/route-by-event.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/to-redpanda.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/to-s3.yaml
  • modules/cookbooks/pages/gcp_spanner_cdc.adoc
  • modules/cookbooks/pages/microsoft_sql_server_cdc.adoc
  • modules/cookbooks/pages/mongodb_cdc.adoc
  • modules/cookbooks/pages/mysql_cdc.adoc
  • modules/cookbooks/pages/postgres_cdc.adoc
  • modules/cookbooks/pages/tigerbeetle_cdc.adoc

Comment thread modules/cookbooks/examples/tigerbeetle_cdc/route-by-event.yaml Outdated
Comment thread modules/cookbooks/pages/gcp_spanner_cdc.adoc Outdated
Comment thread modules/cookbooks/pages/microsoft_sql_server_cdc.adoc Outdated
Comment thread modules/cookbooks/pages/mysql_cdc.adoc
Comment thread modules/cookbooks/pages/mysql_cdc.adoc
Comment thread modules/cookbooks/pages/postgres_cdc.adoc Outdated
mfernest and others added 10 commits March 12, 2026 16:24
- Replace broker/fan_out pattern (which fails lint) with switch output in all
  6 route-by-event.yaml files (mysql_cdc, postgres_cdc, mongodb_cdc,
  microsoft_sql_server_cdc, gcp_spanner_cdc, tigerbeetle_cdc)
- Update route-by-event prose to match switch semantics
- Add credentials_json YAML snippet to gcp_spanner_cdc prerequisites
- Clarify server_transaction_id key field in gcp_spanner_cdc Route to Redpanda
- Add json("id") assumption note to mysql_cdc Route to Redpanda
- Fix microsoft_sql_server_cdc route-by-event description to match actual events
- Update tigerbeetle_cdc route-by-event to explicitly route two_phase_posted
  (drop other two-phase types)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tigerbeetle_cdc requires a cgo-enabled binary and is not recognized by
the standard rpk connect linter. Add SKIP_LINT flag to test-config.sh
mechanism so cgo-only connector examples can skip lint without failing CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…amples (#381)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: JakeSCahill <45230295+JakeSCahill@users.noreply.github.com>
Co-authored-by: Michael Ernest <michael.ernest@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Joyce Fee <joyce@redpanda.com>
Co-authored-by: micheleRP <michele@redpanda.com>
@mfernest

Contributor Author

@coderabbitai review

coderabbitai Bot commented Mar 17, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/components/partials/fields/outputs/iceberg.adoc (1)

458-489: ⚠️ Potential issue | 🟠 Major

Drop duplicated field sections for endpoint and region.

This block repeats headings and descriptions that are already defined above, resulting in duplicate field docs.

Suggested fix
-=== `storage.aws_s3.endpoint`
-
-Allows you to specify a custom endpoint for the AWS API.
-
-*Type*: `string`
-
 === `storage.aws_s3.force_path_style_urls`
@@
-=== `storage.aws_s3.region`
-
-The AWS region to target.
-
-*Type*: `string`
-
 === `storage.aws_s3.tcp`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/components/partials/fields/outputs/iceberg.adoc` around lines 458 -
489, Duplicate documentation blocks for the fields storage.aws_s3.endpoint and
storage.aws_s3.region appear in iceberg.adoc; remove the repeated sections so
each field (storage.aws_s3.endpoint and storage.aws_s3.region) is documented
only once, keeping the intended content (types, defaults, and any examples) from
the canonical occurrence and deleting the redundant heading/description entries
(also verify storage.aws_s3.force_path_style_urls remains singular and intact).
♻️ Duplicate comments (2)
modules/cookbooks/pages/postgres_cdc.adoc (1)

174-180: ⚠️ Potential issue | 🟠 Major

PostgreSQL replication privilege command is invalid.

Line 178 uses GRANT REPLICATION ON DATABASE ..., which is not valid PostgreSQL syntax. Use role-level replication privilege instead.

Proposed fix
- GRANT REPLICATION ON DATABASE mydb TO your_user;
+ ALTER ROLE your_user WITH REPLICATION;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO your_user;
In PostgreSQL, is `GRANT REPLICATION ON DATABASE mydb TO your_user;` valid syntax, or must replication be granted via `ALTER ROLE your_user WITH REPLICATION`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/pages/postgres_cdc.adoc` around lines 174 - 180, The SQL
snippet uses invalid syntax `GRANT REPLICATION ON DATABASE mydb TO your_user;`;
replace this with a role-level replication grant by running `ALTER ROLE
your_user WITH REPLICATION` (and keep the existing `GRANT SELECT ON ALL TABLES
IN SCHEMA public TO your_user;` line), ensuring the example shows granting
replication at the role level rather than per-database; update the documented
SQL example accordingly and keep the placeholder `your_user` for clarity.
modules/cookbooks/pages/gcp_spanner_cdc.adoc (1)

57-62: ⚠️ Potential issue | 🟠 Major

Use a portable base64 command to avoid syntax errors and unwanted line wrapping.

Line 61 uses base64 -i /path/to/service-account.json. The -i flag means different things across implementations: on BSD/macOS it names the input file, while on GNU coreutils it is --ignore-garbage and the file is read as a positional argument. GNU base64 also wraps output at 76 characters by default, which breaks YAML ${GCP_CREDENTIALS} interpolation.

Proposed fix
-export GCP_CREDENTIALS=$(base64 -i /path/to/service-account.json)
+export GCP_CREDENTIALS="$(base64 /path/to/service-account.json | tr -d '\n')"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/cookbooks/pages/gcp_spanner_cdc.adoc` around lines 57 - 62, The docs
use `base64 -i /path/to/service-account.json` which is non-portable and can
insert line wraps; update the example for setting GCP_CREDENTIALS (and the
credentials_json guidance) to use a portable, newline-free base64 command such
as piping the file into base64 and stripping newlines (e.g. use `base64 <
/path/to/service-account.json | tr -d '\n'` or `openssl base64 -A`) so
`${GCP_CREDENTIALS}` does not get wrapped and works on both GNU and BSD/macOS
systems.
🧹 Nitpick comments (1)
modules/components/pages/caches/ttlru.adoc (1)

41-41: Link to the canonical Go documentation for stability instead of tree/main.

Replace the GitHub tree/main link with https://pkg.go.dev/github.com/hashicorp/golang-lru/v2/expirable, which is the official, stable reference for this package. Using tree/main can drift over time and break links in documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/components/pages/caches/ttlru.adoc` at line 41, Update the markdown
link in modules/components/pages/caches/ttlru.adoc that currently points to
"https://github.com/hashicorp/golang-lru/tree/main/expirable" to instead use the
canonical pkg.go.dev URL
"https://pkg.go.dev/github.com/hashicorp/golang-lru/v2/expirable"; locate the
literal string "expirable" link in the file and replace the GitHub tree/main URL
with the pkg.go.dev stable reference so the documentation points to the official
Go docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs-data/connect-4.81.0.json`:
- Line 23729: Update the description string that references `start_time` to
remove the incorrect "exactly-once" claim: find the paragraph that currently
states "IMPORTANT: This input tracks its position in memory only... For
exactly-once processing, you should configure an appropriate start_time..." and
change it to state that the connector provides at-least-once behavior and that
restarts will resume from `start_time` (or the beginning) but can replay data;
instruct users to achieve exactly-once semantics via idempotent downstream
processing or external checkpointing rather than relying on `start_time`.
- Line 29549: The upsert example's comments are reversed relative to the
payload: update the explanatory comments that accompany root.doc.product_price
and root.upsert.product_price so they match Elasticsearch semantics (doc is
applied to existing documents, upsert is used when the document does not exist);
specifically, change the sentence describing root.doc.product_price = 50 to
state it updates existing documents, and change the sentence describing
root.upsert.product_price = 100 to state it is inserted when the document does
not exist (adjust only the comment text near root.doc.product_price and
root.upsert.product_price).

In `@docs-data/overrides.json`:
- Around line 1785-1786: Create a reusable description entry under "definitions"
(e.g., "seed_brokers_with_top_level_redpanda") containing the current
seed_brokers text, then replace the inline "description" values for the
"seed_brokers" fields in both the input and output "redpanda" blocks with a
"$ref" pointing to that definition; also apply the same replacement for the
other duplicate occurrence of "seed_brokers" noted elsewhere so both places
reference the single definition.
- Around line 4007-4022: Remove the duplicate outputs.iceberg override block
(the second iceberg object that contains the "catalog" → "headers" subtree) and
either delete it or merge any unique fields (e.g., ensure schema_evolution and
other properties from both blocks are combined) into the remaining
outputs.iceberg entry so there is a single, unambiguous outputs.iceberg object;
locate references to "outputs.iceberg", "iceberg", and the nested "catalog" /
"headers" keys to reconcile and preserve any missing fields before removing the
duplicate.

In `@modules/components/attachments/connect-4.81.0.json`:
- Line 23753: Update the description text (the "description" field) that
currently states "For exactly-once processing, you should configure an
appropriate start_time or implement idempotent downstream processing" so it no
longer implies start_time provides exactly-once guarantees; instead state that
because the input only tracks its cursor in memory and restarts resume from the
configured start_time (or the beginning), the input may replay or miss events
across restarts and recommend implementing idempotent downstream processing to
achieve exactly-once behavior, referring explicitly to start_time in the new
wording.
- Around line 23796-23803: The example for the "filter_pattern" field currently
uses "[ERROR]" which is misleading for CloudWatch filter syntax; update the
examples array for the "filter_pattern" property to use a simple unambiguous
pattern like "ERROR" (replace the current "[ERROR]" example with "ERROR") so the
documentation demonstrates matching the word ERROR without requiring
CloudWatch-specific quoting or bracket semantics.

In `@modules/components/examples/advanced/inputs/aws_cloudwatch_logs.yaml`:
- Around line 23-30: The example sets credentials.from_ec2_role to an empty
string but the field is documented as a boolean; update the advanced example in
aws_cloudwatch_logs.yaml to use a boolean value (true or false) for
credentials.from_ec2_role so it matches the documented type and won't fail
validation—locate the credentials block and change the from_ec2_role entry to a
boolean value (e.g., from_ec2_role: false) in the credentials section.

In `@modules/components/examples/advanced/outputs/iceberg.yaml`:
- Line 41: The example value for the field named from_ec2_role in
modules/components/examples/advanced/outputs/iceberg.yaml is an empty string but
the field is documented as a bool in
modules/components/partials/fields/outputs/iceberg.adoc; update the example to
use a boolean literal (true or false) instead of "" so the example matches the
documented type and validates correctly.
- Around line 27-28: Remove the duplicate mapping entries for region and
endpoint within the aws_s3 mapping: locate the repeated keys "region" and
"endpoint" (they appear twice in the same mapping) and delete the later
duplicate lines so each key exists only once in the aws_s3 block; ensure the
remaining region and endpoint values are the intended ones (keep the earlier
definitions and remove the second occurrences).

In `@modules/components/examples/common/outputs/iceberg.yaml`:
- Line 41: The example value for from_ec2_role in
modules/components/examples/common/outputs/iceberg.yaml is currently an empty
string but should be a boolean; update the from_ec2_role placeholder to a
boolean literal (true or false) to match the documented type in
modules/components/partials/fields/outputs/iceberg.adoc and ensure any consumers
or examples expect a boolean value.
- Around line 27-28: The example YAML contains duplicated aws_s3 mapping keys:
the aws_s3 block re-declares region and endpoint twice (the keys named "region"
and "endpoint" under the aws_s3 mapping). Remove the duplicate declarations
(keep a single aws_s3 block with one region and one endpoint entry) so the
aws_s3 mapping has unique keys; locate the aws_s3 mapping in iceberg.yaml and
delete or merge the redundant region/endpoint lines to restore a valid example.

In `@modules/components/partials/fields/inputs/aws_cloudwatch_logs.adoc`:
- Around line 81-91: The YAML example for the filter_pattern field currently
shows filter_pattern: [ERROR] which is parsed as a sequence; update the example
to a quoted string to match the declared Type: string — e.g. change the example
to filter_pattern: "[ERROR]" (or use single quotes) in the
aws_cloudwatch_logs.adoc block referencing filter_pattern so copy-paste produces
a string value rather than an array.

In `@modules/cookbooks/examples/test-examples.sh`:
- Around line 162-168: The SKIP_LINT flag and log_skip call are misleading
because the continue skips the whole loop (both lint and the subsequent rpk
connect test), so rename the flag and messaging to reflect skipping the entire
connector test (e.g., SKIP_CONNECT_TEST or SKIP_CGO_CONNECTOR_TEST) and update
uses in the script and any test-configs (e.g., tigerbeetle_cdc/test-config.sh);
alternatively, if you intended to skip only linting, remove the continue so only
lint is bypassed and the rpk connect test still runs, and ensure SKIPPED_TESTS
and TOTAL_TESTS are incremented/adjusted consistently in the same block where
the skip decision is made (references: SKIP_LINT, log_skip, SKIPPED_TESTS,
TOTAL_TESTS, continue, rpk connect test).

In `@modules/cookbooks/pages/mongodb_cdc.adoc`:
- Around line 186-194: The step currently says "Confirm the user has the
required privileges" but runs db.grantRolesToUser("your_user", [...]) which
mutates state; either change the instruction text to "Grant the required
privileges" to match db.grantRolesToUser, or replace the code example with a
read-only check such as db.getUser("your_user") (or another read-only
inspection) so the command matches the “confirm” wording; update the surrounding
sentence accordingly and ensure references to db.grantRolesToUser and db.getUser
are clear.

---

Outside diff comments:
In `@modules/components/partials/fields/outputs/iceberg.adoc`:
- Around line 458-489: Duplicate documentation blocks for the fields
storage.aws_s3.endpoint and storage.aws_s3.region appear in iceberg.adoc; remove
the repeated sections so each field (storage.aws_s3.endpoint and
storage.aws_s3.region) is documented only once, keeping the intended content
(types, defaults, and any examples) from the canonical occurrence and deleting
the redundant heading/description entries (also verify
storage.aws_s3.force_path_style_urls remains singular and intact).

---

Duplicate comments:
In `@modules/cookbooks/pages/gcp_spanner_cdc.adoc`:
- Around line 57-62: The docs use `base64 -i /path/to/service-account.json`
which is non-portable and can insert line wraps; update the example for setting
GCP_CREDENTIALS (and the credentials_json guidance) to use a portable,
newline-free base64 command such as piping the file into base64 and stripping
newlines (e.g. use `base64 < /path/to/service-account.json | tr -d '\n'` or
`openssl base64 -A`) so `${GCP_CREDENTIALS}` does not get wrapped and works on
both GNU and BSD/macOS systems.

In `@modules/cookbooks/pages/postgres_cdc.adoc`:
- Around line 174-180: The SQL snippet uses invalid syntax `GRANT REPLICATION ON
DATABASE mydb TO your_user;`; replace this with a role-level replication grant
by running `ALTER ROLE your_user WITH REPLICATION` (and keep the existing `GRANT
SELECT ON ALL TABLES IN SCHEMA public TO your_user;` line), ensuring the example
shows granting replication at the role level rather than per-database; update
the documented SQL example accordingly and keep the placeholder `your_user` for
clarity.

---

Nitpick comments:
In `@modules/components/pages/caches/ttlru.adoc`:
- Line 41: Update the markdown link in
modules/components/pages/caches/ttlru.adoc that currently points to
"https://github.com/hashicorp/golang-lru/tree/main/expirable" to instead use the
canonical pkg.go.dev URL
"https://pkg.go.dev/github.com/hashicorp/golang-lru/v2/expirable"; locate the
literal string "expirable" link in the file and replace the GitHub tree/main URL
with the pkg.go.dev stable reference so the documentation points to the official
Go docs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 880edfdc-5d08-476c-b43c-6526918f5619

📥 Commits

Reviewing files that changed from the base of the PR and between d07196d and f91f750.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (34)
  • docs-data/connect-4.81.0.json
  • docs-data/connect-diff-4.79.0_to_4.81.0.json
  • docs-data/overrides.json
  • modules/ROOT/nav.adoc
  • modules/components/attachments/connect-4.81.0.json
  • modules/components/examples/advanced/inputs/aws_cloudwatch_logs.yaml
  • modules/components/examples/advanced/outputs/iceberg.yaml
  • modules/components/examples/common/caches/multilevel.yaml
  • modules/components/examples/common/caches/noop.yaml
  • modules/components/examples/common/inputs/aws_cloudwatch_logs.yaml
  • modules/components/examples/common/outputs/iceberg.yaml
  • modules/components/pages/caches/ttlru.adoc
  • modules/components/pages/inputs/aws_cloudwatch_logs.adoc
  • modules/components/pages/outputs/iceberg.adoc
  • modules/components/partials/fields/inputs/aws_cloudwatch_logs.adoc
  • modules/components/partials/fields/inputs/redpanda.adoc
  • modules/components/partials/fields/outputs/iceberg.adoc
  • modules/components/partials/fields/outputs/redpanda.adoc
  • modules/cookbooks/examples/gcp_spanner_cdc/route-by-event.yaml
  • modules/cookbooks/examples/microsoft_sql_server_cdc/route-by-event.yaml
  • modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml
  • modules/cookbooks/examples/mysql_cdc/route-by-event.yaml
  • modules/cookbooks/examples/postgres_cdc/route-by-event.yaml
  • modules/cookbooks/examples/test-examples.sh
  • modules/cookbooks/examples/tigerbeetle_cdc/route-by-event.yaml
  • modules/cookbooks/examples/tigerbeetle_cdc/test-config.sh
  • modules/cookbooks/pages/gcp_spanner_cdc.adoc
  • modules/cookbooks/pages/microsoft_sql_server_cdc.adoc
  • modules/cookbooks/pages/mongodb_cdc.adoc
  • modules/cookbooks/pages/mysql_cdc.adoc
  • modules/cookbooks/pages/postgres_cdc.adoc
  • modules/cookbooks/pages/tigerbeetle_cdc.adoc
  • modules/get-started/pages/whats-new.adoc
  • modules/guides/pages/bloblang/about.adoc
💤 Files with no reviewable changes (2)
  • modules/components/examples/common/caches/noop.yaml
  • modules/components/examples/common/caches/multilevel.yaml
✅ Files skipped from review due to trivial changes (2)
  • modules/components/pages/inputs/aws_cloudwatch_logs.adoc
  • modules/guides/pages/bloblang/about.adoc
🚧 Files skipped from review as they are similar to previous changes (5)
  • modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml
  • modules/ROOT/nav.adoc
  • modules/cookbooks/examples/postgres_cdc/route-by-event.yaml
  • modules/cookbooks/pages/tigerbeetle_cdc.adoc
  • modules/cookbooks/pages/microsoft_sql_server_cdc.adoc

Comment thread docs-data/connect-4.81.0.json Outdated
Comment thread docs-data/connect-4.81.0.json Outdated
Comment thread docs-data/overrides.json
Comment thread docs-data/overrides.json
Comment thread modules/components/attachments/connect-4.81.0.json Outdated
Comment thread modules/components/examples/common/outputs/iceberg.yaml Outdated
Comment thread modules/components/examples/common/outputs/iceberg.yaml Outdated
Comment thread modules/components/partials/fields/inputs/aws_cloudwatch_logs.adoc
Comment thread modules/cookbooks/examples/test-examples.sh Outdated
Comment thread modules/cookbooks/pages/mongodb_cdc.adoc Outdated
mfernest and others added 2 commits March 17, 2026 10:58
- postgres_cdc: active voice in prerequisites intro
- mysql_cdc: normalize Troubleshoot and Next steps headings
- tigerbeetle_cdc: active voice in TigerBeetle cluster prereq
- gcp_spanner_cdc: consistent at-least-once delivery phrasing
- microsoft_sql_server_cdc: stronger phrasing for permissions requirement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kbooks

# Conflicts:
#	docs-data/connect-diff-4.79.0_to_4.81.0.json
Copilot AI and others added 3 commits June 3, 2026 18:34
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Ashley Jeffs <ash@jeffail.uk>

Copilot AI commented Jun 3, 2026

Contributor

@copilot resolve the merge conflicts in this pull request

Resolved in 8836006.

Copilot AI requested a review from Feediver1 June 3, 2026 18:36

@Feediver1 Feediver1 left a comment

Contributor

Docs standards review (delta since the last substantive review)

Scope: Deltas since @Jeffail's 2026-03-18 CHANGES_REQUESTED and the related March activity, focusing on what landed today (2026-06-03) in the conflict-resolution + apply-suggestions push.
Overall: Most of Jeffail's Mar 18 critical feedback is now resolved (GCP Spanner body-vs-metadata fully fixed; SQL Server schema → database_schema rename mostly applied). But the new commit f2af218 ("Apply suggestions from code review") introduced three regressions that are worse than the original issues — two structurally broken YAML files and one prose/YAML contradiction in mysql_cdc.adoc. These need fixing before merge.

Activity timeline since the last substantive review

| Date | Actor | Action |
|---|---|---|
| 2026-03-18 | @Jeffail | CHANGES_REQUESTED: body-vs-metadata for Spanner, schema rename for SQL Server |
| 2026-03-19 | @rockwotj | APPROVED for mysql + postgres |
| 2026-03-18 → 23 | @mfernest | Pushed c79adfc, b622cfa, 5e63346, 237d3bd, d3468f4 addressing Jeffail's feedback |
| | | (PR went quiet for ~2.5 months) |
| 2026-06-03 18:30 | @Feediver1 | Asked Copilot to resolve merge conflicts |
| 2026-06-03 18:34 | copilot-swe-agent | Merged origin/main |
| 2026-06-03 18:35 | @Feediver1 | f2af218: Apply suggestions from code review (problem commit) |
| 2026-06-03 18:41 | github-actions | Regenerated overrides |

Critical issues newly introduced by f2af218

1. modules/cookbooks/examples/microsoft_sql_server_cdc/basic-capture.yaml: root.table field destroyed, duplicate root.schema added

Current state (lines 13–15):

root.operation = meta("operation")
root.schema = meta("database_schema")
root.schema = meta("database_schema")    # DUPLICATE; root.table = meta("table") is gone
root.lsn = meta("lsn")

The accepted suggestion overwrote root.table = meta("table") with another root.schema line instead of adding alongside it. The captured event now has no table name, and the prose still describes a table field.

Fix: restore as:

root.operation = meta("operation")
root.schema = meta("database_schema")
root.table = meta("table")
root.lsn = meta("lsn")

2. modules/cookbooks/examples/microsoft_sql_server_cdc/to-s3.yaml: path: key deleted, output is structurally broken YAML

Current state (lines 18–23):

output:
  aws_s3:
    bucket: ${S3_BUCKET}
      cdc/${! meta("database_schema") }/${! meta("table") }/${! ... }.ndjson    # orphan indented string
      cdc/${! meta("database_schema") }/${! meta("table") }/${! ... }.ndjson    # duplicate orphan
    batching:

The path: >- key was overwritten. The two orphan lines under bucket: will either be folded into bucket as a multi-line string or trigger a YAML parse error. Either way, the aws_s3.path is now missing — objects will be written to the bucket root with default names, not the cdc/<schema>/<table>/... partitioned path the page documents.

Fix: restore as:

output:
  aws_s3:
    bucket: ${S3_BUCKET}
    path: >-
      cdc/${! meta("database_schema") }/${! meta("table") }/${! timestamp_unix().format_timestamp("2006/01/02/15") }/${! uuid_v4() }.ndjson
    batching:

CI passes because the test harness apparently doesn't validate the aws_s3 output schema strictly on this file (similar to the SKIP_LINT path that was added for tigerbeetle_cdc). Worth confirming whether this file is actually exercised — if it is, lint should have caught it; if not, that's a coverage gap.

3. modules/cookbooks/pages/mysql_cdc.adoc — filter-events prose now contradicts the YAML

Current prose (line 101):

- Filters to only `insert` operations, dropping `update`, `delete`, and snapshot `read` events
- Transforms the event to a simplified format with a timestamp

But modules/cookbooks/examples/mysql_cdc/filter-events.yaml actually does this:

root = if meta("operation") == "delete" {
  deleted()
} else {
  { "operation": meta("operation"), "table": meta("table"), ... }
}

The YAML drops only delete events and keeps everything else (inserts, updates, snapshot reads). The new prose claims it drops "update, delete, and snapshot read", which is wrong for two of the four event types: update and snapshot read events are kept, not dropped. Readers following the doc will be confused when their pipeline emits updates that the prose says are filtered out.

Fix options:

  • (a) Revert prose to "Filters out delete events, passing through insert and update operations" (matches current YAML).
  • (b) Change the YAML to actually match the new prose (drop everything except insert).

Option (a) is the smaller change and consistent with the YAML's current logic.

Minor formatting nit (same file, same section): the bullet list lost its trailing blank line, so the bullets run straight into == Route to Redpanda without separation. Render-tolerant, but worth fixing while in there.

What's resolved well since March

  • GCP Spanner body-vs-metadata feedback (Jeffail Mar 18) — fully addressed. Spot-checked basic-capture.yaml, to-redpanda.yaml, to-s3.yaml: all four metadata fields (mod_type, table_name, commit_timestamp, server_transaction_id) now correctly use meta(...). The this.mods reference is gone. ✓
  • SQL Server meta("schema") → meta("database_schema") rename applied across the three SQL Server YAML examples in the field paths. The database_schema value flows correctly in to-redpanda.yaml (meta topic = meta("database_schema") + "." + meta("table")). The basic-capture and to-s3 files carry the rename but were collateral-damaged by Critical 1 + 2 above.
  • Pre-existing CI failures on dynamodb_cdc appear to be resolved or skipped now; all checks green on the current head.
  • Merge of origin/main was clean (Copilot resolved it). No content conflicts in the diff.

Process notes

  • @mfernest has left Redpanda. Joyce/Jake taking over makes sense given the goal of landing this PR. The Mar 23 → Jun 3 silence is explained by that handoff.
  • f2af218 is the only commit that needs surgery — everything else is fine. Cleanest path: a follow-up commit that reverts the three damaged hunks rather than a full revert of f2af218. (All three of f2af218's hunks are damaged, so a full revert of just that commit is also safe.)

CodeRabbit findings

None new since March. CodeRabbit has been silent on the recent commits (Review skipped for the latest head).

What works well

  • Six cookbook pages covering MySQL, PostgreSQL, MongoDB, SQL Server, TigerBeetle, GCP Spanner — meaningful breadth; the consistent 5-pattern structure (capture/filter/route-to-Redpanda/route-to-S3/route-by-event) makes them scannable.
  • Learning objectives upgraded from awareness-tier (Find, Look up) to skill-tier (Apply, Adapt) per the Mar 18 + Mar 24 style audits.
  • Style fixups landed properly: em dashes replaced with definition lists, [source,yaml] standardized to [,yaml], active voice in prerequisites, consistent "Configure replication mode" heading.
  • Placeholder names (my-cdc-bucket, my-table) replaced with realistic examples (cdc-archive, orders) — readers can copy/adapt without first deciphering "is my-X a placeholder or a literal?".
  • SKIP_LINT mechanism in test-examples.sh for cgo-only connectors (TigerBeetle) — clean solution that doesn't disable testing for everything else.
  • DOC-1880 broker disambiguation note ("broker output ≠ Redpanda broker") is exactly the right kind of disambiguation for newcomers.

Recommended next step

Open a follow-up commit that reverts the three damaged hunks in f2af218:

  1. microsoft_sql_server_cdc/basic-capture.yaml line 14: restore root.table = meta("table")
  2. microsoft_sql_server_cdc/to-s3.yaml lines 21–22: restore path: >- key with single value
  3. mysql_cdc.adoc lines 101–103: restore prose to match actual YAML semantics + restore blank line before == heading

Once those land, the PR looks ready for final approval — the rest of the deltas since Jeffail's review are net positive.

Feediver1 and others added 2 commits June 3, 2026 14:55
The apply-suggestions commit overwrote rather than added in three places:

- SQL Server basic-capture.yaml: root.table = meta("table") was replaced
  with a duplicate root.schema line, dropping the table field entirely.
- SQL Server to-s3.yaml: the path: >- key was deleted, leaving two orphan
  indented strings under bucket: that broke the aws_s3 output config.
- mysql_cdc.adoc filter-events prose: the description claimed the YAML
  drops update/delete/snapshot-read events, but the YAML only drops
  delete events. Reverts prose to match actual YAML semantics and
  restores the blank line before the next == heading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The aws_dynamodb_cdc connector renamed the singular `table:` field to a
plural `tables:` list. The six example YAMLs were updated when main
merged, but the two inline YAML examples in the cookbook page still
showed the old singular form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Feediver1

Contributor

Final PR Review: feat(doc-1771): CDC cookbooks (#392)

Head: 22e18af · Files: 7 cookbook .adoc pages + 30 example YAMLs + 1 nav.adoc + 1 test-examples.sh + 1
broker.adoc (DOC-1880 bundled) + auto-regenerated content from main merge

Overall: ✅ Ready to merge once Jeffail's stale CHANGES_REQUESTED is cleared. All technical content is
correct, all Ash's suggestions are implemented, build is green, content is tagged correctly.

Ash's suggestions — verification table

#: 1
Ash's suggestion: SQL Server: rename meta("schema") → meta("database_schema") in basic-capture.yaml
Current code: Line 13: root.schema = meta("database_schema")
Status: ✅
────────────────────────────────────────
#: 2
Ash's suggestion: SQL Server: rename in to-redpanda.yaml topic name
Current code: meta topic = meta("database_schema") + "." + meta("table")
Status: ✅
────────────────────────────────────────
#: 3
Ash's suggestion: SQL Server: rename in to-s3.yaml root.schema
Current code: Uses meta("database_schema")
Status: ✅
────────────────────────────────────────
#: 4
Ash's suggestion: SQL Server: rename in to-s3.yaml S3 path
Current code: cdc/${! meta("database_schema") }/${! meta("table") }/...
Status: ✅
────────────────────────────────────────
#: 5
Ash's suggestion: SQL Server page prose: mention database_schema.table for topic name
Current code: Line 113: "Uses database_schema.table as the Redpanda topic name"
Status: ✅
────────────────────────────────────────
#: 6
Ash's suggestion: Spanner: mod_type/table_name/commit_timestamp are metadata, use meta(...) in
basic-capture.yaml
Current code: All four fields use meta(...)
Status: ✅
────────────────────────────────────────
#: 7
Ash's suggestion: Spanner: filter-events.yaml — this.mods doesn't exist (each mod is its own message)
Current code: YAML rewritten; no this.mods references
Status: ✅
────────────────────────────────────────
#: 8
Ash's suggestion: Spanner: meta topic = meta("table_name") in to-redpanda.yaml
Current code: Line 13: meta topic = meta("table_name")
Status: ✅
────────────────────────────────────────
#: 9
Ash's suggestion: Spanner: key: ${! meta("server_transaction_id") } in to-redpanda.yaml
Current code: Line 17: key: ${! meta("server_transaction_id") }
Status: ✅
────────────────────────────────────────
#: 10
Ash's suggestion: Spanner: metadata fields in to-s3.yaml body mapping
Current code: root.mod_type = meta("mod_type") etc.
Status: ✅
────────────────────────────────────────
#: 11
Ash's suggestion: Spanner: to-s3.yaml S3 path uses meta("table_name")
Current code: cdc/${! meta("table_name") }/...
Status: ✅
────────────────────────────────────────
#: 12
Ash's suggestion: Spanner: route-by-event.yaml — check: meta("mod_type") not this.mod_type
Current code: Line 12: check: meta("mod_type") == "INSERT"
Status: ✅
────────────────────────────────────────
#: 13
Ash's suggestion: Spanner page prose: clarify meta-vs-body distinction
Current code: Line 112: "Fields such as mod_type... are available as message metadata (use
meta("field_name")). The message body contains the row data with keys, new_values, and old_values."
Status: ✅

All 13 review threads marked resolved=true on GitHub. ✓

Build / CI

┌─────────────────────────────────────────┬───────────────────────────────────┐
│                  Check                  │              Status               │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Unit Tests                              │ ✅ pass                           │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Integration Tests — dynamodb_cdc        │ ✅ pass                           │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Integration Tests — oracledb_cdc        │ ✅ pass                           │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Test Summary                            │ ✅ pass                           │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ netlify/redpanda-connect/deploy-preview │ ✅ Deploy Preview ready           │
├─────────────────────────────────────────┼───────────────────────────────────┤
│ Header/Pages/Redirect rules             │ skipping (no config-file changes) │
└─────────────────────────────────────────┴───────────────────────────────────┘

Spot-checked rendered output on the Netlify preview at
https://deploy-preview-392--redpanda-connect.netlify.app/connect/cookbooks// — all 6 new pages
return HTTP 200 and render their full section structure (Prerequisites → Capture → Filter → Route to
Redpanda → Route to S3 → Route by event type → Configure replication mode → Troubleshoot → Next steps). No unresolved xrefs, no broken includes detected in the rendered HTML.

Content tagging — verification

Single-source tags ✓ — All 7 cookbook pages wrap content in // tag::single-source[] / //
end::single-source[], so cloud-docs can include them via the streaming-component pattern.

Env gating ✓ — All 7 pages use ifndef::env-cloud[] / ifdef::env-cloud[] blocks to distinguish the rpk
installation xref between self-managed (xref:get-started:quickstarts/rpk.adoc) and cloud
(xref:manage:rpk/rpk-install.adoc). Block-level conditionals throughout, no inline ifdef anti-patterns.

Frontmatter ✓ — All 6 new pages have:

  • :description: — concise and audience-focused
  • :page-categories: Streaming, Integration (AWS correctly removed from the Spanner page; dynamodb has Streaming,
    Integration, AWS)
  • :page-topic-type: cookbook
  • :personas: streaming_developer, data_engineer
  • Three :learning-objective-N: attributes using skill-tier Bloom's verbs (Apply, Adapt, Identify) —
    upgraded from awareness-tier in commit 5e63346

Page titles ✓ — Title Case: "MySQL CDC Patterns", "PostgreSQL CDC Patterns", "MongoDB CDC Patterns",
"SQL Server CDC Patterns", "TigerBeetle CDC Patterns", "Google Cloud Spanner CDC Patterns". Consistent
with the team rule (page titles use title case).

H2 sentence case ✓ — Per the rendered output: "Capture CDC events", "Filter CDC events", "Route to
Redpanda", "Route to S3", "Route by event type", "Configure replication mode", "Troubleshoot common
issues". Sentence case throughout.

Nav entries ✓ — modules/ROOT/nav.adoc lines 394–399 add all six new cookbooks under the cookbooks
section.

Remaining items

  1. Stale reviewDecision: CHANGES_REQUESTED. Jeffail's Mar 18 CHANGES_REQUESTED review hasn't been
    dismissed. Even though all his threads are resolved and all his suggestions are implemented, the merge
    is BLOCKED until either:

    • @Jeffail re-reviews and approves, or
    • An admin dismisses the stale review.

    This is the only thing standing between this PR and merge.

  2. Stale URLs in PR description. The "Page previews" section in the PR body lists URLs like
    /redpanda-connect/cookbooks/mysql_cdc/, but the component was renamed redpanda-connect → connect during the 2.5-month gap (via PR Add unified navigation attributes and rename component #427). Current URLs are /connect/cookbooks/mysql_cdc/. The actual pages render fine; only the description links 404. Optional cleanup: /add-preview-links can refresh these.

  3. DOC-1880 broker page bundle. Commit 5c09420 (broker output page intro +
    Redpanda-broker-vs-broker-output disambiguation) is unrelated to DOC-1771 but rides along in this PR.
    Good content, not a regression. Landing them together is fine; it could also be split if you prefer a cleaner
    audit trail per-ticket.

  4. No glossterm usage. None of the 7 cookbook pages use glossterm: macros. CDC-domain terms (binlog, change stream, replication slot, WAL) probably aren't in the glossary yet. Not blocking: the pages read
    clearly without them; this could be a future enhancement once the relevant glossary entries exist.

Verdict

Merge-ready technically. CI is green; content is correct; tagging is consistent; rendered preview verifies.

@Jeffail Jeffail left a comment

Collaborator

Thank you to the distinguished author for these comprehensive CDC cookbooks.

One correctness issue surfaced while validating the MongoDB cookbook against the mongodb_cdc input source in redpanda-data/connect: the input never emits operation: create. The value is passed through verbatim from MongoDB's change-stream operationType (internal/impl/mongodb/cdc/input.go:796:970), and the switch at :802-905 only handles insert, update, replace, and delete (every other type hits default: continue and is dropped). Snapshot reads are emitted as read (:698). The connector's own field documentation confirms this at :72. The emitted set is therefore insert, update, replace, delete, and read — never create.

Three suggestions follow to replace create with insert.

Comment thread modules/cookbooks/examples/mongodb_cdc/route-by-event.yaml Outdated
Comment thread modules/cookbooks/pages/mongodb_cdc.adoc Outdated
Comment thread modules/cookbooks/pages/mongodb_cdc.adoc Outdated
@Jeffail

Jeffail commented Jun 4, 2026

Collaborator

Recommendation: use the redpanda output instead of kafka_franz

The cookbook examples route to Redpanda via the kafka_franz output. Since these sections are titled "Route to Redpanda" — and each cookbook's Next steps already links to xref:components:outputs/redpanda.adoc[Redpanda output] — the redpanda output is the more idiomatic and consistent choice for Redpanda-native destinations. Switching also keeps the prose, the cross-references, and the example configurations aligned.

This affects 12 example files (the to-redpanda.yaml and route-by-event.yaml pair for each of the six connectors):

  • modules/cookbooks/examples/{mysql_cdc,postgres_cdc,mongodb_cdc,microsoft_sql_server_cdc,tigerbeetle_cdc,gcp_spanner_cdc}/to-redpanda.yaml
  • modules/cookbooks/examples/{mysql_cdc,postgres_cdc,mongodb_cdc,microsoft_sql_server_cdc,tigerbeetle_cdc,gcp_spanner_cdc}/route-by-event.yaml

Field names may differ slightly between the two outputs. kafka_franz's seed_brokers maps directly to the redpanda output's seed_brokers, but it is worth a quick check against the redpanda output reference to confirm that the batching, key, and topic fields also carry over unchanged before applying the swap across all twelve files.
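
For illustration, a minimal sketch of what one swapped output block might look like, assuming the four fields named above carry over unchanged. Every value here (broker address, topic, key expression, batch sizes) is a hypothetical placeholder rather than content from the example files.

```yaml
# Sketch only: all values are placeholders.
output:
  redpanda:                           # previously: kafka_franz
    seed_brokers: [ "localhost:9092" ]
    topic: cdc.events
    key: ${! meta("table") }          # assumed metadata key, for illustration
    batching:
      count: 100
      period: 1s
```

If the fields do match, the change in each file is limited to the output name itself, which keeps the diff easy to review.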

Feediver1 and others added 5 commits June 4, 2026 10:50
Co-authored-by: Ashley Jeffs <ash@jeffail.uk>
Co-authored-by: Ashley Jeffs <ash@jeffail.uk>
Co-authored-by: Ashley Jeffs <ash@jeffail.uk>
Switch the to-redpanda.yaml and route-by-event.yaml examples for all
six CDC cookbooks from kafka_franz to the redpanda output. The
seed_brokers, topic, key, and batching fields carry over unchanged
between the two outputs, so the swap is one keyword per occurrence.

The redpanda output is the idiomatic choice for Redpanda destinations
and aligns the examples with the cookbook section titles ("Route to
Redpanda") and the existing Next steps xrefs to the redpanda output
reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Feediver1

Contributor

UPDATED:

  • 12 files swapped: all 6 to-redpanda.yaml + all 6 route-by-event.yaml files. kafka_franz: → redpanda:
    (18 total occurrences).
    • Field compatibility verified before swap: seed_brokers, topic, key, and batching.{count,period,...}
      are identical between the two outputs, so no field renames needed.
    • Merge with concurrent activity preserved: Joyce's meta("operation") == "create" → "insert" fix on
      mongodb_cdc/route-by-event.yaml (commit f35a3b6) and the two mongodb_cdc.adoc prose fixes auto-merged
      cleanly with my swap on the same file.
    • Final state confirmed: zero kafka_franz references remain in any of the 12 target files; the mongodb
      route-by-event.yaml correctly shows the combined result (check: meta("operation") == "insert" +
      redpanda: output).

Feediver1 and others added 2 commits June 4, 2026 13:07
The flag's continue jumps to the next iteration of the loop, so both
the lint step and the subsequent rpk connect test are skipped — not
just lint as the name suggested. The standard rpk binary can't run
either step against a cgo-only connector like tigerbeetle_cdc, so the
"skip both" behavior is correct; only the name and log message needed
to reflect it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- microsoft_sql_server_cdc.adoc: name the SQL Server CDC operations
  (insert, update_before, update_after, delete) and the target Redpanda
  topics in the route-by-event bullet, replacing the vague "all other
  change events" phrasing.
- postgres_cdc.adoc: add a NOTE that ALTER ROLE ... WITH REPLICATION
  requires superuser privileges, so readers don't hit an unexpected
  permissions error.

The other ten CodeRabbit findings were already addressed in earlier
commits on this branch (the f2af218 revert, the stream_snapshot: false
fixes, the SKIP_CONNECT_TEST rename, the create -> insert MongoDB
operation fix, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Feediver1

Contributor

Re CodeRabbit reviews:

Pushed 9b90aba with the two genuinely actionable fixes:

  1. microsoft_sql_server_cdc.adoc:143 — vague "all other change events" → exact SQL Server CDC operation
    names (update_before, update_after, delete) plus the target topics, so readers know what each route
    catches without having to look at the YAML.
  2. postgres_cdc.adoc:174–180 — added a NOTE that ALTER ROLE ... WITH REPLICATION requires superuser
    privileges, since most users running through this troubleshooting step won't be superuser by default
    and would otherwise hit a confusing permissions error.
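
A hedged sketch of the kind of routing the first fix describes, naming each SQL Server CDC operation explicitly. It assumes the connector exposes the operation as meta("operation"); the grouping of operations, the broker address, and the topic names are hypothetical and may not match the cookbook's route-by-event.yaml.

```yaml
# Sketch only: grouping, broker address, and topic names are placeholders.
output:
  switch:
    cases:
      - check: meta("operation") == "insert"
        output:
          redpanda:
            seed_brokers: [ "localhost:9092" ]
            topic: sqlserver.inserts
      # Both halves of an update (before and after images) share a topic.
      - check: meta("operation") == "update_before" || meta("operation") == "update_after"
        output:
          redpanda:
            seed_brokers: [ "localhost:9092" ]
            topic: sqlserver.updates
      - check: meta("operation") == "delete"
        output:
          redpanda:
            seed_brokers: [ "localhost:9092" ]
            topic: sqlserver.deletes
```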

The other 10 CodeRabbit findings were already resolved earlier in this session — they read as stale
because CodeRabbit posted them against 06e6d7e/f2af218-era state, and the subsequent revert (26b6e60),
stream_snapshot: false push (b622cfa), create→insert change (f35a3b6), SKIP_CONNECT_TEST rename
(79c4fc3), and others addressed them.

@Feediver1 Feediver1 requested a review from Jeffail June 4, 2026 17:27

@kbatuigas kbatuigas left a comment

Contributor

lgtm. Just a couple of things to flag per CC, but not blockers:

  1. [all 6 cookbooks] Definition list syntax in "Configure replication mode" sections. Still uses stream_snapshot: true:: Captures... instead of the preferred unordered list format. Also present in gcp_spanner_cdc.adoc for authentication types.
  2. [tigerbeetle_cdc.adoc:90] Em dash still present: "Filter events to process only settled transfers — single_phase... — using". Per style guide, replace with parentheses or colons.

JakeSCahill and others added 3 commits June 16, 2026 17:15
…ts, remove em dashes

Per CodeRabbit/kbatuigas review feedback:
- Convert stream_snapshot replication-mode definition lists to unordered lists
  in mysql, postgres, mongodb, and microsoft_sql_server cookbooks
- Convert GCP authentication-types definition list to unordered list
- Replace em-dash parenthetical with parentheses in tigerbeetle filter section

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…kbooks

# Conflicts:
#	docs-data/connect-diff-4.96.1_to_4.96.2.json
@JakeSCahill JakeSCahill merged commit 73bacad into main Jun 17, 2026
9 checks passed
@JakeSCahill JakeSCahill deleted the feat/doc-1771-cdc-cookbooks branch June 17, 2026 09:52