
fix(connector_sdk) : Adding IBM db2 log based replication example #546

Merged

fivetran-JenasVimal merged 12 commits into main from ibm_db2_log_based_replication on Apr 14, 2026


@fivetran-JenasVimal (Contributor) commented Mar 23, 2026

Jira ticket

Closes RD-971676

Description of Change

Adds a new IBM Db2 log-based replication example.
IBM Db2 Log-Based Replication Connector

This connector syncs data from IBM Db2 to your destination using log-based replication. Instead of repeatedly querying the source table, it reads the changes that Db2's ASN Capture program records from the transaction log, and syncs only what changed.

How it works

  1. First sync: Reads all rows from the source table and loads them into the destination.
  2. Every sync after that: Picks up only the rows that were inserted, updated, or deleted since the last sync — no full scans.
  3. Progress is saved every 500 rows, so if a sync is interrupted, it resumes where it left off instead of starting over.

Testing

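Fivetran debug

[Screenshot: fivetran debug output]

Incremental sync (a record for Leo was added):

[Screenshot: incremental sync output]

DuckDB warehouse

[Screenshot: DuckDB warehouse contents, 2026-03-23_20-49-54]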

Checklist

Some tips and links to help validate your PR:

  • Tested the connector with fivetran debug command.
  • Added/Updated example-specific README.md file, see the README template for the required structure and guidelines.
  • Followed the Python Coding Standards.

Copilot AI review requested due to automatic review settings March 23, 2026 15:22
@fivetran-JenasVimal fivetran-JenasVimal requested review from a team as code owners March 23, 2026 15:22
@github-actions Bot commented Mar 23, 2026

🧹 Python Code Quality Check

✅ No issues found in Python Files.


This comment is auto-updated with every commit.


Copilot AI left a comment


Pull request overview

Adds a new Connector SDK example demonstrating IBM Db2 log-based replication (CDC) using the ASN Capture / Change Data (CD) table approach.

Changes:

  • Introduces a new ibm_db2_log_based_replication connector implementation that performs an initial load and then applies changes from ASN.IBMSNAP_EMPCD.
  • Adds example documentation (README.md) describing the setup and how the CDC pipeline works.
  • Adds requirements.txt and configuration.json for the example connector.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| connectors/ibm_db2_log_based_replication/connector.py | New connector implementation for initial load + CD-table-driven incremental sync |
| connectors/ibm_db2_log_based_replication/configuration.json | Example configuration added for running the connector |
| connectors/ibm_db2_log_based_replication/requirements.txt | Adds ibm_db dependency pin |
| connectors/ibm_db2_log_based_replication/README.md | Documentation for setup, configuration, and behavior |
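Since the example depends on `ibm_db`, the connection is typically opened with a semicolon-separated connection string passed to `ibm_db.connect(connection_string, "", "")`. The sketch below assumes hypothetical configuration keys (`hostname`, `port`, `database`, `user_id`, `password`); the keys actually used in this PR's configuration.json may differ. The driver import is kept inside the connect helper so the string builder can be run on its own.

```python
def build_connection_string(configuration: dict) -> str:
    """Build an ibm_db connection string from hypothetical configuration keys."""
    required = ("hostname", "port", "database", "user_id", "password")
    missing = [key for key in required if not configuration.get(key)]
    if missing:
        raise ValueError(f"Missing configuration values: {', '.join(missing)}")
    return (
        f"DATABASE={configuration['database']};"
        f"HOSTNAME={configuration['hostname']};"
        f"PORT={configuration['port']};"
        "PROTOCOL=TCPIP;"
        f"UID={configuration['user_id']};"
        f"PWD={configuration['password']};"
    )


def open_connection(configuration: dict):
    """Hypothetical helper: open a Db2 connection with the ibm_db driver."""
    import ibm_db  # pinned in requirements.txt; imported lazily here

    # Credentials are already in the connection string, so user/password are empty.
    return ibm_db.connect(build_connection_string(configuration), "", "")
```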

Comment on lines +42 to +47
# ASN schema and CD table name as created by setup_cdc.sh
ASN_SCHEMA = "ASN"
CD_TABLE = "IBMSNAP_EMPCD"

# How many CD rows to process before writing an intermediate checkpoint
CHECKPOINT_INTERVAL = 500

Copilot AI Mar 23, 2026


Module constants don’t follow the repo’s convention of private, double-underscore, upper snake case for connector constants (e.g., __CHECKPOINT_INTERVAL). Rename these constants accordingly to align with the Connector SDK Python guidelines used in this repo.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +316 to +318
# Save the final state so the next sync knows where to continue from.
# Learn more about checkpointing:
# https://fivetran.com/docs/connectors/connector-sdk/best-practices#largedatasetrecommendation

Copilot AI Mar 23, 2026


This op.checkpoint() call is also missing the required standard checkpoint comment block immediately above it (the repo expects the full checkpoint explanation before every checkpoint operation).

Suggested change
# Save the final state so the next sync knows where to continue from.
# Learn more about checkpointing:
# https://fivetran.com/docs/connectors/connector-sdk/best-practices#largedatasetrecommendation
# Save the progress by checkpointing the state. This is important for ensuring that the sync process can resume
# from the correct position in case of next sync or interruptions.
# You should checkpoint even if you are not using incremental sync, as it tells Fivetran it is safe to write to destination.
# For large datasets, checkpoint regularly (e.g., every N records) not only at the end.
# Learn more about how and where to checkpoint by reading our best practices documentation
# (https://fivetran.com/docs/connector-sdk/best-practices#optimizingperformancewhenhandlinglargedatasets).

Copilot uses AI. Check for mistakes.
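One way to satisfy both review points (checkpoint every N records, and the standard comment block above every `op.checkpoint()` call) is to route all checkpoints through a single helper. In this sketch, `RecordingOperations` is a hypothetical stand-in for the SDK's `Operations` object so the code runs without the SDK installed, and the `employee` table and `last_processed_id` state key are illustrative names.

```python
__CHECKPOINT_INTERVAL = 500


class RecordingOperations:
    """Hypothetical stand-in for the SDK's Operations object, used only to run this sketch."""

    def __init__(self):
        self.upserts = []
        self.checkpoints = []

    def upsert(self, table, data):
        self.upserts.append((table, data))

    def checkpoint(self, state):
        self.checkpoints.append(dict(state))


def save_progress(op, state):
    # Save the progress by checkpointing the state. This is important for ensuring that the sync process can resume
    # from the correct position in case of next sync or interruptions.
    # You should checkpoint even if you are not using incremental sync, as it tells Fivetran it is safe to write to destination.
    # For large datasets, checkpoint regularly (e.g., every N records) not only at the end.
    # Learn more about how and where to checkpoint by reading our best practices documentation
    # (https://fivetran.com/docs/connector-sdk/best-practices#optimizingperformancewhenhandlinglargedatasets).
    op.checkpoint(state=state)


def load_rows(op, rows, state):
    for count, row in enumerate(rows, start=1):
        op.upsert(table="employee", data=row)  # "employee" is a hypothetical table name
        state["last_processed_id"] = row["id"]
        if count % __CHECKPOINT_INTERVAL == 0:
            save_progress(op, state)  # intermediate checkpoint every N rows
    save_progress(op, state)  # final checkpoint so the next sync resumes from here
```

Keeping a single `op.checkpoint()` call site means the required comment block appears once and still sits directly above every checkpoint operation.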
Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/configuration.json Outdated
Comment on lines +300 to +305

if last_log_marker is None:
# ── First sync: full initial load ──────────────────────────────────
# Reads directly from the source EMPLOYEE table once to populate the
# destination. After this, all changes come from the ASN CD table.
log.info("No previous state found. Starting initial full load.")

Copilot AI Mar 23, 2026


If the connector crashes during perform_initial_load(), the intermediate checkpoints set initial_load_complete=False, but update() only checks for last_log_marker and will skip the initial load on the next run. This can leave the destination partially loaded forever. Use initial_load_complete to decide whether to (re)run/continue the initial load, or store a resumable cursor for the initial scan (e.g., last processed PK) and only set initial_load_complete=True after the scan finishes successfully.

Suggested change
if last_log_marker is None:
# ── First sync: full initial load ──────────────────────────────────
# Reads directly from the source EMPLOYEE table once to populate the
# destination. After this, all changes come from the ASN CD table.
log.info("No previous state found. Starting initial full load.")
initial_load_complete = state.get("initial_load_complete")
if last_log_marker is None or not initial_load_complete:
# ── First sync or incomplete initial load: full initial load ────────
# Reads directly from the source EMPLOYEE table once to populate the
# destination. After this, all changes come from the ASN CD table.
if last_log_marker is None:
log.info("No previous state found. Starting initial full load.")
else:
log.warning(
"Previous state indicates the initial load did not complete successfully. "
"Restarting initial full load."
)
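The alternative the comment mentions (a resumable cursor for the initial scan) can be sketched as keyset pagination over the primary key, with `initial_load_complete` set only after the scan finishes. The state keys, the `id` column, and the `fetch_page` callable are assumptions for illustration; in Db2 a page might come from a query such as `WHERE ID > ? ORDER BY ID FETCH FIRST 500 ROWS ONLY`.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 500  # rows read per page; also the checkpoint interval in this sketch


def resume_initial_load(
    state: dict,
    fetch_page: Callable[[Optional[int], int], List[dict]],
    emit: Callable[[dict], None],
    checkpoint: Callable[[dict], None],
) -> dict:
    """Continue (or start) the initial scan from the last primary key saved in state."""
    state = dict(state)
    if state.get("initial_load_complete"):
        return state  # nothing to do; incremental syncs take over
    while True:
        # fetch_page(after_key, limit) is assumed to return rows with id > after_key
        # (or from the start when after_key is None), ordered by id.
        page = fetch_page(state.get("initial_load_last_key"), PAGE_SIZE)
        if not page:
            break
        for row in page:
            emit(row)
        state["initial_load_last_key"] = page[-1]["id"]
        checkpoint(dict(state))  # a crash after this point resumes after page[-1]
    # Only mark the load complete once the scan has actually finished.
    state["initial_load_complete"] = True
    state.pop("initial_load_last_key", None)
    checkpoint(dict(state))
    return state
```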

Comment on lines +27 to +33
## Getting started

### 1. Start the Db2 Docker container

```bash
docker-compose up -d
```

Copilot AI Mar 23, 2026


The ## Getting started section should include the standard Setup Guide sentence from the README template, and headings should not include numbers (e.g., ### 1. ...). Also, the README is missing the required ## Features section from the example README structure.

"""
Define the schema function which lets you configure the schema your connector delivers.
See the technical reference documentation for more details on the schema function:
https://fivetran.com/docs/connectors/connector-sdk/technical-reference#schema

Copilot AI Mar 23, 2026


The schema() docstring doesn’t match the required template (notably the documentation link path). In this repo, the schema docstring is expected to match the template connector’s wording/link exactly for consistency.

Suggested change
https://fivetran.com/docs/connectors/connector-sdk/technical-reference#schema
https://fivetran.com/docs/connector-sdk/technical-reference/connector-sdk-code/connector-sdk-methods#schema
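With the corrected link, a minimal `schema()` for this example might look like the following. The docstring reuses the lines quoted above; the destination table name `employee` and primary-key column `id` are hypothetical and may not match the connector's actual schema.

```python
def schema(configuration: dict):
    """
    Define the schema function which lets you configure the schema your connector delivers.
    See the technical reference documentation for more details on the schema function:
    https://fivetran.com/docs/connector-sdk/technical-reference/connector-sdk-code/connector-sdk-methods#schema
    """
    return [
        {
            "table": "employee",  # hypothetical destination table name
            "primary_key": ["id"],  # hypothetical primary-key column
        }
    ]
```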

Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/README.md
Comment thread connectors/ibm_db2_log_based_replication/README.md Outdated
@fivetran-JenasVimal fivetran-JenasVimal self-assigned this Mar 23, 2026
@fivetran-JenasVimal fivetran-JenasVimal marked this pull request as ready for review March 23, 2026 18:54
@github-actions github-actions Bot added size/XL PR size: extra large and removed size/L PR size: Large labels Mar 23, 2026
@fivetran-JenasVimal fivetran-JenasVimal force-pushed the ibm_db2_log_based_replication branch from 36e7a8e to 704f3d3 March 23, 2026 18:59
@github-actions github-actions Bot added size/L PR size: Large and removed size/XL PR size: extra large labels Mar 23, 2026
@fivetran-JenasVimal fivetran-JenasVimal marked this pull request as draft March 23, 2026 18:59
fivetran-JenasVimal and others added 3 commits March 24, 2026 00:30
…d CDC connector

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… naming conventions

- Rename abbreviated variables: conn→connection, stmt→statement, sql→query,
  row→database_row, conn_str→connection_string, current_seq→current_commit_sequence
- Rename connect_to_db→connect_to_database, standardize_row→normalize_row,
  get_current_log_marker→get_current_commit_sequence
- Replace LOGMARKER cursor with IBMSNAP_COMMITSEQ hex cursor for correctness
- Fix CD table reference: ASN.IBMSNAP_EMPCD→DB2INST1.CDEMPLOYEE
- State key renamed: last_log_marker→last_commit_sequence
- Add required checkpoint comment block before every op.checkpoint() call
- Add required upsert comment block before every op.upsert() call
- Add template-compliant module docstring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
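The switch to an `IBMSNAP_COMMITSEQ` cursor needs the binary commit sequence to survive a round trip through the JSON state. A minimal sketch, assuming the driver returns the column as `bytes` and that all values in one CD table have the same width (so byte-by-byte comparison matches Db2's ordering of binary strings); the query string and function names are illustrative, not the connector's code.

```python
from typing import Optional

# Hypothetical incremental query against the CD table named in the commit above;
# IBMSNAP_COMMITSEQ / IBMSNAP_INTENTSEQ follow the SQL Replication CD-table layout.
INCREMENTAL_QUERY = (
    "SELECT * FROM DB2INST1.CDEMPLOYEE "
    "WHERE IBMSNAP_COMMITSEQ > ? "
    "ORDER BY IBMSNAP_COMMITSEQ, IBMSNAP_INTENTSEQ"
)


def encode_commit_sequence(raw: bytes) -> str:
    """Turn the binary commit sequence into a JSON-safe string for the state dict."""
    return raw.hex().upper()


def decode_commit_sequence(hex_value: str) -> bytes:
    """Turn the saved hex string back into bytes, e.g. to bind to the query's parameter marker."""
    return bytes.fromhex(hex_value)


def is_newer(candidate: bytes, last_commit_sequence: Optional[str]) -> bool:
    """True if a CD row's commit sequence is past the saved cursor (None means no cursor yet)."""
    if last_commit_sequence is None:
        return True
    # Python compares bytes element by element, like an unsigned big-endian number
    # when both values have the same length.
    return candidate > decode_commit_sequence(last_commit_sequence)
```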
@github-actions github-actions Bot added size/XL PR size: extra large and removed size/L PR size: Large labels Mar 24, 2026
@github-actions github-actions Bot added size/L PR size: Large and removed size/XL PR size: extra large labels Mar 24, 2026
@fivetran-JenasVimal fivetran-JenasVimal marked this pull request as ready for review March 24, 2026 08:37

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/README.md Outdated
Contributor


Left a couple of suggestions

Comment thread connectors/ibm_db2_log_based_replication/README.md Outdated
Comment thread connectors/ibm_db2_log_based_replication/README.md Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/README.md
Comment thread connectors/ibm_db2_log_based_replication/README.md
@github-actions github-actions Bot added size/XL PR size: extra large and removed size/L PR size: Large labels Mar 26, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Comment thread connectors/ibm_db2_log_based_replication/README.md
Comment thread connectors/ibm_db2_log_based_replication/README.md
Comment thread connectors/ibm_db2_log_based_replication/README.md Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py Outdated
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/connector.py
Comment thread connectors/ibm_db2_log_based_replication/configuration.json Outdated
Contributor


LGTM

@fivetran-rishabhghosh fivetran-rishabhghosh removed the request for review from a team April 13, 2026 18:26
@fivetran-JenasVimal fivetran-JenasVimal merged commit a7b8234 into main Apr 14, 2026
4 checks passed
@fivetran-JenasVimal fivetran-JenasVimal deleted the ibm_db2_log_based_replication branch April 14, 2026 10:06


4 participants