Add Snowflake Cortex Hacker News connector to AI tutorials#555
Conversation
Adds a new connector example that syncs top stories from the Hacker News API and enriches them with AI-powered sentiment analysis and topic classification using the Snowflake Cortex REST API during ingestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a new AI tutorial connector example that syncs Hacker News top stories and optionally enriches them with sentiment + topic classification via Snowflake Cortex during ingestion, plus links it from the repo’s main README.
Changes:
- Added a new tutorial connector (`snowflake-cortex-hacker-news`) with incremental sync, batch processing, and optional Cortex enrichment.
- Added tutorial documentation (setup, configuration, cost notes, behavior overview).
- Updated the root `README.md` to include the new tutorial entry.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| README.md | Adds a link/description for the new Snowflake Cortex + Hacker News tutorial. |
| all_things_ai/tutorials/snowflake-cortex-hacker-news/README.md | New tutorial README describing features, configuration, and behavior. |
| all_things_ai/tutorials/snowflake-cortex-hacker-news/connector.py | New SDK v2+ connector implementation: HN fetch + optional Cortex enrichment + batching/checkpointing. |
| all_things_ai/tutorials/snowflake-cortex-hacker-news/configuration.json | New example configuration file for running the tutorial. |
- `cortex_timeout` (optional): Timeout in seconds for Cortex API calls; defaults to 30.
- `max_enrichments` (optional): Maximum number of AI enrichments per sync, for cost control; defaults to 50.
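As a sketch, the two optional settings above can be read from the connector's configuration dict with their documented defaults. The helper name is hypothetical; SDK configuration values typically arrive as strings, hence the `int()` casts:

```python
def read_cortex_settings(configuration: dict) -> tuple[int, int]:
    """Parse the optional Cortex settings, falling back to the documented defaults."""
    cortex_timeout = int(configuration.get("cortex_timeout", 30))
    max_enrichments = int(configuration.get("max_enrichments", 50))
    return cortex_timeout, max_enrichments
```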
> Note: Ensure that the `configuration.json` file is not checked into version control to protect sensitive information.
The README note says to not check configuration.json into version control, but this example directory includes a committed configuration.json. Consider aligning with the repo README template wording: committed examples use placeholder values, and production repositories should not commit real credentials/config.
Suggested change (replacing the note above):

> Note: This example repository includes a `configuration.json` file checked into version control, but it contains only placeholder values and no real credentials. In your own production repositories, do not commit `configuration.json` files containing real secrets or environment-specific configuration; instead, use environment variables or a secrets manager to store sensitive values.
Fixed in commit 075de23f — updated the README note to clarify that the committed configuration.json contains only placeholder values, and that production repositories should not commit real credentials.
```python
if enrichment.get("cortex_sentiment") is not None:
    enriched_count += 1
```
enriched_count is used to enforce max_enrichments, but it only increments when cortex_sentiment is not None. If sentiment parsing fails (or only classification succeeds), the connector will continue making Cortex calls beyond the intended limit, defeating cost control. Increment the counter based on enrichment attempts per story (or on success of either enrichment), and ideally count each Cortex call if that’s what you’re trying to cap.
Suggested change (replacing the two lines above):

```python
# Count each enrichment attempt toward the max_enrichments limit,
# regardless of which specific enrichment fields are present.
enriched_count += 1
```
Fixed in commit 075de23f — enriched_count now increments on each enrichment attempt regardless of which specific enrichment fields succeed. This ensures the max_enrichments cost control limit is enforced based on Cortex API calls made, not just successful sentiment parses.
Also updated the README Data handling section in commit bb7b654b to document that max_enrichments counts each enrichment attempt regardless of which specific fields succeed.
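As a sketch, the attempt-based cap described in this thread can be expressed as follows. The helper name and signature are hypothetical, not the connector's actual code; the point is that the counter moves on every call, not only on a successful sentiment parse:

```python
def enrich_batch(titles, enrich_fn, max_enrichments):
    """Apply enrich_fn to each title, stopping new calls once
    max_enrichments attempts have been made, successful or not."""
    enriched_count = 0
    results = []
    for title in titles:
        enrichment = {}
        if enriched_count < max_enrichments:
            enrichment = enrich_fn(title)
            # Count the attempt itself, even if cortex_sentiment came back None.
            enriched_count += 1
        results.append(enrichment)
    return results, enriched_count
```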
```python
synced_count = 0
highest_synced_id = state.get("last_synced_id", 0)

for story_id in story_ids:
    story_data = fetch_story(session, story_id)
    if not story_data:
        continue

    # Enrich with Cortex if enabled and within enrichment limit
    title = story_data.get("title", "")
    if is_cortex_enabled and title and enriched_count < max_enrichments:
        enrichment = enrich_story(session, configuration, title)
        story_data.update(enrichment)
        if enrichment.get("cortex_sentiment") is not None:
            enriched_count += 1

    # Flatten nested data structures for Fivetran compatibility
    flattened = flatten_dict(story_data)

    # The 'upsert' operation is used to insert or update data in the destination table.
    # The first argument is the name of the destination table.
    # The second argument is a dictionary containing the record to be upserted.
    op.upsert(table="stories_enriched", data=flattened)

    synced_count += 1
    highest_synced_id = max(highest_synced_id, story_id)
```
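The `flatten_dict` helper called above is not shown in this diff; a minimal sketch of the usual recursive flattening, assuming underscore-joined keys (the connector's actual implementation may differ):

```python
def flatten_dict(data: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten nested dictionaries into a single level, joining keys with sep."""
    items = {}
    for key, value in data.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the accumulated key prefix.
            items.update(flatten_dict(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items
```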
State advancement can skip stories permanently: highest_synced_id only tracks the max ID among successfully upserted stories, and update() uses it as a "cursor" (sid > last_synced_id). If a lower-ID story in the batch fails to fetch/upsert but a higher-ID story succeeds, the state jumps past the failed ID and that story will never be retried. To avoid data loss, process IDs in ascending order and only advance state to the last contiguously processed ID (or don’t advance past failures).
Fixed in commit 075de23f — story IDs are now sorted ascending before processing (`sorted([sid for sid in all_story_ids if sid > last_synced_id])`). This ensures contiguous state advancement so that if a lower-ID story fails to fetch, the cursor does not skip past it to a higher-ID success.
Also updated the README Data handling section in commit bb7b654b to document the ascending sort order and contiguous state advancement behavior.
@kellykohlleffel The ascending sort only helps if combined with stopping at the first failure — i.e., break out of the loop when fetch_story returns None.
That way highest_synced_id never advances past failing lower story_id, and it gets retried next sync.
@fivetran-anushkaparashar Great catch — you're absolutely right. Fixed in commit cff5845e.
The fix distinguishes between fetch failures (which now halt the batch) and legitimate skips (which still advance the cursor):
`fetch_story()` changes:
- Fetch failures now propagate `RuntimeError` from `fetch_data_with_retry` instead of being swallowed.
- Legitimate skips (deleted items and non-story types like comments/jobs/polls) still return `None`, since the cursor may safely advance past them; they will never become valid stories.

`process_batch()` changes:
- Catches `RuntimeError`, logs the halt, and returns immediately so `highest_synced_id` never advances past the failed `story_id`. The failed story is then retried on the next sync.
- On `None` returns (legitimate skips), `highest_synced_id` is still advanced via `max()` so the cursor can move past deleted items.

Why distinguish: a simple `break` on any `None` would halt the sync on every comment/job/poll item, potentially leaving the connector permanently stuck if the next-up `story_id` happens to be a non-story type. The distinction ensures the cursor advances through legitimate skips while halting only on actual fetch failures.
Also updated the README Data handling section to document the halt-on-failure semantics.
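The failure/skip distinction described above can be sketched as follows. The function names come from this thread (`fetch_story`, `process_batch`), but the bodies are an illustrative reconstruction, not the connector's actual code; the HTTP layer is abstracted as a `fetch_fn` parameter standing in for `fetch_data_with_retry`:

```python
def fetch_story(fetch_fn, story_id):
    """Return a story dict, None for a legitimate skip, or raise RuntimeError
    on a fetch failure (propagated from the retry helper)."""
    item = fetch_fn(story_id)  # stands in for fetch_data_with_retry; may raise
    if item is None or item.get("deleted") or item.get("type") != "story":
        # Deleted items and non-story types (comments/jobs/polls) will never
        # become valid stories, so the cursor may safely pass them.
        return None
    return item


def process_batch(fetch_fn, story_ids, highest_synced_id):
    """Advance the cursor contiguously; halt without advancing on a failure."""
    for story_id in sorted(story_ids):  # ascending order is required
        try:
            story = fetch_story(fetch_fn, story_id)
        except RuntimeError:
            # Halt: never advance past a failed lower ID; it is retried next sync.
            return highest_synced_id
        if story is not None:
            pass  # op.upsert(table="stories_enriched", data=flatten_dict(story))
        # Legitimate skips still advance the cursor.
        highest_synced_id = max(highest_synced_id, story_id)
    return highest_synced_id
```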
`fivetran debug` results (after fix)
Sync: SUCCEEDED
Records: 5 upserts (5 stories, 3 enriched with Cortex)
Checkpoint: `{"last_synced_id": 47607468}`
| Table | Records | Checkpoints |
|---|---|---|
| stories_enriched | 5 | 1 |
Raw fivetran debug terminal output
```
07-Apr 11:33:34.156 INFO ⚡ sdk Debugging connector at: hacker_news_plus_snowflake_cortex_v2
07-Apr 11:33:34.164 INFO ⚡ sdk Running connector tester...
07-Apr 11:33:34.676 INFO ⚡ debugger Version: 2.25.1230.001
07-Apr 11:33:35.746 INFO ⚡ debugger Previous state: {}
07-Apr 11:33:36.834 INFO ⚡ sdk Initiating the 'schema' method call...
07-Apr 11:33:36.843 INFO ⚡ debugger [SchemaChange]: tester.stories_enriched
07-Apr 11:33:36.847 INFO ⚡ sdk Initiating the 'update' method call...
07-Apr 11:33:36.847 WARNING Example: all_things_ai/tutorials : snowflake-cortex-hacker-news
07-Apr 11:33:36.847 INFO Cortex enrichment ENABLED: model=claude-sonnet-4-6, max_enrichments=3
07-Apr 11:33:36.847 INFO Last synced story ID: 0
07-Apr 11:33:36.847 INFO Fetching top stories list from Hacker News
07-Apr 11:33:37.064 INFO Retrieved 500 story IDs from HN API
07-Apr 11:33:37.065 INFO Syncing 5 new stories (filtered from 500 new)
07-Apr 11:33:37.065 INFO Processing batch 1/1 (5 stories)
07-Apr 11:33:56.260 INFO Checkpointed at story ID 47607468
07-Apr 11:33:56.261 INFO Sync complete: 5 stories synced, 3 enriched with Cortex
07-Apr 11:33:56.315 INFO ⚡ debugger [CreateTable]: tester.stories_enriched
07-Apr 11:33:56.340 INFO ⚡ debugger Checkpoint: {"last_synced_id": 47607468}
07-Apr 11:33:56.340 INFO ⚡ debugger SYNC PROGRESS:
Operation | Calls
----------------+------------
Upserts | 5
Updates | 0
Deletes | 0
Truncates | 0
SchemaChanges | 1
Checkpoints | 1
07-Apr 11:33:56.340 INFO ⚡ debugger Sync SUCCEEDED
```
…ctor
- Fix enriched_count to increment on attempt, not only on sentiment success
- Sort story IDs ascending for contiguous state advancement
- Update README configuration note for clarity on placeholder vs production usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All Copilot review threads have been replied to individually. Summary of the resulting test run:
| Table | Records |
|---|---|
| stories_enriched | 5 |
Raw fivetran debug terminal output
```
01-Apr 07:28:58.107 WARNING sdk requirements.txt file not found in your project folder.
01-Apr 07:28:59.365 INFO sdk Debugging connector at: /Users/kelly.kohlleffel/Documents/GitHub/fivetran_connector_sdk/all_things_ai/tutorials/snowflake-cortex-hacker-news
01-Apr 07:28:59.374 INFO sdk Running connector tester...
01-Apr 07:29:00.943 INFO debugger Version: 2.25.1230.001
01-Apr 07:29:01.916 INFO debugger Previous state: {}
01-Apr 07:29:02.998 INFO sdk Initiating the 'schema' method call...
01-Apr 07:29:03.007 INFO debugger [SchemaChange]: tester.stories_enriched
01-Apr 07:29:03.011 INFO sdk Initiating the 'update' method call...
01-Apr 07:29:03.011 WARNING Example: all_things_ai/tutorials : snowflake-cortex-hacker-news
01-Apr 07:29:03.011 INFO Cortex enrichment ENABLED: model=claude-sonnet-4-6, max_enrichments=3
01-Apr 07:29:03.011 INFO Last synced story ID: 0
01-Apr 07:29:03.011 INFO Fetching top stories list from Hacker News
01-Apr 07:29:03.181 INFO Retrieved 500 story IDs from HN API
01-Apr 07:29:03.182 INFO Syncing 5 new stories (filtered from 500 new)
01-Apr 07:29:03.182 INFO Processing batch 1/1 (5 stories)
01-Apr 07:29:32.104 INFO Checkpointed at story ID 47540833
01-Apr 07:29:32.104 INFO Sync complete: 5 stories synced, 3 enriched with Cortex
01-Apr 07:29:32.181 INFO debugger [CreateTable]: tester.stories_enriched
01-Apr 07:29:32.209 INFO debugger Checkpoint: {"last_synced_id": 47540833}
01-Apr 07:29:32.210 INFO debugger SYNC PROGRESS:
Operation       | Calls
----------------+------------
Upserts         | 5
Updates         | 0
Deletes         | 0
Truncates       | 0
SchemaChanges   | 1
Checkpoints     | 1
01-Apr 07:29:32.210 INFO debugger Sync SUCCEEDED
```
All review feedback has been addressed. Ready for re-review.
- Document ascending sort of story IDs for contiguous state advancement
- Clarify that max_enrichments counts attempts, not just successes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fivetran-anushkaparashar
left a comment
I have a question regarding `highest_synced_id = max(highest_synced_id, story_id)`. Please check!
…iled lower-ID stories

Address PR fivetran#555 review feedback from @fivetran-anushkaparashar: ascending sort order alone was not sufficient to guarantee contiguous state advancement. The loop must also halt on fetch failure so highest_synced_id never advances past a failed lower story_id.

Changes:
- fetch_story now raises RuntimeError on fetch failure (previously returned None for both failures and legitimate skips, which made them indistinguishable)
- fetch_story still returns None for legitimate skips (deleted items, non-story item types like comments/jobs/polls) since the cursor may safely advance past them; they will never become valid stories
- process_batch now catches RuntimeError, logs the failure, and returns immediately without advancing state past the failed story_id
- process_batch advances highest_synced_id on legitimate None skips so the cursor can move past deleted items and non-story types
- README Data handling section updated to document the halt-on-failure semantics and the distinction between failures and legitimate skips

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fivetran-anushkaparashar let me know if my update took care of the concern on `highest_synced_id = max(highest_synced_id, story_id)` - many thanks!
@fivetran/tech-writers Friendly ping — this PR has been approved by @fivetran-anushkaparashar and is waiting on your review. Happy to address any feedback. Thanks!
Summary
- New connector under `all_things_ai/tutorials/snowflake-cortex-hacker-news/` with `op.upsert()`/`op.checkpoint()` calls, `validate_configuration()`, standard boilerplate comments, and the official README template
- Updated root `README.md` with a new entry in the AI tutorials section

Test plan
- `fivetran debug --configuration configuration.json` runs successfully (see results below)
- `fivetran deploy` completed and data verified in Snowflake destination
- `black --check --line-length 99 connector.py` passes
- `flake8` passes with the official repo `.flake8` config
- `configuration.json` uses angle-bracket placeholders
- No `requirements.txt` file (not needed per oura-ring PR review feedback)

Test results
SDK Version: 2.25.1230.001
Sync: SUCCEEDED
Records: 5 upserts (5 stories synced, 3 enriched with Cortex using claude-sonnet-4-6, max_enrichments=3)
Generated with Claude Code