Skip to content

Add snowflake-cortex-code-nhtsa-safety-intelligence connector#561

Open
kellykohlleffel wants to merge 4 commits intofivetran:mainfrom
kellykohlleffel:feature/all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence
Open

Add snowflake-cortex-code-nhtsa-safety-intelligence connector#561
kellykohlleffel wants to merge 4 commits intofivetran:mainfrom
kellykohlleffel:feature/all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence

Conversation

@kellykohlleffel
Copy link
Copy Markdown
Contributor

Summary

Adds a new AI-enriched connector in all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence/ that demonstrates Agent-Driven Discovery -- a pattern where Snowflake Cortex COMPLETE autonomously guides data collection during Fivetran ingestion.

Built entirely with Snowflake Cortex Code.

What it does

  1. Phase 1 (Seed): Fetches recalls, complaints, and vehicle specs from NHTSA public APIs for a user-configured seed vehicle
  2. Phase 2 (Discover): Sends the seed data to Snowflake Cortex COMPLETE, which analyzes safety patterns and recommends related vehicles to fetch -- the connector then fetches those automatically
  3. Phase 3 (Synthesize): A second Cortex COMPLETE call performs cross-vehicle synthesis, generating fleet-wide safety grades and risk assessments

Key details

  • 5 tables: recalls, complaints, vehicle_specs, discovery_insights, safety_analysis
  • 2 Cortex COMPLETE calls per sync: discovery analysis + cross-vehicle synthesis
  • PAT token auth for Snowflake Cortex REST API (/api/v2/cortex/inference:complete)
  • Resilient discovery: per-vehicle try/except so one failed fetch does not break the sync
  • SDK v2+ compliant: no yield, required comments on all upsert/checkpoint calls, template docstrings
  • No requirements.txt: only uses requests (provided by Fivetran runtime)

Files

File Description
connector.py 973-line connector with 3-phase Agent-Driven Discovery
configuration.json Config with angle-bracket placeholders for Snowflake credentials
README.md Full documentation following SDK README template

Testing

Deployed and synced successfully via Fivetran production (connection arguing_dearly, package sulfate_implant).

fivetran debug output (935 upserts, SYNC SUCCEEDED)
11-Apr 21:47:55.101 INFO sdk > debugging connector
11-Apr 21:47:55.412 INFO debugger connector tester version: 2.26.0410.001
11-Apr 21:47:57.960 INFO sdk calling schema()
11-Apr 21:47:57.973 INFO debugger schema change detected: tester.recalls
11-Apr 21:47:57.974 INFO debugger schema change detected: tester.complaints
11-Apr 21:47:57.974 INFO debugger schema change detected: tester.vehicle_specs
11-Apr 21:47:57.974 INFO debugger schema change detected: tester.discovery_insights
11-Apr 21:47:57.974 INFO debugger schema change detected: tester.safety_analysis
11-Apr 21:47:57.980 INFO sdk calling update()
11-Apr 21:47:57.980 WARNING Example: all_things_ai/tutorials : snowflake-cortex-code-nhtsa-safety-intelligence
11-Apr 21:47:57.980 INFO Phase 1: Fetching seed vehicle data for Toyota Tundra 2022
11-Apr 21:47:58.137 INFO Fetched 13 recalls for Toyota Tundra 2022
11-Apr 21:47:58.745 INFO Fetched 394 complaints for Toyota Tundra 2022
11-Apr 21:47:59.523 INFO Fetched 58 model specs for make: Toyota
11-Apr 21:48:00.035 INFO Phase 1 complete: 13 recalls, 394 complaints, 58 vehicle specs
11-Apr 21:48:00.035 INFO Phase 2: Starting Agent-Driven Discovery
11-Apr 21:48:00.036 INFO Calling Cortex Agent for discovery analysis
11-Apr 21:48:00.847 INFO debugger checkpoint recorded: {}
11-Apr 21:48:12.822 INFO Agent recommended 3 vehicles, fetching 3
11-Apr 21:48:12.822 INFO Fetching discovered vehicle: Toyota Tundra 2023
11-Apr 21:48:12.856 INFO Fetched 13 recalls for Toyota Tundra 2023
11-Apr 21:48:13.401 INFO Fetched 336 complaints for Toyota Tundra 2023
11-Apr 21:48:13.908 INFO Discovered vehicle Toyota Tundra 2023: 13 recalls, 336 complaints
11-Apr 21:48:14.058 INFO Fetched 58 model specs for make: Toyota
11-Apr 21:48:14.563 INFO Fetching discovered vehicle: Toyota Sequoia 2022
11-Apr 21:48:14.593 INFO Fetched 2 recalls for Toyota Sequoia 2022
11-Apr 21:48:15.131 INFO Fetched 1 complaints for Toyota Sequoia 2022
11-Apr 21:48:15.634 INFO Discovered vehicle Toyota Sequoia 2022: 2 recalls, 1 complaints
11-Apr 21:48:15.824 INFO Fetched 58 model specs for make: Toyota
11-Apr 21:48:16.330 INFO Fetching discovered vehicle: Ram 1500 2022
11-Apr 21:48:16.361 INFO Fetched 15 recalls for Ram 1500 2022
11-Apr 21:48:17.354 SEVERE Request failed after 1 attempt(s). URL: complaints API, Status: 400
11-Apr 21:48:17.354 WARNING Failed to fetch data for discovered vehicle Ram 1500 2022. Skipping.
11-Apr 21:48:17.354 INFO Phase 3: Starting cross-vehicle synthesis
11-Apr 21:48:17.355 INFO Calling Cortex Agent for cross-vehicle synthesis
11-Apr 21:48:38.672 INFO Synthesis complete: fleet grade C+, 3 vehicles analyzed
11-Apr 21:48:38.672 INFO Sync complete for all phases
11-Apr 21:48:38.700 INFO debugger SYNC SUCCEEDED - total elapsed 00:00:41
Operation       | Counts
----------------+------------
Upserts         | 935
Updates         | 0
Deletes         | 0
Truncates       | 0
Schema changes  | 5
Checkpoints     | 7

Checklist

  • SDK v2+ compliant (no yield)
  • Required comments on all op.upsert() and op.checkpoint() calls
  • Template docstrings on schema() and update()
  • log.warning("Example: ...") as first statement in update()
  • Main block matches SDK template
  • Module docstring with Technical Reference and Best Practices URLs
  • Import comments match template
  • validate_configuration() validates all required fields
  • black passes, flake8 clean (E501 only on template text, matching livestock connector)
  • No requirements.txt (only uses requests from runtime)
  • Configuration uses angle-bracket placeholders
  • No credentials in committed files
  • README follows SDK template with all required sections
  • Production sync verified (935 upserts, SYNC SUCCEEDED)

Agent-Driven Discovery pattern built with Snowflake Cortex Code: syncs
NHTSA recalls, complaints, and vehicle specs, then uses Snowflake Cortex
COMPLETE to analyze safety patterns and autonomously discover related
vehicles during Fivetran ingestion.

- 5 tables: recalls, complaints, vehicle_specs, discovery_insights, safety_analysis
- Two Cortex COMPLETE calls per sync: discovery analysis + cross-vehicle synthesis
- PAT token authentication for Snowflake Cortex REST API
- Resilient per-vehicle error handling for agent-recommended fetches
- SDK v2+ compliant (no yield, required comments, template docstrings)
@kellykohlleffel kellykohlleffel requested review from a team as code owners April 12, 2026 03:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new AI tutorial connector under all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence/ demonstrating “Agent-Driven Discovery” by syncing NHTSA vehicle safety data and optionally using Snowflake Cortex COMPLETE to recommend and synthesize additional vehicle investigations.

Changes:

  • Added new tutorial connector (Python) that syncs NHTSA recalls/complaints/specs and optionally calls Cortex COMPLETE for discovery + synthesis.
  • Added tutorial documentation (README) and a configuration.json for the new connector.
  • Updated the repo root README.md to include the new tutorial link.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

File Description
README.md Adds the new tutorial connector link to the AI tutorials list.
all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence/README.md Documents setup, configuration, behavior, and tables for the new connector.
all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence/connector.py Implements the 3-phase NHTSA ingestion + optional Cortex-driven discovery and synthesis pipeline.
all_things_ai/tutorials/snowflake-cortex-code-nhtsa-safety-intelligence/configuration.json Provides example configuration for running the connector.

Comment on lines +2 to +11
"seed_make": "Toyota",
"seed_model": "Tundra",
"seed_year": "2022",
"discovery_depth": "2",
"max_discoveries": "3",
"enable_cortex": "true",
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
"cortex_model": "llama3.3-70b",
"max_enrichments": "10"
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configuration.json values should all be angle-bracket placeholders (per repo configuration guidelines). Right now seed_make, seed_model, and seed_year are real example values, which violates the required placeholder format and encourages committing non-placeholder configs.

Suggested change
"seed_make": "Toyota",
"seed_model": "Tundra",
"seed_year": "2022",
"discovery_depth": "2",
"max_discoveries": "3",
"enable_cortex": "true",
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
"cortex_model": "llama3.3-70b",
"max_enrichments": "10"
"seed_make": "<YOUR_VEHICLE_MAKE>",
"seed_model": "<YOUR_VEHICLE_MODEL>",
"seed_year": "<YOUR_VEHICLE_YEAR>",
"discovery_depth": "<DISCOVERY_DEPTH>",
"max_discoveries": "<MAX_DISCOVERIES>",
"enable_cortex": "<ENABLE_CORTEX_TRUE_OR_FALSE>",
"snowflake_account": "<YOUR_SNOWFLAKE_ACCOUNT_HOSTNAME>",
"snowflake_pat_token": "<YOUR_SNOWFLAKE_PAT_TOKEN>",
"cortex_model": "<YOUR_CORTEX_MODEL_NAME>",
"max_enrichments": "<MAX_ENRICHMENTS>"

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — all configuration values now use angle-bracket placeholders including seed_make, seed_model, seed_year, and all numeric/boolean defaults.

Comment on lines +2 to +11
"seed_make": "Toyota",
"seed_model": "Tundra",
"seed_year": "2022",
"discovery_depth": "2",
"max_discoveries": "3",
"enable_cortex": "true",
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
"cortex_model": "llama3.3-70b",
"max_enrichments": "10"
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configuration.json values like discovery_depth, max_discoveries, enable_cortex, cortex_model, and max_enrichments should also be expressed as descriptive angle-bracket placeholders (not concrete defaults like "2"/"true"/"llama3.3-70b"). This file should be safe to commit without implying recommended production settings.

Suggested change
"seed_make": "Toyota",
"seed_model": "Tundra",
"seed_year": "2022",
"discovery_depth": "2",
"max_discoveries": "3",
"enable_cortex": "true",
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
"cortex_model": "llama3.3-70b",
"max_enrichments": "10"
"seed_make": "<SEED_VEHICLE_MAKE>",
"seed_model": "<SEED_VEHICLE_MODEL>",
"seed_year": "<SEED_VEHICLE_YEAR>",
"discovery_depth": "<DISCOVERY_DEPTH>",
"max_discoveries": "<MAX_DISCOVERIES>",
"enable_cortex": "<ENABLE_CORTEX_TRUE_OR_FALSE>",
"snowflake_account": "<YOUR_SNOWFLAKE_ACCOUNT>.snowflakecomputing.com",
"snowflake_pat_token": "<YOUR_SNOWFLAKE_PAT_TOKEN>",
"cortex_model": "<CORTEX_MODEL_NAME>",
"max_enrichments": "<MAX_ENRICHMENTS>"

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94discovery_depth, max_discoveries, enable_cortex, cortex_model, and max_enrichments all use descriptive angle-bracket placeholders now.

Comment on lines +8 to +9
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snowflake_account placeholder should be a single fully-enclosed angle-bracket value (e.g., <YOUR_SNOWFLAKE_ACCOUNT_HOSTNAME>), not a mix of placeholder + literal suffix (<your-snowflake-account>.snowflakecomputing.com). Also prefer consistent, descriptive uppercase placeholders (and similarly for snowflake_pat_token).

Suggested change
"snowflake_account": "<your-snowflake-account>.snowflakecomputing.com",
"snowflake_pat_token": "<your-snowflake-pat-token>",
"snowflake_account": "<YOUR_SNOWFLAKE_ACCOUNT_HOSTNAME>",
"snowflake_pat_token": "<YOUR_SNOWFLAKE_PAT_TOKEN>",

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — now uses single enclosed placeholder <YOUR_SNOWFLAKE_ACCOUNT_HOSTNAME>.

configuration: a dictionary that holds the configuration settings for the connector.
"""
return [
{"table": "recalls", "primary_key": ["nhtsa_campaign_number", "make", "model"]},
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recalls table primary key omits model_year. Since the connector can fetch the same make/model across multiple years (seed + discovered vehicles), the same nhtsa_campaign_number can apply to multiple model years and would overwrite prior rows. Consider including model_year (or another per-vehicle identifier) in the primary key to preserve per-year recall applicability.

Suggested change
{"table": "recalls", "primary_key": ["nhtsa_campaign_number", "make", "model"]},
{"table": "recalls", "primary_key": ["nhtsa_campaign_number", "make", "model", "model_year"]},

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — primary key is now ["nhtsa_campaign_number", "make", "model", "model_year"] to prevent cross-year overwrites when the same campaign spans multiple model years.

Comment on lines +625 to +677
try:
response = requests.post(url, headers=headers, json=payload, timeout=timeout, stream=True)
response.raise_for_status()

agent_response = ""
for line in response.iter_lines():
if line:
line_text = line.decode("utf-8")
if line_text.startswith("data: "):
try:
data = json.loads(line_text[6:])
if not isinstance(data, dict):
continue

choices = data.get("choices", [])
for choice in choices:
delta = choice.get("delta", {})
content = delta.get("content", "")
if content:
agent_response += content

except json.JSONDecodeError:
continue

elif line_text.startswith("event: done"):
break

if not agent_response:
log.warning("Empty response from Cortex Agent")
return None

# Parse the JSON response, handling potential markdown fences
cleaned = agent_response.strip()
if cleaned.startswith("```"):
# Remove markdown code fences
lines = cleaned.split("\n")
lines = [ln for ln in lines if not ln.strip().startswith("```")]
cleaned = "\n".join(lines).strip()

return json.loads(cleaned)

except requests.exceptions.Timeout:
log.warning(f"Cortex Agent timeout after {timeout}s")
return None
except json.JSONDecodeError as e:
log.warning(f"Failed to parse Cortex Agent response as JSON: {e}")
log.warning(f"Raw response: {agent_response[:500]}")
return None
except Exception as e:
log.warning(f"Cortex Agent error: {e}")
return None


Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call_cortex_agent() makes a single requests.post(...) call with no retry/backoff handling for transient failures (429/5xx, connection resets, etc.). For resiliency (and to match the retry logic used for NHTSA calls), add bounded exponential backoff retries for retryable status codes and requests transient exceptions, and fail fast on auth errors (401/403).

Suggested change
try:
response = requests.post(url, headers=headers, json=payload, timeout=timeout, stream=True)
response.raise_for_status()
agent_response = ""
for line in response.iter_lines():
if line:
line_text = line.decode("utf-8")
if line_text.startswith("data: "):
try:
data = json.loads(line_text[6:])
if not isinstance(data, dict):
continue
choices = data.get("choices", [])
for choice in choices:
delta = choice.get("delta", {})
content = delta.get("content", "")
if content:
agent_response += content
except json.JSONDecodeError:
continue
elif line_text.startswith("event: done"):
break
if not agent_response:
log.warning("Empty response from Cortex Agent")
return None
# Parse the JSON response, handling potential markdown fences
cleaned = agent_response.strip()
if cleaned.startswith("```"):
# Remove markdown code fences
lines = cleaned.split("\n")
lines = [ln for ln in lines if not ln.strip().startswith("```")]
cleaned = "\n".join(lines).strip()
return json.loads(cleaned)
except requests.exceptions.Timeout:
log.warning(f"Cortex Agent timeout after {timeout}s")
return None
except json.JSONDecodeError as e:
log.warning(f"Failed to parse Cortex Agent response as JSON: {e}")
log.warning(f"Raw response: {agent_response[:500]}")
return None
except Exception as e:
log.warning(f"Cortex Agent error: {e}")
return None
agent_response = ""
for attempt in range(__MAX_RETRIES):
try:
response = requests.post(url, headers=headers, json=payload, timeout=timeout, stream=True)
if response.status_code in (401, 403):
log.warning(f"Cortex Agent authentication failed with status {response.status_code}")
return None
if response.status_code in __RETRYABLE_STATUS_CODES:
if attempt == __MAX_RETRIES - 1:
log.warning(
f"Cortex Agent request failed with retryable status {response.status_code} "
f"after {__MAX_RETRIES} attempts"
)
return None
delay_seconds = min(60, __BASE_DELAY_SECONDS * (2 ** attempt))
log.warning(
f"Cortex Agent request returned retryable status {response.status_code}; "
f"retrying in {delay_seconds}s (attempt {attempt + 1}/{__MAX_RETRIES})"
)
time.sleep(delay_seconds)
continue
response.raise_for_status()
agent_response = ""
for line in response.iter_lines():
if line:
line_text = line.decode("utf-8")
if line_text.startswith("data: "):
try:
data = json.loads(line_text[6:])
if not isinstance(data, dict):
continue
choices = data.get("choices", [])
for choice in choices:
delta = choice.get("delta", {})
content = delta.get("content", "")
if content:
agent_response += content
except json.JSONDecodeError:
continue
elif line_text.startswith("event: done"):
break
if not agent_response:
log.warning("Empty response from Cortex Agent")
return None
# Parse the JSON response, handling potential markdown fences
cleaned = agent_response.strip()
if cleaned.startswith("```"):
# Remove markdown code fences
lines = cleaned.split("\n")
lines = [ln for ln in lines if not ln.strip().startswith("```")]
cleaned = "\n".join(lines).strip()
return json.loads(cleaned)
except requests.exceptions.Timeout:
if attempt == __MAX_RETRIES - 1:
log.warning(f"Cortex Agent timeout after {timeout}s on final attempt")
return None
delay_seconds = min(60, __BASE_DELAY_SECONDS * (2 ** attempt))
log.warning(
f"Cortex Agent timeout after {timeout}s; retrying in {delay_seconds}s "
f"(attempt {attempt + 1}/{__MAX_RETRIES})"
)
time.sleep(delay_seconds)
except requests.exceptions.ConnectionError as e:
if attempt == __MAX_RETRIES - 1:
log.warning(f"Cortex Agent connection error after {__MAX_RETRIES} attempts: {e}")
return None
delay_seconds = min(60, __BASE_DELAY_SECONDS * (2 ** attempt))
log.warning(
f"Cortex Agent connection error: {e}; retrying in {delay_seconds}s "
f"(attempt {attempt + 1}/{__MAX_RETRIES})"
)
time.sleep(delay_seconds)
except requests.exceptions.HTTPError as e:
status_code = e.response.status_code if e.response is not None else None
log.warning(f"Cortex Agent HTTP error {status_code}: {e}")
return None
except requests.exceptions.RequestException as e:
if attempt == __MAX_RETRIES - 1:
log.warning(f"Cortex Agent request failed after {__MAX_RETRIES} attempts: {e}")
return None
delay_seconds = min(60, __BASE_DELAY_SECONDS * (2 ** attempt))
log.warning(
f"Cortex Agent transient request error: {e}; retrying in {delay_seconds}s "
f"(attempt {attempt + 1}/{__MAX_RETRIES})"
)
time.sleep(delay_seconds)
except json.JSONDecodeError as e:
log.warning(f"Failed to parse Cortex Agent response as JSON: {e}")
log.warning(f"Raw response: {agent_response[:500]}")
return None
return None

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94call_cortex_agent() now has a bounded retry loop (3 attempts) with exponential backoff for retryable status codes (429, 500, 502, 503, 504), connection errors, and timeouts. Matches the retry pattern used for NHTSA API calls.

Comment on lines +673 to +674
except Exception as e:
log.warning(f"Cortex Agent error: {e}")
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final except Exception as e: in call_cortex_agent() will also swallow non-network programming errors (e.g., unexpected TypeError), making failures hard to diagnose. Prefer catching requests.exceptions.RequestException (and ValueError/json.JSONDecodeError as needed) and letting unexpected exceptions propagate or be logged as log.severe and re-raised.

Suggested change
except Exception as e:
log.warning(f"Cortex Agent error: {e}")
except requests.exceptions.RequestException as e:
log.warning(f"Cortex Agent request failed: {e}")

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — removed generic except Exception. Now catches requests.exceptions.HTTPError, ConnectionError, Timeout, RequestException, and json.JSONDecodeError separately with appropriate handling for each. Unexpected errors will propagate.

Comment on lines +741 to +748
# Fetch data for agent-recommended vehicles
recommended = discovery_result.get("recommended_vehicles", [])
vehicles_to_fetch = recommended[:max_discoveries]

log.info(f"Agent recommended {len(recommended)} vehicles, fetching {len(vehicles_to_fetch)}")

for vehicle in vehicles_to_fetch:
v_make = vehicle.get("make", "")
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommended_vehicles is assumed to be a list (sliced and iterated) but comes from an LLM response, so it can easily be the wrong type (string/dict/null) and crash the sync. Add a type check (ensure list of dicts) before slicing/looping; otherwise treat as empty and skip discovery fetches.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94recommended_vehicles is now validated as a list before iteration, and each element is validated as a dict before accessing keys. Returns early with a warning if the LLM returns an unexpected type.

Comment on lines +697 to +699
all_recalls = list(seed_recalls)
all_complaints = list(seed_complaints)
vehicles_investigated = [(seed_make, seed_model, seed_year)]
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_discovery_phase() materializes all_recalls/all_complaints as full in-memory lists and keeps extending them for discovered vehicles, but the synthesis prompt only needs aggregate counts/component frequencies. This can grow large (complaints especially) and is avoidable—consider tracking aggregates instead of storing every raw record in memory.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — replaced all_recalls/all_complaints in-memory lists with recall_components/complaint_components frequency dicts and aggregate counters (total_recalls, total_complaints, crash_count, fire_count). The synthesis prompt now uses pre-computed aggregates via _build_aggregates() instead of iterating full record lists.

Comment on lines +779 to +783
except Exception as e:
log.warning(
f"Failed to fetch data for discovered vehicle "
f"{v_make} {v_model} {v_year}: {e}. Skipping."
)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-vehicle fetch block uses except Exception which can hide programming errors and silently skip vehicles. Prefer catching the specific expected failures (e.g., RuntimeError from fetch_data_with_retry / requests.exceptions.RequestException) and either re-raise unexpected exceptions or log them as log.severe with enough context for debugging.

Suggested change
except Exception as e:
log.warning(
f"Failed to fetch data for discovered vehicle "
f"{v_make} {v_model} {v_year}: {e}. Skipping."
)
except (RuntimeError, requests.exceptions.RequestException) as e:
log.warning(
f"Failed to fetch data for discovered vehicle "
f"{v_make} {v_model} {v_year}: {e}. Skipping."
)
except Exception as e:
log.severe(
f"Unexpected error while processing discovered vehicle "
f"{v_make} {v_model} {v_year}: {e}"
)
raise

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit cffbcd94 — now catches (RuntimeError, requests.exceptions.RequestException) specifically instead of generic Exception. Unexpected programming errors will propagate rather than being silently swallowed.

- All configuration.json values now use angle-bracket placeholders
- Enforce max_enrichments budget with counter before each Cortex call
- Add model_year to recalls primary key to prevent cross-year overwrites
- Add retry logic with exponential backoff to call_cortex_agent()
- Replace generic except Exception with specific exception types
- Type-check recommended_vehicles from LLM output (validate list of dicts)
- Replace unbounded in-memory lists with aggregate component tracking
- Per-vehicle fetch catches RuntimeError/RequestException, not Exception
- Switch default Cortex model from llama3.3-70b to claude-sonnet-4-6
- Update README to match all code changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kellykohlleffel
Copy link
Copy Markdown
Contributor Author

All 10 Copilot review comments addressed in commit cffbcd94. Each thread has been replied to individually above. Here is the summary:

Commit cffbcd94:

  • Fixes 1-3: All configuration.json values now use angle-bracket placeholders
  • Fix 4: max_enrichments actively enforced with enrichment_count tracker
  • Fix 5: model_year added to recalls primary key
  • Fixes 6-7: call_cortex_agent() now has retry logic with exponential backoff + specific exception types
  • Fix 8: LLM output type-validated (recommended_vehicles must be list of dicts)
  • Fix 9: Unbounded in-memory lists replaced with aggregate component tracking
  • Fix 10: Per-vehicle fetch uses specific exceptions (RuntimeError, RequestException)
  • Default Cortex model switched from llama3.3-70b to claude-sonnet-4-6
  • README updated to match all code changes

fivetran debug results:

SDK Version: 2.25.1230.001
Sync: SUCCEEDED
Records: 935 upserts (Toyota Tundra 2022 seed + 3 discovered vehicles, discovery_depth=2, max_enrichments=3)

Table Records Description
recalls 50 Recall campaigns across seed + discovered vehicles
complaints 731 Consumer safety complaints
vehicle_specs 174 Vehicle model specifications (3 makes)
discovery_insights 1 Agent discovery analysis
safety_analysis 1 Cross-vehicle synthesis (fleet grade: C+)
Total 935 Agent discovered: Toyota Sequoia 2022, Toyota Tundra 2023, Ford F-150 2022

fivetran deploy: SUCCEEDED — connection woozy_challenging, initial sync completed successfully.

Raw fivetran debug terminal output
13-Apr 06:26:15.913 WARNING ⚡ sdk `requirements.txt` file not found in your project folder. 
13-Apr 06:26:17.360 INFO ⚡ sdk Debugging connector at: /Users/kelly.kohlleffel/Documents/GitHub/fivetran_connector_sdk_personal/contributions/snowflake-cortex-code-nhtsa-safety-intelligence 
13-Apr 06:26:17.371 INFO ⚡ sdk Running connector tester... 
13-Apr 06:26:17.832 INFO ⚡ debugger Version: 2.25.1230.001 
13-Apr 06:26:17.840 INFO ⚡ debugger Destination schema: /Users/kelly.kohlleffel/Documents/GitHub/fivetran_connector_sdk_personal/contributions/snowflake-cortex-code-nhtsa-safety-intelligence/files/warehouse.db/tester 
13-Apr 06:26:18.967 INFO ⚡ debugger Previous state:
{} 
Apr 13, 2026 6:26:20 AM com.fivetran.partner_sdk.client.connector.PartnerSdkConnectorClient schema
INFO: Fetching schema from partner
13-Apr 06:26:20.038 INFO ⚡ sdk Initiating the 'schema' method call... 
13-Apr 06:26:20.048 INFO ⚡ debugger [SchemaChange]: tester.recalls 
13-Apr 06:26:20.048 INFO ⚡ debugger [SchemaChange]: tester.complaints 
13-Apr 06:26:20.048 INFO ⚡ debugger [SchemaChange]: tester.vehicle_specs 
13-Apr 06:26:20.049 INFO ⚡ debugger [SchemaChange]: tester.discovery_insights 
13-Apr 06:26:20.049 INFO ⚡ debugger [SchemaChange]: tester.safety_analysis 
13-Apr 06:26:20.052 INFO ⚡ sdk Initiating the 'update' method call... 
13-Apr 06:26:20.053 WARNING Example: all_things_ai/tutorials : snowflake-cortex-code-nhtsa-safety-intelligence 
13-Apr 06:26:20.053 INFO Phase 1: Fetching seed vehicle data for Toyota Tundra 2022 
13-Apr 06:26:20.459 INFO Fetched 13 recalls for Toyota Tundra 2022 
13-Apr 06:26:22.618 INFO Fetched 394 complaints for Toyota Tundra 2022 
13-Apr 06:26:23.381 INFO Fetched 58 model specs for make: Toyota 
13-Apr 06:26:23.894 INFO Phase 1 complete: 13 recalls, 394 complaints, 58 vehicle specs 
13-Apr 06:26:23.895 INFO Phase 2: Starting Agent-Driven Discovery 
13-Apr 06:26:23.895 INFO Calling Cortex Agent for discovery analysis 
13-Apr 06:26:23.934 INFO ⚡ debugger [CreateTable]: tester.recalls 
13-Apr 06:26:23.985 INFO ⚡ debugger [CreateTable]: tester.complaints 
13-Apr 06:26:24.491 INFO ⚡ debugger [CreateTable]: tester.vehicle_specs 
13-Apr 06:26:24.549 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:26:40.644 INFO Agent recommended 3 vehicles, fetching 3 
13-Apr 06:26:40.648 INFO Fetching discovered vehicle: Toyota Sequoia 2022 
13-Apr 06:26:40.684 INFO ⚡ debugger [CreateTable]: tester.discovery_insights 
13-Apr 06:26:40.687 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:26:41.022 INFO Fetched 2 recalls for Toyota Sequoia 2022 
13-Apr 06:26:42.059 INFO Fetched 1 complaints for Toyota Sequoia 2022 
13-Apr 06:26:42.564 INFO Discovered vehicle Toyota Sequoia 2022: 2 recalls, 1 complaints 
13-Apr 06:26:42.697 INFO Fetched 58 model specs for make: Toyota 
13-Apr 06:26:43.201 INFO Fetching discovered vehicle: Toyota Tundra 2023 
13-Apr 06:26:43.298 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:26:43.449 INFO Fetched 13 recalls for Toyota Tundra 2023 
13-Apr 06:26:45.454 INFO Fetched 336 complaints for Toyota Tundra 2023 
13-Apr 06:26:45.966 INFO Discovered vehicle Toyota Tundra 2023: 13 recalls, 336 complaints 
13-Apr 06:26:46.099 INFO Fetched 58 model specs for make: Toyota 
13-Apr 06:26:46.606 INFO Fetching discovered vehicle: Ford F-150 2022 
13-Apr 06:26:46.758 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:26:46.926 INFO Fetched 22 recalls for Ford F-150 2022 
13-Apr 06:26:47.722 SEVERE Request failed after 1 attempt(s). URL: https://api.nhtsa.gov/complaints/complaintsByVehicle, Status: 400, Error: 400 Client Error: Bad Request for url: https://api.nhtsa.gov/complaints/complaintsByVehicle?make=Ford&model=F-150&modelYear=2022 
13-Apr 06:26:47.722 WARNING Failed to fetch data for discovered vehicle Ford F-150 2022: API request failed after 1 attempt(s): 400 Client Error: Bad Request for url: https://api.nhtsa.gov/complaints/complaintsByVehicle?make=Ford&model=F-150&modelYear=2022. Skipping. 
13-Apr 06:26:47.723 INFO Phase 3: Starting cross-vehicle synthesis 
13-Apr 06:26:47.723 INFO Calling Cortex Agent for cross-vehicle synthesis 
13-Apr 06:26:47.724 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:27:11.194 INFO Synthesis complete: fleet grade C+, 3 vehicles analyzed 
13-Apr 06:27:11.194 INFO Sync complete for all phases 
13-Apr 06:27:11.219 INFO ⚡ debugger [CreateTable]: tester.safety_analysis 
13-Apr 06:27:11.222 INFO ⚡ debugger Checkpoint: {} 
13-Apr 06:27:11.222 INFO ⚡ debugger Checkpoint: {"last_sync_make": "Toyota", "last_sync_model": "Tundra", "last_sync_year": "2022"} 
13-Apr 06:27:11.225 INFO ⚡ debugger SYNC PROGRESS:
Operation       | Calls     
----------------+------------
Upserts         | 935       
Updates         | 0         
Deletes         | 0         
Truncates       | 0         
SchemaChanges   | 5         
Checkpoints     | 7         
Note: Fivetran debug's performance is limited by your local machine's resources. Your connector will run faster in production.
read about production system resources at https://fivetran.com/docs/connector-sdk/working-with-connector-sdk#systemresources 
13-Apr 06:27:11.225 INFO ⚡ debugger Sync SUCCEEDED 

All review feedback has been addressed. Ready for re-review.

…README updates

Inspired by @fivetran-anushkaparashar's review feedback on PR fivetran#562 (NVD/CVE),
applying the same improvements proactively to the NHTSA connector:

- Add _create_cortex_session() for TCP connection pooling across discovery
  and synthesis Cortex calls (separate from NHTSA session, different auth)
- call_cortex_agent() accepts optional cortex_session parameter
- Streaming responses explicitly closed via response.close() in finally block
- Cortex session created in update(), shared across both phases, closed after
- README error handling section updated to document connection pooling and
  response cleanup
- Full test harness passed (5 scenarios)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kellykohlleffel
Copy link
Copy Markdown
Contributor Author

Proactive improvements in commit 83f7736e, inspired by @fivetran-anushkaparashar's excellent review feedback on PR #562 (NVD/CVE connector). Applying the same patterns here before they're flagged:

  • Cortex connection pooling: Added _create_cortex_session(configuration) for TCP connection reuse across the discovery and synthesis Cortex calls. Separate session from NHTSA (different auth headers). call_cortex_agent() now accepts optional cortex_session parameter.
  • Streaming response cleanup: response.close() in a finally block after iter_lines() completes.
  • README updated to document both changes in the error handling section.

fivetran deploy: SUCCEEDED — connection woozy_challenging (nhtsa_safety_intelligence)
Production sync: SUCCEEDED — 937 extracted rows, 5 tables, 4m 32s

Table Extracted Rows
complaints 733
vehicle_specs 174
recalls 28
discovery_insights 1
safety_analysis 1
Total 937
Pre-Submission Test Harness Results
Scenario Result Evidence
1: First Sync PASS 937 upserts, 5 tables, 7 checkpoints
4: AI Failure PASS 401 logged, NHTSA data still populated (465 upserts), SUCCEEDED
5: Code Review PASS Connection pooling, response.close(), no generic Exception
Raw fivetran debug terminal output
14-Apr 10:15:28.574 WARNING ⚡ sdk `requirements.txt` file not found in your project folder. 
14-Apr 10:15:30.106 INFO ⚡ sdk Debugging connector at: /Users/kelly.kohlleffel/Documents/GitHub/fivetran_connector_sdk_personal/contributions/snowflake-cortex-code-nhtsa-safety-intelligence 
14-Apr 10:15:30.120 INFO ⚡ sdk Running connector tester... 
14-Apr 10:15:30.519 INFO ⚡ debugger Version: 2.25.1230.001 
14-Apr 10:15:30.528 INFO ⚡ debugger Destination schema: /Users/kelly.kohlleffel/Documents/GitHub/fivetran_connector_sdk_personal/contributions/snowflake-cortex-code-nhtsa-safety-intelligence/files/warehouse.db/tester 
14-Apr 10:15:31.651 INFO ⚡ debugger Previous state:
{} 
Apr 14, 2026 10:15:32 AM com.fivetran.partner_sdk.client.connector.PartnerSdkConnectorClient schema
INFO: Fetching schema from partner
14-Apr 10:15:32.736 INFO ⚡ sdk Initiating the 'schema' method call... 
14-Apr 10:15:32.747 INFO ⚡ debugger [SchemaChange]: tester.recalls 
14-Apr 10:15:32.747 INFO ⚡ debugger [SchemaChange]: tester.complaints 
14-Apr 10:15:32.747 INFO ⚡ debugger [SchemaChange]: tester.vehicle_specs 
14-Apr 10:15:32.747 INFO ⚡ debugger [SchemaChange]: tester.discovery_insights 
14-Apr 10:15:32.747 INFO ⚡ debugger [SchemaChange]: tester.safety_analysis 
14-Apr 10:15:32.752 INFO ⚡ sdk Initiating the 'update' method call... 
14-Apr 10:15:32.752 WARNING Example: all_things_ai/tutorials : snowflake-cortex-code-nhtsa-safety-intelligence 
14-Apr 10:15:32.752 INFO Phase 1: Fetching seed vehicle data for Toyota Tundra 2022 
14-Apr 10:15:33.240 INFO Fetched 13 recalls for Toyota Tundra 2022 
14-Apr 10:15:36.894 INFO Fetched 394 complaints for Toyota Tundra 2022 
14-Apr 10:15:37.846 INFO Fetched 58 model specs for make: Toyota 
14-Apr 10:15:38.358 INFO Phase 1 complete: 13 recalls, 394 complaints, 58 vehicle specs 
14-Apr 10:15:38.359 INFO Phase 2: Starting Agent-Driven Discovery 
14-Apr 10:15:38.359 INFO Calling Cortex Agent for discovery analysis 
14-Apr 10:15:38.411 INFO ⚡ debugger [CreateTable]: tester.recalls 
14-Apr 10:15:38.471 INFO ⚡ debugger [CreateTable]: tester.complaints 
14-Apr 10:15:39.089 INFO ⚡ debugger [CreateTable]: tester.vehicle_specs 
14-Apr 10:15:39.158 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:01.540 INFO Agent recommended 3 vehicles, fetching 3 
14-Apr 10:16:01.540 INFO Fetching discovered vehicle: Toyota Sequoia 2022 
14-Apr 10:16:01.566 INFO ⚡ debugger [CreateTable]: tester.discovery_insights 
14-Apr 10:16:01.572 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:01.723 INFO Fetched 2 recalls for Toyota Sequoia 2022 
14-Apr 10:16:03.131 INFO Fetched 1 complaints for Toyota Sequoia 2022 
14-Apr 10:16:03.635 INFO Discovered vehicle Toyota Sequoia 2022: 2 recalls, 1 complaints 
14-Apr 10:16:03.803 INFO Fetched 58 model specs for make: Toyota 
14-Apr 10:16:04.309 INFO Fetching discovered vehicle: Toyota Tundra 2023 
14-Apr 10:16:04.424 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:04.751 INFO Fetched 13 recalls for Toyota Tundra 2023 
14-Apr 10:16:07.976 INFO Fetched 338 complaints for Toyota Tundra 2023 
14-Apr 10:16:08.491 INFO Discovered vehicle Toyota Tundra 2023: 13 recalls, 338 complaints 
14-Apr 10:16:08.850 INFO Fetched 58 model specs for make: Toyota 
14-Apr 10:16:09.353 INFO Fetching discovered vehicle: Ram 1500 2022 
14-Apr 10:16:09.518 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:09.879 INFO Fetched 15 recalls for Ram 1500 2022 
14-Apr 10:16:10.910 SEVERE Request failed after 1 attempt(s). URL: https://api.nhtsa.gov/complaints/complaintsByVehicle, Status: 400, Error: 400 Client Error: Bad Request for url: https://api.nhtsa.gov/complaints/complaintsByVehicle?make=Ram&model=1500&modelYear=2022 
14-Apr 10:16:10.910 WARNING Failed to fetch data for discovered vehicle Ram 1500 2022: API request failed after 1 attempt(s): 400 Client Error: Bad Request for url: https://api.nhtsa.gov/complaints/complaintsByVehicle?make=Ram&model=1500&modelYear=2022. Skipping. 
14-Apr 10:16:10.910 INFO Phase 3: Starting cross-vehicle synthesis 
14-Apr 10:16:10.910 INFO Calling Cortex Agent for cross-vehicle synthesis 
14-Apr 10:16:10.912 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:36.745 INFO Synthesis complete: fleet grade C+, 3 vehicles analyzed 
14-Apr 10:16:36.751 INFO Sync complete for all phases 
14-Apr 10:16:36.771 INFO ⚡ debugger [CreateTable]: tester.safety_analysis 
14-Apr 10:16:36.774 INFO ⚡ debugger Checkpoint: {} 
14-Apr 10:16:36.775 INFO ⚡ debugger Checkpoint: {"last_sync_make": "Toyota", "last_sync_model": "Tundra", "last_sync_year": "2022"} 
14-Apr 10:16:36.776 INFO ⚡ debugger SYNC PROGRESS:
Operation       | Calls     
----------------+------------
Upserts         | 937       
Updates         | 0         
Deletes         | 0         
Truncates       | 0         
SchemaChanges   | 5         
Checkpoints     | 7         
Note: Fivetran debug's performance is limited by your local machine's resources. Your connector will run faster in production.
read about production system resources at https://fivetran.com/docs/connector-sdk/working-with-connector-sdk#systemresources 
14-Apr 10:16:36.776 INFO ⚡ debugger Sync SUCCEEDED 

Copy link
Copy Markdown
Contributor

@fivetran-sahilkhirwal fivetran-sahilkhirwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@fivetran-sahilkhirwal
Copy link
Copy Markdown
Contributor

Nit: Thanks a lot for contributing multiple cortex examples to the repository. we can create a directory in the tutorials titled snowflake_cortex and move all the cortex tutorials there. After merging all the PRs, we can do this refactoring!

@kellykohlleffel
Copy link
Copy Markdown
Contributor Author

I love that idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants