feat: LF-native Iceberg catalog flow (v1.3.0) #59

Merged: drernie merged 10 commits into main from 48-lf-native on Mar 19, 2026

@drernie (Member) commented Mar 19, 2026

Summary

  • Adds end-to-end Lake Formation–native Iceberg catalog support: DataZone can now import Glue tables registered under Lake Formation
  • New seed_glue_tables.py script provisions Glue databases, registers S3 locations with LF, creates Iceberg tables, and drives the DataZone Glue import flow
  • Terraform grants the DataZone Glue import IAM role the necessary Lake Formation permissions
  • DATAZONE_PROJECTS is now declared in Terraform and fed back via tf-outputs.json, eliminating post-domain-recreation drift
  • Seed scripts derive project names from seed-config.yaml (no more hardcoded names)
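The seeding sequence in the second bullet can be sketched roughly as follows. This is an illustrative outline, not the contents of seed_glue_tables.py: the function name `seed_iceberg_table`, its parameters, the one-column schema, and the S3 layout are all assumptions. The boto3 clients are passed in so the sketch can be exercised without AWS credentials.

```python
from typing import Any


def seed_iceberg_table(
    glue: Any,
    lakeformation: Any,
    datazone: Any,
    *,
    database: str,
    table: str,
    bucket: str,
    domain_id: str,
    data_source_id: str,
) -> str:
    """Provision one LF-registered Iceberg table, then start a DataZone import run.

    `glue`, `lakeformation`, and `datazone` are expected to be boto3 clients
    (or test doubles exposing the same methods).
    """
    # Hypothetical layout: one prefix per database/table under the bucket.
    location = f"s3://{bucket}/{database}/{table}/"

    # 1. Glue database that will hold the Iceberg table.
    glue.create_database(DatabaseInput={"Name": database})

    # 2. Register the S3 location with Lake Formation.
    lakeformation.register_resource(
        ResourceArn=f"arn:aws:s3:::{bucket}",
        UseServiceLinkedRole=True,
    )

    # 3. Create the table; OpenTableFormatInput asks Glue to write the
    #    initial Iceberg metadata at the table location.
    glue.create_table(
        DatabaseName=database,
        TableInput={
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Location": location,
                # Placeholder schema for illustration only.
                "Columns": [{"Name": "id", "Type": "string"}],
            },
        },
        OpenTableFormatInput={
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
        },
    )

    # 4. Kick off the DataZone Glue data source run that imports the table.
    run = datazone.start_data_source_run(
        domainIdentifier=domain_id,
        dataSourceIdentifier=data_source_id,
    )
    return run["id"]
```

A real script would also need to tolerate reruns (for example, by catching the "already exists" errors from Glue and Lake Formation) and to poll the data source run until it finishes; both are omitted here for brevity.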

Test plan

  • ./poe deploy completes cleanly (Terraform plan shows only expected LF/Glue additions)
  • python scripts/seed_glue_tables.py seeds databases and tables without error
  • DataZone Glue import completes and assets appear in the DataZone catalog
  • python scripts/seed_users.py and seed_packages.py run without referencing hardcoded project names
  • ./poe test-unit passes

🤖 Generated with Claude Code

drernie and others added 10 commits March 18, 2026 11:28
…bles

- Replace all hardcoded "owner"/"users"/"guests" strings with dynamic
  seed_config.py lookups throughout scripts/
- Remove _ensure_projects from sagemaker_gaps.py — Terraform owns project
  creation; project IDs now read from tf-outputs.json via project.project_name
- Move Glue data source logic (_ensure_glue_data_source and helpers) from
  sagemaker_gaps.py into seed_glue_tables.py where it belongs
- seed_glue_tables.py uses SEED_CONFIG.default_project for owner and
  non-default projects for subscribers — no hardcoded keys anywhere

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
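As a rough illustration of the owner/subscriber split described in this commit, the lookup might look like the sketch below. Only the name `default_project` comes from the commit message; the `Project` and `SeedConfig` classes, the `subscriber_projects` property, and the `default` flag are hypothetical stand-ins for whatever seed_config.py actually defines.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Project:
    key: str               # config key, e.g. a former hardcoded string
    name: str              # DataZone project name (hypothetical field)
    default: bool = False  # hypothetical flag marking the owning project


@dataclass(frozen=True)
class SeedConfig:
    projects: Tuple[Project, ...]

    @property
    def default_project(self) -> Project:
        """The single project that owns seeded assets."""
        defaults = [p for p in self.projects if p.default]
        if len(defaults) != 1:
            raise ValueError(
                f"expected exactly one default project, found {len(defaults)}"
            )
        return defaults[0]

    @property
    def subscriber_projects(self) -> Tuple[Project, ...]:
        """Every non-default project; these subscribe to the owner's assets."""
        return tuple(p for p in self.projects if not p.default)


# Example values only; real names would be loaded from seed-config.yaml.
config = SeedConfig(
    projects=(
        Project("owner", "example-owner", default=True),
        Project("users", "example-users"),
        Project("guests", "example-guests"),
    )
)
owner = config.default_project
subscribers = [p.key for p in config.subscriber_projects]
```

With this shape, callers ask the config for roles rather than naming projects, so renaming a project in the YAML needs no code change.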
…outputs.json

Terraform was removing DATAZONE_PROJECTS from the control-plane and
rale-authorizer Lambdas on every apply because the key wasn't in the
config, then sagemaker_gaps.py would re-add it via direct boto3 calls —
an infinite back-and-forth.

Fix: declare `var.datazone_projects` (default "") and wire it into both
Lambda environment blocks. The _terraform-apply task now reads the value
from infra/tf-outputs.json (written by sagemaker_gaps.py on the previous
run) and passes it as TF_VAR_datazone_projects, so Terraform owns the
key and plans no changes on subsequent deploys.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
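The feedback loop described in the commit above (read the previous run's value from infra/tf-outputs.json, hand it to Terraform as TF_VAR_datazone_projects) could be sketched like this. The helper name `terraform_env` and the assumption that the file stores the value under a top-level `DATAZONE_PROJECTS` key are illustrative; the actual file shape is not shown in this PR.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def terraform_env(
    outputs_path: Path,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment for `terraform apply`.

    Falls back to an empty string (the declared variable default) when the
    outputs file does not exist yet, e.g. on the very first deploy.
    """
    env = dict(os.environ if base_env is None else base_env)
    value = ""
    if outputs_path.exists():
        outputs = json.loads(outputs_path.read_text())
        # Assumed key name; adjust to the real tf-outputs.json layout.
        raw = outputs.get("DATAZONE_PROJECTS", "")
        # Terraform string variables need a string; JSON-encode anything else.
        value = raw if isinstance(raw, str) else json.dumps(raw)
    env["TF_VAR_datazone_projects"] = value
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run(["terraform", "apply"], ...)`. Because Terraform then sees the same value it applied last time, the Lambda environment blocks stop appearing in the plan.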
@claude claude bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.

@drernie drernie merged commit fa7c86a into main Mar 19, 2026
6 checks passed
@drernie drernie deleted the 48-lf-native branch March 19, 2026 20:32