diff --git a/docs-data/property-overrides.json b/docs-data/property-overrides.json
index 9d34ab7769..e8479a3b2e 100644
--- a/docs-data/property-overrides.json
+++ b/docs-data/property-overrides.json
@@ -979,7 +979,7 @@
     "config_scope": "cluster"
   },
   "iceberg_rest_catalog_authentication_mode": {
-    "description": "The authentication mode for client requests made to the Iceberg catalog. Choose from: `none`, `bearer`, `oauth2`, and `aws_sigv4`. In `bearer` mode, the token specified in `iceberg_rest_catalog_token` is used unconditonally, and no attempts are made to refresh the token. In `oauth2` mode, the credentials specified in `iceberg_rest_catalog_client_id` and `iceberg_rest_catalog_client_secret` are used to obtain a bearer token from the URI defined by `iceberg_rest_catalog_oauth2_server_uri`. In `aws_sigv4` mode, the same AWS credentials used for cloud storage (see `cloud_storage_region`, `cloud_storage_access_key`, `cloud_storage_secret_key`, and `cloud_storage_credentials_source`) are used to sign requests to AWS Glue catalog with SigV4.",
+    "description": "The authentication mode for client requests made to the Iceberg catalog. Choose from: `none`, `bearer`, `oauth2`, `aws_sigv4`, and `gcp`. In `bearer` mode, the token specified in `iceberg_rest_catalog_token` is used unconditionally, and no attempts are made to refresh the token. In `oauth2` mode, the credentials specified in `iceberg_rest_catalog_client_id` and `iceberg_rest_catalog_client_secret` are used to obtain a bearer token from the URI defined by `iceberg_rest_catalog_oauth2_server_uri`. In `aws_sigv4` mode, the same AWS credentials used for cloud storage (see `cloud_storage_region`, `cloud_storage_access_key`, `cloud_storage_secret_key`, and `cloud_storage_credentials_source`) are used to sign requests to AWS Glue catalog with SigV4. In `gcp` mode, GCP VM instance metadata credentials are used to authenticate with the Iceberg REST catalog.",
     "config_scope": "cluster"
   },
   "iceberg_rest_catalog_aws_access_key": {
diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
index 089097e22c..a7e667dfba 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -200,7 +200,7 @@
 *** xref:manage:iceberg/rest-catalog/index.adoc[Integrate with REST Catalogs]
 **** xref:manage:iceberg/iceberg-topics-aws-glue.adoc[AWS Glue]
 **** xref:manage:iceberg/iceberg-topics-databricks-unity.adoc[Databricks Unity Catalog]
-**** xref:manage:iceberg/iceberg-topics-gcp-biglake.adoc[GCP BigLake]
+**** xref:manage:iceberg/iceberg-topics-gcp-biglake.adoc[GCP Lakehouse]
 **** xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[Snowflake and Open Catalog]
 *** xref:manage:iceberg/query-iceberg-topics.adoc[Query Iceberg Topics]
 *** xref:manage:iceberg/migrate-iceberg-catalog.adoc[Migrate Iceberg Catalogs]
diff --git a/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc b/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
index 898c1ddf80..c86f07ac0b 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
@@ -76,7 +76,7 @@ When `iceberg_delete` or the topic override `redpanda.iceberg.delete` is set to
 ifdef::env-cloud[]
 For BYOC clusters created in March 2026 or later, the required AWS Glue IAM policy is automatically provisioned and attached to the cluster's IAM role when Iceberg is enabled. You don't need to manually create IAM policies or roles for Glue access.

-For clusters created before March 2026, you must re-run `rpk byoc apply` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the cluster's IAM role with the necessary Glue permissions.
+For clusters created before March 2026, you must re-run `rpk cloud byoc aws apply --redpanda-id=` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the cluster's IAM role with the necessary Glue permissions.
 endif::[]

 ifndef::env-cloud[]
diff --git a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
index 553d37b820..e47849a573 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
@@ -1,15 +1,25 @@
-= Use Iceberg Topics with GCP BigLake
-:description: Add Redpanda topics as Iceberg tables to your Google BigLake data lakehouse that you can query from Google BigQuery.
+= Use Iceberg Topics with GCP Lakehouse
+:description: Add Redpanda topics as Iceberg tables to Google Lakehouse for Apache Iceberg that you can query from Google BigQuery.
 :page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
+
 [NOTE]
 ====
 include::shared:partial$enterprise-license.adoc[]
 ====

 // tag::single-source[]

+:page-topic-type: how-to
+:personas: developer, sre
+:learning-objective-1: Create a catalog in GCP Lakehouse for Iceberg topic data.
+:learning-objective-2: Configure a Redpanda cluster to use GCP Lakehouse as an Iceberg REST catalog.
+:learning-objective-3: Query Iceberg topic data from Google BigQuery.
+
+ifdef::env-cloud[]
+:rpk_install_doc: manage:rpk/rpk-install.adoc
+endif::[]
+
 ifndef::env-cloud[]
-:rp_version: 25.3
 :rpk_install_doc: get-started:rpk-install.adoc
 endif::[]

@@ -20,59 +30,82 @@ This guide is for integrating Iceberg topics with a managed REST catalog. Integr
 ifndef::env-cloud[The blog post uses a Redpanda Cloud cluster, but you follow the same steps for a Self-Managed cluster.]
 ====

-This guide walks you through querying Redpanda topics as Iceberg tables stored in Google Cloud Storage, using a REST catalog integration with https://cloud.google.com/biglake/docs/introduction[Google BigLake^]. In this guide, you do the following:
+This guide walks you through querying Redpanda topics as Iceberg tables stored in Google Cloud Storage, using a REST catalog integration with https://docs.cloud.google.com/lakehouse/docs/introduction[Google Lakehouse for Apache Iceberg^] (formerly BigLake).

-- Create Google Cloud resources such as a storage bucket and service account
-- Grant permissions to the service account to access Iceberg data in the bucket
-- Create a catalog in BigLake
-- Configure the BigLake integration for your Redpanda cluster
-- Query the Iceberg data in Google BigQuery
+After completing this guide, you will be able to:

-This guide also includes optional steps to deploy a Redpanda quickstart cluster on a GCP VM instance using Docker Compose, which you can use to quickly test the BigLake Iceberg integration.
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+ifndef::env-cloud[]
+This guide also includes optional steps to deploy a Redpanda quickstart cluster on a GCP VM instance using Docker Compose, which you can use to quickly test the Lakehouse Iceberg integration.
+endif::[]

 For general information about Iceberg catalog integrations in Redpanda, see xref:manage:iceberg/use-iceberg-catalogs.adoc[].

-NOTE: Check the https://cloud.google.com/biglake[BigLake product page^] for the latest status and availability of the REST Catalog API.
+NOTE: Check the https://docs.cloud.google.com/lakehouse/docs[Lakehouse product page^] for the latest status and availability of the REST Catalog API.

 == Prerequisites

 * A Google Cloud Platform (GCP) project.
+ifdef::env-cloud[]
++
+** Lakehouse must be in the same GCP project as the cluster. Cross-project Lakehouse is not supported.
+endif::[]
 +
 If you do not have permissions to manage GCP resources such as VMs, storage buckets, and service accounts in your project, ask your project owner to create or update them for you.
 * The https://docs.cloud.google.com/sdk/docs/install[`gcloud` CLI^] installed and configured for your GCP project.
-* https://cloud.google.com/biglake/docs/enable-biglake-api[BigLake API^] enabled for your GCP project.
-* Redpanda v{full-version} or later. Your Redpanda cluster must be deployed on GCP VMs.
+* https://cloud.google.com/biglake/docs/enable-biglake-api[Lakehouse (BigLake) API^] enabled for your GCP project.
+* Redpanda version 25.3 or later.
+ifndef::env-cloud[]
+Your Redpanda cluster must be deployed on GCP VMs.
+endif::[]
 * `rpk` xref:{rpk_install_doc}[installed or updated] to the latest version.
+ifdef::env-cloud[]
+** You can also use the Redpanda Cloud API to xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[reference secrets in your cluster configuration].
+endif::[]
 ifndef::env-cloud[]
 * xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
 +
-You also use the GCS bucket URI to set the warehouse location for the BigLake catalog.
+You also use the GCS bucket URI to set the warehouse location for the Lakehouse catalog.
+endif::[]
+
+ifdef::env-cloud[]
+NOTE: For BYOC clusters created before June 9, 2026, you must re-run `rpk cloud byoc gcp apply --redpanda-id= --project-id=` to enable the required API services before following this guide. This is a one-time operation.
 endif::[]

 == Limitations

 === Multi-region bucket support

-BigLake metastore does not support multi-region buckets. Use single-region buckets to store your Iceberg topics.
+The Lakehouse runtime catalog does not support multi-region buckets. Use single-region buckets to store your Iceberg topics.

 === Catalog deletion

-Currently, it is not possible to delete non-empty BigLake Iceberg catalogs through the BigLake interface. If you need to reconfigure your setup, create a new bucket or use the REST API to remove the existing catalog.
+Currently, it is not possible to delete non-empty Lakehouse Iceberg catalogs through the Lakehouse interface. If you need to reconfigure your setup, create a new bucket or use the REST API to remove the existing catalog.

 === Topic names

-BigLake does not support Iceberg table names that contain dots (`.`). When creating Iceberg topics in Redpanda that you plan to access through BigLake, either:
+Lakehouse does not support Iceberg table names that contain dots (`.`). When creating Iceberg topics in Redpanda that you plan to access through Lakehouse, either:

 - Use the `iceberg_topic_name_dot_replacement` cluster property to set a replacement string for dots in topic names. Ensure that the replacement value does not cause table name collisions. For example, `current.orders` and `current_orders` would both map to the same table name if you set the replacement to an underscore (`_`).
 - Ensure that the new topic names do not include dots.

-You must also set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). See <> for the list of cluster properties to set when enabling the BigLake REST catalog integration.
+You must also set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). See <> for the list of cluster properties to set when enabling the Lakehouse REST catalog integration.

 == Set up Google Cloud resources

+ifdef::env-cloud[]
+For BYOC clusters, the required Lakehouse IAM permissions are automatically provisioned and attached to the cluster's service account when Iceberg is enabled with a Lakehouse endpoint. You can skip to <>.
+
+For BYOVPC clusters, you must grant the required permissions to your cluster's service account and enable the `biglake.googleapis.com` and `bigquery.googleapis.com` APIs in your GCP project.
+endif::[]
+
+ifndef::env-cloud[]
 === Create a service account for Redpanda

-If you don't already have a Google Cloud service account to use, create a new service account that will be used by the VMs running Redpanda. Redpanda uses this account for writing data to Tiered Storage, Iceberg data and metadata, and for interacting with the BigLake catalog:
+If you don't already have a Google Cloud service account to use, create a new service account that will be used by the VMs running Redpanda. Redpanda uses this account for writing data to Tiered Storage, Iceberg data and metadata, and for interacting with the Lakehouse catalog:

 [,bash]
 ----
@@ -81,8 +114,9 @@ gcloud iam service-accounts create --display-name "`: You can use a https://docs.cloud.google.com/iam/docs/service-accounts-create[name^] that contains lowercase alphanumeric characters and dashes.
+* ``: You can use a https://docs.cloud.google.com/iam/docs/service-accounts-create[name^] that contains lowercase alphanumeric characters and dashes.
 * ``: Enter a display name for the service account.
+endif::[]

 === Grant required permissions

@@ -113,15 +147,13 @@ gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
   --role="roles/biglake.editor"
 ----

-=== Create a BigLake catalog
+=== Create a Lakehouse catalog

-Create a BigLake Iceberg REST catalog using the `gcloud` CLI:
-
-NOTE: This command is currently pre-GA and may change. Check the https://docs.cloud.google.com/sdk/gcloud/reference/alpha/biglake/iceberg/catalogs/create[gcloud reference^] for the latest information.
+Create a Lakehouse Iceberg REST catalog using the https://docs.cloud.google.com/sdk/gcloud/reference/biglake/iceberg/catalogs/create[`gcloud biglake`^] command:

 [,bash]
 ----
-gcloud alpha biglake iceberg catalogs create --catalog-type=gcs-bucket --project=
+gcloud biglake iceberg catalogs create --catalog-type=gcs-bucket --project=
 ----

 Replace the placeholder values:
@@ -129,9 +161,10 @@ Replace the placeholder values:
 * ``: Use the name of your storage bucket as the catalog ID.
 * ``: Your GCP project ID.

+ifndef::env-cloud[]
 == Optional: Deploy Redpanda quickstart on GCP

-If you want to quickly test Iceberg topics in BigLake, you can deploy a test environment using the Redpanda Streaming quickstart. In this section, you create a new storage bucket for Tiered Storage and Iceberg data. You configure a Redpanda cluster for the BigLake catalog integration and deploy the cluster on a GCP Linux VM instance using Docker Compose.
+If you want to quickly test Iceberg topics in Lakehouse, you can deploy a test environment using the Redpanda Streaming quickstart. You create a new storage bucket for Tiered Storage and Iceberg data, configure a Redpanda cluster for the Lakehouse catalog integration, and deploy the cluster on a GCP Linux VM instance using Docker Compose.

 NOTE: If you already have a Redpanda cluster deployed on GCP, skip to <>.
@@ -233,7 +266,7 @@ cloud_storage_disable_tls: false
 cloud_storage_bucket:
 cloud_storage_credentials_source: gcp_instance_metadata

-# Configure Iceberg REST catalog integration with BigLake
+# Configure Iceberg REST catalog integration with Lakehouse
 iceberg_enabled: true
 iceberg_catalog_type: rest
 iceberg_rest_catalog_endpoint: https://biglake.googleapis.com/iceberg/v1/restcatalog
@@ -271,6 +304,7 @@ curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-

 rpk profile create quickstart --from-profile rpk-profile.yaml
 ----
+endif::[]

 == Configure Redpanda for Iceberg

@@ -295,12 +329,40 @@ iceberg_dlq_table_suffix: _dlq
 * Replace `` with your bucket name and `` with your Google Cloud project ID.
 * You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
 --
+endif::[]
+ifdef::env-cloud[]
++
+Use `rpk` as shown in the following example, or xref:manage:cluster-maintenance/config-cluster.adoc#set-cluster-configuration-properties[use the Cloud API] to update these cluster properties. The update might take several minutes to complete.
++
+[,bash]
+----
+rpk cloud login
+
+rpk profile create --from-cloud
+
+rpk cluster config set \
+  iceberg_enabled=true \
+  iceberg_catalog_type=rest \
+  iceberg_rest_catalog_endpoint=https://biglake.googleapis.com/iceberg/v1/restcatalog \
+  iceberg_rest_catalog_authentication_mode=gcp \
+  iceberg_rest_catalog_warehouse=gs:/// \
+  iceberg_rest_catalog_gcp_user_project= \
+  iceberg_dlq_table_suffix=_dlq
+----
++
+--
+* ``: Your Redpanda cluster ID.
+* ``: For BYOC clusters, the bucket name is `redpanda-cloud-storage-`. For BYOVPC clusters, use the name of the object storage bucket you created as a customer-managed resource.
+* ``: Your GCP project ID.
+* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
+--
+endif::[]

 ifndef::env-cloud[]
 . If you change the configuration for a running cluster, you must restart that cluster now.
 endif::[]

-. Enable the REST catalog integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following examples show how to use xref:get-started:rpk-install.adoc[`rpk`] to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to `key_value`. The `key_value` mode creates a two-column Iceberg table for the topic, with one column for the record metadata including the key, and another binary column for the record's value. See xref:manage:iceberg/choose-iceberg-mode.adoc[] for more details on Iceberg modes.
+. Enable the REST catalog integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following examples show how to use xref:get-started:rpk-install.adoc[`rpk`] to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to `key_value`. The `key_value` mode creates a two-column Iceberg table for the topic, with one column for the record metadata including the key, and another binary column for the record's value. See xref:manage:iceberg/specify-iceberg-schema.adoc[] for more details on Iceberg modes.
 +
 .Create a new topic and set `redpanda.iceberg.mode`:
 [,bash]
@@ -312,8 +374,9 @@ rpk topic create --topic-config=redpanda.iceberg.mode=key_value
 [,bash]
 ----
 rpk topic alter-config --set redpanda.iceberg.mode=key_value
-----
+----
 +
+ifndef::env-cloud[]
 [NOTE]
 ====
 If you're using the Self-managed quickstart for testing, your Redpanda cluster includes a `transactions` topic with data in it, and a sample schema in the Schema Registry. To enable Iceberg for the `transactions` topic, run:
@@ -323,8 +386,9 @@ If you're using the Self-managed quickstart for testing, your Redpanda cluster i
 rpk topic alter-config transactions --set redpanda.iceberg.mode=value_schema_latest:subject=transactions
 ----
 ====
+endif::[]

-It may take a few moments for the Iceberg data to become available in BigLake.
+Iceberg data can take a few moments to become available in Lakehouse.

 == Query Iceberg topics in BigQuery

@@ -344,8 +408,9 @@ LIMIT 10
 +
 Replace `` with your bucket name.

-Your Redpanda topic is now available as Iceberg tables in BigLake, allowing you to run analytics queries directly on your streaming data.
+Your Redpanda topic is now available as Iceberg tables in Lakehouse, allowing you to run analytics queries directly on your streaming data.

+ifndef::env-cloud[]
 == Optional: Clean up resources

 When you're finished with the quickstart example, you can clean up the resources you created:
@@ -362,12 +427,13 @@ gcloud storage buckets delete gs://
 gcloud iam service-accounts delete @$(gcloud config get-value project).iam.gserviceaccount.com
 ----

-NOTE: Manually delete the BigLake catalog using the https://docs.cloud.google.com/bigquery/docs/reference/biglake/rest/v1/projects.locations.catalogs/delete[REST API^].
+NOTE: Manually delete the Lakehouse catalog using the https://docs.cloud.google.com/bigquery/docs/reference/biglake/rest/v1/projects.locations.catalogs/delete[REST API^].
+endif::[]

 include::shared:partial$suggested-reading.adoc[]

 - xref:manage:iceberg/use-iceberg-catalogs.adoc[]
 - xref:manage:iceberg/query-iceberg-topics.adoc[]
-- https://cloud.google.com/biglake/docs/introduction[Google BigLake documentation^]
+- https://docs.cloud.google.com/lakehouse/docs/introduction[Google Lakehouse for Apache Iceberg documentation^]

 // end::single-source[]
diff --git a/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc b/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
index 2383ca46f7..67beecd2ad 100644
--- a/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
+++ b/modules/manage/pages/iceberg/use-iceberg-catalogs.adoc
@@ -35,6 +35,7 @@ This section provides general guidance on using REST catalogs with Redpanda. For

 * xref:manage:iceberg/iceberg-topics-aws-glue.adoc[AWS Glue Data Catalog]
 * xref:manage:iceberg/iceberg-topics-databricks-unity.adoc[Databricks Unity Catalog]
+* xref:manage:iceberg/iceberg-topics-gcp-biglake.adoc[GCP Lakehouse]
 * xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[Snowflake with Open Catalog]
 ====

@@ -55,10 +56,10 @@ The Iceberg integration for Redpanda Cloud supports multiple Iceberg catalogs ac
 The following matrix shows the current status of Iceberg integrations across different cloud providers and catalogs. Check this matrix regularly as Redpanda Cloud continues to expand GA coverage for Iceberg topics.

 |===
-| | Databricks Unity Catalog | Snowflake Open Catalog | AWS Glue Data Catalog | Google BigQuery
+| | Databricks Unity Catalog | Snowflake Open Catalog | AWS Glue Data Catalog | GCP Lakehouse

 |AWS |Supported |Beta |Beta |N/A
-|GCP |Supported |Beta |N/A |Beta
+|GCP |Supported |Beta |N/A |Supported
 |Azure |Beta |Beta |N/A |N/A
 |===
 endif::[]
@@ -76,7 +77,7 @@ The following shows the current status of Iceberg catalog integrations. Check th
 |Databricks Unity Catalog |Supported
 |Snowflake Open Catalog |Supported
 |AWS Glue Data Catalog | Beta
-|Google BigQuery |Beta
+|GCP Lakehouse |Supported
 |===
 endif::[]

diff --git a/modules/reference/partials/properties/cluster-properties.adoc b/modules/reference/partials/properties/cluster-properties.adoc
index 8da5b42d14..e9fac0b7b4 100644
--- a/modules/reference/partials/properties/cluster-properties.adoc
+++ b/modules/reference/partials/properties/cluster-properties.adoc
@@ -7968,7 +7968,7 @@ endif::[]
 // tag::redpanda-cloud[]
 === iceberg_rest_catalog_authentication_mode

-The authentication mode for client requests made to the Iceberg catalog. Choose from: `none`, `bearer`, `oauth2`, and `aws_sigv4`. In `bearer` mode, the token specified in `iceberg_rest_catalog_token` is used unconditonally, and no attempts are made to refresh the token. In `oauth2` mode, the credentials specified in `iceberg_rest_catalog_client_id` and `iceberg_rest_catalog_client_secret` are used to obtain a bearer token from the URI defined by `iceberg_rest_catalog_oauth2_server_uri`. In `aws_sigv4` mode, the same AWS credentials used for cloud storage (see `cloud_storage_region`, `cloud_storage_access_key`, `cloud_storage_secret_key`, and `cloud_storage_credentials_source`) are used to sign requests to AWS Glue catalog with SigV4.
+The authentication mode for client requests made to the Iceberg catalog. Choose from: `none`, `bearer`, `oauth2`, `aws_sigv4`, and `gcp`. In `bearer` mode, the token specified in `iceberg_rest_catalog_token` is used unconditionally, and no attempts are made to refresh the token. In `oauth2` mode, the credentials specified in `iceberg_rest_catalog_client_id` and `iceberg_rest_catalog_client_secret` are used to obtain a bearer token from the URI defined by `iceberg_rest_catalog_oauth2_server_uri`. In `aws_sigv4` mode, the same AWS credentials used for cloud storage (see `cloud_storage_region`, `cloud_storage_access_key`, `cloud_storage_secret_key`, and `cloud_storage_credentials_source`) are used to sign requests to AWS Glue catalog with SigV4. In `gcp` mode, GCP VM instance metadata credentials are used to authenticate with the Iceberg REST catalog.

 ifdef::env-cloud[]
 NOTE: This property is available only in Redpanda Cloud BYOC deployments.