Commit b6b2e1f

QUA-1632: Add required permissions for each connector on the connections page (#1086)
* docs(connectors): add required permissions for each connector on the connections page

  Add detailed Setup Guide sections with minimum permissions, SQL grant examples, and troubleshooting tables for 18 connectors: PostgreSQL, MySQL, MariaDB, TimescaleDB, Microsoft SQL Server, Synapse, Oracle, DB2, Redshift, Databricks, Teradata, Hive, Presto, Trino, Dremio, Fabric Analytics, Azure Datalake Storage, and Google Cloud Storage. Each connector now documents:

  - Minimum permissions for source datastore (read-only)
  - Additional permissions for enrichment datastore (read-write) where supported
  - Ready-to-use SQL scripts or IAM policies
  - Troubleshooting common errors table

* docs(connectors): add detailed troubleshooting notes and example scripts for all connectors

  Standardize all 18 connector permission sections to match the Athena documentation pattern. Each connector now includes:

  - Example scripts: added ready-to-copy code blocks for Presto (rules.json), Trino (rules.json), Dremio (SQL GRANT), Fabric Analytics (Azure CLI), Azure Data Lake Storage (az role assignment), and Google Cloud Storage (gsutil iam ch)
  - Detailed Troubleshooting Notes: added subsections for Authentication Errors, Permission Errors, and Connection Errors with bullet-point common causes and debugging tips for all 18 connectors
  - Admonitions: added missing notes for DB2 (SYSCAT system catalogs) and Trino (connector-level security)

* docs(connectors): add troubleshooting to BigQuery/Snowflake/S3, fix enrichment permissions, and document missing connection properties

  Add Troubleshooting Common Errors and Detailed Troubleshooting Notes sections to BigQuery, Snowflake, and Amazon S3 to match the Athena documentation pattern. Fix enrichment permissions tables with missing DROP TABLE (PostgreSQL, SQL Server, Synapse), ALTER TABLE + DROP TABLE (Redshift, Trino), and DROPIN (DB2) based on actual dataplane write operations.

  Document missing connection properties found in controlplane code:

  - Databricks: OAuth M2M authentication (Service Principal + OAuth Secret)
  - Oracle: TCP/TCPS protocol selector
  - Hive: ZooKeeper HA toggle
  - DB2: SSL toggle
  - Teradata: SELECT ON DBC.DatabasesV permission for catalog discovery
  - PostgreSQL: track_commit_timestamp config for incremental profiling

  Add Example IAM Policy JSON sections to Azure Data Lake Storage and Google Cloud Storage with ready-to-copy role assignments.

* docs(connectors): add missing PROCESS privilege to MySQL and MariaDB GRANT examples

  Add `GRANT PROCESS ON *.*` to both source and enrichment SQL examples in MySQL and MariaDB. PROCESS is a global-level privilege that cannot be granted at database scope — it was listed in the permissions table but missing from the copy-paste examples, leaving users with an incomplete setup.
1 parent 94b0c59 commit b6b2e1f

22 files changed

Lines changed: 2059 additions & 2 deletions

.typos.toml

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ MyApp = "MyApp"
OpenAPIv3 = "OpenAPIv3"
AKS = "AKS"
IST = "IST"
CREATEIN = "CREATEIN"
ALTERIN = "ALTERIN"

[files]
extend-exclude = [

docs/source-datastore/add-datastores/amazon-s3.md

Lines changed: 50 additions & 0 deletions
@@ -119,6 +119,56 @@ To create a policy, follow these steps:
!!! warning
    Currently, object-level permissions alone are insufficient to authenticate the connection. Please ensure you also include bucket-level permissions as demonstrated in the example above.

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|-------|--------------|-----|
| `AccessDenied` | The IAM identity lacks one or more of the required S3 permissions | Add the missing permissions to the IAM policy and re-test the connection |
| `InvalidAccessKeyId` | The Access Key ID does not exist or has been deactivated | Verify the Access Key ID in **IAM > Users > Security credentials** or generate a new key pair |
| `SignatureDoesNotMatch` | The Secret Access Key is incorrect or was copied with extra whitespace | Re-enter the Secret Access Key carefully, ensuring no trailing spaces or newlines |
| `NoSuchBucket` | The bucket name in the URI does not exist | Verify the bucket name and ensure the URI follows the format `s3://bucket-name` |
| `AllAccessDisabled` | The bucket policy explicitly denies access or the bucket is in a different account | Check the bucket policy for explicit `Deny` statements and verify the bucket is in the correct AWS account |

### Detailed Troubleshooting Notes

#### Authentication Errors

The error `InvalidAccessKeyId` or `SignatureDoesNotMatch` indicates that the AWS credentials are incorrect or malformed.

Common causes:

- **Incorrect Access Key ID** — the Access Key ID was misspelled or has been deactivated in the IAM Console.
- **Incorrect Secret Access Key** — the Secret Access Key was copied with extra whitespace, a trailing newline, or was truncated.
- **Rotated credentials** — the access key pair has been rotated since the connection was created.
- **Temporary credentials** — if using STS assumed-role credentials, the session token may be missing or expired.

!!! note
    The Secret Access Key is only visible once at creation time. If you cannot verify it, generate a new access key pair from the IAM Console.
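The whitespace and truncation issues above can be caught before re-entering credentials. A small illustrative sketch, not part of the product; the 40-character check reflects the typical length of an AWS secret access key:

```python
def check_aws_credentials(access_key_id: str, secret_access_key: str) -> list[str]:
    """Heuristic sanity checks for pasted AWS credentials (illustrative only)."""
    problems = []
    if access_key_id != access_key_id.strip():
        problems.append("Access Key ID has surrounding whitespace")
    if secret_access_key != secret_access_key.strip():
        problems.append("Secret Access Key has surrounding whitespace or a trailing newline")
    if len(secret_access_key.strip()) != 40:
        problems.append("Secret Access Key is not 40 characters; it may be truncated")
    return problems
```

If these checks pass but the error persists, the key pair has most likely been rotated or deactivated, so generate a new one.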
#### Permission Errors

The error `AccessDenied` means the IAM identity authenticated successfully but lacks the required S3 permissions.

Common causes:

- **Missing bucket-level permissions** — the IAM policy grants object-level permissions (`s3:GetObject`) but not bucket-level permissions (`s3:ListBucket`). Both are required.
- **Bucket policy conflict** — the bucket has a resource-based policy with an explicit `Deny` that overrides the IAM policy.
- **S3 Block Public Access** — the bucket's public access settings may block access even for authenticated IAM users if the policy references public access.
- **Wrong resource ARN** — the IAM policy specifies a different bucket or path than the one in the connection form.
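To illustrate the bucket-level vs. object-level split, a minimal read-only policy needs one statement for each scope. This is a sketch only: `my-bucket` is a placeholder, and the connector may require additional actions beyond those shown here.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevelAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Sid": "ObjectLevelAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

Note the resource ARNs: `arn:aws:s3:::my-bucket` (no trailing path) for bucket-level actions, and `arn:aws:s3:::my-bucket/*` for object-level actions. Mixing these up is the "wrong resource ARN" cause listed above.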
#### Connection Errors

The error `NoSuchBucket` or `AllAccessDisabled` indicates a configuration issue with the bucket.

Common causes:

- **Bucket does not exist** — the bucket name was misspelled or the bucket was deleted.
- **Wrong region** — the bucket is in a different AWS region than expected, causing endpoint resolution failures.
- **Bucket in different account** — the bucket belongs to a different AWS account and cross-account access is not configured.

!!! tip
    Start by confirming credentials are valid (authentication errors), then verify IAM policy permissions (permission errors), and finally check the bucket name and region (connection errors).

## Add a Source Datastore

A source datastore is a storage location used to connect and access data from external sources. Amazon S3 is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.

docs/source-datastore/add-datastores/azure-datalake-storage.md

Lines changed: 154 additions & 0 deletions
@@ -65,6 +65,160 @@ After completing the setup, you will have the following credentials:
!!! tip
    For detailed step-by-step instructions on creating a service principal in the Azure Portal, refer to the [**Microsoft documentation**](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal){:target="_blank"}.

## Datastore Azure Datalake Storage Privileges

The permissions required depend on the authentication method and whether you are using Azure Datalake Storage as a source or enrichment datastore.

### Minimum Permissions (Source Datastore)

#### Access Key Authentication

Access keys provide full read/write access to the storage account by default. No additional role assignments are needed.

#### Service Principal Authentication

The Service Principal must be assigned the following Azure RBAC role on the target container or storage account:

| Role / Permission | Purpose |
|-------------------|---------|
| `Storage Blob Data Reader` | Read and list blobs (files) in the container |

Specific permissions included in this role:

| Permission | Purpose |
|------------|---------|
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob (file) contents for profiling and scanning |
| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container to discover data assets |

### Additional Permissions for Enrichment Datastore

#### Access Key Authentication

Access keys provide full read/write access by default. No additional role assignments are needed.

#### Service Principal Authentication

For enrichment, the Service Principal must be assigned a higher-privilege role:

| Role / Permission | Purpose |
|-------------------|---------|
| `Storage Blob Data Contributor` | Read, write, and delete blobs in the container |

Specific permissions included in this role:

| Permission | Purpose |
|------------|---------|
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob contents |
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` | Write enrichment result files |
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` | Remove temporary or outdated enrichment files |
| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container |

!!! note
    If the storage account uses **hierarchical namespace** (Azure Data Lake Storage Gen2) and RBAC alone is not sufficient, also grant the Service Principal the appropriate ACL permissions at the directory level.
### Example IAM Role Assignment

Replace `<SERVICE_PRINCIPAL_ID>`, `<SUBSCRIPTION_ID>`, `<RESOURCE_GROUP>`, `<STORAGE_ACCOUNT>`, and `<CONTAINER>` with your actual values.

#### Source Datastore (Read-Only)

```json
{
  "properties": {
    "roleDefinitionId": "/subscriptions/<SUBSCRIPTION_ID>/providers/Microsoft.Authorization/roleDefinitions/2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
    "principalId": "<SERVICE_PRINCIPAL_ID>",
    "scope": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
  }
}
```

!!! note
    The role definition ID `2a2b9908-6ea1-4ae2-8e65-a410df84e7d1` corresponds to the **Storage Blob Data Reader** built-in role.

#### Enrichment Datastore (Read-Write)

```json
{
  "properties": {
    "roleDefinitionId": "/subscriptions/<SUBSCRIPTION_ID>/providers/Microsoft.Authorization/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe",
    "principalId": "<SERVICE_PRINCIPAL_ID>",
    "scope": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
  }
}
```

!!! note
    The role definition ID `ba92f5b4-2d11-453d-a403-e96b0029c9fe` corresponds to the **Storage Blob Data Contributor** built-in role.

#### Assigning via Azure CLI

```bash
# Source Datastore (Read-Only)
az role assignment create \
  --assignee <SERVICE_PRINCIPAL_ID> \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"

# Enrichment Datastore (Read-Write)
az role assignment create \
  --assignee <SERVICE_PRINCIPAL_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
```
!!! tip
    You can also assign roles through the Azure Portal by navigating to the storage account or container, selecting **Access Control (IAM)**, and clicking **Add role assignment**.

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|-------|--------------|-----|
| `AuthenticationFailed` | Incorrect account name, access key, or Service Principal credentials (Client ID, Client Secret, Tenant ID) | Verify credentials in the Azure Portal — check the storage account access keys or app registration |
| `AuthorizationPermissionMismatch` | The Service Principal does not have the required RBAC role on the container or storage account | Assign `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) to the Service Principal |
| `ContainerNotFound` | The container name in the URI does not exist | Verify the container name in the Azure Portal under the storage account's **Containers** section |
| `InvalidUri` | The URI format is incorrect — it must follow `abfss://<container>@<account>.dfs.core.windows.net` | Verify the URI format matches the expected pattern |
| `This request is not authorized to perform this operation` | The Service Principal has `Storage Blob Data Reader` but the operation requires write access | Upgrade the role assignment to `Storage Blob Data Contributor` for enrichment datastores |

### Detailed Troubleshooting Notes

#### Authentication Errors

The error `AuthenticationFailed` indicates that the credentials are incorrect or the authentication method is misconfigured.

Common causes:

- **Incorrect access key** — the storage account access key was copied incorrectly or has been rotated since the connection was created.
- **Wrong account name** — the account name does not match the storage account.
- **Expired Client Secret** — when using Service Principal authentication, the Client Secret has expired.
- **Wrong Tenant ID** — the Tenant ID does not match the Microsoft Entra ID tenant where the app is registered.

!!! note
    Access keys provide the simplest authentication but grant full access to the storage account. For least-privilege access, use Service Principal authentication with RBAC role assignments scoped to the specific container.

#### Permission Errors

The error `AuthorizationPermissionMismatch` or `This request is not authorized to perform this operation` means the credentials are valid but lack the required permissions.

Common causes:

- **Missing RBAC role** — the Service Principal does not have `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) assigned.
- **Role assigned at wrong scope** — the role is assigned at the subscription or resource group level but not at the container level, or vice versa.
- **ACL restrictions** — when using hierarchical namespace (Data Lake Storage Gen2), POSIX ACLs may restrict access even if RBAC roles are assigned.
- **Source vs. enrichment mismatch** — the Service Principal has `Storage Blob Data Reader` but the operation requires write access (enrichment).

#### Connection Errors

The error `ContainerNotFound` or `InvalidUri` indicates a configuration issue with the URI or container name.

Common causes:

- **Container does not exist** — the container name in the URI was misspelled or the container has not been created.
- **Invalid URI format** — the URI must follow `abfss://<container>@<account>.dfs.core.windows.net`. Missing the `abfss://` scheme or using the wrong account suffix (`.blob.core.windows.net` instead of `.dfs.core.windows.net`) will cause failures.
- **Storage account firewall** — the storage account firewall blocks connections from the Qualytics IP.
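The URI rule above can be checked mechanically. A hypothetical validator, where the regex encodes only the `abfss://<container>@<account>.dfs.core.windows.net` pattern described in this section, not Azure's full naming rules:

```python
import re

# Matches abfss://<container>@<account>.dfs.core.windows.net[/path]
# (illustrative only; container and account naming rules are simplified)
ABFSS_PATTERN = re.compile(
    r"^abfss://[a-z0-9][a-z0-9-]*@[a-z0-9]+\.dfs\.core\.windows\.net(/.*)?$"
)

def is_valid_adls_uri(uri: str) -> bool:
    return ABFSS_PATTERN.match(uri) is not None
```

In particular, this rejects the `.blob.core.windows.net` suffix mentioned above, which is the most common mistake when copying URLs from the Azure Portal.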
!!! tip
    Start by confirming credentials are valid (authentication errors), then verify RBAC role assignments (permission errors), and finally check the URI format and container existence (connection errors).

## Add a Source Datastore

A source datastore is a storage location used to connect and access data from external sources. Azure Datalake Storage is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.

docs/source-datastore/add-datastores/bigquery.md

Lines changed: 49 additions & 0 deletions
@@ -90,6 +90,55 @@ Grants read and write access for data editing and management:
| 3. | `roles/bigquery.jobUser` | Enables running of jobs such as queries and data loading. |
| 4. | `roles/bigquery.readSessionUser` | Facilitates the creation of read sessions for efficient data retrieval. |

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|-------|--------------|-----|
| `Access Denied: 403` | The service account lacks the required BigQuery roles | Assign `bigquery.dataViewer` and `bigquery.jobUser` roles to the service account |
| `Not found: Dataset` | The Dataset ID in the connection form does not exist or the service account cannot see it | Verify the Dataset ID in the BigQuery Console and ensure the service account has `bigquery.dataViewer` on the dataset |
| `Not found: Project` | The Project ID is incorrect or the service account does not belong to the project | Verify the Project ID in the Google Cloud Console |
| `The caller does not have bigquery.jobs.create permission` | The service account lacks the `bigquery.jobUser` role | Assign `roles/bigquery.jobUser` to the service account at the project level |
| `Invalid service account key` | The JSON key file is malformed, truncated, or belongs to a different project | Re-download the service account key from **IAM & Admin > Service Accounts** in the Google Cloud Console |

### Detailed Troubleshooting Notes

#### Authentication Errors

The error `Invalid service account key` indicates that the JSON key file used for authentication is incorrect or corrupted.

Common causes:

- **Malformed JSON** — the key file was modified or truncated after download.
- **Wrong project** — the service account key belongs to a different Google Cloud project than the one specified in the connection form.
- **Key deleted or disabled** — the key was deleted from the service account in the Google Cloud Console.

!!! note
    Service account keys do not expire, but they can be deleted or disabled by project administrators. If the key stops working, verify its status in **IAM & Admin > Service Accounts > Keys**.
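The malformed-JSON and wrong-project causes above can be checked locally before pasting the key into the connection form. An illustrative sketch: the field names follow Google's service account key JSON format, and `expected_project` is a placeholder for your Project ID.

```python
import json

# Fields present in a Google service account key file (subset)
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_key_file(path: str, expected_project: str) -> list[str]:
    """Sanity-check a downloaded service account key file (illustrative only)."""
    try:
        with open(path) as f:
            key = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        # Covers the "malformed JSON", truncated-file, and missing-file cases
        return [f"cannot parse key file: {exc}"]
    problems = []
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if key.get("type") != "service_account":
        problems.append("not a service account key")
    if key.get("project_id") != expected_project:
        problems.append("key belongs to a different project")
    return problems
```

A deleted or disabled key will pass these structural checks and still fail at connection time, so verify its status in the Google Cloud Console if the file itself looks fine.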
#### Permission Errors

The error `Access Denied: 403` or `The caller does not have bigquery.jobs.create permission` means the service account authenticated successfully but lacks the required roles.

Common causes:

- **Missing `bigquery.jobUser`** — the service account cannot run queries without this role. It must be assigned at the project level.
- **Missing `bigquery.dataViewer`** — the service account cannot read dataset or table metadata.
- **Missing `bigquery.readSessionUser`** — the service account cannot create read sessions for efficient data retrieval via the Storage API.
- **Dataset-level vs. project-level** — some roles are assigned at the dataset level but the operation requires project-level access (e.g., `bigquery.jobUser`).

#### Connection Errors

The error `Not found: Dataset` or `Not found: Project` indicates a configuration issue with the Project ID or Dataset ID.

Common causes:

- **Wrong Project ID** — the Project ID does not match the Google Cloud project.
- **Wrong Dataset ID** — the Dataset ID was misspelled or does not exist in the specified project.
- **Regional mismatch** — the temporary dataset is in a different region than the source dataset, causing cross-region query failures.
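Misspelled IDs can often be caught before testing the connection. A hypothetical format check; the patterns follow Google Cloud's published naming rules for project and dataset IDs, and nothing here is Qualytics-specific:

```python
import re

# Project IDs: 6-30 characters, lowercase letters, digits, and hyphens,
# starting with a letter and not ending with a hyphen (per Google Cloud docs).
PROJECT_ID = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
# Dataset IDs: letters, digits, and underscores.
DATASET_ID = re.compile(r"^[A-Za-z0-9_]{1,1024}$")

def check_ids(project_id: str, dataset_id: str) -> list[str]:
    problems = []
    if not PROJECT_ID.match(project_id):
        problems.append(f"project ID {project_id!r} does not match the expected format")
    if not DATASET_ID.match(dataset_id):
        problems.append(f"dataset ID {dataset_id!r} does not match the expected format")
    return problems
```

This catches misspellings only; regional mismatches must be verified in the BigQuery Console.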
!!! tip
    Start by confirming the service account key is valid (authentication errors), then verify BigQuery roles (permission errors), and finally check the Project ID and Dataset ID (connection errors).

## Add a Source Datastore

A source datastore is a storage location used to connect to and access data from external sources. BigQuery is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.
