Merged
2 changes: 2 additions & 0 deletions .typos.toml
@@ -6,6 +6,8 @@ MyApp = "MyApp"
OpenAPIv3 = "OpenAPIv3"
AKS = "AKS"
IST = "IST"
CREATEIN = "CREATEIN"
ALTERIN = "ALTERIN"

[files]
extend-exclude = [
50 changes: 50 additions & 0 deletions docs/source-datastore/add-datastores/amazon-s3.md
@@ -119,6 +119,56 @@ To create a policy, follow these steps:
!!! warning
Currently, object-level permissions alone are insufficient to authenticate the connection. Please ensure you also include bucket-level permissions as demonstrated in the example above.

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| `AccessDenied` | The IAM identity lacks one or more of the required S3 permissions | Add the missing permissions to the IAM policy and re-test the connection |
| `InvalidAccessKeyId` | The Access Key ID does not exist or has been deactivated | Verify the Access Key ID in **IAM > Users > Security credentials** or generate a new key pair |
| `SignatureDoesNotMatch` | The Secret Access Key is incorrect or was copied with extra whitespace | Re-enter the Secret Access Key carefully, ensuring no trailing spaces or newlines |
| `NoSuchBucket` | The bucket name in the URI does not exist | Verify the bucket name and ensure the URI follows the format `s3://bucket-name` |
| `AllAccessDisabled` | The bucket policy explicitly denies access or the bucket is in a different account | Check the bucket policy for explicit `Deny` statements and verify the bucket is in the correct AWS account |

### Detailed Troubleshooting Notes

#### Authentication Errors

An `InvalidAccessKeyId` or `SignatureDoesNotMatch` error indicates that the AWS credentials are incorrect or malformed.

Common causes:

- **Incorrect Access Key ID** — the Access Key ID was misspelled or has been deactivated in the IAM Console.
- **Incorrect Secret Access Key** — the Secret Access Key was copied with extra whitespace, a trailing newline, or was truncated.
- **Rotated credentials** — the access key pair has been rotated since the connection was created.
- **Temporary credentials** — if using STS assumed-role credentials, the session token may be missing or expired.

!!! note
The Secret Access Key is only visible once at creation time. If you cannot verify it, generate a new access key pair from the IAM Console.

#### Permission Errors

The error `AccessDenied` means the IAM identity authenticated successfully but lacks the required S3 permissions.

Common causes:

- **Missing bucket-level permissions** — the IAM policy grants object-level permissions (`s3:GetObject`) but not bucket-level permissions (`s3:ListBucket`). Both are required.
- **Bucket policy conflict** — the bucket has a resource-based policy with an explicit `Deny` that overrides the IAM policy.
- **S3 Block Public Access** — if access depends on a public grant in the bucket policy or ACL, the bucket's Block Public Access settings override that grant, even for authenticated requests.
- **Wrong resource ARN** — the IAM policy specifies a different bucket or path than the one in the connection form.
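
The requirement that both bucket-level and object-level permissions be present can be sketched as a minimal IAM policy. This is illustrative only — `your-bucket` is a placeholder, and the action list should match the full policy shown earlier on this page:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevel",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-bucket"
    },
    {
      "Sid": "ObjectLevel",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}
```

Note that the two statements target different `Resource` ARNs: bucket-level actions apply to the bucket ARN itself, while object-level actions apply to `arn:aws:s3:::your-bucket/*`.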

#### Connection Errors

A `NoSuchBucket` or `AllAccessDisabled` error indicates a configuration issue with the bucket.

Common causes:

- **Bucket does not exist** — the bucket name was misspelled or the bucket was deleted.
- **Wrong region** — the bucket is in a different AWS region than expected, causing endpoint resolution failures.
- **Bucket in different account** — the bucket belongs to a different AWS account and cross-account access is not configured.

!!! tip
Start by confirming credentials are valid (authentication errors), then verify IAM policy permissions (permission errors), and finally check the bucket name and region (connection errors).
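
The triage order in the tip above can be walked with the AWS CLI. These are diagnostic commands only, run against your own account — `your-bucket` is a placeholder:

```bash
# 1. Authentication: confirm the key pair is valid and see which identity it resolves to
aws sts get-caller-identity

# 2. Permissions: listing the bucket exercises s3:ListBucket; AccessDenied here points to a policy gap
aws s3 ls s3://your-bucket

# 3. Connection: confirm the bucket exists and find its region
aws s3api head-bucket --bucket your-bucket
aws s3api get-bucket-location --bucket your-bucket
```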

## Add a Source Datastore

A source datastore is a storage location used to connect to and access data from external sources. Amazon S3 is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.
154 changes: 154 additions & 0 deletions docs/source-datastore/add-datastores/azure-datalake-storage.md
@@ -65,6 +65,160 @@ After completing the setup, you will have the following credentials:
!!! tip
For detailed step-by-step instructions on creating a service principal in the Azure Portal, refer to the [**Microsoft documentation**](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal){:target="_blank"}.

## Azure Datalake Storage Datastore Privileges

The permissions required depend on the authentication method and whether you are using Azure Datalake Storage as a source or enrichment datastore.

### Minimum Permissions (Source Datastore)

#### Access Key Authentication

Access keys provide full read/write access to the storage account by default. No additional role assignments are needed.

#### Service Principal Authentication

The Service Principal must be assigned the following Azure RBAC role on the target container or storage account:

| Role / Permission | Purpose |
|------------------------------------------------|-------------------------------------------------------------------------|
| `Storage Blob Data Reader` | Read and list blobs (files) in the container |

Specific permissions included in this role:

| Permission | Purpose |
|------------------------------------------------|-------------------------------------------------------------------------|
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob (file) contents for profiling and scanning |
| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container to discover data assets |

### Additional Permissions for Enrichment Datastore

#### Access Key Authentication

Access keys provide full read/write access by default. No additional role assignments are needed.

#### Service Principal Authentication

For enrichment, the Service Principal must be assigned a higher-privilege role:

| Role / Permission | Purpose |
|------------------------------------------------|-------------------------------------------------------------------------|
| `Storage Blob Data Contributor` | Read, write, and delete blobs in the container |

Specific permissions included in this role:

| Permission | Purpose |
|------------------------------------------------|-------------------------------------------------------------------------|
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob contents |
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` | Write enrichment result files |
| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` | Remove temporary or outdated enrichment files |
| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container |

!!! note
If the storage account uses **hierarchical namespace** (Azure Data Lake Storage Gen2), ensure the Service Principal also has appropriate ACL permissions at the directory level if RBAC alone is not sufficient.
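
As a sketch (placeholder names, and assuming the `az storage fs access` command group available in current Azure CLI releases), a directory-level ACL for the Service Principal can be set like this — note that `<SP_OBJECT_ID>` is the service principal's object ID, not its application (client) ID:

```bash
# Grant read + execute on a directory to the service principal
# (hierarchical-namespace accounts only; `set` replaces the ACL entries for the path)
az storage fs access set \
  --acl "user:<SP_OBJECT_ID>:r-x" \
  --path path/to/data \
  --file-system <CONTAINER> \
  --account-name <STORAGE_ACCOUNT> \
  --auth-mode login
```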

### Example IAM Role Assignment

Replace `<SERVICE_PRINCIPAL_ID>`, `<SUBSCRIPTION_ID>`, `<RESOURCE_GROUP>`, `<STORAGE_ACCOUNT>`, and `<CONTAINER>` with your actual values.

#### Source Datastore (Read-Only)

```json
{
"properties": {
"roleDefinitionId": "/subscriptions/<SUBSCRIPTION_ID>/providers/Microsoft.Authorization/roleDefinitions/2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
"principalId": "<SERVICE_PRINCIPAL_ID>",
"scope": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
}
}
```

!!! note
The role definition ID `2a2b9908-6ea1-4ae2-8e65-a410df84e7d1` corresponds to the **Storage Blob Data Reader** built-in role.

#### Enrichment Datastore (Read-Write)

```json
{
"properties": {
"roleDefinitionId": "/subscriptions/<SUBSCRIPTION_ID>/providers/Microsoft.Authorization/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe",
"principalId": "<SERVICE_PRINCIPAL_ID>",
"scope": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
}
}
```

!!! note
The role definition ID `ba92f5b4-2d11-453d-a403-e96b0029c9fe` corresponds to the **Storage Blob Data Contributor** built-in role.

#### Assigning via Azure CLI

```bash
# Source Datastore (Read-Only)
az role assignment create \
--assignee <SERVICE_PRINCIPAL_ID> \
--role "Storage Blob Data Reader" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"

# Enrichment Datastore (Read-Write)
az role assignment create \
--assignee <SERVICE_PRINCIPAL_ID> \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"
```

!!! tip
You can also assign roles through the Azure Portal by navigating to the storage account or container, selecting **Access Control (IAM)**, and clicking **Add role assignment**.

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| `AuthenticationFailed` | Incorrect account name, access key, or Service Principal credentials (Client ID, Client Secret, Tenant ID) | Verify credentials in the Azure Portal — check the storage account access keys or app registration |
| `AuthorizationPermissionMismatch` | The Service Principal does not have the required RBAC role on the container or storage account | Assign `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) to the Service Principal |
| `ContainerNotFound` | The container name in the URI does not exist | Verify the container name in the Azure Portal under the storage account's **Containers** section |
| `InvalidUri` | The URI format is incorrect — it must follow `abfss://<container>@<account>.dfs.core.windows.net` | Verify the URI format matches the expected pattern |
| `This request is not authorized to perform this operation` | The Service Principal has `Storage Blob Data Reader` but the operation requires write access | Upgrade the role assignment to `Storage Blob Data Contributor` for enrichment datastores |

### Detailed Troubleshooting Notes

#### Authentication Errors

The error `AuthenticationFailed` indicates that the credentials are incorrect or the authentication method is misconfigured.

Common causes:

- **Incorrect access key** — the storage account access key was copied incorrectly or has been rotated since the connection was created.
- **Wrong account name** — the account name does not match the storage account.
- **Expired Client Secret** — when using Service Principal authentication, the Client Secret has expired.
- **Wrong Tenant ID** — the Tenant ID does not match the Microsoft Entra ID tenant where the app is registered.

!!! note
Access keys provide the simplest authentication but grant full access to the storage account. For least-privilege access, use Service Principal authentication with RBAC role assignments scoped to the specific container.

#### Permission Errors

An `AuthorizationPermissionMismatch` or `This request is not authorized to perform this operation` error means the credentials are valid but lack the required permissions.

Common causes:

- **Missing RBAC role** — the Service Principal does not have `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) assigned.
- **Role assigned at wrong scope** — the role is assigned at the subscription or resource group level but not at the container level, or vice versa.
- **ACL restrictions** — when using hierarchical namespace (Data Lake Storage Gen2), POSIX ACLs may restrict access even if RBAC roles are assigned.
- **Source vs. enrichment mismatch** — the Service Principal has `Storage Blob Data Reader` but the operation requires write access (enrichment).

#### Connection Errors

A `ContainerNotFound` or `InvalidUri` error indicates a configuration issue with the URI or container name.

Common causes:

- **Container does not exist** — the container name in the URI was misspelled or the container has not been created.
- **Invalid URI format** — the URI must follow `abfss://<container>@<account>.dfs.core.windows.net`. Missing the `abfss://` scheme or using the wrong account suffix (`.blob.core.windows.net` instead of `.dfs.core.windows.net`) will cause failures.
- **Storage account firewall** — the storage account firewall blocks connections from the Qualytics IP.

!!! tip
Start by confirming credentials are valid (authentication errors), then verify RBAC role assignments (permission errors), and finally check the URI format and container existence (connection errors).
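
The same triage can be sketched with the Azure CLI, using the placeholders from the examples above:

```bash
# 1. Authentication: sign in as the service principal to confirm the credentials work
az login --service-principal \
  --username <CLIENT_ID> \
  --password <CLIENT_SECRET> \
  --tenant <TENANT_ID>

# 2. Permissions: list the role assignments visible at the container scope
az role assignment list \
  --assignee <SERVICE_PRINCIPAL_ID> \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER>"

# 3. Connection: confirm the container exists under this account
az storage container exists \
  --name <CONTAINER> \
  --account-name <STORAGE_ACCOUNT> \
  --auth-mode login
```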

## Add a Source Datastore

A source datastore is a storage location used to connect to and access data from external sources. Azure Datalake Storage is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.
49 changes: 49 additions & 0 deletions docs/source-datastore/add-datastores/bigquery.md
@@ -90,6 +90,55 @@ Grants read and write access for data editing and management:
| 3. | `roles/bigquery.jobUser` | Enables running of jobs such as queries and data loading. |
| 4. | `roles/bigquery.readSessionUser` | Facilitates the creation of read sessions for efficient data retrieval. |

### Troubleshooting Common Errors

| Error | Likely Cause | Fix |
|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| `Access Denied: 403` | The service account lacks the required BigQuery roles | Assign `bigquery.dataViewer` and `bigquery.jobUser` roles to the service account |
| `Not found: Dataset` | The Dataset ID in the connection form does not exist or the service account cannot see it | Verify the Dataset ID in the BigQuery Console and ensure the service account has `bigquery.dataViewer` on the dataset |
| `Not found: Project` | The Project ID is incorrect or the service account does not belong to the project | Verify the Project ID in the Google Cloud Console |
| `The caller does not have bigquery.jobs.create permission` | The service account lacks the `bigquery.jobUser` role | Assign `roles/bigquery.jobUser` to the service account at the project level |
| `Invalid service account key` | The JSON key file is malformed, truncated, or belongs to a different project | Re-download the service account key from **IAM & Admin > Service Accounts** in the Google Cloud Console |

### Detailed Troubleshooting Notes

#### Authentication Errors

The error `Invalid service account key` indicates that the JSON key file used for authentication is incorrect or corrupted.

Common causes:

- **Malformed JSON** — the key file was modified or truncated after download.
- **Wrong project** — the service account key belongs to a different Google Cloud project than the one specified in the connection form.
- **Key deleted or disabled** — the key was deleted from the service account in the Google Cloud Console.

!!! note
Service account keys do not expire, but they can be deleted or disabled by project administrators. If the key stops working, verify its status in **IAM & Admin > Service Accounts > Keys**.

#### Permission Errors

An `Access Denied: 403` or `The caller does not have bigquery.jobs.create permission` error means the service account authenticated successfully but lacks the required roles.

Common causes:

- **Missing `bigquery.jobUser`** — the service account cannot run queries without this role. It must be assigned at the project level.
- **Missing `bigquery.dataViewer`** — the service account cannot read dataset or table metadata.
- **Missing `bigquery.readSessionUser`** — the service account cannot create read sessions for efficient data retrieval via the Storage API.
- **Dataset-level vs. project-level roles** — a role granted on a dataset does not cover operations that need project-level access; `bigquery.jobUser`, in particular, must be granted at the project level.
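
Granting the missing roles at the project level can be sketched with the `gcloud` CLI — `<PROJECT_ID>` and the service account email are placeholders for your own values:

```bash
# Grant query-execution rights (must be project-level)
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member="serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Grant read access to dataset and table metadata and contents
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member="serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Allow Storage API read sessions for efficient retrieval
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member="serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role="roles/bigquery.readSessionUser"
```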

#### Connection Errors

A `Not found: Dataset` or `Not found: Project` error indicates a configuration issue with the Project ID or Dataset ID.

Common causes:

- **Wrong Project ID** — the Project ID does not match the Google Cloud project.
- **Wrong Dataset ID** — the Dataset ID was misspelled or does not exist in the specified project.
- **Regional mismatch** — the temporary dataset is in a different region than the source dataset, causing cross-region query failures.

!!! tip
Start by confirming the service account key is valid (authentication errors), then verify BigQuery roles (permission errors), and finally check the Project ID and Dataset ID (connection errors).
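
As a sketch, the triage above maps onto the `gcloud` and `bq` CLIs — `key.json`, `<PROJECT_ID>`, and `<DATASET_ID>` are placeholders:

```bash
# 1. Authentication: activate the key file; a failure here points to a bad or revoked key
gcloud auth activate-service-account --key-file=key.json

# 2. Permissions / project: list datasets — "Access Denied" suggests missing roles,
#    "Not found" suggests a wrong Project ID
bq ls --project_id=<PROJECT_ID>

# 3. Dataset: confirm the specific dataset is visible
bq show --project_id=<PROJECT_ID> <DATASET_ID>
```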

## Add a Source Datastore

A source datastore is a storage location used to connect to and access data from external sources. BigQuery is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.