diff --git a/.typos.toml b/.typos.toml index 66767c5e76..99ace14d05 100644 --- a/.typos.toml +++ b/.typos.toml @@ -6,6 +6,8 @@ MyApp = "MyApp" OpenAPIv3 = "OpenAPIv3" AKS = "AKS" IST = "IST" +CREATEIN = "CREATEIN" +ALTERIN = "ALTERIN" [files] extend-exclude = [ diff --git a/docs/source-datastore/add-datastores/amazon-s3.md b/docs/source-datastore/add-datastores/amazon-s3.md index 6c631bb53e..b6d094e9a4 100644 --- a/docs/source-datastore/add-datastores/amazon-s3.md +++ b/docs/source-datastore/add-datastores/amazon-s3.md @@ -119,6 +119,56 @@ To create a policy, follow these steps: !!! warning Currently, object-level permissions alone are insufficient to authenticate the connection. Please ensure you also include bucket-level permissions as demonstrated in the example above. +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `AccessDenied` | The IAM identity lacks one or more of the required S3 permissions | Add the missing permissions to the IAM policy and re-test the connection | +| `InvalidAccessKeyId` | The Access Key ID does not exist or has been deactivated | Verify the Access Key ID in **IAM > Users > Security credentials** or generate a new key pair | +| `SignatureDoesNotMatch` | The Secret Access Key is incorrect or was copied with extra whitespace | Re-enter the Secret Access Key carefully, ensuring no trailing spaces or newlines | +| `NoSuchBucket` | The bucket name in the URI does not exist | Verify the bucket name and ensure the URI follows the format `s3://bucket-name` | +| `AllAccessDisabled` | The bucket policy explicitly denies access or the bucket is in a different account | Check the bucket policy for explicit `Deny` statements and verify the bucket is in the correct AWS account | + +### Detailed Troubleshooting 
Notes + +#### Authentication Errors + +The error `InvalidAccessKeyId` or `SignatureDoesNotMatch` indicates that the AWS credentials are incorrect or malformed. + +Common causes: + +- **Incorrect Access Key ID** — the Access Key ID was misspelled or has been deactivated in the IAM Console. +- **Incorrect Secret Access Key** — the Secret Access Key was copied with extra whitespace, a trailing newline, or was truncated. +- **Rotated credentials** — the access key pair has been rotated since the connection was created. +- **Temporary credentials** — if using STS assumed-role credentials, the session token may be missing or expired. + +!!! note + The Secret Access Key is only visible once at creation time. If you cannot verify it, generate a new access key pair from the IAM Console. + +#### Permission Errors + +The error `AccessDenied` means the IAM identity authenticated successfully but lacks the required S3 permissions. + +Common causes: + +- **Missing bucket-level permissions** — the IAM policy grants object-level permissions (`s3:GetObject`) but not bucket-level permissions (`s3:ListBucket`). Both are required. +- **Bucket policy conflict** — the bucket has a resource-based policy with an explicit `Deny` that overrides the IAM policy. +- **S3 Block Public Access** — the bucket's public access settings may block access even for authenticated IAM users if the policy references public access. +- **Wrong resource ARN** — the IAM policy specifies a different bucket or path than the one in the connection form. + +#### Connection Errors + +The error `NoSuchBucket` or `AllAccessDisabled` indicates a configuration issue with the bucket. + +Common causes: + +- **Bucket does not exist** — the bucket name was misspelled or the bucket was deleted. +- **Wrong region** — the bucket is in a different AWS region than expected, causing endpoint resolution failures. 
+- **Bucket in different account** — the bucket belongs to a different AWS account and cross-account access is not configured. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify IAM policy permissions (permission errors), and finally check the bucket name and region (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect and access data from external sources. Amazon S3 is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/azure-datalake-storage.md b/docs/source-datastore/add-datastores/azure-datalake-storage.md index 83261707bb..834dc0ab60 100644 --- a/docs/source-datastore/add-datastores/azure-datalake-storage.md +++ b/docs/source-datastore/add-datastores/azure-datalake-storage.md @@ -65,6 +65,160 @@ After completing the setup, you will have the following credentials: !!! tip For detailed step-by-step instructions on creating a service principal in the Azure Portal, refer to the [**Microsoft documentation**](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal){:target="_blank"}. +## Datastore Azure Datalake Storage Privileges + +The permissions required depend on the authentication method and whether you are using Azure Datalake Storage as a source or enrichment datastore. + +### Minimum Permissions (Source Datastore) + +#### Access Key Authentication + +Access keys provide full read/write access to the storage account by default. No additional role assignments are needed. 
+
+#### Service Principal Authentication
+
+The Service Principal must be assigned the following Azure RBAC role on the target container or storage account:
+
+| Role / Permission | Purpose |
+|------------------------------------------------|-------------------------------------------------------------------------|
+| `Storage Blob Data Reader` | Read and list blobs (files) in the container |
+
+Specific permissions included in this role:
+
+| Permission | Purpose |
+|------------------------------------------------|-------------------------------------------------------------------------|
+| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob (file) contents for profiling and scanning |
+| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container to discover data assets |
+
+### Additional Permissions for Enrichment Datastore
+
+#### Access Key Authentication
+
+Access keys provide full read/write access by default. No additional role assignments are needed.
+
+#### Service Principal Authentication
+
+For enrichment, the Service Principal must be assigned a higher-privilege role:
+
+| Role / Permission | Purpose |
+|------------------------------------------------|-------------------------------------------------------------------------|
+| `Storage Blob Data Contributor` | Read, write, and delete blobs in the container |
+
+Specific permissions included in this role:
+
+| Permission | Purpose |
+|------------------------------------------------|-------------------------------------------------------------------------|
+| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` | Read blob contents |
+| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` | Write enrichment result files |
+| `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` | Remove temporary or outdated enrichment files |
+| `Microsoft.Storage/storageAccounts/blobServices/containers/read` | List blobs in the container |
+
+!!! note
+    If the storage account uses **hierarchical namespace** (Azure Data Lake Storage Gen2), ensure the Service Principal also has appropriate ACL permissions at the directory level if RBAC alone is not sufficient.
+
+### Example IAM Role Assignment
+
+Replace ``, ``, ``, ``, and `` with your actual values.
+
+#### Source Datastore (Read-Only)
+
+```json
+{
+  "properties": {
+    "roleDefinitionId": "/subscriptions//providers/Microsoft.Authorization/roleDefinitions/2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
+    "principalId": "",
+    "scope": "/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts//blobServices/default/containers/"
+  }
+}
+```
+
+!!! note
+    The role definition ID `2a2b9908-6ea1-4ae2-8e65-a410df84e7d1` corresponds to the **Storage Blob Data Reader** built-in role.
+
+#### Enrichment Datastore (Read-Write)
+
+```json
+{
+  "properties": {
+    "roleDefinitionId": "/subscriptions//providers/Microsoft.Authorization/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe",
+    "principalId": "",
+    "scope": "/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts//blobServices/default/containers/"
+  }
+}
+```
+
+!!! note
+    The role definition ID `ba92f5b4-2d11-453d-a403-e96b0029c9fe` corresponds to the **Storage Blob Data Contributor** built-in role.
+
+#### Assigning via Azure CLI
+
+```bash
+# Source Datastore (Read-Only)
+az role assignment create \
+  --assignee \
+  --role "Storage Blob Data Reader" \
+  --scope "/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts//blobServices/default/containers/"
+
+# Enrichment Datastore (Read-Write)
+az role assignment create \
+  --assignee \
+  --role "Storage Blob Data Contributor" \
+  --scope "/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts//blobServices/default/containers/"
+```
+
+!!! tip
+    You can also assign roles through the Azure Portal by navigating to the storage account or container, selecting **Access Control (IAM)**, and clicking **Add role assignment**.
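The role assignment bodies above differ only in the role definition ID, so they can be generated from a handful of inputs. The following is a minimal sketch, not part of the Qualytics or Azure tooling: the function names and every argument value (`sub-1`, `rg-1`, `acct1`, `data`, `pid-1`) are hypothetical, and the scope layout is assumed to be the container-level form used in the examples above.

```python
import json

# Built-in role definition IDs quoted in the notes above.
ROLE_IDS = {
    "Storage Blob Data Reader": "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
    "Storage Blob Data Contributor": "ba92f5b4-2d11-453d-a403-e96b0029c9fe",
}


def container_scope(subscription_id, resource_group, account, container):
    """Build a container-level scope string (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
        f"/blobServices/default/containers/{container}"
    )


def role_assignment(subscription_id, resource_group, account, container,
                    principal_id, enrichment=False):
    """Return the role assignment body for a source or enrichment datastore."""
    # Enrichment needs write access, so it uses the Contributor role.
    role = "Storage Blob Data Contributor" if enrichment else "Storage Blob Data Reader"
    return {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers/"
                f"Microsoft.Authorization/roleDefinitions/{ROLE_IDS[role]}"
            ),
            "principalId": principal_id,
            "scope": container_scope(subscription_id, resource_group, account, container),
        }
    }


if __name__ == "__main__":
    # All values here are placeholders for illustration only.
    print(json.dumps(role_assignment("sub-1", "rg-1", "acct1", "data", "pid-1"), indent=2))
```

Passing `enrichment=True` switches the body to the **Storage Blob Data Contributor** role while keeping the same scope.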
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `AuthenticationFailed` | Incorrect account name, access key, or Service Principal credentials (Client ID, Client Secret, Tenant ID) | Verify credentials in the Azure Portal — check the storage account access keys or app registration | +| `AuthorizationPermissionMismatch` | The Service Principal does not have the required RBAC role on the container or storage account | Assign `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) to the Service Principal | +| `ContainerNotFound` | The container name in the URI does not exist | Verify the container name in the Azure Portal under the storage account's **Containers** section | +| `InvalidUri` | The URI format is incorrect — it must follow `abfss://@.dfs.core.windows.net` | Verify the URI format matches the expected pattern | +| `This request is not authorized to perform this operation` | The Service Principal has `Storage Blob Data Reader` but the operation requires write access | Upgrade the role assignment to `Storage Blob Data Contributor` for enrichment datastores | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `AuthenticationFailed` indicates that the credentials are incorrect or the authentication method is misconfigured. + +Common causes: + +- **Incorrect access key** — the storage account access key was copied incorrectly or has been rotated since the connection was created. +- **Wrong account name** — the account name does not match the storage account. +- **Expired Client Secret** — when using Service Principal authentication, the Client Secret has expired. 
+- **Wrong Tenant ID** — the Tenant ID does not match the Microsoft Entra ID tenant where the app is registered. + +!!! note + Access keys provide the simplest authentication but grant full access to the storage account. For least-privilege access, use Service Principal authentication with RBAC role assignments scoped to the specific container. + +#### Permission Errors + +The error `AuthorizationPermissionMismatch` or `This request is not authorized to perform this operation` means the credentials are valid but lack the required permissions. + +Common causes: + +- **Missing RBAC role** — the Service Principal does not have `Storage Blob Data Reader` (source) or `Storage Blob Data Contributor` (enrichment) assigned. +- **Role assigned at wrong scope** — the role is assigned at the subscription or resource group level but not at the container level, or vice versa. +- **ACL restrictions** — when using hierarchical namespace (Data Lake Storage Gen2), POSIX ACLs may restrict access even if RBAC roles are assigned. +- **Source vs. enrichment mismatch** — the Service Principal has `Storage Blob Data Reader` but the operation requires write access (enrichment). + +#### Connection Errors + +The error `ContainerNotFound` or `InvalidUri` indicates a configuration issue with the URI or container name. + +Common causes: + +- **Container does not exist** — the container name in the URI was misspelled or the container has not been created. +- **Invalid URI format** — the URI must follow `abfss://@.dfs.core.windows.net`. Missing the `abfss://` scheme or using the wrong account suffix (`.blob.core.windows.net` instead of `.dfs.core.windows.net`) will cause failures. +- **Storage account firewall** — the storage account firewall blocks connections from the Qualytics IP. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify RBAC role assignments (permission errors), and finally check the URI format and container existence (connection errors). 
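The URI mistakes described under Connection Errors can be caught before the connection is tested. This is a minimal sketch under stated assumptions: the helper name `check_abfss_uri` is hypothetical, the pattern assumes the general `abfss://container@account.dfs.core.windows.net` layout, and the character rules are simplified (lowercase letters, digits, and hyphens for the container; 3 to 24 lowercase letters or digits for the account).

```python
import re

# Assumed layout: abfss://container@account.dfs.core.windows.net[/path]
_ABFSS = re.compile(
    r"^abfss://(?P<container>[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
    r"@(?P<account>[a-z0-9]{3,24})\.dfs\.core\.windows\.net(?P<path>/.*)?$"
)


def check_abfss_uri(uri):
    """Return (ok, message) for a candidate Azure Datalake Storage URI."""
    if not uri.startswith("abfss://"):
        return False, "missing the abfss:// scheme"
    if ".blob.core.windows.net" in uri:
        return False, "uses .blob.core.windows.net instead of .dfs.core.windows.net"
    if _ABFSS.match(uri) is None:
        return False, "does not match abfss://container@account.dfs.core.windows.net"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical container and account names.
    print(check_abfss_uri("abfss://data@myaccount.dfs.core.windows.net"))
    print(check_abfss_uri("abfss://data@myaccount.blob.core.windows.net"))
```

A check like this only validates the shape of the URI; it cannot tell whether the container exists or whether a storage account firewall blocks the connection.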
+ ## Add a Source Datastore A source datastore is a storage location used to connect and access data from external sources. Azure Datalake Storage is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/bigquery.md b/docs/source-datastore/add-datastores/bigquery.md index a5bb8020e6..472dbfd106 100644 --- a/docs/source-datastore/add-datastores/bigquery.md +++ b/docs/source-datastore/add-datastores/bigquery.md @@ -90,6 +90,55 @@ Grants read and write access for data editing and management: | 3. | `roles/bigquery.jobUser` | Enables running of jobs such as queries and data loading. | | 4. | `roles/bigquery.readSessionUser` | Facilitates the creation of read sessions for efficient data retrieval. 
| +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Access Denied: 403` | The service account lacks the required BigQuery roles | Assign `bigquery.dataViewer` and `bigquery.jobUser` roles to the service account | +| `Not found: Dataset` | The Dataset ID in the connection form does not exist or the service account cannot see it | Verify the Dataset ID in the BigQuery Console and ensure the service account has `bigquery.dataViewer` on the dataset | +| `Not found: Project` | The Project ID is incorrect or the service account does not belong to the project | Verify the Project ID in the Google Cloud Console | +| `The caller does not have bigquery.jobs.create permission` | The service account lacks the `bigquery.jobUser` role | Assign `roles/bigquery.jobUser` to the service account at the project level | +| `Invalid service account key` | The JSON key file is malformed, truncated, or belongs to a different project | Re-download the service account key from **IAM & Admin > Service Accounts** in the Google Cloud Console | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Invalid service account key` indicates that the JSON key file used for authentication is incorrect or corrupted. + +Common causes: + +- **Malformed JSON** — the key file was modified or truncated after download. +- **Wrong project** — the service account key belongs to a different Google Cloud project than the one specified in the connection form. +- **Key deleted or disabled** — the key was deleted from the service account in the Google Cloud Console. + +!!! note + Service account keys do not expire, but they can be deleted or disabled by project administrators. 
If the key stops working, verify its status in **IAM & Admin > Service Accounts > Keys**. + +#### Permission Errors + +The error `Access Denied: 403` or `The caller does not have bigquery.jobs.create permission` means the service account authenticated successfully but lacks the required roles. + +Common causes: + +- **Missing `bigquery.jobUser`** — the service account cannot run queries without this role. It must be assigned at the project level. +- **Missing `bigquery.dataViewer`** — the service account cannot read dataset or table metadata. +- **Missing `bigquery.readSessionUser`** — the service account cannot create read sessions for efficient data retrieval via the Storage API. +- **Dataset-level vs. project-level** — some roles are assigned at the dataset level but the operation requires project-level access (e.g., `bigquery.jobUser`). + +#### Connection Errors + +The error `Not found: Dataset` or `Not found: Project` indicates a configuration issue with the Project ID or Dataset ID. + +Common causes: + +- **Wrong Project ID** — the Project ID does not match the Google Cloud project. +- **Wrong Dataset ID** — the Dataset ID was misspelled or does not exist in the specified project. +- **Regional mismatch** — the temporary dataset is in a different region than the source dataset, causing cross-region query failures. + +!!! tip + Start by confirming the service account key is valid (authentication errors), then verify BigQuery roles (permission errors), and finally check the Project ID and Dataset ID (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. BigQuery is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
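For the BigQuery permission errors described above, comparing the roles granted to the service account against the expected set shows which role is missing. This is a minimal sketch: the helper names are hypothetical, and treating the three roles named in the Permission Errors list as the required set for a source datastore is an assumption of the sketch rather than a statement of the platform's exact requirements.

```python
# Roles named in the Permission Errors list above; using them as the
# required set for a source datastore is an assumption of this sketch.
REQUIRED_SOURCE_ROLES = (
    "roles/bigquery.dataViewer",
    "roles/bigquery.jobUser",
    "roles/bigquery.readSessionUser",
)


def normalize(role):
    """Accept both 'bigquery.jobUser' and 'roles/bigquery.jobUser'."""
    return role if role.startswith("roles/") else f"roles/{role}"


def missing_roles(granted_roles, required=REQUIRED_SOURCE_ROLES):
    """Return the required roles that are absent from granted_roles, in order."""
    granted = {normalize(role) for role in granted_roles}
    return [role for role in required if role not in granted]


if __name__ == "__main__":
    # Hypothetical input: a service account that only has dataViewer.
    print(missing_roles(["bigquery.dataViewer"]))
```

The list of granted roles has to come from the IAM policy of the project or dataset; how it is retrieved is outside the scope of this sketch.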
diff --git a/docs/source-datastore/add-datastores/databricks.md b/docs/source-datastore/add-datastores/databricks.md index 3f3d07713e..0bbf1ccf03 100644 --- a/docs/source-datastore/add-datastores/databricks.md +++ b/docs/source-datastore/add-datastores/databricks.md @@ -64,6 +64,105 @@ To improve the performance of all-purpose compute using node pools, you can foll ![attach-compute-with-node-pool](../../assets/source-datastores/add-datastores/databricks/attach-compute-with-node-pool.png) +### Databricks Privileges and Permissions + +Qualytics connects to Databricks via JDBC using either a **Personal Access Token (PAT)** or **OAuth M2M** authentication. It uses the Hive metastore or Unity Catalog metadata APIs to discover catalogs, schemas, tables, columns, and partition information. + +#### Minimum Databricks Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------------|-------------------------------------------------------------------------| +| `USAGE ON CATALOG ` | Access the catalog | +| `USAGE ON SCHEMA .` | Access objects within the schema | +| `SELECT ON SCHEMA .` | Read data from all tables and views for profiling and scanning | +| `CAN USE` on the SQL Warehouse or Cluster | Execute queries on the compute resource | + +#### Additional Permissions for Enrichment Datastore + +When using Databricks as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|-----------------------------------------------------------------|-----------------------------------------------------------------| +| `CREATE TABLE ON SCHEMA .` | Create enrichment tables (`_qualytics_*`) | +| `MODIFY ON SCHEMA .` | Write, update, and delete data in enrichment tables | + +#### Example: Source Datastore Permissions (Read-Only) + +Replace ``, ``, and `` with your actual values. 
+ +```sql +-- Grant catalog and schema access +GRANT USAGE ON CATALOG TO ; +GRANT USAGE ON SCHEMA . TO ; + +-- Grant read access to all tables in the schema +GRANT SELECT ON SCHEMA . TO ; +``` + +#### Example: Enrichment Datastore Permissions (Read-Write) + +```sql +-- Grant catalog and schema access +GRANT USAGE ON CATALOG TO ; +GRANT USAGE ON SCHEMA . TO ; + +-- Grant read-write access and table creation +GRANT SELECT, MODIFY ON SCHEMA . TO ; +GRANT CREATE TABLE ON SCHEMA . TO ; +``` + +!!! note + The Databricks user or service principal must also have `CAN USE` permission on the **compute resource** (SQL Warehouse or All-Purpose Cluster) used for the connection. Without compute access, queries cannot be executed regardless of data permissions. + +#### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `INVALID_CREDENTIALS` | The Personal Access Token (PAT) is invalid, expired, or revoked | Generate a new PAT from **User Settings > Developer > Access Tokens** in Databricks | +| `User does not have USAGE permission` | The user lacks `USAGE` on the catalog or schema | Run `GRANT USAGE ON CATALOG TO ` and `GRANT USAGE ON SCHEMA . TO ` | +| `User does not have SELECT permission` | The user lacks `SELECT` on the schema or specific tables | Run `GRANT SELECT ON SCHEMA . 
TO ` | +| `Cluster is not running` | The All-Purpose Cluster is stopped and `AUTO_RESUME` is not enabled | Start the cluster manually or switch to a SQL Warehouse with auto-resume | +| `HTTP Path is invalid` | The HTTP Path in the connection form does not match the warehouse or cluster | Copy the correct HTTP Path from **SQL Warehouses > Connection Details** or **Compute > JDBC/ODBC** | + +#### Detailed Troubleshooting Notes + +##### Authentication Errors + +The error `INVALID_CREDENTIALS` indicates that the Personal Access Token (PAT) or OAuth credentials are invalid. + +Common causes: + +- **Expired PAT** — Personal Access Tokens have a configurable expiration date. The token may have expired since the connection was created. +- **Revoked PAT** — the token was manually revoked from the Databricks workspace. +- **Wrong workspace** — the PAT was generated in a different Databricks workspace than the one being connected to. +- **OAuth secret expired** — when using OAuth M2M, the Service Principal's secret has expired. + +!!! note + Databricks PATs are workspace-scoped. A token generated in workspace A cannot be used to connect to workspace B. + +##### Permission Errors + +The error `User does not have USAGE permission` or `User does not have SELECT permission` means the user authenticated successfully but lacks Unity Catalog grants. + +Common causes: + +- **Missing `USAGE` on catalog** — the user cannot access the catalog. This is the most common issue — `USAGE` must be granted on both the catalog and the schema. +- **Missing `SELECT` on schema** — the user has catalog access but cannot read tables in the specific schema. +- **Unity Catalog not enabled** — the workspace uses the legacy Hive metastore instead of Unity Catalog, and permissions are managed differently. + +##### Compute Errors + +The error `Cluster is not running` or `HTTP Path is invalid` indicates a problem with the compute resource, not with data permissions. 
+ +Common causes: + +- **Cluster stopped** — All-Purpose Clusters do not auto-resume by default. The cluster must be started manually or switched to a SQL Warehouse with auto-resume. +- **Wrong HTTP Path** — the HTTP Path was copied incorrectly or the warehouse/cluster has been recreated with a new path. +- **No compute access** — the user lacks `CAN USE` permission on the SQL Warehouse or cluster. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify Unity Catalog permissions (permission errors), and finally check compute resource availability (compute errors). + ### Retrieve the Connection Details This section explains how to retrieve the connection details that you need to connect to Databricks. @@ -80,6 +179,9 @@ To configure Databricks, you need the following credentials: | 4. | Database (Required) | Specify the database name to be accessed. | | 5. | Personal Access Token (Required) | Generate a Personal Access Token from your Databricks account and add it for authentication.| +!!! note + Databricks also supports **OAuth M2M (Machine-to-Machine)** authentication as an alternative to Personal Access Tokens. To use OAuth M2M, provide the **Service Principal Application ID** and **OAuth Secret** instead of a Personal Access Token. This method is recommended for production environments as it does not depend on individual user tokens. + #### Get Connection Details for the SQL Warehouse Follow the given steps to get the connection details for the SQL warehouse: diff --git a/docs/source-datastore/add-datastores/db2.md b/docs/source-datastore/add-datastores/db2.md index 92b514a0aa..235f688e30 100644 --- a/docs/source-datastore/add-datastores/db2.md +++ b/docs/source-datastore/add-datastores/db2.md @@ -8,6 +8,118 @@ By following these instructions, enterprises can ensure their DB2 environment is Let’s get started 🚀 +## DB2 Setup Guide + +Qualytics connects to DB2 through the **IBM DB2 JDBC driver**. 
It queries DB2 system catalogs (`SYSCAT.SCHEMATA`, `SYSCAT.TABLES`) to discover schemas and uses standard JDBC metadata APIs for tables, columns, and primary keys. + +### Minimum DB2 Permissions (Source Datastore) + +| Permission | Purpose | +|-------------------------------------------|-----------------------------------------------------------------------------| +| `CONNECT ON DATABASE` | Allow the user to connect to the database | +| `USAGE ON SCHEMA ` | Access objects within the target schema | +| `SELECT ON ALL TABLES IN SCHEMA` | Read data from all tables for profiling and scanning | +| `SELECT ON SYSCAT.SCHEMATA` | Discover available schemas in the database | +| `SELECT ON SYSCAT.TABLES` | Discover available tables and filter empty schemas | + +### Additional Permissions for Enrichment Datastore + +When using DB2 as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|-------------------------------------------|-----------------------------------------------------------------------------| +| `CREATETAB ON DATABASE` | Create enrichment tables (`_qualytics_*`) | +| `CREATEIN ON SCHEMA ` | Create new objects within the schema | +| `ALTERIN ON SCHEMA ` | Modify enrichment table schemas during version migrations | +| `INSERT ON ALL TABLES IN SCHEMA` | Write anomaly records, scan results, and check metrics | +| `UPDATE ON ALL TABLES IN SCHEMA` | Update enrichment records during rescans | +| `DELETE ON ALL TABLES IN SCHEMA` | Remove stale enrichment records | +| `DROPIN ON SCHEMA ` | Remove enrichment tables during cleanup or when the datastore is unlinked | + +### Example: Source Datastore User (Read-Only) + +Replace `` with your actual value. 
+
+```sql
+-- Grant connection and schema access
+GRANT CONNECT ON DATABASE TO USER qualytics_read;
+GRANT USAGE ON SCHEMA TO USER qualytics_read;
+
+-- Grant read access to all tables in the schema
+GRANT SELECT ON ALL TABLES IN SCHEMA TO USER qualytics_read;
+```
+
+### Example: Enrichment Datastore User (Read-Write)
+
+```sql
+-- Grant connection and schema access
+GRANT CONNECT ON DATABASE TO USER qualytics_readwrite;
+GRANT USAGE ON SCHEMA TO USER qualytics_readwrite;
+
+-- Grant full data manipulation on all tables
+GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA TO USER qualytics_readwrite;
+
+-- Grant table creation and schema modification
+GRANT CREATETAB ON DATABASE TO USER qualytics_readwrite;
+GRANT CREATEIN ON SCHEMA TO USER qualytics_readwrite;
+GRANT ALTERIN ON SCHEMA TO USER qualytics_readwrite;
+```
+
+!!! note
+    Qualytics queries DB2 system catalogs (`SYSCAT.SCHEMATA`, `SYSCAT.TABLES`) during catalog discovery. Ensure the Qualytics user has `SELECT` access to these system catalog views.
+
+!!! info
+    If your DB2 server requires encrypted connections, enable the **SSL** toggle in the connection form. This establishes a TLS-encrypted connection between Qualytics and the DB2 instance.
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `SQL30082N: Security processing failed` | Incorrect username or password | Verify the credentials and ensure the user exists in the DB2 instance | +| `SQL1060N: User does not have the CONNECT privilege` | The user lacks `CONNECT` on the database | Run `GRANT CONNECT ON DATABASE TO USER ` | +| `SQL0551N: User does not have the required authorization` | The user lacks `SELECT` or other required permissions on a table | Grant the missing permission on the specific table or schema | +| `SQL0204N: Name is an undefined name` | The schema or table does not exist, or the user cannot see it | Verify the schema name matches exactly (DB2 stores unquoted schema names in uppercase by default) | +| `SQL0552N: User is not authorized to perform the requested command` | The enrichment user lacks `CREATETAB` or `CREATEIN` | Run `GRANT CREATETAB ON DATABASE TO USER ` and `GRANT CREATEIN ON SCHEMA TO USER ` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `SQL30082N: Security processing failed` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the user. +- **User does not exist** — the username was misspelled or does not exist in the DB2 instance. +- **Authentication plugin mismatch** — the DB2 server uses a different authentication mechanism than expected (e.g., Kerberos, LDAP). + +!!! note + DB2 authentication is handled at the operating system or LDAP level, not within the database itself. Ensure the credentials match the OS or LDAP user account. 
+ +#### Permission Errors + +The error `SQL0551N: User does not have the required authorization` means the user authenticated successfully but lacks the necessary grants. + +Common causes: + +- **Missing `SELECT` on tables** — the user does not have `SELECT` on the target tables in the schema. +- **Missing `CONNECT` on database** — the user cannot connect to the database. +- **Missing `USAGE` on schema** — the user cannot access objects within the schema. +- **System catalog access** — the user lacks `SELECT` access to `SYSCAT.SCHEMATA` or `SYSCAT.TABLES` needed for catalog discovery. + +#### Connection Errors + +The error `SQL1060N: User does not have the CONNECT privilege` means the user lacks the `CONNECT` privilege on the database. + +Common causes: + +- **Missing `CONNECT` grant** — `GRANT CONNECT ON DATABASE TO USER ` was not executed. +- **Database not cataloged** — the target database is not cataloged on the DB2 client. +- **Network issues** — a firewall is blocking connections on the DB2 port (default 50000). + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check database connectivity (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. DB2 is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
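The DB2 troubleshooting order above (authentication, then permissions, then connectivity) can be applied mechanically by pulling the message identifier out of an error string. This is a minimal sketch: the function name is hypothetical, the categories follow the DB2 troubleshooting sections above, and the "object not found" label for `SQL0204N` is this sketch's own shorthand for the table's "schema or table does not exist" cause.

```python
import re

# Category per message identifier, following the DB2 troubleshooting
# sections above. "object not found" is this sketch's own label.
DB2_ERROR_CATEGORIES = {
    "SQL30082N": "authentication",
    "SQL1060N": "connection",
    "SQL0551N": "permission",
    "SQL0552N": "permission",
    "SQL0204N": "object not found",
}

# Matches identifiers such as SQL0551N or SQL30082N anywhere in a message.
_SQLCODE = re.compile(r"\bSQL\d{4,5}N\b")


def classify_db2_error(message):
    """Return (identifier, category); identifier is None if none is found."""
    match = _SQLCODE.search(message)
    if match is None:
        return None, "unknown"
    code = match.group(0)
    return code, DB2_ERROR_CATEGORIES.get(code, "unknown")


if __name__ == "__main__":
    print(classify_db2_error("SQL0551N: User does not have the required authorization"))
```

Identifiers that are not in the table fall through to `"unknown"`, so the mapping can be extended as new errors are documented.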
diff --git a/docs/source-datastore/add-datastores/dremio.md b/docs/source-datastore/add-datastores/dremio.md index b168095460..fea4e78b86 100644 --- a/docs/source-datastore/add-datastores/dremio.md +++ b/docs/source-datastore/add-datastores/dremio.md @@ -8,6 +8,92 @@ By following these instructions, enterprises can ensure their Dremio environment Let’s get started 🚀 +## Dremio Setup Guide + +Qualytics connects to Dremio through the **Dremio JDBC driver** (Arrow Flight SQL, default port 32010). It uses standard SQL queries for data profiling and scanning. Dremio permissions are managed through its own privilege model, which controls access to sources, spaces, and datasets. + +### Minimum Dremio Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `SELECT` on target datasets | Read data from tables/views for profiling and scanning | +| `VIEW` on the source | Browse available schemas and datasets | +| Access to the Dremio project (Cloud only) | Scope queries to the correct project in Dremio Cloud | + +!!! note + Qualytics does not support Dremio as an enrichment datastore. You can point to a different enrichment datastore instead. + +### Authentication Methods + +Dremio supports two authentication methods: + +| Method | Configuration | +|---------------------------------|--------------------------------------------------------------------------| +| **Basic (Username & Password)** | Use the Dremio username and password directly | +| **Personal Access Token (PAT)** | Generate a PAT from **Account Settings > Personal Access Tokens** in Dremio and use it as the authentication credential | + +### Example: Granting Permissions in Dremio + +In the Dremio UI, navigate to the target source or dataset and grant `SELECT` access to the Qualytics user: + +```sql +GRANT SELECT ON TABLE .. 
TO USER qualytics_read; +-- Or grant access to all datasets in a schema: +GRANT SELECT ON SCHEMA . TO USER qualytics_read; +``` + +!!! tip + If using **Dremio Cloud**, ensure the Project ID is included in the connection form. You can find the Project ID in the Dremio Cloud project settings. + +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Authentication failed` | Incorrect username/password or invalid PAT | Verify credentials or generate a new PAT from Dremio account settings | +| `Source not found` | The schema (source) name in the connection form does not match a configured Dremio source | Verify available sources in the Dremio UI or with `SHOW SCHEMAS` | +| `Permission denied on dataset` | The user lacks `SELECT` on the target dataset | Grant `SELECT` permission on the dataset in Dremio's privilege settings | +| `Connection refused` | The Dremio coordinator is not reachable or the port (default 32010) is incorrect | Verify the host, port, and that the Dremio coordinator is running | +| `Project ID is required` (Cloud only) | The Dremio Cloud project ID was not provided in the connection form | Add the Project ID from your Dremio Cloud project settings | +| `SSL handshake failed` | SSL is required but not configured, or the SSL certificate is not trusted | Enable SSL in the connection parameters or add the Dremio certificate to the trust store | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Authentication failed` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match the Dremio user account. +- **Invalid PAT** — the Personal Access Token is expired, revoked, or was generated in a different Dremio instance. 
+- **PAT permissions** — the PAT does not have sufficient permissions for the requested operation. + +!!! note + Dremio Personal Access Tokens are user-scoped. Ensure the PAT belongs to a user with the required permissions on the target datasets. + +#### Permission Errors + +The error `Permission denied on dataset` means the user authenticated successfully but lacks access to the target dataset. + +Common causes: + +- **Missing `SELECT` on dataset** — the user does not have `SELECT` permission on the specific dataset. +- **Source not shared** — in Dremio, datasets derived from a source must be explicitly shared with users. +- **Project scope** (Cloud only) — the dataset exists in a different project than the one configured in the connection. + +#### Connection Errors + +The error `Connection refused` or `SSL handshake failed` indicates a connectivity issue. + +Common causes: + +- **Coordinator not reachable** — the Dremio coordinator host or port (default 32010) is incorrect. +- **SSL not configured** — the Dremio server requires SSL but the connection is not configured for it. +- **Project ID missing** (Cloud only) — Dremio Cloud requires a Project ID to scope queries. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify dataset permissions (permission errors), and finally check coordinator connectivity and SSL configuration (connection errors). + ## Add the Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Dremio is an example of such a datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the Dremio datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
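
Before filling in the connection form, it can help to rule out the `Connection refused` case described above. The following is a minimal sketch using only the Python standard library; the function name `is_port_reachable` and the host `dremio.example.com` are placeholders, not part of Qualytics or Dremio. The default port 32010 is the one stated in this guide.

```python
import socket


def is_port_reachable(host: str, port: int = 32010, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical coordinator host; replace with your own.
    print(is_port_reachable("dremio.example.com"))
```

A successful TCP connection only shows that something is listening on the port. It does not validate SSL settings, credentials, or the Project ID.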
diff --git a/docs/source-datastore/add-datastores/fabric-analytics.md b/docs/source-datastore/add-datastores/fabric-analytics.md index e31c2d7959..3bab2bb3bb 100644 --- a/docs/source-datastore/add-datastores/fabric-analytics.md +++ b/docs/source-datastore/add-datastores/fabric-analytics.md @@ -100,6 +100,86 @@ Before configuring the Fabric Analytics datastore in Qualytics, ensure the follo ![azure-add-service-principal-to-workspace](../../assets/source-datastores/add-datastores/fabric/azure-add-service-principal-to-workspace.png) +### Datastore Fabric Analytics Privileges + +Qualytics connects to Fabric Analytics through the **Microsoft JDBC driver for SQL Server** using the SQL analytics endpoint. It queries system views (`sys.schemas`, `sys.database_principals`) to discover schemas and uses standard JDBC metadata APIs for tables, columns, and primary keys. + +#### Minimum Fabric Analytics Permissions (Source Datastore) + +| Permission | Purpose | +|---------------------------------------------------------------|-------------------------------------------------------------------------| +| **Contributor** role (or higher) on the Fabric workspace | Access the workspace and its Lakehouse/Warehouse resources | +| `SELECT` on target tables/views | Read data from tables for profiling and scanning | +| `SELECT ON sys.schemas` | Discover available schemas in the database | +| `SELECT ON sys.database_principals` | Resolve schema ownership for catalog discovery | +| **Service principals can use Fabric APIs** (tenant setting) | Allow the Service Principal to authenticate via the SQL analytics endpoint | + +!!! note + Qualytics does not support Fabric Analytics as an enrichment datastore. You can point to a different enrichment datastore instead. 
+ +#### Example: Verifying Service Principal Access + +After configuring the Service Principal, you can verify access using the Azure CLI: + +```bash +# Verify the app registration exists +az ad app show --id + +# Verify the Service Principal has the Contributor role on the Fabric workspace +az role assignment list --assignee --scope +``` + +!!! tip + You can also verify the connection by using the SQL analytics endpoint directly with a SQL client tool (e.g., Azure Data Studio) before configuring it in Qualytics. + +#### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Login failed for user` | The Service Principal credentials (Client ID, Client Secret, Tenant ID) are incorrect or the Client Secret has expired | Verify the credentials in the Azure Portal app registration and regenerate the Client Secret if needed | +| `Cannot open database requested by the login` | The Service Principal does not have access to the Fabric workspace | Add the Service Principal as a **Contributor** (or higher) to the target Fabric workspace | +| `Service principals are not allowed` | The Fabric tenant setting **"Service principals can use Fabric APIs"** is disabled | Enable the setting in the Fabric Admin Portal under **Tenant settings** | +| `The SELECT permission was denied on object` | The Service Principal lacks `SELECT` on one or more tables | Verify the workspace role grants sufficient read access to the Lakehouse/Warehouse | +| `SQL analytics endpoint is not available` | The SQL analytics endpoint is not enabled for the Lakehouse/Warehouse | Ensure the SQL analytics endpoint is enabled in the Fabric workspace settings | + +#### Detailed Troubleshooting Notes + +##### Authentication Errors + +The error `Login failed for user` indicates that 
the Service Principal credentials are incorrect or expired. + +Common causes: + +- **Expired Client Secret** — the Client Secret has a configurable expiration date. It may have expired since the connection was created. +- **Wrong Tenant ID** — the Tenant ID does not match the Microsoft Entra ID tenant where the app is registered. +- **Wrong Client ID** — the Client ID (Application ID) does not match the app registration. + +!!! note + Service Principal Client Secrets have an expiration date configured during creation. Set a reminder to rotate the secret before it expires to avoid connection failures. + +##### Permission Errors + +The error `The SELECT permission was denied on object` or `Service principals are not allowed` means the Service Principal authenticated successfully but lacks the necessary access. + +Common causes: + +- **Workspace role insufficient** — the Service Principal needs at least the **Contributor** role on the Fabric workspace. +- **Tenant setting disabled** — the **"Service principals can use Fabric APIs"** setting is not enabled in the Fabric Admin Portal. +- **Security group restriction** — the tenant setting is enabled but restricted to a specific security group that does not include the Service Principal. + +##### Connection Errors + +The error `SQL analytics endpoint is not available` or `Cannot open database requested by the login` indicates a configuration issue. + +Common causes: + +- **SQL endpoint not enabled** — the SQL analytics endpoint is not enabled for the Lakehouse or Warehouse. +- **Wrong endpoint** — the SQL analytics endpoint URL does not match the target Lakehouse/Warehouse. +- **Fabric capacity paused** — the Fabric capacity is paused, making the SQL endpoint unavailable. + +!!! tip + Start by confirming Service Principal credentials are valid (authentication errors), then verify workspace roles and tenant settings (permission errors), and finally check the SQL analytics endpoint availability (connection errors). 
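
The rotation reminder mentioned in the note on Client Secret expiration can be sketched as a simple date check. This is an assumption-laden illustration: the helper names and the 30-day warning window are hypothetical, and the expiry date has to be copied by hand from the app registration in the Azure Portal.

```python
from datetime import date
from typing import Optional


def days_until_expiry(expires_on: date, today: Optional[date] = None) -> int:
    """Days left before the secret expires; negative once it has expired."""
    reference = today if today is not None else date.today()
    return (expires_on - reference).days


def needs_rotation(expires_on: date, warn_days: int = 30,
                   today: Optional[date] = None) -> bool:
    """True when the secret expires within `warn_days` days or has already expired."""
    return days_until_expiry(expires_on, today) <= warn_days
```

Running a check like this on a schedule gives some lead time before a `Login failed for user` error appears because of an expired secret.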
+ ### Retrieve the SQL Analytics Endpoint **1.** Open your **Lakehouse** or **Warehouse** in the Fabric workspace and copy the **SQL analytics endpoint** from the connection string area. diff --git a/docs/source-datastore/add-datastores/google-cloud-storage.md b/docs/source-datastore/add-datastores/google-cloud-storage.md index d94ff07427..900b6f078a 100644 --- a/docs/source-datastore/add-datastores/google-cloud-storage.md +++ b/docs/source-datastore/add-datastores/google-cloud-storage.md @@ -57,8 +57,147 @@ For example, once you generate the keys, they might look like this: - Secret Key: `abcd1234efgh5678ijklmnopqrstuvwx` -!!! warning - Make sure to store these keys securely, as they provide access to your Google Cloud Storage resources. +!!! warning + Make sure to store these keys securely, as they provide access to your Google Cloud Storage resources. + +## Datastore Google Cloud Storage Privileges + +The permissions required depend on whether you are using Google Cloud Storage as a source or enrichment datastore. Qualytics accesses GCS using HMAC keys (Access Key / Secret Key) or a Service Account Key. + +### Minimum Permissions (Source Datastore) + +The service account or HMAC key must have the following permissions: + +| Permission | Purpose | +|---------------------------|-------------------------------------------------------------------------| +| `storage.buckets.get` | Validate the bucket exists and retrieve its metadata | +| `storage.objects.get` | Read file contents for profiling and scanning | +| `storage.objects.list` | List files in the bucket to discover data assets | + +!!! tip + You can grant these permissions by assigning the **Storage Object Viewer** (`roles/storage.objectViewer`) role to the service account on the target bucket. 
+ +### Additional Permissions for Enrichment Datastore + +When using Google Cloud Storage as an enrichment datastore, the following additional permissions are required: + +| Permission | Purpose | +|---------------------------|-------------------------------------------------------------------------| +| `storage.objects.create` | Write enrichment result files | +| `storage.objects.delete` | Remove temporary or outdated enrichment files | + +!!! tip + You can grant all required permissions (read + write) by assigning the **Storage Object Admin** (`roles/storage.objectAdmin`) role to the service account on the target bucket. + +### Example IAM Policy + +Replace `` and `` with your actual values. + +#### Source Datastore (Read-Only) + +```json +{ + "bindings": [ + { + "role": "roles/storage.objectViewer", + "members": [ + "serviceAccount:" + ] + } + ] +} +``` + +#### Enrichment Datastore (Read-Write) + +```json +{ + "bindings": [ + { + "role": "roles/storage.objectAdmin", + "members": [ + "serviceAccount:" + ] + } + ] +} +``` + +!!! tip + If you need both `storage.buckets.get` and object-level permissions but want to avoid a broader role, you can create a custom role with only the specific permissions listed in the [Minimum Permissions](#minimum-permissions-source-datastore) section. + +#### Assigning via gcloud CLI + +```bash +# Source Datastore (Read-Only) +gsutil iam ch \ + serviceAccount::roles/storage.objectViewer \ + gs:// + +# Enrichment Datastore (Read-Write) +gsutil iam ch \ + serviceAccount::roles/storage.objectAdmin \ + gs:// +``` + +!!! tip + You can also assign roles through the Google Cloud Console by navigating to the bucket, selecting **Permissions**, and clicking **Grant Access**. 
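
The permission tables above can be turned into a quick checklist. The sketch below compares a set of granted permissions against the source and enrichment requirements listed in this guide; the function name `missing_permissions` is hypothetical, and how the granted list is obtained is left open (one option is the Cloud Storage `testIamPermissions` method on the bucket).

```python
# Permissions from the "Minimum Permissions" and "Additional Permissions" tables above.
SOURCE_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.list",
})
ENRICHMENT_PERMISSIONS = SOURCE_PERMISSIONS | {
    "storage.objects.create",
    "storage.objects.delete",
}


def missing_permissions(granted, enrichment=False):
    """Return the sorted list of required permissions absent from `granted`."""
    required = ENRICHMENT_PERMISSIONS if enrichment else SOURCE_PERMISSIONS
    return sorted(required - set(granted))
```

It is worth running the check against the permissions a role actually contains, as listed in Google's IAM role reference, rather than against the role name alone.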
+ +### GCS Roles Summary + +| Role | Use Case | Permissions Included | +|-----------------------------------------------|---------------------------|-----------------------------------------------------------------------| +| `roles/storage.objectViewer` | Source Datastore | `storage.objects.get`, `storage.objects.list`, `storage.buckets.get` | +| `roles/storage.objectAdmin` | Enrichment Datastore | `storage.objects.get`, `storage.objects.list`, `storage.objects.create`, `storage.objects.delete`, `storage.buckets.get` | + +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `403 Forbidden` | The service account or HMAC key lacks the required permissions on the bucket | Assign the appropriate role (`Storage Object Viewer` or `Storage Object Admin`) to the service account on the target bucket | +| `404 Not Found: Bucket not found` | The bucket name in the URI is incorrect or the bucket does not exist | Verify the bucket name and ensure the URI follows the format `gs://bucket-name` | +| `Invalid credentials` | The Access Key / Secret Key pair is incorrect or the service account key file is malformed | Regenerate the HMAC keys from **Cloud Storage > Settings > Interoperability** or re-download the service account key | +| `The caller does not have storage.objects.list access` | The service account has object-level access but lacks bucket-level `list` permission | Assign the `Storage Object Viewer` role at the bucket level (not just object level) | +| `The caller does not have storage.objects.create access` | The enrichment service account lacks write permissions | Upgrade the role assignment from `Storage Object Viewer` to `Storage Object Admin` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error 
`Invalid credentials` indicates that the HMAC keys or service account key are incorrect or malformed. + +Common causes: + +- **Incorrect Access Key / Secret Key** — the HMAC key pair was copied incorrectly or has been deleted. +- **Malformed service account key** — the JSON key file is corrupted, truncated, or belongs to a different project. +- **Service account disabled** — the service account has been disabled in the Google Cloud Console. + +!!! note + HMAC keys are tied to a specific service account. If the service account is deleted or disabled, the HMAC keys will stop working even if they have not been explicitly revoked. + +#### Permission Errors + +The error `403 Forbidden` or `The caller does not have storage.objects.list access` means the credentials are valid but lack the required IAM permissions. + +Common causes: + +- **Missing IAM role** — the service account does not have `Storage Object Viewer` (source) or `Storage Object Admin` (enrichment) assigned on the target bucket. +- **Role assigned at wrong level** — the role is assigned at the project level but a bucket-level policy overrides it. +- **Uniform bucket-level access** — if the bucket uses uniform bucket-level access (recommended), ensure IAM policies are set at the bucket level, not through ACLs. +- **Source vs. enrichment mismatch** — the service account has `Storage Object Viewer` but the operation requires write access (enrichment). + +#### Connection Errors + +The error `404 Not Found: Bucket not found` indicates a configuration issue with the bucket name or URI. + +Common causes: + +- **Bucket does not exist** — the bucket name was misspelled or the bucket has been deleted. +- **Wrong project** — the service account belongs to a different Google Cloud project than the bucket. +- **Invalid URI format** — the URI must follow `gs://bucket-name`. Extra path segments or incorrect formatting will cause failures. + +!!! 
tip + Start by confirming credentials are valid (authentication errors), then verify IAM role assignments (permission errors), and finally check the bucket name and URI format (connection errors). ## Add a Source Datastore diff --git a/docs/source-datastore/add-datastores/hive.md b/docs/source-datastore/add-datastores/hive.md index 141a4d70ca..0b3db753f5 100644 --- a/docs/source-datastore/add-datastores/hive.md +++ b/docs/source-datastore/add-datastores/hive.md @@ -8,6 +8,85 @@ By following these instructions, enterprises can ensure their Hive environment i Let’s get started 🚀 +## Hive Setup Guide + +Qualytics connects to Hive through the **Hive JDBC driver** (HiveServer2). It accesses the Hive metastore for schema and table discovery, and reads table data via HiveQL `SELECT` queries for profiling and scanning operations. Qualytics also identifies partition columns from Hive metastore metadata for optimized data reading. + +### Minimum Hive Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `SELECT ON DATABASE ` | Access the database and its metadata | +| `SELECT ON TABLE .*` | Read data from all tables for profiling and scanning | + +!!! note + Qualytics does not support Hive as an enrichment datastore. You can point to a different enrichment datastore instead. + +### Example: Source Datastore User (Read-Only) + +Replace `` with your actual value. + +```sql +-- Grant read access to the target database and all its tables +GRANT SELECT ON DATABASE TO USER qualytics_read; +GRANT SELECT ON ALL TABLES IN DATABASE TO USER qualytics_read; +``` + +!!! note + If using **Kerberos authentication**, ensure the Kerberos principal has been granted the same `SELECT` privileges on the target database. Configure the Kerberos principal in the connection form instead of username/password. + +!!! 
info + If your Hive environment uses **ZooKeeper** for HiveServer2 high availability (HA), enable the **ZooKeeper HA** toggle in the connection form. This allows Qualytics to discover and connect to available HiveServer2 instances automatically through the ZooKeeper quorum. + +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `User is not allowed to impersonate` | The HiveServer2 proxy user configuration does not allow the Qualytics user | Add the Qualytics user to the `hadoop.proxyuser..users` property in `core-site.xml` | +| `Permission denied: user does not have SELECT privilege` | The user lacks `SELECT` on the target database or table | Run `GRANT SELECT ON DATABASE TO USER ` | +| `GSS initiate failed` (Kerberos) | Kerberos ticket is expired, the principal is incorrect, or the KDC is unreachable | Verify the Kerberos principal, ensure the keytab is valid, and check KDC connectivity | +| `Could not open client transport` | HiveServer2 is not reachable or the port (default 10000) is incorrect | Verify the host, port, and that HiveServer2 is running | +| `Database does not exist` | The database name (schema) in the connection form is incorrect | Verify the database name with `SHOW DATABASES` in HiveQL | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `GSS initiate failed` (Kerberos) or `Could not open client transport` indicates an authentication or transport problem. + +Common causes: + +- **Kerberos ticket expired** — the Kerberos ticket has expired and needs to be renewed with `kinit`. +- **Wrong principal** — the Kerberos principal in the connection form does not match the one configured in the Hive server. 
+- **KDC unreachable** — the Key Distribution Center (KDC) is not reachable from the Qualytics server. +- **HiveServer2 not running** — the HiveServer2 process is not started or has crashed. + +!!! note + If using basic authentication (username/password), ensure HiveServer2 is configured to accept password-based authentication (`hive.server2.authentication=CUSTOM` or `LDAP`). + +#### Permission Errors + +The error `Permission denied: user does not have SELECT privilege` means the user authenticated successfully but lacks the necessary Hive grants. + +Common causes: + +- **Missing `SELECT` on database** — the user does not have `SELECT` on the target database. +- **Missing `SELECT` on table** — the user has database-level access but not table-level access. +- **Ranger/Sentry policy** — if Apache Ranger or Sentry is enabled, permissions are managed through policies rather than Hive `GRANT` statements. + +#### Connection Errors + +The error `Could not open client transport` or `User is not allowed to impersonate` indicates a transport or proxy user issue. + +Common causes: + +- **HiveServer2 not reachable** — the host or port (default 10000) is incorrect. +- **Proxy user not allowed** — the `hadoop.proxyuser..users` property in `core-site.xml` does not include the Qualytics user. +- **SSL required** — HiveServer2 requires SSL but the connection is not configured for it. + +!!! tip + Start by confirming credentials and Kerberos configuration are valid (authentication errors), then verify Hive grants or Ranger policies (permission errors), and finally check HiveServer2 connectivity (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Hive is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. 
Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/maria-db.md b/docs/source-datastore/add-datastores/maria-db.md index 87ebc9fed5..c03a1d00ed 100644 --- a/docs/source-datastore/add-datastores/maria-db.md +++ b/docs/source-datastore/add-datastores/maria-db.md @@ -8,6 +8,116 @@ By following these instructions, enterprises can ensure their MariaDB environmen Let’s get started 🚀 +## MariaDB Setup Guide + +Qualytics connects to MariaDB through the **MariaDB JDBC driver**. It uses standard JDBC metadata APIs to discover databases, tables, columns, and primary keys. MariaDB uses the same permission model as MySQL — the database name you provide in the connection form is the scope for all operations. + +### Minimum MariaDB Permissions (Source Datastore) + +| Permission | Purpose | +|---------------|-----------------------------------------------------------------------------| +| `SELECT` | Read data from all tables for profiling and scanning | +| `SHOW VIEW` | Read view definitions for metadata discovery | +| `PROCESS` | View active queries (used by the JDBC driver for connection metadata) | + +### Additional Permissions for Enrichment Datastore + +When using MariaDB as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|---------------|-----------------------------------------------------------------------------| +| `CREATE` | Create enrichment tables (`_qualytics_*`) | +| `ALTER` | Modify enrichment table schemas during version migrations | +| `INSERT` | Write anomaly records, scan results, and check metrics | +| `UPDATE` | Update enrichment records during rescans | +| `DELETE` | Remove stale enrichment records | +| `DROP` | Remove enrichment tables if the datastore is unlinked | + +### Example: Source Datastore 
User (Read-Only)
+
+Replace `` and `` with your actual values.
+
+```sql
+-- Create a dedicated read-only user
+CREATE USER 'qualytics_read'@'%' IDENTIFIED BY '';
+
+-- Grant read access to all tables and views
+GRANT SELECT, SHOW VIEW ON .* TO 'qualytics_read'@'%';
+
+-- Grant the global PROCESS privilege (required by the JDBC driver for connection metadata)
+GRANT PROCESS ON *.* TO 'qualytics_read'@'%';
+
+-- Reload the grant tables (optional after GRANT, which takes effect immediately)
+FLUSH PRIVILEGES;
+```
+
+### Example: Enrichment Datastore User (Read-Write)
+
+```sql
+-- Create a dedicated read-write user
+CREATE USER 'qualytics_readwrite'@'%' IDENTIFIED BY '';
+
+-- Grant full data manipulation and table management
+GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, SHOW VIEW ON .* TO 'qualytics_readwrite'@'%';
+
+-- Grant the global PROCESS privilege (required by the JDBC driver for connection metadata)
+GRANT PROCESS ON *.* TO 'qualytics_readwrite'@'%';
+
+-- Reload the grant tables (optional after GRANT, which takes effect immediately)
+FLUSH PRIVILEGES;
+```
+
+!!! note
+    Qualytics automatically filters out system databases (`information_schema`, `mysql`, `performance_schema`, `sys`) during catalog discovery. You do not need to restrict access to these databases manually.
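
After running the grants above, the output of `SHOW GRANTS FOR 'qualytics_read'@'%'` can be checked against the source-datastore privileges listed in this guide. The sketch below parses that output with a regular expression. It is an illustration under stated assumptions: the function names are hypothetical, the database name passed in is your own, and `ALL PRIVILEGES` and column-level grants are not handled.

```python
import re

# Matches lines such as: GRANT SELECT, SHOW VIEW ON `sales`.* TO `user`@`%`
GRANT_LINE = re.compile(
    r"^GRANT\s+(?P<privs>.+?)\s+ON\s+(?P<scope>\S+)\s+TO\s", re.IGNORECASE
)

# Source-datastore privileges from the table in this guide.
SOURCE_PRIVILEGES = {"SELECT", "SHOW VIEW", "PROCESS"}


def granted_privileges(show_grants_lines, database):
    """Collect privileges that apply to `database` (database-level or global grants)."""
    scopes = {"*.*", f"`{database}`.*", f"{database}.*"}
    found = set()
    for line in show_grants_lines:
        match = GRANT_LINE.match(line.strip())
        if match and match.group("scope") in scopes:
            found.update(p.strip().upper() for p in match.group("privs").split(","))
    return found


def missing_source_privileges(show_grants_lines, database):
    """Return the source-datastore privileges not covered by the SHOW GRANTS output."""
    return sorted(SOURCE_PRIVILEGES - granted_privileges(show_grants_lines, database))
```

Global grants (`*.*`) count toward every database, which is why `PROCESS` is found even though it is never granted at the database level.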
+
+### Troubleshooting Common Errors
+
+| Error | Likely Cause | Fix |
+|----------------------------------|------------------------------------------------|------------------------------------------------|
+| `Access denied for user` | Incorrect username, password, or the user does not have access from the connecting host | Verify credentials and ensure the user is created with `'%'` or the specific Qualytics host IP |
+| `Host is not allowed to connect` | The MariaDB server rejects connections from the Qualytics host IP | Create the user with `'qualytics_read'@''` or use `'%'` for any host |
+| `SELECT command denied to user` | The user lacks `SELECT` on the target database | Run `GRANT SELECT ON .* TO ''@'%'` |
+| `CREATE command denied to user` | The enrichment user lacks `CREATE` on the database | Run `GRANT CREATE ON .* TO ''@'%'` |
+| `SSL connection is required` | The MariaDB server enforces SSL but the connection is not configured for it | Enable SSL in the connection parameters or configure the MariaDB user to not require SSL |
+
+### Detailed Troubleshooting Notes
+
+#### Authentication Errors
+
+The error `Access denied for user` indicates that the credentials are incorrect or the user does not have access from the connecting host.
+
+Common causes:
+
+- **Incorrect password**: the password does not match the one set for the user.
+- **Host restriction**: the user was created with a specific host (e.g., `'user'@'localhost'`) but Qualytics connects from a different IP.
+- **User does not exist**: the username was misspelled or was never created.
+
+!!! note
+    MariaDB differentiates users by both username and host. `'qualytics'@'localhost'` and `'qualytics'@'%'` are treated as two separate users with potentially different passwords and permissions.
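
The host part of a MariaDB account acts as a pattern, which is why `'qualytics'@'localhost'` does not cover a connection from another machine. The sketch below approximates that matching with `%` (any run of characters) and `_` (exactly one character). It is a simplification for illustration: the function name `host_matches` is hypothetical, and netmask notation, name resolution, and the server's preference for the most specific matching account are all ignored.

```python
import re


def host_matches(pattern: str, client_host: str) -> bool:
    """Approximate MariaDB host matching: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, client_host, flags=re.IGNORECASE) is not None
```

For example, the pattern `10.0.0.%` accepts `10.0.0.5` but not `10.0.1.5`, and `localhost` accepts neither.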
+
+#### Permission Errors
+
+The error `SELECT command denied to user` means the user authenticated successfully but lacks the necessary grants on the target database.
+
+Common causes:
+
+- **Missing `SELECT` grant**: the user does not have `SELECT` on the target database.
+- **Wrong database**: the user has permissions on a different database than the one specified in the connection form.
+- **Privileges not reloaded**: `GRANT` statements take effect immediately, but changes made by editing the grant tables directly (for example with `INSERT` or `UPDATE`) are not applied until `FLUSH PRIVILEGES` is run.
+
+#### Connection Errors
+
+The error `Host is not allowed to connect` means the MariaDB server rejects the connection from the Qualytics host IP.
+
+Common causes:
+
+- **User host restriction**: the user was created with `'user'@'localhost'` instead of `'user'@'%'`.
+- **Firewall or network**: a firewall is blocking connections on the MariaDB port (default 3306).
+- **Bind address**: MariaDB is configured to listen only on `127.0.0.1` (`bind-address` in `my.cnf`).
+
+!!! tip
+    Start by confirming credentials are valid (authentication errors), then verify database permissions (permission errors), and finally check network connectivity (connection errors).
+
 ## Add a Source Datastore

 A source datastore is a storage location used to connect to and access data from external sources. MariaDB is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.
diff --git a/docs/source-datastore/add-datastores/microsoft-sql-server.md b/docs/source-datastore/add-datastores/microsoft-sql-server.md index 037fd70a08..6ed0947711 100644 --- a/docs/source-datastore/add-datastores/microsoft-sql-server.md +++ b/docs/source-datastore/add-datastores/microsoft-sql-server.md @@ -8,6 +8,128 @@ By following these instructions, enterprises can ensure their Microsoft SQL Serv Let’s get started 🚀 +## Microsoft SQL Server Setup Guide + +Qualytics connects to Microsoft SQL Server through the **Microsoft JDBC driver for SQL Server**. It queries system views (`sys.schemas`, `sys.database_principals`) to discover schemas and uses standard JDBC metadata APIs for tables, columns, and primary keys. + +### Minimum SQL Server Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------|-------------------------------------------------------------------------| +| `CONNECT` | Allow the user to connect to the database | +| `SELECT ON SCHEMA::` | Read data from all tables and views for profiling and scanning | +| `VIEW DEFINITION ON SCHEMA::` | Read object definitions for metadata discovery | +| `SELECT ON sys.schemas` | Discover available schemas in the database | +| `SELECT ON sys.database_principals` | Resolve schema ownership for catalog discovery | + +### Additional Permissions for Enrichment Datastore + +When using SQL Server as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|----------------------------------------------|--------------------------------------------------------------------| +| `CREATE TABLE` | Create enrichment tables (`_qualytics_*`) | +| `INSERT ON SCHEMA::` | Write anomaly records, scan results, and check metrics | +| `UPDATE ON SCHEMA::` | Update enrichment records during rescans | +| `DELETE ON SCHEMA::` | Remove stale enrichment records | +| `ALTER ON SCHEMA::` | 
Modify enrichment table schemas during version migrations | +| `DROP TABLE` | Remove enrichment tables during cleanup or when the datastore is unlinked | + +### Example: Source Datastore User (Read-Only) + +Replace ``, ``, and `` with your actual values. + +```sql +-- Create a login at the server level +CREATE LOGIN qualytics_read WITH PASSWORD = ''; + +-- Switch to the target database +USE ; + +-- Create a user mapped to the login +CREATE USER qualytics_read FOR LOGIN qualytics_read; + +-- Grant connection and read-only access +GRANT CONNECT TO qualytics_read; +GRANT SELECT ON SCHEMA:: TO qualytics_read; +GRANT VIEW DEFINITION ON SCHEMA:: TO qualytics_read; +``` + +### Example: Enrichment Datastore User (Read-Write) + +```sql +-- Create a login at the server level +CREATE LOGIN qualytics_readwrite WITH PASSWORD = ''; + +-- Switch to the target database +USE ; + +-- Create a user mapped to the login +CREATE USER qualytics_readwrite FOR LOGIN qualytics_readwrite; + +-- Grant connection, read-write, and table management access +GRANT CONNECT TO qualytics_readwrite; +GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA:: TO qualytics_readwrite; +GRANT CREATE TABLE TO qualytics_readwrite; +GRANT ALTER ON SCHEMA:: TO qualytics_readwrite; +GRANT VIEW DEFINITION ON SCHEMA:: TO qualytics_readwrite; +``` + +!!! note + If using **Service Principal** authentication, ensure the Service Principal has been added as an external user in the database with the same permissions listed above. Use the Client ID, Client Secret, and Tenant ID from your Microsoft Entra ID app registration. + +!!! note + Qualytics automatically filters out system schemas (`INFORMATION_SCHEMA`, `sys`, and schemas starting with `db_`) during catalog discovery. You do not need to restrict access to these schemas manually. 
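To confirm that the grants took effect, one option is to connect as the new user and list its effective permissions with the built-in `fn_my_permissions` function. This is a minimal sketch: it assumes the `qualytics_read` user from the example above, and `dbo` is only a placeholder for your target schema.

```sql
-- Run while connected to the target database as qualytics_read
-- (user name assumed from the example above).

-- Effective database-level permissions; CONNECT should be listed.
SELECT permission_name
FROM fn_my_permissions(NULL, 'DATABASE');

-- Effective permissions on the target schema; SELECT and VIEW DEFINITION
-- should be listed. 'dbo' is a placeholder schema name.
SELECT permission_name
FROM fn_my_permissions('dbo', 'SCHEMA');
```

If a permission is missing from the output, re-run the corresponding `GRANT` statement as an administrator and check again.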
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Login failed for user` | Incorrect username or password, or the login does not exist | Verify the login exists at the server level with `SELECT name FROM sys.sql_logins` | +| `Cannot open database requested by the login` | The user does not have access to the specified database | Ensure a user is mapped to the login in the target database with `CREATE USER ... FOR LOGIN` | +| `The SELECT permission was denied on object` | The user lacks `SELECT` on one or more tables in the schema | Run `GRANT SELECT ON SCHEMA:: TO ` | +| `CREATE TABLE permission denied in database` | The enrichment user lacks `CREATE TABLE` permission | Run `GRANT CREATE TABLE TO ` | +| `Cannot find the object because it does not exist or you do not have permissions` | The user lacks `VIEW DEFINITION` on the schema | Run `GRANT VIEW DEFINITION ON SCHEMA:: TO ` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Login failed for user` indicates that the credentials are incorrect or the login does not exist at the server level. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the login. +- **Login does not exist** — the login was never created at the server level with `CREATE LOGIN`. +- **User not mapped** — the login exists but no user is mapped to it in the target database. +- **Service Principal misconfiguration** — when using Entra ID authentication, the Client ID, Client Secret, or Tenant ID is incorrect. + +!!! note + SQL Server distinguishes between **logins** (server-level) and **users** (database-level). A login must exist at the server level, and a corresponding user must be created in each target database. 
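The login/user distinction can be checked directly from the catalog views. The queries below are an illustrative sketch, assuming the login is named `qualytics_read` as in the earlier example; they need an account that is allowed to see the login's metadata.

```sql
-- Server level: does the login exist, and is it enabled?
SELECT name, is_disabled
FROM sys.sql_logins
WHERE name = 'qualytics_read';  -- assumed login name

-- Database level: run inside the target database.
-- Returns a row only if a database user is mapped to that login.
SELECT dp.name AS database_user,
       sp.name AS server_login
FROM sys.database_principals AS dp
JOIN sys.server_principals AS sp
  ON dp.sid = sp.sid
WHERE sp.name = 'qualytics_read';
```

An empty result from the first query points to a missing login; an empty result from the second points to a missing `CREATE USER ... FOR LOGIN` in that database.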
+ +#### Permission Errors + +The error `The SELECT permission was denied on object` means the user authenticated successfully but lacks the necessary grants on the target schema. + +Common causes: + +- **Missing `SELECT ON SCHEMA`** — the user does not have `SELECT` on the target schema. +- **Wrong schema** — the user has permissions on `dbo` but the target tables are in a different schema. +- **Missing `VIEW DEFINITION`** — the user cannot see object metadata needed for catalog discovery. + +#### Connection Errors + +The error `Cannot open database requested by the login` means the user does not have access to the specified database. + +Common causes: + +- **No user in database** — the login exists but `CREATE USER ... FOR LOGIN` was not run in the target database. +- **Database does not exist** — the database name in the connection form is incorrect. +- **Database is offline** — the target database is in a recovery or offline state. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check database access (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Microsoft SQL Server is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
diff --git a/docs/source-datastore/add-datastores/mysql.md b/docs/source-datastore/add-datastores/mysql.md index 2016109163..027313b728 100644 --- a/docs/source-datastore/add-datastores/mysql.md +++ b/docs/source-datastore/add-datastores/mysql.md @@ -8,6 +8,116 @@ By following these instructions, enterprises can ensure their MySQL environment Let’s get started 🚀 +## MySQL Setup Guide + +Qualytics connects to MySQL through the **MySQL JDBC driver**. It uses standard JDBC metadata APIs to discover databases, tables, columns, and primary keys. MySQL uses the concept of "database" instead of "schema" — the database name you provide in the connection form is the scope for all operations. + +### Minimum MySQL Permissions (Source Datastore) + +| Permission | Purpose | +|---------------|-----------------------------------------------------------------------------| +| `SELECT` | Read data from all tables for profiling and scanning | +| `SHOW VIEW` | Read view definitions for metadata discovery | +| `PROCESS` | View active queries (used by the JDBC driver for connection metadata) | + +### Additional Permissions for Enrichment Datastore + +When using MySQL as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|---------------|-----------------------------------------------------------------------------| +| `CREATE` | Create enrichment tables (`_qualytics_*`) | +| `ALTER` | Modify enrichment table schemas during version migrations | +| `INSERT` | Write anomaly records, scan results, and check metrics | +| `UPDATE` | Update enrichment records during rescans | +| `DELETE` | Remove stale enrichment records | +| `DROP` | Remove enrichment tables if the datastore is unlinked | + +### Example: Source Datastore User (Read-Only) + +Replace `` and `` with your actual values. 
+ +```sql +-- Create a dedicated read-only user +CREATE USER 'qualytics_read'@'%' IDENTIFIED BY ''; + +-- Grant read access to all tables and views +GRANT SELECT, SHOW VIEW ON .* TO 'qualytics_read'@'%'; + +-- Grant the global PROCESS privilege (required by the JDBC driver for connection metadata) +GRANT PROCESS ON *.* TO 'qualytics_read'@'%'; + +-- Apply the changes +FLUSH PRIVILEGES; +``` + +### Example: Enrichment Datastore User (Read-Write) + +```sql +-- Create a dedicated read-write user +CREATE USER 'qualytics_readwrite'@'%' IDENTIFIED BY ''; + +-- Grant full data manipulation and table management +GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, SHOW VIEW ON .* TO 'qualytics_readwrite'@'%'; + +-- Grant the global PROCESS privilege (required by the JDBC driver for connection metadata) +GRANT PROCESS ON *.* TO 'qualytics_readwrite'@'%'; + +-- Apply the changes +FLUSH PRIVILEGES; +``` + +!!! note + Qualytics automatically filters out system databases (`information_schema`, `mysql`, `performance_schema`, `sys`) during catalog discovery. You do not need to restrict access to these databases manually. 
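After creating the account, its privileges can be inspected with `SHOW GRANTS`. A short sketch, assuming the `'qualytics_read'@'%'` account from the example above:

```sql
-- As an administrator: list the privileges held by the read-only account.
SHOW GRANTS FOR 'qualytics_read'@'%';

-- Or, while connected as the Qualytics account itself:
SHOW GRANTS FOR CURRENT_USER();
```

The output should include `PROCESS` on `*.*` and `SELECT, SHOW VIEW` on the target database.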
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Access denied for user` | Incorrect username or password, or the user does not have access from the connecting host | Verify credentials and ensure the user is created with `'%'` or the specific Qualytics host IP | +| `Host is not allowed to connect` | The MySQL server rejects connections from the Qualytics host IP | Create the user with `'qualytics_read'@''` or use `'%'` for any host | +| `SELECT command denied to user` | The user lacks `SELECT` on the target database | Run `GRANT SELECT ON .* TO ''@'%'` | +| `CREATE command denied to user` | The enrichment user lacks `CREATE` on the database | Run `GRANT CREATE ON .* TO ''@'%'` | +| `SSL connection is required` | The MySQL server enforces SSL but the connection is not configured for it | Enable SSL in the connection parameters or configure the MySQL user to not require SSL | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Access denied for user` indicates that the credentials are incorrect or the user does not have access from the connecting host. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the user. +- **Host restriction** — the user was created with a specific host (e.g., `'user'@'localhost'`) but Qualytics connects from a different IP. +- **User does not exist** — the username was misspelled or was never created. + +!!! note + MySQL differentiates users by both username and host. `'qualytics'@'localhost'` and `'qualytics'@'%'` are treated as two separate users with potentially different passwords and permissions. 
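To see which username/host combinations actually exist, the account table can be queried directly. This is a sketch that assumes the user is named `qualytics_read`; the first query needs an account with read access to the `mysql` system database.

```sql
-- List every host entry defined for the user name (assumed: qualytics_read).
SELECT user, host
FROM mysql.user
WHERE user = 'qualytics_read';

-- While connected as the Qualytics user: compare the identity the client
-- presented (USER) with the account the server matched it to (CURRENT_USER).
SELECT USER() AS client_identity,
       CURRENT_USER() AS matched_account;
```

If `matched_account` shows a different host pattern than expected, the grants may have been applied to another account entry.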
+ +#### Permission Errors + +The error `SELECT command denied to user` means the user authenticated successfully but lacks the necessary grants on the target database. + +Common causes: + +- **Missing `SELECT` grant** — the user does not have `SELECT` on the target database. +- **Wrong database** — the user has permissions on a different database than the one specified in the connection form. +- **Privileges not reloaded** — the grant tables were edited directly and `FLUSH PRIVILEGES` was not run afterwards. Privileges assigned with `GRANT` take effect immediately and do not need a flush. + +#### Connection Errors + +The error `Host is not allowed to connect` means the MySQL server rejects the connection from the Qualytics host IP. + +Common causes: + +- **User host restriction** — the user was created with `'user'@'localhost'` instead of `'user'@'%'`. +- **Firewall or network** — a firewall is blocking connections on port 3306. +- **Bind address** — MySQL is configured to listen only on `127.0.0.1` (`bind-address` in `my.cnf`). + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify database permissions (permission errors), and finally check network connectivity (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. MySQL is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
diff --git a/docs/source-datastore/add-datastores/oracle.md b/docs/source-datastore/add-datastores/oracle.md index 712b453d63..7306bb9960 100644 --- a/docs/source-datastore/add-datastores/oracle.md +++ b/docs/source-datastore/add-datastores/oracle.md @@ -8,6 +8,114 @@ By following these instructions, enterprises can ensure their Oracle environment Let’s get started 🚀 +## Oracle Setup Guide + +Qualytics connects to Oracle through the **Oracle JDBC Thin driver**. It uses JDBC metadata APIs to discover schemas, tables, columns, and primary keys. Qualytics automatically filters out Oracle system schemas (`SYS`, `SYSTEM`, `DBSNMP`, `OUTLN`, `APPQOSSYS`, `CTXSYS`, `MDSYS`, `OLAPSYS`, `ORDDATA`, `ORDSYS`, `WMSYS`, `XDB`, `LBACSYS`, `DVSYS`, `AUDSYS`, and others) during catalog discovery. + +### Minimum Oracle Permissions (Source Datastore) + +| Permission | Purpose | +|-------------------------------------|-----------------------------------------------------------------------------| +| `CREATE SESSION` | Allow the user to connect to the database instance | +| `SELECT ON .` | Read data from tables for profiling and scanning | +| `SELECT_CATALOG_ROLE` (optional) | Read data dictionary views for comprehensive metadata discovery | + +!!! note + Qualytics does not support Oracle as an enrichment datastore. You can point to a different enrichment datastore instead. + +!!! info + Oracle connections support both **TCP** and **TCPS** (SSL/TLS) protocols. If your Oracle server requires encrypted connections, select **TCPS** as the protocol in the connection form and ensure the Oracle server's SSL certificate is trusted. + +### Example: Source Datastore User (Read-Only) — Schema-Level Access + +Replace `` and `` with your actual values. 
+ +```sql +-- Create a dedicated read-only user +CREATE USER qualytics_read IDENTIFIED BY "" + DEFAULT TABLESPACE users + TEMPORARY TABLESPACE temp; + +-- Grant connection privileges +GRANT CREATE SESSION TO qualytics_read; + +-- Option A: Grant read access to ALL tables (broad access) +GRANT SELECT ANY TABLE TO qualytics_read; + +-- Option B: Grant read access to a specific schema (restrictive access) +-- Run this for each table you want Qualytics to access: +-- GRANT SELECT ON . TO qualytics_read; +``` + +### Example: Source Datastore User (Read-Only) — Using a Role + +For organizations that prefer role-based access control: + +```sql +-- Create a custom read-only role +CREATE ROLE qualytics_read_role; + +-- Grant SELECT on all tables in the target schema +BEGIN + FOR t IN (SELECT table_name FROM all_tables WHERE owner = UPPER('')) + LOOP + EXECUTE IMMEDIATE 'GRANT SELECT ON .' || t.table_name || ' TO qualytics_read_role'; + END LOOP; +END; +/ + +-- Assign the role to the Qualytics user +GRANT qualytics_read_role TO qualytics_read; +``` + +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `ORA-01017: invalid username/password` | Incorrect username or password | Verify the credentials and ensure the user exists with `SELECT username FROM dba_users` | +| `ORA-01045: user lacks CREATE SESSION privilege` | The user cannot establish a session | Run `GRANT CREATE SESSION TO ` | +| `ORA-00942: table or view does not exist` | The user lacks `SELECT` on the target table, or the table does not exist | Grant `SELECT` on the specific table or use `SELECT ANY TABLE` | +| `ORA-12505: TNS:listener does not currently know of SID` | The SID provided in the connection form does not match the database instance | Verify the SID or switch to 
using a Service Name instead | +| `ORA-12514: TNS:listener does not currently know of service` | The Service Name is incorrect or the service is not registered with the listener | Verify the service name with `lsnrctl status` on the Oracle server | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `ORA-01017: invalid username/password` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match. Oracle passwords are case-sensitive by default. +- **Account locked** — the account has been locked due to too many failed login attempts. Unlock with `ALTER USER ACCOUNT UNLOCK`. +- **Password expired** — the password has expired per the user's profile settings. + +!!! note + Oracle passwords are case-sensitive by default (since Oracle 11g). Ensure the password is entered with the correct case in the connection form. + +#### Permission Errors + +The error `ORA-00942: table or view does not exist` can mean either the object truly does not exist or the user lacks `SELECT` access to it. + +Common causes: + +- **Missing `SELECT` grant** — the user does not have `SELECT` on the target table. Oracle does not distinguish between "table not found" and "no permission" for security reasons. +- **Schema not specified** — the table exists in a different schema and the user is querying without the schema prefix. +- **Synonym not created** — the user expects to access the table without the schema prefix, but no synonym exists. + +#### Connection Errors + +The error `ORA-12505: TNS:listener does not currently know of SID` or `ORA-12514: TNS:listener does not currently know of service` means the connection identifier is incorrect. + +Common causes: + +- **Wrong SID or Service Name** — the value does not match the database instance configuration. +- **Listener not running** — the Oracle listener process is not started on the server. 
+- **Wrong host or port** — the host or port (default 1521) does not match the Oracle server configuration. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify table permissions (permission errors), and finally check the connection identifier — SID or Service Name (connection errors). + ## Add the Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Oracle, for example, is a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the Oracle datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/postgresql.md b/docs/source-datastore/add-datastores/postgresql.md index 421cda66bf..f400b69a6a 100644 --- a/docs/source-datastore/add-datastores/postgresql.md +++ b/docs/source-datastore/add-datastores/postgresql.md @@ -8,6 +8,119 @@ By following these instructions, enterprises can ensure their PostgreSQL environ Let’s get started 🚀 +## PostgreSQL Setup Guide + +Qualytics connects to PostgreSQL through the **PostgreSQL JDBC driver**. It uses standard JDBC metadata APIs and queries `pg_catalog` system tables to discover schemas, tables, columns, primary keys, and incremental fields. 
+ +### Minimum PostgreSQL Permissions (Source Datastore) + +| Permission | Purpose | +|-------------------------------------|-------------------------------------------------------------------------| +| `CONNECT ON DATABASE` | Allow the role to connect to the target database | +| `USAGE ON SCHEMA` | Access objects within the schema | +| `SELECT ON ALL TABLES IN SCHEMA` | Read data from all existing tables for profiling and scanning | +| `SELECT ON ALL SEQUENCES IN SCHEMA` | Read sequence metadata for incremental field detection | + +### Additional Permissions for Enrichment Datastore + +When using PostgreSQL as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|---------------------------------------------------|-----------------------------------------------------------------| +| `CREATE ON SCHEMA` | Create enrichment tables (`_qualytics_*`) | +| `INSERT ON ALL TABLES IN SCHEMA` | Write anomaly records, scan results, and check metrics | +| `UPDATE ON ALL TABLES IN SCHEMA` | Update enrichment records during rescans | +| `DELETE ON ALL TABLES IN SCHEMA` | Remove stale enrichment records | +| `ALTER TABLE` | Modify enrichment table schemas during version migrations | +| `DROP` (on enrichment tables) | Remove enrichment tables if the datastore is unlinked or during cleanup | + +### Example: Source Datastore Role (Read-Only) + +Replace ``, ``, and `` with your actual values. 
+ +```sql +-- Create a dedicated read-only role +CREATE ROLE qualytics_read_role LOGIN PASSWORD ''; + +-- Grant connection and schema access +GRANT CONNECT ON DATABASE TO qualytics_read_role; +GRANT USAGE ON SCHEMA TO qualytics_read_role; + +-- Grant read access to all existing and future tables +GRANT SELECT ON ALL TABLES IN SCHEMA TO qualytics_read_role; +GRANT SELECT ON ALL SEQUENCES IN SCHEMA TO qualytics_read_role; +ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT ON TABLES TO qualytics_read_role; +``` + +### Example: Enrichment Datastore Role (Read-Write) + +```sql +-- Create a dedicated read-write role +CREATE ROLE qualytics_readwrite_role LOGIN PASSWORD ''; + +-- Grant connection, schema access, and table creation +GRANT CONNECT ON DATABASE TO qualytics_readwrite_role; +GRANT USAGE, CREATE ON SCHEMA TO qualytics_readwrite_role; + +-- Grant full data manipulation on all existing and future tables +GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA TO qualytics_readwrite_role; +ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO qualytics_readwrite_role; +``` + +!!! note + Qualytics automatically filters out system schemas (`pg_catalog`, `pg_toast`, `pg_internal`, `information_schema`) during catalog discovery. You do not need to restrict access to these schemas manually. + +!!! info + For optimal **incremental profiling** performance, enable `track_commit_timestamp = on` in your PostgreSQL configuration (`postgresql.conf`). This allows Qualytics to detect recently modified rows using transaction commit timestamps (`pg_xact_commit_timestamp`), reducing the amount of data scanned during profiling operations. 
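The commit-timestamp setting can be checked from any SQL session. The sketch below is only an illustration of how commit timestamps can be read, not necessarily the query Qualytics issues; `my_schema.my_table` is a placeholder.

```sql
-- Check whether commit timestamps are being tracked (should report "on").
SHOW track_commit_timestamp;

-- With tracking enabled, the commit time of the transaction that last wrote
-- each row can be read through the xmin system column.
-- my_schema.my_table is a placeholder name.
SELECT pg_xact_commit_timestamp(xmin) AS committed_at, *
FROM my_schema.my_table
ORDER BY committed_at DESC NULLS LAST
LIMIT 10;
```

Rows written before tracking was enabled have no recorded commit timestamp, so `committed_at` is `NULL` for them.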
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|----------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `FATAL: password authentication failed` | Incorrect username or password | Verify the credentials and ensure the role exists with `\du` in psql | +| `FATAL: no pg_hba.conf entry for host` | The PostgreSQL server does not allow connections from the Qualytics host IP | Add the Qualytics IP to `pg_hba.conf` and reload the configuration | +| `permission denied for schema` | The role lacks `USAGE` on the target schema | Run `GRANT USAGE ON SCHEMA TO ` | +| `permission denied for table` | The role lacks `SELECT` on one or more tables | Run `GRANT SELECT ON ALL TABLES IN SCHEMA TO ` | +| `permission denied to create table` | The enrichment role lacks `CREATE` on the schema | Run `GRANT CREATE ON SCHEMA TO ` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `FATAL: password authentication failed` indicates that the credentials provided are incorrect or the role does not exist. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the role. +- **Role does not exist** — the role name was misspelled or was never created. +- **Authentication method mismatch** — the `pg_hba.conf` file requires a different authentication method (e.g., `md5` vs `scram-sha-256`). + +!!! note + PostgreSQL logs detailed authentication errors in the server log. Check `pg_log` or `log_directory` for the exact reason. + +#### Permission Errors + +The error `permission denied for schema` or `permission denied for table` means the role authenticated successfully but lacks the necessary grants. + +Common causes: + +- **Missing `USAGE` on schema** — the role cannot access the schema even if table-level grants exist. 
+- **Missing `SELECT` on tables** — the role has schema access but cannot read specific tables. +- **Default privileges not set** — new tables created after the initial grant are not automatically accessible. Use `ALTER DEFAULT PRIVILEGES` to fix this. + +#### Connection Errors + +The error `FATAL: no pg_hba.conf entry for host` means the PostgreSQL server does not recognize the Qualytics host IP. + +Common causes: + +- **IP not whitelisted** — the Qualytics server IP is not listed in `pg_hba.conf`. +- **Wrong database name** — the `pg_hba.conf` entry restricts access to specific databases. +- **SSL required** — the server requires SSL connections but the client is connecting without SSL. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check network connectivity (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. PostgreSQL is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/presto.md b/docs/source-datastore/add-datastores/presto.md index 606921d099..4b659c0389 100644 --- a/docs/source-datastore/add-datastores/presto.md +++ b/docs/source-datastore/add-datastores/presto.md @@ -8,6 +8,96 @@ By following these instructions, enterprises can ensure their Presto environment Let’s get started 🚀 +## Presto Setup Guide + +Qualytics connects to Presto through the **Presto JDBC driver**. It uses standard SQL queries for data profiling and scanning. 
Since Presto is a distributed query engine, permissions are determined by the underlying data source (connector) configured in the Presto catalog (e.g., Hive, MySQL, PostgreSQL). + +### Minimum Presto Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `SELECT` on target tables | Read data from tables for profiling and scanning | +| Access to the Presto catalog | Browse available schemas and tables | +| Access to the Presto schema | Browse available tables and columns | + +The actual permissions depend on the Presto security model configured for your deployment: + +| Security Model | How Permissions Are Managed | +|-----------------------------|------------------------------------------------------------------------------| +| **No security (default)** | All users have full read access to all catalogs and schemas | +| **File-based access control** | Permissions are defined in `rules.json` — ensure the Qualytics user has `SELECT` on the target catalog and schema | +| **Connector-level security** | Permissions are delegated to the underlying data source (e.g., Hive Metastore, RDBMS grants) — ensure the Qualytics user has read access at the source level | + +!!! note + Qualytics does not support Presto as an enrichment datastore. You can point to a different enrichment datastore instead. + +### Example: File-Based Access Control Configuration + +If your Presto deployment uses file-based access control (`rules.json`), ensure the Qualytics user has `SELECT` access to the target catalog and schema: + +```json +{ + "catalogs": [ + { + "user": "qualytics_read", + "catalog": "", + "allow": "read-only" + } + ] +} +``` + +!!! tip + If your Presto deployment uses connector-level security (e.g., Hive Metastore), grant the equivalent `SELECT` permissions directly in the underlying data source instead of using `rules.json`. 
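Whichever security model is in place, access can be verified by running a few read-only statements as the Qualytics user. A minimal sketch, where `my_catalog`, `my_schema`, and `my_table` are placeholder names:

```sql
-- Catalogs visible to the connected user.
SHOW CATALOGS;

-- Schemas in the target catalog, then tables in the target schema.
SHOW SCHEMAS FROM my_catalog;
SHOW TABLES FROM my_catalog.my_schema;

-- Confirm read access on one table.
SELECT * FROM my_catalog.my_schema.my_table LIMIT 1;
```

If the catalog or schema is missing from the listings, review the access control rules or the connector configuration before testing the connection in Qualytics.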
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Access Denied: Cannot select from table` | The user lacks `SELECT` on the target table in the Presto access control rules or the underlying connector | Add `SELECT` permission for the user in `rules.json` or grant access in the underlying data source | +| `Catalog does not exist` | The catalog name in the connection form does not match a configured Presto catalog | Verify available catalogs with `SHOW CATALOGS` in Presto | +| `Schema does not exist` | The schema name does not exist in the specified catalog | Verify available schemas with `SHOW SCHEMAS FROM ` | +| `Connection refused` | The Presto coordinator is not reachable or the port (default 8080) is incorrect | Verify the host, port, and that the Presto coordinator is running | +| `Authentication failed` | Incorrect username or password, or the Presto server requires a different authentication method | Verify credentials and check if the Presto server uses LDAP, Kerberos, or password file authentication | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Authentication failed` indicates that the credentials are incorrect or the authentication method does not match the server configuration. + +Common causes: + +- **Incorrect password** — the password does not match the one configured in the Presto server. +- **Wrong authentication method** — the Presto server uses LDAP or Kerberos, but the connection form provides basic username/password. +- **HTTPS required** — the Presto coordinator requires HTTPS connections, but the connection is using HTTP. + +!!! note + Presto authentication is configured at the coordinator level. 
Check the `password-authenticator.properties` file for the configured authentication method. + +#### Permission Errors + +The error `Access Denied: Cannot select from table` means the user authenticated successfully but lacks access to the target table. + +Common causes: + +- **File-based access control** — the `rules.json` file does not grant `SELECT` access to the Qualytics user on the target catalog or schema. +- **Connector-level security** — the underlying data source (e.g., Hive Metastore) does not grant read access to the user. +- **Catalog-level restriction** — the user has access to the schema but the catalog itself is restricted. + +#### Connection Errors + +The error `Connection refused` or `Catalog does not exist` indicates a connectivity or configuration issue. + +Common causes: + +- **Coordinator not reachable** — the Presto coordinator host or port (default 8080) is incorrect. +- **Wrong catalog name** — the catalog name in the connection form does not match a configured Presto catalog. +- **Coordinator not running** — the Presto coordinator process is not started. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify access control rules (permission errors), and finally check coordinator connectivity (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Presto is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. 
diff --git a/docs/source-datastore/add-datastores/redshift.md b/docs/source-datastore/add-datastores/redshift.md index bd760cf797..46589c1efa 100644 --- a/docs/source-datastore/add-datastores/redshift.md +++ b/docs/source-datastore/add-datastores/redshift.md @@ -8,6 +8,115 @@ By following these instructions, enterprises can ensure their Redshift environme Let’s get started 🚀 +## Redshift Setup Guide + +Qualytics connects to Amazon Redshift through the **Redshift JDBC driver** (PostgreSQL-compatible). It uses standard JDBC metadata APIs to discover schemas, tables, columns, and primary keys. Qualytics automatically filters out system schemas (`pg_catalog`, `pg_toast`, `pg_internal`, `information_schema`) during catalog discovery. + +### Minimum Redshift Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `USAGE ON SCHEMA ` | Access objects within the target schema | +| `SELECT ON ALL TABLES IN SCHEMA` | Read data from all tables for profiling and scanning | + +### Additional Permissions for Enrichment Datastore + +When using Redshift as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `CREATE ON SCHEMA ` | Create enrichment tables (`_qualytics_*`) | +| `INSERT ON ALL TABLES IN SCHEMA` | Write anomaly records, scan results, and check metrics | +| `UPDATE ON ALL TABLES IN SCHEMA` | Update enrichment records during rescans | +| `DELETE ON ALL TABLES IN SCHEMA` | Remove stale enrichment records | +| `ALTER TABLE` | Modify enrichment table schemas during version migrations | +| `DROP TABLE` | Remove enrichment tables during cleanup or when the datastore is unlinked | + +### Example: Source 
Datastore User (Read-Only) + +Replace `` and `` with your actual values. + +```sql +-- Create a dedicated read-only user +CREATE USER qualytics_read PASSWORD ''; + +-- Grant schema access and read permissions +GRANT USAGE ON SCHEMA TO qualytics_read; +GRANT SELECT ON ALL TABLES IN SCHEMA TO qualytics_read; + +-- Grant read access to future tables automatically +ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT ON TABLES TO qualytics_read; +``` + +### Example: Enrichment Datastore User (Read-Write) + +```sql +-- Create a dedicated read-write user +CREATE USER qualytics_readwrite PASSWORD ''; + +-- Grant schema access, table creation, and data manipulation +GRANT USAGE, CREATE ON SCHEMA TO qualytics_readwrite; +GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA TO qualytics_readwrite; + +-- Grant full access to future tables automatically +ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO qualytics_readwrite; +``` + +!!! note + The enrichment user also needs `ALTER TABLE` and `DROP TABLE` permissions for schema migrations and cleanup operations. The `ALTER DEFAULT PRIVILEGES` command with `SELECT, INSERT, UPDATE, DELETE` covers most operations, but `ALTER TABLE` and `DROP TABLE` are inherited through table ownership when Qualytics creates the enrichment tables. + +!!! note + Qualytics automatically filters out system schemas (`pg_catalog`, `pg_toast`, `pg_internal`, `information_schema`) during catalog discovery. You do not need to restrict access to these schemas manually. 
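Once the Redshift grants are in place, a quick check can confirm they took effect before testing the connection in Qualytics. The sketch below relies on Redshift's `HAS_SCHEMA_PRIVILEGE` and `HAS_TABLE_PRIVILEGE` functions. The schema `analytics` and the table `analytics.orders` are hypothetical names standing in for your own objects.

```sql
-- Hypothetical names: schema "analytics", table "analytics.orders".
-- Each query returns true when the named user holds the privilege.
SELECT HAS_SCHEMA_PRIVILEGE('qualytics_read', 'analytics', 'USAGE') AS has_schema_usage;
SELECT HAS_TABLE_PRIVILEGE('qualytics_read', 'analytics.orders', 'SELECT') AS can_select;

-- Enrichment user only: confirm it can create tables in the schema.
SELECT HAS_SCHEMA_PRIVILEGE('qualytics_readwrite', 'analytics', 'CREATE') AS can_create;
```

A `false` result points to the matching row in the permission tables for this datastore.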
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `FATAL: password authentication failed` | Incorrect username or password | Verify the credentials and ensure the user exists in the Redshift cluster | +| `permission denied for schema` | The user lacks `USAGE` on the target schema | Run `GRANT USAGE ON SCHEMA TO ` | +| `permission denied for relation` | The user lacks `SELECT` on one or more tables | Run `GRANT SELECT ON ALL TABLES IN SCHEMA TO ` | +| `permission denied to create relation` | The enrichment user lacks `CREATE` on the schema | Run `GRANT CREATE ON SCHEMA TO ` | +| `Connection refused` | The Redshift cluster is not reachable or the security group blocks the Qualytics IP | Add the Qualytics IP to the Redshift cluster security group inbound rules | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `FATAL: password authentication failed` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the user. +- **User does not exist** — the username was misspelled or was never created. +- **Master user required** — some operations may require the Redshift cluster's master user credentials. + +!!! note + Redshift uses PostgreSQL-compatible authentication. Check the Redshift cluster's parameter group for authentication settings. + +#### Permission Errors + +The error `permission denied for schema` or `permission denied for relation` means the user authenticated successfully but lacks the necessary grants. + +Common causes: + +- **Missing `USAGE` on schema** — the user cannot access the schema even if table-level grants exist. 
+- **Missing `SELECT` on tables** — the user has schema access but cannot read specific tables. +- **Default privileges not set** — new tables created by other users after the initial grant are not automatically accessible. Use `ALTER DEFAULT PRIVILEGES` to fix this. +- **Table owner mismatch** — the table was created by a different user, and default privileges were not granted. + +#### Connection Errors + +The error `Connection refused` means the Redshift cluster is not reachable from the Qualytics server. + +Common causes: + +- **Security group** — the Redshift cluster's VPC security group does not allow inbound connections from the Qualytics IP on port 5439. +- **Cluster not publicly accessible** — the cluster was created without public accessibility and Qualytics is connecting from outside the VPC. +- **Cluster paused** — the Redshift cluster is in a paused state and needs to be resumed. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check network connectivity and security group rules (connection errors). + ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Redshift is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/snowflake.md b/docs/source-datastore/add-datastores/snowflake.md index 8910bea1e1..3956adf427 100644 --- a/docs/source-datastore/add-datastores/snowflake.md +++ b/docs/source-datastore/add-datastores/snowflake.md @@ -110,6 +110,57 @@ For detailed information on the migration plan and implementation: !!! 
info "Migration Recommendation" While Basic authentication is currently supported, migrating to Key-Pair authentication ensures your Snowflake connections remain secure and future-proof as Snowflake implements their deprecation timeline. +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Incorrect username or password` | The username or password is incorrect | Verify the credentials and ensure the user exists with `SHOW USERS` in Snowflake | +| `No active warehouse selected` | The user does not have a default warehouse or the warehouse specified in the connection form does not exist | Verify the warehouse name and run `ALTER USER SET DEFAULT_WAREHOUSE = ` | +| `Insufficient privileges to operate on schema` | The role lacks `USAGE` on the target schema | Run `GRANT USAGE ON SCHEMA . TO ROLE ` | +| `Object does not exist or not authorized` | The role lacks `SELECT` on the target table or the table does not exist | Run `GRANT SELECT ON ALL TABLES IN SCHEMA . TO ROLE ` | +| `Warehouse is suspended` | The warehouse is suspended and `AUTO_RESUME` is not enabled | Resume the warehouse with `ALTER WAREHOUSE RESUME` or enable `AUTO_RESUME` | +| `Private key provided is invalid` | The private key file is malformed or the passphrase is incorrect (Key-Pair auth) | Verify the private key format (PKCS#8 PEM) and the passphrase | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Incorrect username or password` or `Private key provided is invalid` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the user. +- **Account identifier wrong** — the Snowflake host format must be `..snowflakecomputing.com`. 
An incorrect account identifier will fail to connect. +- **Key-Pair format** — the private key must be in PKCS#8 PEM format. DER or PKCS#1 formats are not supported. +- **Passphrase mismatch** — if the private key is encrypted, the passphrase provided does not match. + +!!! note + Snowflake is migrating service accounts to Key-Pair authentication. If using basic authentication with a `TYPE=SERVICE` user, consider migrating to Key-Pair before Snowflake deprecates basic auth for service users. + +#### Permission Errors + +The error `Insufficient privileges` or `Object does not exist or not authorized` means the role authenticated successfully but lacks the necessary grants. + +Common causes: + +- **Wrong role** — the user's current role does not have the required privileges. Verify the role with `SELECT CURRENT_ROLE()`. +- **Missing `USAGE` on warehouse** — the role cannot execute queries without warehouse access. +- **Missing `USAGE` on database/schema** — the role cannot browse objects in the database or schema. +- **Future grants not set** — new tables created after the initial grant are not automatically accessible. Use `GRANT SELECT ON FUTURE TABLES IN SCHEMA` to fix this. + +#### Connection Errors + +The error `No active warehouse selected` or `Warehouse is suspended` indicates a compute resource issue. + +Common causes: + +- **Warehouse does not exist** — the warehouse name in the connection form was misspelled or the warehouse was dropped. +- **Warehouse suspended** — the warehouse is suspended and `AUTO_RESUME = FALSE`. Resume it manually or enable auto-resume. +- **No default warehouse** — the user does not have a default warehouse assigned with `ALTER USER ... SET DEFAULT_WAREHOUSE`. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify role privileges (permission errors), and finally check warehouse availability (connection errors). 
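The Snowflake permission and warehouse fixes described in this section can be applied together. The following is a minimal sketch, not a required setup: the role `QUALYTICS_ROLE`, user `QUALYTICS_USER`, warehouse `ANALYTICS_WH`, database `ANALYTICS_DB`, and schema `PUBLIC` are all hypothetical names to replace with your own.

```sql
-- Hypothetical names throughout; substitute your own role, user,
-- warehouse, database, and schema.
GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE QUALYTICS_ROLE;
GRANT USAGE ON DATABASE ANALYTICS_DB TO ROLE QUALYTICS_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS_DB.PUBLIC TO ROLE QUALYTICS_ROLE;

-- Cover existing tables and tables created later.
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS_DB.PUBLIC TO ROLE QUALYTICS_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS_DB.PUBLIC TO ROLE QUALYTICS_ROLE;

-- Address the warehouse-related errors.
ALTER USER QUALYTICS_USER SET DEFAULT_WAREHOUSE = 'ANALYTICS_WH';
ALTER WAREHOUSE ANALYTICS_WH SET AUTO_RESUME = TRUE;

-- Review what the role ended up with.
SHOW GRANTS TO ROLE QUALYTICS_ROLE;
```

`SHOW GRANTS TO ROLE` is a convenient last step because it lists every privilege in one place, which makes a missing `USAGE` grant easy to spot.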
+ ## Add a Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Snowflake is an example of a source datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the JDBC datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/synapse.md b/docs/source-datastore/add-datastores/synapse.md index 65d0e2a77f..0c2e47f47f 100644 --- a/docs/source-datastore/add-datastores/synapse.md +++ b/docs/source-datastore/add-datastores/synapse.md @@ -8,6 +8,124 @@ By following these instructions, enterprises can ensure their Synapse environmen Let’s get started 🚀 +## Synapse Setup Guide + +Qualytics connects to Azure Synapse Analytics through the **Microsoft JDBC driver for SQL Server**. Synapse follows the same permission model as Microsoft SQL Server. It queries system views (`sys.schemas`, `sys.database_principals`) to discover schemas and uses standard JDBC metadata APIs for tables, columns, and primary keys. 
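To confirm that a Synapse user can read the system views used for schema discovery, a query along the following lines can be run while connected as that user. This is an illustrative sketch only and is not the exact statement Qualytics issues. The filter mirrors the system schemas that Qualytics excludes (`INFORMATION_SCHEMA`, `sys`, and schemas starting with `db_`).

```sql
-- Illustrative only: lists user schemas with their owners.
-- Not the actual query Qualytics runs internally.
SELECT s.name AS schema_name,
       p.name AS owner_name
FROM sys.schemas AS s
JOIN sys.database_principals AS p
  ON s.principal_id = p.principal_id
WHERE s.name NOT IN ('INFORMATION_SCHEMA', 'sys')
  AND s.name NOT LIKE 'db[_]%'   -- [_] matches a literal underscore
ORDER BY s.name;
```

If the target schema is missing from the result, the user most likely lacks visibility on it, which usually traces back to a missing schema-level grant.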
+ +### Minimum Synapse Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------|-------------------------------------------------------------------------| +| `CONNECT` | Allow the user to connect to the database | +| `SELECT ON SCHEMA::` | Read data from all tables and views for profiling and scanning | +| `VIEW DEFINITION ON SCHEMA::` | Read object definitions for metadata discovery | +| `SELECT ON sys.schemas` | Discover available schemas in the database | +| `SELECT ON sys.database_principals` | Resolve schema ownership for catalog discovery | + +### Additional Permissions for Enrichment Datastore + +When using Synapse as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`): + +| Permission | Purpose | +|----------------------------------------------|--------------------------------------------------------------------| +| `CREATE TABLE` | Create enrichment tables (`_qualytics_*`) | +| `INSERT ON SCHEMA::` | Write anomaly records, scan results, and check metrics | +| `UPDATE ON SCHEMA::` | Update enrichment records during rescans | +| `DELETE ON SCHEMA::` | Remove stale enrichment records | +| `ALTER ON SCHEMA::` | Modify enrichment table schemas during version migrations | +| `DROP TABLE` | Remove enrichment tables during cleanup or when the datastore is unlinked | + +### Example: Source Datastore User (Read-Only) + +Replace ``, ``, and `` with your actual values. 
+ +```sql +-- Create a login at the server level +CREATE LOGIN qualytics_read WITH PASSWORD = ''; + +-- Switch to the target database +USE ; + +-- Create a user mapped to the login +CREATE USER qualytics_read FOR LOGIN qualytics_read; + +-- Grant connection and read-only access +GRANT CONNECT TO qualytics_read; +GRANT SELECT ON SCHEMA:: TO qualytics_read; +GRANT VIEW DEFINITION ON SCHEMA:: TO qualytics_read; +``` + +### Example: Enrichment Datastore User (Read-Write) + +```sql +-- Create a login at the server level +CREATE LOGIN qualytics_readwrite WITH PASSWORD = ''; + +-- Switch to the target database +USE ; + +-- Create a user mapped to the login +CREATE USER qualytics_readwrite FOR LOGIN qualytics_readwrite; + +-- Grant connection, read-write, and table management access +GRANT CONNECT TO qualytics_readwrite; +GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA:: TO qualytics_readwrite; +GRANT CREATE TABLE TO qualytics_readwrite; +GRANT ALTER ON SCHEMA:: TO qualytics_readwrite; +GRANT VIEW DEFINITION ON SCHEMA:: TO qualytics_readwrite; +``` + +!!! note + Qualytics automatically filters out system schemas (`INFORMATION_SCHEMA`, `sys`, and schemas starting with `db_`) during catalog discovery. You do not need to restrict access to these schemas manually. + +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Login failed for user` | Incorrect username or password, or the login does not exist | Verify the login exists at the server level with `SELECT name FROM sys.sql_logins` | +| `Cannot open database requested by the login` | The user does not have access to the specified database | Ensure a user is mapped to the login in the target database with `CREATE USER ... 
FOR LOGIN` | +| `The SELECT permission was denied on object` | The user lacks `SELECT` on one or more tables in the schema | Run `GRANT SELECT ON SCHEMA:: TO ` | +| `CREATE TABLE permission denied in database` | The enrichment user lacks `CREATE TABLE` permission | Run `GRANT CREATE TABLE TO ` | +| `Cannot find the object because it does not exist or you do not have permissions` | The user lacks `VIEW DEFINITION` on the schema | Run `GRANT VIEW DEFINITION ON SCHEMA:: TO ` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Login failed for user` indicates that the credentials are incorrect or the login does not exist at the server level. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the login. +- **Login does not exist** — the login was never created at the server level with `CREATE LOGIN`. +- **User not mapped** — the login exists but no user is mapped to it in the target database. + +!!! note + Synapse uses the same authentication model as SQL Server. A login must exist at the server level, and a corresponding user must be created in each target database. + +#### Permission Errors + +The error `The SELECT permission was denied on object` means the user authenticated successfully but lacks the necessary grants on the target schema. + +Common causes: + +- **Missing `SELECT ON SCHEMA`** — the user does not have `SELECT` on the target schema. +- **Wrong schema** — the user has permissions on `dbo` but the target tables are in a different schema. +- **Missing `VIEW DEFINITION`** — the user cannot see object metadata needed for catalog discovery. + +#### Connection Errors + +The error `Cannot open database requested by the login` means the user does not have access to the specified database. + +Common causes: + +- **No user in database** — the login exists but `CREATE USER ... FOR LOGIN` was not run in the target database. 
+- **Database does not exist** — the database name in the connection form is incorrect. +- **Synapse pool paused** — the dedicated SQL pool is paused and needs to be resumed. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check database access (connection errors). + ## Add the Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Synapse is an example of such a datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the Synapse datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/teradata.md b/docs/source-datastore/add-datastores/teradata.md index e413ee543b..6314e9c0e3 100644 --- a/docs/source-datastore/add-datastores/teradata.md +++ b/docs/source-datastore/add-datastores/teradata.md @@ -8,6 +8,92 @@ By following these instructions, enterprises can ensure their Teradata environme Let’s get started 🚀 +## Teradata Setup Guide + +Qualytics connects to Teradata through the **Teradata JDBC driver**. It uses JDBC metadata APIs to discover databases, tables, columns, and primary keys. Qualytics automatically filters out Teradata system databases (`DBC`, `SYSLIB`, `SYSSPATIAL`, `SYSUDTLIB`, `SystemFe`, `TDQCD`, `TDStats`, `TDPUSER`, `SYSUIF`, `All`, `Crashdumps`, `EXTUSER`, `LockLogShredder`, `SQLJ`, `SYSADMIN`, `SYSBAR`, `SYSJDBC`) during catalog discovery. 
+ +### Minimum Teradata Permissions (Source Datastore) + +| Permission | Purpose | +|-------------------------------------|-----------------------------------------------------------------------------| +| `LOGON` | Allow the user to log on to the Teradata system | +| `SELECT ON ` | Read data from all tables for profiling and scanning | +| `SHOW ON ` | View object definitions (DDL) for metadata discovery | +| `SELECT ON DBC.DatabasesV` | Read database metadata for catalog discovery | + +!!! note + Qualytics does not support Teradata as an enrichment datastore. You can point to a different enrichment datastore instead. + +### Example: Source Datastore User (Read-Only) + +Replace `` and `` with your actual values. + +```sql +-- Create a dedicated read-only user +CREATE USER qualytics_read AS + PASSWORD = '' + PERM = 0 + SPOOL = 1000000000; + +-- Grant logon access +GRANT LOGON ON ALL TO qualytics_read; + +-- Grant read access to the target database +GRANT SELECT ON TO qualytics_read; +GRANT SHOW ON TO qualytics_read; +``` + +!!! tip + If using **LDAP authentication**, ensure the LDAP user has the same `SELECT` and `SHOW` privileges on the target database. 
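To review which Teradata rights the user actually holds, the data dictionary can be queried directly. The sketch below assumes the account running it can read `DBC.AllRightsV`, and it lists only rights granted directly to the user; rights received through roles are not shown by this view.

```sql
-- Lists access rights granted directly to qualytics_read.
-- AccessRight holds short codes; 'R' is the code for SELECT (retrieve).
SELECT DatabaseName,
       TableName,
       AccessRight
FROM DBC.AllRightsV
WHERE UserName = 'qualytics_read'
ORDER BY DatabaseName, TableName;
```

An empty result for the target database suggests the `GRANT` statements were run against a different database or user name.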
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `Authentication failed` | Incorrect username or password | Verify the credentials and ensure the user exists in the Teradata system | +| `User does not have SELECT access` | The user lacks `SELECT` on the target database or table | Run `GRANT SELECT ON TO ` | +| `User does not have SHOW access` | The user lacks `SHOW` on the target database | Run `GRANT SHOW ON TO ` | +| `Connection refused` | The Teradata server is not reachable or the port is incorrect | Verify the host and port, and ensure the Teradata server allows connections from the Qualytics IP | +| `Database does not exist` | The database name provided in the connection form is incorrect | Verify the database name with `SELECT DatabaseName FROM DBC.DatabasesV` | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `Authentication failed` indicates that the credentials are incorrect. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the user. +- **User does not exist** — the username was misspelled or does not exist in the Teradata system. +- **LDAP authentication** — if LDAP is enabled, the credentials must match the LDAP directory, not the Teradata internal user store. + +!!! note + Teradata authentication can be configured to use internal, LDAP, or Kerberos mechanisms. Ensure the authentication method in the connection form matches the server configuration. + +#### Permission Errors + +The error `User does not have SELECT access` means the user authenticated successfully but lacks the necessary grants on the target database. 
+ +Common causes: + +- **Missing `SELECT` on database** — the user does not have `SELECT` on the target database or specific tables. +- **Missing `SHOW` on database** — the user cannot view object definitions needed for metadata discovery. +- **Access to system databases** — the user is trying to access a filtered system database (e.g., `DBC`, `SYSLIB`). + +#### Connection Errors + +The error `Connection refused` means the Teradata server is not reachable from the Qualytics server. + +Common causes: + +- **Firewall** — a firewall is blocking connections on the Teradata port (default 1025). +- **Server not running** — the Teradata server is not started or is in a maintenance state. +- **Wrong host** — the hostname or IP address in the connection form is incorrect. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify database permissions (permission errors), and finally check network connectivity (connection errors). + ## Add the Source Datastore A source datastore is a storage location used to connect to and access data from external sources. Teradata is an example of such a datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the Teradata datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/timescale-db.md b/docs/source-datastore/add-datastores/timescale-db.md index 4f93babb55..a770716eea 100644 --- a/docs/source-datastore/add-datastores/timescale-db.md +++ b/docs/source-datastore/add-datastores/timescale-db.md @@ -8,6 +8,91 @@ By following these instructions, enterprises can ensure their TimescaleDB enviro Let’s get started 🚀 +## TimescaleDB Setup Guide + +Qualytics connects to TimescaleDB through the **PostgreSQL JDBC driver**. TimescaleDB is a PostgreSQL extension, so the permission model follows the same PostgreSQL conventions. 
Qualytics uses standard JDBC metadata APIs and queries `pg_catalog` system tables to discover schemas, tables (including hypertables), columns, and primary keys. Qualytics automatically filters out TimescaleDB internal schemas (`timescaledb_information`, `timescaledb_experimental`) during catalog discovery. + +### Minimum TimescaleDB Permissions (Source Datastore) + +| Permission | Purpose | +|-----------------------------------------------|-------------------------------------------------------------------------| +| `CONNECT ON DATABASE` | Allow the role to connect to the target database | +| `USAGE ON SCHEMA ` | Access objects within the target schema | +| `SELECT ON ALL TABLES IN SCHEMA` | Read data from all existing tables (including hypertables) for profiling and scanning | +| `SELECT ON ALL SEQUENCES IN SCHEMA` | Read sequence metadata for incremental field detection | + +!!! note + Qualytics does not support TimescaleDB as an enrichment datastore. You can point to a different enrichment datastore instead. + +### Example: Source Datastore Role (Read-Only) + +Replace ``, ``, and `` with your actual values. + +```sql +-- Create a dedicated read-only role +CREATE ROLE qualytics_read_role LOGIN PASSWORD ''; + +-- Grant connection and schema access +GRANT CONNECT ON DATABASE TO qualytics_read_role; +GRANT USAGE ON SCHEMA TO qualytics_read_role; + +-- Grant read access to all existing and future tables (including hypertables) +GRANT SELECT ON ALL TABLES IN SCHEMA TO qualytics_read_role; +GRANT SELECT ON ALL SEQUENCES IN SCHEMA TO qualytics_read_role; +ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT ON TABLES TO qualytics_read_role; +``` + +!!! note + Qualytics automatically filters out system schemas (`pg_catalog`, `pg_toast`, `pg_internal`, `information_schema`, `timescaledb_information`, `timescaledb_experimental`) during catalog discovery. You do not need to restrict access to these schemas manually. 
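Because TimescaleDB follows PostgreSQL conventions, the standard PostgreSQL privilege inquiry functions can confirm that the role's grants took effect. In this sketch the database `metrics_db`, the schema `public`, and the hypertable `public.conditions` are hypothetical names to replace with your own.

```sql
-- Hypothetical names: database "metrics_db", schema "public",
-- hypertable "public.conditions". Each query returns a boolean.
SELECT has_database_privilege('qualytics_read_role', 'metrics_db', 'CONNECT') AS can_connect;
SELECT has_schema_privilege('qualytics_read_role', 'public', 'USAGE') AS has_schema_usage;
SELECT has_table_privilege('qualytics_read_role', 'public.conditions', 'SELECT') AS can_select;
```

Checking the three levels in this order (database, schema, table) matches the order in which PostgreSQL evaluates access, so the first `false` result identifies the grant to fix.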
+ +### Troubleshooting Common Errors + +| Error | Likely Cause | Fix | +|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| `FATAL: password authentication failed` | Incorrect username or password | Verify the credentials and ensure the role exists with `\du` in psql | +| `FATAL: no pg_hba.conf entry for host` | The TimescaleDB server does not allow connections from the Qualytics host IP | Add the Qualytics IP to `pg_hba.conf` and reload the configuration | +| `permission denied for schema` | The role lacks `USAGE` on the target schema | Run `GRANT USAGE ON SCHEMA TO ` | +| `permission denied for table` | The role lacks `SELECT` on one or more tables | Run `GRANT SELECT ON ALL TABLES IN SCHEMA TO ` | +| `relation does not exist` | The hypertable or table name is incorrect, or the user cannot see it | Verify the table exists with `\dt` in psql and check schema permissions | + +### Detailed Troubleshooting Notes + +#### Authentication Errors + +The error `FATAL: password authentication failed` indicates that the credentials provided are incorrect or the role does not exist. + +Common causes: + +- **Incorrect password** — the password does not match the one set for the role. +- **Role does not exist** — the role name was misspelled or was never created. +- **Authentication method mismatch** — the `pg_hba.conf` file requires a different authentication method (e.g., `md5` vs `scram-sha-256`). + +!!! note + TimescaleDB uses the same authentication system as PostgreSQL. Check `pg_hba.conf` and PostgreSQL server logs for detailed error information. + +#### Permission Errors + +The error `permission denied for schema` or `permission denied for table` means the role authenticated successfully but lacks the necessary grants. 
+ +Common causes: + +- **Missing `USAGE` on schema** — the role cannot access the schema even if table-level grants exist. +- **Missing `SELECT` on tables** — the role has schema access but cannot read specific tables or hypertables. +- **Default privileges not set** — new tables created after the initial grant are not automatically accessible. Use `ALTER DEFAULT PRIVILEGES` to fix this. + +#### Connection Errors + +The error `FATAL: no pg_hba.conf entry for host` means the TimescaleDB server does not recognize the Qualytics host IP. + +Common causes: + +- **IP not whitelisted** — the Qualytics server IP is not listed in `pg_hba.conf`. +- **Wrong database name** — the `pg_hba.conf` entry restricts access to specific databases. +- **SSL required** — the server requires SSL connections but the client is connecting without SSL. + +!!! tip + Start by confirming credentials are valid (authentication errors), then verify schema/table permissions (permission errors), and finally check network connectivity (connection errors). + ## Add the Source Datastore A source datastore is a storage location used to connect to and access data from external sources. TimescaleDB is an example of such a datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the TimescaleDB datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights. diff --git a/docs/source-datastore/add-datastores/trino.md b/docs/source-datastore/add-datastores/trino.md index eaa1b70e0c..18a8721659 100644 --- a/docs/source-datastore/add-datastores/trino.md +++ b/docs/source-datastore/add-datastores/trino.md @@ -8,6 +8,108 @@ By following these instructions, enterprises can ensure their Trino environment Let’s get started 🚀 +## Trino Setup Guide + +Qualytics connects to Trino through the **Trino JDBC driver**. It uses standard SQL queries for data profiling and scanning. 
 Since Trino is a distributed query engine, permissions are determined by the underlying data source (connector) configured in the Trino catalog (e.g., Hive, Delta Lake, Iceberg, RDBMS).
+
+### Minimum Trino Permissions (Source Datastore)
+
+| Permission | Purpose |
+|-----------------------------------------------|-------------------------------------------------------------------------|
+| `SELECT` on target tables | Read data from tables for profiling and scanning |
+| Access to the Trino catalog | Browse available schemas and tables |
+| Access to the Trino schema | Browse available tables and columns |
+
+### Additional Permissions for Enrichment Datastore
+
+When using Trino as an enrichment datastore, the following additional permissions are required for Qualytics to write metadata tables (e.g., `_qualytics_*`):
+
+| Permission | Purpose |
+|-----------------------------------------------|-------------------------------------------------------------------------|
+| `CREATE TABLE` in the schema | Create enrichment tables (`_qualytics_*`) |
+| `INSERT` into tables | Write anomaly records, scan results, and check metrics |
+| `DELETE` from tables | Remove stale enrichment records |
+| `ALTER TABLE` in the schema | Modify enrichment table schemas during version migrations |
+| `DROP TABLE` in the schema | Remove enrichment tables during cleanup or when the datastore is unlinked |
+
+The actual permissions depend on the Trino security model configured for your deployment:
+
+| Security Model | How Permissions Are Managed |
+|-----------------------------|------------------------------------------------------------------------------|
+| **No security (default)** | All users have full read/write access to all catalogs and schemas |
+| **File-based access control** | Permissions are defined in `rules.json` — ensure the Qualytics user has `SELECT` (and `INSERT`, `CREATE TABLE` for enrichment) on the target catalog and schema |
+| **Connector-level security** | Permissions are delegated to the underlying data source — ensure the Qualytics user has read (and write for enrichment) access at the source level |
+
+### Example: File-Based Access Control Configuration
+
+If your Trino deployment uses file-based access control (`rules.json`), ensure the Qualytics user has appropriate access to the target catalog and schema:
+
+```json
+{
+  "catalogs": [
+    {
+      "user": "qualytics_read",
+      "catalog": "",
+      "allow": "read-only"
+    }
+  ]
+}
+```
+
+For enrichment datastores, use `"allow": "all"` instead of `"read-only"` to enable write operations.
+
+!!! note
+    Trino permissions are managed through the underlying connector's security model (e.g., Hive, Delta Lake, Iceberg). Ensure the Trino user has the appropriate access to the backing data source.
+
+### Troubleshooting Common Errors
+
+| Error | Likely Cause | Fix |
+|------------------------------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
+| `Access Denied: Cannot select from table` | The user lacks `SELECT` on the target table in the Trino access control rules or the underlying connector | Add `SELECT` permission for the user in `rules.json` or grant access in the underlying data source |
+| `Access Denied: Cannot create table` | The enrichment user lacks `CREATE TABLE` on the target schema | Add `CREATE TABLE` permission in the access control rules or underlying data source |
+| `Catalog does not exist` | The catalog name in the connection form does not match a configured Trino catalog | Verify available catalogs with `SHOW CATALOGS` in Trino |
+| `Schema does not exist` | The schema name does not exist in the specified catalog | Verify available schemas with `SHOW SCHEMAS FROM ` |
+| `Connection refused` | The Trino coordinator is not reachable or the port (default 8080) is incorrect | Verify the host, port, and that the Trino coordinator is running |
+| `Authentication failed` | Incorrect username or password, or the Trino server requires a different authentication method | Verify credentials and check if the Trino server uses LDAP, Kerberos, or password authentication |
+
+### Detailed Troubleshooting Notes
+
+#### Authentication Errors
+
+The error `Authentication failed` indicates that the credentials are incorrect or the authentication method does not match the server configuration.
+
+Common causes:
+
+- **Incorrect password** — the password does not match the one configured in the Trino server.
+- **Wrong authentication method** — the Trino server uses LDAP or Kerberos, but the connection form provides basic username/password.
+- **HTTPS required** — the Trino coordinator requires HTTPS connections, but the connection is using HTTP.
+
+!!! note
+    Trino authentication is configured at the coordinator level. Check the `password-authenticator.properties` file for the configured authentication method.
+
+#### Permission Errors
+
+The error `Access Denied: Cannot select from table` or `Access Denied: Cannot create table` means the user authenticated successfully but lacks access to the target resource.
+
+Common causes:
+
+- **File-based access control** — the `rules.json` file does not grant the required permissions to the Qualytics user.
+- **Connector-level security** — the underlying data source does not grant the necessary access.
+- **Missing enrichment permissions** — for enrichment datastores, the user lacks `CREATE TABLE` or `INSERT` permissions in addition to `SELECT`.
+
+#### Connection Errors
+
+The error `Connection refused` or `Catalog does not exist` indicates a connectivity or configuration issue.
+
+Common causes:
+
+- **Coordinator not reachable** — the Trino coordinator host or port (default 8080) is incorrect.
+- **Wrong catalog name** — the catalog name in the connection form does not match a configured Trino catalog.
+- **Coordinator not running** — the Trino coordinator process is not started.
+
+!!! tip
+    Start by confirming credentials are valid (authentication errors), then verify access control rules (permission errors), and finally check coordinator connectivity (connection errors).
+
 ## Add the Source Datastore
 
 A source datastore is a storage location used to connect to and access data from external sources. Trino is an example of such a datastore, specifically a type of JDBC datastore that supports connectivity through the JDBC API. Configuring the Trino datastore allows the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.
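
If the connection test fails while adding the datastore, the error categories from the troubleshooting notes can be sketched as a small triage helper. This is only an illustrative sketch: the error strings are taken from the troubleshooting table, while the names `classify_error`, `triage`, `ERROR_CATEGORIES`, and `TRIAGE_ORDER` are hypothetical and not part of Qualytics or Trino. `Schema does not exist` is left uncategorized here because the detailed notes do not assign it to a group.

```python
# Hypothetical triage helper. Error strings come from the troubleshooting table;
# all function and variable names are illustrative assumptions.

# Order suggested by the tip: authentication, then permissions, then connectivity.
TRIAGE_ORDER = ["authentication", "permission", "connection"]

ERROR_CATEGORIES = {
    "Authentication failed": "authentication",
    "Access Denied: Cannot select from table": "permission",
    "Access Denied: Cannot create table": "permission",
    "Connection refused": "connection",
    "Catalog does not exist": "connection",
}


def classify_error(message: str) -> str:
    """Return the troubleshooting category for an error message, or 'unknown'."""
    lowered = message.lower()
    for fragment, category in ERROR_CATEGORIES.items():
        if fragment.lower() in lowered:
            return category
    return "unknown"


def triage(messages):
    """Sort error messages into the order in which the tip suggests checking them."""
    def rank(message):
        category = classify_error(message)
        if category in TRIAGE_ORDER:
            return TRIAGE_ORDER.index(category)
        return len(TRIAGE_ORDER)  # unknown errors go last

    return sorted(messages, key=rank)


if __name__ == "__main__":
    for line in triage(["Connection refused", "Authentication failed"]):
        print(classify_error(line), "-", line)
```

Matching on substrings rather than exact strings is a deliberate choice in this sketch, since driver and server messages usually wrap the core text with table names or exception class names.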