QUA-1632: Enhance athena documentation with permissions#1085

Merged
RafaelOsiro merged 1 commit into main from qua-1634-enhance-athena-datastore-with-minimum-permissions-required
Mar 26, 2026

@shindiogawa
Contributor

Overview

This PR enhances the Athena source datastore documentation with detailed IAM permissions requirements, example
policies, and expanded troubleshooting guidance.

Key Changes

  • Athena Setup Guide section: Added a new setup guide section (before the datastore configuration steps)
    documenting all required IAM permissions for Athena, Glue, and S3.
  • Minimum permissions tables: Detailed tables for Athena query permissions, Glue catalog read-only permissions,
    and S3 query result output location permissions.
  • Example IAM policy: A ready-to-copy least-privilege IAM policy JSON covering all three AWS services (Athena,
    Glue, S3).
  • Lake Formation note: Added a note for customers using AWS Lake Formation–governed catalogs.
  • Detailed troubleshooting notes: Expanded troubleshooting with dedicated subsections for signature mismatch
    errors, S3 output location issues, permission-related errors, and general debugging guidance.

@shindiogawa shindiogawa self-assigned this Mar 26, 2026
@shindiogawa shindiogawa added the documentation Improvements or additions to documentation label Mar 26, 2026
@shindiogawa shindiogawa requested a review from RafaelOsiro March 26, 2026 21:15
@RafaelOsiro RafaelOsiro merged commit 94b0c59 into main Mar 26, 2026
1 check passed
@RafaelOsiro RafaelOsiro deleted the qua-1634-enhance-athena-datastore-with-minimum-permissions-required branch March 26, 2026 21:18
@greptile-apps

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR enhances docs/source-datastore/add-datastores/athena.md with a new Athena Setup Guide section (~170 lines) that details the IAM permissions required to connect Qualytics to Amazon Athena via the Simba Athena JDBC driver. The additions include:

  • Minimum permission tables for Athena query actions, Glue catalog read-only access, and S3 output-location access
  • A ready-to-copy least-privilege IAM policy JSON covering all three services
  • A Lake Formation note and workgroup-scoping tip
  • Structured troubleshooting sections for signature mismatch errors, S3 output location problems, and AccessDenied errors, including a general debugging guidance table

The content is technically sound and well-organized. Two style-level suggestions were raised:

  • glue:GetCatalog / glue:GetCatalogs are newer cross-account Glue APIs that are not required for standard single-account Athena usage; labelling them as "minimum" permissions may mislead users on standard setups.
  • The example IAM policy combines bucket-level and object-level S3 actions in a single statement. While functionally correct, splitting them into two statements would align better with AWS IAM best-practice patterns and the resource-level distinctions already described in the permission table.

Confidence Score: 5/5

Documentation-only change; no production code affected — safe to merge.

All changes are documentation additions to a Markdown guide. The IAM permissions listed are technically accurate and functional, the troubleshooting content is well-structured and actionable, and internal anchor links resolve correctly. The two P2 suggestions are style/clarity improvements that do not block the PR.

No files require special attention; all changes are in a single documentation file.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/source-datastore/add-datastores/athena.md | Added ~170 lines of new Athena Setup Guide content covering minimum IAM permissions (Athena, Glue, S3), an example IAM policy, Lake Formation note, and detailed troubleshooting sections; also cleaned up trailing whitespace throughout the document. Two P2 style suggestions: glue:GetCatalog/glue:GetCatalogs may not be minimum permissions for standard setups, and the S3 policy statement could be split into bucket-level and object-level blocks for clarity. |

Sequence Diagram

sequenceDiagram
    participant Q as Qualytics
    participant J as Simba Athena JDBC Driver
    participant A as AWS Athena
    participant G as AWS Glue
    participant S as AWS S3

    Q->>J: Connect (credentials, S3 output location, workgroup)
    J->>A: GetWorkGroup (validate workgroup config)
    J->>A: ListDatabases → delegates to Glue
    A->>G: GetDatabases / GetDatabase
    G-->>A: Database metadata
    A-->>J: Schema list
    J->>A: ListTableMetadata / GetTableMetadata
    A->>G: GetTables / GetTable / GetPartitions
    G-->>A: Table & column definitions
    A-->>J: Table metadata
    Q->>J: Execute query
    J->>A: StartQueryExecution (output → S3)
    J->>A: GetQueryExecution (poll status)
    A->>S: PutObject (write result files)
    A-->>J: Query succeeded
    J->>A: GetQueryResults (fetch rows via JDBC)
    J->>S: GetObject (read result files)
    S-->>J: Result data
    J-->>Q: ResultSet rows
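
For readers who want to see the start, poll, and fetch portion of the diagram as code, here is a minimal Python sketch. It assumes a client object shaped like boto3's Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`); the helper name `run_athena_query` and its parameters are hypothetical, and this is not how Qualytics or the Simba JDBC driver is actually implemented. It may still help when checking which Athena actions a principal needs.

```python
import time


def run_athena_query(client, sql, workgroup, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start a query, poll until it finishes, then fetch the first result page.

    `client` is assumed to behave like boto3's Athena client. The helper and
    its parameter names are illustrative only.
    """
    # StartQueryExecution: Athena writes result files to the S3 output location.
    started = client.start_query_execution(
        QueryString=sql,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # GetQueryExecution: poll until the query reaches a terminal state.
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            # GetQueryResults: fetch rows once the query has succeeded.
            return client.get_query_results(QueryExecutionId=query_id)
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Query {query_id} ended in state {state}: {reason}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")
```

With a real client this would be called as `run_athena_query(boto3.client("athena"), "SELECT 1", "<YOUR_WORKGROUP>", "s3://<YOUR_BUCKET>/<YOUR_PREFIX>/")`. A missing permission on any of the three calls would surface as an error at the corresponding step.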

Reviews (1): Last reviewed commit: "Enhance athena documentation with permis..."

Comment on lines +34 to +37
| `glue:GetDatabase` / `glue:GetDatabases` | Read database metadata |
| `glue:GetCatalog` / `glue:GetCatalogs` | Read catalog metadata |
| `glue:GetTable` / `glue:GetTables` | Read table and column definitions |
| `glue:GetPartition` / `glue:GetPartitions` / `glue:BatchGetPartition` | Read partition metadata for query planning |

P2 glue:GetCatalog / glue:GetCatalogs may not be minimum permissions for standard setups

glue:GetCatalog and glue:GetCatalogs are part of the newer AWS Glue Data Catalog cross-account access APIs and are generally not required for standard single-account Athena usage. The classic minimum Glue permissions for Athena are GetDatabase/GetDatabases, GetTable/GetTables, and GetPartition/GetPartitions/BatchGetPartition.

Including them in a "Minimum Glue Permissions" table may confuse users on standard setups who see an AccessDenied for glue:GetCatalog when their AWS account or region doesn't expose that API surface, or who add these permissions unnecessarily.

Consider either:

  • Removing them from this "minimum" table and noting them under a separate "Cross-Account Catalog" row, or
  • Adding a qualifying note such as: "Only required if your account uses cross-account Glue catalog sharing; not needed for standard single-account setups."

This same concern applies to the glue:GetCatalog and glue:GetCatalogs entries in the example IAM policy at lines 85–86.
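
One way to express the "qualifying note" option is to generate the Glue statement with the catalog actions behind an explicit switch. The Python sketch below is illustrative only: the helper `glue_read_statement`, the `Sid` value, and the `"*"` resource default are assumptions rather than content from the PR, and a real policy would normally scope `Resource` to specific catalog, database, and table ARNs.

```python
import json

# Classic read-only Glue actions named in this comment.
BASE_GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
    "glue:BatchGetPartition",
]

# Catalog-level actions this comment suggests treating as optional.
CATALOG_ACTIONS = ["glue:GetCatalog", "glue:GetCatalogs"]


def glue_read_statement(include_catalog_actions=False, resource="*"):
    """Build a Glue read-only statement (hypothetical helper, not from the PR)."""
    actions = list(BASE_GLUE_ACTIONS)
    if include_catalog_actions:
        actions += CATALOG_ACTIONS
    return {
        "Sid": "GlueCatalogReadOnly",  # assumed Sid, for illustration
        "Effect": "Allow",
        "Action": actions,
        "Resource": resource,  # "*" is a placeholder; scope this in practice
    }


if __name__ == "__main__":
    # Default output omits the catalog actions.
    print(json.dumps(glue_read_statement(), indent=2))
```

Calling `glue_read_statement(include_catalog_actions=True)` appends the two catalog actions, which keeps the default output aligned with the "minimum" label in the table.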

Comment on lines +96 to +111
"Sid": "S3QueryResultsBucket",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload"
],
"Resource": [
"arn:aws:s3:::<YOUR_BUCKET>",
"arn:aws:s3:::<YOUR_BUCKET>/<YOUR_PREFIX>/*"
]
}

P2 S3 statement mixes bucket-level and object-level actions — consider splitting for clarity

The single S3QueryResultsBucket statement lists both bucket-level actions (s3:ListBucket, s3:GetBucketLocation, s3:ListBucketMultipartUploads) and object-level actions (s3:PutObject, s3:GetObject, s3:ListMultipartUploadParts, s3:AbortMultipartUpload) applied to both the bucket ARN and the object-prefix ARN.

This is functionally correct: IAM only grants an action on resources of the type that action supports, so an inapplicable pairing (e.g., s3:PutObject against a bare bucket ARN) simply has no effect. However, it diverges from the AWS IAM best-practice pattern and can mislead users who are learning how to write least-privilege S3 policies.

AWS recommends splitting the statement so the resource scope is explicit:

{
  "Sid": "S3QueryResultsBucketLevel",
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:ListBucketMultipartUploads"
  ],
  "Resource": "arn:aws:s3:::<YOUR_BUCKET>"
},
{
  "Sid": "S3QueryResultsObjectLevel",
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListMultipartUploadParts",
    "s3:AbortMultipartUpload"
  ],
  "Resource": "arn:aws:s3:::<YOUR_BUCKET>/<YOUR_PREFIX>/*"
}

This also aligns the policy example with the resource column in the permission table above (lines 43–51), which correctly distinguishes bucket-level from object-level resources.
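
If the policy is generated or linted by a script, the same split can be done mechanically. The following is a minimal Python sketch under stated assumptions: the helper `split_s3_statement` and the `sid_prefix` parameter are hypothetical, the bucket-level set contains only the three actions listed above, and object ARNs are recognised by a `/` after the bucket name.

```python
# Bucket-level actions named in this comment; anything else in the statement
# is treated as object-level.
BUCKET_LEVEL_ACTIONS = {
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:ListBucketMultipartUploads",
}

# The combined statement from the PR's example policy.
COMBINED_STATEMENT = {
    "Sid": "S3QueryResultsBucket",
    "Effect": "Allow",
    "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
    ],
    "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET>",
        "arn:aws:s3:::<YOUR_BUCKET>/<YOUR_PREFIX>/*",
    ],
}


def split_s3_statement(statement, sid_prefix="S3QueryResults"):
    """Split a mixed S3 statement into bucket-level and object-level parts.

    Hypothetical helper: assumes object ARNs are the ones containing "/".
    """
    resources = statement["Resource"]
    if isinstance(resources, str):
        resources = [resources]
    bucket_arns = [r for r in resources if "/" not in r]
    object_arns = [r for r in resources if "/" in r]

    actions = statement["Action"]
    bucket_actions = [a for a in actions if a in BUCKET_LEVEL_ACTIONS]
    object_actions = [a for a in actions if a not in BUCKET_LEVEL_ACTIONS]

    def unwrap(arns):
        # Emit a plain string when there is exactly one ARN.
        return arns[0] if len(arns) == 1 else arns

    return [
        {"Sid": f"{sid_prefix}BucketLevel", "Effect": statement["Effect"],
         "Action": bucket_actions, "Resource": unwrap(bucket_arns)},
        {"Sid": f"{sid_prefix}ObjectLevel", "Effect": statement["Effect"],
         "Action": object_actions, "Resource": unwrap(object_arns)},
    ]
```

Applied to `COMBINED_STATEMENT`, this produces the two statements shown above, so the documentation example and any generated policy stay in sync.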

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@RafaelOsiro RafaelOsiro changed the title Enhance athena documentation with permissions QUA-1632: Enhance athena documentation with permissions Mar 26, 2026