182 changes: 182 additions & 0 deletions polaris-shell/README.md
@@ -0,0 +1,182 @@
# Polaris Shell

An interactive SQL shell for exploring [Apache Iceberg](https://iceberg.apache.org/) tables and catalog metadata through [Apache Polaris](https://polaris.apache.org/) via its REST catalog API. No Spark, no Flink, no heavyweight engine — just a single fat JAR and a properties file.

Polaris Shell complements Polaris with a SQL interface for answering routine questions about your catalog — how many tables are in a namespace, how many snapshots a table has, where it is stored, whether it has too many small files — without spinning up Trino, Spark, or pyiceberg.

> **SELECT queries read data directly via the Iceberg Java library and are intended for sampling and exploration, not production workloads.**

> **Try it in minutes** — a fully self-contained Docker demo is included. See [demo/README.md](demo/README.md).

---

## How it works

Polaris Shell connects to a Polaris server using the **Iceberg REST catalog protocol** and OAuth2 client credentials. It parses SQL statements with an [ANTLR 4](https://www.antlr.org/) grammar, converts them to Iceberg API calls, and prints results to the terminal. No JDBC driver, no query engine — queries are executed directly through the Iceberg Java library against the catalog.

```
SQL input → ANTLR parser → QueryPlan → Iceberg REST catalog API → results
```
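
This README names the `QueryPlan` stage but does not show the type itself. As a rough illustration only, a plan can be modelled as a sealed hierarchy with one record per statement kind. The names `QueryPlan`, `SelectPlan`, `ShowTablesPlan`, `DiagnosePlan`, and `PlanSummary` are hypothetical. The `WHERE` clause is kept as plain text here, where a real plan would more likely hold an Iceberg expression.

```java
import java.util.List;

/** Hypothetical plan types; the shell's real QueryPlan is not shown in this README. */
sealed interface QueryPlan permits SelectPlan, ShowTablesPlan, DiagnosePlan {}

/** A parsed SELECT. Empty columns means "*"; null where/orderBy/limit means absent. */
record SelectPlan(String table, List<String> columns, String where, String orderBy, Integer limit)
        implements QueryPlan {}

/** SHOW TABLES IN <namespace> */
record ShowTablesPlan(String namespace) implements QueryPlan {}

/** DIAGNOSE TABLE <table> */
record DiagnosePlan(String table) implements QueryPlan {}

final class PlanSummary {
    private PlanSummary() {}

    /** Exhaustive switch over the sealed hierarchy (Java 21 pattern matching). */
    static String summarize(QueryPlan plan) {
        return switch (plan) {
            case SelectPlan s -> {
                StringBuilder sb = new StringBuilder("scan ").append(s.table())
                    .append(" columns=")
                    .append(s.columns().isEmpty() ? "*" : String.join(",", s.columns()));
                if (s.where() != null) sb.append(" filter=").append(s.where());
                if (s.orderBy() != null) sb.append(" orderBy=").append(s.orderBy());
                if (s.limit() != null) sb.append(" limit=").append(s.limit());
                yield sb.toString();
            }
            case ShowTablesPlan t -> "list tables in " + t.namespace();
            case DiagnosePlan d -> "check data file sizes in " + d.table();
        };
    }
}
```

With a sealed interface, the compiler checks that every statement kind is handled in the `switch`. That makes the shape a natural fit for a small, fixed command set.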

---

## Supported commands

| Command | Example |
|---|---|
| `SELECT` with `WHERE`, column projection, `ORDER BY`, and `LIMIT` | `SELECT id, amount FROM retail.orders WHERE region = 'us-east-1' ORDER BY amount DESC LIMIT 10` |
| `SHOW TABLES IN <namespace>` | `SHOW TABLES IN retail` |
| `DESCRIBE STATS <table>` | `DESCRIBE STATS retail.orders` |
| `SHOW TABLE LOCATION <table>` | `SHOW TABLE LOCATION retail.products` |
| `SHOW TABLE POLICIES <table>` | `SHOW TABLE POLICIES retail.orders` |
| `DIAGNOSE TABLE <table>` | `DIAGNOSE TABLE retail.orders` |
| `EXPLAIN SELECT ...` | `EXPLAIN SELECT * FROM retail.orders WHERE region = 'us-east-1'` |

**`EXPLAIN`** shows the Iceberg scan plan: snapshot info, partition spec, manifest and data-file counts before and after filter pushdown, estimated bytes scanned, and any warnings (small files, missing column statistics).
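
The derived figures in the `EXPLAIN` table are plain arithmetic. Below is a minimal stdlib sketch. It assumes binary units (1 MiB = 1,048,576 bytes) and one decimal place, as in the sample output. `ScanPlanMath` and its methods are hypothetical names, not the shell's API.

```java
import java.util.Locale;

/** Hypothetical helpers for two derived values shown by EXPLAIN. */
final class ScanPlanMath {
    private ScanPlanMath() {}

    /** Percentage of data files pruned by the filter, e.g. 10 total and 2 remaining gives 80.0. */
    static double eliminatedPercent(long totalFiles, long filesAfterFilter) {
        if (totalFiles <= 0) {
            return 0.0; // an empty table has nothing to prune
        }
        return 100.0 * (totalFiles - filesAfterFilter) / totalFiles;
    }

    /** Formats a byte count with binary units (KiB, MiB, ...) and one decimal place. */
    static String humanBytes(long bytes) {
        String[] units = {"B", "KiB", "MiB", "GiB", "TiB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0 ? bytes + " B" : String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    /** Renders the "Data files after filter" cell in the style of the sample output. */
    static String filesAfterFilterCell(long totalFiles, long filesAfterFilter) {
        return String.format(Locale.ROOT, "%d (%.1f%% eliminated)",
            filesAfterFilter, eliminatedPercent(totalFiles, filesAfterFilter));
    }
}
```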

**`DIAGNOSE`** scans the table's data files and reports how many are below the 128 MiB target size — a quick check for compaction candidates.
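
Below is a sketch of the `DIAGNOSE` check. It assumes a strict less-than comparison against 128 MiB (134,217,728 bytes, the `smallFileThresholdBytes` value in the sample output). With the Iceberg Java API, the sizes would presumably come from `DataFile.fileSizeInBytes()` on the tasks returned by a table scan's `planFiles()`. Here they are passed in as a plain list so the example runs without Iceberg on the classpath. `SmallFileCheck` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical sketch of the DIAGNOSE small-file count. */
final class SmallFileCheck {
    /** 128 MiB, reported as smallFileThresholdBytes in the sample output. */
    static final long SMALL_FILE_THRESHOLD_BYTES = 128L * 1024 * 1024; // 134217728

    private SmallFileCheck() {}

    /** Counts data files whose size is strictly below the threshold. */
    static long countSmallFiles(List<Long> dataFileSizesBytes) {
        return dataFileSizesBytes.stream()
            .filter(size -> size < SMALL_FILE_THRESHOLD_BYTES)
            .count();
    }
}
```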

SQL keywords are case-insensitive. Predicates support `=`, `!=`, `<>`, `<`, `<=`, `>`, `>=`, `IS NULL`, `IS NOT NULL`, `IN (...)`, `NOT IN (...)`, `AND`, `OR`, and `NOT`.
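
The supported operators form a small expression tree. The sketch below evaluates such a tree against a single row held in a `Map`, purely to illustrate the semantics. The shell itself converts predicates into Iceberg API calls. Iceberg's `org.apache.iceberg.expressions.Expressions` class provides the corresponding factory methods (`equal`, `notEqual`, `lessThan`, `in`, `notIn`, `isNull`, `notNull`, `and`, `or`, `not`). All type names below are hypothetical. Treating any comparison with NULL as false is a simplifying assumption, not documented shell behavior.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical predicate tree; each node decides whether one row matches. */
interface Pred {
    boolean test(Map<String, Object> row);
}

/** Comparison operators; both != and <> map to NE. */
enum Op { EQ, NE, LT, LE, GT, GE }

/** column <op> literal. Assumes the value and literal share a Comparable type. */
record Cmp(String column, Op op, Object literal) implements Pred {
    @SuppressWarnings("unchecked")
    public boolean test(Map<String, Object> row) {
        Object value = row.get(column);
        if (value == null || literal == null) {
            return false; // simplifying assumption: comparisons with NULL never match
        }
        int c = ((Comparable<Object>) value).compareTo(literal);
        return switch (op) {
            case EQ -> c == 0;
            case NE -> c != 0;
            case LT -> c < 0;
            case LE -> c <= 0;
            case GT -> c > 0;
            case GE -> c >= 0;
        };
    }
}

/** column IS NULL, or column IS NOT NULL when negated is true. */
record IsNull(String column, boolean negated) implements Pred {
    public boolean test(Map<String, Object> row) {
        return (row.get(column) == null) != negated;
    }
}

/** column IN (...), or column NOT IN (...) when negated is true; NULL never matches. */
record In(String column, List<?> values, boolean negated) implements Pred {
    public boolean test(Map<String, Object> row) {
        Object value = row.get(column);
        return value != null && values.contains(value) != negated;
    }
}

record And(Pred left, Pred right) implements Pred {
    public boolean test(Map<String, Object> row) { return left.test(row) && right.test(row); }
}

record Or(Pred left, Pred right) implements Pred {
    public boolean test(Map<String, Object> row) { return left.test(row) || right.test(row); }
}

record Not(Pred inner) implements Pred {
    public boolean test(Map<String, Object> row) { return !inner.test(row); }
}
```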

### Sample output

```
sql> SELECT id, region, amount FROM retail.orders WHERE region = 'us-east-1' LIMIT 3
id=1, region=us-east-1, amount=312
id=2, region=us-east-1, amount=87
id=5, region=us-east-1, amount=204
(3 rows)

sql> SHOW TABLES IN retail
namespace: retail
tableCount: 3
tables: [retail.orders, retail.products, retail.regions]

sql> DIAGNOSE TABLE retail.orders
smallFileThresholdBytes: 134217728
smallFileCount: 4

sql> EXPLAIN SELECT * FROM retail.orders WHERE region = 'us-east-1'
┌───────────────────────────────────────────────────────┐
│ ICEBERG SCAN PLAN — retail.orders                     │
├──────────────────────────┬────────────────────────────┤
│ Snapshot ID              │ 7326491023847162           │
│ Snapshot timestamp (ms)  │ 1715000000000              │
│ Partition spec           │ [region: identity]         │
│ Schema columns           │ 5                          │
│ Projected columns        │ 5                          │
├──────────────────────────┼────────────────────────────┤
│ Total manifest files     │ 3                          │
│ Manifests after pruning  │ 1                          │
│ Data files total         │ 10                         │
│ Data files after filter  │ 2 (80.0% eliminated)       │
│ Estimated bytes scanned  │ 1.2 MiB                    │
│ Pushdown filter          │ ref(name="region") == ...  │
└──────────────────────────┴────────────────────────────┘
```
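
The `SELECT` rows in the sample are printed as `column=value` pairs. Below is a sketch of that formatting plus the `cli.max-display-rows` cap. It assumes rows arrive as insertion-ordered maps in projection order. `RowPrinter` is a hypothetical name, and the `(N rows)` footer and any truncation notice are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of the SELECT row output shown in the sample. */
final class RowPrinter {
    private RowPrinter() {}

    /** Produces "id=1, region=us-east-1, amount=312"; assumes an insertion-ordered map. */
    static String formatRow(Map<String, Object> row) {
        return row.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(", "));
    }

    /** Formats at most maxDisplayRows rows, mirroring the cli.max-display-rows cap. */
    static List<String> render(List<Map<String, Object>> rows, int maxDisplayRows) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> row : rows.subList(0, Math.min(maxDisplayRows, rows.size()))) {
            lines.add(formatRow(row));
        }
        return lines;
    }
}
```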

---

## Limitations

- **Single-table reads only** — no `JOIN`
- **No aggregate functions** — `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, and `GROUP BY` are not supported
- **No DML** — `INSERT`, `UPDATE`, and `DELETE` are not supported
- **No DDL** — `CREATE TABLE`, `DROP TABLE`, and `ALTER TABLE` are not supported
- **`ORDER BY` sorts in memory**, after all rows matching the filter have been fetched; `LIMIT` caps the output but not the number of rows read, so keep the `WHERE` clause selective on large tables
- **No subqueries or CTEs**
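
The `ORDER BY` limitation follows from sorting in memory: every matching row has to be materialized before the first sorted row is known. The sketch below shows that behavior. It assumes rows are maps and a single sort column. `InMemoryOrderBy` is hypothetical, and placing NULLs last in ascending order is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of ORDER BY <column> [DESC] LIMIT n over rows that already passed WHERE. */
final class InMemoryOrderBy {
    private InMemoryOrderBy() {}

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> orderAndLimit(
            List<Map<String, Object>> matchingRows, String column, boolean descending, int limit) {
        // Every matching row is copied into memory before sorting starts.
        List<Map<String, Object>> rows = new ArrayList<>(matchingRows);

        // Ascending order with NULLs last (an assumption); reversed() also moves NULLs first.
        Comparator<Map<String, Object>> byColumn = (a, b) -> {
            Object x = a.get(column);
            Object y = b.get(column);
            if (x == null || y == null) {
                return x == null ? (y == null ? 0 : 1) : -1;
            }
            return ((Comparable<Object>) x).compareTo(y);
        };
        rows.sort(descending ? byColumn.reversed() : byColumn);

        // LIMIT only trims the already-sorted list; it does not reduce what was read.
        return List.copyOf(rows.subList(0, Math.min(limit, rows.size())));
    }
}
```

A bounded priority queue could hold only the top `LIMIT` rows, but the README does not say the shell does this. The sketch therefore mirrors the documented sort-then-limit behavior.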

---

## Quick start

### Prerequisites
- Java 21+
- A running Polaris server (or use the [demo](demo/README.md) — no server setup required)

### 1. Build

```bash
./gradlew generateGrammarSource shadowJar
```

This produces `build/libs/polaris-shell-demo.jar`.

### 2. Configure

Copy the example properties file and fill in your Polaris connection details:

```bash
cp polaris-sql-demo.properties.example polaris-sql-demo.properties
```

```properties
# Required
polaris.uri=http://localhost:8181/api/catalog
polaris.warehouse=my-catalog
polaris.client.id=root
polaris.client.secret=s3cr3t

# Optional
polaris.token.endpoint=http://localhost:8181/api/catalog/v1/oauth/tokens
cli.max-display-rows=100

# S3 / MinIO FileIO properties (pass-through to the Iceberg catalog)
# s3.endpoint=http://localhost:9000
# s3.path-style-access=true
# io-impl=org.apache.iceberg.aws.s3.S3FileIO
```

Any property not prefixed with `polaris.` or `cli.` is passed directly to the Iceberg catalog (useful for S3 region, MinIO credentials, custom FileIO implementations, etc.).
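
The pass-through rule above amounts to a prefix filter over the loaded properties. Below is a minimal sketch using `java.util.Properties`. `CatalogProperties` is a hypothetical name, and this sketch does not show how the shell maps the `polaris.*` keys onto Iceberg catalog properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch of the catalog pass-through rule. */
final class CatalogProperties {
    private CatalogProperties() {}

    /** Returns every property whose key does not start with "polaris." or "cli.". */
    static Map<String, String> passThrough(Properties props) {
        Map<String, String> catalogProps = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("polaris.") && !key.startsWith("cli.")) {
                catalogProps.put(key, props.getProperty(key));
            }
        }
        return catalogProps;
    }

    /** Parses properties-file text, e.g. the contents of polaris-sql-demo.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not fail in practice
        }
        return props;
    }
}
```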

### 3. Run

```bash
java -jar build/libs/polaris-shell-demo.jar polaris-sql-demo.properties
```

```
Connecting to Polaris at http://localhost:8181/api/catalog ...
Authenticated. Type SQL statements or 'exit' to quit.

sql> SHOW TABLES IN retail
sql> SELECT * FROM retail.orders WHERE amount > 100 LIMIT 5
sql> EXPLAIN SELECT * FROM retail.orders WHERE region = 'us-east-1'
sql> exit
```

---

## Demo

The [`demo/`](demo/README.md) directory contains a fully local environment using **Docker Compose + MinIO**, with no AWS account or external Polaris server required. It starts Polaris and MinIO, and a seed script creates three sample Iceberg tables, all in under a minute.

See **[demo/README.md](demo/README.md)** for step-by-step instructions.

---

## Configuration reference

| Property | Required | Default | Description |
|---|---|---|---|
| `polaris.uri` | Yes | — | Polaris REST catalog base URI |
| `polaris.warehouse` | Yes | — | Warehouse / catalog name |
| `polaris.client.id` | Yes | — | OAuth2 client ID |
| `polaris.client.secret` | Yes | — | OAuth2 client secret |
| `polaris.token.endpoint` | No | `{polaris.uri}/v1/oauth/tokens` | Token endpoint override |
| `cli.max-display-rows` | No | `100` | Row cap for SELECT output |
| *(any other key)* | No | — | Passed through to the Iceberg catalog |
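
The table's defaults and the OAuth2 client-credentials flow from **How it works** can be sketched together. The sketch assumes the standard form-encoded token request (`grant_type=client_credentials`) and a `scope` of `PRINCIPAL_ROLE:ALL`, the value used in Polaris's own getting-started examples. Whether the shell sends exactly this scope is an assumption. `ShellConfig` is hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Properties;

/** Hypothetical resolution of the settings in the configuration reference table. */
record ShellConfig(String uri, String warehouse, String clientId, String clientSecret,
                   String tokenEndpoint, int maxDisplayRows) {

    static ShellConfig from(Properties p) {
        for (String key : List.of("polaris.uri", "polaris.warehouse",
                                  "polaris.client.id", "polaris.client.secret")) {
            if (p.getProperty(key, "").isBlank()) {
                throw new IllegalArgumentException("Missing required property: " + key);
            }
        }
        String uri = p.getProperty("polaris.uri");
        return new ShellConfig(
            uri,
            p.getProperty("polaris.warehouse"),
            p.getProperty("polaris.client.id"),
            p.getProperty("polaris.client.secret"),
            // Documented default: {polaris.uri}/v1/oauth/tokens
            p.getProperty("polaris.token.endpoint", uri + "/v1/oauth/tokens"),
            Integer.parseInt(p.getProperty("cli.max-display-rows", "100")));
    }

    /** Form body for an OAuth2 client-credentials grant; the scope value is an assumption. */
    String tokenRequestBody() {
        return "grant_type=client_credentials"
            + "&client_id=" + enc(clientId)
            + "&client_secret=" + enc(clientSecret)
            + "&scope=" + enc("PRINCIPAL_ROLE:ALL");
    }

    /** Builds, but does not send, the POST to the token endpoint. */
    HttpRequest tokenRequest() {
        return HttpRequest.newBuilder(URI.create(tokenEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(tokenRequestBody()))
            .build();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

In practice, Iceberg's `RESTCatalog` performs this exchange itself when given the `credential` property (`client_id:client_secret`). The sketch only makes the wire format visible.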

---

## Building from source

```bash
# Generate ANTLR sources and build the fat JAR
./gradlew generateGrammarSource shadowJar

# Run tests
./gradlew test
```

Requires Java 21. The Gradle wrapper is included — no local Gradle installation needed.
146 changes: 146 additions & 0 deletions polaris-shell/build.gradle.kts
@@ -0,0 +1,146 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

plugins {
  id("java")
  alias(libs.plugins.shadow)
}

java {
  toolchain {
    languageVersion = JavaLanguageVersion.of(21)
  }
}

// Isolated configuration: ANTLR 4 tool jar, does not leak into compile/runtime
val antlrTool: Configuration by configurations.creating

val antlrOutputDir = layout.buildDirectory.dir("generated/antlr/main")
val antlrPackageDir = layout.buildDirectory.dir("generated/antlr/main/org/apache/polaris/tools/grammar")
val grammarFile = file("src/main/antlr/IcebergSQL.g4")

val generateGrammarSource by tasks.registering(JavaExec::class) {
  description = "Generate Java sources from IcebergSQL.g4 using ANTLR 4"
  group = "build"

  classpath = antlrTool
  mainClass = "org.antlr.v4.Tool"

  doFirst { antlrPackageDir.get().asFile.mkdirs() }

  args = listOf(
    "-visitor",
    "-no-listener",
    "-package", "org.apache.polaris.tools.grammar",
    "-o", antlrPackageDir.get().asFile.absolutePath, // output into full package path
    grammarFile.absolutePath
  )

  inputs.file(grammarFile)
  outputs.dir(antlrOutputDir) // declare root as output for incremental build tracking
}

sourceSets {
  main {
    java { srcDir(antlrOutputDir) } // root — Java compiler walks subdirs automatically
  }
}

tasks.named<JavaCompile>("compileJava") {
  dependsOn(generateGrammarSource)
}

dependencies {
  antlrTool(libs.antlr4) // ANTLR 4 tool — code-gen only, not shipped
  implementation(libs.antlr4.engine.runtime) // ANTLR 4 runtime — shipped in our jar

  implementation(platform(libs.iceberg.bom))
  implementation("org.apache.iceberg:iceberg-api")
  implementation("org.apache.iceberg:iceberg-core")
  implementation("org.apache.iceberg:iceberg-data")
  implementation("org.apache.iceberg:iceberg-parquet")
  implementation("org.apache.iceberg:iceberg-aws")
  implementation("org.apache.parquet:parquet-column:1.16.0")
  // iceberg-aws declares ALL AWS SDK deps as compileOnly — none appear in its
  // published metadata, so every module it references must be added explicitly
  // here to be bundled in the shadow jar.
  runtimeOnly(libs.awssdk.s3)
  runtimeOnly(libs.awssdk.sts)
  runtimeOnly(libs.awssdk.kms)
  runtimeOnly(libs.awssdk.dynamodb)
  runtimeOnly(libs.awssdk.glue)
  runtimeOnly(libs.awssdk.lakeformation)
  runtimeOnly(libs.awssdk.url.connection.client)

  implementation(libs.guava)
  implementation(libs.slf4j.api)
  runtimeOnly("org.slf4j:slf4j-simple:${libs.versions.slf4j.get()}")

  implementation(libs.hadoop.common)
  implementation(libs.hadoop.client.runtime)

  // ── Test dependencies ──────────────────────────────────────────────────────
  testImplementation(platform(libs.junit.bom))
  testImplementation("org.junit.jupiter:junit-jupiter-api")
  testRuntimeOnly("org.junit.jupiter:junit-jupiter-engine")
  testRuntimeOnly("org.junit.platform:junit-platform-launcher")

  testImplementation(libs.assertj)
  testImplementation(libs.mockito.junit.jupiter)

  testImplementation(platform(libs.testcontainers.bom))
  testImplementation("org.testcontainers:testcontainers")
  testImplementation("org.testcontainers:testcontainers-junit-jupiter")
  // Provides org.apache.polaris.test.minio.MinioContainer
  testImplementation("org.apache.polaris:polaris-minio-testcontainer:1.4.1")
}

tasks.named<Test>("test") {
  useJUnitPlatform()
  // Integration tests require Docker + running Polaris/MinIO containers;
  // exclude them from the default test task so normal builds succeed.
  exclude("**/*IntegrationTest*")
}

tasks.register<Test>("integrationTest") {
  description = "Runs integration tests that require Docker (Polaris + MinIO)."
  group = "verification"
  useJUnitPlatform()
  testClassesDirs = sourceSets["test"].output.classesDirs
  classpath = sourceSets["test"].runtimeClasspath
  include("**/*IntegrationTest*")
}

// ── Demo fat jar ──────────────────────────────────────────────────────────────
tasks.named<ShadowJar>("shadowJar") {
  archiveClassifier.set("demo")
  mergeServiceFiles()
  isZip64 = true
  manifest {
    attributes("Main-Class" to "org.apache.polaris.tools.cli.PolarisShell")
  }
  // Exclude SLF4J 1.7.x bindings that leak in from Hadoop transitive deps;
  // we ship slf4j-simple 2.x as the provider instead.
  exclude("org/slf4j/impl/StaticLoggerBinder.class")
  exclude("org/slf4j/impl/StaticMDCBinder.class")
  exclude("org/slf4j/impl/StaticMarkerBinder.class")
}
60 changes: 60 additions & 0 deletions polaris-shell/demo/README.md
@@ -0,0 +1,60 @@
# Polaris Shell — Local Demo

No AWS account or external Polaris server required. Everything runs locally via Docker.

## Prerequisites
- Docker + Docker Compose
- Java 21+

## Steps

0. **Build the fat jar** in the `polaris-shell` directory, starting from the repository root:
   ```bash
   cd polaris-shell
   ./gradlew generateGrammarSource shadowJar
   ```
   Then change into the demo directory:
   ```bash
   cd demo
   ```

1. **Start the environment**
   ```bash
   docker compose up -d
   ```

2. **Seed demo data** (run once)
   ```bash
   ./seed.sh
   ```
   This creates three Iceberg tables in MinIO:
   - `retail.orders` — 200 rows, partitioned by `region`
   - `retail.products` — 50 rows, unpartitioned
   - `retail.regions` — 3 rows, reference data

3. **Launch the SQL shell**
   ```bash
   java -jar ../build/libs/polaris-shell-demo.jar demo.properties
   ```

## Example queries
```sql
SHOW TABLES IN retail

SELECT * FROM retail.orders WHERE region = 'us-east-1' LIMIT 10

SELECT * FROM retail.orders WHERE amount > 200 ORDER BY amount DESC LIMIT 5

DESCRIBE STATS retail.orders

DIAGNOSE TABLE retail.orders

EXPLAIN SELECT * FROM retail.orders WHERE region = 'us-east-1'

SHOW TABLE LOCATION retail.products
```

## Tear down
```bash
docker compose down -v
```