Skip to content

Feature Clickhouse connection#317

Merged
Bl3f merged 13 commits intogetnao:mainfrom
poupou-web3:feature/clickhouse_connection
Mar 17, 2026
Merged

Feature Clickhouse connection#317
Bl3f merged 13 commits intogetnao:mainfrom
poupou-web3:feature/clickhouse_connection

Conversation

@poupou-web3
Copy link
Copy Markdown

Add ClickHouse support

Summary

  • Added ClickHouse database support to the CLI (ClickHouseConfig, DatabaseType.CLICKHOUSE) using ibis.clickhouse and wired it into AnyDatabaseConfig/parsing.
  • Introduced indexes.md generation via a new DatabaseAccessor.INDEXES and indexes.md.j2 template so agents can see ORDER BY, PRIMARY KEY, and PARTITION BY when available for clickhouse only.
  • Improved table selection/preview logic for ClickHouse, including support for AggregateFunction columns and safer handling of engines that disallow direct SELECT (e.g. Kafka, stream-like engines).

Details

ClickHouse context

  • ClickHouseDatabaseContext builds preview queries from the table schema, selecting plain columns directly and using *-Merge (e.g. uniqMerge) for AggregateFunction columns.
  • Added dependancie to ibis-clickhouse
  • For engines that raise code 620 ("Direct select is not allowed"), the context automatically switches to a no-SELECT path, using SHOW CREATE TABLE + system.columns so sync still completes.
  • indexes() returns full DDL (SHOW CREATE TABLE) used by indexes.md so the agent knows how tables are ordered/partitioned for efficient querying.
  • System tables are not sync.

Generic improvements

  • DatabaseConfig.raw_sql_to_dataframe now understands multiple cursor/result shapes, including ClickHouse's result_rows/column_names.
  • New DatabaseContext.indexes() hook powers indexes.md while keeping non-ClickHouse databases unchanged (they simply don't emit indexes docs by default).

Testing

Added ClickHouse integration tests (test_clickhouse.py + clickhouse.sql) covering:

  • Directory structure and markdown outputs (including indexes.md).
  • Engine coverage: MergeTree, SummingMergeTree, ReplacingMergeTree, AggregatingMergeTree, Kafka engine, materialized views, and dictionaries.
  • Include/exclude filters and sync state.

Added docker-compose.test.yml and expanded .env.example for Postgres, ClickHouse, and MSSQL so integration tests can be run locally with:

docker compose -f docker-compose.test.yml up -d
cd cli && cp tests/nao_core/commands/sync/integration/.env.example tests/nao_core/commands/sync/integration/.env
uv run pytest tests/nao_core/commands/sync/integration/test_clickhouse.py -v

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 16 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cli/nao_core/templates/defaults/databases/indexes.md.j2">

<violation number="1" location="cli/nao_core/templates/defaults/databases/indexes.md.j2:16">
P2: **Duplicate database query:** `db.indexes()` is called twice — once in the `{% if %}` guard and once to render the content. Since the ClickHouse implementation executes `SHOW CREATE TABLE` via `raw_sql` on each call (no caching), this doubles the number of queries per table during sync. Use Jinja2's `{% set %}` to call it once and reuse the result.</violation>
</file>

<file name="cli/nao_core/config/databases/clickhouse.py">

<violation number="1" location="cli/nao_core/config/databases/clickhouse.py:228">
P1: Missing `GROUP BY` clause when `AggregateFunction` columns are present. The `-Merge` combinators (e.g. `uniqMerge`, `sumMerge`) are aggregate functions. When mixed with non-aggregated columns in SELECT, ClickHouse requires a `GROUP BY` on all non-aggregated columns; otherwise the query will fail. Consider detecting when aggregate expressions are present and adding `GROUP BY` over the plain columns.</violation>
</file>

<file name=".env.example">

<violation number="1" location=".env.example:53">
P2: Port/secure mismatch: `CLICKHOUSE_PORT=8443` is the HTTPS port but `CLICKHOUSE_SECURE=false`. For a non-secure connection the default HTTP port should be `8123` (consistent with the integration test `.env.example` and code defaults).</violation>
</file>

<file name="cli/nao_core/config/databases/base.py">

<violation number="1" location="cli/nao_core/config/databases/base.py:87">
P2: `hasattr(cursor, "description")` is `True` even when `cursor.description is None` (DB-API 2.0 sets it to `None` for non-returning statements). This will raise `TypeError` when iterating over `None`. Check that `description` is not `None`.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread cli/nao_core/config/databases/clickhouse.py
Comment thread cli/nao_core/templates/defaults/databases/indexes.md.j2 Outdated
Comment thread .env.example Outdated
Comment thread cli/nao_core/config/databases/base.py Outdated
@Bl3f
Copy link
Copy Markdown
Contributor

Bl3f commented Feb 22, 2026

Hello @poupou-web3 thank you so much for the Clickhouse addition! I will do some tests, run the integration tests once, do a small review and it should be merged asap

Copy link
Copy Markdown
Contributor

@Bl3f Bl3f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small question and change around indexes.md, but apart from it LGTM. Thank you so much integrations tests worked well!

Welcome as a new contributor ✨

Comment thread apps/backend/src/components/system-prompt.tsx Outdated
Comment thread cli/nao_core/config/databases/base.py Outdated
Comment thread cli/nao_core/config/databases/clickhouse.py Outdated
Comment thread cli/nao_core/config/databases/clickhouse.py
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cli/nao_core/config/databases/clickhouse.py">

<violation number="1" location="cli/nao_core/config/databases/clickhouse.py:199">
P1: Bug: `columns()` returns `[]` when `_direct_select_disallowed` is True, but `column_count()` still returns the actual count from `system.columns` for the same condition. This means stream-like engines (Kafka, RabbitMQ, FileLog) will have no column metadata in generated docs, and `column_count` vs `columns` will be inconsistent. The previous code correctly fell back to `_columns_from_system()` here.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread cli/nao_core/config/databases/clickhouse.py Outdated
@poupou-web3
Copy link
Copy Markdown
Author

I realized that the index.md is also a duplicate of the columns.md, and we could just put everything in columns.md.

@Bl3f
Copy link
Copy Markdown
Contributor

Bl3f commented Feb 26, 2026

@poupou-web3 yes, it was a bit was I tried to say. Could you change it in the PR pls?

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 76 files (changes from recent commits).

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/frontend/src/components/settings/teams-form.tsx">

<violation number="1" location="apps/frontend/src/components/settings/teams-form.tsx:101">
P2: Post-save manifest download errors fail the entire submit flow after credentials are already persisted. This creates inconsistent UX where users see submission failures despite successful backend saves, potentially leading to duplicate retries. The manifest download operation should be wrapped in error handling to prevent non-critical download failures from affecting the save flow.</violation>
</file>

<file name="apps/shared/src/chart-builder.tsx">

<violation number="1" location="apps/shared/src/chart-builder.tsx:94">
P1: `buildKpiCard` ignores the `children` prop which contains the resolved title element, causing titles to not render for KPI cards. Unlike other chart builders (`buildBarChart`, `buildAreaChart`, `buildPieChart`) which render `{children}` to display the title, `buildKpiCard` completely bypasses the title pipeline.</violation>
</file>

<file name="apps/backend/src/services/teams.ts">

<violation number="1" location="apps/backend/src/services/teams.ts:246">
P2: Stop/completion card cleanup is not exception-safe; failures can leave stale interactive UI artifacts. Wrap the stream reading in a try/finally block to ensure cleanup always executes.</violation>
</file>

<file name="apps/frontend/src/components/settings/slack-form.tsx">

<violation number="1" location="apps/frontend/src/components/settings/slack-form.tsx:18">
P2: Unvalidated user input for `mentionName` may cause Slack manifest import failures. According to Slack's manifest documentation, `display_information.name` has a 35-character limit, and `bot_user.display_name` has an 80-character limit with only lowercase `a-z` characters allowed. The current code only trims the input before embedding it directly. Users following the UI hint "e.g. @nao" may enter special characters like '@' which would cause manifest import to fail. Add validation to ensure names comply with Slack's constraints.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


type ResolvedProps = BuildChartProps & Required<Pick<BuildChartProps, 'colorFor' | 'labelFormatter'>>;

function buildKpiCard(props: ResolvedProps) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: buildKpiCard ignores the children prop which contains the resolved title element, causing titles to not render for KPI cards. Unlike other chart builders (buildBarChart, buildAreaChart, buildPieChart) which render {children} to display the title, buildKpiCard completely bypasses the title pipeline.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/shared/src/chart-builder.tsx, line 94:

<comment>`buildKpiCard` ignores the `children` prop which contains the resolved title element, causing titles to not render for KPI cards. Unlike other chart builders (`buildBarChart`, `buildAreaChart`, `buildPieChart`) which render `{children}` to display the title, `buildKpiCard` completely bypasses the title pipeline.</comment>

<file context>
@@ -88,6 +91,49 @@ function buildResolved(props: BuildChartProps) {
 
 type ResolvedProps = BuildChartProps & Required<Pick<BuildChartProps, 'colorFor' | 'labelFormatter'>>;
 
+function buildKpiCard(props: ResolvedProps) {
+	const { data, series } = props;
+
</file context>

onSubmit: async ({ value }) => {
await onSubmit(value);
if (teamsRedirectUrl) {
await downloadTeamsManifestZip(value.appId, teamsRedirectUrl);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Post-save manifest download errors fail the entire submit flow after credentials are already persisted. This creates inconsistent UX where users see submission failures despite successful backend saves, potentially leading to duplicate retries. The manifest download operation should be wrapped in error handling to prevent non-critical download failures from affecting the save flow.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/frontend/src/components/settings/teams-form.tsx, line 101:

<comment>Post-save manifest download errors fail the entire submit flow after credentials are already persisted. This creates inconsistent UX where users see submission failures despite successful backend saves, potentially leading to duplicate retries. The manifest download operation should be wrapped in error handling to prevent non-critical download failures from affecting the save flow.</comment>

<file context>
@@ -0,0 +1,167 @@
+		onSubmit: async ({ value }) => {
+			await onSubmit(value);
+			if (teamsRedirectUrl) {
+				await downloadTeamsManifestZip(value.appId, teamsRedirectUrl);
+			}
+			form.reset();
</file context>

Comment thread apps/backend/src/services/teams.ts Outdated
const stream = agent.stream(chat.messages, { provider: 'teams' });
const stopCard = await ctx.thread.post(createStopButtonCard());

const state = await this._readStreamAndUpdateMessage(stream, ctx);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Stop/completion card cleanup is not exception-safe; failures can leave stale interactive UI artifacts. Wrap the stream reading in a try/finally block to ensure cleanup always executes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/backend/src/services/teams.ts, line 246:

<comment>Stop/completion card cleanup is not exception-safe; failures can leave stale interactive UI artifacts. Wrap the stream reading in a try/finally block to ensure cleanup always executes.</comment>

<file context>
@@ -0,0 +1,411 @@
+		const stream = agent.stream(chat.messages, { provider: 'teams' });
+		const stopCard = await ctx.thread.post(createStopButtonCard());
+
+		const state = await this._readStreamAndUpdateMessage(stream, ctx);
+
+		await stopCard.delete();
</file context>


function buildSlackManifest(webhookUrl: string) {
function buildSlackManifest(webhookUrl: string, mentionName: string) {
const name = mentionName.trim() || 'nao';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Unvalidated user input for mentionName may cause Slack manifest import failures. According to Slack's manifest documentation, display_information.name has a 35-character limit, and bot_user.display_name has an 80-character limit with only lowercase a-z characters allowed. The current code only trims the input before embedding it directly. Users following the UI hint "e.g. @Nao" may enter special characters like '@' which would cause manifest import to fail. Add validation to ensure names comply with Slack's constraints.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/frontend/src/components/settings/slack-form.tsx, line 18:

<comment>Unvalidated user input for `mentionName` may cause Slack manifest import failures. According to Slack's manifest documentation, `display_information.name` has a 35-character limit, and `bot_user.display_name` has an 80-character limit with only lowercase `a-z` characters allowed. The current code only trims the input before embedding it directly. Users following the UI hint "e.g. @nao" may enter special characters like '@' which would cause manifest import to fail. Add validation to ensure names comply with Slack's constraints.</comment>

<file context>
@@ -14,10 +14,11 @@ export interface SlackFormProps {
 
-function buildSlackManifest(webhookUrl: string) {
+function buildSlackManifest(webhookUrl: string, mentionName: string) {
+	const name = mentionName.trim() || 'nao';
 	return {
 		display_information: {
</file context>

@poupou-web3 poupou-web3 marked this pull request as draft March 12, 2026 05:36
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 13, 2026

🧹 Preview Removed

The preview deployment for this PR has been cleaned up.

@poupou-web3 poupou-web3 marked this pull request as ready for review March 13, 2026 05:06
@poupou-web3
Copy link
Copy Markdown
Author

@Bl3f That version is more consistent with other database integrations.
Also improves column descriptions and makes the index part smaller, and does not print the full DDL when possible.
The system tables, when explicitly included, can be fetched.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 27 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docker-compose.test.yml">

<violation number="1" location="docker-compose.test.yml:21">
P2: Test container exposes ClickHouse on all host interfaces with weak hardcoded credentials. Docker Compose ports bind to 0.0.0.0 by default, and combined with the password 'admin', this exposes a writable database. Bind to localhost only: `127.0.0.1:8123:8123` and `127.0.0.1:9000:9000`.</violation>
</file>

<file name="cli/tests/nao_core/commands/sync/integration/test_clickhouse.py">

<violation number="1" location="cli/tests/nao_core/commands/sync/integration/test_clickhouse.py:69">
P2: Test fixture performs overly broad cleanup by dropping all databases with shared prefix, which can interfere with concurrent test runs against the same ClickHouse instance</violation>

<violation number="2" location="cli/tests/nao_core/commands/sync/integration/test_clickhouse.py:103">
P2: Fixture leaks table in shared `default` database without cleanup</violation>

<violation number="3" location="cli/tests/nao_core/commands/sync/integration/test_clickhouse.py:328">
P2: test_sync_all_schemas() is dead code - will never execute due to unconditional skip</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread docker-compose.test.yml
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: admin
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
ports:
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Test container exposes ClickHouse on all host interfaces with weak hardcoded credentials. Docker Compose ports bind to 0.0.0.0 by default, and combined with the password 'admin', this exposes a writable database. Bind to localhost only: 127.0.0.1:8123:8123 and 127.0.0.1:9000:9000.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docker-compose.test.yml, line 21:

<comment>Test container exposes ClickHouse on all host interfaces with weak hardcoded credentials. Docker Compose ports bind to 0.0.0.0 by default, and combined with the password 'admin', this exposes a writable database. Bind to localhost only: `127.0.0.1:8123:8123` and `127.0.0.1:9000:9000`.</comment>

<file context>
@@ -0,0 +1,28 @@
+            CLICKHOUSE_USER: default
+            CLICKHOUSE_PASSWORD: admin
+            CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
+        ports:
+            - '8123:8123'
+            - '9000:9000'
</file context>
Fix with Cubic

# So test_sync_all_schemas passes: "default" DB must have spec.another_table so it gets synced
conn_default = _clickhouse_connect("default")
try:
conn_default.raw_sql("CREATE TABLE IF NOT EXISTS nonexistent (id UInt32) ENGINE = MergeTree() ORDER BY id;")
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Fixture leaks table in shared default database without cleanup

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cli/tests/nao_core/commands/sync/integration/test_clickhouse.py, line 103:

<comment>Fixture leaks table in shared `default` database without cleanup</comment>

<file context>
@@ -0,0 +1,366 @@
+        # So test_sync_all_schemas passes: "default" DB must have spec.another_table so it gets synced
+        conn_default = _clickhouse_connect("default")
+        try:
+            conn_default.raw_sql("CREATE TABLE IF NOT EXISTS nonexistent (id UInt32) ENGINE = MergeTree() ORDER BY id;")
+            conn_default.disconnect()
+        except Exception:
</file context>
Fix with Cubic

"""Create a temporary database, populate it with test data, then clean up."""
conn = _clickhouse_connect("default")
try:
# Drop any leftover nao_unit_tests_* DBs from previous runs so get_schemas returns only ours
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Test fixture performs overly broad cleanup by dropping all databases with shared prefix, which can interfere with concurrent test runs against the same ClickHouse instance

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cli/tests/nao_core/commands/sync/integration/test_clickhouse.py, line 69:

<comment>Test fixture performs overly broad cleanup by dropping all databases with shared prefix, which can interfere with concurrent test runs against the same ClickHouse instance</comment>

<file context>
@@ -0,0 +1,366 @@
+    """Create a temporary database, populate it with test data, then clean up."""
+    conn = _clickhouse_connect("default")
+    try:
+        # Drop any leftover nao_unit_tests_* DBs from previous runs so get_schemas returns only ours
+        list_db = getattr(conn, "list_databases", None)
+        if list_db:
</file context>
Fix with Cubic

assert "primary key user_id" in content
assert "order by user_id" in content

def test_sync_all_schemas(self, tmp_path_factory, db_config, spec):
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: test_sync_all_schemas() is dead code - will never execute due to unconditional skip

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cli/tests/nao_core/commands/sync/integration/test_clickhouse.py, line 328:

<comment>test_sync_all_schemas() is dead code - will never execute due to unconditional skip</comment>

<file context>
@@ -0,0 +1,366 @@
+        assert "primary key user_id" in content
+        assert "order by user_id" in content
+
+    def test_sync_all_schemas(self, tmp_path_factory, db_config, spec):
+        """Overrides base test to test because clickhouse ddl has multiple tables.
+        When schema is not specified, all schemas (temp DB + default with nonexistent) are synced."""
</file context>
Fix with Cubic

@Bl3f Bl3f merged commit aa0b64b into getnao:main Mar 17, 2026
3 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

🚀 Preview Deployment

URL https://pr-317-b039217.preview.getnao.io
Commit b039217

⚠️ No LLM API keys configured - you'll see the API key setup flow when trying to chat.


Preview will be automatically removed when this PR is closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants