Skip to content

Skip query to system.detached_tables for external DB engine#300

Open
yahor-s wants to merge 2 commits intomainfrom
skip-detached-tables
Open

Skip query to system.detached_tables for external DB engine#300
yahor-s wants to merge 2 commits intomainfrom
skip-detached-tables

Conversation

@yahor-s
Copy link
Contributor

@yahor-s yahor-s commented Mar 9, 2026

Fix issue with restore schema for external DB engines

"exception": "Code: 48. DB::Exception: Cannot get detached tables for DatabasePostgreSQL. (NOT_IMPLEMENTED) (version 25.3.12.8 (official build))"

Summary by Sourcery

Bug Fixes:

  • Prevent failures when restoring schemas for external DB engine databases by skipping detached table lookup in those cases.

@sourcery-ai
Copy link

sourcery-ai bot commented Mar 9, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Adds awareness of external database engines when collecting detached tables so that restore logic skips querying system.detached_tables for unsupported external DB engines, preventing NOT_IMPLEMENTED exceptions during schema restore.

Sequence diagram for skipping detached table query for external DB engines

sequenceDiagram
    participant RestoreLogic
    participant ClickHouseControl
    participant ClickHouseClient
    participant Logger

    RestoreLogic->>ClickHouseControl: get_detached_tables(databases)
    activate ClickHouseControl
    ClickHouseControl->>ClickHouseControl: result = []

    alt system.detached_tables exists
        ClickHouseControl->>ClickHouseControl: check databases for external DB engine
        alt external DB engine found
            ClickHouseControl->>Logger: warning(Skip getting detached tables...)
            ClickHouseControl-->>RestoreLogic: result
        else no external DB engine
            ClickHouseControl->>ClickHouseClient: query(GET_DETACHED_TABLES_SQL)
            ClickHouseClient-->>ClickHouseControl: rows from system.detached_tables
            ClickHouseControl->>ClickHouseControl: build Table.make_dummy(...) for each row
            ClickHouseControl-->>RestoreLogic: result
        end
    else system.detached_tables does not exist
        ClickHouseControl-->>RestoreLogic: result
    end
    deactivate ClickHouseControl
Loading

Updated class diagram for detached table restore handling

classDiagram
    class ClickHouseControl {
        +get_detached_tables(databases: Dict_str_Database) Sequence_Table
    }

    class Database {
        +name: str
        +engine: str
        +is_external_db_engine() bool
    }

    class Table {
        +uuid: str
        +make_dummy(database: str, table: str, uuid: str) Table
    }

    class RestoreContext {
        +ch_ctl: ClickHouseControl
    }

    class RestoreLogic {
        +_preprocess_tables_to_restore(context: RestoreContext, databases: Dict_str_Database) void
    }

    RestoreContext --> ClickHouseControl
    RestoreLogic --> RestoreContext
    ClickHouseControl --> Database
    ClickHouseControl --> Table
    RestoreLogic --> Table
    Database <.. Table
Loading

File-Level Changes

Change Details Files
Make detached tables discovery conditional on there being no external DB engine databases, and thread this requirement through the restore preprocessing path.
  • Change get_detached_tables to accept a databases mapping so it can inspect database engines before querying system.detached_tables
  • Short‑circuit detached table lookup when any database is backed by an external DB engine, logging a warning and returning an empty result
  • Update restore preprocessing to pass the databases mapping into get_detached_tables and continue building the detached_tables index from the returned list
ch_backup/clickhouse/control.py
ch_backup/logic/table.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The new logic returns early on the first database with an external engine, which means detached tables will be skipped globally even if other databases are compatible; consider either checking whether any external DB exists before the loop or limiting/altering the underlying query so you only skip problematic databases rather than all.
  • Instead of relying on the caller to pass databases into get_detached_tables, consider retrieving the needed database metadata within Control itself to keep this method’s public API simpler and reduce coupling to the caller’s context structure.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new logic returns early on the first database with an external engine, which means detached tables will be skipped globally even if other databases are compatible; consider either checking whether any external DB exists before the loop or limiting/altering the underlying query so you only skip problematic databases rather than all.
- Instead of relying on the caller to pass `databases` into `get_detached_tables`, consider retrieving the needed database metadata within `Control` itself to keep this method’s public API simpler and reduce coupling to the caller’s context structure.

## Individual Comments

### Comment 1
<location path="ch_backup/clickhouse/control.py" line_range="834-836" />
<code_context>

+        for database in databases.values():
+            if database.is_external_db_engine():
+                logging.warning(
+                    f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.'
+                )
+                return result
</code_context>
<issue_to_address>
**nitpick (typo):** Minor wording/grammar issue in the warning message.

Suggest: `"Skipping detached tables: {database.name} uses external DB engine {database.engine}."` Or minimally, change `"have external DB engine"` to `"has external DB engine"` for correct grammar.

```suggestion
                logging.warning(
+                    f'Skipping detached tables: {database.name} uses external DB engine {database.engine}.'
+                )
```
</issue_to_address>

### Comment 2
<location path="ch_backup/logic/table.py" line_range="581" />
<code_context>

         detached_tables = {}
-        for table in context.ch_ctl.get_detached_tables():
+        for table in context.ch_ctl.get_detached_tables(databases):
             if table.uuid:
                 detached_tables[table.uuid] = table
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The new `databases` parameter tightly couples `get_detached_tables` behavior to the caller’s view of databases.

Because the method now relies on the provided `databases` dict to decide whether to skip the query, its behavior will change if callers pass only a subset (e.g., databases relevant to a specific restore), and the external-engine safeguard may not run. Consider either requiring callers to always pass the full mapping or having `get_detached_tables` construct that mapping itself so its behavior is consistent across call sites.

Suggested implementation:

```python
        detached_tables = {}
        # Let ch_ctl.get_detached_tables determine the appropriate database mapping
        # internally so its behavior is consistent across all call sites.
        for table in context.ch_ctl.get_detached_tables():
            if table.uuid:
                detached_tables[table.uuid] = table

```

To fully implement the suggestion (have `get_detached_tables` construct the database mapping itself so its behavior is consistent across call sites), you should also:

1. Update the definition of `get_detached_tables` (likely in the `ch_ctl` implementation) to:
   - Remove the required `databases` parameter or make it optional with a default of `None`.
   - Internally build or retrieve the full database mapping (e.g., via an existing `get_databases()`/`load_databases()` call) instead of relying on a caller-provided dict.
   - Ensure the external-engine safeguard logic uses this internally constructed, full mapping.

2. Update any other call sites that were changed to pass a `databases` argument into `get_detached_tables` so they now call it without arguments (or pass `None` if you keep an optional parameter).

This will decouple `get_detached_tables` from individual callers’ views of the database set and ensure the safeguard runs consistently regardless of who calls it or for what subset of databases.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +834 to +836
logging.warning(
f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.'
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (typo): Minor wording/grammar issue in the warning message.

Suggest: "Skipping detached tables: {database.name} uses external DB engine {database.engine}." Or minimally, change "have external DB engine" to "has external DB engine" for correct grammar.

Suggested change
logging.warning(
f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.'
)
logging.warning(
+ f'Skipping detached tables: {database.name} uses external DB engine {database.engine}.'
+ )


detached_tables = {}
for table in context.ch_ctl.get_detached_tables():
for table in context.ch_ctl.get_detached_tables(databases):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): The new databases parameter tightly couples get_detached_tables behavior to the caller’s view of databases.

Because the method now relies on the provided databases dict to decide whether to skip the query, its behavior will change if callers pass only a subset (e.g., databases relevant to a specific restore), and the external-engine safeguard may not run. Consider either requiring callers to always pass the full mapping or having get_detached_tables construct that mapping itself so its behavior is consistent across call sites.

Suggested implementation:

        detached_tables = {}
        # Let ch_ctl.get_detached_tables determine the appropriate database mapping
        # internally so its behavior is consistent across all call sites.
        for table in context.ch_ctl.get_detached_tables():
            if table.uuid:
                detached_tables[table.uuid] = table

To fully implement the suggestion (have get_detached_tables construct the database mapping itself so its behavior is consistent across call sites), you should also:

  1. Update the definition of get_detached_tables (likely in the ch_ctl implementation) to:

    • Remove the required databases parameter or make it optional with a default of None.
    • Internally build or retrieve the full database mapping (e.g., via an existing get_databases()/load_databases() call) instead of relying on a caller-provided dict.
    • Ensure the external-engine safeguard logic uses this internally constructed, full mapping.
  2. Update any other call sites that were changed to pass a databases argument into get_detached_tables so they now call it without arguments (or pass None if you keep an optional parameter).

This will decouple get_detached_tables from individual callers’ views of the database set and ensure the safeguard runs consistently regardless of who calls it or for what subset of databases.

@aalexfvk
Copy link
Contributor

Let's rewrite sql like white list

SELECT database, table, uuid
FROM system.detached_tables
WHERE database IN (
    SELECT name FROM system.databases
    WHERE engine IN ('Atomic', 'Ordinary', 'Replicated', 'Lazy')
)
FORMAT JSON

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants