Skip query to system.detached_tables for external DB engine#300
Skip query to system.detached_tables for external DB engine#300
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds awareness of external database engines when collecting detached tables so that restore logic skips querying system.detached_tables for unsupported external DB engines, preventing NOT_IMPLEMENTED exceptions during schema restore. Sequence diagram for skipping detached table query for external DB enginessequenceDiagram
participant RestoreLogic
participant ClickHouseControl
participant ClickHouseClient
participant Logger
RestoreLogic->>ClickHouseControl: get_detached_tables(databases)
activate ClickHouseControl
ClickHouseControl->>ClickHouseControl: result = []
alt system.detached_tables exists
ClickHouseControl->>ClickHouseControl: check databases for external DB engine
alt external DB engine found
ClickHouseControl->>Logger: warning(Skip getting detached tables...)
ClickHouseControl-->>RestoreLogic: result
else no external DB engine
ClickHouseControl->>ClickHouseClient: query(GET_DETACHED_TABLES_SQL)
ClickHouseClient-->>ClickHouseControl: rows from system.detached_tables
ClickHouseControl->>ClickHouseControl: build Table.make_dummy(...) for each row
ClickHouseControl-->>RestoreLogic: result
end
else system.detached_tables does not exist
ClickHouseControl-->>RestoreLogic: result
end
deactivate ClickHouseControl
Updated class diagram for detached table restore handlingclassDiagram
class ClickHouseControl {
+get_detached_tables(databases: Dict_str_Database) Sequence_Table
}
class Database {
+name: str
+engine: str
+is_external_db_engine() bool
}
class Table {
+uuid: str
+make_dummy(database: str, table: str, uuid: str) Table
}
class RestoreContext {
+ch_ctl: ClickHouseControl
}
class RestoreLogic {
+_preprocess_tables_to_restore(context: RestoreContext, databases: Dict_str_Database) void
}
RestoreContext --> ClickHouseControl
RestoreLogic --> RestoreContext
ClickHouseControl --> Database
ClickHouseControl --> Table
RestoreLogic --> Table
Database <.. Table
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The new logic returns early on the first database with an external engine, which means detached tables will be skipped globally even if other databases are compatible; consider either checking whether any external DB exists before the loop or limiting/altering the underlying query so you only skip problematic databases rather than all.
- Instead of relying on the caller to pass
databasesintoget_detached_tables, consider retrieving the needed database metadata withinControlitself to keep this method’s public API simpler and reduce coupling to the caller’s context structure.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new logic returns early on the first database with an external engine, which means detached tables will be skipped globally even if other databases are compatible; consider either checking whether any external DB exists before the loop or limiting/altering the underlying query so you only skip problematic databases rather than all.
- Instead of relying on the caller to pass `databases` into `get_detached_tables`, consider retrieving the needed database metadata within `Control` itself to keep this method’s public API simpler and reduce coupling to the caller’s context structure.
## Individual Comments
### Comment 1
<location path="ch_backup/clickhouse/control.py" line_range="834-836" />
<code_context>
+ for database in databases.values():
+ if database.is_external_db_engine():
+ logging.warning(
+ f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.'
+ )
+ return result
</code_context>
<issue_to_address>
**nitpick (typo):** Minor wording/grammar issue in the warning message.
Suggest: `"Skipping detached tables: {database.name} uses external DB engine {database.engine}."` Or minimally, change `"have external DB engine"` to `"has external DB engine"` for correct grammar.
```suggestion
logging.warning(
+ f'Skipping detached tables: {database.name} uses external DB engine {database.engine}.'
+ )
```
</issue_to_address>
### Comment 2
<location path="ch_backup/logic/table.py" line_range="581" />
<code_context>
detached_tables = {}
- for table in context.ch_ctl.get_detached_tables():
+ for table in context.ch_ctl.get_detached_tables(databases):
if table.uuid:
detached_tables[table.uuid] = table
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The new `databases` parameter tightly couples `get_detached_tables` behavior to the caller’s view of databases.
Because the method now relies on the provided `databases` dict to decide whether to skip the query, its behavior will change if callers pass only a subset (e.g., databases relevant to a specific restore), and the external-engine safeguard may not run. Consider either requiring callers to always pass the full mapping or having `get_detached_tables` construct that mapping itself so its behavior is consistent across call sites.
Suggested implementation:
```python
detached_tables = {}
# Let ch_ctl.get_detached_tables determine the appropriate database mapping
# internally so its behavior is consistent across all call sites.
for table in context.ch_ctl.get_detached_tables():
if table.uuid:
detached_tables[table.uuid] = table
```
To fully implement the suggestion (have `get_detached_tables` construct the database mapping itself so its behavior is consistent across call sites), you should also:
1. Update the definition of `get_detached_tables` (likely in the `ch_ctl` implementation) to:
- Remove the required `databases` parameter or make it optional with a default of `None`.
- Internally build or retrieve the full database mapping (e.g., via an existing `get_databases()`/`load_databases()` call) instead of relying on a caller-provided dict.
- Ensure the external-engine safeguard logic uses this internally constructed, full mapping.
2. Update any other call sites that were changed to pass a `databases` argument into `get_detached_tables` so they now call it without arguments (or pass `None` if you keep an optional parameter).
This will decouple `get_detached_tables` from individual callers’ views of the database set and ensure the safeguard runs consistently regardless of who calls it or for what subset of databases.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| logging.warning( | ||
| f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.' | ||
| ) |
There was a problem hiding this comment.
nitpick (typo): Minor wording/grammar issue in the warning message.
Suggest: "Skipping detached tables: {database.name} uses external DB engine {database.engine}." Or minimally, change "have external DB engine" to "has external DB engine" for correct grammar.
| logging.warning( | |
| f'Skip getting detached tables, because {database.name} have external DB engine {database.engine}.' | |
| ) | |
| logging.warning( | |
| + f'Skipping detached tables: {database.name} uses external DB engine {database.engine}.' | |
| + ) |
|
|
||
| detached_tables = {} | ||
| for table in context.ch_ctl.get_detached_tables(): | ||
| for table in context.ch_ctl.get_detached_tables(databases): |
There was a problem hiding this comment.
suggestion (bug_risk): The new databases parameter tightly couples get_detached_tables behavior to the caller’s view of databases.
Because the method now relies on the provided databases dict to decide whether to skip the query, its behavior will change if callers pass only a subset (e.g., databases relevant to a specific restore), and the external-engine safeguard may not run. Consider either requiring callers to always pass the full mapping or having get_detached_tables construct that mapping itself so its behavior is consistent across call sites.
Suggested implementation:
detached_tables = {}
# Let ch_ctl.get_detached_tables determine the appropriate database mapping
# internally so its behavior is consistent across all call sites.
for table in context.ch_ctl.get_detached_tables():
if table.uuid:
detached_tables[table.uuid] = tableTo fully implement the suggestion (have get_detached_tables construct the database mapping itself so its behavior is consistent across call sites), you should also:
-
Update the definition of
get_detached_tables(likely in thech_ctlimplementation) to:- Remove the required
databasesparameter or make it optional with a default ofNone. - Internally build or retrieve the full database mapping (e.g., via an existing
get_databases()/load_databases()call) instead of relying on a caller-provided dict. - Ensure the external-engine safeguard logic uses this internally constructed, full mapping.
- Remove the required
-
Update any other call sites that were changed to pass a
databasesargument intoget_detached_tablesso they now call it without arguments (or passNoneif you keep an optional parameter).
This will decouple get_detached_tables from individual callers’ views of the database set and ensure the safeguard runs consistently regardless of who calls it or for what subset of databases.
|
Let's rewrite sql like white list |
Fix issue with restore schema for external DB engines
Summary by Sourcery
Bug Fixes: