Skip to content

ORDER BY pushdown ignores sort-order semantics mismatch (text collation, UUID) — merge joins fail with "mergejoin input data is out of order", wrong-results risk #276

@iskakaushik

Description

@iskakaushik

Summary

When the FDW pushes a sort down to ClickHouse, it advertises PostgreSQL pathkeys on the foreign path, but the rows come back in ClickHouse's native ordering, which differs from PostgreSQL's for at least two common key types:

  • text with any non-C collation (e.g. en_US.UTF-8, ICU): ClickHouse ORDER BY on String is raw byte order (memcmp), while PostgreSQL sorts per the column collation.
  • uuid: PostgreSQL sorts uuid by memcmp of the RFC-4122 byte representation; ClickHouse sorts UUID by its internal UInt128 layout, producing a completely different total order (no collation involved at all — so this is an order-semantics mismatch per type, not just a collation gap).

If the planner consumes such a pre-sorted foreign path in a merge join, the join's comparator observes order regressions and the query fails with:

ERROR:  mergejoin input data is out of order

Worse, merge join does not detect every misorder — a misordered input can silently produce an incorrect join result — and a plain ORDER BY query whose sort was pushed down silently returns mis-ordered rows to the client.

All references below are pinned to commit 94256f0.

Observed behavior

Queries joining two ClickHouse foreign tables on a shared text/uuid key (plus a local table), with that key in query_pathkeys, intermittently fail with mergejoin input data is out of order whenever the planner picks a merge join over the FDW's pre-sorted foreign paths (observed 54 times in 40 minutes in one production workload).

Demonstrating the order mismatch directly in ClickHouse:

-- ClickHouse String ORDER BY is byte order:
SELECT s FROM (SELECT arrayJoin(['apple','Banana','cherry','Émile','zebra']) AS s) ORDER BY s
-- => Banana, apple, cherry, zebra, Émile
-- PostgreSQL (en_US.UTF-8/ICU): apple, Banana, cherry, Émile, zebra

-- ClickHouse UUID ORDER BY does not match PostgreSQL uuid order:
SELECT toUUID(s) AS u
FROM (SELECT arrayJoin(['00000000-0000-0000-ffff-000000000000',
                        '00000000-0000-0001-0000-000000000000',
                        'ffffffff-ffff-ffff-0000-000000000001',
                        '00000000-0000-0000-0000-000000000002']) AS s)
ORDER BY u
-- => 00000000-0000-0001-0000-000000000000
--    ffffffff-ffff-ffff-0000-000000000001
--    00000000-0000-0000-0000-000000000002
--    00000000-0000-0000-ffff-000000000000
-- PostgreSQL uuid sorts these as 0000...-0002, 0000-ffff-..., 0000-0001-..., ffffffff-...

Code walkthrough

  1. get_useful_pathkeys_for_relation (src/fdw.c:519-566) accepts root->query_pathkeys for pushdown after checking only ec_has_volatile, EC-member existence, and chfdw_is_foreign_expr (src/fdw.c:550-552). There is no check of the pathkey's collation, opfamily, or whether ClickHouse's ordering for the expression's type matches PostgreSQL's.
  2. chfdw_is_foreign_expr / foreign_expr_walker (src/deparse.c:254, src/deparse.c:319 ff.) performs no collation tracking. The header comment (src/deparse.c:313-316) still describes postgres_fdw-style collation checking, but the corresponding state machine is absent; the T_Var case (src/deparse.c:336-359) accepts any foreign-table Var (except system columns) regardless of collation.
  3. add_paths_with_pathkeys_for_rel (src/fdw.c:2084) then adds foreign paths advertising those pathkeys, both for base relations (src/fdw.c:594) and pushed-down join relations (src/fdw.c:2324). The same unchecked pattern exists for final-sort pushdown in add_foreign_ordered_paths (src/fdw.c:2953-2981).
  4. appendOrderByClause (src/deparse.c:4723-4789) deparses the sort as plain ORDER BY <expr> ASC|DESC NULLS FIRST|LAST — ClickHouse evaluates it with String = memcmp and UUID = internal-representation order, regardless of the PostgreSQL column's comparison semantics.
  5. At execution, each scan issues a single remote query (src/fdw.c:1099-1153) whose stream preserves ClickHouse's order — so the rows delivered to the merge join are ordered by ClickHouse semantics while the path's pathkeys promise PostgreSQL semantics.

For comparison, postgres_fdw only ships sorts whose collation derives from foreign Vars, deparses explicit COLLATE/USING clauses, and targets a remote PostgreSQL that shares the sort semantics — none of which can hold for ClickHouse String/UUID.

Minimal repro shape

-- ClickHouse: two tables sharing a String (or UUID) key:
--   CREATE TABLE t1 (id String, v1 Int64) ENGINE=MergeTree ORDER BY id;
--   CREATE TABLE t2 (id String, v2 Int64) ENGINE=MergeTree ORDER BY id;
-- with mixed-case values such as 'apple', 'Banana', 'cherry' in both.

-- PostgreSQL (database with a non-C default collation, e.g. en_US.UTF-8):
CREATE FOREIGN TABLE ft1 (id text, v1 bigint) SERVER ch_server OPTIONS (table_name 't1');
CREATE FOREIGN TABLE ft2 (id text, v2 bigint) SERVER ch_server OPTIONS (table_name 't2');
CREATE TABLE local_t (id text PRIMARY KEY, note text);

SET enable_hashjoin = off;  -- nudge the planner toward merge join
SELECT *
FROM ft1
JOIN ft2 USING (id)
JOIN local_t USING (id)
ORDER BY id;
-- ORDER BY id puts the key into query_pathkeys, so the FDW advertises
-- pre-sorted foreign paths; a merge join over them fails with
--   ERROR: mergejoin input data is out of order
-- (or, undetected, returns an incomplete join result).
-- With uuid columns the failure occurs even in C-collation databases.

Related: #251 ("Incorporate ClickHouse ordering into query planner") would widen exposure to this same gap if implemented without an order-semantics check.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpushdownImprovements to query pushdown

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions