CopyFrom: auto-detect binary/text format with text fallback by jackc · Pull Request #2521 · jackc/pgx

jackc · 2026-03-20T01:17:15Z

CopyFrom previously hardcoded binary format and used tryScanStringCopyValueThenEncode as a workaround when binary encoding failed. This was fragile and couldn't handle types that only support text format (e.g. jsonpath, aclitem).

Now CopyFrom peeks at the first row and checks both codec-level format support and value-level encode plan availability. If any column cannot use binary, it transparently falls back to text format COPY, letting PostgreSQL handle the parsing natively.

- Remove tryScanStringCopyValueThenEncode
- Add encodeCopyValueText with proper COPY text escaping
- Add canUseBinaryFormat two-level detection
- Add buildCopyBufText for text format row encoding
- Buffer first row for format decision without data loss
- Add tests for text fallback, special char escaping, NULLs, large datasets, all query exec modes, string-to-int conversion, and empty row sets
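
For readers unfamiliar with the COPY text format, the following is a minimal sketch of the kind of escaping the description refers to. It is not the PR's encodeCopyValueText; the names escapeCopyText and copyTextRow are hypothetical, and the sketch assumes the default tab delimiter and the default `\N` NULL marker.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCopyText is a hypothetical helper, not the PR's encodeCopyValueText.
// It escapes one field for PostgreSQL COPY text format, assuming the default
// tab delimiter: backslash, tab, newline, and carriage return are replaced
// by their backslash escape sequences.
func escapeCopyText(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case '\\':
			b.WriteString(`\\`)
		case '\t':
			b.WriteString(`\t`)
		case '\n':
			b.WriteString(`\n`)
		case '\r':
			b.WriteString(`\r`)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

// copyTextRow joins escaped fields with tabs and terminates the row with a
// newline. A nil field is written as \N, the default NULL marker.
func copyTextRow(fields []*string) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		if f == nil {
			parts[i] = `\N`
		} else {
			parts[i] = escapeCopyText(*f)
		}
	}
	return strings.Join(parts, "\t") + "\n"
}

func main() {
	a, b := "tab\there", `back\slash`
	fmt.Printf("%q\n", copyTextRow([]*string{&a, nil, &b}))
}
```

Escaping the backslash itself matters most here: an unescaped `\N` inside a string value would otherwise be read back as NULL.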

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

abrightwell · 2026-03-21T14:05:31Z

I like this approach. Though, I'm curious about the value-level check. The idea behind the first-row peek makes sense. But it adds complexity and doesn't reliably catch potential caller type mismatches.

For instance:

CREATE TABLE t (a int);

Fails:

rows := [][]any{
	{int32(42)},
	{"42"},
}

Succeeds:

rows := [][]any{
	{"42"},
	{int32(42)},
}

Admittedly, I'm sure this is an unlikely edge-case, as I would expect that the input rows would be uniform in format. But, it seemed worth bringing attention to.

Regardless, could the value types be checked without buffering the row itself?
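
The ordering issue above can be reproduced with a simplified model. This is only a sketch under stated assumptions, not pgx code: rowSource mirrors the method set of pgx.CopyFromSource, while sliceSource, allBinaryEncodable, and simulateCopy are hypothetical. The model assumes a single int column for which only int32 values have a binary encoding, and that text format accepts any value.

```go
package main

import "fmt"

// rowSource mirrors the method set of pgx.CopyFromSource.
type rowSource interface {
	Next() bool
	Values() ([]any, error)
	Err() error
}

// sliceSource is a hypothetical in-memory row source.
type sliceSource struct {
	rows [][]any
	idx  int
}

func (s *sliceSource) Next() bool             { s.idx++; return s.idx <= len(s.rows) }
func (s *sliceSource) Values() ([]any, error) { return s.rows[s.idx-1], nil }
func (s *sliceSource) Err() error             { return nil }

// allBinaryEncodable stands in for the value-level check. Assumption: for an
// int column, only int32 values can be encoded in binary format.
func allBinaryEncodable(row []any) bool {
	for _, v := range row {
		if _, ok := v.(int32); !ok {
			return false
		}
	}
	return true
}

// simulateCopy picks the format from the first row only, then processes every
// row, starting with the buffered first row so that it is not lost.
// It returns the chosen format and the number of rows accepted.
func simulateCopy(src rowSource) (string, int, error) {
	if !src.Next() {
		return "", 0, src.Err() // empty input: nothing to decide
	}
	row, err := src.Values()
	if err != nil {
		return "", 0, err
	}
	format := "text"
	if allBinaryEncodable(row) {
		format = "binary"
	}
	sent := 0
	for {
		if format == "binary" && !allBinaryEncodable(row) {
			return format, sent, fmt.Errorf("row %d: value %v has no binary encoding", sent+1, row)
		}
		sent++
		if !src.Next() {
			return format, sent, src.Err()
		}
		if row, err = src.Values(); err != nil {
			return format, sent, err
		}
	}
}

func main() {
	inputs := [][][]any{
		{{int32(42)}, {"42"}}, // int32 first: binary is chosen, second row fails
		{{"42"}, {int32(42)}}, // string first: text is chosen, both rows pass
	}
	for _, rows := range inputs {
		format, sent, err := simulateCopy(&sliceSource{rows: rows})
		fmt.Println(format, sent, err)
	}
}
```

In this model the outcome depends only on which row happens to come first, which is the non-determinism the comment points out.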

jackc · 2026-03-21T17:59:18Z

@abrightwell To be honest, I haven't looked closely at this yet. I was more curious to see if Claude could do it at all, and the results seemed plausible. But it would definitely need careful review. It would also need careful consideration of if auto-fallback is desirable behavior. Could make performance less understandable.

abrightwell · 2026-03-24T13:05:00Z

Oh for sure, it definitely seems like Claude might be on to something with the approach. In fact, it got me thinking slightly differently.

> It would also need careful consideration of if auto-fallback is desirable behavior.

Yeah, this is where I've shifted my thinking. Perhaps instead of attempting to predict the format, allow for it to be explicitly set by the caller with a reasonable default? For instance, adding an optional parameter on CopyFrom for a CopyFromFormat, where CopyFromFormatText is the default (based on COPY command docs).

The binary path could do a simple column codec check, returning an error if any of the column types does not support it. Maybe something like: "column %s type %s does not support binary format, use CopyFromFormatText". And similarly if the caller passes incompatible data.

> Could make performance less understandable.

Agreed. I think requiring the caller to be explicit about which format is most appropriate for their use case, while failing fast with actionable information, could help with that. Going from a default of binary to a default of text might cause an initial performance regression, but explicit control could reduce the friction there?
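
A rough sketch of what the explicit-format proposal could look like follows. None of this exists in pgx today: CopyFromFormat and CopyFromFormatText are the names suggested above, while CopyFromFormatBinary, the column struct, and checkFormat are assumptions made purely for illustration.

```go
package main

import "fmt"

// CopyFromFormat is the hypothetical caller-selected COPY format.
type CopyFromFormat int

const (
	// CopyFromFormatText is the zero value, so it is the default.
	CopyFromFormatText CopyFromFormat = iota
	// CopyFromFormatBinary is an assumed name for the binary option.
	CopyFromFormatBinary
)

// column is a hypothetical description of a target column. SupportsBinary
// stands in for a codec-level format check on the column's type.
type column struct {
	Name           string
	TypeName       string
	SupportsBinary bool
}

// checkFormat fails fast, before any row is read, when binary format is
// requested for a column whose type cannot use it.
func checkFormat(format CopyFromFormat, cols []column) error {
	if format != CopyFromFormatBinary {
		return nil // text format is assumed to work for every type
	}
	for _, c := range cols {
		if !c.SupportsBinary {
			return fmt.Errorf("column %s type %s does not support binary format, use CopyFromFormatText",
				c.Name, c.TypeName)
		}
	}
	return nil
}

func main() {
	cols := []column{
		{Name: "id", TypeName: "int4", SupportsBinary: true},
		{Name: "path", TypeName: "jsonpath", SupportsBinary: false},
	}
	fmt.Println(checkFormat(CopyFromFormatText, cols))
	fmt.Println(checkFormat(CopyFromFormatBinary, cols))
}
```

Because the check looks only at column types, it needs no row buffering and its result does not depend on the order of the input rows.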

jackc mentioned this pull request Mar 20, 2026

Multidimensional array flattened when using CopyFrom instead of Exec+Insert #2385

Open

jackc wants to merge 1 commit into master from copy-from-text-format
