Commit 5ffbda7
GH-49058: [Python] Disallow non-UTF-8 bytes in custom metadata
Schema.fbs defines metadata keys and values as flatbuffer strings,
which are required to be valid UTF-8. PyArrow was silently accepting
arbitrary byte sequences, producing schemas that violate the spec and
break cross-language interoperability (e.g. Rust enforces UTF-8 via
String).
Add a UTF-8 check in KeyValueMetadata.__init__ before handing bytes
to the C++ layer. Only runs when the input is bytes, so existing
TypeError behaviour for invalid types (e.g. integers) is unchanged.

1 parent 5617e8d commit 5ffbda7
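The check described in the commit message can be sketched in plain Python. This is a minimal illustration, not the code from the commit: the helper names `validate_metadata_item` and `validate_metadata` are hypothetical, and the exception type is an assumption (here the `UnicodeDecodeError` from strict decoding is simply allowed to propagate). It shows the two properties the message calls out: bytes are validated as UTF-8, and non-bytes inputs are left alone so that any later type check behaves as before.

```python
def validate_metadata_item(item):
    """Hypothetical helper: reject bytes that are not valid UTF-8."""
    if isinstance(item, bytes):
        # Strict decoding raises UnicodeDecodeError (a ValueError subclass)
        # when the byte sequence is not valid UTF-8. The result is discarded;
        # only the validation side effect matters.
        item.decode("utf-8")
    # Non-bytes inputs (str, int, ...) are returned untouched, so whatever
    # type checking happens downstream is unaffected by this step.
    return item


def validate_metadata(mapping):
    """Hypothetical helper: check every key and value of a metadata dict."""
    return {
        validate_metadata_item(key): validate_metadata_item(value)
        for key, value in mapping.items()
    }


# Valid UTF-8 bytes pass through unchanged.
checked = validate_metadata({b"author": b"caf\xc3\xa9"})

# Invalid UTF-8 bytes are rejected before reaching any lower layer.
try:
    validate_metadata({b"author": b"\xff"})
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Running the check only on `bytes` keeps the validation narrow: an integer key or value is not decoded here, so it still reaches the existing type check and fails there as it did before.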
2 files changed, +34 -11 lines changed

First file (diff content not captured): original lines 788-796 removed (9 lines), new lines 788-804 added (17 lines).
Second file (diff content not captured): original lines 2317-2318 removed (2 lines), new lines 2317-2333 added (17 lines).