Summary
When reading multiple XML/HTML files (via glob patterns or an array of paths), each file could be processed by a separate thread. Currently both the DOM and SAX paths are single-threaded (MaxThreads() = 1).
Design
- Move per-file state from XMLReadGlobalState into a new XMLReadLocalState : LocalTableFunctionState
- Global state provides mutex-protected file index assignment
- Each thread gets its own DOM or SAX resources
- MaxThreads() returns files.size()
- Register init_local callback on all read_xml / read_html table functions
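The split between global and local state described above can be sketched in standalone C++. This is a minimal illustration only: it does not use the DuckDB or libxml2 APIs, and every name in it (GlobalScanState, LocalScanState, NextFile, ScanWorker, RunScan) is a hypothetical stand-in rather than the extension's actual code. The shared state hands out file indices under a mutex, and each worker keeps its own private state.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared (global) scan state.
struct GlobalScanState {
    std::vector<std::string> files;
    std::mutex lock;
    std::size_t next_file_idx = 0;

    // Upper bound on useful threads: one per file.
    std::size_t MaxThreads() const { return files.size(); }

    // Hand out the next unclaimed file index, or nullopt once all are claimed.
    std::optional<std::size_t> NextFile() {
        std::lock_guard<std::mutex> guard(lock);
        if (next_file_idx >= files.size()) {
            return std::nullopt;
        }
        return next_file_idx++;
    }
};

// Hypothetical stand-in for the per-thread (local) state. A real version
// would own the thread's DOM document or SAX push parser context here.
struct LocalScanState {
    std::vector<std::size_t> processed;
};

// Worker loop: claim files until none remain, touching only local state.
void ScanWorker(GlobalScanState &gstate, LocalScanState &lstate) {
    while (auto idx = gstate.NextFile()) {
        lstate.processed.push_back(*idx);  // placeholder for parsing files[*idx]
    }
}

// Run the scan with the given number of threads and return the sorted
// list of file indices that were processed.
std::vector<std::size_t> RunScan(GlobalScanState &gstate, std::size_t thread_count) {
    std::vector<LocalScanState> locals(thread_count);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < thread_count; i++) {
        workers.emplace_back(ScanWorker, std::ref(gstate), std::ref(locals[i]));
    }
    for (auto &worker : workers) {
        worker.join();
    }
    std::vector<std::size_t> all;
    for (auto &local : locals) {
        all.insert(all.end(), local.processed.begin(), local.processed.end());
    }
    std::sort(all.begin(), all.end());
    return all;
}
```

Because a file index is only ever claimed under the lock, each file is processed by exactly one worker regardless of how many threads actually run.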
Considerations
- Schema inference happens at bind time (single-threaded); only extraction parallelizes
- union_by_name may need special handling since schema merging happens during bind
- DOM path: each thread holds its own XMLDocRAII (straightforward)
- SAX path: each thread holds its own push parser context (straightforward)
- libxml2 is thread-safe for independent parser contexts (no shared global state per CLAUDE.md guidelines)
Impact
For single-file reads: no change (1 thread).
For multi-file reads: up to N threads for N files (an upper bound; actual parallelism also depends on the number of worker threads available).