Add ensemble_batch_size for single-device inference #906
Conversation
Code Review
This pull request implements ensemble batching for single-device predictions in TabPFN, enabling multiple compatible ensemble members to be processed in a single forward pass to improve performance. The changes introduce an ensemble_batch_size parameter across the API and update the inference engines to handle batched outputs. Feedback focuses on fixing a potential shape mismatch in embedding extraction, addressing an unused parameter in the on-demand engine, and standardizing telemetry timing for model execution.
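To make the batched forward pass concrete, here is a minimal PyTorch sketch of the idea (illustrative only: the toy model, tensor shapes, and chunking loop are assumptions, not the PR's actual engine code). Chunks of compatible ensemble inputs are stacked along a leading dimension so that one forward pass covers `ensemble_batch_size` members instead of one:

```python
import torch

# Toy stand-in for a per-member forward pass: one linear layer shared
# across members (real TabPFN members differ by preprocessing, so only
# "compatible" members can share a batch).
model = torch.nn.Linear(16, 4)
members = [torch.randn(32, 16) for _ in range(8)]  # 8 ensemble-member inputs
ensemble_batch_size = 4  # assumed semantics of the new parameter

# Unbatched: one forward pass per ensemble member.
outputs_loop = [model(x) for x in members]

# Batched: stack ensemble_batch_size members into a leading dimension
# and run a single forward pass per chunk.
outputs_batched = []
for i in range(0, len(members), ensemble_batch_size):
    chunk = torch.stack(members[i : i + ensemble_batch_size])  # (B, 32, 16)
    out = model(chunk)                                         # (B, 32, 4)
    outputs_batched.extend(out.unbind(0))

# Results match the member-by-member loop (up to float tolerance).
assert all(torch.allclose(a, b) for a, b in zip(outputs_loop, outputs_batched))
```

The speedup comes from amortizing per-call overhead (kernel launches, dispatch) across members, which matters most on single-device setups with enough memory to hold several members' activations at once.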
Force-pushed from 709dc67 to c65fda0
adrian-prior left a comment
Hey @randommm,
Thanks for setting this up, and apologies for taking so long to get back on this! I do have some open questions, which I left on the PR, and there are also a bunch of comments left by Gemini, of which I think many are valid. Would you mind going through them?
Force-pushed from e3643a9 to a37bc35
@adrian-prior I think it should all be fixed now.
Force-pushed from a37bc35 to a49e278
Issue
#905
Motivation and Context
On devices with large amounts of RAM, such as AMD's Strix Halo, batching ensemble members into a single forward pass can greatly speed up inference.
Public API Changes
Adds an ensemble_batch_size parameter that controls how many compatible ensemble members are processed per forward pass.
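A hypothetical usage sketch, assuming the new parameter is exposed on the estimator constructor alongside n_estimators (the exact parameter surface, defaults, and accepted values are assumptions based on this PR's description, not verified against the merged API):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ensemble_batch_size (assumed signature from this PR) controls how many
# compatible ensemble members share one forward pass on a single device.
clf = TabPFNClassifier(n_estimators=8, ensemble_batch_size=8)
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:5])
```

Setting ensemble_batch_size equal to n_estimators would trade peak memory for speed; smaller values would bound memory use on constrained devices.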
How Has This Been Tested?
Local testing.
Checklist
Changelog entry added (see changelog/README.md), or "no changelog needed" label requested.