feat: add basic ReAct agent by justinwangx · Pull Request #10 · aisa-group/PostTrainBench

justinwangx · 2026-01-18T07:41:49Z

Great work on this! This PR adds the ra agent, which is a basic ReAct agent.

Since the scaffolds are getting increasingly advanced, I think this is a useful comparison to have. It allows us to track pure scaffold-induced gains in contrast to gains brought about by increased model quality.

max-andr · 2026-01-18T08:38:37Z

Oh hey, Justin! :)

We did think about the same thing internally. In fact, @hrdkbhatnagar is now running experiments with the OpenCode scaffold (https://opencode.ai/).

Do you have an opinion about how your RA scaffold compares to OpenCode?

justinwangx · 2026-01-18T09:42:36Z

Hi Maksym :D

I must have missed OpenCode when I was looking for a baseline. OpenCode is quite a bit more complicated but fundamentally runs the same ReAct loop, with some more complicated machinery. A difference is that OpenCode uses compaction, where some of the context is summarized as context limits are being approached; Ra drops messages when the limit would be exceeded.

I think the case for Ra is if you want what is close to the simplest possible ReAct agent (comparable to AISI's basic agent). If OpenCode is simple enough, then it might not be worth it to run Ra as a baseline.

max-andr · 2026-01-18T09:52:48Z

Got it! My current feeling is that it would be great to have both RA and OpenCode as two baselines of increasing complexity. We are currently preparing a v1 of PostTrainBench by the end of January, so we will need to see if we have enough time to run everything. Curious to hear opinions of @rank-and-file and @hrdkbhatnagar!

hrdkbhatnagar · 2026-01-20T00:07:03Z

hey Justin, thanks for the PR!

I think the ra-cli agnet could be a great baseline to control for the different scaffolds. I would love to test it out more (such as seeing how the cli and full autonomy work)

It's interesting to know that Ra drops messages when context limit is reached. I guess that would make the peformance with it quite low because of this, but nonetheless we think it is important to have proper scaffold baselines.

If everything goes well we could definitely try to get this in before the Jan deadline, if not, we will still incorporate this shortly after!

justinwangx · 2026-01-20T00:14:57Z

appreciate! good luck with the ICML push, and I'm curious to see the baseline results.

hrdkbhatnagar · 2026-02-14T20:01:15Z

Hey Justin! just a quick update, we have had a lot of changes in the core codebase in the past month. nonetheless, I would still love to know how the basic react agent performs on PTB. I will resolve any conflicts that have come up and then try to run the evals for this soon and hope to include it in our V1 release!

justinwangx · 2026-02-20T07:08:30Z

no worries! to be honest, am not sure if it is worth the time / spend. It is indeed a simple baseline -- but I am not sure how worthwhile it is when OpenCode also runs a ReAct loop.

hrdkbhatnagar · 2026-02-21T13:30:34Z

Ah I see, yes that is indeed true. @max-andr what do you think ?

max-andr · 2026-02-22T00:45:50Z

i think we are interested in the two extremes: (1) model family optimized scaffolds (e.g., Claude Code for Claude models) and (2) model family independent scaffolds (e.g., OpenCode for all models). but for (2) we are also interested in extracting as much capabilities out of an LLM as possible, so going with a stronger scaffold (like OpenCode instead of a baseline ReAct agent) sounds like a better choice!

feat: add basic ReAct agent

54ac052

hrdkbhatnagar added this to the V1 Release milestone Feb 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add basic ReAct agent#10

feat: add basic ReAct agent#10
justinwangx wants to merge 1 commit intoaisa-group:mainfrom
justinwangx:add-basic-agent

justinwangx commented Jan 18, 2026

Uh oh!

max-andr commented Jan 18, 2026

Uh oh!

justinwangx commented Jan 18, 2026

Uh oh!

max-andr commented Jan 18, 2026 •

edited

Loading

Uh oh!

hrdkbhatnagar commented Jan 20, 2026 •

edited

Loading

Uh oh!

justinwangx commented Jan 20, 2026

Uh oh!

hrdkbhatnagar commented Feb 14, 2026

Uh oh!

justinwangx commented Feb 20, 2026

Uh oh!

hrdkbhatnagar commented Feb 21, 2026

Uh oh!

max-andr commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

justinwangx commented Jan 18, 2026

Uh oh!

max-andr commented Jan 18, 2026

Uh oh!

justinwangx commented Jan 18, 2026

Uh oh!

max-andr commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hrdkbhatnagar commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinwangx commented Jan 20, 2026

Uh oh!

hrdkbhatnagar commented Feb 14, 2026

Uh oh!

justinwangx commented Feb 20, 2026

Uh oh!

hrdkbhatnagar commented Feb 21, 2026

Uh oh!

max-andr commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

max-andr commented Jan 18, 2026 •

edited

Loading

hrdkbhatnagar commented Jan 20, 2026 •

edited

Loading