Need a clear and detailed setup guide

This benchmark is very difficult to set up.
First, it requires openclaw, but it doesn't tell you the openclaw requirements (Docker, bare metal, remote openclaw support, etc.).
Second, openclaw needs to do web searching and browsing. My private vLLM setup has no Internet access out of the box (it needs an HTTP proxy to reach the Internet), so this didn't work and I had to fix it manually. Later I also realized it requires a search API key. Okay, Brave it is. But it's such a black box that I didn't know which extensions, skills, or APIs openclaw needs; it took me 3 days to figure out.
Third, the judge doesn't support third-party API endpoints. Again, all our LLM endpoints go through an LLM proxy (LiteLLM), but who would know that the judge has claude cli support? That meant adding another hack to make the judge work.
Other random things also failed in the middle of the run and required manual intervention to continue.

Can we have a good, human-readable setup guide so that it can be run easily? Or even better, provide a self-contained Docker image with everything included, plus env var flags to pass in for third-party integrations, so people can save some time using it.
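To illustrate the kind of env-var-driven setup I mean, here is a minimal sketch of a preflight check that would report missing settings before a run starts instead of failing halfway through. The variable names (`BRAVE_API_KEY`, `JUDGE_BASE_URL`, `JUDGE_API_KEY`) and the helper `missing_settings` are my own guesses for illustration, not names the benchmark or openclaw actually uses; only `HTTPS_PROXY` is a common convention.

```python
import os

# Hypothetical setting names, chosen for illustration only. The real names
# would be whatever the benchmark ends up documenting.
REQUIRED = {
    "HTTPS_PROXY": "outbound proxy for web search and browsing",
    "BRAVE_API_KEY": "search API key",
    "JUDGE_BASE_URL": "judge endpoint, e.g. behind a LiteLLM proxy",
    "JUDGE_API_KEY": "credential for the judge endpoint",
}


def missing_settings(env=None, required=REQUIRED):
    """Return (name, purpose) pairs for settings that are unset or blank."""
    env = os.environ if env is None else env
    return [
        (name, purpose)
        for name, purpose in required.items()
        if not env.get(name, "").strip()
    ]


# Example: report every missing setting up front.
for name, purpose in missing_settings():
    print(f"missing {name}: {purpose}")
```

A Docker image could run a check like this at startup, so a restricted environment finds out about every missing integration at once.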

I do like the test suite and benchmark items, but running it is a painful experience, especially in a more restricted environment.

Need a clear and detailed setup guide #386