Unable to reproduce the same results: CALVIN benchmark #48

Thanks for the outstanding work!
I downloaded the author's open-source code, weights, and RLDS dataset and tried to reproduce the pro model, but my results differ significantly from the author's. The detailed logs are as follows:

> 1/5 : 71.1% | 2/5 : 46.2% | 3/5 : 27.9% | 4/5 : 17.3% | 5/5 : 11.6% ||: 100%|█████| 1000/1000 [16:34:50<00:00, 59.69s/it]
> Results for Epoch None:
> Average successful sequence length: 1.741
> Success rates for i instructions in a row:
> 1: 71.1%
> 2: 46.2%
> 3: 27.9%
> 4: 17.3%
> 5: 11.6%
> turn_off_led: 94 / 95 |  SR: 98.9%
> push_into_drawer: 47 / 70 |  SR: 67.1%
> lift_blue_block_drawer: 5 / 7 |  SR: 71.4%
> rotate_blue_block_right: 24 / 60 |  SR: 40.0%
> lift_pink_block_table: 68 / 107 |  SR: 63.6%
> open_drawer: 240 / 243 |  SR: 98.8%
> push_blue_block_left: 34 / 53 |  SR: 64.2%
> close_drawer: 99 / 102 |  SR: 97.1%
> push_pink_block_right: 24 / 51 |  SR: 47.1%
> place_in_slider: 59 / 99 |  SR: 59.6%
> push_red_block_right: 43 / 54 |  SR: 79.6%
> push_red_block_left: 46 / 57 |  SR: 80.7%
> lift_blue_block_table: 66 / 88 |  SR: 75.0%
> turn_on_lightbulb: 99 / 99 |  SR: 100.0%
> turn_on_led: 110 / 115 |  SR: 95.7%
> move_slider_right: 174 / 203 |  SR: 85.7%
> turn_off_lightbulb: 84 / 84 |  SR: 100.0%
> place_in_drawer: 48 / 52 |  SR: 92.3%
> lift_pink_block_drawer: 4 / 5 |  SR: 80.0%
> rotate_blue_block_left: 21 / 47 |  SR: 44.7%
> lift_red_block_table: 66 / 96 |  SR: 68.8%
> stack_block: 34 / 58 |  SR: 58.6%
> unstack_block: 12 / 13 |  SR: 92.3%
> rotate_pink_block_right: 43 / 58 |  SR: 74.1%
> lift_red_block_slider: 2 / 96 |  SR: 2.1%
> push_pink_block_left: 44 / 57 |  SR: 77.2%
> push_blue_block_right: 29 / 54 |  SR: 53.7%
> move_slider_left: 33 / 178 |  SR: 18.5%
> rotate_pink_block_left: 18 / 42 |  SR: 42.9%
> rotate_red_block_left: 24 / 47 |  SR: 51.1%
> lift_pink_block_slider: 21 / 78 |  SR: 26.9%
> rotate_red_block_right: 23 / 56 |  SR: 41.1%
> lift_red_block_drawer: 3 / 7 |  SR: 42.9%
> lift_blue_block_slider: 0 / 94 |  SR: 0.0%
> 
> Best model: epoch None with average sequences length of 1.741
> average success rate  tensor(1.7410, device='cuda:0')
> disconnecting id 0 from server
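To make the log easier to compare against the paper, here is a small helper sketch (not part of the repository; the names `average_sequence_length`, `weakest_tasks`, and the 0.3 threshold are hypothetical choices). It assumes the CALVIN "average successful sequence length" is the sum of the five cumulative "i instructions in a row" success rates, which matches the numbers above (0.711 + 0.462 + 0.279 + 0.173 + 0.116 = 1.741). It also parses per-task lines in the log's format and lists the weakest subtasks.

```python
import re

# Cumulative success rates for completing at least i = 1..5 instructions in a row,
# copied from the evaluation log above.
chain_sr = [0.711, 0.462, 0.279, 0.173, 0.116]


def average_sequence_length(rates):
    """Sum of P(rollout completes at least i instructions) over i.

    Assumption: this is how the "average successful sequence length" is derived,
    since each rollout adds one unit per completed prefix of its 5-task chain.
    """
    return sum(rates)


# Matches per-task lines such as "> open_drawer: 240 / 243 |  SR: 98.8%".
TASK_LINE = re.compile(r"^>?\s*(\w+):\s*(\d+)\s*/\s*(\d+)")


def weakest_tasks(log_text, threshold=0.3):
    """Return (task, successes, attempts) below `threshold` success rate, lowest first."""
    rows = []
    for line in log_text.splitlines():
        m = TASK_LINE.match(line.strip())
        if m:
            name, ok, total = m.group(1), int(m.group(2)), int(m.group(3))
            if total and ok / total < threshold:
                rows.append((name, ok, total))
    return sorted(rows, key=lambda r: r[1] / r[2])


# A few per-task lines copied verbatim from the log; paste the full block in practice.
log_excerpt = """\
> open_drawer: 240 / 243 |  SR: 98.8%
> lift_red_block_slider: 2 / 96 |  SR: 2.1%
> move_slider_left: 33 / 178 |  SR: 18.5%
> lift_pink_block_slider: 21 / 78 |  SR: 26.9%
> lift_blue_block_slider: 0 / 94 |  SR: 0.0%
"""

print(f"Average successful sequence length: {average_sequence_length(chain_sr):.3f}")
for name, ok, total in weakest_tasks(log_excerpt):
    print(f"{name}: {ok}/{total} ({ok / total:.1%})")
```

Run on this excerpt, the lowest-scoring tasks are all slider-related (`lift_blue_block_slider`, `lift_red_block_slider`, `move_slider_left`, `lift_pink_block_slider`), which may help narrow down where the gap to the reported results comes from.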

I'm not sure whether there is a problem with my settings, and I hope the author can help clarify. 😊
