Questions regarding implementation and evaluation code for StreamingBench #8

Hi, thank you for your great work and for sharing the code. I have a couple of questions regarding the implementation and evaluation:

Q1. The paper states that a small Decision module and a large Reaction module are used. However, the current codebase appears to use only a single large model with a silent head. Where can I find the implementation of the small Decision module?

Q2. I am trying to reproduce the results reported in the paper for the PO task of StreamingBench, but I am not getting consistent performance. Could you release the evaluation code used for StreamingBench, especially for the PO task?

Thank you in advance for your time and support.
