CoSee is a research prototype for studying shared-state collaboration in resource-constrained visual agents. The code implements a small multimodal collaboration loop where role-specialized agents write evidence notes to a shared Board before producing an answer.
The repository is aligned with the paper direction *Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents* and focuses on auditable intermediate state, bounded generation calls, and structured JSONL experiment logs.
Implemented:

- Core Board data structure in `cosee/board.py`
- Agent and Action abstractions in `cosee/agents.py`
- Sequential CoSee controller in `cosee/controller.py`
- Normalized JSONL dataset loader in `cosee/data/datasets.py`
- Lazy-loading Qwen VL wrapper in `cosee/models/qwen_vl_wrapper.py`
- Baseline, single-board, multi-agent, export, and aggregation scripts under `scripts/`
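The shared-state idea behind the Board can be illustrated with a minimal sketch. This is not the repository's implementation (the real Board lives in `cosee/board.py` and its API may differ); the `Note`, `post`, and `summary` names here are assumptions chosen for illustration:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: role-specialized agents append evidence notes
# to a shared board that later agents (and the final answerer) can read.
@dataclass
class Note:
    author: str   # role name of the writing agent
    content: str  # evidence text extracted from the image/question

@dataclass
class Board:
    notes: list = field(default_factory=list)

    def post(self, author: str, content: str) -> None:
        """Append one evidence note to the shared state."""
        self.notes.append(Note(author, content))

    def summary(self) -> str:
        """Render all notes as auditable intermediate state."""
        return "\n".join(f"[{n.author}] {n.content}" for n in self.notes)

board = Board()
board.post("reader", "Chart title: quarterly revenue.")
board.post("analyst", "Highest bar is Q3.")
print(board.summary())
```

The point of the structure is auditability: every intermediate claim an agent makes is recorded before the final answer is produced.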
Not included in the repository:

- Dataset files under `data/`
- Local Qwen model weights under `models/`
- Published experiment result files under `results/`
```bash
git clone https://github.com/ClayKa/CoSee.git
cd CoSee
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

The code uses Python 3.10+ syntax.
This command does not require model weights or datasets:
```bash
python -m scripts.run_dummy_cosee
```

Expected behavior: it runs the controller with dummy agents and prints a final answer plus a Board summary.
All experiment scripts expect normalized JSONL annotations:
```
data/
  slidevqa/
    images/
    annotations/slidevqa.jsonl
  chartqapro/
    images/
    annotations/chartqapro.jsonl
  vqaonline/
    images/
    annotations/vqaonline.jsonl
```
Each JSONL row follows this schema; `image_paths` entries are relative to the repository root:

```json
{
  "id": "chartqapro_test_000001",
  "dataset": "chartqapro",
  "split": "test",
  "image_paths": ["data/chartqapro/images/example.png"],
  "question": "What is the highest value?",
  "answer": "42",
  "meta": {}
}
```

Export helpers:
```bash
python -m scripts.export_chartqapro_toy
python -m scripts.export_slidevqa_toy
python -m scripts.export_vqaonline_toy
```

These exporters may download from Hugging Face and require network access plus the `datasets` or `huggingface_hub` packages.
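The normalized schema above can be sanity-checked with a few lines of stdlib Python. This is an illustrative validator, not the repository's loader (that lives in `cosee/data/datasets.py`); only the field names come from the schema:

```python
import json

# Required top-level keys, taken from the normalized JSONL schema.
REQUIRED_KEYS = {"id", "dataset", "split", "image_paths",
                 "question", "answer", "meta"}

def validate_row(line: str) -> dict:
    """Parse one JSONL line and check it against the normalized schema."""
    row = json.loads(line)
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        raise ValueError(f"row {row.get('id')!r} missing keys: {sorted(missing)}")
    if not isinstance(row["image_paths"], list) or not row["image_paths"]:
        raise ValueError(f"row {row['id']!r}: image_paths must be a non-empty list")
    return row

line = ('{"id": "chartqapro_test_000001", "dataset": "chartqapro", '
        '"split": "test", "image_paths": ["data/chartqapro/images/example.png"], '
        '"question": "What is the highest value?", "answer": "42", "meta": {}}')
row = validate_row(line)
print(row["id"], row["answer"])
```

Running a check like this over a freshly exported `annotations/*.jsonl` file catches schema drift before a model-backed run spends any compute.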
Set `COSEE_MODEL_PATH` to a local Qwen VL checkpoint directory:

```bash
export COSEE_MODEL_PATH=/path/to/Qwen3-VL-4B-Instruct
```

You can also pass `--model-path` directly to the run scripts.
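One natural resolution order for these two options is: an explicit `--model-path` flag wins, otherwise fall back to `COSEE_MODEL_PATH`. A minimal sketch of that precedence, assuming it matches the scripts' behavior (the `resolve_model_path` helper is hypothetical, not a repository function):

```python
import argparse
import os

def resolve_model_path(argv=None) -> str:
    """Prefer an explicit --model-path flag, else COSEE_MODEL_PATH."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", default=None)
    # parse_known_args lets this coexist with a script's other flags
    args, _ = parser.parse_known_args(argv)
    path = args.model_path or os.environ.get("COSEE_MODEL_PATH")
    if not path:
        raise SystemExit("set COSEE_MODEL_PATH or pass --model-path")
    return path
```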
For a real model-backed run, the order is:

1. Export one normalized dataset:

   ```bash
   python -m scripts.export_chartqapro_toy
   ```

2. Point CoSee at a local Qwen VL checkpoint:

   ```bash
   export COSEE_MODEL_PATH=/path/to/Qwen3-VL-4B-Instruct
   ```

3. Run a small experiment:

   ```bash
   python -m scripts.run_cosee_on_dataset \
     --dataset chartqapro \
     --split test \
     --agent-config two_qwen \
     --max-examples 5 \
     --device cuda \
     --log-compute
   ```
Single Qwen baseline:

```bash
python -m scripts.run_qwen_single_baseline \
  --dataset chartqapro \
  --split test \
  --max-examples 50 \
  --device cuda \
  --log-compute
```

Two-agent CoSee:

```bash
python -m scripts.run_cosee_on_dataset \
  --dataset chartqapro \
  --split test \
  --agent-config two_qwen \
  --max-steps 3 \
  --max-examples 50 \
  --device cuda \
  --log-compute
```

Single-agent Board variant:

```bash
python -m scripts.run_single_qwen_board \
  --dataset slidevqa \
  --split train \
  --max-examples 50 \
  --device cuda \
  --log-compute
```

Aggregate JSONL results:

```bash
python -m scripts.aggregate_results \
  --mode cosee \
  --dataset chartqapro \
  --inputs results/cosee_two_qwen_chartqapro_test.jsonl
```

- `--log-compute` records generation call counts and generated token counts.
- `--num-shards`, `--shard-id`, and `--run-all-shards` support long runs on larger splits.
- `--resume-from` skips already processed example IDs.
- `data/`, `models/`, and `results/` are intentionally ignored by git.
No license file is currently included. Add one before public redistribution if needed.