Beyond LLM Priors: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
SemKey is a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. By combining these semantic attributes with encoded EEG signals, we achieve state-of-the-art (SOTA) performance in EEG-to-text generation.
Tip
You can find all required packages in ./environment.yml
# Create environment
conda env create -f environment.yml
# Activate environment
conda activate semkey
# Optionally, remove the environment
conda env remove -n semkey
Please download ZuCo 1.0 and 2.0 from their official site:
Important
Please rename the ZuCo2 directories so that they follow the ZuCo1 task naming:
"task1 - NR" -> "task2-NR"
"task2 - TSR" -> "task3-TSR"
Please also remove extra spaces in directory names (e.g. "task1- SR" -> "task1-SR") and rename "Matlab files" -> "Matlab_files"
Please manually check for CSV errors in ZuCo1/task_materials/*.csv and put the corrected files in ZuCo1/revised_csv, or copy the provided folder from ./preprocess/resource/revised_csv
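The renaming rules above can be sketched as a small Python helper. This is only an illustration, not a script shipped with the repository: the names `normalized_name` and `rename_dirs` are hypothetical, and the sketch assumes that task folders always look like `task<N> - <NAME>` with optional spaces around the hyphen. It defaults to a dry run so that the planned renames can be reviewed first.

```python
from __future__ import annotations

import re
from pathlib import Path

# Explicit ZuCo2 renames listed above (ZuCo2 task folders adopt ZuCo1 naming).
ZUCO2_RENAMES = {"task1 - NR": "task2-NR", "task2 - TSR": "task3-TSR"}


def normalized_name(name: str, is_zuco2: bool = False) -> str:
    """Return the directory name requested above (hypothetical helper)."""
    if is_zuco2 and name in ZUCO2_RENAMES:
        return ZUCO2_RENAMES[name]
    if name == "Matlab files":
        return "Matlab_files"
    # Drop spaces around the hyphen in task folders, e.g. "task1- SR" -> "task1-SR".
    if re.fullmatch(r"task\d+\s*-\s*\w+", name):
        return re.sub(r"\s*-\s*", "-", name)
    return name


def rename_dirs(root: Path, is_zuco2: bool = False, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and optionally apply) renames under root, deepest directories first."""
    planned = []
    # Materialize the listing before renaming; deepest first keeps parent paths valid.
    dirs = sorted((p for p in root.rglob("*") if p.is_dir()),
                  key=lambda p: len(p.parts), reverse=True)
    for path in dirs:
        new_name = normalized_name(path.name, is_zuco2)
        if new_name != path.name:
            target = path.with_name(new_name)
            planned.append((str(path), str(target)))
            if not dry_run:
                path.rename(target)
    return planned
```

For example, `rename_dirs(Path("datasets/ZuCo/ZuCo2"), is_zuco2=True)` only returns the planned renames; passing `dry_run=False` applies them.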
Please place necessary files under the following tree structure:
SemKey
└── datasets
    └── ZuCo
        ├── ZuCo1
        │   ├── revised_csv
        │   ├── task1-SR
        │   ├── task2-NR
        │   └── task3-TSR
        └── ZuCo2
            ├── task_materials
            ├── task2-NR
            └── task3-TSR
...
Please run the following steps, as instructed, to set up the datasets for SemKey stage 1 (parallel) training:
Tip
Please run from the project's root directory (i.e. SemKey/)
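Before running the steps, it may help to confirm that the dataset layout from the tree structure above is in place. The following is a minimal sketch, assuming it is executed from the project's root directory; `missing_dirs` is a hypothetical helper, not part of the repository, and it only checks the directories named in the tree.

```python
from __future__ import annotations

from pathlib import Path

# Directories taken from the tree structure above, relative to SemKey/.
EXPECTED_DIRS = [
    "datasets/ZuCo/ZuCo1/revised_csv",
    "datasets/ZuCo/ZuCo1/task1-SR",
    "datasets/ZuCo/ZuCo1/task2-NR",
    "datasets/ZuCo/ZuCo1/task3-TSR",
    "datasets/ZuCo/ZuCo2/task_materials",
    "datasets/ZuCo/ZuCo2/task2-NR",
    "datasets/ZuCo/ZuCo2/task3-TSR",
]


def missing_dirs(project_root: Path = Path(".")) -> list[str]:
    """Return the expected directories that do not exist under project_root."""
    return [d for d in EXPECTED_DIRS if not (project_root / d).is_dir()]
```

An empty list from `missing_dirs()` suggests the layout matches; any returned entry points at a folder that still needs to be placed or renamed.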
Parse ZuCo sentences
Run ./preprocess/preprocess_label.py
Generate topic/sentiment/length/surprisal labels
Run ./label_generation/generate_all_labels.py
Load EEG data
Run ./preprocess/preprocess_mat.py
Merge EEG with labels
Run ./preprocess/preprocess_merge.py
Merge MTV
Copy ./preprocess/resource/zuco_label_8variants.df to ./data/zuco_preprocessed_dataframe
Run ./preprocess/preprocess_merge_MTV.py
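The steps above could be chained in one driver script, sketched below. It rests on assumptions that the steps do not confirm: that each script runs without command-line arguments, that `./data/zuco_preprocessed_dataframe` is a directory rather than a file name, and that the current working directory is SemKey/. The names `planned_commands` and `run_all` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from pathlib import Path

# Scripts in the order the steps above list them (paths relative to SemKey/).
SCRIPTS_BEFORE_COPY = [
    "preprocess/preprocess_label.py",
    "label_generation/generate_all_labels.py",
    "preprocess/preprocess_mat.py",
    "preprocess/preprocess_merge.py",
]
MTV_SOURCE = Path("preprocess/resource/zuco_label_8variants.df")
# Assumption: the copy destination is a directory, not a file name.
MTV_TARGET_DIR = Path("data/zuco_preprocessed_dataframe")
MTV_SCRIPT = "preprocess/preprocess_merge_MTV.py"


def planned_commands() -> list[list[str]]:
    """Commands for every script step, in order (assumes no CLI arguments)."""
    return [[sys.executable, script] for script in SCRIPTS_BEFORE_COPY + [MTV_SCRIPT]]


def run_all(dry_run: bool = True) -> None:
    """Run the steps from the project root; stop at the first failing script."""
    *before_copy, mtv_command = planned_commands()
    for command in before_copy:
        print("run:", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    print(f"copy: {MTV_SOURCE} -> {MTV_TARGET_DIR}")
    if not dry_run:
        MTV_TARGET_DIR.mkdir(parents=True, exist_ok=True)
        shutil.copy(MTV_SOURCE, MTV_TARGET_DIR)
    print("run:", " ".join(mtv_command))
    if not dry_run:
        subprocess.run(mtv_command, check=True)
```

Calling `run_all()` only prints the plan; `run_all(dry_run=False)` executes it, and `check=True` makes a failing script raise instead of letting later steps run on incomplete data.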
Please run the command below (this upgrade provides the cosine learning-rate schedule function).
If you use this version from the start, you will encounter safetensors warnings during label generation.
pip install --upgrade transformers==4.57.6
Tip
Please run from the project's root directory (i.e. SemKey/)
Configure ./run_script/run_parallel.sh
Run ./run_script/run_parallel.sh
Configure ./inference/predict_semkey_parallel_and_pack.sh
-> You need to specify the path to the stage 1 (SemKey parallel) checkpoint
Run ./inference/predict_semkey_parallel_and_pack.sh
Configure ./run_script/run_e2e.sh
-> You need to specify the path to the stage 1 (SemKey parallel) checkpoint
-> You need to specify the path to the stage 2 dataset (generated by ./inference/predict_semkey_parallel_and_pack.sh)
Run ./run_script/run_e2e.sh
