xmed-lab/SemKey

Beyond LLM Priors: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

📌 Overview

Figure: Architecture of the SemKey framework

SemKey is a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. By combining these semantic attributes with encoded EEG signals, we achieve state-of-the-art (SOTA) performance in EEG-to-text generation.

🛠️ Installation & Setup

🖥️ Environment Setup

Tip

You can find all required packages in ./environment.yml

# Create environment
conda env create -f environment.yml
# Activate environment
conda activate semkey
# Optional: remove the environment when it is no longer needed
conda env remove -n semkey

📊 Data Preparation

1. Download the ZuCo Dataset

Please download ZuCo 1.0 and 2.0 from their official sites:

ZuCo1: link
ZuCo2: link

Important

Please rename the ZuCo2 directories so that they follow the ZuCo1 task naming:
"task1 - NR" -> "task2-NR"
"task2 - TSR" -> "task3-TSR"

Please also remove extra spaces from directory names (e.g. "task1- SR" -> "task1-SR") and rename "Matlab files" -> "Matlab_files"

Please manually check the CSV files in ZuCo1/task_materials/*.csv for errors and put the corrected files in ZuCo1/revised_csv, or copy the provided folder from ./preprocess/resource/revised_csv
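
The renaming rules above can be scripted. The following is a minimal sketch, not part of the repository: the function names `rename_if_present` and `normalize_zuco_dirs` are hypothetical, and the sketch assumes the downloaded data already sits under a root containing `ZuCo1` and `ZuCo2`. Space stripping is applied to ZuCo1 task directories only, since that is where the README's example comes from.

```shell
# Rename $1/$2 to $1/$3 when the source exists and the target does not.
rename_if_present() {
  if [ -d "$1/$2" ] && [ ! -e "$1/$3" ]; then
    mv -- "$1/$2" "$1/$3"
  fi
}

# Hypothetical helper: apply the README's renaming rules under root $1.
normalize_zuco_dirs() {
  root="$1"

  # ZuCo2 task directories follow the ZuCo1 task numbering.
  rename_if_present "$root/ZuCo2" "task1 - NR" "task2-NR"
  rename_if_present "$root/ZuCo2" "task2 - TSR" "task3-TSR"

  # Strip spaces from ZuCo1 task directory names (e.g. "task1- SR").
  for d in "$root"/ZuCo1/task*; do
    [ -d "$d" ] || continue
    base=$(basename "$d")
    clean=$(printf '%s' "$base" | tr -d ' ')
    [ "$base" = "$clean" ] || rename_if_present "$(dirname "$d")" "$base" "$clean"
  done

  # "Matlab files" -> "Matlab_files", wherever it appears.
  find "$root" -depth -type d -name 'Matlab files' | while IFS= read -r d; do
    rename_if_present "$(dirname "$d")" "Matlab files" "Matlab_files"
  done
}

# Run only when the dataset root exists (assumed location: datasets/ZuCo).
if [ -d "datasets/ZuCo" ]; then
  normalize_zuco_dirs "datasets/ZuCo"
fi
```

Existing targets are never overwritten, so the script can be re-run safely after a partial rename.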

Please place necessary files under the following tree structure:

SemKey
└── datasets
    └── ZuCo
        ├── ZuCo1
        │    ├── revised_csv
        │    ├── task1-SR
        │    ├── task2-NR
        │    └── task3-TSR
        └── ZuCo2
             ├── task_materials
             ├── task2-NR
             └── task3-TSR
...
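
Before preprocessing, it may help to confirm that the tree above is in place. The sketch below checks only the directories listed in the tree; `check_zuco_layout` is a hypothetical name, and `datasets/ZuCo` is taken from the tree shown above.

```shell
# Hypothetical helper: report any directory from the README tree that is
# missing under root $1. Returns 0 when everything is present, 1 otherwise.
check_zuco_layout() {
  root="$1"
  missing=0
  for rel in \
    ZuCo1/revised_csv ZuCo1/task1-SR ZuCo1/task2-NR ZuCo1/task3-TSR \
    ZuCo2/task_materials ZuCo2/task2-NR ZuCo2/task3-TSR; do
    if [ ! -d "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run from the project's root directory (SemKey/).
if check_zuco_layout "datasets/ZuCo"; then
  echo "ZuCo layout looks complete"
else
  echo "ZuCo layout is incomplete, see messages above"
fi
```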

2. Preprocess

Please run the following steps, in the order listed, to set up the datasets for SemKey stage 1 (parallel) training

Tip

Please run from project's root directory (i.e. SemKey/ )

Parse ZuCo sentences
Run ./preprocess/preprocess_label.py

Generate topic/sentiment/length/surprisal labels
Run ./label_generation/generate_all_labels.py

Load EEG data
Run ./preprocess/preprocess_mat.py

Merge EEG with labels
Run ./preprocess/preprocess_merge.py

Merge MTV
Copy ./preprocess/resource/zuco_label_8variants.df to ./data/zuco_preprocessed_dataframe
Run ./preprocess/preprocess_merge_MTV.py
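
The steps above can be chained in one script. This is a sketch under assumptions: it assumes each script runs with plain `python` and no arguments, and that `./data/zuco_preprocessed_dataframe` is a directory that already exists by the time of the copy. `run_step`, `run_preprocess` and the `DRY_RUN` variable are hypothetical names. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Default to a dry run so nothing is executed by accident.
DRY_RUN=${DRY_RUN:-1}

# Print a command, then execute it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@" || { echo "step failed: $*" >&2; return 1; }
  fi
}

# Preprocessing steps in the order given by the README; stops at the first failure.
run_preprocess() {
  run_step python ./preprocess/preprocess_label.py &&
  run_step python ./label_generation/generate_all_labels.py &&
  run_step python ./preprocess/preprocess_mat.py &&
  run_step python ./preprocess/preprocess_merge.py &&
  run_step cp ./preprocess/resource/zuco_label_8variants.df ./data/zuco_preprocessed_dataframe/ &&
  run_step python ./preprocess/preprocess_merge_MTV.py
}

# Run from the project's root directory (SemKey/).
run_preprocess
```

The trailing slash on the copy target makes `cp` fail if the directory is absent, rather than silently creating a file with that name.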

🔄 Upgrade package: Transformers

Please run the command below after label generation (this upgrade provides the cosine learning-rate schedule function)
If you use this version from the start instead, you'll encounter a safetensors warning during label generation

pip install --upgrade transformers==4.57.6
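
To see which Transformers version is active before label generation or training, a quick check like the following may help. It is only a sketch: `installed_transformers_version` is a hypothetical name, and the function simply prints `transformers.__version__`, or a fallback message when the import fails.

```shell
# Hypothetical helper: print the installed Transformers version,
# or "not installed" when the package (or python) is unavailable.
installed_transformers_version() {
  python -c 'import transformers; print(transformers.__version__)' 2>/dev/null \
    || echo "not installed"
}

echo "transformers: $(installed_transformers_version)"
```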

🔥 Training

Tip

Please run from project's root directory (i.e. SemKey/ )

Stage 1 (SemKey Parallel)

Configure ./run_script/run_parallel.sh
Run ./run_script/run_parallel.sh

Prepare data for Stage 2

Configure ./inference/predict_semkey_parallel_and_pack.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
Run ./inference/predict_semkey_parallel_and_pack.sh

Stage 2 (SemKey E2E | end-to-end training)

Configure ./run_script/run_e2e.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
-> You need to specify path-to-stage2dataset (generated by ./inference/predict_semkey_parallel_and_pack.sh)
Run ./run_script/run_e2e.sh
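
Once all three scripts have been configured, the full sequence can be run in order. This is a minimal sketch: `run_semkey_stages` is a hypothetical name, and the sketch assumes the scripts can be launched with `bash` and that the checkpoint and dataset paths were filled in beforehand. In practice the stage 1 checkpoint path is only known after stage 1 finishes, so running the stages one at a time may be more convenient.

```shell
# Hypothetical helper: run stage 1, the stage 2 data packing, and stage 2
# in order, stopping at the first missing script or failure.
run_semkey_stages() {
  for script in \
    ./run_script/run_parallel.sh \
    ./inference/predict_semkey_parallel_and_pack.sh \
    ./run_script/run_e2e.sh; do
    if [ ! -f "$script" ]; then
      echo "not found: $script (run from the project's root directory)" >&2
      return 1
    fi
    echo "running $script"
    bash "$script" || return 1
  done
}

# Usage, from SemKey/:
#   run_semkey_stages
```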
