Beyond LLM Priors: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding

📌 Overview

Architecture of the SemKey framework

SemKey is a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. By using these semantic attributes in conjunction with encoded EEG signals, we achieve state-of-the-art (SOTA) performance in EEG-to-text generation.

🛠️ Installation & Setup

🖥️ Environment Setup

Tip

You can find all required packages in ./environment.yml

# Create environment
conda env create -f environment.yml
# Activate environment
conda activate semkey

# Optionally, remove the environment
conda env remove -n semkey

📊 Data Preparation

1. Download ZuCo Dataset

Please download ZuCo 1.0 and 2.0 from their official site:

ZuCo1: link
ZuCo2: link

Important

Please rename the ZuCo2 task directories to follow the ZuCo1 task naming:
"task1 - NR" -> "task2-NR"
"task2 - TSR" -> "task3-TSR"

Please also remove extra spaces in directory names (e.g. "task1- SR" -> "task1-SR") and rename "Matlab files" -> "Matlab_files".

Please manually check for CSV errors in ZuCo1/task_materials/*.csv and put the corrected files in ZuCo1/revised_csv, or copy the provided folder from ./preprocess/resource/revised_csv.
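The renaming rules above can be sketched as a small script. This is a minimal sketch, not part of the repository: the function names and the `zuco2` flag are hypothetical, and it assumes the space-stripping rule is only meant for directories whose names start with "task". It defaults to a dry run that only prints the planned renames.

```python
from pathlib import Path

# Renames quoted in the instructions above. The ZuCo2 mapping must only be
# applied under the ZuCo2 root, since ZuCo1 already uses the target names.
ZUCO2_RENAMES = {"task1 - NR": "task2-NR", "task2 - TSR": "task3-TSR"}


def normalized_name(name: str, zuco2: bool = False) -> str:
    """Return the directory name the instructions ask for."""
    if zuco2 and name in ZUCO2_RENAMES:
        return ZUCO2_RENAMES[name]
    if name == "Matlab files":
        return "Matlab_files"
    # Assumption: only task directories need spaces around "-" removed,
    # e.g. "task1- SR" -> "task1-SR".
    if name.startswith("task") and "-" in name:
        return "-".join(part.strip() for part in name.split("-"))
    return name


def plan_renames(root: Path, zuco2: bool = False) -> list[tuple[Path, Path]]:
    """List (old, new) pairs, deepest first so parent paths stay valid."""
    dirs = sorted((p for p in root.rglob("*") if p.is_dir()),
                  key=lambda p: len(p.parts), reverse=True)
    plan = []
    for path in dirs:
        new_name = normalized_name(path.name, zuco2)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(root: Path, zuco2: bool = False, dry_run: bool = True) -> None:
    """Print each planned rename and perform it unless dry_run is set."""
    for old, new in plan_renames(root, zuco2):
        print(f"{old} -> {new}")
        if dry_run:
            continue
        if new.exists():
            print(f"  skipped: {new} already exists")
            continue
        old.rename(new)
```

A possible call would be `apply_renames(Path("datasets/ZuCo/ZuCo2"), zuco2=True)` to preview, then the same call with `dry_run=False` once the output looks right. For ZuCo1, leave `zuco2` at its default.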

Please place necessary files under the following tree structure:

SemKey
└── datasets
    └── ZuCo
        ├── ZuCo1
        │    ├── revised_csv
        │    ├── task1-SR
        │    ├── task2-NR
        │    └── task3-TSR
        └── ZuCo2
             ├── task_materials
             ├── task2-NR
             └── task3-TSR
...
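Before preprocessing, it may help to confirm that the layout matches the tree above. The following is a rough sketch with a hypothetical helper name; it only checks the directories explicitly listed in the tree and knows nothing about whatever the trailing "..." stands for.

```python
from pathlib import Path

# Directories listed in the tree above, relative to the project root (SemKey/).
EXPECTED_DIRS = [
    "datasets/ZuCo/ZuCo1/revised_csv",
    "datasets/ZuCo/ZuCo1/task1-SR",
    "datasets/ZuCo/ZuCo1/task2-NR",
    "datasets/ZuCo/ZuCo1/task3-TSR",
    "datasets/ZuCo/ZuCo2/task_materials",
    "datasets/ZuCo/ZuCo2/task2-NR",
    "datasets/ZuCo/ZuCo2/task3-TSR",
]


def missing_dirs(project_root: Path) -> list[str]:
    """Return the expected directories that do not exist under project_root."""
    return [d for d in EXPECTED_DIRS if not (project_root / d).is_dir()]
```

Calling `missing_dirs(Path("."))` from the project root returns an empty list when every listed directory is in place.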

2. Preprocess

Please run the following steps in order to set up the datasets for SemKey stage 1 (parallel) training.

Tip

Please run from the project's root directory (i.e. SemKey/)

Parse ZuCo sentences
Run ./preprocess/preprocess_label.py

Generate topic/sentiment/length/surprisal labels
Run ./label_generation/generate_all_labels.py

Load EEG data
Run ./preprocess/preprocess_mat.py

Merge EEG with labels
Run ./preprocess/preprocess_merge.py

Merge MTV
Copy ./preprocess/resource/zuco_label_8variants.df to ./data/zuco_preprocessed_dataframe
Run ./preprocess/preprocess_merge_MTV.py
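The five steps above could be chained in one driver script. This is only a sketch under stated assumptions: the README gives no command-line arguments, so each script is assumed to run without any, and `./data/zuco_preprocessed_dataframe` is assumed to be a directory that already exists by the time the MTV step runs. The names `STEPS` and `run_all` are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Script paths from the steps above, in the listed order.
STEPS = [
    "preprocess/preprocess_label.py",
    "label_generation/generate_all_labels.py",
    "preprocess/preprocess_mat.py",
    "preprocess/preprocess_merge.py",
    "preprocess/preprocess_merge_MTV.py",
]
MTV_SOURCE = Path("preprocess/resource/zuco_label_8variants.df")
# Assumption: this path is a directory, not a file name.
MTV_TARGET_DIR = Path("data/zuco_preprocessed_dataframe")


def run_all(root: Path = Path("."), dry_run: bool = True) -> list[str]:
    """Run every step from the project root and return a log of actions."""
    log = []
    for script in STEPS:
        if script == STEPS[-1]:
            # The MTV merge needs the label file copied into place first.
            log.append(f"copy {MTV_SOURCE} -> {MTV_TARGET_DIR}")
            if not dry_run:
                target = root / MTV_TARGET_DIR
                if not target.is_dir():
                    raise FileNotFoundError(f"{target} is not a directory")
                shutil.copy(root / MTV_SOURCE, target)
        log.append(f"run {script}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run([sys.executable, script], cwd=root, check=True)
    return log
```

With the default `dry_run=True` nothing is executed; printing the returned log shows the order in which the steps would run.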

🔄 Upgrade package: `Transformers`

Please run the command below. This upgrade provides the cosine learning-rate schedule function.
If you install this version from the start, you will encounter safetensors warnings during label generation.

pip install --upgrade transformers==4.57.6
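To confirm which version is active before training, a check along these lines may be useful. It is a small sketch using only the standard library; the helper names are hypothetical, and the pinned version string is copied from the pip command above.

```python
from importlib import metadata
from typing import Optional

PINNED = "4.57.6"  # version taken from the pip command above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str = "transformers", pinned: str = PINNED) -> bool:
    """True only when the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned
```

Running `matches_pin()` inside the `semkey` environment returns `True` only after the upgrade has been applied.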

🔥 Training

Tip

Please run from the project's root directory (i.e. SemKey/)

Stage 1 (SemKey Parallel)

Configure ./run_script/run_parallel.sh
Run ./run_script/run_parallel.sh

Prepare data for Stage 2

Configure ./inference/predict_semkey_parallel_and_pack.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
Run ./inference/predict_semkey_parallel_and_pack.sh

Stage 2 (SemKey E2E | end-to-end training)

Configure ./run_script/run_e2e.sh
-> You need to specify path-to-stage1 (SemKey parallel) checkpoint
-> You need to specify path-to-stage2dataset (generated by ./inference/predict_semkey_parallel_and_pack.sh)
Run ./run_script/run_e2e.sh
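Stage 2 depends on two artifacts from the earlier steps, so a pre-flight check before launching it can save a failed run. The sketch below is illustrative only: `stage2_problems` is a hypothetical helper, and the checkpoint and dataset paths are whatever you configured in the shell scripts, since the README does not fix their locations.

```python
from pathlib import Path


def stage2_problems(stage1_checkpoint: Path, stage2_dataset: Path,
                    root: Path = Path(".")) -> list[str]:
    """Return human-readable problems that would block Stage 2 training."""
    problems = []
    script = root / "run_script" / "run_e2e.sh"
    if not script.is_file():
        problems.append(f"missing script: {script} (run from the project root?)")
    if not Path(stage1_checkpoint).exists():
        problems.append(f"missing Stage 1 checkpoint: {stage1_checkpoint}")
    if not Path(stage2_dataset).exists():
        problems.append(f"missing Stage 2 dataset: {stage2_dataset}")
    return problems
```

An empty list means the script, the Stage 1 checkpoint, and the packed Stage 2 dataset are all present; it does not validate their contents.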

xmed-lab/SemKey
