MedGuideX

This repository contains the public release artifacts for MedGuideX, a medical LLM project that converts clinical practice guidelines (CPGs) into executable decision logic and then into factual and counterfactual QA supervision.

The release is intentionally scoped to reviewer-useful artifacts:

final guideline-derived factual QA data;
final strict counterfactual QA data;
the executable guideline-to-QA construction pipeline;
training data preparation utilities for SFT/RL;
executable-consistency reward code;
lightweight validation scripts.

It intentionally excludes model weights, checkpoints, raw benchmark data, private logs, API keys, and large intermediate pipeline artifacts.

Repository Layout

MedGuideX/
  data/
    factual.json                 # 4,963 factual QA examples, grouped by guideline function
    counterfactual.json          # 4,963 strict counterfactual QA examples
    stats.json                   # release and source pipeline counts
    samples/                     # one sample record per task type
    source/README.md             # raw CPG source schema note
  code/
    data_generation/v2_pipeline.py
    training/
      prepare_sft_coldstart_dataset.py
      prepare_rl_medical_reasoning_dataset.py
      medical_reasoning_reward.py
      medical_sft_dataset.py
  src/
    azure_api.py                 # env-driven LLM client wrapper, no keys included
    azure_openai_judge.py
  scripts/
    create_release_dataset.py
    validate_release.py
    check_executable_consistency.py

Dataset

The released dataset is a cleaned version of the US-only pipeline output. It preserves the fields needed for training and verification while removing raw CPG text, source chunks, LLM history, validation history, logs, and checkpoints.

Counts:

Split	Functions	QA examples
Factual	2,759	4,963
Counterfactual	699	4,963

Each record is grouped by executable guideline function. Each scenario contains:

the clinical question;
the generated reasoning and final answer in JSON text;
executable inputs and outputs;
for counterfactual data, X_base, X_hidden, X_change, intervention values, and abduction-stability metadata.
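As a minimal sketch of how the grouped layout might be traversed, the snippet below counts function records and their scenarios. It assumes the top level of each file is a list of function records and that scenarios sit under a key named `scenarios`; both the key name and the toy records are hypothetical, so check data/samples/ for the actual field names before relying on it.

```python
from typing import Any


def count_examples(
    records: list[dict[str, Any]], scenarios_key: str = "scenarios"
) -> tuple[int, int]:
    """Return (number of function records, total number of scenarios)."""
    n_functions = len(records)
    # Records without the scenarios key contribute zero scenarios.
    n_scenarios = sum(len(rec.get(scenarios_key, [])) for rec in records)
    return n_functions, n_scenarios


# Toy stand-in for the released files; the real field names may differ.
toy = [
    {"function_name": "f1", "scenarios": [{"question": "q1"}, {"question": "q2"}]},
    {"function_name": "f2", "scenarios": [{"question": "q3"}]},
]
print(count_examples(toy))  # (2, 3)
```

If the layout assumption holds, passing the result of `json.load` on data/factual.json should give counts matching the table above.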

Quick Validation

pip install -r requirements.txt
python scripts/validate_release.py
python scripts/check_executable_consistency.py

Expected output:

OK factual=4963 counterfactual=4963

The second command re-executes every released factual and counterfactual scenario against its guideline function and checks that the stored oracle output is reproduced.
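The idea behind this check can be sketched as follows. This is an illustration only, not the contents of scripts/check_executable_consistency.py: the helper `is_consistent`, the toy guideline function, and the scenario field names `inputs` and `output` are all assumptions made for the example.

```python
from typing import Any, Callable


def is_consistent(
    func: Callable[..., Any], inputs: dict[str, Any], stored_output: Any
) -> bool:
    """Re-run a guideline function on stored inputs and compare with the stored oracle output."""
    try:
        return func(**inputs) == stored_output
    except Exception:
        # A function that raises on its own stored inputs counts as inconsistent.
        return False


# Hypothetical guideline function, invented for illustration (not clinical advice).
def toy_guideline(age: int, has_risk_factor: bool) -> str:
    if age >= 50 or has_risk_factor:
        return "screen"
    return "no_action"


scenario = {"inputs": {"age": 62, "has_risk_factor": False}, "output": "screen"}
print(is_consistent(toy_guideline, scenario["inputs"], scenario["output"]))  # True
```

Treating exceptions as failures rather than letting them propagate means one malformed scenario is reported as inconsistent instead of aborting a full pass over the dataset.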

Preparing SFT Data

python code/training/prepare_sft_coldstart_dataset.py \
  --task-selection both \
  --use-all \
  --factual-val 0 \
  --factual-test 0 \
  --counterfactual-val 0 \
  --counterfactual-test 0 \
  --out-dir data/sft

The script writes Parquet files for text-only SFT. The default input paths point to data/factual.json and data/counterfactual.json.

Preparing RL Prompts

python code/training/prepare_rl_medical_reasoning_dataset.py \
  --task-selection factual_cot \
  --pool-mode all \
  --out-dir data/rl/factual_cot

The reward implementation is in code/training/medical_reasoning_reward.py. It checks answer correctness, response format, executable consistency, and counterfactual hidden-variable recovery when applicable.
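One way the four checks could be combined into a scalar is sketched below. The equal weighting and the function name `combine_reward` are assumptions for illustration; the actual reward in medical_reasoning_reward.py may weight or gate the components differently.

```python
from typing import Optional


def combine_reward(
    answer_correct: bool,
    format_ok: bool,
    exec_consistent: bool,
    hidden_recovered: Optional[bool] = None,
) -> float:
    """Average the applicable binary checks into a score in [0, 1]."""
    checks = [answer_correct, format_ok, exec_consistent]
    # Hidden-variable recovery only applies to counterfactual prompts;
    # pass None for factual prompts so it is left out of the average.
    if hidden_recovered is not None:
        checks.append(hidden_recovered)
    return sum(checks) / len(checks)


print(combine_reward(True, True, True))  # 1.0
print(combine_reward(True, True, True, hidden_recovered=False))  # 0.75
```

Leaving the counterfactual term out of the average for factual prompts keeps both task types on the same [0, 1] scale.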

The main post-training configuration template is in code/training/configs/post_training.yaml.

Re-running Data Generation

The full pipeline is in code/data_generation/v2_pipeline.py. To run it from raw CPG JSONL:

export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_BASE_URL=...
export GUIDELINE_OPENAI_MODEL=<your-generation-model>

python code/data_generation/v2_pipeline.py \
  --source-jsonl data/source/us_filter.jsonl \
  --output-dir data/generated/us_only_final \
  --max-qa-jobs 4963 \
  --max-no-action-ratio 0.25

Raw CPG source documents are not bundled by default to avoid redistribution and licensing ambiguity. See data/source/README.md for the expected schema.

Release Safety

Do not upload:

model weights or LoRA adapters;
training checkpoints and optimizer states;
private .env files, API keys, or usage logs;
raw MIMIC/benchmark data;
private evaluation outputs that include benchmark case text;
raw CPG source documents unless the release license has been explicitly checked.

The included .gitignore blocks common model and credential files, but it only covers common cases, so run python scripts/validate_release.py before publishing.
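For an extra manual pass, a scan along the following lines can flag files that should not ship. This is a minimal sketch and not the logic of scripts/validate_release.py; the pattern list and the helper name `find_blocked_files` are hypothetical and should be adapted to your own release policy.

```python
import fnmatch
from pathlib import Path

# Hypothetical patterns chosen for illustration; extend as needed.
BLOCKED_PATTERNS = ["*.safetensors", "*.ckpt", "*.pt", "*.bin", "*.env"]


def find_blocked_files(root: str, patterns: list[str] = BLOCKED_PATTERNS) -> list[str]:
    """Return sorted relative paths under root whose file name matches a blocked pattern."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):
        # Match on the file name only, so nested directories are covered too.
        if path.is_file() and any(fnmatch.fnmatch(path.name, pat) for pat in patterns):
            hits.append(path.relative_to(root_path).as_posix())
    return sorted(hits)
```

Calling `find_blocked_files(".")` from the repository root returns the offending paths, or an empty list when none match. Note that `*.env` also matches a bare `.env` file, because `fnmatch` does not treat a leading dot specially.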

Medical Disclaimer

This repository is for research on clinical reasoning models. It is not a medical device and must not be used as a substitute for professional clinical judgment.
