yuhos16/MedGuideX

MedGuideX

This repository contains the public release artifacts for MedGuideX, a medical LLM project that converts clinical practice guidelines into executable decision logic and then into factual and counterfactual QA supervision.

The release is intentionally scoped to artifacts that are useful to reviewers:

  • final guideline-derived factual QA data;
  • final strict counterfactual QA data;
  • the executable guideline-to-QA construction pipeline;
  • training data preparation utilities for SFT/RL;
  • executable-consistency reward code;
  • lightweight validation scripts.

It intentionally excludes model weights, checkpoints, raw benchmark data, private logs, API keys, and large intermediate pipeline artifacts.

Repository Layout

MedGuideX/
  data/
    factual.json                 # 4,963 factual QA examples, grouped by guideline function
    counterfactual.json          # 4,963 strict counterfactual QA examples
    stats.json                   # release and source pipeline counts
    samples/                     # one sample record per task type
    source/README.md             # raw CPG source schema note
  code/
    data_generation/v2_pipeline.py
    training/
      prepare_sft_coldstart_dataset.py
      prepare_rl_medical_reasoning_dataset.py
      medical_reasoning_reward.py
      medical_sft_dataset.py
  src/
    azure_api.py                 # env-driven LLM client wrapper, no keys included
    azure_openai_judge.py
  scripts/
    create_release_dataset.py
    validate_release.py
    check_executable_consistency.py

Dataset

The released dataset is a cleaned version of the US-only pipeline output. It preserves the fields needed for training and verification while removing raw CPG text, source chunks, LLM history, validation history, logs, and checkpoints.

Counts:

Split            Functions   QA examples
Factual          2,759       4,963
Counterfactual   699         4,963

Each record is grouped by executable guideline function. Each scenario contains:

  • the clinical question;
  • the generated reasoning and final answer, stored as JSON text;
  • executable inputs and outputs;
  • for counterfactual data, X_base, X_hidden, X_change, intervention values, and abduction-stability metadata.
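
As a rough illustration of how the counterfactual fields listed above might relate to one another, the sketch below applies an intervention to a toy guideline function. Everything in it is an assumption made for illustration: `toy_guideline`, `run_counterfactual`, the variable names, and the reading of X_base as observed inputs and X_hidden as unobserved inputs are hypothetical, and the released JSON may use different field names and semantics.

```python
from typing import Any, Callable, Dict


def toy_guideline(inputs: Dict[str, Any]) -> str:
    """Hypothetical stand-in for an executable guideline function."""
    if inputs["age"] >= 65 and inputs["on_anticoagulant"]:
        return "refer"
    return "no_action"


def run_counterfactual(
    fn: Callable[[Dict[str, Any]], str],
    x_base: Dict[str, Any],
    x_hidden: Dict[str, Any],
    intervention: Dict[str, Any],
) -> Dict[str, str]:
    """Execute fn on the factual inputs and on the intervened inputs."""
    # Factual world: observed base inputs plus the hidden variables.
    factual_inputs = {**x_base, **x_hidden}
    # Counterfactual world: overwrite only the intervened variables
    # and keep every other variable, including hidden ones, fixed.
    counterfactual_inputs = {**factual_inputs, **intervention}
    return {
        "factual": fn(factual_inputs),
        "counterfactual": fn(counterfactual_inputs),
    }


result = run_counterfactual(
    toy_guideline,
    x_base={"age": 70},
    x_hidden={"on_anticoagulant": True},
    intervention={"age": 50},
)
print(result)
```

In this toy case the keys of `intervention` play the role of X_change: only `age` differs between the two executions, so any change in the output is attributable to that variable.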

Quick Validation

pip install -r requirements.txt
python scripts/validate_release.py
python scripts/check_executable_consistency.py

Expected output:

OK factual=4963 counterfactual=4963

The check_executable_consistency.py script re-executes every released factual and counterfactual scenario against its guideline function and checks that the stored oracle output is reproduced.
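
The idea behind this kind of check can be sketched in a few lines. This is not the code in scripts/check_executable_consistency.py; the function `bmi_category`, the helper `find_mismatches`, and the keys `inputs` and `expected_output` are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Toy rule used only to demonstrate the check, not clinical guidance."""
    bmi = weight_kg / (height_m ** 2)
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def find_mismatches(
    fn: Callable[..., Any], scenarios: List[Dict[str, Any]]
) -> List[int]:
    """Return indices of scenarios whose stored output is not reproduced."""
    mismatches = []
    for index, scenario in enumerate(scenarios):
        try:
            actual = fn(**scenario["inputs"])
        except Exception:
            # A function that cannot run on its stored inputs also fails.
            mismatches.append(index)
            continue
        if actual != scenario["expected_output"]:
            mismatches.append(index)
    return mismatches


scenarios = [
    {"inputs": {"weight_kg": 70, "height_m": 1.75}, "expected_output": "normal"},
    {"inputs": {"weight_kg": 95, "height_m": 1.75}, "expected_output": "obese"},
    # Deliberately wrong stored output, to show a detected mismatch.
    {"inputs": {"weight_kg": 80, "height_m": 1.80}, "expected_output": "overweight"},
]
mismatches = find_mismatches(bmi_category, scenarios)
print(mismatches)
```

A release would pass such a check only when the list of mismatches is empty for every guideline function.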

Preparing SFT Data

python code/training/prepare_sft_coldstart_dataset.py \
  --task-selection both \
  --use-all \
  --factual-val 0 \
  --factual-test 0 \
  --counterfactual-val 0 \
  --counterfactual-test 0 \
  --out-dir data/sft

The script writes parquet files for text-only SFT. The default input paths point to data/factual.json and data/counterfactual.json.
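
The flag names in the command above suggest per-split holdout sizes, with 0 meaning that no examples are held out. Assuming that reading, a deterministic split could look like the sketch below; `split_examples` and its `seed` parameter are hypothetical and are not taken from prepare_sft_coldstart_dataset.py.

```python
import random
from typing import Any, Dict, List, Sequence


def split_examples(
    examples: Sequence[Any], n_val: int, n_test: int, seed: int = 0
) -> Dict[str, List[Any]]:
    """Shuffle deterministically, then carve out val and test holdouts."""
    if n_val < 0 or n_test < 0:
        raise ValueError("holdout sizes must be non-negative")
    if n_val + n_test > len(examples):
        raise ValueError("holdout sizes exceed the number of examples")
    shuffled = list(examples)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


# With both holdouts set to 0, every example goes to the training split.
splits = split_examples(range(10), n_val=0, n_test=0)
print({name: len(items) for name, items in splits.items()})
```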

Preparing RL Prompts

python code/training/prepare_rl_medical_reasoning_dataset.py \
  --task-selection factual_cot \
  --pool-mode all \
  --out-dir data/rl/factual_cot

The reward implementation is in code/training/medical_reasoning_reward.py. It checks answer correctness, response format, executable consistency, and counterfactual hidden-variable recovery when applicable.
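
To make the four checks concrete, here is a minimal sketch of a composite reward. It does not reproduce medical_reasoning_reward.py: the weights, the response keys `answer`, `extracted_inputs`, and `hidden`, and the definition of executable consistency used here (re-running the function on the inputs the model says it extracted) are all assumptions.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical weights; the released reward may combine terms differently.
WEIGHTS = {"format": 0.5, "answer": 2.0, "consistency": 1.0, "hidden": 1.0}


def score_response(
    response_text: str,
    gold_answer: Any,
    guideline_fn: Callable[..., Any],
    gold_hidden: Optional[Dict[str, Any]] = None,
) -> Dict[str, float]:
    """Score one response; every component is 0.0 or 1.0 before weighting."""
    parts = {name: 0.0 for name in WEIGHTS}
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict) and "answer" in parsed:
        parts["format"] = 1.0
        parts["answer"] = float(parsed["answer"] == gold_answer)
        extracted = parsed.get("extracted_inputs")
        if isinstance(extracted, dict):
            try:
                # Consistent if the function, run on the model's own
                # extracted inputs, yields the model's stated answer.
                executed = guideline_fn(**extracted)
                parts["consistency"] = float(executed == parsed["answer"])
            except Exception:
                pass  # inputs that cannot be executed earn no credit
        if gold_hidden is not None:
            # Only counterfactual items carry a hidden-variable target.
            parts["hidden"] = float(parsed.get("hidden") == gold_hidden)
    parts["total"] = sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
    return parts


def triage(score: int) -> str:
    """Toy guideline function used only for this example."""
    return "escalate" if score >= 3 else "monitor"


response = json.dumps({"answer": "escalate", "extracted_inputs": {"score": 4}})
print(score_response(response, "escalate", triage))
```

Keeping the components separate in the returned dictionary makes it easier to log which check a response failed, rather than seeing only the scalar total.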

The main post-training configuration template is in code/training/configs/post_training.yaml.

Re-running Data Generation

The full pipeline is in code/data_generation/v2_pipeline.py. To run it from raw CPG JSONL:

export AZURE_OPENAI_API_KEY=...
export AZURE_OPENAI_BASE_URL=...
export GUIDELINE_OPENAI_MODEL=<your-generation-model>

python code/data_generation/v2_pipeline.py \
  --source-jsonl data/source/us_filter.jsonl \
  --output-dir data/generated/us_only_final \
  --max-qa-jobs 4963 \
  --max-no-action-ratio 0.25

Raw CPG source documents are not bundled by default to avoid redistribution and licensing ambiguity. See data/source/README.md for the expected schema.
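
Because the pipeline depends on the three environment variables exported above, it can help to fail fast when one is missing. The sketch below shows one way to do that; `load_llm_settings` is a hypothetical helper and does not describe how src/azure_api.py reads its configuration.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_BASE_URL",
    "GUIDELINE_OPENAI_MODEL",
)


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the required settings, or raise listing every missing name."""
    source = os.environ if env is None else env
    # Treat empty strings as missing so a blank export is caught too.
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: source[name] for name in REQUIRED_VARS}


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "AZURE_OPENAI_BASE_URL": "https://example.invalid",
    "GUIDELINE_OPENAI_MODEL": "placeholder-model",
}
settings = load_llm_settings(demo_env)
print(sorted(settings))
```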

Release Safety

Do not upload:

  • model weights or LoRA adapters;
  • training checkpoints and optimizer states;
  • private .env files, API keys, or usage logs;
  • raw MIMIC/benchmark data;
  • private evaluation outputs that include benchmark case text;
  • raw CPG source documents unless the release license has been explicitly checked.

The included .gitignore blocks common model and credential files. Because a .gitignore cannot catch everything, still run python scripts/validate_release.py before publishing.
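
A pre-publication scan of the kind described above can be approximated with a short directory walk. The sketch below is not the logic of scripts/validate_release.py; `find_blocked_files` and the pattern list are hypothetical and should be adapted to the files a given project must never ship.

```python
import fnmatch
import os
from typing import List

# Hypothetical patterns for credential and model files.
BLOCKED_PATTERNS = (".env", "*.safetensors", "*.ckpt", "*.pt", "*.bin")


def find_blocked_files(root: str) -> List[str]:
    """Return sorted paths, relative to root, that match a blocked pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the .git directory in place so os.walk does not descend into it.
        dirnames[:] = [name for name in dirnames if name != ".git"]
        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pat) for pat in BLOCKED_PATTERNS):
                full_path = os.path.join(dirpath, filename)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)


if __name__ == "__main__":
    blocked = find_blocked_files(".")
    for path in blocked:
        print("blocked:", path)
    # A non-zero exit status lets a publishing script stop on any hit.
    raise SystemExit(1 if blocked else 0)
```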

Medical Disclaimer

This repository is for research on clinical reasoning models. It is not a medical device and must not be used as a substitute for professional clinical judgment.
