Pivot to Steiner translation SFT training on RunPod #230
Open

nirdrang wants to merge 50 commits into VectifyAI:main from
Conversation
Installs project-level Claude Code skills for working with RunPod infrastructure: flash (serverless GPU/CPU deployment SDK/CLI) and runpodctl (pod/endpoint/volume management CLI). https://claude.ai/code/session_012ryUKq1DUyU68zCTfdtBp4
…ion-WGq9u Add RunPod flash and runpodctl Claude Code skills
Automatically installs runpodctl CLI and configures the API key from .env on remote Claude Code sessions. https://claude.ai/code/session_01Sng6PreXjajcMAQLc5WYkR
Clearing out PageIndex-specific code to repurpose the repo. https://claude.ai/code/session_01Sng6PreXjajcMAQLc5WYkR
openai 1.101.0 -> 2.30.0, python-dotenv 1.1.0 -> 1.2.2, tiktoken 0.11.0 -> 0.12.0 https://claude.ai/code/session_01Sng6PreXjajcMAQLc5WYkR
Add session-start hook for runpodctl setup
Steiner English-to-Hebrew translation batch (200 requests) using fine-tuned GPT-4.1 model. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Second batch file (file-9qv2F9EJLyrACDQZNc6cy9) - same 200 translation requests with shorter instructions. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
…on scripts - glossary.json: 179 English→Hebrew anthroposophical term mappings extracted from batch instructions - create_batch_gpt54mini.py: creates and submits GPT-5.4 mini batch jobs (with/without glossary) - run_eval.py: Vertex AI evaluation (COMET, BLEU, MetricX) + terminology recall + composite score - Updated requirements.txt with google-cloud-aiplatform and pandas https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Downloaded from OpenAI (file-R5q3PdQWzjEXm5FLRmRGU5). https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Selected 200 paragraphs from gpt41_full_train.jsonl that:
- Are NOT in gpt41_3k_priority.jsonl or gpt41_20k_val.jsonl
- Match the original 200 eval paragraphs in length and glossary term distribution
- Include both English source and Hebrew reference translations

Also adds gpt41_full_train.jsonl (20,281 training examples). https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
- eval_source.jsonl: 200 English paragraphs (custom_id + source)
- eval_reference.jsonl: 200 Hebrew reference translations (custom_id + reference)

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
gpt41_full_train.jsonl: 20,281 → 20,081 lines. The 3k priority and val files already had zero overlap. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
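The overlap-removal step amounts to filtering the training JSONL against the held-out IDs. A minimal sketch, assuming each line carries the `custom_id` key used by the eval files (the `dedup_jsonl` helper is illustrative, not the actual script):

```python
import json

def dedup_jsonl(train_lines: list, held_out_lines: list) -> list:
    """Drop any training line whose custom_id appears in a held-out line."""
    held_ids = {json.loads(line)["custom_id"] for line in held_out_lines}
    return [line for line in train_lines
            if json.loads(line)["custom_id"] not in held_ids]
```

In the commit above, this kind of filter removed the 200 eval paragraphs from the 20,281-line training file.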
gpt41_full_train.jsonl → gpt41_20k_train.jsonl gpt41_3k_priority.jsonl → gpt41_3k_train.jsonl gpt41_20k_val.jsonl → gpt41_val.jsonl https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
steiner_val.jsonl: 500 → 50 steiner_20k_train.jsonl: 20,081 → 20,531 https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
MetricX auto-detects the reference column and uses METRICX_24_SRC_REF when it's present in the dataset DataFrame. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
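The auto-detection described above reduces to a column check. A hedged sketch (the source-only fallback name `METRICX_24_SRC` is an assumption about Vertex AI's QE variant naming; only `METRICX_24_SRC_REF` is confirmed by the commit):

```python
def metricx_version(columns: list) -> str:
    """Pick the reference-based MetricX variant when the eval
    DataFrame has a reference column, else the source-only variant."""
    return "METRICX_24_SRC_REF" if "reference" in columns else "METRICX_24_SRC"
```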
Uses eval_source.jsonl (200 paragraphs with reference translations available). https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
The EvalTask wrapper has a parsing bug for COMET/MetricX results. Now uses EvaluationServiceClient directly with explicit language params:
- COMET_22_SRC_REF (en→he)
- METRICX_24_SRC_REF (en→he)
- BLEU

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Replaces direct API calls with proper SDK usage:
- pointwise_metric.Comet(source_language='en', target_language='he')
- pointwise_metric.MetricX(source_language='en', target_language='he', version='METRICX_24_SRC_REF')

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
COMET: 0.8262, BLEU: 0.1612, MetricX: 3.6761, Term recall: 0.8667, Composite: 0.7439 https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
COMET: 0.8212, BLEU: 0.1504, MetricX: 3.6931, Term recall: 0.7400, Composite: 0.7280 https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
- sft/axolotl_config_3k.yaml: LoRA config for GPT-OSS-120B with MXFP4
- sft/run_sft.py: Full orchestration (create pod, upload data, train, infer, cleanup)
- eval_results_with_glossary.csv: GPT-5.4 mini eval results

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
…script
- sft/axolotl_config_3k.yaml: LoRA config for GPT-OSS-120B (Mxfp4Config + dequantize, lora r=8, sample_packing, 1 epoch)
- sft/run_sft.py: full orchestrator (create pod, upload data, train, infer, cleanup) via Jupyter kernel API
- sft/benchmark_sft.py: SFT speed benchmark on a single GPU (~30 steps), measures samples/sec and steps/sec

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
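The knobs named above map onto a small Axolotl YAML fragment along these lines. This is an illustrative sketch, not the actual sft/axolotl_config_3k.yaml; key names follow Axolotl's config schema, and any value not stated in the commit message is an assumption:

```yaml
# Illustrative fragment only; dataset paths and batch settings omitted.
base_model: openai/gpt-oss-120b
adapter: lora
lora_r: 8              # rank from the commit message
sample_packing: true
num_epochs: 1
```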
Skips pod creation; runs benchmark on already-running pods via Jupyter. Supports parallel benchmarking of multiple pods. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Splits setup metrics into:
- axolotl_install_s
- model_download_s (snapshot_download from HF)
- train_time_s

Also skips the pod uptime check (often inaccurate); polls Jupyter directly. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
The Jupyter websocket has a 10 MB message limit; the 3k train file is 10 MB raw / 13 MB base64-encoded. Switching to PUT /api/contents/<path> avoids the websocket message limit. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
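The upload path described above can be sketched by building the JSON body for Jupyter's Contents API PUT endpoint. The body shape (`type`/`format`/`content`) follows the documented Contents API; the helper name is illustrative and any server URL or token would be a placeholder:

```python
import base64
import json

def contents_put_payload(data: bytes) -> str:
    """Build the JSON body for PUT /api/contents/<path> on a
    Jupyter server, sending the file as base64 content."""
    return json.dumps({
        "type": "file",
        "format": "base64",
        "content": base64.b64encode(data).decode("ascii"),
    })
```

Unlike a kernel websocket message, an HTTP PUT of this payload is not subject to the 10 MB message cap.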
- pip install axolotl without [flash-attn] extras (compilation often fails)
- Use plain string concatenation for the download script instead of a multiline triple-quoted string

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
…xolotl The base pytorch image has debian-packaged cryptography without a RECORD file, so pip cannot uninstall it. Installing with --ignore-installed bypasses this. https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
Key additions to the SFT tuning plan:
- Lever exploration methodology with 3-stage evaluation (eval_loss → pairwise → composite)
- 3k vs 20k lever placement: data-shape levers (Tier 4) run on the cheap 3k subset, hyperparameter sweeps (Tier 1) on 20k only
- num_epochs changed from a hardcoded 2 to TBD via a schedule shootout (1, 2, 3 tested empirically)
- Successive halving for family sweeps (LR, rank)
- Proposed 15-experiment sequence with cost estimates

https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb
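The successive-halving idea from the plan can be sketched generically: score every candidate at a small budget, keep the top half, and repeat with a doubled budget. The helper below is an illustration with a stand-in scoring function, not the plan's actual orchestration code:

```python
def successive_halving(candidates: list, score, rounds: int = 2) -> list:
    """Keep the top half of candidates each round, doubling the
    evaluation budget (e.g. training steps) between rounds."""
    survivors = list(candidates)
    budget = 1
    for _ in range(rounds):
        ranked = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2
    return survivors
```

For an LR sweep, `candidates` would be learning rates and `score` a cheap proxy such as eval_loss after `budget` training steps.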
Summary
This PR pivots the repository from a document indexing/RAG tool (PageIndex) to an English→Hebrew Steiner text translation project using supervised fine-tuning (SFT) on RunPod infrastructure.
Key Changes
Removed:

- pageindex/ package and run_pageindex.py

Added:

- steiner_3k_train.jsonl, steiner_20k_train.jsonl, steiner_val.jsonl for English→Hebrew translation pairs
- sft/run_sft.py — RunPod pod lifecycle management (create, train, infer, cleanup)
- sft/benchmark_pod.py and sft/benchmark_sft.py for GPU performance measurement
- sft/axolotl_config_3k.yaml — LoRA fine-tuning on openai/gpt-oss-120b with MXFP4 quantization
- run_eval.py for translation quality assessment using Vertex AI (COMET, BLEU, MetricX) + terminology recall
- create_batch_gpt54mini.py for OpenAI batch API submission
- glossary.json with 180+ Steiner-specific terminology mappings (English→Hebrew)
- .claude/plans/lever-exploration-plan.md (1590 lines) detailing the SFT approach, memory model, and training strategy
- .claude/skills/flash/SKILL.md and .claude/skills/runpodctl/SKILL.md for deployment automation
- .claude/settings.json, .claude/hooks/session-start.sh, CLAUDE.md for Claude Code environment setup

Modified:

- requirements.txt: Replaced PageIndex dependencies (pymupdf, PyPDF2, tiktoken) with the translation/evaluation stack (openai 2.30.0, google-cloud-aiplatform, pandas)
- .gitignore: Added gcp-sa-key.json and sft/benchmark_results.json

Implementation Details
https://claude.ai/code/session_01QT56nFPcmWN9mgE8aEbnLb