Refactor code, add auto research and openenv #225

Open
tastelikefeet wants to merge 66 commits into modelscope:main from tastelikefeet:feat/refactor-1

Conversation

@tastelikefeet (Collaborator) commented Jun 15, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support
  • Refactor

PR information

Features

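  1. Auto Research terminal agent: adds the twinkle_client/auto module (AgentLoop + TrainingMonitor + Tools), so training can be controlled entirely from a terminal chat
  2. Skills knowledge injection: adds twinkle_client/skills, with three provider tiers (bundled / local / ModelScope) that automatically inject training-domain knowledge
  3. OpenEnv environment abstraction: adds twinkle_agentic/envs, a unified tool-injection interface across multiple environments, with support for plugging in custom environment packages
  4. ChunkedCrossEntropyLoss optimization: chunked computation lowers peak device memory; fixes the require_entropy logic
  5. Grad Clip refactor: platform-aware gradient clipping (adapts to GPU/NPU/MPS)
  6. Multi-turn RL Cookbook: adds a complete example of multi-turn tool-calling GRPO training
  7. Cookbook directory reorganization: rl/ is split into rl/dpo/, rl/grpo/, and rl/gkd/, each shipped with a .sh launch script
  8. Platform detection module: adds utils/platforms/, a unified device abstraction for GPU/NPU/MPS
  9. Template enhancements: DeepSeekV4 template optimization, tool-call parser refactor, Qwen3.5-VL support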
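The chunked cross-entropy item above can be illustrated with a small sketch. This is not the PR's ChunkedCrossEntropyLoss implementation; it is a dependency-free illustration of the general idea, assuming the memory saving comes from materializing log-probabilities for only one chunk of tokens at a time. The function names and the toy logits are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def cross_entropy_full(logits, targets):
    """Reference version: log-probs for every token exist at the same time."""
    log_probs = [log_softmax(row) for row in logits]
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)


def cross_entropy_chunked(logits, targets, chunk_size=2):
    """Same mean loss, but only `chunk_size` rows of log-probs exist at once."""
    total = 0.0
    for start in range(0, len(targets), chunk_size):
        chunk_logits = logits[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        # Only this chunk's log-probs are alive here; they are dropped
        # before the next chunk is processed.
        chunk_log_probs = [log_softmax(row) for row in chunk_logits]
        total -= sum(lp[t] for lp, t in zip(chunk_log_probs, chunk_targets))
    return total / len(targets)


# Hypothetical toy data: 5 tokens, vocabulary of 3.
logits = [
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
    [-0.5, 1.5, 0.0],
    [3.0, 3.0, 3.0],
    [0.0, -2.0, 1.0],
]
targets = [0, 2, 1, 1, 2]

full = cross_entropy_full(logits, targets)
chunked = cross_entropy_chunked(logits, targets, chunk_size=2)
```

Both functions return the same mean loss up to floating-point rounding; the chunked variant trades a Python-level loop for a smaller peak footprint, which is the trade-off the feature description points at.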

Bug Fixes

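  1. Numerical stability fixes for the DPO / GKD / GRPO losses
  2. Corrected in-batch negative sampling in the InfoNCE loss
  3. Removed redundant require_entropy computation in the Processor
  4. Fixed path handling for Megatron multi-LoRA
  5. Cleaned up legacy code in the vLLM Sampler
  6. Fixed session management in Server state
  7. Handled edge cases in Template tool parsing
  8. Made the IterablePackingDataset packing logic more robust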

Tests

  1. Added tests/loss/: full coverage of CE/MSE, DPO, GRPO/GKD
  2. Added tests/advantage/: unit tests for advantage functions
  3. Added tests/metric/: coverage of all metrics
  4. Added tests/utils/: tests for utility functions
  5. Added tests/twinkle_agentic/test_tools.py: Agentic tool tests

Docs

  1. New bilingual docs: Agentic (Envs/Protocol/Rollout/Tools/Preprocessor), Auto (Auto-Research/SkillProvider), and CLI component documentation

Experiment results

Paste your experiment result here (if needed).

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request cleans up unused imports across several files, including removing time in manager.py, os in mixin.py, and various types in cli.py. It also removes from __future__ import annotations from cli.py and quotes the ConfigRegistry type annotation as a result. The review feedback points out that removing from __future__ import annotations will break compatibility with Python < 3.10 due to the use of PEP 604 union types and PEP 585 generic collections. It is recommended to restore this import, which also allows using ConfigRegistry directly without quotes.
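The point about `from __future__ import annotations` can be checked with a short sketch. With that import, annotations are stored as strings and not evaluated at definition time, which is why newer annotation syntax and unquoted forward references do not raise on older interpreters. The `load` function and the empty `ConfigRegistry` class below are hypothetical stand-ins, not the code in cli.py; the source is run through `exec` only so the future import sits at the top of its own module.

```python
import textwrap

# Hypothetical module source; ConfigRegistry is deliberately defined AFTER
# the function that mentions it, to show that no quoting is needed.
SOURCE = textwrap.dedent('''
    from __future__ import annotations

    def load(name: str | None, registry: ConfigRegistry) -> dict[str, int]:
        return {}

    class ConfigRegistry:
        pass
''')

namespace = {}
exec(SOURCE, namespace)

# With the future import, annotations are kept as plain strings.
annotations = namespace["load"].__annotations__
for param, annotation in annotations.items():
    print(f"{param}: {annotation!r}")
```

Without the future import, the same `def` line would evaluate `ConfigRegistry` at definition time and fail with a NameError here, which is the reason the PR had to quote the annotation once the import was removed.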

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/twinkle/cli/cli.py
Comment thread src/twinkle/cli/cli.py
@tastelikefeet changed the title from "[WIP] Refactor" to "Refactor code, add auto research and openenv" on Jun 28, 2026
…actor-1

# Conflicts:
#	pyproject.toml
#	src/twinkle/model/megatron/megatron.py
@tastelikefeet (Collaborator, Author) commented

/gemini review

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces significant updates to the Twinkle framework, including the addition of a new agentic module (twinkle_agentic) with support for multi-turn rollouts, environment pools, and tool use, alongside comprehensive documentation. It refactors training and sampling cookbooks to use a unified CLI configuration, adds Muon optimizer support, and optimizes the ChunkedCrossEntropyLoss backward pass for memory efficiency. Key feedback highlights several critical issues: a missing batched=True parameter in dataset_index.py that would cause type errors, dangerous in-place mutation of logits in the chunked cross-entropy backward pass, missing L2 normalization for cosine similarity calculations in build_thinking_rag_index.py, and missing process group parameters in EmbeddingMetric's distributed gather. Additionally, improvements are suggested to resolve hardcoded absolute paths in evaluation scripts, correct step counter misalignment on training resume, and prevent negative local pool sizes in EnvPool.
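The remark about missing L2 normalization is easy to demonstrate in isolation. The sketch below is not taken from build_thinking_rag_index.py; it only shows, on hypothetical toy vectors, why a raw dot product is not a cosine similarity unless both vectors are first scaled to unit length.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity as the dot product of L2-normalized vectors."""
    return dot(l2_normalize(a), l2_normalize(b))


# Hypothetical embeddings: one short vector pointing the same way as the
# query, and one long vector pointing in a different direction.
query = [1.0, 0.0]
short_aligned = [0.5, 0.0]
long_off_axis = [3.0, 4.0]

raw_scores = {
    "short_aligned": dot(query, short_aligned),
    "long_off_axis": dot(query, long_off_axis),
}
cosine_scores = {
    "short_aligned": cosine_similarity(query, short_aligned),
    "long_off_axis": cosine_similarity(query, long_off_axis),
}
```

On these vectors the raw dot product ranks the long, off-axis vector first purely because of its magnitude, while the normalized score ranks the aligned vector first. An index that stores unnormalized embeddings and scores by inner product would show the same bias.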

Comment thread cookbook/exp/embedding/dataset_index.py Outdated
Comment thread src/twinkle/loss/chunked_cross_entropy.py
Comment thread cookbook/exp/condenser/untested/eval_condensed_compressed.sh
Comment thread cookbook/exp/condenser/untested/eval_condensed_native.sh
Comment thread cookbook/exp/embedding/build_thinking_rag_index.py
Comment thread src/twinkle/metric/embedding.py
Comment thread cookbook/exp/embedding/train_embedding_full_ddp.py
Comment thread src/twinkle_agentic/envs/openenv.py