Add improvement notes and helper examples for user-provided optimized script by kokibs · Pull Request #5 · Atlas-45/WebIntro

kokibs · 2026-02-06T08:47:51Z

The user-provided “optimized” training script contains sound ideas but leaves three risks unaddressed: data leakage, redundant features, and high compute cost. Each of these hurts reproducibility or generalization.
Provide a concise, actionable checklist and code snippets so readers can safely apply lag/rolling/EMA features, stabilize CV, and prioritize LightGBM tuning.

Appended a new section "7. 参考: ユーザー提示の最適化スクリプトに対する改善ポイント" ("7. Reference: Improvement points for the user-provided optimized script") to docs/brain_lever_competition_improvements.md. It summarizes risks and recommendations across CV design, feature pruning, time-series handling, model selection, and numerical stability.
Included concrete suggestions and best practices for CV (prefer GroupKFold(sample_id) with optional time-holdout), feature reduction (importance-based pruning), and safer time-series features (lag, diff, EMA, limited rolling windows).
Added example helper functions and code snippets (add_rolling_features, add_ema_features, prepare_features, cv_score, train_and_predict) illustrating how to implement the recommended feature engineering and an LGBM-centered workflow with early stopping and regularization.
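As a rough illustration of the recommendations above, the sketch below builds rolling and EMA features from strictly past rows within each `sample_id` group and scores a model with `GroupKFold`. The function names `add_rolling_features`, `add_ema_features`, and `cv_score` come from this description, but their bodies here are assumptions and not the code added to the docs. The toy frame, the column names `x` and `y`, the window and span values, and the `Ridge` stand-in model are all hypothetical. An `LGBMRegressor` factory could be passed in place of `Ridge` without changing the loop.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold


def add_rolling_features(df, cols, group_col="sample_id", windows=(2,)):
    """Rolling means over strictly past rows, computed within each group."""
    out = df.copy()
    for col in cols:
        # Shift first so a window never sees the current row or another group.
        past = df.groupby(group_col, sort=False)[col].shift(1)
        for w in windows:
            out[f"{col}_rmean{w}"] = past.groupby(df[group_col], sort=False).transform(
                lambda s, w=w: s.rolling(w, min_periods=1).mean()
            )
    return out


def add_ema_features(df, cols, group_col="sample_id", spans=(3,)):
    """Exponential moving averages over strictly past rows, per group."""
    out = df.copy()
    for col in cols:
        past = df.groupby(group_col, sort=False)[col].shift(1)
        for span in spans:
            out[f"{col}_ema{span}"] = past.groupby(df[group_col], sort=False).transform(
                lambda s, span=span: s.ewm(span=span, adjust=False).mean()
            )
    return out


def cv_score(df, feature_cols, target_col, make_model, group_col="sample_id", n_splits=2):
    """Mean RMSE over GroupKFold folds; no group appears in both train and validation."""
    # Ridge cannot handle NaN, so this sketch fills with 0.0 (an assumption);
    # LightGBM accepts NaN inputs and would not need this step.
    X = df[feature_cols].fillna(0.0).to_numpy()
    y = df[target_col].to_numpy()
    groups = df[group_col].to_numpy()
    scores = []
    for tr, va in GroupKFold(n_splits=n_splits).split(X, y, groups):
        assert set(groups[tr]).isdisjoint(groups[va])  # guard against group leakage
        model = make_model().fit(X[tr], y[tr])
        scores.append(mean_squared_error(y[va], model.predict(X[va])) ** 0.5)
    return float(np.mean(scores))


# Hypothetical toy data: four groups of four time-ordered rows each.
toy = pd.DataFrame({
    "sample_id": np.repeat(["a", "b", "c", "d"], 4),
    "x": np.arange(16, dtype=float),
})
toy["y"] = 2.0 * toy["x"]

feats = add_ema_features(add_rolling_features(toy, ["x"]), ["x"])
rmse = cv_score(feats, ["x", "x_rmean2", "x_ema3"], "y", lambda: Ridge(alpha=1.0))
```

Shifting by one step before the rolling or EMA call is what keeps the current row out of its own feature, and grouping on `sample_id` keeps the first row of each group as NaN instead of borrowing values from the previous group. Whether the one-step shift is required depends on which columns are known at prediction time, so treat it as a conservative default.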

No automated tests, CI jobs, or unit tests were run because this is a documentation-only change.
The documentation content was added and formatted in line with the existing file; no runtime code changes were made to production modules.

Add improvement notes for optimized script

ffd836f

kokibs added the codex label Feb 6, 2026 — with ChatGPT Codex Connector
