diff --git a/images/models/pinned-and-baseline-runs/baseline-run-visual-distinction.png b/images/models/pinned-and-baseline-runs/baseline-run-visual-distinction.png new file mode 100644 index 0000000000..0bdec9d9f2 Binary files /dev/null and b/images/models/pinned-and-baseline-runs/baseline-run-visual-distinction.png differ diff --git a/images/models/pinned-and-baseline-runs/baseline_run_deltas.png b/images/models/pinned-and-baseline-runs/baseline_run_deltas.png new file mode 100644 index 0000000000..b033d4210d Binary files /dev/null and b/images/models/pinned-and-baseline-runs/baseline_run_deltas.png differ diff --git a/ja/models/runs/compare-runs.mdx b/ja/models/runs/compare-runs.mdx index a49dfe20c6..e8d04431c7 100644 --- a/ja/models/runs/compare-runs.mdx +++ b/ja/models/runs/compare-runs.mdx @@ -2,7 +2,6 @@ title: run の固定と比較 description: pinned や baseline を使用して重要な run を追跡し、モデルの実験( Experiments )を効率的に評価する方法について学びます。 --- -import CloudOnly from '/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx'; import PinRunsCondensed from '/snippets/en/_includes/pinned-and-baseline-runs/pin-runs-condensed.mdx'; 多数の Runs が存在する Workspace では、最高のパフォーマンスを出したモデル、プロダクション環境のモデル、失敗した Experiments 、あるいは重要な参照ポイントを把握し続けるのが難しくなることがあります。W&B App は、Runs を整理して比較するのに役立つ以下の機能を提供しています。 @@ -15,7 +14,6 @@ import PinRunsCondensed from '/snippets/en/_includes/pinned-and-baseline-runs/pi - 実験中に複数の候補モデルを追跡する。 - 新しい Runs がこれまでの最高結果を改善しているかどうかを評価する。 - [制限事項](#limitations) を参照してください。 ## Pin runs @@ -171,11 +169,10 @@ Baseline の指定を解除するには: {/* TODO screenshot */} ## Limitations - 以下の機能は、現在 Pinned runs および Baseline runs ではサポートされていません。 - **Grouping**: Run セレクターや Runs テーブルで [Runs を表示](/models/runs#view-logged-runs) する際、列でグループ化されている場合、Pinned runs や Baseline runs は他の Runs と視覚的に区別されません。 - **Reports**: [W&B Report](/models/reports) 内の Run set では、Pinned runs や Baseline runs は他の Runs と視覚的に区別されません。 - **Workspace 表示のみ**: 個別の Run の Workspace を表示している場合、Baseline は表示されません。 -- **折れ線グラフのみ**: Baseline 
との比較は折れ線グラフ(line plots)でのみ利用可能であり、棒グラフやメディアパネルなどの他のパネルではまだ利用できません。 \ No newline at end of file +- **折れ線グラフのみ**: Baseline との比較は折れ線グラフ(line plots)でのみ利用可能であり、棒グラフやメディアパネルなどの他のパネルではまだ利用できません。 diff --git a/ko/models/runs/compare-runs.mdx b/ko/models/runs/compare-runs.mdx index 5bca32fe24..40b9739cea 100644 --- a/ko/models/runs/compare-runs.mdx +++ b/ko/models/runs/compare-runs.mdx @@ -3,7 +3,6 @@ title: Run 고정 및 비교 description: Pinned Runs와 베이스라인 (baseline) 을 사용하여 중요한 Runs를 추적하고 모델 Experiments를 효율적으로 평가하는 방법을 알아보세요. --- -import CloudOnly from '/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx'; import PinRunsCondensed from '/snippets/en/_includes/pinned-and-baseline-runs/pin-runs-condensed.mdx'; 많은 Run 이 포함된 Workspace 에서는 성능이 가장 좋은 모델, 프로덕션 모델, 실패한 실험 또는 중요한 기준점을 추적하기 어려울 수 있습니다. W&B App은 Run 을 구성하고 비교하는 데 도움이 되는 기능을 제공합니다: @@ -16,7 +15,6 @@ import PinRunsCondensed from '/snippets/en/_includes/pinned-and-baseline-runs/pi - 실험 과정에서 여러 후보 모델을 추적할 때. - 새로운 Run 이 기존의 최상위 결과를 개선하는지 평가할 때. - [제한 사항](#제한-사항)을 참조하세요. ## Run 고정 (Pin runs) @@ -172,11 +170,10 @@ baseline run 은 해당 Run 이 로그를 기록한 메트릭에 대한 라인 {/* TODO screenshot */} ## 제한 사항 - 고정된 Run 및 baseline run 에 대해 다음 기능은 아직 지원되지 않습니다: - **그룹화 (Grouping)**: Run 선택기 또는 Run 테이블에서 [Run을 볼 때](/models/runs#view-logged-runs), Run 이 특정 컬럼으로 그룹화되어 있으면 고정된 Run 과 baseline run 이 다른 Run 과 시각적으로 구별되지 않습니다. - **Reports**: [W&B Report](/models/reports)의 Run 세트 내에서 고정된 Run 과 baseline run 은 다른 Run 과 시각적으로 구별되지 않습니다. - **Workspace 뷰 전용**: 단일 Run 의 Workspace 를 볼 때는 baseline 이 나타나지 않습니다. -- **라인 플롯 전용**: baseline 비교는 라인 플롯에서만 가능하며, 바 차트나 미디어 패널 등 다른 패널에서는 아직 사용할 수 없습니다. \ No newline at end of file +- **라인 플롯 전용**: baseline 비교는 라인 플롯에서만 가능하며, 바 차트나 미디어 패널 등 다른 패널에서는 아직 사용할 수 없습니다. 
diff --git a/models/runs/compare-runs.mdx b/models/runs/compare-runs.mdx index ffb2abb25e..7c4b8c3e4b 100644 --- a/models/runs/compare-runs.mdx +++ b/models/runs/compare-runs.mdx @@ -2,20 +2,21 @@ description: Learn how to use pinned and baseline runs to keep track of important runs and efficiently evaluate model experiments. title: Pin and compare runs --- -import CloudOnly from '/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx'; import PinRunsCondensed from '/snippets/en/_includes/pinned-and-baseline-runs/pin-runs-condensed.mdx'; In a workspace with many runs, it can be difficult to keep track of your best performers, production models, failed experiments, or important reference points. The W&B App provides features to help organize and compare runs: - **Pinned runs**: Pin up to 6 runs to keep them visible in the workspace and at the top of the runs list. If you have a baseline run, you can pin up to 5 runs because the baseline is implicitly pinned. -- **Baseline run**: Specify a baseline run as your reference point for comparisons. The baseline run is always visible in the workspace and at the top of the runs list. In line plots, the baseline appears with visually distinct styling to help with comparison. +- **Baseline run**: Specify a baseline run as your reference point for comparisons. The baseline run is always visible in the workspace and at the top of the runs list. In the runs table, summary metric deltas show how each run compares to the baseline. In line plots, the baseline appears with visually distinct styling to help with comparison. + +![Line plot with baseline and pinned runs](/images/models/pinned-and-baseline-runs/baseline-run-visual-distinction.png) These features are particularly useful for: - Comparing new experiments against your production model. - Tracking multiple candidate models during experimentation. - Evaluating whether new runs improve on your best results. - See [Limitations](#limitations). 
## Pin runs @@ -85,6 +86,32 @@ The baseline run is always visible in line plots for metrics the run has logged. ![Demo of comparing another run with the baseline](/images/models/pinned-and-baseline-runs/line-plot-baseline-comparison.png) +### Summary metric deltas +When a run is set as the baseline, by default every other run that logs the same summary metric as the baseline run shows the delta (the amount of change) of that metric from the baseline. The delta appears to the right of the metric's value in the run's row in the runs table. + +By default, the delta is shown with dark gray text on a light gray background. To turn on semantic coloring for quick visual reference, you can set the **Metric directionality** for a column. With directionality set: + +- If the other run **outperforms** (is directionally better than) the baseline, the delta is shown in dark red text with a light red background. +- If the other run **underperforms** (is directionally worse than) the baseline, the delta is shown in dark teal text with a light teal background. + +To set the directionality for a metric: + +1. In the runs table, hover over the column heading for the metric. +2. Click the `...` action menu that appears. +3. Set **Metric directionality** to **Higher values are best** or **Lower values are best**. + +The following screenshot shows how the runs `nanochat-train-base` and `nanochat-train-mid` compare with the baseline run `nanochat-train`. Delta metrics are shown for `TOTAL_TRAINING_TIME`, `TRAIN/DT`, and `TRAIN/GRAD_NORM`. +![Screenshot comparing summary metric deltas from the baseline run](/images/models/pinned-and-baseline-runs/baseline_run_deltas.png) + +## Hide summary metric deltas in a workspace +By default, a workspace with a baseline run always displays summary metric deltas. To hide them for a workspace: + +1. In the workspace, click **Settings**. +1. In the drawer that appears, click **Runs**. +1. 
In the **Baseline** tab, toggle **Show value deltas in the runs table**. +1. Close the workspace settings drawer. + + ## Use cases This section describes some scenarios where pinned and baseline runs can help guide your experiments. @@ -163,13 +190,14 @@ This section illustrates how pinned and baseline runs can help you to compare ru After running this code, your workspace has three runs. 2. Set `baseline-config` as your baseline run. 3. Pin `baseline-config` to keep it visible. -4. Compare the experiment runs against the baseline using the line plots in the workspace. +4. Compare the experiment runs against the baseline. + - In the runs table, review the summary metric deltas next to each run's values to compare the run to the baseline. + - In line plots, compare the performance of one or more runs to the baseline, which is always visible. 5. Pin promising experiments for further investigation. In this example, after 50 epochs, `lr-experiment-0.003` has the highest accuracy (`~0.64`) and the lowest loss (`~0.86`). {/* TODO screenshot */} ## Limitations - The following features are not yet supported for pinned and baseline runs: diff --git a/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx b/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx deleted file mode 100644 index 292acf459c..0000000000 --- a/snippets/en/_includes/pinned-and-baseline-runs/cloud-only.mdx +++ /dev/null @@ -1,3 +0,0 @@ - -Pinned and baseline runs are available for [W&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) only. -
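For reference, the three-run setup that the patched use-case section describes (a `baseline-config` run plus learning-rate experiments such as `lr-experiment-0.003`) can be sketched with the W&B Python SDK. This is an illustrative assumption, not the docs' own code: the project name and the simulated loss/accuracy curves are invented for demonstration.

```python
# Hedged sketch of the docs' use case: log one baseline run plus two
# learning-rate experiments so baseline deltas can be compared in the UI.
# The project name and toy training curves below are assumptions.
import math


def simulate_training(lr: float, epochs: int = 50) -> list[dict]:
    """Toy curves: loss decays toward a floor and accuracy rises; a larger
    learning rate makes both converge faster in this simulation."""
    history = []
    for epoch in range(1, epochs + 1):
        decay = math.exp(-lr * epoch * 40)  # progress factor, shrinks over time
        history.append({
            "epoch": epoch,
            "loss": 1.2 * decay + 0.85,        # approaches ~0.85
            "accuracy": 0.65 * (1.0 - decay),  # approaches ~0.65
        })
    return history


def log_runs() -> None:
    # Imported lazily: requires `pip install wandb` and `wandb login`.
    import wandb

    for name, lr in [("baseline-config", 0.001),
                     ("lr-experiment-0.003", 0.003),
                     ("lr-experiment-0.01", 0.01)]:
        run = wandb.init(project="pin-compare-demo", name=name,
                         config={"learning_rate": lr})
        for row in simulate_training(lr):
            run.log(row)
        run.finish()
```

After `log_runs()` completes, you could set `baseline-config` as the baseline in the workspace and read the summary metric deltas for `loss` and `accuracy` in the other two runs' rows.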