MetaDescription: Profiling Quickstart in AI Toolkit.
---

# Profiling an app using Windows Machine Learning

Profiling is a tool designed to help developers and AI engineers diagnose CPU, GPU, and NPU resource usage of processes, profile ONNX models on different execution providers, and capture Windows ML events.

In this article, you learn how to start profiling and how to inspect the resource usage view and the events view.

## Prerequisites

## Profile on app startup

In this mode, the profiling tool profiles the next app that starts and emits Windows ML events.
This option is ideal for testing a run-once app. In this case, you start profiling, then run the app, and the resource usage will appear.



The tool starts profiling a newly started app. This means that to profile a Python notebook whose kernel is already running, you need to restart the kernel to begin profiling. Just starting a new notebook does not automatically start profiling.

This option is ideal for profiling an app that is already running and you're unable to restart it for profiling purposes.



## Profile an ONNX model

In this mode, the profiling tool starts profiling an ONNX model file on a target execution provider (EP) for a given duration. You can see the resource usage while it's running.

This option is ideal for profiling an ONNX model on different EPs.



After profiling, a notification appears that guides you to open or save the report.

The report contains detailed profiling statistics and results for the ONNX model.



### Benchmark time for each operation

If OP Profiling is enabled, op-level data is generated to allow you to inspect the model in more detail.



The report contains detailed latencies for each op.



## Profile an ONNX GenAI model

In this mode, the profiling tool starts profiling an ONNX GenAI model on a target execution provider (EP) for a specified number of prompts. You can see the resource usage while it's running.



> [!NOTE]
> You need to select the folder of the GenAI model; this is the folder that contains `genai_config.json`.
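
As a quick sanity check before picking a folder, you can verify it is a GenAI model folder in the sense described above, that is, it directly contains `genai_config.json`. The helper name and the temporary folder below are illustrative, not part of the tool:

```python
import os
import tempfile

def is_genai_model_folder(folder: str) -> bool:
    """Return True if the folder directly contains genai_config.json."""
    return os.path.isfile(os.path.join(folder, "genai_config.json"))

# Illustrative usage: a temporary directory stands in for a downloaded model folder.
with tempfile.TemporaryDirectory() as model_dir:
    open(os.path.join(model_dir, "genai_config.json"), "w").close()
    print(is_genai_model_folder(model_dir))  # True
```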

## Resource Usage view

In the main window, the top plot shows usage of CPU, GPU, NPU, and memory. The usage is updated every second and kept for 10 minutes. You can use the tools on the top right to navigate the timeline by zooming in, zooming out, and panning.



> [!NOTE]
> This feature uses performance counters. To achieve higher accuracy, you can also try [Windows Performance Recorder](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder).

## Windows ML Events view

In the main window, the plot on the bottom shows Windows ML events. Its timeline is synchronized with the Resource Usage view, so you can easily determine how resources are used when certain events occur.

> [!IMPORTANT]
> To receive Windows ML events, the tool needs to run in admin mode. If VS Code is not started in admin mode, a notification appears and guides you to restart VS Code. You need to close all other VS Code instances for the restart in admin mode to work.
> 

Currently, only the following event types are shown:

- Ensure ExecutionProvider Ready: when Windows ML is preparing the EP
- Session Creation: when the session is created
- Inference: when the model runs inference on the session

