Skip to content

Profiling#30

Merged
Redth merged 10 commits intoRedth:mainfrom
michalpobuta:Performance_v2
Mar 8, 2026
Merged

Profiling#30
Redth merged 10 commits intoRedth:mainfrom
michalpobuta:Performance_v2

Conversation

@michalpobuta
Copy link
Copy Markdown
Contributor

This pull request introduces a comprehensive runtime profiling system for the MAUI DevFlow agent, enabling performance sampling, UI event tracing, and deep interaction analysis. The changes add new agent options for profiling, define contracts and interfaces for profiler data, and implement efficient ring buffer and session management for profiling data. The README is updated to document these new features and endpoints.

Profiling system additions:

  • Added new agent options in AgentOptions to enable profiling, configure sampling intervals, and control memory usage for samples, markers, and spans, as well as UI hook granularity.
  • Defined profiler contracts and data structures (ProfilerSessionInfo, ProfilerSample, ProfilerMarker, ProfilerSpan, ProfilerBatch, ProfilerHotspot, etc.) in ProfilerContracts.cs to standardize profiling data and requests.
  • Introduced interfaces for profiler collectors, marker publishers, and native frame stats providers to support extensible profiling and platform-specific frame timing. [1] [2] [3]

Profiling data storage and management:

  • Implemented a generic ring buffer (ProfilerRingBuffer<T>) for efficient, overwrite-on-full storage of profiling samples, markers, and spans.
  • Added a session store (ProfilerSessionStore) to manage active profiling sessions, aggregate and batch profiling data, and compute operation hotspots for performance analysis.

Documentation and API updates:

  • Updated README.md to document new agent options, explain profiler behavior, and list new profiling API endpoints for capabilities, session control, data polling, and hotspot analysis. [1] [2]

michalpobuta and others added 5 commits March 8, 2026 15:46
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@Redth Redth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #30 Review: Profiling System

Architecture is solid — the session/ring-buffer/collector/provider pattern with cursor-based polling is well-designed. The separation between Agent.Core (platform-agnostic) and platform INativeFrameStatsProvider implementations follows existing project conventions nicely. Here are the substantive findings:


🔴 Platform Native API Gaps

1. Android: Missing FrameMetrics API — The PR uses Choreographer.IFrameCallback which only measures inter-frame deltas (wall time between vsync callbacks). Android has had Window.OnFrameMetricsAvailableListener since API 24 that provides GPU-decomposed frame timings (layout, draw, sync, command issue, swap, total). This is far more actionable for diagnosing why frames are janky — not just that they are. The Choreographer approach cannot distinguish between a slow layout pass and a slow GPU draw.

2. iOS/Mac: CADisplayLink.Timestamp measures cadence, not render time. It tells you the vsync interval, not how long the app spent processing the frame. iOS has CAMetalLayer.nextDrawable timing and MetricKit (MXMetricManager) for frame hitches. On iOS 16+, MXSignpostMetric can give per-interaction render costs. The current implementation will report 60fps even when the GPU is dropping frames, as long as the main thread returns to the run loop on time.

3. Windows: No native provider at allNativeFrameStatsProviderFactory returns null for Windows. WinUI3 has CompositionTarget.Rendering which fires on each frame render, and DispatcherTimer can measure UI thread responsiveness. The Compositor.RequestCommitAsync() pattern can measure composition frame timing.

4. No native memory trackingGC.GetTotalMemory(false) only shows managed heap. Android has Debug.NativeHeapSize/Debug.NativeHeapAllocatedSize, iOS has task_info(mach_task_self()) for resident memory. For a MAUI app, native memory often dominates (images, platform views).


🟡 Design Concerns

5. Duplicate DTOs in DriverProfilerSample, ProfilerMarker, ProfilerSpan, etc. are fully duplicated between Agent.Core.Profiling and MauiDevFlow.Driver (with [JsonPropertyName] attributes). These will drift over time. Consider sharing via a contracts package, or at minimum having the Driver types reference the Core types.

6. Process.Refresh() called multiple times per sampleTryCollect calls _process.Refresh() implicitly through both TryReadCpuPercent and TryReadThreadCount. This is a syscall each time. Call it once at the start of TryCollect and cache the result.

7. Profiler loop → UI thread thrashingRunProfilerLoopAsync runs on a thread pool thread, but EnsureAutoUiHooks dispatches to the UI thread every iteration (throttled to 1s). This creates a lot of cross-thread marshaling. Consider running the hook-scanning part on a DispatcherTimer instead, or batching the UI thread work less frequently.

8. FrameStatsAccumulator list trimming — Uses List<double> with RemoveRange(0, n) to cap at 720 entries. This is O(n) copy on every trim. A circular buffer (like ProfilerRingBuffer already in this PR) would be more efficient for the accumulator too.

9. No unhooking of element events on stopAttachButtonHook, AttachCollectionViewHook, etc. subscribe to events, but StopAutoUiHooks only unhooks Shell navigation. If profiling restarts, the ConditionalWeakTable check (TryRegisterUiHook) prevents double-attach for live elements, but for elements that were removed and re-added to the tree, stale event handlers could accumulate. Consider tracking subscriptions more explicitly or unhooking on stop.


🟢 Good Patterns

  • ConditionalWeakTable for hook tracking prevents memory leaks — good choice.
  • Scroll batching with 220ms debounce and version-check flush is well-implemented.
  • Capability reporting via BuildProfilerCapabilitiesPayload with graceful degradation when native providers fail.
  • Cursor-based batch retrieval for incremental polling is the right pattern (avoids re-sending old data).
  • ThrowingNativeProvider test verifies graceful fallback — nice defensive test.

📋 Missing Pieces (for follow-up)

  • No CLI commands for profiling yet (start/stop/samples/hotspots)
  • No WebSocket streaming for real-time profiler data (like network monitoring already has)
  • No SKILL.md updates for profiling workflow
  • BuildProfilerCapabilitiesPayload is called on every /api/status response — could be lazily cached

Summary

The biggest wins would be: (1) Android FrameMetrics API for actionable GPU-decomposed timings, (2) native memory tracking on Android/iOS, and (3) a Windows frame stats provider. The estimated FPS from CADisplayLink timestamps will give misleadingly good numbers when the real bottleneck is GPU render time rather than main thread stalls.

michalpobuta and others added 2 commits March 8, 2026 17:03
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@Redth Redth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review (after 7e18bbe + 3d14f55)

Good progress addressing the feedback. Here's a thorough analysis of the updated code:

✅ Issues Addressed Well

1. Android FrameMetrics (was 🔴#1)AndroidFrameMetricsStatsProvider using Window.IOnFrameMetricsAvailableListener with FrameMetricsId.TotalDuration is the right call. The fallback to Choreographer for API <24 is clean. The second commit (3d14f55) correctly fixed the null Handler crash — AddOnFrameMetricsAvailableListener requires a non-null Handler on many devices.

2. Windows provider (was 🔴#3)WindowsCompositionFrameStatsProvider using CompositionTarget.Rendering + RenderingEventArgs.RenderingTime is correct and properly casts args to RenderingEventArgs to reject stale events. Good.

3. Native memory (was 🔴#4) — Android uses Debug.NativeHeapAllocatedSize (true native heap), iOS/Windows use Process.WorkingSet64 minus managed. Good that Android gets the real native heap. The WorkingSet64 - managedBytes fallback for other platforms is a reasonable approximation.

4. Process.Refresh() consolidation (was 🟡#6)TryRefreshProcessSnapshot() called once, result threaded through to TryReadCpuPercent, TryReadThreadCount, and TryReadNativeMemoryBytes. Clean.

5. FrameStatsAccumulator (was 🟡#8) — Switched from List<double> with RemoveRange(0, n) to Queue<double> with Dequeue(). O(1) trim. Good.

6. UI hook unhooking (was 🟡#9)_uiHookUnsubscribers list + generation-based invalidation in StopAutoUiHooks is a solid approach. The _uiHookGeneration check in TryRegisterUiHook ensures stale hooks are cleared on profiler restart. All Attach*Hook methods now pass an unsubscribe lambda. Well done.

7. DTO alignment test (was 🟡#5)ProfilerContractModels_StayAlignedWithDriverModels using reflection to assert Core properties exist in Driver is exactly the right mitigation for the duplicate DTO problem. This will catch drift at build time.

8. UI hook scan interval (was 🟡#7) — Increased from 1s to 3s (UiHookScanIntervalMs = 3000). Reduces cross-thread marshaling pressure.

9. ProvidesExactFrameTimings property — New interface member distinguishes Android FrameMetrics (exact total render pipeline time) from cadence-based providers (CADisplayLink, CompositionTarget, Choreographer). FrameQuality now reports "native.exact" vs "native.cadence". Good for consumer confidence scoring.


🟡 Remaining Observations

10. Duplicated helper methods within the same fileTryReadAndroidNativeMemoryBytes() appears identically in both AndroidFrameMetricsStatsProvider (line 198) and AndroidChoreographerFrameStatsProvider (line 303). Same for ResolveFrameBudgetMs() which appears in 3 providers. These could be extracted to a shared static helper (or just methods on FrameStatsAccumulator) to avoid copy-paste drift. Low priority since they're in the same file and simple.

11. iOS native memory uses Process.WorkingSet64 not task_info — On iOS/macOS, Process.WorkingSet64 may return 0 or throw on some .NET runtime versions because the POSIX /proc filesystem isn't available. The truly reliable iOS approach is P/Invoking task_info(mach_task_self(), TASK_VM_INFO, ...) for phys_footprint (Apple's recommended metric for memory attribution). However, WorkingSet64 works via Mono's sysctl implementation in .NET on Apple platforms, so this is fine for now — just worth noting if users report 0 or null native memory on iOS.

12. AttachCollectionViewHook — unsubscribe lambda captures wrong scope when first hook fails — If TryRegisterUiHook(collectionView, "CollectionView.SelectionChanged", ...) returns false (already registered), the method returns early and the CollectionView.Scrolled hook is never attempted. This is correct behavior. But if SelectionChanged succeeds but Scrolled fails (already registered from a previous generation), the SelectionChanged unsubscriber is registered but not the Scrolled one. On StopAutoUiHooks, SelectionChanged gets cleaned up but Scrolled doesn't — however the generation check in TryRegisterUiHook will clear it on next scan, so this is safe. Just noting the subtlety.

13. WindowsCompositionFrameStatsProvider hardcodes 60Hz frame budgetnew FrameStatsAccumulator(1000d / 60d) in the constructor. Unlike the Android/Apple providers that call ResolveFrameBudgetMs() to detect the actual display refresh rate, the Windows provider assumes 60Hz. Modern Windows machines often have 120Hz/144Hz displays. The jank threshold will be too lenient on high-refresh displays (24ms threshold instead of ~12.5ms for 120Hz).


📋 Summary

The update is substantially improved. All the major issues from the first review are addressed:

  • ✅ Android FrameMetrics for real render timing
  • ✅ Windows frame stats provider
  • ✅ Native memory tracking
  • ✅ Single Process.Refresh() call
  • ✅ Efficient Queue-based accumulator
  • ✅ Proper UI hook cleanup
  • ✅ DTO drift test

The remaining items (#10-#13) are minor. The only functional one worth fixing before merge is #13 (Windows hardcoded 60Hz) — it's a one-line change to call ResolveFrameBudgetMs() like the other providers do (which could be extracted as a static method on FrameStatsAccumulator). Everything else can be follow-up work.

michalpobuta and others added 2 commits March 8, 2026 19:22
Resolve Windows native frame budget from actual display refresh rate (with 60Hz fallback) and relax brittle RuntimeProfilerCollector metric assertions/timing to avoid CI flakiness.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Process.WorkingSet64 (RSS) with task_info(TASK_VM_INFO)
phys_footprint on iOS/macCatalyst. phys_footprint is the metric
Apple uses for memory warnings and OOM kills, making it more
accurate for mobile profiling than RSS.

The struct layout uses only fields through rev1 (stable since
iOS 7 / OS X 10.9). A unit test validates the struct layout at
runtime by allocating 20MB of native memory and asserting that
phys_footprint grows accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@Redth Redth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final Review — Remaining Items

The author's latest commit (846e840) fixed the Windows hardcoded 60Hz frame budget and stabilized the test assertions. All previously identified issues are now resolved. Here are the remaining items I found:


🟡 CancellationTokenSource disposed before awaiting the loop task

File: DevFlowAgentService.cs lines 1914-1924

if (cts != null)
{
    cts.Cancel();
    cts.Dispose();   // disposed here
}

if (loopTask != null)
{
    try
    {
        await loopTask;  // loop might still be referencing the token
    }

The Task.Delay(intervalMs, ct) in RunProfilerLoopAsync may still be referencing the token when it's disposed. On .NET this is generally safe (the token is a struct copy), but the CancellationTokenSource disposal can race with the token registration callback. Move cts.Dispose() to after await loopTask to be safe.


🟡 Dispose() doesn't await the profiler loop

File: DevFlowAgentService.cs lines 2987-3000

public void Dispose()
{
    _profilerLoopCts?.Cancel();
    _profilerLoopCts?.Dispose();
    _profilerCollector.Stop();      // loop might still call TryCollect()
    _profilerStateGate.Dispose();   // loop might still be running

The synchronous Dispose() cancels the CTS but doesn't wait for the loop task to exit. The loop (line 1946) could call _profilerCollector.TryCollect() concurrently with Stop() on line 2995, or try to acquire the already-disposed _profilerStateGate. Consider using StopProfilerAsync().GetAwaiter().GetResult() in Dispose, or at minimum doing a brief _profilerLoopTask?.Wait(TimeSpan.FromSeconds(2)) before disposing resources.


🟡 Native provider Start() sets _running = true before async dispatch succeeds

File: NativeFrameStatsProviderFactory.cs — all four providers

The Android FrameMetrics provider sets _running = true (line 143) before MainThread.BeginInvokeOnMainThread dispatches to the UI thread. If the dispatched work throws (e.g., Platform.CurrentActivity is null), the exception is swallowed by BeginInvokeOnMainThread and _running stays true — the provider is in an inconsistent state. This applies to all four native providers.

Not a critical issue since RuntimeProfilerCollector.Start() already catches InvalidOperationException from provider start and falls back to estimated timing. But it would be cleaner to set _running = true inside the dispatched lambda after successful registration.


🟢 Minor: INativeFrameStatsProvider not disposed by collector

RuntimeProfilerCollector receives an INativeFrameStatsProvider in its constructor and calls Start()/Stop() but never Dispose(). Since the provider implements IDisposable, the collector should also implement IDisposable and clean up the provider. Currently Stop() on each provider also unregisters callbacks, so in practice there's no leak — but it's an API contract gap.


🟢 Minor: Duplicated static helpers

TryReadAndroidNativeMemoryBytes() is identically defined in both AndroidFrameMetricsStatsProvider (line 198) and AndroidChoreographerFrameStatsProvider (line 303). Same for ResolveFrameBudgetMs() appearing in 3 providers. Could be extracted to a shared static method. Very low priority since they're in the same file and simple.


Summary

The PR is in good shape. The first two items (🟡) are the only ones with real potential for runtime issues — both are race conditions in the shutdown path. Everything else is defensive hardening or minor cleanup.

- Move CTS.Dispose() after awaiting the loop task in StopProfilerAsync
  to prevent ObjectDisposedException on the cancellation token
- Dispose() now waits up to 3s for the profiler loop to exit before
  disposing resources, preventing races with TryCollect/semaphore
- RuntimeProfilerCollector implements IDisposable, disposes the native
  frame stats provider
- All native providers (FrameMetrics, Choreographer, CADisplayLink,
  CompositionTarget) now set _running=true inside the dispatched
  lambda after successful registration, not before — prevents
  inconsistent state if BeginInvokeOnMainThread work fails

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Redth Redth merged commit b9cc25d into Redth:main Mar 8, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants