Consider disabling or reducing telemetry flush for print-env-script and investigate SDK telemetry data loss after flush timeout reduction #54753

@nagilson

Summary

Following the discussion in #54733, we reduced the dotnetup telemetry flush timeout to 10ms for interactive scenarios and 2000ms for one-and-done/CI environments. This resolved the immediate user-perceived latency issue, but it raises follow-up questions about our telemetry strategy.

Items to investigate

  1. Disable print-env-script telemetry entirely or use a minimal flush timeout there.
    The print-env-script path is invoked on every new shell session (e.g. shell profile integration). Sending telemetry on that path adds latency to shell startup. We should consider either:

    • Disabling telemetry completely for print-env-script invocations, or
    • Using a very low (e.g. 10ms) flush timeout specifically for that code path, while allowing a higher budget for genuinely interactive commands.
  2. Use a higher flush timeout by default in interactive scenarios.
    The current 10ms interactive timeout is far below the ~198ms median App Insights POST latency (see Context), so we will almost never successfully flush telemetry on shutdown for interactive runs. We should evaluate whether a slightly higher timeout (e.g. 50-100ms) is acceptable without impacting perceived latency, or whether we accept relying on deferred/startup-drain patterns.

  3. Investigate telemetry data loss after the .NET SDK switched its default flush timeout.
    The .NET SDK CLI previously made a similar change to reduce flush timeouts. We should look at the telemetry data around that transition to determine:

    • Whether there was a measurable reduction in telemetry volume/coverage.
    • Whether the offline store drain-on-next-startup pattern adequately compensates.
    • What lessons apply to dotnetup's telemetry strategy.
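To make options 1 and 2 concrete, here is a minimal sketch of a per-command flush budget. It is written in Python purely for illustration and is not dotnetup's actual code; `flush_timeout_ms`, its parameters, and the constant names are hypothetical. Only the 10ms and 2000ms values and the `print-env-script` command name come from this issue.

```python
from typing import Optional

# Values quoted in this issue; every name below is hypothetical.
PRINT_ENV_SCRIPT = "print-env-script"
MINIMAL_TIMEOUT_MS = 10        # current interactive value
ONE_SHOT_TIMEOUT_MS = 2000     # current one-and-done/CI value


def flush_timeout_ms(
    command: str,
    interactive: bool,
    disable_for_print_env: bool = True,
    interactive_timeout_ms: int = MINIMAL_TIMEOUT_MS,
) -> Optional[int]:
    """Return the flush budget in ms, or None when telemetry is skipped."""
    if command == PRINT_ENV_SCRIPT:
        # Option 1: either skip telemetry entirely or keep the minimal budget.
        return None if disable_for_print_env else MINIMAL_TIMEOUT_MS
    if interactive:
        # Option 2: the interactive budget can be raised independently.
        return interactive_timeout_ms
    return ONE_SHOT_TIMEOUT_MS
```

For example, `flush_timeout_ms("install", interactive=True, interactive_timeout_ms=100)` would give a genuinely interactive command a 100ms budget while `print-env-script` stays disabled (the `install` command name is only a placeholder). Keeping the decision in one function would make it easy to compare the options against telemetry data later.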

Context

  • PR: Reduce dotnetup flush delay #54733
  • Median App Insights POST latency is ~198ms (direct regional endpoint), with p90 ~246ms and tail outliers up to ~1.3s.
  • OTLP export was found to be unexpectedly enabled and adding ~1100ms median; that is being gated separately.
  • The global App Insights endpoint adds ~50-60ms due to a 307 redirect to the regional endpoint.
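As a rough way to read these numbers against candidate timeouts, the sketch below classifies a flush budget by the reported percentiles. It assumes a flush needs exactly one POST, that latency is the only cost, and that the global-endpoint redirect is a fixed ~50ms added on top; these are simplifications, and `coverage_bound` is a hypothetical helper, not part of any SDK.

```python
# Latency figures quoted in the Context list (direct regional endpoint), in ms.
MEDIAN_MS = 198
P90_MS = 246
TAIL_MS = 1300
GLOBAL_REDIRECT_MS = 50  # lower end of the ~50-60ms redirect cost (assumed fixed)


def coverage_bound(timeout_ms: int, via_global_endpoint: bool = False) -> str:
    """Coarse bound on the share of POSTs that could finish within the timeout."""
    # Treat the redirect as time taken out of the available budget.
    budget = timeout_ms - (GLOBAL_REDIRECT_MS if via_global_endpoint else 0)
    if budget < MEDIAN_MS:
        return "<50%"
    if budget < P90_MS:
        return "50-90%"
    if budget < TAIL_MS:
        return ">=90%"
    return "~all observed"


for candidate in (10, 100, 2000):
    print(candidate, coverage_bound(candidate))
```

Under these assumptions, both 10ms and 100ms fall below the median, so fewer than half of shutdown flushes could complete at either setting, while 2000ms covers the observed tail.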

Related

/cc @baronfel

Labels: dotnetup (Work items around the proposed `dotnetup` bootstrapper/toolchain management tool and library)
