
Performance improvements made by copilot #1383

Draft

nturinski wants to merge 2 commits into main from nat/perfImprovementsByCopilot

Conversation

@nturinski (Member) commented Feb 25, 2026

I ran an experiment: I implemented a telemetry profiler to see which commands were taking a long time to run, then used the results to write prompts asking Copilot to improve perf.

Most of the changes involve parallelizing awaited async code, but the double telemetry wrapping was an interesting one. I'm not 100% sure it's correct that we don't need to wrap it again, but it made sense to me.

It also made a rudimentary cache, since Managed Identity nodes have to leverage the getResource call as well.

Also, I wasn't sure about batching all of the tenant subscription calls into one telemetry event. That could possibly break someone's dashboard.
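The parallelization pattern mentioned above can be sketched as follows. This is an illustrative stand-in, not code from the PR: `fetchIdentity` and `listSubscriptionResources` are hypothetical placeholders for two independent async calls (e.g. the MSI identity fetch and the subscription resource listing in `ManagedIdentityItem.ts`).

```typescript
// Two independent async operations, each taking roughly 50ms.
// These names are illustrative, not the extension's real APIs.
async function fetchIdentity(): Promise<string> {
    return new Promise((resolve) => setTimeout(() => resolve('identity'), 50));
}
async function listSubscriptionResources(): Promise<string[]> {
    return new Promise((resolve) => setTimeout(() => resolve(['vm', 'storage']), 50));
}

// Before: sequential awaits add their latencies (~100ms total).
async function sequential(): Promise<[string, string[]]> {
    const identity = await fetchIdentity();
    const resources = await listSubscriptionResources();
    return [identity, resources];
}

// After: Promise.all puts both calls in flight at once (~50ms total).
async function parallel(): Promise<[string, string[]]> {
    return Promise.all([fetchIdentity(), listSubscriptionResources()]);
}
```

This only helps when the two awaits are genuinely independent; if the second call needs the first call's result, they must stay sequential.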

Here are some timing differences between fixes:

[screenshots: before/after command timing comparisons]

@nturinski nturinski requested a review from a team as a code owner February 25, 2026 21:46
Copilot AI review requested due to automatic review settings February 25, 2026 21:46
Copilot AI (Contributor) left a comment

Pull request overview

This PR focuses on reducing latency in Azure Resources tree/commands by parallelizing independent async calls, removing redundant telemetry wrappers, batching subscription telemetry, and adding a short-lived cache for ARM resource listings.

Changes:

  • Batch subscription telemetry events to avoid per-subscription telemetry calls during tree load.
  • Remove an extra callWithTelemetryAndErrorHandling layer from ResourceTreeDataProviderBase.getTreeItem to avoid double-wrapping overhead.
  • Introduce a TTL cache for listResources (ARM resources.list) plus cache invalidation on common refresh/sign-in commands; parallelize a few independent awaits.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Summary per file:
src/tree/azure/AzureResourceTreeDataProvider.ts Batches subscription telemetry during subscription list processing.
src/tree/ResourceTreeDataProviderBase.ts Skips telemetry wrapping in getTreeItem to avoid double-wrapping overhead.
src/services/AzureResourcesService.ts Adds TTL caching for listResources, exposes invalidation, and memoizes the default service instance.
src/managedIdentity/ManagedIdentityItem.ts Parallelizes MSI identity fetch with subscription resource listing.
src/commands/registerCommands.ts Invalidates the new resource-list cache on top-level refresh/sign-in flows.
src/cloudConsole/cloudConsole.ts Parallelizes tenant enumeration across accounts.
src/api/DefaultAzureResourceProvider.ts Parallelizes resource + resource group listing per subscription.


Comment on lines 45 to 59
registerCommand('azureResourceGroups.refreshTree', () => {
    ext.setClearCacheOnNextLoad();
    invalidateResourceListCache();
    ext.actions.refreshAzureTree();
});
registerCommand('azureWorkspace.refreshTree', () => ext.actions.refreshWorkspaceTree());
registerCommand('azureFocusView.refreshTree', () => {
    ext.setClearCacheOnNextLoad();
    invalidateResourceListCache();
    ext.actions.refreshFocusTree();
});
registerCommand('azureTenantsView.refreshTree', () => {
    ext.setClearCacheOnNextLoad();
    invalidateResourceListCache();
    ext.actions.refreshTenantTree();

Copilot AI Feb 25, 2026


invalidateResourceListCache() is wired to the title-bar refresh commands, but the more common per-node refresh command (azureResourceGroups.refresh) can still re-render using cached ARM resources for up to the TTL window. Consider also invalidating (globally or per subscription) when handling node refresh so user-initiated refresh reliably forces a refetch.

Copilot uses AI. Check for mistakes.
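One way to act on this comment would be to give `invalidateResourceListCache` an optional scope. The sketch below is hypothetical: it mirrors the names in the PR but is not the extension's actual implementation, and the `CacheEntry` shape is assumed.

```typescript
// Assumed cache entry shape, mirroring the TTL cache described in the PR.
interface CacheEntry { data: string[]; expiresAt: number }

const resourceListCache = new Map<string, CacheEntry>();

// Hypothetical scoped invalidation: a per-node refresh can drop just that
// subscription's entry, while title-bar refresh / sign-in clears everything.
function invalidateResourceListCache(subscriptionId?: string): void {
    if (subscriptionId) {
        resourceListCache.delete(subscriptionId); // scoped: one subscription
    } else {
        resourceListCache.clear(); // global: refreshTree, sign-in, etc.
    }
}
```

A per-node refresh handler could then call `invalidateResourceListCache(subscriptionId)` before refreshing that subtree, so user-initiated refresh always forces a refetch.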
nturinski (Member, Author) replied:

I'm also skeptical about this change. It seems way too simplistic, and as the comment mentions, there are a bunch of ways to refresh resources outside of the ribbon that would leave this list cache stale.

});
} else {
// batch telemetry into a single event instead of one per subscription
void callWithTelemetryAndErrorHandling('azureResourceGroups.getSubscriptions', async (telemetryContext: IActionContext) => {

Copilot AI Feb 25, 2026


This introduces a new telemetry event ID (azureResourceGroups.getSubscriptions) where previously the code emitted one event per subscription with a different callbackId. If any telemetry dashboards depend on the prior event name/property shape, consider a compatibility strategy (e.g., dual-emitting for a while, or keeping the original callbackId).

Suggested change
void callWithTelemetryAndErrorHandling('azureResourceGroups.getSubscriptions', async (telemetryContext: IActionContext) => {
void callWithTelemetryAndErrorHandling('updateSubscriptionsAndTenants', async (telemetryContext: IActionContext) => {

Comment on lines +122 to +126
// Branch data provider items already have telemetry wrapping via
// wrapFunctionsInTelemetry ('branchDataProvider.getTreeItem'), so we skip
// the extra callWithTelemetryAndErrorHandling layer here to avoid
// double-wrapping overhead (~5-10ms per call x hundreds of items).
return await element.getTreeItem();

Copilot AI Feb 25, 2026


getTreeItem no longer wraps non-branch items in callWithTelemetryAndErrorHandling, which drops the existing getTreeItem telemetry event and changes error-handling settings for items that aren't wrapped by wrapFunctionsInTelemetry (e.g., built-in tree items like GenericItem, SubscriptionItem, etc.). Consider keeping telemetry wrapping for non-BranchDataItemWrapper items and only skipping the extra wrapping for branch-provider items to avoid double-wrapping while preserving the previous behavior for the rest of the tree.

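The reviewer's suggestion here amounts to a type check before skipping the wrapper. The following is a minimal, self-contained sketch: `BranchDataItemWrapper`, `GenericItem`, and `callWithTelemetryAndErrorHandling` are simplified stand-ins for the real extension types, not their actual implementations.

```typescript
interface TreeItem { label: string }
interface ResourceGroupsItem { getTreeItem(): Promise<TreeItem> }

// Stand-in for items wrapped upstream via wrapFunctionsInTelemetry.
class BranchDataItemWrapper implements ResourceGroupsItem {
    async getTreeItem(): Promise<TreeItem> { return { label: 'branch' }; }
}
// Stand-in for built-in items (GenericItem, SubscriptionItem, ...).
class GenericItem implements ResourceGroupsItem {
    async getTreeItem(): Promise<TreeItem> { return { label: 'generic' }; }
}

let wrappedCalls = 0;
// Simplified stand-in: the real helper records telemetry and handles errors.
async function callWithTelemetryAndErrorHandling<T>(id: string, cb: () => Promise<T>): Promise<T> {
    wrappedCalls++;
    return cb();
}

// Skip the extra wrapper only for branch-provider items, which are already
// wrapped upstream; keep telemetry/error handling for everything else.
async function getTreeItem(element: ResourceGroupsItem): Promise<TreeItem> {
    if (element instanceof BranchDataItemWrapper) {
        return element.getTreeItem();
    }
    return callWithTelemetryAndErrorHandling('getTreeItem', () => element.getTreeItem());
}
```

This preserves the previous telemetry behavior for built-in items while still avoiding the double-wrapping overhead for branch-provider items.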
Comment on lines +169 to +173
// batch telemetry into a single event instead of one per subscription
void callWithTelemetryAndErrorHandling('azureResourceGroups.getTenantFilteredSubscriptions', async (telemetryContext: IActionContext) => {
    telemetryContext.telemetry.measurements.subscriptionCount = tenantFilteredSubcriptions.length;
    telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');
});

Copilot AI Feb 25, 2026


This introduces a new telemetry event ID (azureResourceGroups.getTenantFilteredSubscriptions) where previously the code emitted one event per subscription with a different callbackId. If dashboards/queries depend on the prior event name/property shape, this will be a breaking telemetry change; consider preserving compatibility (e.g., emit both old + new telemetry during a transition or keep the old callbackId and add a subscriptionCount measurement).

Suggested change
// batch telemetry into a single event instead of one per subscription
void callWithTelemetryAndErrorHandling('azureResourceGroups.getTenantFilteredSubscriptions', async (telemetryContext: IActionContext) => {
    telemetryContext.telemetry.measurements.subscriptionCount = tenantFilteredSubcriptions.length;
    telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');
});

// batch telemetry into a single event
void callWithTelemetryAndErrorHandling('azureResourceGroups.getTenantFilteredSubscriptions', async (telemetryContext: IActionContext) => {
    telemetryContext.telemetry.measurements.subscriptionCount = tenantFilteredSubcriptions.length;
    telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');
});
// For backward compatibility, also emit one telemetry event per subscription
for (const subscription of tenantFilteredSubcriptions) {
    void callWithTelemetryAndErrorHandling('azureResourceGroups.getTenantFilteredSubscriptions', async (telemetryContext: IActionContext) => {
        telemetryContext.telemetry.properties.subscriptionId = subscription.subscriptionId;
    });
}

// batch telemetry into a single event instead of one per subscription
void callWithTelemetryAndErrorHandling('azureResourceGroups.getTenantFilteredSubscriptions', async (telemetryContext: IActionContext) => {
    telemetryContext.telemetry.measurements.subscriptionCount = tenantFilteredSubcriptions.length;
    telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');

Copilot AI Feb 25, 2026


telemetryContext.telemetry.properties.subscriptionIds concatenates all subscription IDs into a single string. Unlike getMetricsForTelemetry() (used later in this file), there’s no truncation/incomplete flag here, so accounts with many subscriptions can easily exceed telemetry property size limits and cause the event to be dropped. Consider truncating the list (and/or sending only a count + a bounded sample) and adding an isIncomplete flag similar to subscriptionidlistisincomplete.

Suggested change
telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');
// Limit the number of subscription IDs sent in telemetry to avoid exceeding
// telemetry property size limits. Also indicate if the list was truncated.
const allSubscriptionIds: string[] = tenantFilteredSubcriptions.map(s => s.subscriptionId);
const maxSubscriptionIdsForTelemetry = 50;
const sampledSubscriptionIds = allSubscriptionIds.slice(0, maxSubscriptionIdsForTelemetry);
telemetryContext.telemetry.properties.subscriptionIds = sampledSubscriptionIds.join(',');
telemetryContext.telemetry.properties.subscriptionidlistisincomplete = String(allSubscriptionIds.length > maxSubscriptionIdsForTelemetry);

Comment on lines +28 to +38
function getCachedResources(subscriptionId: string): GenericResource[] | undefined {
    const entry = resourceListCache.get(subscriptionId);
    if (entry && Date.now() < entry.expiresAt) {
        return entry.data;
    }
    // Expired or missing — clean up
    resourceListCache.delete(subscriptionId);
    return undefined;
}

function setCachedResources(subscriptionId: string, data: GenericResource[]): void {

Copilot AI Feb 25, 2026


The TTL cache never proactively purges expired entries; if a user loads many subscriptions once, the map can retain lots of expired resource arrays until explicitly invalidated, which can lead to unbounded memory growth over a long session. Consider adding a lightweight periodic sweep, a max-size/LRU policy, or purging a few expired entries on every setCachedResources/getCachedResources call.

Suggested change
function getCachedResources(subscriptionId: string): GenericResource[] | undefined {
    const entry = resourceListCache.get(subscriptionId);
    if (entry && Date.now() < entry.expiresAt) {
        return entry.data;
    }
    // Expired or missing — clean up
    resourceListCache.delete(subscriptionId);
    return undefined;
}
function setCachedResources(subscriptionId: string, data: GenericResource[]): void {

/**
 * Opportunistically purge a limited number of expired entries from the cache.
 * This keeps memory usage bounded over long sessions without a background timer.
 */
function purgeExpiredCacheEntries(maxToPurge: number): void {
    if (maxToPurge <= 0 || resourceListCache.size === 0) {
        return;
    }
    const now = Date.now();
    let purged = 0;
    for (const [subscriptionId, entry] of resourceListCache) {
        if (entry.expiresAt <= now) {
            resourceListCache.delete(subscriptionId);
            purged++;
            if (purged >= maxToPurge) {
                break;
            }
        }
    }
}
function getCachedResources(subscriptionId: string): GenericResource[] | undefined {
    // Opportunistically clean up a few expired entries on every read.
    purgeExpiredCacheEntries(10);
    const entry = resourceListCache.get(subscriptionId);
    if (entry && Date.now() < entry.expiresAt) {
        return entry.data;
    }
    // Expired or missing — clean up this specific key.
    resourceListCache.delete(subscriptionId);
    return undefined;
}
function setCachedResources(subscriptionId: string, data: GenericResource[]): void {
    // Opportunistically clean up a few expired entries on every write.
    purgeExpiredCacheEntries(10);

Comment on lines 67 to +77
async listResources(context: IActionContext, subscription: AzureSubscription): Promise<GenericResource[]> {
    // Check TTL cache first to avoid duplicate ARM calls across callers
    const cached = getCachedResources(subscription.subscriptionId);
    if (cached) {
        return cached;
    }

    const client = await createClient(context, subscription);
    return uiUtils.listAllIterator(client.resources.list());
    const result = await uiUtils.listAllIterator(client.resources.list());
    setCachedResources(subscription.subscriptionId, result);
    return result;

Copilot AI Feb 25, 2026


This cache only helps after the first request completes; concurrent listResources calls for the same subscription will still issue duplicate ARM requests because there's no in-flight deduplication. If the goal is to avoid redundant calls across multiple callers, consider caching a promise (or a {promise,data} entry) so simultaneous requests share the same underlying ARM call.

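The in-flight deduplication the comment describes can be sketched by caching the Promise rather than the resolved data. This is a hypothetical illustration: `fetchResources` is a stand-in for the real ARM `resources.list` call, and the names do not come from the PR.

```typescript
// Cache the in-flight Promise so concurrent callers share one request.
type CacheEntry<T> = { promise: Promise<T>; expiresAt: number };

const cache = new Map<string, CacheEntry<string[]>>();
const TTL_MS = 30_000;
let fetchCount = 0;

// Stand-in for the real ARM resources.list call.
async function fetchResources(subscriptionId: string): Promise<string[]> {
    fetchCount++;
    return [`resource-in-${subscriptionId}`];
}

function listResourcesDeduped(subscriptionId: string): Promise<string[]> {
    const entry = cache.get(subscriptionId);
    if (entry && Date.now() < entry.expiresAt) {
        return entry.promise; // concurrent callers share this same promise
    }
    const promise = fetchResources(subscriptionId).catch((err) => {
        cache.delete(subscriptionId); // don't keep failed requests cached
        throw err;
    });
    cache.set(subscriptionId, { promise, expiresAt: Date.now() + TTL_MS });
    return promise;
}
```

Because the entry is stored synchronously before the first await resolves, a second caller arriving while the request is still in flight gets the same promise instead of issuing a duplicate ARM call.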
Comment on lines +171 to +172
telemetryContext.telemetry.measurements.subscriptionCount = tenantFilteredSubcriptions.length;
telemetryContext.telemetry.properties.subscriptionIds = tenantFilteredSubcriptions.map(s => s.subscriptionId).join(',');
A contributor commented:

This will potentially impact Kusto queries.

The contributor added:

Additionally, how badly do we need this? We already have telemetry for subscription info (updateSubscriptionsAndTenants).

nturinski (Member, Author) replied:

Right, I think somebody asked us for subscriptionIds at some point, but I'm not really sure how important it is that we report it exactly like this. The perf difference is probably not huge, especially for most users, so I'm not super gung-ho about introducing this change.

Comment on lines +212 to +213
telemetryContext.telemetry.measurements.subscriptionCount = subscriptions.length;
telemetryContext.telemetry.properties.subscriptionIds = subscriptions.map(s => s.subscriptionId).join(',');
The contributor commented:

This too

The contributor added:

Additionally, how badly do we need this? We already have telemetry for subscription info (updateSubscriptionsAndTenants).

nturinski (Member, Author) commented:
Sorry, there is a little bit of AI slop in here. I think I'll probably keep the parallel promises, but I mostly just wanted some visual aids for when we have to present our topics on Monday. I'm changing this to a draft PR.

@nturinski nturinski marked this pull request as draft February 25, 2026 22:25