
OCPBUGS-81521: Adapt dashboard Prometheus polling interval based on query response time#16441

Open
stefanonardo wants to merge 1 commit into openshift:main from stefanonardo:OCPBUGS-81521

Conversation

@stefanonardo (Contributor) commented May 14, 2026

Analysis / Root cause:
Dashboard Prometheus queries poll every 15 seconds regardless of cluster size. On large clusters,
these queries are expensive (more time series to aggregate), creating unnecessary load on the
Prometheus/Thanos monitoring stack.

Solution description:
Replace the hardcoded 15s polling delay in fetchPeriodically with an adaptive interval derived
from an Exponential Moving Average (EMA) of query response times. Fast clusters (~500ms responses)
stay at the 15s floor, while slow/large clusters automatically back off up to 60s.

  • New utility: adaptive-polling.ts with computeAdaptiveDelay() and emaToDelay()
  • Modified: dashboards.ts fetchPeriodically to measure fetch duration and compute adaptive delay
  • EMA state is per-query, passed through recursive calls (no Redux state changes)
  • No changes to exported types — no SDK/public API impact
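
For reference, here is a minimal sketch of the adaptive-delay math. The function names match the PR, but the constants, smoothing factor, and exact signatures are assumptions and may differ from the actual adaptive-polling.ts:

// Illustrative constants: the real values in adaptive-polling.ts may differ.
export const MIN_POLL_DELAY = 15_000; // 15s floor
export const MAX_POLL_DELAY = 60_000; // 60s ceiling
export const SCALE_FACTOR = 30; // delay is roughly 30x the smoothed response time
const EMA_ALPHA = 0.3; // weight given to the newest sample

// Map a smoothed response time (ms) to a polling delay clamped to [MIN, MAX].
export const emaToDelay = (ema: number): number =>
  Math.min(MAX_POLL_DELAY, Math.max(MIN_POLL_DELAY, ema * SCALE_FACTOR));

// Fold a new response-time sample into the EMA and derive the next delay.
// Returns [delay, nextEma] so the caller can thread the EMA through its
// recursive scheduling without touching Redux state.
export const computeAdaptiveDelay = (
  responseTime: number,
  previousEma: number,
): [number, number] => {
  const sample = Number.isFinite(responseTime) && responseTime > 0 ? responseTime : 0;
  const nextEma =
    previousEma > 0 ? EMA_ALPHA * sample + (1 - EMA_ALPHA) * previousEma : sample;
  return [emaToDelay(nextEma), nextEma];
};

With these illustrative numbers, a ~500ms response keeps the delay at the 15s floor, while responses of ~2s or more push it toward the 60s ceiling; fetchPeriodically threads the returned nextEma into its recursive setTimeout call together with emaToDelay(nextEma), matching the pattern visible in the review excerpts further down.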

Screenshots / screen recording:
N/A — no visual changes. Polling interval changes are observable in the browser DevTools Network tab.

Test setup:
No special setup required.

Test cases:

  • Unit tests for computeAdaptiveDelay and emaToDelay (boundary values, EMA smoothing, NaN guards)
  • Integration tests for fetchPeriodically adaptive behavior (fast response, slow response, error backoff)
  • Manual: Open the cluster dashboard and observe Prometheus request intervals in the DevTools Network tab

Browser conformance:

  • Chrome
  • Firefox
  • Safari (or Epiphany on Linux)

Additional info:
Jira: https://redhat.atlassian.net/browse/OCPBUGS-81521

Summary by CodeRabbit

  • New Features

    • Dashboard data polling now intelligently adapts refresh rates based on response times for improved performance and efficiency.
  • Tests

    • Enhanced E2E test infrastructure with improved setup and teardown orchestration.

@openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels on May 14, 2026
@openshift-ci-robot (Contributor)

@stefanonardo: This pull request references Jira Issue OCPBUGS-81521, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci Bot requested review from rhamilto and spadgett on May 14, 2026 10:15
@openshift-ci Bot added the component/core label (Related to console core functionality) on May 14, 2026
@stefanonardo (Contributor, Author)

/jira refresh

@openshift-ci-robot (Contributor)

@stefanonardo: This pull request references Jira Issue OCPBUGS-81521, which is invalid:

  • expected the bug to target only the "5.0.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@stefanonardo (Contributor, Author)

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) and removed the jira/invalid-bug label (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) on May 14, 2026
@openshift-ci-robot (Contributor)

@stefanonardo: This pull request references Jira Issue OCPBUGS-81521, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

@openshift-ci-robot (Contributor)

@stefanonardo: This pull request references Jira Issue OCPBUGS-81521, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@coderabbitai Bot (Contributor) commented May 14, 2026

📝 Walkthrough


This pull request refactors the Playwright E2E test infrastructure from global setup/teardown hooks into explicit modular projects (cluster-setup, admin-auth, developer-auth, teardown). It extracts shared login logic into reusable helpers, adds a developer perspective smoke test, and updates the Playwright configuration to reflect the new dependency graph. Additionally, dashboard data fetching is enhanced with adaptive polling that adjusts delays based on response times via exponential moving average, replacing fixed-delay retry behavior.
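
For readers unfamiliar with Playwright project dependencies, a hypothetical sketch of what such a graph can look like follows. Project names come from the walkthrough; the file matchers, storage-state paths, and test-project names are illustrative assumptions, not the PR's actual configuration:

// Hypothetical excerpt of playwright.config.ts illustrating the dependency graph.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // Provision shared cluster resources once; the teardown project runs after
    // everything that (transitively) depends on cluster-setup has finished.
    { name: 'cluster-setup', testMatch: /cluster\.setup\.ts/, teardown: 'teardown' },
    { name: 'teardown', testMatch: /teardown\.setup\.ts/ },

    // Log in once per perspective and persist auth state for later projects.
    { name: 'admin-auth', testMatch: /admin-auth\.setup\.ts/, dependencies: ['cluster-setup'] },
    { name: 'developer-auth', testMatch: /developer-auth\.setup\.ts/, dependencies: ['cluster-setup'] },

    // Test projects reuse the saved storage state from their auth dependency.
    { name: 'admin', dependencies: ['admin-auth'], use: { storageState: '.auth/admin.json' } },
    { name: 'developer', dependencies: ['developer-auth'], use: { storageState: '.auth/developer.json' } },
  ],
});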

Suggested reviewers

  • spadgett
  • sg00dwin
  • vikram-raj
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
  • Title check: Title directly references the main change: adaptive Prometheus polling based on query response time, with a clear Jira prefix and concise phrasing.
  • Linked Issues check: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names: Custom check not applicable. PR contains zero Ginkgo tests. All test names use stable, deterministic static strings with no dynamic values.
  • Test Structure And Quality: Check is designed for Ginkgo (Go) tests. PR contains only Playwright and Jest (TypeScript) tests, so the check is not applicable.
  • Microshift Test Compatibility: Custom check not applicable. PR adds TypeScript/JavaScript tests (Jest, Playwright) only, not the Ginkgo e2e tests required by this check.
  • Single Node Openshift (SNO) Test Compatibility: Check not applicable. PR adds TypeScript/JavaScript tests (Playwright e2e, Jest units), not Ginkgo e2e tests. SNO compatibility applies to Go-based origin tests.
  • Topology-Aware Scheduling Compatibility: PR contains frontend code and E2E tests only. No deployment manifests, operator code, or K8s scheduling constraints present.
  • OTE Binary Stdout Contract: Applies to Go code only. This PR modifies only frontend TypeScript/JavaScript files, with no Go code changes.
  • IPv6 And Disconnected Network Test Compatibility: No Ginkgo e2e tests added in this PR. All changes are TypeScript/JavaScript (Playwright and Jest), so the check is not applicable.
  • Description check: PR description is comprehensive, well-structured, and covers all required sections: root cause analysis, solution with technical details, test cases, browser conformance, and Jira reference.


@coderabbitai Bot left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/playwright.config.ts (1)

131-141: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align developer project creation with the same credential checks used by setup.

Developer projects are enabled from a username-only flag, while developer-auth.setup.ts skips unless both username and password exist. This mismatch can produce developer projects without valid auth state.

Suggested patch
-const hasDeveloper = !!process.env.BRIDGE_HTPASSWD_USERNAME;
+const hasDeveloper =
+  !!process.env.BRIDGE_HTPASSWD_USERNAME && !!process.env.BRIDGE_HTPASSWD_PASSWORD;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/playwright.config.ts` around lines 131 - 141, The developer project
generation uses hasDeveloper (username-only) which can create projects without
valid auth; change the conditional that yields the developer entries to require
the same credentials check used in developer-auth.setup.ts (i.e., ensure both
developer username and developer password exist) or reuse the shared helper/flag
from developer-auth.setup.ts instead of hasDeveloper; update the condition
surrounding devPackages mapping (and any references to developerStorageState) so
developer projects are only created when both username and password are present.
🧹 Nitpick comments (1)
frontend/public/actions/__tests__/dashboards.spec.ts (1)

165-174: ⚡ Quick win

Strengthen the error-backoff assertion to actually prevent ceiling jumps.

This test currently allows MAX_POLL_DELAY, which conflicts with its stated intent. Seed with a fast success first, then force an error and assert the second delay increases but stays strictly below max.

Proposed test tightening
-    it('backs off on fetch error without jumping to MAX_POLL_DELAY', async () => {
-      const fetchMock = jest.fn().mockRejectedValueOnce(new Error('network error'));
+    it('backs off on fetch error without jumping to MAX_POLL_DELAY', async () => {
+      const fetchMock = jest
+        .fn()
+        .mockResolvedValueOnce({ data: 'test' })
+        .mockRejectedValueOnce(new Error('network error'));
       setupWatchURL(fetchMock);
 
       await flushPromises();
+      const firstTimeout = setTimeoutSpy.mock.calls[setTimeoutSpy.mock.calls.length - 1];
+      const firstDelay = firstTimeout[1] as number;
+
+      const nextPoll = firstTimeout[0] as (...args: unknown[]) => unknown;
+      nextPoll();
+      await flushPromises();
 
       const lastSetTimeout = setTimeoutSpy.mock.calls[setTimeoutSpy.mock.calls.length - 1];
-      expect(lastSetTimeout[1]).toBeGreaterThan(MIN_POLL_DELAY);
-      expect(lastSetTimeout[1]).toBeLessThanOrEqual(MAX_POLL_DELAY);
+      expect(lastSetTimeout[1]).toBeGreaterThan(firstDelay);
+      expect(lastSetTimeout[1]).toBeLessThan(MAX_POLL_DELAY);
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/public/actions/__tests__/dashboards.spec.ts` around lines 165 - 174,
The test currently allows the backoff to equal MAX_POLL_DELAY which contradicts
its intent; update the test in dashboards.spec.ts to first seed a fast
successful poll (call setupWatchURL with a fetch mock that resolves once
quickly), then trigger a rejection (mockRejectedValueOnce) so the backoff
increases from the previous delay, and assert the subsequent timeout delay
(inspect setTimeoutSpy.mock.calls[...] like in the existing test) is greater
than the prior delay and strictly less than MAX_POLL_DELAY (use <
MAX_POLL_DELAY, not <=), while still being > MIN_POLL_DELAY; keep references to
setupWatchURL, setTimeoutSpy, flushPromises, MIN_POLL_DELAY and MAX_POLL_DELAY
when locating and changing the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1e3997f8-4837-4ba4-b681-3b92dd163d9c

📥 Commits

Reviewing files that changed from the base of the PR and between 6701c23 and bd5db08.

📒 Files selected for processing (14)
  • frontend/e2e/global.setup.ts
  • frontend/e2e/global.teardown.ts
  • frontend/e2e/setup/admin-auth.setup.ts
  • frontend/e2e/setup/cluster.setup.ts
  • frontend/e2e/setup/developer-auth.setup.ts
  • frontend/e2e/setup/login-helper.ts
  • frontend/e2e/setup/teardown.setup.ts
  • frontend/e2e/tests/smoke/developer/smoke-test.spec.ts
  • frontend/package.json
  • frontend/playwright.config.ts
  • frontend/public/actions/__tests__/dashboards.spec.ts
  • frontend/public/actions/dashboards.ts
  • frontend/public/components/utils/__tests__/adaptive-polling.spec.ts
  • frontend/public/components/utils/adaptive-polling.ts
💤 Files with no reviewable changes (2)
  • frontend/e2e/global.setup.ts
  • frontend/e2e/global.teardown.ts
📜 Review details
🔇 Additional comments (9)
frontend/public/components/utils/__tests__/adaptive-polling.spec.ts (1)

1-98: LGTM!

frontend/public/components/utils/adaptive-polling.ts (1)

1-34: LGTM!

frontend/e2e/setup/cluster.setup.ts (1)

13-65: LGTM!

frontend/e2e/setup/login-helper.ts (1)

9-50: LGTM!

frontend/e2e/setup/admin-auth.setup.ts (1)

9-18: LGTM!

frontend/e2e/setup/developer-auth.setup.ts (1)

9-22: LGTM!

frontend/playwright.config.ts (1)

30-126: LGTM!

frontend/e2e/tests/smoke/developer/smoke-test.spec.ts (1)

3-8: LGTM!

frontend/package.json (1)

57-57: LGTM!

Comment on lines +24 to +50
  try {
    const config = JSON.parse(fs.readFileSync(CONFIG_FILE, 'utf-8'));
    testNamespace = config.testNamespace;
    kubeConfigPath = config.kubeConfigPath;
    authToken = config.authToken;
  } catch {
    return;
  }

  if (!testNamespace) {
    return;
  }

  const client = new KubernetesClient(
    {
      clusterUrl: process.env.CLUSTER_URL || '',
      username: process.env.OPENSHIFT_USERNAME || 'kubeadmin',
      password: process.env.BRIDGE_KUBEADMIN_PASSWORD || '',
      token: authToken,
    },
    kubeConfigPath,
  );

  await client.deleteNamespace(testNamespace);
  const deleted = await client.waitForNamespaceDeleted(testNamespace, 120_000);
  expect(deleted, `Namespace ${testNamespace} should be deleted within 120s`).toBe(true);
});
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Delete .test-config.json after teardown completes.

This file contains a bearer token and currently persists after cleanup. Remove it in a finally block to reduce secret-retention risk.

Suggested patch
 teardown('delete test namespace', async () => {
@@
-  let testNamespace: string | undefined;
-  let kubeConfigPath: string | undefined;
-  let authToken: string | undefined;
-
-  try {
-    const config = JSON.parse(fs.readFileSync(CONFIG_FILE, 'utf-8'));
-    testNamespace = config.testNamespace;
-    kubeConfigPath = config.kubeConfigPath;
-    authToken = config.authToken;
-  } catch {
-    return;
-  }
-
-  if (!testNamespace) {
-    return;
-  }
-
-  const client = new KubernetesClient(
-    {
-      clusterUrl: process.env.CLUSTER_URL || '',
-      username: process.env.OPENSHIFT_USERNAME || 'kubeadmin',
-      password: process.env.BRIDGE_KUBEADMIN_PASSWORD || '',
-      token: authToken,
-    },
-    kubeConfigPath,
-  );
-
-  await client.deleteNamespace(testNamespace);
-  const deleted = await client.waitForNamespaceDeleted(testNamespace, 120_000);
-  expect(deleted, `Namespace ${testNamespace} should be deleted within 120s`).toBe(true);
+  try {
+    let testNamespace: string | undefined;
+    let kubeConfigPath: string | undefined;
+    let authToken: string | undefined;
+
+    try {
+      const config = JSON.parse(fs.readFileSync(CONFIG_FILE, 'utf-8'));
+      testNamespace = config.testNamespace;
+      kubeConfigPath = config.kubeConfigPath;
+      authToken = config.authToken;
+    } catch {
+      return;
+    }
+
+    if (!testNamespace) {
+      return;
+    }
+
+    const client = new KubernetesClient(
+      {
+        clusterUrl: process.env.CLUSTER_URL || '',
+        username: process.env.OPENSHIFT_USERNAME || 'kubeadmin',
+        password: process.env.BRIDGE_KUBEADMIN_PASSWORD || '',
+        token: authToken,
+      },
+      kubeConfigPath,
+    );
+
+    await client.deleteNamespace(testNamespace);
+    const deleted = await client.waitForNamespaceDeleted(testNamespace, 120_000);
+    expect(deleted, `Namespace ${testNamespace} should be deleted within 120s`).toBe(true);
+  } finally {
+    fs.rmSync(CONFIG_FILE, { force: true });
+  }
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/e2e/setup/teardown.setup.ts` around lines 24 - 50, The teardown
leaves CONFIG_FILE (.test-config.json) on disk; wrap the config read and
namespace deletion logic in a try/finally so the finally always runs and removes
CONFIG_FILE regardless of early returns or errors. Specifically, keep reading
into testNamespace/authToken/kubeConfigPath and running
KubernetesClient.deleteNamespace and waitForNamespaceDeleted as before
(referencing CONFIG_FILE, testNamespace, authToken, kubeConfigPath, and
KubernetesClient), but move the current early returns into the try and perform
fs.unlinkSync or fs.rmSync(CONFIG_FILE) inside the finally (guarded by
fs.existsSync and swallowing/logging any unlink errors) so the file is always
removed after teardown completes.

Comment on lines +86 to +94
    // Feed a synthetic slow response into the EMA to gradually back off without jumping to max
    [, nextEma] = computeAdaptiveDelay(MAX_POLL_DELAY / SCALE_FACTOR, responseTimeEma);
    dispatch(setError(type, key, error));
    dispatch(setData(type, key, null));
  } finally {
    dispatch(updateWatchInFlight(type, key, false));
    const timeout = setTimeout(
      () => fetchPeriodically(dispatch, type, key, getURL, getState, fetch),
      URL_POLL_DEFAULT_DELAY,
      () => fetchPeriodically(dispatch, type, key, getURL, getState, fetch, nextEma),
      emaToDelay(nextEma),
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

First error can jump straight to 60s retry, causing overly aggressive backoff.

With responseTimeEma = 0, the catch path seeds EMA with MAX_POLL_DELAY / SCALE_FACTOR, which immediately schedules MAX_POLL_DELAY. That delays recovery after transient first-request failures.

Suggested fix (seed from floor-equivalent EMA when no history)
 import {
   computeAdaptiveDelay,
   emaToDelay,
+  MIN_POLL_DELAY,
   MAX_POLL_DELAY,
   SCALE_FACTOR,
 } from '../components/utils/adaptive-polling';
@@
   } catch (error) {
-    // Feed a synthetic slow response into the EMA to gradually back off without jumping to max
-    [, nextEma] = computeAdaptiveDelay(MAX_POLL_DELAY / SCALE_FACTOR, responseTimeEma);
+    // Feed a synthetic slow response into EMA; if no history, seed from floor-equivalent EMA.
+    const emaSeed =
+      responseTimeEma > 0 ? responseTimeEma : MIN_POLL_DELAY / SCALE_FACTOR;
+    [, nextEma] = computeAdaptiveDelay(MAX_POLL_DELAY / SCALE_FACTOR, emaSeed);
     dispatch(setError(type, key, error));
     dispatch(setData(type, key, null));
   } finally {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Feed a synthetic slow response into the EMA to gradually back off without jumping to max
[, nextEma] = computeAdaptiveDelay(MAX_POLL_DELAY / SCALE_FACTOR, responseTimeEma);
dispatch(setError(type, key, error));
dispatch(setData(type, key, null));
} finally {
dispatch(updateWatchInFlight(type, key, false));
const timeout = setTimeout(
() => fetchPeriodically(dispatch, type, key, getURL, getState, fetch),
URL_POLL_DEFAULT_DELAY,
() => fetchPeriodically(dispatch, type, key, getURL, getState, fetch, nextEma),
emaToDelay(nextEma),
// Feed a synthetic slow response into EMA; if no history, seed from floor-equivalent EMA.
const emaSeed =
responseTimeEma > 0 ? responseTimeEma : MIN_POLL_DELAY / SCALE_FACTOR;
[, nextEma] = computeAdaptiveDelay(MAX_POLL_DELAY / SCALE_FACTOR, emaSeed);
dispatch(setError(type, key, error));
dispatch(setData(type, key, null));
} finally {
dispatch(updateWatchInFlight(type, key, false));
const timeout = setTimeout(
() => fetchPeriodically(dispatch, type, key, getURL, getState, fetch, nextEma),
emaToDelay(nextEma),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/public/actions/dashboards.ts` around lines 86 - 94, The catch path
currently seeds the EMA with MAX_POLL_DELAY / SCALE_FACTOR when responseTimeEma
is zero, which can jump retries straight to MAX_POLL_DELAY; modify the logic
around computeAdaptiveDelay so that when responseTimeEma === 0 you seed the EMA
from the "floor" value (the minimal/steady-state equivalent) instead of
MAX_POLL_DELAY / SCALE_FACTOR (e.g. use a MIN_POLL_DELAY-based seed or the floor
EMA value), keeping all calls and variables (computeAdaptiveDelay,
responseTimeEma, MAX_POLL_DELAY, SCALE_FACTOR, emaToDelay, fetchPeriodically)
intact; ensure nextEma is computed from that floor-seeded value so the first
failure backs off conservatively rather than immediately scheduling the max
delay.

@stefanonardo stefanonardo force-pushed the OCPBUGS-81521 branch 2 times, most recently from 4017ab6 to 97c74b3 on May 14, 2026 10:37
@jhadvig jhadvig (Member) left a comment

/lgtm

@openshift-ci Bot added the lgtm label (Indicates that a PR is ready to be merged.) on May 14, 2026
openshift-ci Bot (Contributor) commented May 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhadvig, stefanonardo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci Bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on May 14, 2026
Comment thread on frontend/public/components/utils/adaptive-polling.ts (outdated)
@openshift-ci Bot removed the lgtm label (Indicates that a PR is ready to be merged.) on May 14, 2026
openshift-ci Bot (Contributor) commented May 14, 2026

New changes are detected. LGTM label has been removed.

@logonoff (Member)

QA Verification Evidence

Branch: OCPBUGS-81521
Baseline: main @ 6701c2350c
Candidate: OCPBUGS-81521 @ 09d4b6cfb9
Verified: 2026-05-14
Browser: Playwright 1.60.0 / Chromium
OS: Darwin 25.4.0
Jira: OCPBUGS-81521

Verification Steps

  1. /dashboards - Navigate, wait for load - pass
  2. /dashboards - Check console errors - pass
  3. /k8s/ns/openshift-console/deployments - Navigate to workloads - pass
  4. /k8s/cluster/nodes - Navigate to nodes - pass
  5. /monitoring/dashboards - Navigate to monitoring - pass
  6. /k8s/cluster/overview - Navigate to overview - pass
  7. /k8s/ns/openshift-console/pods - Navigate to pods - pass
  8. /k8s/ns/openshift-console/events - Navigate to events - pass
Screenshots (baseline main vs. candidate OCPBUGS-81521), click to expand:
  • Animated overview
  • Step 1: Dashboard overview (pass)
  • Step 2: Console error check (pass)
  • Step 3: Deployments list (pass)
  • Step 4: Nodes list (pass)
  • Step 5: Monitoring dashboards (404 - plugin not loaded) (pass)
  • Step 6: Cluster overview (404 - plugin not loaded) (pass)
  • Step 7: Pod list (pass)

Warning

This verification was performed by an AI agent. Results may contain false positives or miss
regressions that require human judgment. Always review the screenshots manually before approving.

Automated QA verification by Claude Code

@stefanonardo (Contributor, Author)

/retest

…uery response time

Replace the hardcoded 15s polling interval in fetchPeriodically with an
adaptive delay derived from an Exponential Moving Average of response
times. Fast clusters stay at the 15s floor while slow/large clusters
automatically back off up to 60s, reducing unnecessary Prometheus load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stefanonardo (Contributor, Author)

/retest

openshift-ci Bot (Contributor) commented May 15, 2026

@stefanonardo: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • component/core: Related to console core functionality
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
