
Add prometheus-exporter contrib #104

Open
IsaiahStapleton wants to merge 1 commit into openshift-psap:main from IsaiahStapleton:prometheus-exporter
Conversation

Contributor

IsaiahStapleton commented Mar 27, 2026

SEE README.md. This tool automatically discovers KServe InferenceService models in a Kubernetes/OpenShift cluster, runs load tests against them, and exports the results as Prometheus metrics.

I originally created this tool to run against models deployed in the Mass Open Cloud (MOC) environment: https://github.com/IsaiahStapleton/llm-load-test-exporter.

Summary by CodeRabbit

  • New Features
    • Added a Prometheus metrics exporter for LLM load testing, enabling real-time monitoring of model performance metrics including latency percentiles, throughput, and token statistics.
    • Added a Grafana dashboard for visualizing LLM model performance and GPU health metrics.
    • Added Kubernetes deployment manifests with automatic KServe model discovery and native Prometheus integration.


Signed-off-by: IsaiahStapleton <istaplet@redhat.com>
Contributor

coderabbitai Bot commented Mar 27, 2026

📝 Walkthrough

Walkthrough

This PR introduces a complete Prometheus metrics exporter system for llm-load-test. It includes Kubernetes manifests defining a Deployment with two cooperating containers (runner and exporter), a Python exporter application serving metrics, a runner process that discovers KServe models and executes load tests, a Grafana dashboard, container build configurations, and supporting configuration and dataset files.
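The two-container pattern this walkthrough describes can be sketched as a minimal Deployment. Everything below is illustrative except the exporter image and port, which come from the review itself; in particular the runner image tag and the `/results` mount path are assumptions, not the actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-load-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-load-test
  template:
    metadata:
      labels:
        app: llm-load-test
    spec:
      containers:
        - name: runner        # discovers KServe models, runs load tests, writes JSON
          image: quay.io/openshift-psap/llm-load-test-runner:latest   # illustrative tag
          volumeMounts:
            - name: results
              mountPath: /results
        - name: exporter      # reads JSON results, serves Prometheus metrics on :8080
          image: quay.io/openshift-psap/llm-load-test-exporter:latest
          ports:
            - name: web
              containerPort: 8080
          volumeMounts:
            - name: results
              mountPath: /results
      volumes:
        - name: results
          emptyDir: {}        # scratch volume shared by the two containers
```

The emptyDir volume is what lets the runner hand results to the exporter without any network hop between the containers.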

Changes

Cohort / File(s) / Summary

  • Documentation (contrib/prometheus-exporter/README.md): architecture documentation describing the runner/exporter container design, metrics enumeration, prerequisites, deployment procedure, and sample metric output.
  • Kubernetes RBAC (contrib/prometheus-exporter/base/clusterrole.yaml, contrib/prometheus-exporter/base/clusterrolebinding.yaml, contrib/prometheus-exporter/base/serviceaccount.yaml): RBAC configuration granting the exporter ServiceAccount cluster-wide read/list access to pods, services, and secrets for model discovery and authentication token retrieval.
  • Kubernetes Core Resources (contrib/prometheus-exporter/base/deployment.yaml, contrib/prometheus-exporter/base/service.yaml, contrib/prometheus-exporter/base/servicemonitor.yaml): Deployment defining runner and exporter containers sharing an emptyDir volume; Service exposing metrics port 8080; ServiceMonitor configuring Prometheus scraping at 120s intervals.
  • Kustomize & Configuration (contrib/prometheus-exporter/base/kustomization.yaml, contrib/prometheus-exporter/base/files/llm-load-test-config.env, contrib/prometheus-exporter/base/files/uwl_metrics_list.yaml): Kustomize base configuration with namespace and labels; load-test parameter environment variables; Prometheus metrics allowlist regex.
  • Exporter Application (contrib/prometheus-exporter/exporter/exporter.py, contrib/prometheus-exporter/exporter/wsgi.py, contrib/prometheus-exporter/exporter/requirements.txt, contrib/prometheus-exporter/exporter/Containerfile): Python Flask application reading JSON results from the shared volume and exposing Prometheus metrics (timing percentiles, throughput, requests, failures) labeled by model and namespace; WSGI entrypoint and gunicorn container build.
  • Runner Application (contrib/prometheus-exporter/runner/runner.py, contrib/prometheus-exporter/runner/requirements.txt, contrib/prometheus-exporter/runner/Containerfile): long-running process discovering KServe InferenceService pods, determining model endpoints and authentication, executing the llm-load-test CLI per model, and writing JSON results to the shared volume; container build configuration.
  • Datasets & Dashboards (contrib/prometheus-exporter/runner/datasets/dataset.jsonl, contrib/prometheus-exporter/grafana/grafana-llm-load-test-dashboard.json): JSONL dataset with 50 synthetic test records; Grafana dashboard visualizing model latency percentiles, throughput, token metrics, GPU metrics, and node health status.
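The exporter's read path over these result files can be sketched in a few lines. The `{model}_{namespace}.json` naming comes from the walkthrough above; the function name and any field contents are assumptions:

```python
import json
from pathlib import Path


def load_results(results_dir: str) -> dict[tuple[str, str], dict]:
    """Map (model, namespace) to the parsed contents of {model}_{namespace}.json."""
    results: dict[tuple[str, str], dict] = {}
    for path in Path(results_dir).glob("*.json"):
        # The namespace follows the last underscore; model names may contain "_".
        model, _, namespace = path.stem.rpartition("_")
        with path.open() as f:
            results[(model, namespace)] = json.load(f)
    return results
```

Splitting on the last underscore keeps model names containing underscores intact, since Kubernetes namespaces cannot contain one.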

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Runner Container
    participant K8sAPI as Kubernetes API
    participant Secret as Secrets Store
    participant LLM as Load Test CLI
    participant Volume as Shared Volume
    participant Exporter as Exporter Container
    participant Prom as Prometheus

    Runner->>K8sAPI: List pods with<br/>serving.kserve.io/inferenceservice label
    K8sAPI-->>Runner: Pod list with gather_llm_metrics labels
    
    loop For each eligible model
        Runner->>K8sAPI: Resolve service endpoint<br/>for model
        K8sAPI-->>Runner: Service URL + port info
        
        alt Auth enabled on pod
            Runner->>Secret: Retrieve bearer token<br/>from service account secret
            Secret-->>Runner: Auth token
        end
        
        Runner->>LLM: Execute load-test<br/>with model config
        LLM-->>LLM: Run load tests
        LLM-->>Volume: Write JSON results<br/>{model}_{namespace}.json
    end
    
    Runner->>Volume: Clean stale JSON files
    
    par Metrics Export
        Exporter->>Volume: Read JSON result files
        Volume-->>Exporter: Model latency/throughput data
        Exporter->>Exporter: Parse & update<br/>Prometheus Gauges
        Exporter-->>Prom: /metrics endpoint<br/>(model, namespace labels)
    and Prometheus Scraping
        Prom->>Exporter: GET /metrics<br/>every 120s
        Exporter-->>Prom: Metric samples
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whiskers twitching with delight,
A metrics garden, shining bright!
Models run, their stories flow,
Prometheus watches them all glow.
Load tests dance, the dashboards sing,
LLM metrics—what joy they bring! 📊

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 78.57%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'Add prometheus-exporter contrib' clearly summarizes the main change: addition of a new prometheus-exporter contribution tool.



coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
contrib/prometheus-exporter/exporter/Containerfile (1)

1-12: Consider running as a non-root user for security hardening.

The container runs as root by default. For better security posture, especially in Kubernetes/OpenShift environments, create and switch to a non-root user.

🛡️ Proposed fix to add non-root user
 FROM python:3.12-slim
 
 WORKDIR /app
 
+RUN useradd --create-home --shell /bin/bash appuser
+
 COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
 
 COPY exporter.py wsgi.py ./
 
+RUN chown -R appuser:appuser /app
+USER appuser
+
 EXPOSE 8080
 
 CMD ["gunicorn", "wsgi:app", "--log-level=info", "--workers", "2", "--bind", "0.0.0.0:8080"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/exporter/Containerfile` around lines 1 - 12, The
image runs as root; add a non-root user and switch to it: create a system
group/user (e.g., "app"), ensure /app is owned by that user (chown) after
copying files and installing deps, and then set USER app before the CMD so
gunicorn runs unprivileged; keep WORKDIR, EXPOSE and CMD unchanged but perform
pip install as root (or use a temporary root step) then drop privileges by
switching to the created user (reference symbols: WORKDIR /app, COPY, RUN pip
install, and CMD ["gunicorn", "wsgi:app"...], and add USER <name>).
contrib/prometheus-exporter/exporter/requirements.txt (1)

1-3: Consider pinning exact versions for reproducible container builds.

The >= constraints allow flexibility but can lead to non-reproducible builds when pip install runs during container builds. While this pattern is consistent with other contrib tools in the repository, pinning exact versions or adding a lock file would ensure the container image remains identical across rebuilds and deployments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/exporter/requirements.txt` around lines 1 - 3,
Replace the floating `>=` version constraints in
contrib/prometheus-exporter/exporter/requirements.txt with exact pinned versions
(e.g., set flask, prometheus-client, gunicorn to specific versions) or
alternatively add a generated lock file (pip-tools requirements.txt/.lock or
pipfile.lock) and update CI/container build steps to install from that lock file
so container builds are reproducible; target the package lines for flask,
prometheus-client, and gunicorn in the file when making the change.
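One way to satisfy this finding is to pin each package exactly. The version numbers below are placeholders showing the shape of the change, not vetted pins:

```text
flask==3.0.3
prometheus-client==0.20.0
gunicorn==22.0.0
```

Alternatively, a lock file generated with pip-tools (`pip-compile`) keeps loose constraints in a separate `requirements.in` while the container build installs from the fully pinned output.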
contrib/prometheus-exporter/runner/datasets/dataset.jsonl (1)

1-1: Update the license placeholder values.

The MIT license text contains [year] and [fullname] placeholders that should be replaced with actual values for proper attribution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/runner/datasets/dataset.jsonl` at line 1, The
license field in the dataset entry contains placeholder tokens "[year]" and
"[fullname]" which must be replaced with real attribution values; update the
"license" string in the JSON object (the entry with "name": "synthetic-data") to
substitute [year] with the current copyright year and [fullname] with the
project or author name so the MIT license text is complete and accurate.
contrib/prometheus-exporter/base/deployment.yaml (2)

14-66: Consider adding security hardening to container specs.

Static analysis flags missing security context settings. While not blocking for a contrib tool, adding these settings would improve security posture and serve as a good example for users customizing this deployment.

🔒 Suggested security context additions
     spec:
       containers:
         - name: exporter
           image: quay.io/openshift-psap/llm-load-test-exporter:latest
           imagePullPolicy: Always
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            capabilities:
+              drop:
+                - ALL
           ports:
             - name: web
               containerPort: 8080

Apply the same securityContext block to the runner container as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/base/deployment.yaml` around lines 14 - 66, Add
container-level security hardening to both containers ("exporter" and "runner"):
set securityContext with runAsNonRoot: true and a non-root runAsUser (e.g.,
1000), set allowPrivilegeEscalation: false, set readOnlyRootFilesystem: true
where feasible, and drop all capabilities (capabilities.drop: ["ALL"]); also
consider adding a minimal podSecurityContext (fsGroup/runAsUser) consistent with
the container runAsUser and ensure the serviceAccountName (llm-load-test-sa) has
no elevated permissions. Update the "exporter" container block to include these
securityContext settings and apply the same securityContext to the "runner"
container.

17-17: Consider pinning image tags instead of using :latest.

Using :latest tags can lead to unpredictable deployments when images are updated. For reproducibility, consider using specific version tags or SHA digests, especially when documenting deployment instructions.

Also applies to: 48-48

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/base/deployment.yaml` at line 17, The deployment
uses an unpinned image tag
"quay.io/openshift-psap/llm-load-test-exporter:latest" which can cause
unpredictable deployments; update the image field(s) (e.g., the "image:" entries
for the container in the Deployment manifest and the other occurrence noted) to
a specific version tag or an immutable SHA digest (quay pullspec@sha256:...) so
deployments are reproducible and stable.
contrib/prometheus-exporter/README.md (1)

94-188: Add a language specifier to the fenced code block.

The example metrics output block lacks a language specifier. Use text or promql to satisfy the linter and improve syntax highlighting.

✏️ Suggested fix
-```
+```text
 # HELP llm_load_test_tpot_mean_ms Mean Time Per Output Token (ms)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/README.md` around lines 94 - 188, The fenced
example metrics block in contrib/prometheus-exporter/README.md lacks a language
specifier, causing linter/syntax-highlighting issues; update the opening
triple-backticks for that example metrics block to include a language (e.g., add
"text" or "promql") so the block becomes ```text and the linter stops
complaining and highlighting works correctly.
contrib/prometheus-exporter/exporter/exporter.py (1)

131-132: Use public clear() method instead of accessing private _metrics attribute.

The prometheus_client.Gauge class provides a documented clear() method that safely empties all labelsets. Change gauge._metrics.clear() to gauge.clear() to avoid relying on implementation details that may change across library versions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/exporter/exporter.py` around lines 131 - 132,
Replace direct access to the private attribute with the public API: instead of
calling gauge._metrics.clear() in the loop over ALL_GAUGES, call gauge.clear()
so you use the documented prometheus_client.Gauge.clear() method; update the
loop that iterates ALL_GAUGES and each gauge variable to invoke clear()
(referencing ALL_GAUGES and the Gauge instances) to avoid relying on the private
_metrics attribute.
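As a quick check of the public API: the metric name and labels below mirror the README's sample output, while the isolated registry wiring is illustrative:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Isolated registry so the example does not pollute the process default.
registry = CollectorRegistry()
tpot_mean = Gauge(
    "llm_load_test_tpot_mean_ms",
    "Mean Time Per Output Token (ms)",
    ["model", "namespace"],
    registry=registry,
)

tpot_mean.labels(model="granite", namespace="demo").set(12.5)
assert b'model="granite"' in generate_latest(registry)

# Public clear() drops every labelset; no private _metrics access needed.
tpot_mean.clear()
assert b'model="granite"' not in generate_latest(registry)
```

Because `clear()` is part of the documented labelled-metric interface, it keeps working across prometheus_client releases, unlike the private `_metrics` dict.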
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@contrib/prometheus-exporter/runner/runner.py`:
- Around line 198-249: discover_and_test_models currently iterates pods and runs
load tests per pod which causes duplicate tests for the same (model_name,
namespace); deduplicate by tracking seen (model_name, namespace) pairs before
building cfg and calling run_load_test. Modify discover_and_test_models to
maintain a set (e.g., seen_models) of tuples (model_name, namespace), skip
processing if the tuple is already present, and only add to active_files and
call get_auth_token, build_config, and run_load_test for the first occurrence;
keep existing logging and fallback URL logic unchanged.

---

Nitpick comments:
In `@contrib/prometheus-exporter/base/deployment.yaml`:
- Around line 14-66: Add container-level security hardening to both containers
("exporter" and "runner"): set securityContext with runAsNonRoot: true and a
non-root runAsUser (e.g., 1000), set allowPrivilegeEscalation: false, set
readOnlyRootFilesystem: true where feasible, and drop all capabilities
(capabilities.drop: ["ALL"]); also consider adding a minimal podSecurityContext
(fsGroup/runAsUser) consistent with the container runAsUser and ensure the
serviceAccountName (llm-load-test-sa) has no elevated permissions. Update the
"exporter" container block to include these securityContext settings and apply
the same securityContext to the "runner" container.
- Line 17: The deployment uses an unpinned image tag
"quay.io/openshift-psap/llm-load-test-exporter:latest" which can cause
unpredictable deployments; update the image field(s) (e.g., the "image:" entries
for the container in the Deployment manifest and the other occurrence noted) to
a specific version tag or an immutable SHA digest (quay pullspec@sha256:...) so
deployments are reproducible and stable.

In `@contrib/prometheus-exporter/exporter/Containerfile`:
- Around line 1-12: The image runs as root; add a non-root user and switch to
it: create a system group/user (e.g., "app"), ensure /app is owned by that user
(chown) after copying files and installing deps, and then set USER app before
the CMD so gunicorn runs unprivileged; keep WORKDIR, EXPOSE and CMD unchanged
but perform pip install as root (or use a temporary root step) then drop
privileges by switching to the created user (reference symbols: WORKDIR /app,
COPY, RUN pip install, and CMD ["gunicorn", "wsgi:app"...], and add USER
<name>).

In `@contrib/prometheus-exporter/exporter/exporter.py`:
- Around line 131-132: Replace direct access to the private attribute with the
public API: instead of calling gauge._metrics.clear() in the loop over
ALL_GAUGES, call gauge.clear() so you use the documented
prometheus_client.Gauge.clear() method; update the loop that iterates ALL_GAUGES
and each gauge variable to invoke clear() (referencing ALL_GAUGES and the Gauge
instances) to avoid relying on the private _metrics attribute.

In `@contrib/prometheus-exporter/exporter/requirements.txt`:
- Around line 1-3: Replace the floating `>=` version constraints in
contrib/prometheus-exporter/exporter/requirements.txt with exact pinned versions
(e.g., set flask, prometheus-client, gunicorn to specific versions) or
alternatively add a generated lock file (pip-tools requirements.txt/.lock or
pipfile.lock) and update CI/container build steps to install from that lock file
so container builds are reproducible; target the package lines for flask,
prometheus-client, and gunicorn in the file when making the change.

In `@contrib/prometheus-exporter/README.md`:
- Around line 94-188: The fenced example metrics block in
contrib/prometheus-exporter/README.md lacks a language specifier, causing
linter/syntax-highlighting issues; update the opening triple-backticks for that
example metrics block to include a language (e.g., add "text" or "promql") so
the block becomes ```text and the linter stops complaining and highlighting
works correctly.

In `@contrib/prometheus-exporter/runner/datasets/dataset.jsonl`:
- Line 1: The license field in the dataset entry contains placeholder tokens
"[year]" and "[fullname]" which must be replaced with real attribution values;
update the "license" string in the JSON object (the entry with "name":
"synthetic-data") to substitute [year] with the current copyright year and
[fullname] with the project or author name so the MIT license text is complete
and accurate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36352909-cef1-467d-bf2e-86f67ef75f9a

📥 Commits

Reviewing files that changed from the base of the PR and between b27aad1 and 1a00118.

📒 Files selected for processing (19)
  • contrib/prometheus-exporter/README.md
  • contrib/prometheus-exporter/base/clusterrole.yaml
  • contrib/prometheus-exporter/base/clusterrolebinding.yaml
  • contrib/prometheus-exporter/base/deployment.yaml
  • contrib/prometheus-exporter/base/files/llm-load-test-config.env
  • contrib/prometheus-exporter/base/files/uwl_metrics_list.yaml
  • contrib/prometheus-exporter/base/kustomization.yaml
  • contrib/prometheus-exporter/base/service.yaml
  • contrib/prometheus-exporter/base/serviceaccount.yaml
  • contrib/prometheus-exporter/base/servicemonitor.yaml
  • contrib/prometheus-exporter/exporter/Containerfile
  • contrib/prometheus-exporter/exporter/exporter.py
  • contrib/prometheus-exporter/exporter/requirements.txt
  • contrib/prometheus-exporter/exporter/wsgi.py
  • contrib/prometheus-exporter/grafana/grafana-llm-load-test-dashboard.json
  • contrib/prometheus-exporter/runner/Containerfile
  • contrib/prometheus-exporter/runner/datasets/dataset.jsonl
  • contrib/prometheus-exporter/runner/requirements.txt
  • contrib/prometheus-exporter/runner/runner.py

Comment on lines +198 to +249
def discover_and_test_models() -> None:
    """Discover KServe InferenceService models and run load tests."""
    try:
        model_pods = v1.list_pod_for_all_namespaces(
            label_selector="serving.kserve.io/inferenceservice"
        )
    except Exception as exc:
        LOG.error("Failed to list model pods: %s", exc)
        return

    active_files: set[str] = set()

    for pod in model_pods.items:
        model_name = pod.metadata.labels.get(
            "serving.kserve.io/inferenceservice", "unknown"
        )
        namespace = pod.metadata.namespace

        # Only test pods that are Running and opted-in
        gather = pod.metadata.labels.get("gather_llm_metrics")
        if pod.status.phase != "Running" or not gather:
            LOG.debug(
                "Skipping %s/%s (phase=%s, gather_llm_metrics=%s)",
                namespace, model_name, pod.status.phase, gather,
            )
            continue

        active_files.add(f"{model_name}_{namespace}.json")

        # Check if token auth is required
        annotations = pod.metadata.annotations or {}
        enable_auth = (
            annotations.get("security.opendatahub.io/enable-auth") == "true"
        )
        auth_token = get_auth_token(model_name, namespace) if enable_auth else None

        host_url = _discover_service_url(model_name, namespace)
        if host_url is None:
            host_url = f"https://{model_name}.{namespace}.svc.cluster.local"
            LOG.warning("No predictor service found for %s/%s, falling back to %s",
                        namespace, model_name, host_url)

        LOG.info("Running load test for model %s in namespace %s (url=%s)",
                 model_name, namespace, host_url)

        cfg = build_config(model_name, host_url, namespace, auth_token)
        run_load_test(cfg)

        LOG.info("Completed load test for model %s in namespace %s",
                 model_name, namespace)

    _remove_stale_files(active_files)


⚠️ Potential issue | 🟠 Major

Duplicate load tests run for models with multiple replicas.

The code iterates over all pods matching the serving.kserve.io/inferenceservice label. If an InferenceService has multiple replicas, each replica pod triggers a separate load test, all writing to the same output file. This wastes resources and produces non-deterministic results.

Deduplicate by (model_name, namespace) before running tests.

🔧 Suggested fix
 def discover_and_test_models() -> None:
     """Discover KServe InferenceService models and run load tests."""
     try:
         model_pods = v1.list_pod_for_all_namespaces(
             label_selector="serving.kserve.io/inferenceservice"
         )
     except Exception as exc:
         LOG.error("Failed to list model pods: %s", exc)
         return

     active_files: set[str] = set()
+    seen_models: set[tuple[str, str]] = set()

     for pod in model_pods.items:
         model_name = pod.metadata.labels.get(
             "serving.kserve.io/inferenceservice", "unknown"
         )
         namespace = pod.metadata.namespace

         # Only test pods that are Running and opted-in
         gather = pod.metadata.labels.get("gather_llm_metrics")
         if pod.status.phase != "Running" or not gather:
             LOG.debug(
                 "Skipping %s/%s (phase=%s, gather_llm_metrics=%s)",
                 namespace, model_name, pod.status.phase, gather,
             )
             continue

+        # Skip if we've already processed this model
+        model_key = (model_name, namespace)
+        if model_key in seen_models:
+            LOG.debug("Skipping duplicate pod for %s/%s", namespace, model_name)
+            continue
+        seen_models.add(model_key)
+
         active_files.add(f"{model_name}_{namespace}.json")
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 204-204: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/prometheus-exporter/runner/runner.py` around lines 198 - 249,
discover_and_test_models currently iterates pods and runs load tests per pod
which causes duplicate tests for the same (model_name, namespace); deduplicate
by tracking seen (model_name, namespace) pairs before building cfg and calling
run_load_test. Modify discover_and_test_models to maintain a set (e.g.,
seen_models) of tuples (model_name, namespace), skip processing if the tuple is
already present, and only add to active_files and call get_auth_token,
build_config, and run_load_test for the first occurrence; keep existing logging
and fallback URL logic unchanged.
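The suggested fix reduces to a first-occurrence filter over (model_name, namespace) keys, which is easy to verify in isolation (the helper name here is illustrative, not part of runner.py):

```python
def first_occurrences(keys: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first pod per (model_name, namespace) pair."""
    seen: set[tuple[str, str]] = set()
    unique: list[tuple[str, str]] = []
    for key in keys:
        if key in seen:
            continue  # replica pod of an already-scheduled model
        seen.add(key)
        unique.append(key)
    return unique
```

With this filter applied, a three-replica InferenceService triggers exactly one load test instead of three writes to the same `{model}_{namespace}.json` file.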
