[Feature Idea] Generic Prometheus Metric Plugin for Dynamic Weigher/Filter Logic #497

_While working on energy awareness, the following idea came up._
### Description

**Summary**

Add support for a "Generic Metric" type that allows users to define complex Prometheus queries directly in the Datasource CR and use the results for weighing or filtering hosts during scheduling decisions. This would enable building weigher/filter logic purely in Prometheus queries without requiring custom Go code for each new metric.

---

### Motivation

Currently, adding a new Prometheus-based weigher or filter requires:
1. Creating a new typed metric struct in `internal/knowledge/datasources/plugins/prometheus/types.go`
2. Implementing a new extractor in `internal/knowledge/extractor/plugins/`
3. Implementing a new weigher/filter in `internal/scheduling/<domain>/plugins/weighers/`
4. Registering the new plugin in the index

This is a significant amount of boilerplate for what is sometimes a simple "query Prometheus -> map to hosts -> apply weight/filter" pattern. A generic implementation would allow operators to:
- Rapidly prototype new scheduling heuristics
- Use complex PromQL aggregations without code changes
- Experiment with different metrics without redeploying Cortex

---

### Use Case Examples

#### Example 1: Weigher - Prefer Hosts with Low CPU Usage

**Goal:** Prefer hosts where the CPU is mostly idle (high idle ratio).

**Prometheus Query:**
```promql
1 - (
  sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
  /
  sum by (instance) (rate(node_cpu_seconds_total[5m]))
)
```

This returns a value between 0 and 1, where 1 = fully idle, 0 = fully utilized.

**Desired Behavior:**
- Higher values (more idle) -> higher weight
- This would be configured as a weigher in the pipeline

**Example Datasource CR:**
```yaml
apiVersion: cortex.cloud/v1alpha1
kind: Datasource
metadata:
  name: node-cpu-idle-ratio
spec:
  schedulingDomain: nova
  type: prometheus
  prometheus:
    query: |
      1 - (
        sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
        /
        sum by (instance) (rate(node_cpu_seconds_total[5m]))
      )
    alias: node_cpu_idle_ratio
    type: generic  # new
```

#### Example 2: Filter - Exclude Hosts Under IRQ Pressure

**Goal:** Filter out hosts experiencing significant IRQ stall pressure.

**Prometheus Query:**
```promql
rate(node_pressure_irq_stalled_seconds_total[5m]) > bool 0.005
```

This returns `1` (true) for hosts under pressure, `0` (false) for healthy hosts.

**Desired Behavior:**
- Hosts returning `1` -> filtered out (excluded from scheduling)
- Hosts returning `0` -> kept as valid candidates

**Example Datasource CR:**
```yaml
apiVersion: cortex.cloud/v1alpha1
kind: Datasource
metadata:
  name: node-irq-pressure-filter
spec:
  schedulingDomain: nova
  type: prometheus
  prometheus:
    query: |
      rate(node_pressure_irq_stalled_seconds_total[5m]) > bool 0.005
    alias: node_irq_pressure
    type: generic  # new
```

---

### Key Challenge: Label-to-Subject Mapping

Prometheus metrics use labels like `instance`, `node`, `host`, or custom labels to identify targets. However, Nova (and other OpenStack services) use their own naming conventions for compute hosts (e.g., `compute_host` from the OpenStack API).

**Example Mismatch:**
| Prometheus Label | Nova Compute Host |
|------------------|-------------------|
| `instance="10.0.0.5:9100"` | `compute-node-01` |
| `node="worker-1.internal"` | `nova-compute-worker-1` |
| `hostsystem="esxi-host-42.vcenter.local"` | `vc-a-0-runq42` |

**Current State:**
Cortex already solves this for vROps metrics using mapping Knowledges like `vmware-resolved-hostsystems` that translate vROps hostsystem names to Nova compute hosts.

**Needed Solution:**
The generic metric implementation needs a flexible way to map Prometheus labels to scheduling subjects. Some ideas:

1. **Direct Mapping:** Label value directly matches subject name (simplest case)
2. **Mapping Knowledge Reference:** Reference an existing Knowledge CR that contains the label-to-subject mapping
3. **Label Transformation Template:** Apply a transformation (e.g., strip suffix, regex extract) using Go's [text/template](https://pkg.go.dev/text/template)
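
The three ideas above could be sketched in Go roughly as follows. All function names are hypothetical, and the plain `map[string]string` table only stands in for whatever a mapping Knowledge would actually contain. One detail worth noting: `trimSuffix` is not a built-in `text/template` function, so the sketch registers it through a `FuncMap`; the `(suffix, value)` argument order is an assumption.

```go
package main

import (
	"bytes"
	"strings"
	"text/template"
)

// directMapping (idea 1): the label value already is the subject name.
func directMapping(labels map[string]string, label string) (string, bool) {
	subject, ok := labels[label]
	return subject, ok
}

// lookupMapping (idea 2): translate the label value through a table, which
// stands in here for the contents of a mapping Knowledge.
func lookupMapping(labels map[string]string, label string, table map[string]string) (string, bool) {
	subject, ok := table[labels[label]]
	return subject, ok
}

// templateMapping (idea 3): render a text/template against the label set.
// trimSuffix is registered explicitly because text/template does not ship it.
func templateMapping(labels map[string]string, expr string) (string, error) {
	funcs := template.FuncMap{
		"trimSuffix": func(suffix, s string) string { return strings.TrimSuffix(s, suffix) },
	}
	// missingkey=error makes a typo in a label name fail loudly instead of
	// rendering an empty subject.
	tmpl, err := template.New("mapping").Funcs(funcs).Option("missingkey=error").Parse(expr)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, labels); err != nil {
		return "", err
	}
	return out.String(), nil
}
```

With the label set `instance="10.0.0.5:9100"`, the template `{{ trimSuffix ":9100" .instance }}` renders `10.0.0.5`, while the lookup variant can return a name such as `compute-node-01` that has no textual relation to the label value. That difference is why the template approach alone does not cover the mismatches in the table above.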

---

### Proposed API (Ideas Welcome!)

```yaml
apiVersion: cortex.cloud/v1alpha1
kind: Pipeline
metadata:
  name: nova-external-scheduler
spec:
  schedulingDomain: nova
  type: filter-weigher
  filters:
  - name: generic
    params:
      datasource: node-irq-pressure-filter
      mapping:
        # Option A: Label value directly matches subject name
        label: "instance"
        
        # Option B: Reference an existing mapping Knowledge CR
        knowledgeRef: "prometheus-to-nova-mapping"
        
        # Option C: Apply a transformation template to the label
        transform: "{{ trimSuffix \":9100\" .instance }}"
  weighers:
  - name: generic
    weight: 1.0
    params:
      datasource: node-cpu-idle-ratio
      mapping:
        knowledgeRef: "prometheus-to-nova-mapping"
```

---

### Questions for Discussion

0. **General Interest**: Do you consider this feature broadly useful? In my case, it would simplify energy-aware weighing.
1. **Mapping Strategy:** Which approach (or combination) for label-to-subject mapping makes the most sense?
2. **Query Execution:** Should the generic metric always be evaluated as an instant query (single point in time), with all temporal logic expressed in PromQL itself?
3. **Knowledge Integration:** The current workflow is built around Knowledge CRs. Should we use a Knowledge CR for each Datasource, or a single shared one?
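
For the instant-query question, it may help to see what the generic metric would have to consume. The Go sketch below decodes the JSON that the Prometheus HTTP API returns for `/api/v1/query` when the result is an instant vector. The `sample` type and the `parseInstantVector` name are hypothetical, and nothing here reflects how Cortex currently fetches or stores metrics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// instantQueryResponse mirrors the subset of the Prometheus HTTP API response
// for /api/v1/query that matters for an instant vector.
type instantQueryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string  `json:"metric"`
			Value  [2]json.RawMessage `json:"value"` // [unix timestamp, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// sample is a hypothetical type: one label set with its value.
type sample struct {
	Labels map[string]string
	Value  float64
}

// parseInstantVector decodes the response body into one sample per series.
func parseInstantVector(body []byte) ([]sample, error) {
	var resp instantQueryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q", resp.Status, resp.Data.ResultType)
	}
	samples := make([]sample, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		// Prometheus encodes the sample value as a JSON string.
		var raw string
		if err := json.Unmarshal(r.Value[1], &raw); err != nil {
			return nil, err
		}
		value, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, err
		}
		samples = append(samples, sample{Labels: r.Metric, Value: value})
	}
	return samples, nil
}
```

Each returned sample carries the full label set, which is exactly the input the label-to-subject mapping needs. Rejecting anything other than a `vector` result keeps the plugin simple if all temporal logic is expressed in PromQL.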
