infrawatch · ayefimov-1 · Feb 3, 2026 · Apr 16, 2026 · Apr 20, 2026 · Feb 3, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 *.pyc
 .idea/
+.ansible/
diff --git a/roles/telemetry_chargeback/.gitignore b/roles/telemetry_chargeback/.gitignore
@@ -0,0 +1 @@
+.ansible/
diff --git a/roles/telemetry_chargeback/README.md b/roles/telemetry_chargeback/README.md
@@ -1,11 +1,11 @@
 telemetry_chargeback
-=========
+
 The **`telemetry_chargeback`** role is designed to test the **RHOSO Cloudkitty** feature. These tests are specific to the Cloudkitty feature. Tests that are not specific to this feature (e.g., standard OpenStack deployment validation, basic networking) should be added to a common role.
 
 The role performs two main functions:
 
 1. **CloudKitty Validation** - Enables and configures the CloudKitty hashmap rating module, then validates its state.
-2. **Synthetic Data Generation** - Generates synthetic Loki log data for testing chargeback scenarios using a Python script and Jinja2 template.
+2. **Synthetic Data Generation & Analysis** - Generates synthetic Loki log data for testing chargeback scenarios and calculates metric totals. The role automatically discovers and processes every scenario file matching `test_*.yml` in the `files/` directory. For each scenario it generates the synthetic data, computes the synthetic totals (syn-totals), ingests the data into Loki, flushes the Loki ingester memory, and fetches the cost from the CloudKitty rating summary (using the begin/end values from syn-totals). Retrieval from Loki is part of the `load_loki_data` flow. After all scenarios have run, the role runs cleanup (`cleanup_ck.yml`) to remove the local flush cert directory.
 
 Requirements
 ------------
@@ -15,48 +15,113 @@ It relies on the following being available on the target or control host:
 * The **OpenStack CLI client** must be installed and configured with administrative credentials.
 * Required Python libraries for the `openstack` CLI (e.g., `python3-openstackclient`).
 * Connectivity to the OpenStack API endpoint.
-* **Python 3** with the following libraries for synthetic data generation:
+* **Python 3** with the following libraries for synthetic data generation and analysis:
   * `PyYAML`
   * `Jinja2`
 
 It is expected to be run **after** a successful deployment and configuration of the following components:
 
 * **OpenStack:** A functional OpenStack cloud (RHOSO) environment.
 * **Cloudkitty:** The Cloudkitty service must be installed, configured, and running.
+* **Loki / OpenShift (for ingest and flush):** When using ingest and flush tasks, the control host must have `oc` CLI access, and the Cloudkitty Loki stack (route, certificates, ingester) must be deployed. The role sets Loki push/query URLs and extracts certificates via `setup_loki_env.yml`.
 
 Role Variables
 --------------
 The role uses the following variables to control the testing environment and execution.
 
 ### User-Configurable Variables (defaults/main.yml)
 
+These variables can be overridden when importing the role or set at the play level. Users can customize these based on their deployment environment and test requirements.
+
 | Variable | Default Value | Description |
 |----------|---------------|-------------|
 | `openstack_cmd` | `openstack` | The command used to execute OpenStack CLI calls. This can be customized if the binary is not in the standard PATH. |
+| `cloudkitty_debug` | `false` | Enable debug mode for the role. |
+| `logs_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/logs` | Directory for log files. |
+| `artifacts_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/artifacts` | Directory for generated artifacts. |
+| `cert_dir` | `{{ ansible_user_dir }}/ck-certs` | Local directory for extracted ingest/query certs. |
+| `local_cert_dir` | `{{ ansible_env.HOME }}/ci-framework-data/flush_certs` | Local directory for flush certs (removed by cleanup_ck.yml after the run). |
+| `remote_cert_dir` | `osp-certs` | Directory inside the OpenStack pod for certs. |
+| `cert_secret_name` | `cert-cloudkitty-client-internal` | OpenShift secret name for client certificates. |
+| `client_secret` | `secret/cloudkitty-lokistack-gateway-client-http` | Secret for flush client certs. |
+| `ca_configmap` | `cm/cloudkitty-lokistack-ca-bundle` | ConfigMap for CA bundle. |
+| `logql_query` | `{service="cloudkitty"}` (overridable via `loki_query`) | LogQL query for Loki. |
+| `cloudkitty_namespace` | `openstack` | OpenShift namespace for Cloudkitty/Loki resources. |
+| `openstackpod` | `openstackclient` | OpenStack client pod name for exec/cp. |
+| `lookback` | `6` | Number of days to look back when building the Loki query time range. |
+| `limit` | `50` | Maximum number of entries returned by a Loki query. |
+
+**Example: Overriding variables when importing the role**
+```yaml
+- name: "Run chargeback tests"
+  ansible.builtin.import_role:
+    name: telemetry_chargeback
+  vars:
+    cloudkitty_namespace: "my-custom-namespace"
+    lookback: 10
+    cloudkitty_debug: true
+```
 
-### Internal Variables (vars/main.yml)
+### Synthetic Data Scripts
 
-These variables are used internally by the role and typically do not need to be modified.
+**gen_synth_loki_data.py** - Generates Loki-format JSON from a scenario YAML file and a Jinja2 template. The role invokes it with `-r` so that timestamps in the output are in **reverse** order (newest first, oldest last). When running it manually, omit `-r` to get chronological order (oldest first, newest last).
 
-| Variable | Default Value | Description |
-|----------|---------------|-------------|
-| `logs_dir_zuul` | `/home/zuul/ci-framework-data/logs` | Remote directory for log files. |
-| `artifacts_dir_zuul` | `/home/zuul/ci-framework-data/artifacts` | Directory for generated artifacts. |
-| `cloudkitty_synth_script` | `{{ role_path }}/files/gen_synth_loki_data.py` | Path to the synthetic data generation script. |
-| `cloudkitty_data_template` | `{{ role_path }}/templates/loki_data_templ.j2` | Path to the Jinja2 template for Loki data format. |
-| `ck_data_config` | `{{ role_path }}/files/test_static.yml` | Path to the scenario configuration file. |
-| `ck_output_file_local` | `{{ artifacts_dir_zuul }}/loki_synth_data.json` | Local path for generated synthetic data. |
-| `ck_output_file_remote` | `{{ logs_dir_zuul }}/gen_loki_synth_data.log` | Remote destination for synthetic data. |
+| Option | Description |
+|--------|--------------|
+| `--tmpl` | Path to the Jinja2 template (e.g. `loki_data_templ.j2`). |
+| `-t`, `--test` | Path to the scenario YAML (e.g. `test_dyn_basic.yml`). |
+| `-o`, `--output` | Path to the output JSON file. |
+| `-p`, `--project-id` | Optional; overrides `groupby.project_id` in every log entry. |
+| `-u`, `--user-id` | Optional; overrides `groupby.user_id` in every log entry. |
+| `-r`, `--reverse` | Reverse timestamp order in JSON output (newest first, oldest last). |
+| `--debug` | Enable debug logging. |
+
+**gen_db_summary.py** — Parses Loki-style JSON (streams or `data.result`), sorts entries by timestamp, and writes a YAML summary. This script is invoked by the role for **both** synthetic totals (in `gen_synth_loki_data.yml`) and Loki-retrieved totals (in `retrieve_loki_data.yml`). It applies rate calculations with support for `factor`, `offset`, and `mutate` transformations.
+
+| Option | Description |
+|--------|--------------|
+| `-j`, `--json` | Path to the input JSON file (required). |
+| `-o`, `--output` | Path to the output YAML file (default: `<input_stem>_total.yml`). |
+| `--debug` | Directory to write debug output (`<stem>_diff.txt` with one `[ts,log]` JSON per line). |
+
+Output YAML structure:
+
+* **time** - `begin_step` / `end_step`, each with `nanosec` (nanosecond timestamp) plus `begin` and `end` (ISO window strings from the log payload). The `nanosec` values are used as the Loki query time range in `retrieve_loki_data.yml`.
+* **data_summary** - `total_timesteps`, `metrics_per_step`, `log_count`, and `total_rating` (sum of all per-type rates).
+* **by_type** - `rate`: a list with one entry per metric type, each holding `Begin`, `End`, `Qty` (sum of raw quantities before any transformation), `Rate` (calculated as `Σ((qty_mutated * factor + offset) * price)`), and `Type`.
+
+### Dynamically Set Variables
+
+Set in **main.yml** from the OpenStack CLI (`openstack project show admin` / `openstack user show admin`):
+
+| Variable | Description |
+|----------|-------------|
+| `cloudkitty_project_id` | ID of the OpenStack project named `admin` (empty string if not found). Passed as `-p` to the synthetic data generator when non-empty. |
+| `cloudkitty_user_id` | ID of the OpenStack user named `admin` (empty string if not found). Passed as `-u` to the synthetic data generator when non-empty. |
+
+Set in **gen_synth_loki_data.yml** for each scenario file during the loop:
+
+| Variable | Description |
+|----------|-------------|
+| `cloudkitty_data_file` | Local path for generated JSON data (`{{ artifacts_dir_zuul }}/{{ scenario_name }}-synth_data.json`) |
+| `cloudkitty_synth_totals_file` | Local path for calculated metric totals (`{{ artifacts_dir_zuul }}/{{ scenario_name }}-synth_metrics_summary.yml`) |
+| `cloudkitty_test_file` | Path to the scenario configuration file (`{{ role_path }}/files/{{ scenario_name }}.yml`) |
 
 Scenario Configuration
 ----------------------
-The synthetic data generation is controlled by a YAML configuration file (`files/test_static.yml`). This file defines:
+The synthetic data generation is controlled by YAML configuration files in the `files/` directory. Any file matching `test_*.yml` will be automatically discovered and processed. Files whose names start with an underscore (e.g. `_test_*.yml`) are **not** discovered by the role; they can be used as reference or for manual runs.
+
+Each scenario file defines:
+
+* **generation** — Time range configuration (days, step_seconds).
+* **log_types** — List of log type definitions. Each entry has **type** (identifier and value in output), unit, description, qty, price, groupby, and metadata. The **groupby** dict typically includes dimension keys (e.g. id, user_id, project_id, tenant_id); the generator merges **date_fields** into groupby at run time.
+* **required_fields** — Top-level keys required for each log type (e.g. type, unit, qty, price, groupby, metadata).
+* **date_fields** — Date field names to merge into groupby (week_of_the_year, day_of_the_year, month, year).
+* **loki_stream** — Loki stream configuration (service name).
+
+Keep **groupby.id** consistent per metric type across scenario files, so that a given type always uses the same id.
 
-* **generation** - Time range configuration (days, step_seconds)
-* **log_types** - List of log type definitions with name, type, unit, qty, price, groupby, and metadata
-* **required_fields** - Fields required for validation
-* **date_fields** - Date fields to add to groupby (week_of_the_year, day_of_the_year, month, year)
-* **loki_stream** - Loki stream configuration (service name)
+Scenario files matching `test_*.yml` in the `files/` directory are automatically discovered and processed. Files whose names start with an underscore are not auto-discovered.
 
 Dependencies
 ------------

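Review note: the per-type rate formula documented in the README hunk above, `Σ((qty_mutated * factor + offset) * price)`, can be illustrated with a small standalone Python sketch. This is not the code from `gen_db_summary.py`. The helper name `rate_by_type`, the flat dict layout of each entry, and the defaults `factor=1.0` and `offset=0.0` are assumptions made for illustration, and the mutate step is left out, so `qty` is used as-is.

```python
from collections import defaultdict


def rate_by_type(entries):
    """Sum (qty * factor + offset) * price per metric type.

    Hypothetical helper: `entries` is a list of plain dicts, an assumed
    layout that is not the Loki payload format used by the role.
    """
    sums = defaultdict(float)
    for entry in entries:
        # Assumed defaults: factor 1.0 and offset 0.0 when not given.
        factor = entry.get("factor", 1.0)
        offset = entry.get("offset", 0.0)
        qty_rate = entry["qty"] * factor + offset
        sums[entry["type"]] += qty_rate * entry["price"]
    # Round per type and sort by type name for stable output.
    return {k: round(v, 4) for k, v in sorted(sums.items())}


entries = [
    {"type": "cpu", "qty": 2.0, "price": 0.5},
    {"type": "cpu", "qty": 4.0, "price": 0.5, "factor": 2.0, "offset": 1.0},
    {"type": "ram", "qty": 8.0, "price": 0.25},
]
rates = rate_by_type(entries)
total = round(sum(rates.values()), 4)
print(rates, total)
```

Summing per type first and deriving the total from those sums keeps the overall rating consistent with the per-type figures.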
diff --git a/roles/telemetry_chargeback/defaults/main.yml b/roles/telemetry_chargeback/defaults/main.yml
@@ -1,2 +1,32 @@
 ---
+# OpenStack CLI command
 openstack_cmd: "openstack"
+
+# Debug mode
+cloudkitty_debug: false
+
+# Directory paths
+logs_dir_zuul: "{{ ansible_env.HOME }}/ci-framework-data/logs"
+artifacts_dir_zuul: "{{ ansible_env.HOME }}/ci-framework-data/artifacts"
+cert_dir: "{{ ansible_user_dir }}/ck-certs"
+local_cert_dir: "{{ ansible_env.HOME }}/ci-framework-data/flush_certs"
+remote_cert_dir: "osp-certs"
+
+# Cloudkitty certificates and secrets
+cert_secret_name: "cert-cloudkitty-client-internal"
+client_secret: "secret/cloudkitty-lokistack-gateway-client-http"
+ca_configmap: "cm/cloudkitty-lokistack-ca-bundle"
+
+# LogQL Query
+logql_query: "{{ loki_query | default('{service=\"cloudkitty\"}') }}"
+
+# OpenShift/Kubernetes settings
+cloudkitty_namespace: "openstack"
+openstackpod: "openstackclient"
+
+# Time window settings
+lookback: 6
+limit: 50
+
+# List of test scenario files to run
+cloudkitty_test_scenarios: []
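Review note: as a rough illustration of how the `lookback`, `limit`, and `logql_query` defaults could map onto a Loki query window, the sketch below converts a lookback in days into nanosecond Unix timestamps, which Loki's `query_range` endpoint accepts for `start` and `end`. The helper `loki_window` is hypothetical and is not taken from the role's tasks; it only shows the arithmetic under that assumption.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def to_ns(moment):
    """Convert an aware datetime to whole-second nanosecond epoch time."""
    return int(moment.timestamp()) * NS_PER_SECOND


def loki_window(now, lookback_days=6, limit=50,
                query='{service="cloudkitty"}'):
    """Build query_range-style parameters for a lookback window.

    Hypothetical helper; the defaults mirror defaults/main.yml.
    """
    start = now - timedelta(days=lookback_days)
    return {
        "query": query,
        "start": to_ns(start),
        "end": to_ns(now),
        "limit": limit,
    }


# A fixed timestamp keeps the example reproducible.
now = datetime(2026, 4, 20, tzinfo=timezone.utc)
params = loki_window(now)
print(params)
```

Passing `now` in explicitly, rather than reading the clock inside the helper, makes the window easy to reproduce when comparing synthetic totals against data retrieved from Loki.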
diff --git a/roles/telemetry_chargeback/files/gen_db_summary.py b/roles/telemetry_chargeback/files/gen_db_summary.py
@@ -119,7 +119,7 @@ def _apply_mutate(qty: float, mutate: str) -> float:
         return math.floor(qty)
     elif mutate_upper == "NUMBOOL":
         # If qty equals 0, leave it at 0. Else, set it to 1.
-        return 0.0 if qty == 0 else 1.0
+        return 0.0 if abs(qty) < 1e-9 else 1.0
     elif mutate_upper == "NOTNUMBOOL":
         # If qty equals 0, set it to 1. Else, set it to 0.
         return 1.0 if qty == 0 else 0.0
@@ -175,8 +175,9 @@ def _parse_numeric(value: Any, default: float = 0) -> float:
 
 def aggregate_rates_by_type(
     pairs: list[tuple[str, str]],
-) -> tuple[dict, float]:
-    sums: defaultdict[str, float] = defaultdict(float)
+) -> tuple[dict, float, dict]:
+    rate_sums: defaultdict[str, float] = defaultdict(float)
+    qty_sums: defaultdict[str, float] = defaultdict(float)
     for _, log_str in pairs:
         try:
             entry = json.loads(log_str)
@@ -196,17 +197,26 @@ def aggregate_rates_by_type(
         except (TypeError, ValueError):
             continue
 
-        # Apply mutate transformation
+        # Track raw qty sum (before any transformation)
+        qty_sums[mtype] += qty
+
+        # Apply mutate transformation for rating calculation
         qty_mutated = _apply_mutate(qty, mutate)
 
         # Apply factor and offset
         qty_rate = qty_mutated * factor + offset
 
         # Calculate rate
-        sums[mtype] += qty_rate * price
-    by_types = {k: {"Rate": round(v, 4)} for k, v in sorted(sums.items())}
-    total = sum(sums.values())
-    return by_types, total
+        rate_sums[mtype] += qty_rate * price
+
+    by_types = {
+        k: {"Rate": round(v, 4)} for k, v in sorted(rate_sums.items())
+    }
+    qty_by_types = {
+        k: {"qty_sum": round(v, 4)} for k, v in sorted(qty_sums.items())
+    }
+    total = sum(rate_sums.values())
+    return by_types, total, qty_by_types
 
 
 def build_summary(pairs: list[tuple[str, str]]) -> dict[str, Any]:
@@ -237,17 +247,35 @@ def build_summary(pairs: list[tuple[str, str]]) -> dict[str, Any]:
         empty = {"nanosec": None, "begin": None, "end": None}
         time_block = {"begin_step": empty.copy(), "end_step": empty.copy()}
 
-    by_types, total_r = aggregate_rates_by_type(pairs)
+    # Get aggregated data by type
+    by_types, total_r, qty_by_types = aggregate_rates_by_type(pairs)
+
+    # Get overall time range for by_type entries
+    begin_time = first.get("start") if pairs else None
+    end_time = last.get("end") if pairs else None
+
+    # Build flat list of entries
+    rate_list = []
+    for type_name in sorted(by_types.keys()):
+        entry = {
+            "Begin": begin_time,
+            "End": end_time,
+            "Qty": qty_by_types.get(type_name, {}).get("qty_sum", 0.0),
+            "Rate": by_types[type_name]["Rate"],
+            "Type": type_name,
+        }
+        rate_list.append(entry)
+
     return {
         "time": time_block,
-        "data_log": {
+        "data_summary": {
             "total_timesteps": n_ts,
             "metrics_per_step": mps,
             "log_count": log_count,
+            "total_rating": round(total_r, 4),
         },
-        "rate": {
-            "by_types": by_types,
-            "total": {"Rating": round(total_r, 4)},
+        "by_type": {
+            "rate": rate_list,
         },
     }
 
@@ -267,7 +295,8 @@ def write_yaml(path: Path, doc: dict[str, Any]) -> None:
 def main() -> None:
     parser = argparse.ArgumentParser(
         description=(
-            "Summarize Loki JSON log entries to YAML (time, data_log, rate)."
+            "Summarize Loki JSON log entries to YAML "
+            "(time, data_summary, by_type)."
         ),
     )
     parser.add_argument(
@@ -310,7 +339,7 @@ def main() -> None:
     doc = build_summary(pairs)
     write_yaml(out_path, doc)
 
-    if doc["data_log"]["metrics_per_step"] == "ERROR":
+    if doc["data_summary"]["metrics_per_step"] == "ERROR":
         per_ts = Counter(ts for ts, _ in pairs)
         exp = next(iter(per_ts.values()), 0)
         for ts in sorted(per_ts, key=int):