1 change: 1 addition & 0 deletions .gitignore
Review comment from a collaborator:

This makes sense to propose separately, since it is unrelated to this change and needs some additional review/discussion.

Review comment from a collaborator:

Proposed in #353 against this branch

@@ -1,2 +1,3 @@
 *.pyc
 .idea/
+.ansible/
1 change: 1 addition & 0 deletions roles/telemetry_chargeback/.gitignore
Comment thread: ayefimov-1 marked this conversation as resolved.
@@ -0,0 +1 @@
+.ansible/
105 changes: 85 additions & 20 deletions roles/telemetry_chargeback/README.md
@@ -1,11 +1,11 @@
 telemetry_chargeback
 =========

 The **`telemetry_chargeback`** role is designed to test the **RHOSO Cloudkitty** feature. These tests are specific to the Cloudkitty feature. Tests that are not specific to this feature (e.g., standard OpenStack deployment validation, basic networking) should be added to a common role.

 The role performs two main functions:

 1. **CloudKitty Validation** - Enables and configures the CloudKitty hashmap rating module, then validates its state.
-2. **Synthetic Data Generation** - Generates synthetic Loki log data for testing chargeback scenarios using a Python script and Jinja2 template.
+2. **Synthetic Data Generation & Analysis** - Generates synthetic Loki log data for testing chargeback scenarios and calculates metric totals. The role automatically discovers and processes all scenario files matching `test_*.yml` in the `files/` directory. For each scenario it runs: generate synthetic data, compute syn-totals, ingest to Loki, flush Loki ingester memory, and get cost via CloudKitty rating summary (using begin/end from syn-totals). Retrieve-from-Loki is included in the load_loki_data flow. After all scenarios, the role runs cleanup (`cleanup_ck.yml`) to remove the local flush cert directory.

 Requirements
 ------------
@@ -15,48 +15,113 @@ It relies on the following being available on the target or control host:
 * The **OpenStack CLI client** must be installed and configured with administrative credentials.
 * Required Python libraries for the `openstack` CLI (e.g., `python3-openstackclient`).
 * Connectivity to the OpenStack API endpoint.
-* **Python 3** with the following libraries for synthetic data generation:
+* **Python 3** with the following libraries for synthetic data generation and analysis:
     * `PyYAML`
     * `Jinja2`

 It is expected to be run **after** a successful deployment and configuration of the following components:

 * **OpenStack:** A functional OpenStack cloud (RHOSO) environment.
 * **Cloudkitty:** The Cloudkitty service must be installed, configured, and running.
+* **Loki / OpenShift (for ingest and flush):** When using ingest and flush tasks, the control host must have `oc` CLI access, and the Cloudkitty Loki stack (route, certificates, ingester) must be deployed. The role sets Loki push/query URLs and extracts certificates via `setup_loki_env.yml`.

 Role Variables
 --------------
 The role uses the following variables to control the testing environment and execution.

 ### User-Configurable Variables (defaults/main.yml)

 These variables can be overridden when importing the role or set at the play level. Users can customize these based on their deployment environment and test requirements.

 | Variable | Default Value | Description |
 |----------|---------------|-------------|
 | `openstack_cmd` | `openstack` | The command used to execute OpenStack CLI calls. This can be customized if the binary is not in the standard PATH. |
+| `cloudkitty_debug` | `false` | Enable debug mode for the role. |
+| `logs_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/logs` | Directory for log files. |
+| `artifacts_dir_zuul` | `{{ ansible_env.HOME }}/ci-framework-data/artifacts` | Directory for generated artifacts. |
+| `cert_dir` | `{{ ansible_user_dir }}/ck-certs` | Local directory for extracted ingest/query certs. |
+| `local_cert_dir` | `{{ ansible_env.HOME }}/ci-framework-data/flush_certs` | Local directory for flush certs (removed by cleanup_ck.yml after the run). |
+| `remote_cert_dir` | `osp-certs` | Directory inside the OpenStack pod for certs. |
+| `cert_secret_name` | `cert-cloudkitty-client-internal` | OpenShift secret name for client certificates. |
+| `client_secret` | `secret/cloudkitty-lokistack-gateway-client-http` | Secret for flush client certs. |
+| `ca_configmap` | `cm/cloudkitty-lokistack-ca-bundle` | ConfigMap for CA bundle. |
+| `logql_query` | `{service="cloudkitty"}` (overridable via `loki_query`) | LogQL query for Loki. |
+| `cloudkitty_namespace` | `openstack` | OpenShift namespace for Cloudkitty/Loki resources. |
+| `openstackpod` | `openstackclient` | OpenStack client pod name for exec/cp. |
+| `lookback` | `6` | Days lookback for Loki query time range. |
+| `limit` | `50` | Limit for Loki query results. |
+
+**Example: Overriding variables when importing the role**
+```yaml
+- name: "Run chargeback tests"
+  ansible.builtin.import_role:
+    name: telemetry_chargeback
+  vars:
+    cloudkitty_namespace: "my-custom-namespace"
+    lookback: 10
+    cloudkitty_debug: true
+```

-### Internal Variables (vars/main.yml)
+### Synthetic Data Scripts

-These variables are used internally by the role and typically do not need to be modified.
+**gen_synth_loki_data.py** — Generates Loki-format JSON from a scenario YAML and template. The role invokes it with `-r` so that timestamps in the output are in **reverse** order (youngest first, oldest last). When run manually you can omit `-r` for chronological order (oldest first, youngest last).

-| Variable | Default Value | Description |
-|----------|---------------|-------------|
-| `logs_dir_zuul` | `/home/zuul/ci-framework-data/logs` | Remote directory for log files. |
-| `artifacts_dir_zuul` | `/home/zuul/ci-framework-data/artifacts` | Directory for generated artifacts. |
-| `cloudkitty_synth_script` | `{{ role_path }}/files/gen_synth_loki_data.py` | Path to the synthetic data generation script. |
-| `cloudkitty_data_template` | `{{ role_path }}/templates/loki_data_templ.j2` | Path to the Jinja2 template for Loki data format. |
-| `ck_data_config` | `{{ role_path }}/files/test_static.yml` | Path to the scenario configuration file. |
-| `ck_output_file_local` | `{{ artifacts_dir_zuul }}/loki_synth_data.json` | Local path for generated synthetic data. |
-| `ck_output_file_remote` | `{{ logs_dir_zuul }}/gen_loki_synth_data.log` | Remote destination for synthetic data. |
+| Option | Description |
+|--------|--------------|
+| `--tmpl` | Path to the Jinja2 template (e.g. `loki_data_templ.j2`). |
+| `-t`, `--test` | Path to the scenario YAML (e.g. `test_dyn_basic.yml`). |
+| `-o`, `--output` | Path to the output JSON file. |
+| `-p`, `--project-id` | Optional; overrides `groupby.project_id` in every log entry. |
+| `-u`, `--user-id` | Optional; overrides `groupby.user_id` in every log entry. |
+| `-r`, `--reverse` | Reverse timestamp order in JSON output (youngest first, oldest last). |
+| `--debug` | Enable debug logging. |
+
+**gen_db_summary.py** — Parses Loki-style JSON (streams or `data.result`), sorts entries by timestamp, and writes a YAML summary. This script is invoked by the role for **both** synthetic totals (in `gen_synth_loki_data.yml`) and Loki-retrieved totals (in `retrieve_loki_data.yml`). It applies rate calculations with support for `factor`, `offset`, and `mutate` transformations.
+
+| Option | Description |
+|--------|--------------|
+| `-j`, `--json` | Path to the input JSON file (required). |
+| `-o`, `--output` | Path to the output YAML file (default: `<input_stem>_total.yml`). |
+| `--debug` | Directory to write debug output (`<stem>_diff.txt` with one `[ts,log]` JSON per line). |
+
+Output YAML structure:
+
+* **time** — `begin_step` / `end_step`, each with `nanosec` (nanosecond timestamp), `begin`, `end` (ISO window strings from the log payload). The `nanosec` values are used for Loki query time range in `retrieve_loki_data.yml`.
+* **data_log** — `total_timesteps`, `metrics_per_step`, `log_count`.
+* **rate** — `by_types` (per-type `Rate` calculated as `Σ((qty_mutated * factor + offset) * price)`) and `total.Rating` (sum of all rates).
+
+### Dynamically Set Variables
+
+Set in **main.yml** from the OpenStack CLI (`openstack project show admin` / `openstack user show admin`):
+
+| Variable | Description |
+|----------|-------------|
+| `cloudkitty_project_id` | ID of the OpenStack project named `admin` (empty string if not found). Passed as `-p` to the synthetic data generator when non-empty. |
+| `cloudkitty_user_id` | ID of the OpenStack user named `admin` (empty string if not found). Passed as `-u` to the synthetic data generator when non-empty. |
+
+Set in **gen_synth_loki_data.yml** for each scenario file during the loop:
+
+| Variable | Description |
+|----------|-------------|
+| `cloudkitty_data_file` | Local path for generated JSON data (`{{ artifacts_dir_zuul }}/{{ scenario_name }}-synth_data.json`) |
+| `cloudkitty_synth_totals_file` | Local path for calculated metric totals (`{{ artifacts_dir_zuul }}/{{ scenario_name }}-synth_metrics_summary.yml`) |
+| `cloudkitty_test_file` | Path to the scenario configuration file (`{{ role_path }}/files/{{ scenario_name }}.yml`) |
+
 Scenario Configuration

Review comment from a collaborator:

The description of the scenario generation should be in docs/ IMHO, not in README, which is a high level overview of the role.

 ----------------------
-The synthetic data generation is controlled by a YAML configuration file (`files/test_static.yml`). This file defines:
+The synthetic data generation is controlled by YAML configuration files in the `files/` directory. Any file matching `test_*.yml` will be automatically discovered and processed. Files whose names start with an underscore (e.g. `_test_*.yml`) are **not** discovered by the role; they can be used as reference or for manual runs.
+
+Each scenario file defines:
+
+* **generation** — Time range configuration (days, step_seconds).
+* **log_types** — List of log type definitions. Each entry has **type** (identifier and value in output), unit, description, qty, price, groupby, and metadata. The **groupby** dict typically includes dimension keys (e.g. id, user_id, project_id, tenant_id); the generator merges **date_fields** into groupby at run time.
+* **required_fields** — Top-level keys required for each log type (e.g. type, unit, qty, price, groupby, metadata).
+* **date_fields** — Date field names to merge into groupby (week_of_the_year, day_of_the_year, month, year).
+* **loki_stream** — Loki stream configuration (service name).
+
+**groupby.id** should be consistent by metric type across scenario files so that the same type always uses the same id.

-* **generation** - Time range configuration (days, step_seconds)
-* **log_types** - List of log type definitions with name, type, unit, qty, price, groupby, and metadata
-* **required_fields** - Fields required for validation
-* **date_fields** - Date fields to add to groupby (week_of_the_year, day_of_the_year, month, year)
-* **loki_stream** - Loki stream configuration (service name)
+Scenario files matching `test_*.yml` in the `files/` directory are automatically discovered and processed. Files whose names start with an underscore are not auto-discovered.

 Dependencies
 ------------
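
Note on the rate formula in the README change above: the expression `Σ((qty_mutated * factor + offset) * price)` can be checked by hand with a small sketch. This is only an illustration, not the code of `gen_db_summary.py`: the helper names `apply_mutate` and `entry_rate`, the default values for `factor` and `offset`, and the three sample entries are assumptions made for the example. Only the two mutate modes visible in this PR's diff are modelled.

```python
from collections import defaultdict


def apply_mutate(qty: float, mutate: str = "NONE") -> float:
    """Simplified stand-in for the script's _apply_mutate (two modes only)."""
    mode = mutate.upper()
    if mode == "NUMBOOL":
        return 0.0 if abs(qty) < 1e-9 else 1.0
    if mode == "NOTNUMBOOL":
        return 1.0 if qty == 0 else 0.0
    return qty  # any other mode passes qty through unchanged in this sketch


def entry_rate(entry: dict) -> float:
    """Compute (qty_mutated * factor + offset) * price for one log entry."""
    qty = float(entry["qty"])
    price = float(entry["price"])
    factor = float(entry.get("factor", 1))  # assumed default
    offset = float(entry.get("offset", 0))  # assumed default
    qty_mutated = apply_mutate(qty, entry.get("mutate", "NONE"))
    return (qty_mutated * factor + offset) * price


# Hypothetical log entries, chosen so the arithmetic is exact in binary floats.
entries = [
    {"type": "instance", "qty": 2, "price": 0.5},                            # 1.0
    {"type": "instance", "qty": 3, "price": 0.5, "factor": 2, "offset": 1},  # 3.5
    {"type": "volume", "qty": 7, "price": 0.25, "mutate": "NUMBOOL"},        # 0.25
]

rate_by_type = defaultdict(float)
for e in entries:
    rate_by_type[e["type"]] += entry_rate(e)
total_rating = round(sum(rate_by_type.values()), 4)

print(dict(rate_by_type), total_rating)
# {'instance': 4.5, 'volume': 0.25} 4.75
```

The third entry shows why mutation happens before `factor` and `offset`: with `NUMBOOL` the quantity 7 collapses to 1, so the entry is charged once at `price` regardless of the raw quantity.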
30 changes: 30 additions & 0 deletions roles/telemetry_chargeback/defaults/main.yml
@@ -1,2 +1,32 @@
 ---
+# OpenStack CLI command
 openstack_cmd: "openstack"
+
+# Debug mode
+cloudkitty_debug: false
+
+# Directory paths
+logs_dir_zuul: "{{ ansible_env.HOME }}/ci-framework-data/logs"
+artifacts_dir_zuul: "{{ ansible_env.HOME }}/ci-framework-data/artifacts"
+cert_dir: "{{ ansible_user_dir }}/ck-certs"
+local_cert_dir: "{{ ansible_env.HOME }}/ci-framework-data/flush_certs"
+remote_cert_dir: "osp-certs"
+
+# Cloudkitty certificates and secrets
+cert_secret_name: "cert-cloudkitty-client-internal"
+client_secret: "secret/cloudkitty-lokistack-gateway-client-http"
+ca_configmap: "cm/cloudkitty-lokistack-ca-bundle"
+
+# LogQL Query
+logql_query: "{{ loki_query | default('{service=\"cloudkitty\"}') }}"
Review comment from a collaborator:

You can skip defining this, and set the default for loki_query here instead.
I would examine where both of these are used, and consider whether they can be consolidated into one change.

+# OpenShift/Kubernetes settings
+cloudkitty_namespace: "openstack"
+openstackpod: "openstackclient"
+
+# Time window settings
+lookback: 6
+limit: 50
+
+# List of test scenario files to run
+cloudkitty_test_scenarios: []
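
Note on scenario discovery: the README in this PR says files matching `test_*.yml` under `files/` are picked up and files starting with an underscore are not. How `cloudkitty_test_scenarios` gets populated is not shown in this diff, so the following is only a Python sketch of the matching rule, not the role's Ansible tasks. The function name `discover_scenarios` and the file names `_test_reference.yml` and `notes.yml` are made up for the example.

```python
import tempfile
from pathlib import Path


def discover_scenarios(files_dir: Path) -> list[str]:
    """Return scenario names (file stems) for files matching test_*.yml."""
    # The pattern must match the whole file name, so "_test_x.yml" is skipped
    # without any extra underscore check.
    return sorted(p.stem for p in files_dir.glob("test_*.yml"))


with tempfile.TemporaryDirectory() as tmp:
    files_dir = Path(tmp)
    for name in (
        "test_static.yml",
        "test_dyn_basic.yml",
        "_test_reference.yml",  # hypothetical reference-only scenario
        "notes.yml",            # hypothetical unrelated file
    ):
        (files_dir / name).touch()
    found = discover_scenarios(files_dir)

print(found)
# ['test_dyn_basic', 'test_static']
```

Sorting the result keeps the processing order stable between runs, which makes per-scenario artifacts easier to compare.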
59 changes: 44 additions & 15 deletions roles/telemetry_chargeback/files/gen_db_summary.py
@@ -119,7 +119,7 @@ def _apply_mutate(qty: float, mutate: str) -> float:
         return math.floor(qty)
     elif mutate_upper == "NUMBOOL":
         # If qty equals 0, leave it at 0. Else, set it to 1.
-        return 0.0 if qty == 0 else 1.0
+        return 0.0 if abs(qty) < 1e-9 else 1.0
     elif mutate_upper == "NOTNUMBOOL":
         # If qty equals 0, set it to 1. Else, set it to 0.
         return 1.0 if qty == 0 else 0.0
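
Note on the `NUMBOOL` change in this hunk: replacing `qty == 0` with `abs(qty) < 1e-9` matters when a quantity that is mathematically zero carries floating-point residue. The sketch below shows the difference; the function names are hypothetical and the threshold `1e-9` is taken from the diff. The `NOTNUMBOOL` branch visible in the same hunk still compares with `qty == 0`, so the two modes may disagree on such values.

```python
def numbool_exact(qty: float) -> float:
    """Old behaviour: exact comparison against zero."""
    return 0.0 if qty == 0 else 1.0


def numbool_tolerant(qty: float, eps: float = 1e-9) -> float:
    """New behaviour: values within eps of zero count as zero."""
    return 0.0 if abs(qty) < eps else 1.0


# 0.1 + 0.2 - 0.3 is zero on paper but not in IEEE-754 double precision.
residue = 0.1 + 0.2 - 0.3

print(residue != 0)               # True
print(numbool_exact(residue))     # 1.0
print(numbool_tolerant(residue))  # 0.0
```

The trade-off is that genuinely small quantities below the threshold are now also treated as zero.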
@@ -175,8 +175,9 @@ def _parse_numeric(value: Any, default: float = 0) -> float:

 def aggregate_rates_by_type(
     pairs: list[tuple[str, str]],
-) -> tuple[dict, float]:
-    sums: defaultdict[str, float] = defaultdict(float)
+) -> tuple[dict, float, dict]:
+    rate_sums: defaultdict[str, float] = defaultdict(float)
+    qty_sums: defaultdict[str, float] = defaultdict(float)
     for _, log_str in pairs:
         try:
             entry = json.loads(log_str)
@@ -196,17 +197,26 @@ def aggregate_rates_by_type(
         except (TypeError, ValueError):
             continue

-        # Apply mutate transformation
+        # Track raw qty sum (before any transformation)
+        qty_sums[mtype] += qty
+
+        # Apply mutate transformation for rating calculation
         qty_mutated = _apply_mutate(qty, mutate)

         # Apply factor and offset
         qty_rate = qty_mutated * factor + offset

         # Calculate rate
-        sums[mtype] += qty_rate * price
-    by_types = {k: {"Rate": round(v, 4)} for k, v in sorted(sums.items())}
-    total = sum(sums.values())
-    return by_types, total
+        rate_sums[mtype] += qty_rate * price
+
+    by_types = {
+        k: {"Rate": round(v, 4)} for k, v in sorted(rate_sums.items())
+    }
+    qty_by_types = {
+        k: {"qty_sum": round(v, 4)} for k, v in sorted(qty_sums.items())
+    }
+    total = sum(rate_sums.values())
+    return by_types, total, qty_by_types


 def build_summary(pairs: list[tuple[str, str]]) -> dict[str, Any]:
@@ -237,17 +247,35 @@ def build_summary(pairs: list[tuple[str, str]]) -> dict[str, Any]:
         empty = {"nanosec": None, "begin": None, "end": None}
         time_block = {"begin_step": empty.copy(), "end_step": empty.copy()}

-    by_types, total_r = aggregate_rates_by_type(pairs)
+    # Get aggregated data by type
+    by_types, total_r, qty_by_types = aggregate_rates_by_type(pairs)
+
+    # Get overall time range for by_type entries
+    begin_time = first.get("start") if pairs else None
+    end_time = last.get("end") if pairs else None
+
+    # Build flat list of entries
+    rate_list = []
+    for type_name in sorted(by_types.keys()):
+        entry = {
+            "Begin": begin_time,
+            "End": end_time,
+            "Qty": qty_by_types.get(type_name, {}).get("qty_sum", 0.0),
+            "Rate": by_types[type_name]["Rate"],
+            "Type": type_name,
+        }
+        rate_list.append(entry)
+
     return {
         "time": time_block,
-        "data_log": {
+        "data_summary": {
             "total_timesteps": n_ts,
             "metrics_per_step": mps,
             "log_count": log_count,
+            "total_rating": round(total_r, 4),
         },
-        "rate": {
-            "by_types": by_types,
-            "total": {"Rating": round(total_r, 4)},
+        "by_type": {
+            "rate": rate_list,
         },
     }

@@ -267,7 +295,8 @@ def write_yaml(path: Path, doc: dict[str, Any]) -> None:
 def main() -> None:
     parser = argparse.ArgumentParser(
         description=(
-            "Summarize Loki JSON log entries to YAML (time, data_log, rate)."
+            "Summarize Loki JSON log entries to YAML "
+            "(time, data_summary, by_type)."
         ),
     )
     parser.add_argument(
@@ -310,7 +339,7 @@ def main() -> None:
     doc = build_summary(pairs)
     write_yaml(out_path, doc)

-    if doc["data_log"]["metrics_per_step"] == "ERROR":
+    if doc["data_summary"]["metrics_per_step"] == "ERROR":
         per_ts = Counter(ts for ts, _ in pairs)
         exp = next(iter(per_ts.values()), 0)
         for ts in sorted(per_ts, key=int):
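
Note on the new summary layout: after this change `build_summary` returns `time`, `data_summary` (including `total_rating`) and `by_type.rate` (a flat list with `Begin`, `End`, `Qty`, `Rate`, `Type`), while the README text in the same PR still describes `data_log` and `rate.by_types` / `total.Rating`. A consumer of the new layout might cross-check it roughly as follows. This is a sketch under assumptions: the function name `rates_match_total`, the tolerance, and the sample document (timestamps and values) are invented for the example.

```python
def rates_match_total(doc: dict, tol: float = 1e-3) -> bool:
    """True if the per-type Rate values add up to data_summary.total_rating."""
    per_type = sum(item["Rate"] for item in doc["by_type"]["rate"])
    return abs(per_type - doc["data_summary"]["total_rating"]) <= tol


# Hypothetical summary document in the shape produced after this change.
sample = {
    "time": {"begin_step": {}, "end_step": {}},
    "data_summary": {
        "total_timesteps": 2,
        "metrics_per_step": 2,
        "log_count": 4,
        "total_rating": 4.75,
    },
    "by_type": {
        "rate": [
            {"Begin": "2025-01-01T00:00:00", "End": "2025-01-01T02:00:00",
             "Qty": 5.0, "Rate": 4.5, "Type": "instance"},
            {"Begin": "2025-01-01T00:00:00", "End": "2025-01-01T02:00:00",
             "Qty": 7.0, "Rate": 0.25, "Type": "volume"},
        ],
    },
}

print(rates_match_total(sample))
# True
```

A tolerance is used instead of equality because each `Rate` is rounded to four decimals per type, whereas `total_rating` is rounded once from the unrounded sum, so the two can differ by a few units in the last decimal place.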